Working as Expected (Jul 13 2009)
Message boards : Technical News : Working as Expected (Jul 13 2009)
Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

So after the delay BOINC would try x uploads, then when they fail x more, and so on. That sounds exactly like what happens right now when a frustrated user hits "Retry Now" for all his umpteen pending uploads. Not an improvement, in my opinion.
ID: 918449
> Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

Just to clarify (using the default setting of two): yes, ALL would be ready to attempt upload at the same time. IF the first two attempting upload failed, they would all back off again. IF the first two succeeded, two more would immediately attempt upload, continuing until all had uploaded (or one failed, which would initiate a new backoff).
____________
Flying high with Team Sicituradastra.
ID: 918451
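The shared-backoff behaviour described above can be sketched as a small Python simulation. It is a simplified model for illustration, not BOINC client code: the names `ServerBackoff` and `upload_round`, the 60-second base delay, the 4-hour cap and the jittered exponential growth are all assumptions. Only the "two at a time, and any failure backs everything off" rule comes from the post.

```python
import random
from dataclasses import dataclass


@dataclass
class ServerBackoff:
    """Hypothetical model: one backoff timer shared by every upload to a server."""
    min_delay: float = 60.0          # assumed base delay, seconds
    max_delay: float = 4 * 3600.0    # assumed cap, seconds
    failures: int = 0
    next_attempt: float = 0.0

    def on_failure(self, now: float, rng: random.Random) -> None:
        # Exponential growth, capped, with random jitter so that many
        # clients do not all come back at the same moment.
        self.failures += 1
        ceiling = min(self.max_delay, self.min_delay * 2 ** self.failures)
        self.next_attempt = now + rng.uniform(ceiling / 2, ceiling)

    def on_success(self) -> None:
        self.failures = 0
        self.next_attempt = 0.0


def upload_round(pending, server, now, attempt, rng, max_concurrent=2):
    """Try pending uploads in batches of max_concurrent; stop on any failure.

    `attempt(name)` returns True if that file uploaded. Uploaded files are
    removed from `pending`, and the list uploaded this round is returned.
    max_concurrent stands in for the "default setting of two" in the post.
    """
    done = []
    if now < server.next_attempt:
        return done  # the whole server is backed off, so nothing is tried
    while pending:
        batch = pending[:max_concurrent]
        ok = [name for name in batch if attempt(name)]
        for name in ok:
            pending.remove(name)
            done.append(name)
        if len(ok) < len(batch):
            server.on_failure(now, rng)  # one failure backs off every file
            break
        server.on_success()
    return done
```

With `pending = ["a", "b", "c", "d", "e"]` and only `"c"` failing, one round uploads `a`, `b` and `d`, then backs off the server with `c` and `e` still queued; `e` is never tried in that round.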
I have a simple mind and I am not following this discussion too well ;-\
ID: 918452
> I have a simple mind and I am not following this discussion too well ;-\

Welcome to the message boards. Uploads are getting through, although it's still a bit patchy. No need to suspend processing; things should be improving with time.
____________
Flying high with Team Sicituradastra.
ID: 918453
I have a problem:
ID: 918455
> I have a problem:

Welcome to the message boards. Your work is slowly getting through: 16 tasks have uploaded and reported today (UTC time). After a few more have uploaded you will get some new work.
____________
Flying high with Team Sicituradastra.
ID: 918459
> But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive.

Why would they need to be decompressed on Bruno? As zips they take up less space on disk as well as less bandwidth in transfer. I would expect that once you really need the results, you could import them on whatever server you're using to look at them and decompress them there. Or is that too simple?
____________
Jord - BOINC FAQ Service - BOINC User Wiki
Real is just a matter of perception.
ID: 918460
You all see those short work units? They take only a few minutes for the CPU or GPU to complete. Why not stack those shorties into a big zipped file and send a single file to each user? Having 40 or 50 work units in one large compressed file would reduce the number of simultaneous connections to both the server and the client. When the client is done with the work units, it can compress the results back into one file and send it back to the server.
ID: 918468
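A minimal sketch of the client-side half of this idea, using Python's standard `zipfile` module: several small result files are packed into one in-memory, deflate-compressed archive so they can go up in a single transfer. The file names and contents are made up for illustration, and nothing here reflects how BOINC actually names or packages its output files.

```python
import io
import zipfile


def bundle_results(results: dict[str, bytes]) -> bytes:
    """Pack several small result files into one deflate-compressed zip in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(results):
            zf.writestr(name, results[name])
    return buf.getvalue()


def bundle_contents(blob: bytes) -> list[str]:
    """List the file names inside a bundle (what the receiving side would unpack)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # 40 hypothetical short results with repetitive text content.
    results = {f"result_{i:02d}": b"spike 1.0 2.0\n" * 200 for i in range(40)}
    blob = bundle_results(results)
    raw = sum(len(v) for v in results.values())
    print(f"{len(results)} files, {raw} bytes raw, {len(blob)} bytes in one upload")
```

One connection replaces forty, but the whole bundle then has to get through in a single session.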
> One thought I have had.....

We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 918474
> But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive.

The advantage of decompressing them on Bruno is that they end up in exactly the same place as they would have done under the existing upload handler: no change is required to all the cross-mounted complexity of the SETI file system. And the files are available immediately and individually: you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file.

That also makes the whole system reversible: if something goes wrong with the remote concentrator, use DNS to point all of us back to Bruno. It gets sticky again, of course, but it lets the project limp along until the concentrator is revived.
ID: 918475
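To make the "files end up in exactly the same place" point concrete, here is a hypothetical sketch of the unpacking step on the receiving server. It writes each archived file individually into an upload directory, refuses any entry that carries a path component, and renames each file into place only once it is complete, so a validator looking for it never sees a partial file. The function name and the flat directory layout are assumptions; a real BOINC project's upload tree (which may be split into hashed subdirectories) is not modelled.

```python
import io
import os
import zipfile


def unpack_into_upload_dir(blob: bytes, upload_dir: str) -> list[str]:
    """Expand a bundle so each result lands as its own file in upload_dir.

    Hypothetical sketch: flat directory, one file per archive entry.
    Returns the names written, in archive order.
    """
    written = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            name = os.path.basename(info.filename)
            # Reject directories and anything like "../x" or "sub/x".
            if not name or name != info.filename:
                raise ValueError(f"unexpected path in bundle: {info.filename!r}")
            dest = os.path.join(upload_dir, name)
            tmp = dest + ".part"
            with open(tmp, "wb") as out:
                out.write(zf.read(info))
            # Rename only when complete, so readers never see a half-written file.
            os.replace(tmp, dest)
            written.append(name)
    return written
```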
> One thought I have had.....

This idea has already been checked in by the BOINC developer. It will probably be incorporated in the next version released for testing. As I understand it, this was tried a couple of years ago with some negative effects that will need to be looked at again.

EDIT: Oops, should have read the top of the thread before replying.
ID: 918484
> We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D

Thanks HAL9000, I had not seen that thread yet. Looks like it may be good news. :)
____________
Flying high with Team Sicituradastra.
ID: 918485
Since CUDA was introduced into the project, the BOINC servers have been taking on an ever-increasing load. As people populate their spare PCIe slots with extra GPUs, adding 112 or more cores each time they do that, your bandwidth woes will only increase exponentially.
ID: 918488
> you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file.

Install Windows XP or above to do the looking into that archive. ;-) Just kidding! I forgot about the validator needing to be able to check the contents. Smacks head.
____________
Jord - BOINC FAQ Service - BOINC User Wiki
Real is just a matter of perception.
ID: 918489
Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously regular S@H has been run on the old (1999 to start of Astropulse) tapes, but there are years of tapes to run Astropulse on. Does this have to do with the RFI and other factors previously mentioned that they aren't available?
ID: 918490
> Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously regular S@H has been run on the old (1999 to start of Astropulse) tapes, but there are years of tapes to run Astropulse on. Does this have to do with the RFI and other factors previously mentioned that they aren't available?

Yes, it does have to do with the RFI/radar. A while back, the recorder was set up so that one of the 14 channels of data holds the "chirping" of the radar, and the remaining channels have the data that gets cut into WUs. That's what I think I remember reading a long time ago. Actually, what I remember reading is that we were only using 12 channels at the time, so there were two free channels left; one was used for the radar chirping, and the other was still available for future use. No idea where to even try to find that reference now.

But at any rate, before that 13th channel was used for radar chirping, there is no way of knowing where the chirps actually are, and that's where the software radar blanker that Matt has been working on comes into play. Once he gets that up and running, it can pre-process the older tapes, find where it thinks the radar is, and fill that 13th channel with the chirps so the splitters can do what they normally do.
____________
Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact
ID: 918493
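The thread doesn't say how Matt's software blanker decides where the radar is, so the following is only a naive stand-in showing the idea of deriving a flag channel from the recorded data itself: any sample whose power exceeds a threshold is flagged along with a few neighbours, and flagged samples are zeroed before splitting. The function names, the threshold and the padding are hypothetical, and a bare power threshold is chosen purely for simplicity.

```python
def radar_mask(samples: list[complex], threshold: float, pad: int = 2) -> list[int]:
    """Flag (1) every sample whose power |x|^2 exceeds `threshold`.

    `pad` neighbouring samples on each side are flagged as well, so the
    edges of a pulse are covered too. Hypothetical sketch, not the real blanker.
    """
    n = len(samples)
    mask = [0] * n
    for i, x in enumerate(samples):
        if abs(x) ** 2 > threshold:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                mask[j] = 1
    return mask


def apply_blanking(samples: list[complex], mask: list[int]) -> list[complex]:
    """Zero the flagged samples, as a splitter might before cutting workunits."""
    return [0j if flagged else x for x, flagged in zip(samples, mask)]
```

For ten quiet samples of `0.1` with a single strong sample of `5` at index 5, `radar_mask(samples, 1.0, pad=1)` flags indices 4 to 6.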
> It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload; but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

It's possible for the scheduler to ignore reports if the file wasn't uploaded yet; the client would just keep them queued for a while longer and try reporting them later. This can be done for individual workunits (there is a separate 'ack' for each, which the client must receive before it gets rid of the task locally).
____________
Contribute to the Wiki!
ID: 918523
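The quoted suggestion, that a validator which can't find a result file should back off and retry rather than fail at once, might look roughly like this. `PendingCheck`, `check_result_file`, the one-minute base delay and the eight-attempt limit are all invented for illustration; this is not BOINC's validator code.

```python
import os
from dataclasses import dataclass


@dataclass
class PendingCheck:
    """Hypothetical bookkeeping for one reported result awaiting its file."""
    result_name: str
    attempts: int = 0
    next_try: float = 0.0


def check_result_file(check: PendingCheck, upload_dir: str, now: float,
                      base_delay: float = 60.0, max_attempts: int = 8) -> str:
    """Return "ready", "retry" or "error" for one reported result.

    A missing file schedules another look after an exponentially growing
    delay (base_delay, 2*base_delay, 4*base_delay, ...); only after
    max_attempts misses is it treated as an error.
    """
    if os.path.exists(os.path.join(upload_dir, check.result_name)):
        return "ready"
    check.attempts += 1
    if check.attempts >= max_attempts:
        return "error"
    check.next_try = now + base_delay * 2 ** (check.attempts - 1)
    return "retry"
```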
> Why not get BOINC to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff.

Already implemented, see http://boinc.berkeley.edu/trac/changeset/18593. There hasn't been a new version released since then, though.
____________
Contribute to the Wiki!
ID: 918525
You know, some people had pointed that out already in this same thread... ;-)
ID: 918528
> Would a set of remote upload servers as "data aggregators" work?

It is an interesting idea. The biggest single issue as I see it:

Work is uploaded, and the moment the upload completes it is available for processing on the upload server. Then the result is reported. At this point it is marked in the database as received, and subject to validation. The validator doesn't have to check whether the result is in local storage, because it is in local storage by definition.

This change means you have a new state: reported but not in local storage. BOINC would have to know about that and have some way of dealing with it (rescanning the database and checking whether the result has actually arrived), probably by making the "unzip" process on the upload server report. There is also a chance that the result gets lost between the off-site server and the "true" upload server.

I like the idea of doing just one, near Berkeley. What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue, and this just moves that issue from the upload server to the server near the edge.

... but a better idea (related to the thread, which I haven't worked my way through) might be to zip all of the pending uploads into one file. All the client really needs to know is what is in the zip; then let that go to Bruno. The downside is that you have to push all of the work through in one session, and the bigger the .zip file, the more bytes/packets you have to push through in a row....
ID: 918557
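The new state described in that post can be written down as a small transition table. The state and event names are hypothetical and chosen for this sketch; BOINC's real result states are not modelled. It shows that a report arriving before the file reaches local storage leads to a distinct waiting state rather than straight to validation, and that a file which never arrives needs its own way out.

```python
from enum import Enum, auto


class ResultState(Enum):
    AT_REMOTE = auto()           # upload finished at the off-site server only
    LOCAL = auto()               # file has reached the main upload server
    REPORTED_NOT_LOCAL = auto()  # the new state: reported, file still off-site
    READY_TO_VALIDATE = auto()   # reported and on local storage
    LOST = auto()                # file never made it from the off-site server


# (current state, event) -> next state; anything not listed is rejected.
TRANSITIONS = {
    (ResultState.AT_REMOTE, "forwarded"): ResultState.LOCAL,
    (ResultState.AT_REMOTE, "reported"): ResultState.REPORTED_NOT_LOCAL,
    (ResultState.LOCAL, "reported"): ResultState.READY_TO_VALIDATE,
    (ResultState.REPORTED_NOT_LOCAL, "forwarded"): ResultState.READY_TO_VALIDATE,
    (ResultState.REPORTED_NOT_LOCAL, "timeout"): ResultState.LOST,
}


def step(state: ResultState, event: str) -> ResultState:
    """Apply one event, raising ValueError for transitions the table doesn't allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in state {state.name}") from None
```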
Copyright © 2013 University of California