Message boards : Technical News : Working as Expected (Jul 13 2009)
Dave Send message Joined: 13 Jul 09 Posts: 1 Credit: 5,218 RAC: 0 |
I have a simple mind and I am not following this discussion too well ;-\ What is the estimate for when I will be able to upload completed work units? Or should I suspend processing until uploads are functioning again? Many thanks, Dave G. "Per Ardua, ad Astra"
Virtual Boss* Send message Joined: 4 May 08 Posts: 417 Credit: 6,440,287 RAC: 0 |
I have a simple mind and I am not following this discussion too well ;-\ Welcome to the message boards. Uploads are getting through, although it's still a bit patchy. No need to suspend processing; things should be improving with time. Flying high with Team Sicituradastra.
lobozmarcin Send message Joined: 19 Jul 02 Posts: 1 Credit: 1,421 RAC: 0 |
I have a problem:

2009-07-16 15:27:18|SETI@home|Backing off 3 hr 39 min 10 sec on upload of 01dc08ad.14412.20113.8.8.66_1_0
2009-07-16 15:27:19||Internet access OK - project servers may be temporarily down.
2009-07-16 15:27:36|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
2009-07-16 15:27:41|SETI@home|Scheduler request completed: got 0 new tasks

Three days now, the same problem.
Virtual Boss* Send message Joined: 4 May 08 Posts: 417 Credit: 6,440,287 RAC: 0 |
I have a problem: Welcome to the message boards. Your work is slowly getting through; 16 tasks have uploaded and been reported today (UTC time). After a few more have uploaded you will get some new work. Flying high with Team Sicituradastra.
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. Why would they need to be decompressed on Bruno? As zips they take up less space on disk and use less bandwidth in transit. I would expect that once you really need the results, you can import them on whatever server you're using to look at them and decompress them there. Or is that too simple?
CryptokiD Send message Joined: 2 Dec 00 Posts: 150 Credit: 3,216,632 RAC: 0 |
You all see those short-time work units? They take only a few minutes for the CPU or GPU to complete. Why not stack those shorties into a big zipped file and send a single one to a user? It would reduce the number of simultaneous connections to the server AND client if you had 40 or 50 work units in one large compressed file.

When the client is done with the work units, it can compress them back into one file and send it back to the server. If a client got bored and only managed to crunch half the work units, it could still compress the half it finished into one file and send that. When the file gets to the server, the server would decompress it into the individual work units, count and file away the ones that made it back, and re-issue work units for the ones the client did not complete or that errored out.

This would greatly reduce the amount of hammering the SETI servers see from clients begging to upload or download. Instead of seeing thousands of little files coming and going, we would see hundreds of large files, which means fewer inbound and outbound connections. The 100 Mbit pipe would still get saturated to capacity, but at least the number of connections would decrease.

You could even go so far as to limit the number of downloads an individual client is allowed per day; set it at, for example, 2. Only twice a day could a client request a new compressed stack of work units, and only if it has sent the previous ones back already. The BOINC app only lets you download up to 100 WUs a day when you first join. Why not compress those 100 into one file, send it out, and hope you get some of it back a week later? And just like the current BOINC app, if a client shows that it can handle 100 a day, then the number could gradually increase.

Again, it wouldn't solve the bandwidth issue, but it would greatly reduce the number of connection attempts, which are in and of themselves bandwidth hogs. On the computers on my account alone, they are trying to upload a completed work unit every few seconds. This could be reduced to twice per computer per day.
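To make the batching idea above concrete, here is a minimal sketch, assuming a flat directory of finished result files and an invented `.result` naming scheme (none of this is BOINC's actual client API): bundle everything that has finished into one archive, and keep the list of names so the client can report exactly which tasks are inside, even if only half the batch was crunched.

```python
# Hypothetical sketch of the batching idea: pack finished results into
# a single zip so one connection replaces dozens. Paths, the ".result"
# suffix, and the upload() call are assumptions, not BOINC internals.
import zipfile
from pathlib import Path

def bundle_finished_results(result_dir, archive_path):
    """Pack every completed result file into one archive and return
    the list of result names included."""
    included = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for result_file in sorted(Path(result_dir).glob("*.result")):
            archive.write(result_file, arcname=result_file.name)
            included.append(result_file.name)
    return included

# One upload per batch instead of one connection per result:
# names = bundle_finished_results("completed", "batch.zip")
# upload("http://upload.example/batch", "batch.zip", manifest=names)
```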
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
One thought I have had..... We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. The advantage of decompressing them on Bruno is that they end up in exactly the same place as they would have done under the existing upload handler: no change is required to all the cross-mounted complexity of the SETI file system. And the files are available immediately and individually: you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file. And that also makes the whole system reversible: if something goes wrong with the remote concentrator, use DNS to point all of us back to Bruno. Gets sticky again, of course, but lets the project limp along until the concentrator is revived. |
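A minimal sketch of the "decompress on Bruno" step Richard describes, assuming illustrative paths: each incoming zip is extracted into the same upload directory the normal upload handler writes to, so the validator sees ordinary individual result files and nothing downstream has to change.

```python
# Sketch only: directory names are invented for illustration.
import zipfile
from pathlib import Path

UPLOAD_DIR = Path("/boinc/upload")        # where results normally land
INCOMING = Path("/boinc/incoming_zips")   # zips from the remote concentrator

def unpack_incoming():
    for zip_path in INCOMING.glob("*.zip"):
        with zipfile.ZipFile(zip_path) as archive:
            # Each member becomes an ordinary result file, exactly as if
            # it had been uploaded directly. (A real version would also
            # sanitise member names against path traversal.)
            archive.extractall(UPLOAD_DIR)
        zip_path.unlink()  # the archive is no longer needed once extracted
```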
Aurora Borealis Send message Joined: 14 Jan 01 Posts: 3075 Credit: 5,631,463 RAC: 0 |
One thought I have had..... This idea has already been checked in by the BOINC developers. It will probably be incorporated in the next version released for testing. As I understand it, this was tried a couple of years ago with some negative effects that will need to be looked at again. EDIT: Oops, should have read the top of the thread before replying.
Virtual Boss* Send message Joined: 4 May 08 Posts: 417 Credit: 6,440,287 RAC: 0 |
We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D Thanks HAL9000, I had not seen that thread yet. Looks like it may be good news. :) Flying high with Team Sicituradastra.
Anthony Liggins Send message Joined: 23 Aug 99 Posts: 14 Credit: 609,816 RAC: 0 |
Since CUDA was introduced into the project, the BOINC servers have been taking on an ever-increasing load. As people populate their spare PCIe slots with extra GPUs, adding an extra 112 or more cores each time they do, your bandwidth woes will only increase exponentially. SETI@home has become a victim of its own success where CUDA is concerned. The best thing to do here is to limit the amount each GPU can download each day through the web interface; cutting it by one third or one half will free up a good portion of bandwidth. This will also decrease the load on the backend, as you will not need to create so many multibeam WUs. Increasing the chirp rate will affect the slower CPUs far more than GPUs. :-(

I have been browsing stats and looking at computers attached to BOINC, and I have noticed fellow participants who have between 1,500 and 5,000 WUs downloaded onto their PCs. I would consider this somewhat excessive, which is why I am making this suggestion. Once implemented, you should notice a difference within 24 hours, and then hopefully people will not feel so frustrated when trying to upload their finished WUs. This should limit the number of people going red in the face and blowing off steam on this forum; well, maybe until the next managed emergency comes along. Participants need to remember that science does not hurt anyone if it is running late.

Anthony. A 10-year veteran of SETI@home :-)
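For illustration only, a hedged sketch of the per-day cap Anthony suggests. The cap value and the host fields are invented here; the real BOINC scheduler has its own daily-quota mechanism, and this only shows the shape of the check.

```python
# Assumed numbers and data structures, purely to illustrate the policy.
from dataclasses import dataclass

DAILY_CAP_PER_GPU = 50  # invented figure; the post suggests cutting quotas by a third to a half

@dataclass
class Host:
    gpu_count: int
    tasks_sent_today: int

def may_send_work(host):
    """Refuse new tasks once a host has hit its per-GPU daily cap."""
    return host.tasks_sent_today < DAILY_CAP_PER_GPU * max(host.gpu_count, 1)
```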
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file. Install Windows XP or above to do the looking into that archive. ;-) Just kidding! I forgot about the validator needing to be able to check the contents. Smacks head. |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously the regular S@H has been run on the old (1999 to the start of Astropulse) tapes, but there are years of tapes to run Astropulse on. Does this have to do with the RFI and other factors previously mentioned, meaning they aren't available? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously the regular S@H has been run on the old (1999 to the start of Astropulse) tapes, but there are years of tapes to run Astropulse on. Does this have to do with the RFI and other factors previously mentioned, meaning they aren't available? Yes, it does have to do with the RFI/radar. A while back, the recorder was set up so that one of the 14 channels of data holds the "chirping" of the radar, and the remaining channels hold the data that gets cut into WUs. That's what I think I remember reading a long time ago. Actually, what I remember reading is that we were only using 12 channels at the time, so there were two free channels left: one was used for the radar chirping, and the other was still available for future use. No idea where to even try to find that reference now. But at any rate, for tapes recorded before that 13th channel was used for radar chirping, there is no way of knowing where the chirps actually are, and that's where the software radar blanker that Matt has been working on comes into play. Once he gets that up and running, it can pre-process the older tapes, find where it thinks the radar is, and fill that 13th channel with the chirps so the splitters can do what they normally do. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
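A very rough sketch of the radar-blanking pre-processing step described above, under heavy assumptions: the channel index, the array layout, and the power-spike detection rule are all invented, and the real blanker works on the recorder's actual format. The point is only the flow: scan the old data, guess where the radar is, and fill the spare channel with flags so the splitters can skip those stretches.

```python
# Illustrative only: RADAR_CHANNEL, the array layout, and the
# threshold rule are assumptions, not the real blanker's logic.
import numpy as np

RADAR_CHANNEL = 12  # assumed index of the "13th channel"

def blank_radar(samples, threshold):
    """samples: (n_channels, n_samples) array. Fills the radar channel
    with 1s wherever summed power across the data channels spikes."""
    data_channels = np.delete(samples, RADAR_CHANNEL, axis=0)
    power = np.abs(data_channels).sum(axis=0)
    samples[RADAR_CHANNEL] = (power > threshold).astype(samples.dtype)
    return samples
```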
Nicolas Send message Joined: 30 Mar 05 Posts: 161 Credit: 12,985 RAC: 0 |
It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs on the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of erroring immediately. It's also possible for the scheduler to ignore reports if the file hasn't been uploaded yet; the client would just keep them queued a while longer and try reporting them later. This can be done for individual workunits (there is a separate 'ack' for each, which the client must receive before it gets rid of the task locally). Contribute to the Wiki!
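A minimal sketch of the retry behaviour Nicolas suggests, with invented delays and function names: if the validator can't find a result file yet because the report beat the upload, it backs off and retries instead of declaring a validate error on the first miss.

```python
# Sketch: the attempt count and delays are illustrative.
import os
import time

def wait_for_result_file(path, max_attempts=5):
    delay = 60.0  # start with a one-minute backoff
    for _ in range(max_attempts):
        if os.path.exists(path):
            return True      # file has arrived; validate as normal
        time.sleep(delay)    # not uploaded yet; wait and retry
        delay *= 2           # exponential backoff between attempts
    return False             # only now treat it as a validate error
```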
Nicolas Send message Joined: 30 Mar 05 Posts: 161 Credit: 12,985 RAC: 0 |
Why not get Boinc to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff. Already implemented; see http://boinc.berkeley.edu/trac/changeset/18593. There hasn't been a new version released since then, though. Contribute to the Wiki!
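For readers who don't want to dig through the changeset, the behaviour it describes is roughly this (data structures invented for the sketch, not BOINC's code): one backoff timer keyed by upload server, consulted by every pending file destined for that server, so a single failure defers them all.

```python
# Illustrative sketch of per-server upload backoff.
import time

class UploadBackoff:
    def __init__(self):
        self.next_attempt = {}  # server URL -> earliest retry time

    def server_failed(self, server, delay):
        # One failure defers ALL uploads to this server.
        self.next_attempt[server] = time.time() + delay

    def may_upload(self, server):
        return time.time() >= self.next_attempt.get(server, 0.0)
```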
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
You know, some people had pointed that out already in this same thread... ;-) |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Would a set of remote upload servers as "data aggregators" work? It is an interesting idea. The biggest single issue as I see it: work is uploaded, and the moment the upload completes it is available for processing on the upload server. Then the result is reported. At that point it is marked in the database as received, and subject to validation. The validator doesn't have to check whether the result is in local storage, because it is in local storage by definition.

This change means you have a new state: reported but not in local storage. BOINC would have to know about that, and have some way of dealing with it (rescanning the database and checking to see whether the result has actually arrived), probably by having the "unzip" process on the upload server do the reporting. There is also a chance that the result gets lost between the off-site server and the "true" upload server.

I like the idea of doing just one, near Berkeley. What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue, and this just moves that issue from the upload server to the server near the edge.

... but a better idea (related to the thread, which I haven't worked my way through) might be to zip all of the pending uploads into one file, as sketched below. All the client really needs to know is what is in the zip -- then let that go to Bruno. The downside is that you have to push all of the work through in one session, and the bigger the .zip file, the more bytes/packets you have to push through in a row....
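A small sketch of the "zip all pending uploads into one file" variant at the end of that post, with an invented manifest format: the client packs a listing first, so whatever unpacks the archive (Bruno or an aggregator) knows what is inside before extracting, which is exactly the "what is in the zip" knowledge the post says the client needs to convey.

```python
# Sketch: manifest.json and the file layout are assumptions.
import json
import zipfile

def pack_pending(pending_paths, archive_path):
    """pending_paths: list of pathlib.Path objects for results awaiting upload."""
    manifest = [p.name for p in pending_paths]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Manifest first, so the receiver can list contents cheaply.
        archive.writestr("manifest.json", json.dumps(manifest))
        for result in pending_paths:
            archive.write(result, arcname=result.name)
```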
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue, and this just moves that issue from the upload server to the server near the edge. Previous observations, over numerous surges/dips, are that the number of simultaneous connections only becomes a problem when it coincides with extremely heavy (93+ Mbit, 98% utilisation) download demand. The supposition has been that this is link saturation with protocol packets instead of data packets. If the protocol packets can be intercepted at the bottom of the hill, the theory is that there's some gain to be had.
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue, and this just moves that issue from the upload server to the server near the edge. The interesting thing we saw when Eric made his change was a sudden, dramatic increase in bandwidth used, from somewhere around 40 megabits to something near 90 megabits -- Eric said "tripled." In other words, we were under 50% utilization while the servers were flooded with queued connections. I'm not really disagreeing; I'm just saying that the server out on the edge is going to be subject to all of the problems Bruno faces now -- and be more accessible. One change from your design that I would make: I would try to keep two connections going at speed at all times, so that if one connection stalled for any reason the other could use that bandwidth -- and each time a transfer completes, I'd start making a new .zip file, instead of doing it hourly or some such.
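A sketch of the two-connection scheme proposed above, with assumed helpers (`upload` and `build_next_zip` are placeholders, not real APIs): keep two transfers in flight so a stalled connection never idles the link, and build the next zip the moment a slot frees rather than on a fixed schedule.

```python
# Illustrative only: upload() and build_next_zip() are assumed helpers.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_uploader(upload, build_next_zip):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Assume at least two batches are ready at startup.
        in_flight = {pool.submit(upload, build_next_zip()) for _ in range(2)}
        while in_flight:
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for _ in done:
                batch = build_next_zip()  # zip the next batch immediately
                if batch is not None:     # None means nothing left to send
                    in_flight.add(pool.submit(upload, batch))
```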