Message boards : Number crunching : Unable to Upload again
Author | Message |
---|---|
mjmcg Send message Joined: 1 Nov 03 Posts: 7 Credit: 54,686 RAC: 0 |
I have been noticing over the last few weeks, on an increasing level, that my completed WUs won't upload until they're damned good and ready either. I can download new ones when it needs / wants them, but upon completion all they do is sit in the transfers tab and keep "retrying". I have restarted both the application and the system(s) as suggested, to no avail. The system restarts, the app restarts, and there sit the completed work units: "uploading...retry in xx:xx:xx".

Wasn't there 100,000 machines trying to upload work a couple weeks ago too? Why no bottlenecks then? Plus, wouldn't it stand to reason that if there were 100,000 systems uploading, there are also 100,000 downloading? Why no problems getting work, just sending it back?

The retry feature is flawed. Sometimes it delays the next attempt by a few minutes and sometimes it's hours. When it delays by 3 hours, my system completes 3 more work units and now I have 4 sitting here going nowhere. Now I have 4 units delayed by a couple hours, and meanwhile 2 more complete and join the bunch, all happily sitting here going nowhere. My point is that while, yeah, they will eventually go, I had an AP unit that was 'lost' after 300+ hours of crunching on my system because it went over the report deadline. I never saw credit for it. After having watched that thing progress day in and day out, I was pretty pissed when it just fizzled into nowhere.

I also don't get what the program uses for assigning or distributing work to systems. I have 2 systems active, an AMD dual-core 7750 and an Intel server with 2x 2.4 Xeons (4 cores). Due to its speed, the AMD can crunch circles around the Intel despite having 2 fewer cores, yet BOINC downloads craploads more WUs to the server than to the one that's actually hammering them out like mad.

I got 3 AP work units all at once on the server, and it took 300 +/- hours for each, whereas the AMD system most likely would have knocked them out in half that time. But I get no APs for that system..... |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Wasn't there 100,000 machines trying to upload work a couple weeks ago too? why no bottle necks then? Plus, wouldn't it stand to reason that if there were 100,000 systems uploading there is also 100,000 downloading? Why no problems getting work, just sending it back?

It's not just 100,000 machines, but 100,000 machines times the number of work units each one has to upload.

If you're interested, there is a lot of good reading in "Computer Networks" by Andrew Tanenbaum, including information on random backoffs. From where I'm sitting, the back-off should be a lot longer -- so it spreads the load. Spread out the load, get more successful uploads with less contention, and throughput goes up. Trouble is, the average user doesn't get "speeding up by slowing down" and instead wants to hit "Retry Now", which just makes the load worse. There exists some rate of uploads per second where nearly all attempts are successful. When you go past that rate, efficiency drops like a narcoleptic hailing a cab.

Uploading completed results and assigning/downloading new work are related -- today's new work is tomorrow's completed work -- but they don't necessarily track 100%. If BOINC gets too far behind on uploads, it will stop requesting work (because adding more results to upload just makes the problem worse). |
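The randomized back-off described above can be sketched in a few lines. This is only an illustration of the idea, not BOINC's actual retry logic: the constants (`base`, `cap`) and the function name are made up for the example.

```python
import random

def next_backoff(retry_count: int, base: float = 60.0,
                 cap: float = 4 * 3600.0) -> float:
    """Randomized exponential back-off: each failed upload roughly
    doubles the ceiling on the next delay, and jitter spreads clients
    out so they don't all retry in lockstep. Constants are illustrative,
    not BOINC's real values."""
    ceiling = min(cap, base * (2 ** retry_count))
    # Pick uniformly between the base delay and the ceiling ("jitter"),
    # so 100,000 clients that failed together don't retry together.
    return random.uniform(base, ceiling)
```

The jitter is the whole point: without it, every client that failed at the same moment would come back at the same moment and collide again, which is exactly the "Retry Now" stampede described above.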
MartinBen Send message Joined: 16 May 99 Posts: 20 Credit: 1,594,164 RAC: 0 |
Having been a member of Seti since May 1999, I consider myself to be a grizzled veteran; however, even I am beginning to lose patience with the Seti project, with constant problems uploading/downloading and servers going down. Now I'm hoping this is just a "blip", but after 2 weeks of problems I am switching my machines over to other BOINC projects until they get themselves sorted out. |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
All of this arguing.... so hard to sort through! I (maybe some others too) just need the following questions answered: 1. We are not able to upload. We know this. We don't know the intricate parts of this amazing system, however, so, please tell us where in the line these issues are occurring. 2. If we blow through our caches, and can't upload any of it, can we still get more work, or is the server expecting that work back? 2a. If we can't get more work, is it like a "No seconds on meat until you finish your veggies!" kind of thing, or is it like a "10 dollars?! What happened to the 10 dollars I have you yesterday?!" kind of thing? 3. How many times will BOINC retry the uploads before it finally throws in the towel? 4. Where can we see the server status pertaining to things like "results waiting to upload", "upload server stress/load", etc? 5. Recently, I have seen messages in red saying "Project has no work available". Are the two problems related? How so? 6. For as many of these as you can, please give us a simple analogy, so that it's easier to understand, for those of us who are not very technologically inclined. 7. More eggs. 8. More bacon. 9. Thank you. (Thanks for tolerating my questions) |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I'm still seeing RED on this! Boinc....Boinc....Boinc....Boinc.... |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Having been a member of Seti since May 1999 I consider myself to be a grizzled veteran however even I am beginning to loose patience with the Seti project with constant problems uploading/downloading and servers going down.

I came in near the tail-end of SETI Classic's shutdown, but I'm told by other veterans of the project that Classic had plenty of server downtime for weeks on end -- only that most participants didn't notice because workunits took a week or more for most people to crunch. I do remember a couple server outages before Classic shut down, and I didn't run SETIqueue like many did, so I didn't have a cache of workunits to use while the servers were down. My cruncher would just sit there, idle, waiting for the server to come back up. Sometimes my cruncher would go into standby, which meant it couldn't keep trying to contact the servers, and I wouldn't notice this until a week later.

I thought patience and wisdom were supposed to grow with age? :) |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I'm still seeing RED on this! Love the new avatar! Looks really cool. |
MartinBen Send message Joined: 16 May 99 Posts: 20 Credit: 1,594,164 RAC: 0 |
I thought patience and wisdom were supposed to grow with age? Well, I consider myself to have the patience of Job from the Bible, but even he would have gotten fed up by now! |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I thought patience and wisdom were supposed to grow with age? Does that mean I have more patience than Job? :) |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
Well, Seti has hit the wall again. So what, it's not like other projects don't crash and burn too. I'm in it because I thought from day one that it was a great idea. I still do, even through these hard times the past few months. It's in times like these, when I run out of work, that I shut off the computer and blow out the dust. Speaking of which, is it allowable for me to open up my Mac and blow out the dust bunnies? I have a few WUs from the old P4 that won't upload; they will when they can. Until then I will be in the cafe waiting for someone to win in Beets give a caption. Old James |
Docs Beast Send message Joined: 21 Jul 01 Posts: 2 Credit: 237,400 RAC: 0 |
I take it from reading the posts here that this happens quite often? |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I take it from reading the posts here that this happens quite often?

Not really. It happens from time to time; some issues take longer than others. The one thing you can take away from the posts here is that when it does happen, everyone freaks out about it, and a flurry of posts ensues about what should be done about it. Most often the issues subside while we're discussing them, and everyone goes back into relax mode. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
All of this arguing.... so hard to sort through!

1) The upload server is a unique server (not sure if there are one or two), the download server is a unique server (or perhaps two), and the scheduler is a unique server. Problems on one don't always affect the other two.
2) At some point, BOINC will stop trying to get work until uploads start flowing: this is to keep the upload queues from growing without bound.
2a) There is a limit -- at some point BOINC will stop getting more work because it knows that there is a problem. As work uploads (and it is uploading) it will go back to getting new work -- it won't wait for all of them to be uploaded.
3) I don't know the exact number, but it's fairly long (more than a week). Someone else will correct me, because I'm sure I'm wrong.
4) On the home page, server status. Note in particular the "received in last hour" number, because that shows the ongoing flow.
5) Indirectly. I think we've got a run of "short" multibeam, which increases the load on everything. This also helps spread out the uploads a little bit.
6) I saw a poster a long time ago, and I'd love to find it. It showed a wide herd of sheep crossing a narrow stone country bridge -- it was a good visual. L.A. freeways on a holiday weekend might work: the freeways are "up" but the cars are barely moving because there are so many headed out to the river, or to go camping. Incoming lanes are moving fine because most people are headed out of town. That'll be different on Monday night when they all try to come back.
7) Don't forget your cholesterol....
8) Help yourself, there is plenty more where that came from.
9) My pleasure. |
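The behavior in answers 2 and 2a -- stop fetching new work while uploads are stuck, and resume as soon as results start flowing again without waiting for the whole queue to drain -- can be sketched as a simple gate. The function name and threshold here are hypothetical, not BOINC's actual internals:

```python
def should_request_work(pending_uploads: int, uploads_flowing: bool,
                        max_pending: int = 8) -> bool:
    """Decide whether the client should ask the scheduler for new work.
    Threshold and names are hypothetical -- real BOINC uses its own
    heuristics -- but the shape matches the described behavior: stop
    fetching when the upload backlog is stuck, resume once uploads flow."""
    if not uploads_flowing and pending_uploads >= max_pending:
        # Fetching more work would only deepen the upload backlog.
        return False
    return True
```

Note that the gate reopens as soon as uploads are moving again, even with results still queued; it does not insist on an empty queue, matching "it won't wait for all of them to be uploaded."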
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
The damn dam broke here and I completed all my uploads and am now requesting more work. So.....I am no longer seeing red. Boinc....Boinc....Boinc....Boinc.... |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
A few more:

1. What started this influx of uploads?
2. How long has it been going on?
3. What can cure this?
4. WHO WILL SAVE US?!
5. Will it happen again?
6. And what will happen to Dr. Smith?! |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
A few more:

1. The regularly scheduled Tuesday outage from 3 months ago. (The recovery period tends to last until the following Monday.)
2. Longer than I can remember, but my brain is 62 years old.
3. A bigger pipe at Berkeley, but it costs MONEY.
4. I don't know, but we are all looking at Matt.
5. You can be certain of it.
6. Who?

Boinc....Boinc....Boinc....Boinc.... |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
The damn dam broke here and I completed all my uploads and am now requesting more work. So.....I am no longer seeing red. In the words of Valentine Michael Smith: waiting is. |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
So we have been seeing this for three months? And the only thing that can make sure it does not happen again is a larger pipe to the server? How much money would be needed for a big bandwidth increase? |
Westsail and *Pyxey* Send message Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0 |
Last quote I saw to pull 1 Gb fiber to the server closet was somewhere north of 100k. "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
Yeah, that's not gonna happen at least until the economy improves. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.