Unable to Upload again

Message boards : Number crunching : Unable to Upload again

mjmcg
Joined: 1 Nov 03
Posts: 7
Credit: 54,686
RAC: 0
United States
Message 913659 - Posted: 3 Jul 2009, 18:29:50 UTC - in response to Message 913466.  

Over the last few weeks I have increasingly noticed that my completed WUs won't upload until they're damned good and ready either. I can download new ones when it needs/wants them, but upon completion all they do is sit in the transfers tab and keep "retrying". I have restarted both the application and the system(s) as suggested, to no avail. The system restarts, the app restarts, and there sit the completed work units: "uploading...retry in xx:xx:xx".
What gives? Either you want the things or you don't...if you don't want them I'll stop wasting my electricity and blowing heat into my room for nothing.


The Retry is a feature of Boinc...

If you can't report for any reason, and there are many possible reasons, it backs off and tries again later.

OK, over the last week there has been plenty of that happening, again for various reasons.

Current problem is that there are 100,000 machines all trying to report work and get new stuff = overloaded network link.

Your machine will back off and try again later if things are too busy. It doesn't matter to anyone if your result is returned right now, or tomorrow, or even the next day.

The servers don't need to run at 100% availability to run the project successfully, any more than your machine has to run 100% of the time or have a continuous internet connection.

Relax and just let it retry

Ian



Weren't there 100,000 machines trying to upload work a couple of weeks ago too? Why no bottlenecks then? Plus, wouldn't it stand to reason that if there were 100,000 systems uploading, there are also 100,000 downloading? Why no problems getting work, just sending it back?
The retry feature is flawed. Sometimes you see it delay the next attempt by a few minutes and sometimes it's hours. When it delays by 3 hours, my system completes 3 more work units and now I have 4 sitting here going nowhere. Now I have 4 units delayed by a couple of hours, and meanwhile 2 more complete and join the bunch, all happily sitting here going nowhere. My point is that while yeah, they will eventually go, I had an AP unit that was 'lost' after 300+ hours of time being crunched on my system that went over the report deadline. I never saw credit for it. After having watched that thing progress day in and day out, I was pretty pissed when it just fizzled into nowhere.
I also don't get what the program is for assigning or distributing work to systems.
I have 2 systems active, an AMD dual core 7750 and an Intel server with 2x 2.4 Xeons (4 cores). However, due to its speed the AMD can crunch circles around the Intel despite having 2 fewer cores, yet BOINC downloads craploads more WUs to the server than to the one that's actually hammering them out like mad. I got 3 AP work units all at once on the server and it took roughly 300 hours for each, whereas the AMD system most likely would have knocked them out in half that time, but I get no APs for that system.....
ID: 913659
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 913667 - Posted: 3 Jul 2009, 19:16:59 UTC - in response to Message 913659.  

Weren't there 100,000 machines trying to upload work a couple of weeks ago too? Why no bottlenecks then? Plus, wouldn't it stand to reason that if there were 100,000 systems uploading, there are also 100,000 downloading? Why no problems getting work, just sending it back?

It's not just 100,000 machines, but 100,000 machines times the number of work units each one has to upload.

If you're interested, there is a lot of good reading in "Computer Networks" by Andrew Tanenbaum, including information on random backoffs. From where I'm sitting, the back-off should be a lot longer, so it spreads the load. Spread out the load, get more successful uploads with less contention, and throughput goes up.
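
For anyone curious what a random backoff looks like in practice, here is a minimal Python sketch. It assumes a ceiling that doubles after each failed attempt with a uniformly random delay picked underneath it. The 60-second base, the 4-hour cap, and the name `backoff_delay` are all made up for illustration; they are not BOINC's actual settings or functions.

```python
import random

def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before retry number `attempt` (0 = first retry).

    base and cap are illustrative assumptions, not BOINC's real values.
    """
    # The ceiling doubles with every failed attempt, up to the cap.
    # The exponent is clamped so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    # A random point below the ceiling keeps clients from retrying in lockstep.
    return rng.uniform(0.0, ceiling)

if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so repeated runs print the same delays
    for attempt in range(8):
        print(f"retry {attempt}: wait {backoff_delay(attempt, rng=rng):.0f} s")
```

With a scheme like this, one client can draw a wait of a few minutes on one try and a couple of hours on another. That spread is the point: it keeps 100,000 machines from all knocking at the same moment.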

Trouble is, the average user doesn't get "speeding up by slowing down" and instead wants to hit "Retry Now" which just makes the load worse.

There exists some rate of uploads per second where nearly all attempts are successful. When you go past that value, efficiency drops like a narcoleptic hailing a cab.
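
To put a rough shape on that, here is a small sketch using the slotted ALOHA formula covered in the same textbook, where throughput is S = G * e^(-G) for an offered load of G attempts per slot. That formula models a shared radio channel, so treat it purely as an analogy and an assumption on my part, not as a measurement of the SETI upload server.

```python
import math

def success_fraction(offered_load):
    """Chance that a single attempt gets through in slotted ALOHA: e^(-G)."""
    return math.exp(-offered_load)

def throughput(offered_load):
    """Successful transfers per slot: S = G * e^(-G)."""
    return offered_load * success_fraction(offered_load)

if __name__ == "__main__":
    # Sweep the offered load from light to heavily oversubscribed.
    for g in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"load {g:4.2f}: {success_fraction(g):6.1%} succeed, "
              f"throughput {throughput(g):.3f}")
```

In this model throughput peaks at G = 1 (about 0.368 successes per slot). Past that point every extra attempt lowers the number of transfers that actually complete, which is why hammering "Retry Now" hurts everyone.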

Uploading completed results and assigning/downloading are related, but today's new work is tomorrow's completed work. They don't necessarily track 100%.

If BOINC gets too far behind on uploads, it will stop requesting work (because adding more results to upload just makes the problem worse).
ID: 913667
MartinBen
Joined: 16 May 99
Posts: 20
Credit: 1,594,164
RAC: 0
United Kingdom
Message 913670 - Posted: 3 Jul 2009, 19:31:46 UTC
Last modified: 3 Jul 2009, 19:32:13 UTC

Having been a member of Seti since May 1999, I consider myself to be a grizzled veteran; however, even I am beginning to lose patience with the Seti project, with constant problems uploading/downloading and servers going down.

Now I'm hoping this is just a "blip" but after 2 weeks of problems I am switching my machines over to other Boinc projects until they get themselves sorted out.
ID: 913670
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913674 - Posted: 3 Jul 2009, 19:53:02 UTC

All of this arguing.... so hard to sort through!

I (maybe some others too) just need the following questions answered:

1. We are not able to upload. We know this. We don't know the intricate parts of this amazing system, however, so, please tell us where in the line these issues are occurring.

2. If we blow through our caches, and can't upload any of it, can we still get more work, or is the server expecting that work back?

2a. If we can't get more work, is it like a "No seconds on meat until you finish your veggies!" kind of thing, or is it like a "10 dollars?! What happened to the 10 dollars I gave you yesterday?!" kind of thing?

3. How many times will BOINC retry the uploads before it finally throws in the towel?

4. Where can we see the server status pertaining to things like "results waiting to upload", "upload server stress/load", etc?

5. Recently, I have seen messages in red saying "Project has no work available". Are the two problems related? How so?

6. For as many of these as you can, please give us a simple analogy, so that it's easier to understand, for those of us who are not very technologically inclined.

7. More eggs.

8. More bacon.

9. Thank you.


(Thanks for tolerating my questions)
ID: 913674
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 913677 - Posted: 3 Jul 2009, 19:55:38 UTC

I'm still seeing RED on this!
Boinc....Boinc....Boinc....Boinc....
ID: 913677
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 913678 - Posted: 3 Jul 2009, 19:57:01 UTC - in response to Message 913670.  

Having been a member of Seti since May 1999, I consider myself to be a grizzled veteran; however, even I am beginning to lose patience with the Seti project, with constant problems uploading/downloading and servers going down.


I came in near the tail-end of SETI Classic's shutdown, but I'm told by other veterans of the project that Classic had plenty of server downtime for weeks on end; it's just that most participants didn't notice because workunits took a week or more for people to crunch.

I do remember a couple server outages before Classic shut down, and I didn't run SETIqueue like many did, so I didn't have a cache of workunits to use while the servers were down. My cruncher would just sit there, idle, waiting for the server to come back up. Sometimes my cruncher would go into standby, which means it couldn't keep trying to contact the servers, and I wouldn't notice this until a week later.

I thought patience and wisdom were supposed to grow with age? :)
ID: 913678
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 913679 - Posted: 3 Jul 2009, 19:58:25 UTC - in response to Message 913677.  

I'm still seeing RED on this!


Love the new avatar! Looks really cool.
ID: 913679
MartinBen
Joined: 16 May 99
Posts: 20
Credit: 1,594,164
RAC: 0
United Kingdom
Message 913680 - Posted: 3 Jul 2009, 20:08:31 UTC


I thought patience and wisdom were supposed to grow with age?

Well, I consider myself to have the patience of Job from the Bible, but even he would have got fed up by now!
ID: 913680
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 913683 - Posted: 3 Jul 2009, 20:14:05 UTC - in response to Message 913680.  

I thought patience and wisdom were supposed to grow with age?


Well, I consider myself to have the patience of Job from the Bible, but even he would have got fed up by now!


Does that mean I have more patience than Job? :)
ID: 913683
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 913688 - Posted: 3 Jul 2009, 20:26:12 UTC

Well, Seti has hit the wall again. So what, it's not like other projects don't crash and burn too. I'm in it because I thought from day one that it was a great idea. I still do, even through these hard times the past few months.
It's in times like these, when I run out of work, that I shut off the computer and blow out the dust. Speaking of which, is it allowable for me to open up my Mac and blow out the dust bunnies?

I have a few WU from the old P4 that won't upload; they will when they can. Until then I will be in the cafe waiting for someone to win in Beets give a caption.

Old James
ID: 913688
Docs Beast
Volunteer tester
Joined: 21 Jul 01
Posts: 2
Credit: 237,400
RAC: 0
United States
Message 913692 - Posted: 3 Jul 2009, 20:31:21 UTC

I take it from reading the posts that this happens quite often?

ID: 913692
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 913694 - Posted: 3 Jul 2009, 20:34:03 UTC - in response to Message 913692.  

I take it from reading the posts that this happens quite often?


Not really. It happens from time to time, and some issues take longer than others. The one thing you can take away from the posts here is that when it does happen, everyone freaks out about it, and a flurry of posts ensues about what should be done about it. Most often the issues subside while we're discussing and everyone goes back into relax mode.
ID: 913694
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 913697 - Posted: 3 Jul 2009, 20:41:29 UTC - in response to Message 913674.  

All of this arguing.... so hard to sort through!

I (maybe some others too) just need the following questions answered:

1. We are not able to upload. We know this. We don't know the intricate parts of this amazing system, however, so, please tell us where in the line these issues are occurring.

2. If we blow through our caches, and can't upload any of it, can we still get more work, or is the server expecting that work back?

2a. If we can't get more work, is it like a "No seconds on meat until you finish your veggies!" kind of thing, or is it like a "10 dollars?! What happened to the 10 dollars I gave you yesterday?!" kind of thing?

3. How many times will BOINC retry the uploads before it finally throws in the towel?

4. Where can we see the server status pertaining to things like "results waiting to upload", "upload server stress/load", etc?

5. Recently, I have seen messages in red saying "Project has no work available". Are the two problems related? How so?

6. For as many of these as you can, please give us a simple analogy, so that it's easier to understand, for those of us who are not very technologically inclined.

7. More eggs.

8. More bacon.

9. Thank you.


(Thanks for tolerating my questions)

1) The upload server is a unique server (not sure if there are one or two), the download server is a unique server (or perhaps two) and the scheduler is a unique server. Problems on one don't always affect the other two.

2) At some point, BOINC will stop trying to get work until uploads start flowing: this is to keep the upload queues from growing without bound.

2a) There is a limit -- at some point BOINC will stop getting more work because it knows that there is a problem. As work uploads (and it is uploading) it will go back to getting new work -- it won't wait for all of them to be uploaded.

3) I don't know the exact number, but it's fairly long (more than a week). Someone else will correct me, because I'm sure I'm wrong.

4) On the home page, server status. Note in particular the "received in last hour" number because that shows the ongoing flow.

5) Indirectly. I think we've got a run of "short" multibeam, which increases the load on everything. This also helps spread out the uploads a little bit.

6) I saw a poster a long time ago, and I'd love to find it. It showed a wide herd of sheep crossing a narrow stone country bridge -- it was a good visual.

L.A. Freeways on a holiday weekend might work: the freeways are "up" but the cars are barely moving because there are so many headed out to the river, or to go camping. Incoming lanes are moving fine because most people are headed out of town. That'll be different on Monday night when they all try to come back.

7) Don't forget your cholesterol....

8) Help yourself, there is plenty more where that came from.

9) My pleasure.
ID: 913697
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 913707 - Posted: 3 Jul 2009, 20:56:55 UTC
Last modified: 3 Jul 2009, 20:58:20 UTC

The damn dam broke here and I completed all my uploads and am now requesting more work. So.....I am no longer seeing red.
Boinc....Boinc....Boinc....Boinc....
ID: 913707
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913709 - Posted: 3 Jul 2009, 21:00:22 UTC

A few more:

1. What started this influx of uploads?

2. How long has it been going on?

3. What can cure this?

4. WHO WILL SAVE US?!

5. Will it happen again?

6. And what will happen to Dr. Smith?!
ID: 913709
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 913716 - Posted: 3 Jul 2009, 21:06:08 UTC - in response to Message 913709.  

A few more:

1. What started this influx of uploads?

2. How long has it been going on?

3. What can cure this?

4. WHO WILL SAVE US?!

5. Will it happen again?

6. And what will happen to Dr. Smith?!


1. The regularly scheduled Tuesday outage from 3 months ago.
(The recovery period tends to last until the following Monday.)

2. Longer than I can remember but my brain is 62 years old.

3. A bigger pipe at Berkeley but it costs MONEY.

4. I don't know but we are all looking at Matt.

5. You can be certain of it.

6. Who?
Boinc....Boinc....Boinc....Boinc....
ID: 913716
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 913717 - Posted: 3 Jul 2009, 21:06:26 UTC - in response to Message 913707.  

The damn dam broke here and I completed all my uploads and am now requesting more work. So.....I am no longer seeing red.

In the words of Valentine Michael Smith: waiting is.
ID: 913717
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913719 - Posted: 3 Jul 2009, 21:09:49 UTC

So we have been seeing this for three months? And the only thing that can make sure it does not happen again is a larger pipe to the server?

How much money would be needed for a big bandwidth increase?
ID: 913719
Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 913722 - Posted: 3 Jul 2009, 21:16:53 UTC - in response to Message 913719.  
Last modified: 3 Jul 2009, 21:18:00 UTC


How much money would be needed for a big bandwidth increase?

Last quote I saw to pull 1 Gb fiber to the server closet: somewhere north of 100k.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 913722
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913724 - Posted: 3 Jul 2009, 21:19:05 UTC - in response to Message 913722.  

Yeah, that's not gonna happen at least until the economy improves.
ID: 913724

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.