I can U/L & D/L

Message boards : Number crunching : I can U/L & D/L
Profile KaJiCkY
Volunteer tester
Joined: 22 Nov 04
Posts: 23
Credit: 73,336
RAC: 0
United Kingdom
Message 161289 - Posted: 1 Sep 2005, 10:26:26 UTC
Last modified: 1 Sep 2005, 10:27:48 UTC

After this long outage, I got D/Ls back yesterday but no uploads. This morning I sat and kept hitting the retry transfer button for the WUs waiting to U/L, and just got them to go through. It's obvious that the server cannot take all of the U/Ls from everyone at the same time, so when the next 2 WUs finish crunching I'm guessing the problem will be the same, until the Ready to Send gets a lot lower (250,000+ Computers Ready to Send currently).
The problem now is that Waiting for Validation has gone up again on the Server Stats Page. Are we going to see another long outage while those systems catch up again?




Kai
ID: 161289 · Report as offensive
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 161291 - Posted: 1 Sep 2005, 10:28:00 UTC

Please go back and read the Technical News, it is right above the Server Status link. It will answer all your questions.

ID: 161291 · Report as offensive
Profile Dominique
Volunteer tester
Joined: 3 Mar 05
Posts: 1628
Credit: 74,745
RAC: 0
United States
Message 161296 - Posted: 1 Sep 2005, 11:03:36 UTC

Hmm, no problems on D/L or U/L. Maybe because I just let it do its own thing instead of hammering the Update button.

-Mr. anon
ID: 161296 · Report as offensive
Profile jshenry1963

Joined: 17 Nov 04
Posts: 182
Credit: 68,878
RAC: 0
United States
Message 161305 - Posted: 1 Sep 2005, 11:56:51 UTC - in response to Message 161296.  

This is the best information that anyone can give.
LET IT DO ITS OWN THING, and it will eventually clear out.
I'm sure 90% of the users are now at the point where they are frustrated and want to push their results through so that they get counted,
BUT
each time one of us hits the retry button, we add unnecessary traffic, and that is one reason why so many can't get through.
There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts... shame on you. All that does is bog down the server with multiple unnecessary hits, and the retries therefore FAIL.

Give it a break, let it do its own thing, and everyone will be happier in a day or so.
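
For anyone wondering why quicker retries make things worse, here is a rough sketch of randomized exponential backoff, the general idea of waiting longer (and at a random moment) after each failed attempt. This is not the actual BOINC client code. The function name and the 60-second and 4-hour limits are assumptions picked purely for illustration.

```python
import random

def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized wait in seconds before the next retry.

    Hypothetical helper: base and cap are illustrative values,
    not the real BOINC client settings.
    """
    # The upper bound doubles with each consecutive failure, up to the cap.
    upper = min(cap, base * (2 ** failures))
    # A random point in [0, upper] spreads clients out so they do not
    # all hit the server at the same instant.
    return rng.uniform(0.0, upper)

# Example: the wait windows for the first few consecutive failures.
for n in range(5):
    print(n, round(backoff_delay(n), 1))
```

Shortening the delays, or clicking retry by hand, removes exactly the spreading-out that gives an overloaded server room to recover.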

Patience, Persistence, Truth

Hmm, no problems on D/L or U/L. Maybe because I just let it do its own thing instead of hammering the Update button.

-Mr. anon


Thanks, and Keep on crunchin'
John Henry KI4JPL
Sevierville TN

I started with nothing,
and I still have some of it left.
ID: 161305 · Report as offensive
James Nelson
Volunteer tester
Joined: 23 Mar 02
Posts: 381
Credit: 4,806,382
RAC: 0
United States
Message 161323 - Posted: 1 Sep 2005, 12:57:21 UTC - in response to Message 161289.  

After this long outage, I got D/Ls back yesterday but no uploads. This morning I sat and kept hitting the retry transfer button for the WUs waiting to U/L, and just got them to go through. It's obvious that the server cannot take all of the U/Ls from everyone at the same time, so when the next 2 WUs finish crunching I'm guessing the problem will be the same, until the Ready to Send gets a lot lower (250,000+ Computers Ready to Send currently).
The problem now is that Waiting for Validation has gone up again on the Server Stats Page. Are we going to see another long outage while those systems catch up again?




Kai

The Ready to Send figure on the status page is the number of WUs ready to be sent out to clients, not the number of WUs waiting to be sent back.
ID: 161323 · Report as offensive
Profile Martin A. Boegelund
Volunteer tester
Joined: 4 Jul 00
Posts: 292
Credit: 387,485
RAC: 1
Denmark
Message 161328 - Posted: 1 Sep 2005, 13:14:32 UTC - in response to Message 161305.  

This is the best information that anyone can give.
LET IT DO ITS OWN THING, and it will eventually clear out.
I'm sure 90% of the users are now at the point where they are frustrated and want to push their results through so that they get counted,
BUT
each time one of us hits the retry button, we add unnecessary traffic, and that is one reason why so many can't get through.
There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts... shame on you. All that does is bog down the server with multiple unnecessary hits, and the retries therefore FAIL.

Give it a break, let it do its own thing, and everyone will be happier in a day or so.

Patience, Persistence, Truth

Hmm, no problems on D/L or U/L. Maybe because I just let it do its own thing instead of hammering the Update button.

-Mr. anon



When I tried to push an upload, it failed. When I just left it alone to upload whenever the client felt like it, it worked.

So let me repeat what the others said:
Let the software do its thing, and everything will clear up!

"Are you suggesting coconuts migrate?"

ID: 161328 · Report as offensive
Profile Mosaix

Joined: 28 Dec 99
Posts: 114
Credit: 419,427
RAC: 0
United Kingdom
Message 161335 - Posted: 1 Sep 2005, 13:21:36 UTC - in response to Message 161305.  


There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts...


Out of interest, how do you know this?
ID: 161335 · Report as offensive
Profile Martin A. Boegelund
Volunteer tester
Joined: 4 Jul 00
Posts: 292
Credit: 387,485
RAC: 1
Denmark
Message 161337 - Posted: 1 Sep 2005, 13:24:09 UTC - in response to Message 161335.  
Last modified: 1 Sep 2005, 13:25:32 UTC


There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts...


Out of interest, how do you know this?


Out of interest, do you really think it's a good idea to post this info in a public forum?

;-)
"Are you suggesting coconuts migrate?"

ID: 161337 · Report as offensive
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 161340 - Posted: 1 Sep 2005, 13:34:15 UTC - in response to Message 161337.  
Last modified: 1 Sep 2005, 13:35:47 UTC


There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts...

Out of interest, how do you know this?

Out of interest, do you really think it's a good idea to post this info in a public forum?;-)

NO, it is NOT! BUT if you have not already done so, you could post it to the developers' email list:
List-Archive: http://ssl.berkeley.edu/pipermail/boinc_projects
List-Post: mailto:boinc_projects@ssl.berkeley.edu
List-Help: mailto:boinc_projects-request@ssl.berkeley.edu?subject=help
List-Subscribe: http://www.ssl.berkeley.edu/mailman/listinfo/boinc_projects



ID: 161340 · Report as offensive
Profile Mosaix

Joined: 28 Dec 99
Posts: 114
Credit: 419,427
RAC: 0
United Kingdom
Message 161369 - Posted: 1 Sep 2005, 14:26:19 UTC - in response to Message 161340.  


There are even some who have modified it so that their retries happen quicker than the normal boinc timeouts...

Out of interest, how do you know this?

Out of interest, do you really think it's a good idea to post this info in a public forum?;-)

NO, it is NOT! BUT if you have not already done so, you could post it to the developers' email list:
List-Archive: http://ssl.berkeley.edu/pipermail/boinc_projects
List-Post: mailto:boinc_projects@ssl.berkeley.edu
List-Help: mailto:boinc_projects-request@ssl.berkeley.edu?subject=help
List-Subscribe: http://www.ssl.berkeley.edu/mailman/listinfo/boinc_projects




I wasn't interested in how it was done, but in how he knew it was done.

And yes, I do know that this will create an interest in how it was done, and no, I don't think he should supply the info.

So:

Out of interest, how do you know this?
ID: 161369 · Report as offensive
Don Hughes

Joined: 3 Jun 99
Posts: 64
Credit: 139,995
RAC: 0
United States
Message 161375 - Posted: 1 Sep 2005, 14:36:08 UTC

Frankly, I think that there is a problem with the U/L D/L retry code.

When the systems first came back up, all of my WUs in 'ready to report' status uploaded almost immediately. As new WUs complete, they upload. None of my WUs that are stuck in 'retry' status have ever uploaded. Eventually, after some number of retries, they revert to 'ready to report' and then they upload. Downloads in 'retry' status were not completing and were blocking additional attempts. When they finally abort, a new D/L is started and completes.

This does not seem to be a 'contention' problem, because it seems unlikely that a single 'ready to report' WU would make it in a single attempt while 30 or so 'retry' WUs cannot after several hundred attempts over several days.
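
A back-of-the-envelope way to check that argument, assuming every attempt succeeds independently with the same probability p. Both that independence assumption and the numbers below are made up for illustration; they are not measurements from the project.

```python
def prob_all_fail(p, attempts):
    """Probability that every one of `attempts` independent tries fails,
    given each try succeeds with probability p (illustrative model only)."""
    return (1.0 - p) ** attempts

# Illustrative numbers: if even 1 attempt in 10 got through on average,
# 300 straight failures would be extremely unlikely under pure contention.
p = 0.10
print(prob_all_fail(p, 300))
```

Under that simple model, hundreds of consecutive failures point to something other than plain bad luck, unless the real per-attempt success rate is very close to zero.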

I have noticed similar behaviour after previous outages.
...don
ID: 161375 · Report as offensive
Profile Kajunfisher
Volunteer tester
Joined: 29 Mar 05
Posts: 1407
Credit: 126,476
RAC: 0
United States
Message 161377 - Posted: 1 Sep 2005, 14:38:59 UTC - in response to Message 161375.  

Frankly, I think that there is a problem with the U/L D/L retry code.

When the systems first came back up, all of my WUs in 'ready to report' status uploaded almost immediately. As new WUs complete, they upload. None of my WUs that are stuck in 'retry' status have ever uploaded. Eventually, after some number of retries, they revert to 'ready to report' and then they upload. Downloads in 'retry' status were not completing and were blocking additional attempts. When they finally abort, a new D/L is started and completes.

This does not seem to be a 'contention' problem, because it seems unlikely that a single 'ready to report' WU would make it in a single attempt while 30 or so 'retry' WUs cannot after several hundred attempts over several days.

I have noticed similar behaviour after previous outages.


From the "Technical News": "Well, we uncovered another problem - the upload/download server was so busy it randomly lost NFS mounts, including necessary things like /usr/local. So the file_upload_handler was flailing throughout the course of the evening. This morning (after the usual Wednesday database backup outage) we determined this was an automounter problem, put in some hard mounts for required partitions, and so far it's been working pretty well (though still very far from catching up. We're dropping hundreds of connections per second - only a lucky 20-30 RPCs/sec are getting through)."

ID: 161377 · Report as offensive
Profile Mosaix

Joined: 28 Dec 99
Posts: 114
Credit: 419,427
RAC: 0
United Kingdom
Message 161380 - Posted: 1 Sep 2005, 14:42:52 UTC - in response to Message 161377.  

Frankly, I think that there is a problem with the U/L D/L retry code.

When the systems first came back up, all of my WUs in 'ready to report' status uploaded almost immediately. As new WUs complete, they upload. None of my WUs that are stuck in 'retry' status have ever uploaded. Eventually, after some number of retries, they revert to 'ready to report' and then they upload. Downloads in 'retry' status were not completing and were blocking additional attempts. When they finally abort, a new D/L is started and completes.

This does not seem to be a 'contention' problem, because it seems unlikely that a single 'ready to report' WU would make it in a single attempt while 30 or so 'retry' WUs cannot after several hundred attempts over several days.

I have noticed similar behaviour after previous outages.


From the "Technical News": "Well, we uncovered another problem - the upload/download server was so busy it randomly lost NFS mounts, including necessary things like /usr/local. So the file_upload_handler was flailing throughout the course of the evening. This morning (after the usual Wednesday database backup outage) we determined this was an automounter problem, put in some hard mounts for required partitions, and so far it's been working pretty well (though still very far from catching up. We're dropping hundreds of connections per second - only a lucky 20-30 RPCs/sec are getting through)."



That doesn't seem to be relevant to his particular experiences.
ID: 161380 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.