Panic Mode On (6) Server Problems!

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 694316 - Posted: 24 Dec 2007, 20:19:09 UTC

OK, here's a new thread, as the old one was getting a bit BIG!

Also Merry Christmas all!
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 694328 - Posted: 24 Dec 2007, 20:51:21 UTC

Merry Xmas BATMAN!

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 694330 - Posted: 24 Dec 2007, 21:00:21 UTC

May the new thread stay small for some time to come......
"Freedom is just Chaos, with better lighting." Alan Dean Foster

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 694427 - Posted: 25 Dec 2007, 2:25:50 UTC - in response to Message 694328.  

Merry Xmas BATMAN!

Batman? Who's zat? ;)

And a Happy New Year to you. Merry Christmas, Satan. Stick a pitchfork in me, I'm done.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 694518 - Posted: 25 Dec 2007, 13:07:42 UTC

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 695448 - Posted: 28 Dec 2007, 22:16:00 UTC


The system is certainly struggling: new work is being allocated, but it can take several hours for a Work Unit to download. Download speeds of only a few hundred bytes per second seem to be the norm at the moment (990 B/s has been the best; 200 B/s would be about average).
I've got a couple of Work Units that are only 50% downloaded after 35min of elapsed time.
Grant
Darwin NT

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 695523 - Posted: 29 Dec 2007, 2:43:41 UTC - in response to Message 695448.  


The system is certainly struggling: new work is being allocated, but it can take several hours for a Work Unit to download. Download speeds of only a few hundred bytes per second seem to be the norm at the moment (990 B/s has been the best; 200 B/s would be about average).
I've got a couple of Work Units that are only 50% downloaded after 35min of elapsed time.


I have one that has been downloading for 1 hour 20 minutes and is only at 20%. On another machine I must have 200 so far; they just keep coming...
Official Abuser of Boinc Buttons...
And no good credit hound!

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 695551 - Posted: 29 Dec 2007, 4:02:35 UTC - in response to Message 695448.  

I've got a couple of Work Units that are only 50% downloaded after 35min of elapsed time.

Those last ones have finally downloaded, and another 40 or so are now in the queue to download.
Just checked the network graphs: download traffic has increased still further, and inbound traffic has finally started to drop to more normal levels.
The new work queued up is all 2hrs+, unlike most of the present work I've got (and had been receiving), which was 30min at the most.

So if most of the work to come for the next day or so takes that long (or longer) the system should finally get a chance to recover.
Grant
Darwin NT

Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695568 - Posted: 29 Dec 2007, 5:40:06 UTC - in response to Message 695551.  


So if most of the work to come for the next day or so takes that long (or longer) the system should finally get a chance to recover.


RDCF (Result Duration Correction Factor) has been inflated. When the longer-running results start completing, RDCF will drop, inducing another round of attempts to retrieve work due to new estimates of what is in the local work queue.

There will be "no work from project" messages; the only question is how often that will happen.
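
A rough way to picture the RDCF effect described above, as a simplified model and not BOINC's actual work-fetch code: assume the client multiplies each task's raw runtime estimate by the correction factor and asks for enough work to fill a target queue. The function names and all numbers below are made up for illustration.

```python
def estimated_queue_seconds(raw_estimates, correction_factor):
    """Estimated runtime of the local queue after applying the correction factor."""
    return sum(raw_estimates) * correction_factor


def work_request_seconds(raw_estimates, correction_factor, target_seconds):
    """Seconds of new work the client would ask for (never negative)."""
    estimated = estimated_queue_seconds(raw_estimates, correction_factor)
    return max(0.0, target_seconds - estimated)


# Hypothetical host: 40 queued tasks, each with a raw estimate of 1 hour,
# and a target queue of 2 days.
queue = [3600] * 40
target = 2 * 24 * 3600

# Inflated factor: the queue looks bigger than the target, so nothing is requested.
inflated_request = work_request_seconds(queue, 1.5, target)

# Factor back at 1.0: the same queue now looks too small, so work is requested.
normal_request = work_request_seconds(queue, 1.0, target)

print(inflated_request, normal_request)
```

With the same 40 queued tasks, dropping the factor from 1.5 to 1.0 turns a full queue into an 8-hour shortfall, which is the kind of renewed work request predicted above.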

Web performance is now better. Since download numbers went up but uploads went down, that would seem to indicate that uploading is the more demanding of the two functions, despite there being fewer bytes involved.

Oh well. Musings by moi. Enjoy...

Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 695585 - Posted: 29 Dec 2007, 6:55:09 UTC
Last modified: 29 Dec 2007, 7:04:25 UTC

I've had 7 work units that have been downloading for 26 hours now. The best one is at 80%, the worst at 25%.

Good thing I have about 3 days' worth of other work still queued up.

Looks like the servers are starting to catch up. Hopefully, knock on silicon, as long as we don't get another clump of short-duration units we should be back to normal by New Year's Eve.
"Life is just nature's way of keeping meat fresh." - The Doctor

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 695591 - Posted: 29 Dec 2007, 7:31:21 UTC

On PC1 alone I have about 25 WUs that vary between 1.83% and 98.64% downloaded and have been downloading for an unknown amount of time. Most of what it's doing seems to be trying to download these and a whole lot more WUs; at a guess, about 100-200. PC3 has 26 varying between 1.07% and 99.79% downloaded, plus about 200 or so WUs that are still at 0%.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 695596 - Posted: 29 Dec 2007, 8:06:52 UTC - in response to Message 695551.  

So if most of the work to come for the next day or so takes that long (or longer) the system should finally get a chance to recover.

Hmm, the latest batch in the queue is 2hrs+, but there are still a few 30min ones in there as well.

Grant
Darwin NT

[ue] J. Johansson
Joined: 10 Aug 02
Posts: 27
Credit: 2,048,346
RAC: 0
Finland
Message 695602 - Posted: 29 Dec 2007, 8:46:48 UTC

Just noticed that the "backing off for 1 minute" is not working like it should: with these short Work Units there are so many WUs that suspending download attempts of a single WU does nothing. With the long download queues, timers are always reset to zero before another attempt.

If you want some of the hosts to back off for some time, this is really not working. So I was just wondering: would it help if BOINC suspended all downloads after X failed attempts?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 695614 - Posted: 29 Dec 2007, 10:39:16 UTC - in response to Message 695602.  

Just noticed that the "backing off for 1 minute" is not working like it should: with these short Work Units there are so many WUs that suspending download attempts of a single WU does nothing. With the long download queues, timers are always reset to zero before another attempt.

?
Many of my downloads are taking so long, with so many attempts, that the backoffs are up to the 3.5-4 hour mark.



Unfortunately more & more of the Work Units in the queue are short ones again, so it doesn't look like the servers are going to get a break any time soon.
Grant
Darwin NT

Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 695691 - Posted: 29 Dec 2007, 17:32:36 UTC - in response to Message 695585.  

I've had 7 work units that have been downloading for 26 hours now. The best one is at 80%, the worst at 25%.

Good thing I have about 3 days' worth of other work still queued up.

Looks like the servers are starting to catch up. Hopefully, knock on silicon, as long as we don't get another clump of short-duration units we should be back to normal by New Year's Eve.

Ah crap, 11dc06ad is also producing short-duration units. Just got assigned 3 more of these. Now I'm up to 10 short units and one long unit partially downloaded. Being on dial-up isn't helping either, since I can't leave it connected permanently to keep retrying.

In the immortal words of Bender, "We're boned."
"Life is just nature's way of keeping meat fresh." - The Doctor

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 695726 - Posted: 29 Dec 2007, 19:48:45 UTC - in response to Message 695691.  


Ah crap, 11dc06ad is also producing short-duration units. Just got assigned 3 more of these. Now I'm up to 10 short units and one long unit partially downloaded. Being on dial-up isn't helping either, since I can't leave it connected permanently to keep retrying.

In the immortal words of Bender, "We're boned."


Yeah, I had one waiting to download, so I decided to retry. While it was trying to download, five more shorties popped up trying to download. :(



PROUD MEMBER OF Team Starfire World BOINC

[ue] J. Johansson
Joined: 10 Aug 02
Posts: 27
Credit: 2,048,346
RAC: 0
Finland
Message 695750 - Posted: 29 Dec 2007, 22:27:31 UTC - in response to Message 695614.  

Just noticed that the "backing off for 1 minute" is not working like it should: with these short Work Units there are so many WUs that suspending download attempts of a single WU does nothing. With the long download queues, timers are always reset to zero before another attempt.

?
Many of my downloads are taking so long, with so many attempts, that the backoffs are up to the 3.5-4 hour mark.


OK, perhaps the "always" was a bit exaggerated, but it takes a lot of attempted downloads before BOINC actually gives the servers a break.

If BOINC keeps hammering the servers with requests for hours before giving them one minute of rest, the current system isn't really making a difference. But what if BOINC suspended all downloads (of a given project) if, say, 10 downloads fail without a single success?
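
A minimal sketch of that idea, assuming a hypothetical per-project gate that sits in front of all downloads. This is not how the BOINC client is actually implemented; the class name, the threshold of 10 and the pause lengths are illustrative only.

```python
class ProjectDownloadGate:
    """Hypothetical gate: pause ALL downloads for one project after
    `max_failures` consecutive failures with no success in between."""

    def __init__(self, max_failures=10, base_pause=60.0, max_pause=4 * 3600.0):
        self.max_failures = max_failures
        self.base_pause = base_pause    # seconds for the first pause
        self.max_pause = max_pause      # upper limit for any single pause
        self.consecutive_failures = 0
        self.pauses_in_a_row = 0
        self.paused_until = 0.0

    def may_download(self, now):
        """True if the project is not currently paused."""
        return now >= self.paused_until

    def record_success(self):
        """Any successful download clears the failure history."""
        self.consecutive_failures = 0
        self.pauses_in_a_row = 0

    def record_failure(self, now):
        """Count a failure; return the pause started in seconds (0.0 if none)."""
        self.consecutive_failures += 1
        if self.consecutive_failures < self.max_failures:
            return 0.0
        # Threshold reached: pause the whole project and double the pause
        # each time this happens again without a success in between.
        pause = min(self.base_pause * 2 ** self.pauses_in_a_row, self.max_pause)
        self.pauses_in_a_row += 1
        self.consecutive_failures = 0
        self.paused_until = now + pause
        return pause


# Example with a low threshold so the behaviour is easy to follow.
gate = ProjectDownloadGate(max_failures=3, base_pause=60.0)
first = [gate.record_failure(t) for t in (0.0, 1.0, 2.0)]
print(first, gate.may_download(30.0), gate.may_download(62.0))
```

The difference from per-file backoff is that the counter is shared by every file of the project, so a long queue of short Work Units cannot keep resetting it.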

Jim Volfan
Joined: 22 May 99
Posts: 52
Credit: 24,239,706
RAC: 90
United States
Message 695779 - Posted: 30 Dec 2007, 1:13:17 UTC - in response to Message 695750.  

I've had the download issue too, but with a complication. I've now got 19 workunits that completed downloading, but failed due to download errors.

12/29/2007 8:06:54 PM|SETI@home|[error] MD5 check failed for 17no06ad.21676.2237909.5.6.22
12/29/2007 8:06:54 PM|SETI@home|[error] expected 4f9114cc8fb6f82985ca8716a587e7a1, got 543e99cfab89a109ef91b9ecd38959b6
12/29/2007 8:06:54 PM|SETI@home|[error] Checksum or signature error for 17no06ad.21676.2237909.5.6.22
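
For context, an MD5 check of this kind compares a hash of the downloaded bytes with the value the server advertised, so a mismatch means the file arrived corrupted or incomplete. A minimal Python sketch of that kind of check follows; the function names are hypothetical and this is not the BOINC client's own code (the client itself is written in C++).

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(stream, expected_md5):
    """Return (matches, actual_md5) for the downloaded data."""
    actual = md5_of_stream(stream)
    return actual == expected_md5.lower(), actual


# io.BytesIO stands in for a downloaded file here; a real check would
# open the file on disk in "rb" mode instead.
ok, actual = verify_download(io.BytesIO(b"hello"),
                             "5d41402abc4b2a76b9719d911017c592")
print(ok, actual)
```

A log pair like "expected ..., got ..." is what the second return value would feed: the advertised hash next to the one computed from what actually arrived.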



Jim Volfan
Joined: 22 May 99
Posts: 52
Credit: 24,239,706
RAC: 90
United States
Message 695810 - Posted: 30 Dec 2007, 3:39:00 UTC - in response to Message 695779.  

My total is now up to 40 workunits failed on download.
Is anyone getting good downloads?

Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 695812 - Posted: 30 Dec 2007, 3:45:13 UTC - in response to Message 695810.  
Last modified: 30 Dec 2007, 3:50:17 UTC

My total is now up to 40 workunits failed on download.
Is anyone getting good downloads?


I'm getting good downloads with patience... The last was a few minutes ago...

:)

[edit]Another one now[/edit]
Logan.

BOINC FAQ Service (now also available in Spanish)


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.