Panic Mode On (35) Server problems

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012348 - Posted: 5 Jul 2010, 19:43:47 UTC

Something just doesn't add up right...

Results ready to send 183,652 1,640 9m
Current result creation rate 27.8027/sec 0.2383/sec 5m

7/5/2010 12:40:16 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:20 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:20 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:36 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:36 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:43 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:43 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:58 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:58 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:41:07 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:41:07 PM SETI@home Message from server: Project has no tasks available

Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012348
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1012349 - Posted: 5 Jul 2010, 19:45:52 UTC

I have downloaded some WUs. I just checked and I have got "client detached" when I have not detached. Is there a problem?
ID: 1012349
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1012352 - Posted: 5 Jul 2010, 19:51:43 UTC - in response to Message 1012348.  
Last modified: 5 Jul 2010, 19:52:21 UTC

Something just doesn't add up right...

Results ready to send 183,652 1,640 9m
Current result creation rate 27.8027/sec 0.2383/sec 5m

7/5/2010 12:40:16 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:20 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:20 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:36 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:36 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:43 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:43 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:58 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:58 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:41:07 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:41:07 PM SETI@home Message from server: Project has no tasks available

Hiamps buddy.....this has been discussed sooooooooo many times here before. Surprised you have never caught it.

There are lots of WUs ready to send, but they only get loaded into the feeder one batch at a time. And are being sent out as fast as the bandwidth will allow. Then the feeder has to be refilled. When you connect and the feeder is between cycles, you get the no tasks available message. You just have to happen to connect at the moment that the feeder has just been refilled to get some for your rig.
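
A rough way to picture the feeder behaviour described above is the toy Python model below. It is only a sketch, not BOINC's actual server code: the slot count, refill interval, and request rate are made-up numbers, chosen just to show how a huge "ready to send" backlog can sit next to a feeder that is empty most of the time.

```python
def simulate_feeder(ready_to_send, slots, refill_every, requests_per_tick, ticks):
    """Toy model: a big backlog feeds a small shared cache (the 'feeder').

    All parameters are illustrative assumptions, not real SETI@home settings.
    Returns (tasks_sent, requests_refused, backlog_left).
    """
    backlog = ready_to_send      # results waiting in the database
    feeder = 0                   # results currently loaded in the feeder
    sent = refused = 0
    for tick in range(ticks):
        # The feeder is only topped up from the backlog once per cycle.
        if tick % refill_every == 0:
            loaded = min(slots - feeder, backlog)
            feeder += loaded
            backlog -= loaded
        # Each request takes one task if the feeder has any left.
        for _ in range(requests_per_tick):
            if feeder > 0:
                feeder -= 1
                sent += 1
            else:
                refused += 1     # client sees "Project has no tasks available"
    return sent, refused, backlog


sent, refused, backlog = simulate_feeder(
    ready_to_send=183_652, slots=100, refill_every=5,
    requests_per_tick=60, ticks=100)
print(f"sent={sent} refused={refused} backlog={backlog}")
```

With these assumed numbers, most requests land between refills and are refused even though the backlog barely shrinks, which is the same pattern as the log above.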
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1012352
JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1012353 - Posted: 5 Jul 2010, 19:51:59 UTC - in response to Message 1012348.  

Something just doesn't add up right...

Results ready to send 183,652 1,640 9m
Current result creation rate 27.8027/sec 0.2383/sec 5m

Something about those Results ready to send: they are sent to the download servers at a rate that can't keep up with demand, so the download servers are empty at times.

I think the status for that is missing, I think because it changes so quickly all the time that it wouldn't make sense to show it.

(This is my attempt at explaining something I know just about nothing about :) )
ID: 1012353
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1012386 - Posted: 5 Jul 2010, 21:05:49 UTC

Okay, I tried reading through the past 50 posts and just skipped to the end. I suppose since I'm running AP-only, I could possibly provide some insight to those spikes. I was able to very rarely get work (0 and 1 tasks) during the low periods, but during the spikes is when I was able to get three new WUs at a time, and could just about get one every time I asked for one. The Catch-22 is since the bandwidth is maxed, there are HTTP errors and slow download rates, so BOINC won't ask for new work until the downloads are actually in progress instead of "connect failed."

So it's possible that those spikes were in fact AP distribution since it was a lot easier to get them during the spikes.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1012386
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012392 - Posted: 5 Jul 2010, 21:21:14 UTC - in response to Message 1012352.  
Last modified: 5 Jul 2010, 21:21:38 UTC

Something just doesn't add up right...

Results ready to send 183,652 1,640 9m
Current result creation rate 27.8027/sec 0.2383/sec 5m

7/5/2010 12:40:16 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:20 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:20 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:36 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:36 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:40:43 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:40:43 PM SETI@home Message from server: Project has no tasks available
7/5/2010 12:40:58 PM SETI@home Sending scheduler request: To fetch work.
7/5/2010 12:40:58 PM SETI@home Requesting new tasks for GPU
7/5/2010 12:41:07 PM SETI@home Scheduler request completed: got 0 new tasks
7/5/2010 12:41:07 PM SETI@home Message from server: Project has no tasks available

Hiamps buddy.....this has been discussed sooooooooo many times here before. Surprised you have never caught it.

There are lots of WUs ready to send, but they only get loaded into the feeder one batch at a time. And are being sent out as fast as the bandwidth will allow. Then the feeder has to be refilled. When you connect and the feeder is between cycles, you get the no tasks available message. You just have to happen to connect at the moment that the feeder has just been refilled to get some for your rig.

Went in and reset my DCF for the millionth time and started getting some work.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012392
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012396 - Posted: 5 Jul 2010, 21:27:41 UTC
Last modified: 5 Jul 2010, 21:33:56 UTC

Nothing even turned in and once again I am at 99 hours for completion....PLEASE FIX THIS.

EDIT: This time I changed the DCF in old and previous client states also, wonder how long it will last?
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012396
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1012411 - Posted: 5 Jul 2010, 22:04:13 UTC
Last modified: 5 Jul 2010, 22:11:35 UTC

Seems my E8500 host has had a miscommunication and detached itself.
After I aborted all the now useless tasks, one of the next communications gave me 46 ghost WUs.
Was it really a good idea to limit everyone to 20 tasks at a time over the weekend?
Now even more hosts need to refill their caches, making this more likely.

Claggy
ID: 1012411
Odan
Joined: 8 May 03
Posts: 91
Credit: 15,331,177
RAC: 0
United Kingdom
Message 1012416 - Posted: 5 Jul 2010, 22:13:30 UTC - in response to Message 1012047.  
Last modified: 5 Jul 2010, 22:14:15 UTC

Anybody have a clue what the bandwidth cycles on the Cricket Graph are all about? I don't think I have ever seen such a well defined pattern before....

There appears to be correlation with when the splitters are boosting the "Results ready to send" and when they're idle. Compare Scarecrow's graphs, though it's hard to really match the time scales. It's a case where sampling the server status once an hour isn't quite enough to pin down the relationship, but my guess is the high rate download bursts are occurring just after the splitters have stopped for awhile. Or it could just be coincidence...
                                                                 Joe

And "ready to send" doesn't drop to zero when splitters are idle, but bandwidth load drops hugely nevertheless.


My suspicions-
The Ready to Send buffer probably drops quite rapidly when bandwidth is maxed out, then continues to drop gradually once the network traffic drops off, until the splitters fire up & top up the buffer; the graphs aren't updated frequently enough to see accurately what's happening.


Extremely wild supposition-
The traffic bursts may be related to odd work request behaviour.
I noticed one or 2 threads where people commented about the client not requesting new work, even though they had less than 20 in their cache. After a while, it does request work & that's when you're getting those bursts in network traffic. Lots of clients running down their buffer below 20 Work Units before requesting more, resulting in short bursts of network traffic.


EDIT- just had a look at the Astropulse graphs. It shows a full Ready to Send buffer, with ups & downs similar to MB, but the slope of the waveform is different.
MB- buffer fills quickly, drains slowly.
AP- buffer fills slowly, but drains quickly.

Looks like the spikes could be very much AP related. Just odd with their fairly consistent frequency.


The spike interval & duration I can't explain any better than above but if you compare the cricket graph with scarecrow's "AP in progress" you can see a beautiful correlation with a ratcheting up of AP in progress every time a data burst occurs. Very pretty!

ID: 1012416
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1012418 - Posted: 5 Jul 2010, 22:17:44 UTC

To add insult to injury, there seem to be some internet connectivity problems towards the San Francisco Bay Area in general (not SETI/Berkeley/etc. specific).

This could be compounding errors while downloading and amplifying spikes.
Janice
ID: 1012418
Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 1012419 - Posted: 5 Jul 2010, 22:26:05 UTC - in response to Message 1012411.  

Well, the plan seemed to be to do a "Soft Start".

Limit the work units to 20 until all the slower hosts had full caches, then increase the limit in steps over a day or two and get the big caches full in stages.

But when they opened the tap a bit more, the database crashed. So things were left on trickle feed for the weekend, as that was a lot better than being dead in the water with the database down.

Now the tap has been turned on, and data is flowing as fast as the system is able. But for my slower PCs things are worse, as the overload means they can't refill their 10 or 20 WU caches at the moment.

And I suspect the guys looking for 500 units are not getting them reliably either.

So now we are just now seeing the congestion that we expected on Friday.

The Gradual Startup was a good plan; I hope they can get the bugs worked out of it. A separate quota for GPU and CPU would quiet a lot of the whining from the power crunchers that have been on short rations, and get some work to everyone.

Ian
ID: 1012419
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1012426 - Posted: 5 Jul 2010, 22:32:25 UTC

and now another 17 Ghost Wu's,

Claggy
ID: 1012426
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012476 - Posted: 6 Jul 2010, 0:02:55 UTC - in response to Message 1012396.  

Nothing even turned in and once again I am at 99 hours for completion....PLEASE FIX THIS.

EDIT: This time I changed the DCF in old and previous client states also, wonder how long it will last?

Well this is the longest my times have stayed normal since this started. Anyone else have the DCF problem it would be cool to know if this helps. I changed the DCF to .2 in client state, client state_old and client state_previous.
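
For anyone wanting to script the edit described above, here is a minimal sketch. It assumes the value lives in a `<duration_correction_factor>` element of the client state XML and that BOINC is shut down before the file is touched. The helper names `set_dcf` and `estimated_hours`, the sample XML, and the 8-hour raw estimate are all hypothetical. The arithmetic at the end shows how an inflated DCF could produce something like a 99-hour estimate, on the understanding that the client scales its raw estimate by the DCF.

```python
import re

# Matches the DCF element and captures its opening and closing tags.
DCF_TAG = re.compile(
    r"(<duration_correction_factor>)[^<]*(</duration_correction_factor>)")


def set_dcf(state_xml: str, new_dcf: float) -> str:
    """Return state_xml with every DCF element set to new_dcf."""
    return DCF_TAG.sub(
        lambda m: f"{m.group(1)}{new_dcf:.6f}{m.group(2)}", state_xml)


def estimated_hours(raw_estimate_hours: float, dcf: float) -> float:
    """Estimated time to completion, assuming estimate = raw estimate * DCF."""
    return raw_estimate_hours * dcf


# Made-up fragment standing in for the relevant part of a client state file.
sample = """<project>
    <project_name>SETI@home</project_name>
    <duration_correction_factor>12.375000</duration_correction_factor>
</project>"""

patched = set_dcf(sample, 0.2)
print(patched)
print(estimated_hours(8.0, 12.375))  # estimate with the inflated DCF
print(estimated_hours(8.0, 0.2))     # estimate after the manual reset
```

Working on a copy of the file first is the safer choice, since a malformed state file can cost the whole cache.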
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012476
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1012482 - Posted: 6 Jul 2010, 0:11:03 UTC - in response to Message 1012476.  

Nothing even turned in and once again I am at 99 hours for completion....PLEASE FIX THIS.

EDIT: This time I changed the DCF in old and previous client states also, wonder how long it will last?

Well this is the longest my times have stayed normal since this started. Anyone else have the DCF problem it would be cool to know if this helps. I changed the DCF to .2 in client state, client state_old and client state_previous.

I have never edited the prev file...I didn't think Boinc referred to it except in the case of some sort of corruption or crash of the working file.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1012482
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012485 - Posted: 6 Jul 2010, 0:27:35 UTC - in response to Message 1012482.  
Last modified: 6 Jul 2010, 0:28:02 UTC

Nothing even turned in and once again I am at 99 hours for completion....PLEASE FIX THIS.

EDIT: This time I changed the DCF in old and previous client states also, wonder how long it will last?

Well this is the longest my times have stayed normal since this started. Anyone else have the DCF problem it would be cool to know if this helps. I changed the DCF to .2 in client state, client state_old and client state_previous.

I have never edited the prev file...I didn't think Boinc referred to it except in the case of some sort of corruption or crash of the working file.

Neither had I, but I was desperate... LOL. I just stole 60 non-VLARs from my CPU with the rescheduler and my times stayed stable. Not sure which folder did it or if I am just lucky right now, that's why I am hoping someone else will try.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012485
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1012488 - Posted: 6 Jul 2010, 0:35:07 UTC

and another 56 Ghosts,

Claggy
ID: 1012488
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012489 - Posted: 6 Jul 2010, 0:36:11 UTC - in response to Message 1012488.  

and another 56 Ghosts,

Claggy

How do you tell if they are ghosts?
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012489
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1012490 - Posted: 6 Jul 2010, 0:38:26 UTC - in response to Message 1012485.  
Last modified: 6 Jul 2010, 0:41:19 UTC

Nothing even turned in and once again I am at 99 hours for completion....PLEASE FIX THIS.

EDIT: This time I changed the DCF in old and previous client states also, wonder how long it will last?

Well this is the longest my times have stayed normal since this started. Anyone else have the DCF problem it would be cool to know if this helps. I changed the DCF to .2 in client state, client state_old and client state_previous.

I have never edited the prev file...I didn't think Boinc referred to it except in the case of some sort of corruption or crash of the working file.

Neither had I, but I was desperate... LOL. I just stole 60 non-VLARs from my CPU with the rescheduler and my times stayed stable. Not sure which folder did it or if I am just lucky right now, that's why I am hoping someone else will try.

Well never mind I was just lucky for a bit....back to 88 hours...

EDIT: I think a VLar did me in, as I don't have any errors but noticed the last batch of downloads for GPU was all Vlars that wanted to start NOW!
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1012490
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1012493 - Posted: 6 Jul 2010, 0:43:55 UTC - in response to Message 1012489.  
Last modified: 6 Jul 2010, 1:21:50 UTC

and another 56 Ghosts,

Claggy

How do you tell if they are ghosts?

06/07/2010 01:26:47 SETI@home update requested by user
06/07/2010 01:26:51 SETI@home [sched_op_debug] Starting scheduler request
06/07/2010 01:26:51 SETI@home Sending scheduler request: Requested by user.
06/07/2010 01:26:51 SETI@home Requesting new tasks for GPU
06/07/2010 01:26:51 SETI@home [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
06/07/2010 01:26:51 SETI@home [sched_op_debug] NVIDIA GPU work request: 295929.11 seconds; 0.00 GPUs
06/07/2010 01:26:51 SETI@home [sched_op_debug] ATI GPU work request: 309850.73 seconds; 0.00 GPUs
06/07/2010 01:28:35 Project communication failed: attempting access to reference site
06/07/2010 01:28:35 SETI@home Scheduler request failed: Transferred a partial file
06/07/2010 01:28:35 SETI@home [sched_op_debug] Deferring communication for 1 min 0 sec
06/07/2010 01:28:35 SETI@home [sched_op_debug] Reason: Scheduler request failed
06/07/2010 01:28:36 Internet access OK - project servers may be temporarily down.
06/07/2010 01:28:43 SETI@home work fetch suspended by user

And I've got fresh tasks listed in my task list at 6 Jul 2010 0:26:51 UTC, but BOINC never got them :-(
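
Put another way, a ghost is a task that the server's list shows as sent to the host but that the client never received, because the scheduler reply was lost on the way back. A minimal sketch of that comparison follows; the task names are hypothetical placeholders for whatever appears in the task list on the website and in the local BOINC task list.

```python
def find_ghosts(server_tasks, client_tasks):
    """Tasks the server has assigned to this host that the client does not hold."""
    return sorted(set(server_tasks) - set(client_tasks))


# Hypothetical names: what the website lists vs. what the client actually has.
server_side = ["task_a_0", "task_b_1", "task_c_0", "task_d_1"]
client_side = ["task_a_0", "task_c_0"]

ghosts = find_ghosts(server_side, client_side)
print(ghosts)
```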

Claggy
ID: 1012493
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1012496 - Posted: 6 Jul 2010, 0:47:23 UTC

Just when you think things are getting better:

BOINC asked for about 30 tasks, finally enough to fill my 5 day queue. Two downloaded about 20%, now all say "project back off..."

Oh wait, one came back, up to 51% downloaded!

ID: 1012496


 