Panic Mode On (73) Server problems?

Message boards : Number crunching : Panic Mode On (73) Server problems?
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1215240 - Posted: 7 Apr 2012, 19:40:48 UTC - in response to Message 1215229.  

12 hours since anyone had a panic attack. That can't be right, it just has to be something to panic about. :-)

No panic! Murphy's law has become true!


Ah.
But.
As soon as Murphy's law comes true, it sues itself in court for more than it was ever worth and does something totally unpredictable,
which is in itself predictable, though that outcome is also impossible to predict.
I think I had better go and have forty winks to recover from that bit of quantum theory.


How about having to hit update 7 times in order to force a contact with the server? No panic as such, just extremely jarred off with this perpetual comms snafu.
Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1215240 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1215241 - Posted: 7 Apr 2012, 19:45:08 UTC

Cliff

Try downgrading to 6.10.60 to avoid the 6.12.xx backoffs. Made a massive world of difference this end. Suddenly it's like the olde dayse!
ID: 1215241 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1215264 - Posted: 7 Apr 2012, 20:48:11 UTC - in response to Message 1215241.  

Cliff

Try downgrading to 6.10.60 to avoid the 6.12.xx backoffs. Made a massive world of difference this end. Suddenly it's like the olde dayse!


Yup, did that myself a few months ago cos I was totally backed off with the crazy backoffs.
ID: 1215264 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1215271 - Posted: 7 Apr 2012, 20:56:37 UTC - in response to Message 1215241.  
Last modified: 7 Apr 2012, 20:58:16 UTC

Cliff

Try downgrading to 6.10.60 to avoid the 6.12.xx backoffs. Made a massive world of difference this end. Suddenly it's like the olde dayse!


I think I'll wait it out :-) After all, it's v7 next, isn't it? Then it's the Lunatics kit and the fun and games start again :-)

The backoffs, believe it or not, are not the problem so much as the very patchy comms between BOINC and its home turf.

Backoffs are cured by highlighting the entire download queue and then hitting retry whenever 1 of them stalls.. The 1st backoff makes it easy to gather the lot and hammer the d/l until all are in.

So far today I've had about 3 WUs stall, no more, but failed updates are another story. If I repeatedly hit update after BOINC times out the first time, then by update hit 3 or 4 or 7 it's in contact and the d/l proceeds smoothly.

If I'm asleep then Ray's SIV will handle the backoffs, more slowly than me, but it gets it done.
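
For anyone who would rather script the "keep hitting update until it connects" routine described above, here is a minimal Python sketch. It is only an illustration: `try_update` is a hypothetical callable standing in for whatever triggers one scheduler contact and reports whether it got through, and the cap of 7 attempts and the pause length are assumptions borrowed from the numbers in this post, not anything BOINC prescribes.

```python
import time
from typing import Callable


def retry_until_contact(
    try_update: Callable[[], bool],
    max_attempts: int = 7,
    pause_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_update until it returns True.

    Returns the number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if try_update():
            return attempt
        # Only wait if another attempt is still to come.
        if attempt < max_attempts:
            sleep(pause_s)
    raise RuntimeError(f"no contact after {max_attempts} attempts")


if __name__ == "__main__":
    # Stand-in for a flaky server: times out three times, then answers.
    outcomes = iter([False, False, False, True])
    print(retry_until_contact(lambda: next(outcomes), pause_s=0.0))
```

The pause matters more than the loop: retrying with no gap at all only adds load to a server that is already struggling.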

Cheers,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1215271 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1215299 - Posted: 7 Apr 2012, 21:37:19 UTC - in response to Message 1215209.  

12 hours since anyone had a panic attack. That can't be right, it just has to be something to panic about. :-)

A lightbulb blew earlier, AND I DON'T HAVE A SPARE! :) Other than that I'm still waiting for my astropulse times to get down to what they actually are. At the moment they're at 197hrs :)

Member of the People Encouraging Niceness In Society club.

ID: 1215299 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1215409 - Posted: 8 Apr 2012, 0:31:56 UTC - in response to Message 1215299.  
Last modified: 8 Apr 2012, 0:32:13 UTC

A lightbulb blew earlier, AND I DON'T HAVE A SPARE! :) Other than that I'm still waiting for my astropulse times to get down to what they actually are. At the moment they're at 197hrs :)

It'll get there. You need 10 validated APs that did not early-exit and had less than 10% blanking. Mine took a while for that to happen. Right about 44 total APs to get 10 that met the criteria. Of course by then, the DCF mechanism was just about to provide reasonable rates anyway.

Because of the way DCF works, it took ~20 tasks to go from a little over 200 hours down to ~175, 15 more to drop down to ~125, and five to get down to ~75. I imagine it would have only taken 5-7 more for the ETA to be pretty close, but I hit that magic 10 and server-side knocked the ETA down to 04:41:47. One task finished after that and now they're all showing ~11.5h like they should.
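
To see why the estimate crawls at first and then drops faster, here is a toy model in Python. It assumes the client nudges its estimate toward the observed runtime with a small weight (1%) while the real runtime is under a tenth of the estimate, and with a larger weight (10%) after that. Those weights and the threshold are assumptions for illustration, not values confirmed anywhere in this thread, and the server-side correction after 10 validated tasks is not modelled, so the numbers will not match the ones above exactly.

```python
from typing import List


def simulate_eta(
    initial_eta_h: float,
    true_runtime_h: float,
    n_tasks: int,
    slow_weight: float = 0.01,    # assumed weight while the estimate is far too high
    fast_weight: float = 0.10,    # assumed weight once the estimate is within 10x
    slow_threshold: float = 0.1,  # assumed ratio below which the slow weight applies
) -> List[float]:
    """Return the ETA (in hours) before any task and after each completed task."""
    eta = initial_eta_h
    history = [eta]
    for _ in range(n_tasks):
        ratio = true_runtime_h / eta
        if ratio > 1.0:
            # Task ran longer than estimated: jump straight up to the observed time.
            eta = true_runtime_h
        else:
            weight = slow_weight if ratio < slow_threshold else fast_weight
            eta = (1.0 - weight) * eta + weight * true_runtime_h
        history.append(eta)
    return history


if __name__ == "__main__":
    history = simulate_eta(initial_eta_h=200.0, true_runtime_h=11.5, n_tasks=80)
    for done in (0, 20, 40, 60, 80):
        print(f"after {done:2d} tasks: ETA ~{history[done]:.1f} h")
```

In this model the estimate only starts moving quickly once it is within a factor of ten of the real runtime, which is the same slow-then-fast shape described above.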
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1215409 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1215416 - Posted: 8 Apr 2012, 0:54:10 UTC - in response to Message 1215409.  


Now if they could just get DCF to work for MB on CPU & GPU systems.
Grant
Darwin NT
ID: 1215416 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1215836 - Posted: 8 Apr 2012, 22:00:54 UTC

For the last hour or so, all of my scheduler requests on all 3 rigs are ending in timeout. Cricket and status page both look OK. Uploads going through normally. Anyone else?
ID: 1215836 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1215837 - Posted: 8 Apr 2012, 22:06:10 UTC - in response to Message 1215836.  
Last modified: 8 Apr 2012, 22:06:42 UTC

Yup, same here. For the past 20 hrs it has taken 3 or more attempts to get sorted.

Cheers,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1215837 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1215850 - Posted: 8 Apr 2012, 22:40:44 UTC - in response to Message 1215836.  
Last modified: 8 Apr 2012, 22:46:08 UTC

For the last hour or so, all of my scheduler requests on all 3 rigs are ending in timeout. Cricket and status page both look OK. Uploads going through normally. Anyone else?

Just checked my logs. Been OK (at least on one machine) for the last hour or 2, but for most of the night the majority of Scheduler requests were timing out.


EDIT- just checked my other machine. Still getting the odd Scheduler timeout there. Looks like it's just the luck of the draw. You might get a Scheduler timeout. Even if that doesn't happen you might get a "Project has no tasks available" message. If you're really lucky you might get some work.
Grant
Darwin NT
ID: 1215850 · Report as offensive
Profile Britz

Joined: 1 Apr 12
Posts: 1
Credit: 267,233
RAC: 0
United States
Message 1215910 - Posted: 9 Apr 2012, 0:47:09 UTC

I got about 16 GPU and 1 AP WUs today, and that was around 3:00 EDT. Took a long while for the GPU WUs to get returned. Hope it all gets sorted out soon.
ID: 1215910 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1215956 - Posted: 9 Apr 2012, 3:15:06 UTC

My apologies for clogging the pipe, but my 10-day AP-only cache is now full. 3 crunching with 63 "ready to start."

What were the limits again? Wasn't it 50 per [physical] CPU, or did it go back to counting cores? I know for a while it didn't matter if it was 1-16 cores, it was 1 physical CPU and there was a limit.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1215956 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1215978 - Posted: 9 Apr 2012, 3:42:01 UTC

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1215978 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1215992 - Posted: 9 Apr 2012, 4:37:28 UTC - in response to Message 1215978.  

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1215992 · Report as offensive
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1216001 - Posted: 9 Apr 2012, 5:03:30 UTC - in response to Message 1215992.  

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)


I guess the limit was reached only on CPU tasks. If the GPUs are in panic mode, BOINC won't ask for more GPU work, and if you are not using the flops tags then the scheduler is still learning the speed of the new apps and meanwhile is using a (slower) default value, which is forcing the panic mode...
(or something like that...)
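
A rough Python sketch of that reasoning, for what it is worth. It assumes the runtime estimate is the task's estimated operation count divided by the app's speed in FLOPS, and that work goes to high priority once the queued estimates no longer fit before the deadline. The function names, task size, speeds and queue length are all made up for illustration, and the real scheduling logic is more involved than this.

```python
def estimated_runtime_s(fpops_est: float, flops: float) -> float:
    """Estimated seconds per task: operation count divided by assumed speed."""
    return fpops_est / flops


def needs_high_priority(
    n_tasks: int,
    fpops_est: float,
    flops: float,
    tasks_at_once: int,
    seconds_to_deadline: float,
) -> bool:
    """True if the queue, as estimated, would not finish before the deadline."""
    total_s = n_tasks * estimated_runtime_s(fpops_est, flops) / tasks_at_once
    return total_s > seconds_to_deadline


if __name__ == "__main__":
    week_s = 7 * 24 * 3600
    fpops_est = 3e13        # hypothetical task size
    default_flops = 1e9     # hypothetical slow default speed
    measured_flops = 3e10   # hypothetical real speed of the new app

    for label, flops in (("default", default_flops), ("measured", measured_flops)):
        panic = needs_high_priority(100, fpops_est, flops, 2, week_s)
        print(f"{label}: {estimated_runtime_s(fpops_est, flops):.0f} s/task, panic={panic}")
```

With the slow default the same 100 tasks look like they cannot finish within the week, so they get high priority; with a realistic speed they fit easily.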
ID: 1216001 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1216014 - Posted: 9 Apr 2012, 5:42:09 UTC

Scheduler still seems borked.

I've got over 300 to report, and I'm guessing about 100 or more ghosted, waiting for a good scheduler response or five to start receiving them as resends.

Still getting 99% timeouts.
ID: 1216014 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1216023 - Posted: 9 Apr 2012, 5:59:35 UTC - in response to Message 1216014.  


Most of today the Scheduler has been hit & miss, but for about the last hour it's all been miss. Every attempt has timed out.
Grant
Darwin NT
ID: 1216023 · Report as offensive
AndrewM
Volunteer tester

Joined: 5 Jan 08
Posts: 369
Credit: 34,275,196
RAC: 0
Australia
Message 1216031 - Posted: 9 Apr 2012, 6:31:46 UTC

To borrow a line from Wiggo, I'm bouncing off the limits
AndrewM
ID: 1216031 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1216033 - Posted: 9 Apr 2012, 7:04:07 UTC - in response to Message 1216031.  

To borrow a line from Wiggo, I'm bouncing off the limits

So am I, now.
But for several hours there I was getting further & further from the limits with each Scheduler timeout.
Grant
Darwin NT
ID: 1216033 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1216036 - Posted: 9 Apr 2012, 7:31:53 UTC

The schedulers must have had a surfeit of Chocolate Easter Eggs and dozed off in the arm chair...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1216036 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.