Panic Mode On (79) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321117 - Posted: 29 Dec 2012, 4:19:22 UTC - in response to Message 1321103.  
Last modified: 29 Dec 2012, 4:36:45 UTC

Scheduler request errors are now occurring within a couple of seconds.
If that doesn't happen then a timeout is more likely than a response. Add to that the shorties & the continuing upload issues.


EDIT- and these shorties are *really* short.
Running 3 at a time on my GTX 560Ti they're being done in just over 4 minutes.
Grant
Darwin NT
ID: 1321117
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321119 - Posted: 29 Dec 2012, 4:25:41 UTC - in response to Message 1321117.  
Last modified: 29 Dec 2012, 4:26:43 UTC


Scheduler request errors are now occurring within a couple of seconds.
If that doesn't happen then a timeout is more likely than a response. Add to that the shorties & the continuing upload issues.

I have been seeing my caches, such as were stingily allowed recently, diminish greatly.
As is the temperature in the crunching den of the kitties.

When I walked in after work this afternoon and saw the room temperature, I knew something was wrong. It had already gone to crap hours before, while I was at work.
Multiple thousands of watts of SETI crunching power not being dissipated by hard-working GPUs into the cold winter air.

Meowsigh.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1321119
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321131 - Posted: 29 Dec 2012, 5:03:24 UTC - in response to Message 1321122.  
Last modified: 29 Dec 2012, 5:04:45 UTC

Just ran out of work on one GPU.
Even if I can upload the results, I get nothing but Scheduler errors or timeouts. By then, I can't get new work because of all the uploads that have backed up again. When I finally get them all uploaded, I get a Scheduler error or timeout again.
Etc., etc., etc.
Grant
Darwin NT
ID: 1321131
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321133 - Posted: 29 Dec 2012, 5:06:26 UTC - in response to Message 1321131.  


Just ran out of work on one GPU.
Even if I can upload the results, I get nothing but Scheduler errors or timeouts. By then, I can't get new work because of all the uploads that have backed up again.

I am ditching soon for the night.......
Hope the kitties and myself don't freeze overnight due to the LACK OF FREAKING CRUNCHING>>>>>>>>>ARRGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHH.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1321133
Wiggo
Joined: 24 Jan 00
Posts: 38637
Credit: 261,360,520
RAC: 489
Australia
Message 1321148 - Posted: 29 Dec 2012, 5:58:04 UTC - in response to Message 1321131.  

Just ran out of work on one GPU.
Even if I can upload the results, I get nothing but Scheduler errors or timeouts. By then, I can't get new work because of all the uploads that have backed up again. When I finally get them all uploaded, I get a Scheduler error or timeout again.
Etc., etc., etc.

Having 2 or 3 good backup projects at times like these helps (I must see about getting 2 of them added to our Team's participating projects list). ;)

Cheers.
ID: 1321148
Wiggo
Joined: 24 Jan 00
Posts: 38637
Credit: 261,360,520
RAC: 489
Australia
Message 1321150 - Posted: 29 Dec 2012, 6:13:03 UTC - in response to Message 1321148.  

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Cheers.
ID: 1321150
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1321151 - Posted: 29 Dec 2012, 6:16:04 UTC

It may be about over. My Uploads aren't hanging anymore, most of the 4 minute MB Shorties have been replaced by newly downloaded 20 min Longies, and the Update is working more than not. Maybe...
ID: 1321151
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1321153 - Posted: 29 Dec 2012, 6:21:51 UTC - in response to Message 1321133.  

Uploads are still working but it takes a while and can be very temperamental. However, I can't get a scheduler request to not time out, and without that I can't see if downloads are equally temperamental.

Something is stopping up the pipes. Cricket now shows a nasty spike of incoming packets with a similar spike on the outgoing packets. Incoming packets are now around 1/3 higher than they were yesterday and outgoing packets are at nearly 10 kpkt/s.

Looks as if the chart started to go off "normal" about 24 hours ago. I imagine we are seeing some kind of cascading failure condition: retries clog the pipes, which causes more hosts to fail connecting, which triggers more retries that clog the pipes even more, etc.
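
For anyone wondering how a client is supposed to avoid feeding a retry storm like that, the usual answer is randomized exponential backoff. Here is a rough Python sketch of the idea. The base delay, the cap, and the 50%-100% jitter range are numbers I made up for illustration, and this is not BOINC's actual backoff code:

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized retry delay in seconds.

    attempt is the number of consecutive failures so far (0 for the first
    retry). base, cap, and the jitter range are hypothetical values chosen
    for illustration only.
    """
    # The upper bound doubles with each failure until it reaches the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter spreads out hosts that failed at the same moment, so they do
    # not all come back in the same second.
    return rng.uniform(0.5 * ceiling, ceiling)


def retry_schedule(failures, seed=None):
    """Return the delays one host would wait over `failures` consecutive failures."""
    rng = random.Random(seed)
    return [backoff_delay(n, rng=rng) for n in range(failures)]


if __name__ == "__main__":
    for n, delay in enumerate(retry_schedule(6, seed=1)):
        print(f"failure {n}: wait {delay:.0f} s")
```

The doubling slows each host down the longer the outage lasts, and the random part keeps thousands of hosts from retrying in lockstep.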

This is what I'm seeing in my log

12/29/2012 1:09:16 AM | SETI@home | Reporting 6 completed tasks, requesting new tasks for CPU and ATI
12/29/2012 1:09:18 AM |  | Project communication failed: attempting access to reference site
12/29/2012 1:09:18 AM | SETI@home | Scheduler request failed: Server returned nothing (no headers, no data)
12/29/2012 1:09:20 AM |  | Internet access OK - project servers may be temporarily down.


"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1321153
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321158 - Posted: 29 Dec 2012, 6:30:08 UTC - in response to Message 1321155.  
Last modified: 29 Dec 2012, 6:30:59 UTC

Question is, can anybody report?

Very, very, very occasionally.
Very.


And that's with No New Tasks Set.
Even less often when trying for more work.
Grant
Darwin NT
ID: 1321158
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1321160 - Posted: 29 Dec 2012, 6:31:04 UTC - in response to Message 1321155.  
Last modified: 29 Dec 2012, 6:32:47 UTC

Honestly it's becoming SOP that when I complain it starts working. Well sort of.

12/29/2012 1:23:30 AM | SETI@home | Reporting 7 completed tasks, requesting new tasks for CPU and ATI
12/29/2012 1:24:54 AM | SETI@home | Scheduler request completed: got 7 new tasks
12/29/2012 1:24:56 AM | SETI@home | Started download of 22dc09ac.3555.167212.140733193388041.10.92
12/29/2012 1:24:56 AM | SETI@home | Started download of 08oc12aa.26644.119686.4.10.1.vlar
12/29/2012 1:25:05 AM | SETI@home | Computation for task 26no12ad.24534.17654.140733193388043.10.56_1 finished
12/29/2012 1:25:05 AM | SETI@home | Starting task 22no12ad.3861.7429.140733193388038.10.76.vlar_1 using setiathome_enhanced version 603 in slot 0
12/29/2012 1:25:07 AM | SETI@home | Started upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:25:29 AM |  | Project communication failed: attempting access to reference site
12/29/2012 1:25:29 AM | SETI@home | Temporarily failed upload of 26no12ad.24534.17654.140733193388043.10.56_1_0: connect() failed
12/29/2012 1:25:29 AM | SETI@home | Backing off 2 min 13 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:25:31 AM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 1:27:43 AM | SETI@home | Started upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:28:58 AM |  | Project communication failed: attempting access to reference site
12/29/2012 1:28:58 AM | SETI@home | Temporarily failed upload of 26no12ad.24534.17654.140733193388043.10.56_1_0: connect() failed
12/29/2012 1:28:58 AM | SETI@home | Backing off 4 min 50 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:28:59 AM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 1:29:59 AM |  | Project communication failed: attempting access to reference site
12/29/2012 1:29:59 AM | SETI@home | Temporarily failed download of 08oc12aa.26644.119686.4.10.1.vlar: transient HTTP error
12/29/2012 1:29:59 AM | SETI@home | Backing off 3 min 5 sec on download of 08oc12aa.26644.119686.4.10.1.vlar
12/29/2012 1:29:59 AM | SETI@home | Started download of 22dc09ac.3555.167212.140733193388041.10.98
12/29/2012 1:30:00 AM |  | Internet access OK - project servers may be temporarily down.


And the answer to my previous post is yes, downloads are equally cranky.
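
If you want to pull the backoff times out of an event log like the one above instead of eyeballing them, something along these lines should work. It is a rough Python sketch that assumes the "timestamp | project | message" layout and the "Backing off N min N sec" wording shown in the log. The function names are my own, and it does not handle any other duration wording the client may print:

```python
import re
from datetime import datetime

# Matches messages such as "Backing off 2 min 13 sec on upload of <file>".
# Only the "min"/"sec" wording seen in the log above is handled.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) min )?(\d+) sec on (upload|download) of (\S+)"
)


def parse_line(line):
    """Split one log line into (timestamp, project, message), or return None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return when, project, message


def backoffs(lines):
    """Return (direction, file name, delay in seconds) for each backoff message."""
    found = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        match = BACKOFF_RE.search(parsed[2])
        if match:
            minutes = int(match.group(1) or 0)
            seconds = int(match.group(2))
            found.append((match.group(3), match.group(4), minutes * 60 + seconds))
    return found


if __name__ == "__main__":
    sample = [
        "12/29/2012 1:25:29 AM | SETI@home | Backing off 2 min 13 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0",
        "12/29/2012 1:25:31 AM |  | Internet access OK - project servers may be temporarily down.",
        "12/29/2012 1:28:58 AM | SETI@home | Backing off 4 min 50 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0",
    ]
    for direction, name, delay in backoffs(sample):
        print(direction, name, delay)
```

On the lines above, the same upload backs off 133 seconds and then 290 seconds, so the wait roughly doubles after the second failure.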
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1321160
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321162 - Posted: 29 Dec 2012, 6:32:35 UTC - in response to Message 1321150.  

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Still takes multiple clicks of the Retry button for me.

Grant
Darwin NT
ID: 1321162
Wiggo
Joined: 24 Jan 00
Posts: 38637
Credit: 261,360,520
RAC: 489
Australia
Message 1321163 - Posted: 29 Dec 2012, 6:36:05 UTC - in response to Message 1321150.  

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Cheers.

Well that only lasted long enough to clear 2 rigs. :(

Cheers.
ID: 1321163
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321170 - Posted: 29 Dec 2012, 6:48:15 UTC - in response to Message 1321162.  

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Still takes multiple clicks of the Retry button for me.


Although it's not taking as many clicks as it was.
However this upload improvement (minor as it is) appears to have come at the expense of Scheduler contact. As difficult as it has been, it's even more borked than it was before.
Grant
Darwin NT
ID: 1321170
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321180 - Posted: 29 Dec 2012, 7:38:43 UTC - in response to Message 1321170.  


Not getting many Scheduler timeouts now.
Now they just fail.

I can upload, but not contact the Scheduler, or it times out. Or I can upload, but the Scheduler just fails.
Either way, getting work just isn't possible.
Grant
Darwin NT
ID: 1321180
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321184 - Posted: 29 Dec 2012, 7:43:49 UTC

It's just the aliens' way of trying to make sure that not too many of us find them too soon.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1321184
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321187 - Posted: 29 Dec 2012, 7:59:38 UTC - in response to Message 1321184.  


And now uploads have gone from bad to worse, but Scheduler contacts haven't improved at all.
It'll probably all come crashing down before morning.
Grant
Darwin NT
ID: 1321187
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1321192 - Posted: 29 Dec 2012, 8:25:29 UTC

:-(

The secondary projects in my BOINC are 'happy' about it ..

I guess (and worry) that S@h will be down until after the first power outage ..

After a fresh start, around Jan 7 or 8, 'everything' should run well again.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1321192
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1321217 - Posted: 29 Dec 2012, 10:16:30 UTC

It's not completely dead. I got 3 or 4 uploads through and got one scheduler contact to report some completions in the last half hour. Can't get other uploads through or get new work, so the GPU is crunching Einstein on 0 resource share to keep the house warm.

It looks like a long weekend + holiday. Wish they'd loosen the limits if they are even needed - I'm not convinced of that.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1321217
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321221 - Posted: 29 Dec 2012, 10:32:06 UTC - in response to Message 1321187.  


And now uploads have gone from bad to worse, but Scheduler contacts haven't improved at all.
It'll probably all come crashing down before morning.

Every once in a while, the kitty claws get through to the servers letting them know they still want work......
It's still all basically fukayed.
Mostly by the server problems.
But also by the underpinnings of Boinc's 'don't bother the servers' attitude.
Server contacts borked, don't try any more.
Too many uploads pending, don't ask for work.
Even ONE download backed off, don't ask for work.
Things not running smoothly, don't even ask.
A fart in the carload...back off.

Any excuse to back off.
And, if things work, stop at 100 tasks for your GPUs.
Even if they can return 100 tasks in a few hours or less.
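
To put a rough number on that, here is a back-of-the-envelope Python sketch. It borrows figures posted earlier in this thread (3 tasks at a time at a bit over 4 minutes each for shorties, around 20 minutes for the longer tasks) and assumes every task in the cache takes the same time, which real caches will not:

```python
def cache_runtime_minutes(cache_size, concurrent, minutes_per_task):
    """Approximate minutes until a GPU has worked through its whole cache.

    Assumes `concurrent` tasks run side by side and each takes
    `minutes_per_task`. This is a simplification for illustration only.
    """
    batches = -(-cache_size // concurrent)  # ceiling division
    return batches * minutes_per_task


if __name__ == "__main__":
    # 100-task limit, 3 at a time, about 4 minutes per shorty
    print(cache_runtime_minutes(100, 3, 4))
    # same limit with roughly 20-minute tasks
    print(cache_runtime_minutes(100, 3, 20))
```

With those assumed numbers a full 100-task cache of shorties lasts a little over two hours on one GPU, and about eleven hours with 20-minute tasks.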

This is rather maddening when one has 9 rigs online begging to do as much work as possible for this project. THIS project, not Einstein or some other backup. THIS project. The one I care about.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1321221
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 1321236 - Posted: 29 Dec 2012, 11:00:36 UTC - in response to Message 1321221.  
Last modified: 29 Dec 2012, 11:01:34 UTC

Whatever has stuffed things up, it's an odd one. There have been a couple of spikes in inbound traffic, but uploads & Scheduler requests have still been stuffed even during those periods. There's been the odd very slight dip in outbound traffic, yet most Scheduler requests result in an error, and those that don't error out time out. It would be 5% or less of the requests that actually get a result.
Still the network traffic (inbound & outbound) is right up there.
Grant
Darwin NT
ID: 1321236


 