Panic Mode On (47) Server problems?

Message boards : Number crunching : Panic Mode On (47) Server problems?

Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1112799 - Posted: 3 Jun 2011, 20:46:58 UTC

Floating on a sea of green?

Surely floating on the blue line.

In a green atmosphere...

Maybe we have already found alien life - in the form of a duck that breathes green air!!!
ID: 1112799
SciManStev Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6657
Credit: 121,090,076
RAC: 0
United States
Message 1112819 - Posted: 3 Jun 2011, 21:44:12 UTC

I really like that duck!

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1112819
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14672
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112840 - Posted: 3 Jun 2011, 23:45:53 UTC - in response to Message 1112637.  

I'm surprised by the following BOINC message, which has come up from time to time since ~00:42 UTC: 'Scheduler request failed: Timeout was reached' - after ~60 secs. IIRC this normally happened after ~600 secs in the past.

I tested this with my 420M laptop, also running v6.12.28:

04/06/2011 00:33:22 | SETI@home | Sending scheduler request: To fetch work.
04/06/2011 00:33:22 | SETI@home | Reporting 7 completed tasks, requesting new tasks for CPU and NVIDIA GPU
04/06/2011 00:33:22 | SETI@home | [sched_op] CPU work request: 142292.84 seconds; 0.00 CPUs
04/06/2011 00:33:22 | SETI@home | [sched_op] NVIDIA GPU work request: 18929.68 seconds; 0.00 GPUs
04/06/2011 00:35:22 | SETI@home | Scheduler request completed: got 48 new tasks
04/06/2011 00:35:22 | SETI@home | [sched_op] estimated total CPU task duration: 157672 seconds
04/06/2011 00:35:22 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 21611 seconds

So exactly two minutes, but no timeout. Of the 48 tasks, 39 were shorties.
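
For anyone who wants to check that arithmetic, here is a minimal Python sketch. It uses only the two timestamps and the task counts from the log above; the variable names are made up for illustration and none of this is part of BOINC itself.

```python
from datetime import datetime

# Timestamp layout used in the log excerpt above (day/month/year).
FMT = "%d/%m/%Y %H:%M:%S"

# Times copied from the "Sending scheduler request" and
# "Scheduler request completed" lines.
sent = datetime.strptime("04/06/2011 00:33:22", FMT)
done = datetime.strptime("04/06/2011 00:35:22", FMT)

# Seconds between sending the request and getting the reply.
elapsed = (done - sent).total_seconds()

# Share of the 48 new tasks that were shorties (39 of them).
shorty_share = 39 / 48

print(f"request took {elapsed:.0f} s")
print(f"shorties: {shorty_share:.1%} of the batch")
```

The same subtraction works for any pair of lines from the event log, as long as the format string matches the client's date layout.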
ID: 1112840
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1112863 - Posted: 4 Jun 2011, 1:56:21 UTC - in response to Message 1112840.  


6 days down & now into the 7th day of full network load.
Usually the Results returned per hour are around 50,000 or so. Since the network storage problem outage it hasn't been below 60,000. For the last day it's been above 70,000 & it almost hit 80,000 at one stage.
The hardware is certainly taking a beating, and no end in sight. The last batch of work i got still contained a lot of shorties.
Grant
Darwin NT
ID: 1112863
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112890 - Posted: 4 Jun 2011, 5:01:52 UTC - in response to Message 1112863.  


6 days down & now into the 7th day of full network load.
Usually the Results returned per hour are around 50,000 or so. Since the network storage problem outage it hasn't been below 60,000. For the last day it's been above 70,000 & it almost hit 80,000 at one stage.
The hardware is certainly taking a beating, and no end in sight. The last batch of work i got still contained a lot of shorties.


I wonder how much damage the project would suffer if the air conditioning went away under these circumstances?

Currently, "project backoff: 05:16:00"


ID: 1112890
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112896 - Posted: 4 Jun 2011, 5:42:18 UTC - in response to Message 1112863.  


The hardware is certainly taking a beating, and no end in sight. The last batch of work i got still contained a lot of shorties.


Oooops.

Grant, you and I forgot that the hardware is only working about 1/10th as hard as it was probably designed-for.
ID: 1112896
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1112922 - Posted: 4 Jun 2011, 7:31:06 UTC - in response to Message 1112890.  

I wonder how much damage the project would suffer if the air conditioning went away under these circumstances?

No damage, but everything would shut down.


Grant, you and I forgot that the hardware is only working about 1/10th as hard as it was probably designed-for.

Actually, due to the lack of the network storage, parts of it will be running pretty much at their limits.
Grant
Darwin NT
ID: 1112922
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1112925 - Posted: 4 Jun 2011, 7:47:40 UTC - in response to Message 1112819.  
Last modified: 4 Jun 2011, 7:50:37 UTC

I really like that duck!

Steve


& for those of you that remember Orville on BBC1:

"I 'aaate that duck" ;)!
ID: 1112925
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1112933 - Posted: 4 Jun 2011, 10:07:25 UTC - in response to Message 1112922.  
Last modified: 4 Jun 2011, 10:08:03 UTC


Actually, due to the lack of the network storage, parts of it will be running pretty much at their limits.

From my understanding an Overland tech logged into the storage server & got it operational again. (Thanks Overland) To my knowledge all data storage is online. If anything's hampering the server, it'll be the issue described in the tech news of 1st June:
There are some broken astropulse results clogging one of the validators (which is why it shows up on red on the status page). We'll have to figure out an automated way to detect these results and push them through (it's a real pain to do by hand). In the meantime, this is causing our workunit storage server to be quite full, and might hamper other workunit development sooner than later.

ID: 1112933
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1112954 - Posted: 4 Jun 2011, 11:50:30 UTC - in response to Message 1112933.  

From my understanding an Overland tech logged into the storage server & got it operational again. (Thanks Overland) To my knowledge all data storage is online.

The last i saw they had got it up & running again, but were in the process of determining what the actual cause of the fault/s was/were. There hasn't been any outage since then to transfer the data back to it, so it's still residing on Thumper.
Grant
Darwin NT
ID: 1112954
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1113120 - Posted: 4 Jun 2011, 17:48:12 UTC

The hardware is only working at about 1/10th of what it could be, because it is limited to 100 Mbit instead of gigabit.

Second, best I can tell, the Overland storage IS functional and working, but everything is still on Thumper.

And the ridiculous number of shorties probably coincides with the large number of the 100% blanked APs that I've gotten lately. It used to be that there were only one or two of those per hundred, but now it seems there's 1-2 per 10, sometimes more than that (two days ago, I ran through eight in a row in my cache. That did some excellent stuff for the DCF, and I ended up getting 10 more APs than I usually do). So there were a bunch of tapes recently that were essentially trash, or mostly trash.

I still think that after the software blanking runs through and does its thing, it should be able to just mark 100% blanked WUs and keep them from being sent out in the first place. MBs aren't so bad because they are small, but wasting 16MB on APs like that just doesn't make sense.
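
To put rough numbers on the 100 Mbit versus gigabit point and the 16MB AP size, here is a back-of-the-envelope Python sketch. It assumes decimal megabytes and an otherwise idle, overhead-free link, which the real server link certainly is not; `transfer_seconds` is a made-up helper name, not anything from the project.

```python
def transfer_seconds(size_megabytes, link_megabits_per_s):
    """Ideal transfer time: 8 bits per byte, no protocol overhead (assumption)."""
    return size_megabytes * 8 / link_megabits_per_s

AP_SIZE_MB = 16  # size of one Astropulse workunit mentioned above

slow = transfer_seconds(AP_SIZE_MB, 100)    # 100 Mbit link
fast = transfer_seconds(AP_SIZE_MB, 1000)   # gigabit link

print(f"100 Mbit: {slow:.3f} s per AP")
print(f"gigabit:  {fast:.3f} s per AP")
print(f"ratio:    {slow / fast:.0f}x")
```

Every 100% blanked AP that goes out costs that 16MB of link time for no science in return, which is why filtering them before sending would help.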
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1113120
Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1113129 - Posted: 4 Jun 2011, 18:27:14 UTC - in response to Message 1112840.  
Last modified: 4 Jun 2011, 18:34:04 UTC

I'm surprised by the following BOINC message, which has come up from time to time since ~00:42 UTC: 'Scheduler request failed: Timeout was reached' - after ~60 secs. IIRC this normally happened after ~600 secs in the past.

I tested this with my 420M laptop, also running v6.12.28:

04/06/2011 00:33:22 | SETI@home | Sending scheduler request: To fetch work.
04/06/2011 00:33:22 | SETI@home | Reporting 7 completed tasks, requesting new tasks for CPU and NVIDIA GPU
04/06/2011 00:33:22 | SETI@home | [sched_op] CPU work request: 142292.84 seconds; 0.00 CPUs
04/06/2011 00:33:22 | SETI@home | [sched_op] NVIDIA GPU work request: 18929.68 seconds; 0.00 GPUs
04/06/2011 00:35:22 | SETI@home | Scheduler request completed: got 48 new tasks
04/06/2011 00:35:22 | SETI@home | [sched_op] estimated total CPU task duration: 157672 seconds
04/06/2011 00:35:22 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 21611 seconds

So exactly two minutes, but no timeout. Of the 48 tasks, 39 were shorties.


This is the log of the time in question:
03-Jun-2011 02:41:05 [SETI@home] Reporting 2 completed tasks, requesting new tasks for CPU and NVIDIA GPU
03-Jun-2011 02:42:02 [SETI@home] Scheduler request failed: Timeout was reached


From ~ the time of your contact:
04-Jun-2011 00:25:50 [SETI@home] Reporting 5 completed tasks, requesting new tasks for CPU and NVIDIA GPU
04-Jun-2011 00:26:45 [SETI@home] Scheduler request failed: Timeout was reached

04-Jun-2011 00:42:00 [SETI@home] Reporting 7 completed tasks, requesting new tasks for CPU and NVIDIA GPU
04-Jun-2011 00:42:58 [SETI@home] Scheduler request completed: got 20 new tasks


And from a few minutes ago:
SETI@home 04.06.2011 20:00:43 Reporting 3 completed tasks, requesting new tasks for CPU and NVIDIA GPU
SETI@home 04.06.2011 20:01:43 Scheduler request failed: Timeout was reached

SETI@home 04.06.2011 20:19:11 Reporting 6 completed tasks, requesting new tasks for CPU and NVIDIA GPU
SETI@home 04.06.2011 20:20:03 Scheduler request completed: got 12 new tasks


All ~ 60 secs between request and answer.
I have only DSL light (384/64 kbit/s).
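
The ~60 secs figure can be checked from the log lines quoted in this post with a short Python sketch. The request/answer pairs are copied from those lines; `parse_ts` and `elapsed_seconds` are hypothetical helper names, and the two date layouts are simply the ones that appear in the excerpts.

```python
from datetime import datetime

# The two timestamp layouts that appear in the log excerpts above.
FORMATS = ("%d-%b-%Y %H:%M:%S", "%d.%m.%Y %H:%M:%S")

def parse_ts(text):
    """Try each known layout in turn and return the first that parses."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

# (request sent, answer logged, outcome) copied from the quoted lines.
EXCHANGES = [
    ("03-Jun-2011 02:41:05", "03-Jun-2011 02:42:02", "timeout"),
    ("04-Jun-2011 00:25:50", "04-Jun-2011 00:26:45", "timeout"),
    ("04-Jun-2011 00:42:00", "04-Jun-2011 00:42:58", "got 20 tasks"),
    ("04.06.2011 20:00:43", "04.06.2011 20:01:43", "timeout"),
    ("04.06.2011 20:19:11", "04.06.2011 20:20:03", "got 12 tasks"),
]

def elapsed_seconds(exchanges):
    """Return (seconds between request and answer, outcome) for each pair."""
    return [
        (int((parse_ts(end) - parse_ts(start)).total_seconds()), outcome)
        for start, end, outcome in exchanges
    ]

if __name__ == "__main__":
    for secs, outcome in elapsed_seconds(EXCHANGES):
        print(f"{secs:3d} s  {outcome}")
```

Successful and failed requests both land in the same 52 to 60 second window, so from these five samples alone the timeouts do not look like the old ~600 second limit.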
ID: 1113129
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1113263 - Posted: 5 Jun 2011, 0:37:06 UTC


Uh oh, uploads have stopped going through.
Grant
Darwin NT
ID: 1113263
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1113270 - Posted: 5 Jun 2011, 0:54:59 UTC - in response to Message 1113264.  

The upload server is just responding really slowly right now; it took almost a minute before it showed the test page for me.

ID: 1113270
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1113272 - Posted: 5 Jun 2011, 0:59:02 UTC - in response to Message 1113263.  


Uh oh, uploads have stopped going through.

Panic over, must have been a minor glitch.
Grant
Darwin NT
ID: 1113272
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1113275 - Posted: 5 Jun 2011, 1:03:31 UTC - in response to Message 1113272.  


Uh oh, uploads have stopped going through.

Panic over, must have been a minor glitch.

Or maybe not. Another bunch have started to pile up. Couple of minutes down & none of them have even started to go through.
Grant
Darwin NT
ID: 1113275
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1113292 - Posted: 5 Jun 2011, 1:41:19 UTC

Uploads seem to be going away. Haven't had much success in the last hour or so. Cricket graph shows upload declining as well. Wasn't this related to a full storage situation in the past? With all these shorties coming back, that certainly would make sense.
ID: 1113292
Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1113343 - Posted: 5 Jun 2011, 4:58:43 UTC
Last modified: 5 Jun 2011, 5:01:12 UTC

It looks like the UL server/service is completely down now.

Temporarily failed upload of xxxxxxxxxxxxxxxxxxx: HTTP error
Temporarily failed upload of xxxxxxxxxxxxxxxxxxx: connect() failed
and so on..

At least my BOINC can't UL.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1113343
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19308
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1113347 - Posted: 5 Jun 2011, 5:11:32 UTC - in response to Message 1113343.  
Last modified: 5 Jun 2011, 5:12:27 UTC

Uploads are working, maybe a little slowly;

05/06/2011 06:05:46 SETI@home Started upload of 2ap11ac.7671.13564.7.10.33_0_0
05/06/2011 06:07:02 SETI@home Finished upload of 2ap11ac.7671.13564.7.10.33_0_0

Times are BST, i.e. UTC+1.
ID: 1113347
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1113353 - Posted: 5 Jun 2011, 6:05:48 UTC - in response to Message 1113347.  
Last modified: 5 Jun 2011, 6:06:57 UTC

Uploads are working, maybe a little slowly;

Try extremely slowly. Anything from 1-12 attempts before it goes through, and at about 2kB/s if i'm lucky.
Network traffic & Scarecrow's Graphs show a significant drop off in the amount of work being returned. Something's broken, or seriously clogged.


EDIT- and even the forums are behaving sluggishly now & then.
Grant
Darwin NT
ID: 1113353


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.