Panic Mode On (80) Server Problems?

Team kizb

Send message
Joined: 8 Mar 01
Posts: 219
Credit: 3,709,162
RAC: 0
Germany
Message 1321988 - Posted: 30 Dec 2012, 8:16:34 UTC

Things seem to be working better now, I woke up this morning and all my uploads had finally completed and I had 102 to crunch.
My Computers:
█ Blue Offline
█ Green Offline
█ Red Offline
ID: 1321988 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1322031 - Posted: 30 Dec 2012, 9:57:20 UTC
Last modified: 30 Dec 2012, 10:40:45 UTC

Maybe SAH's internet connection is maxed out because of the stock AP GPU app for NV and ATI?

Now all GPUs out there can crunch AP WUs.

Does the app work correctly on all systems?

Maybe the AP WUs fail (or the results don't match the wingman's results) and need to be sent to another PC.

If this happens more than once, you can imagine how the internet connection gets maxed out by sending the same AP WUs again and again to different PCs.

Just an idea.

(8 MB/AP WU)


[EDIT: 27 % of the AP WUs in my BOINC are > x_1 (x_2 & x_3).]


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1322031 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1322118 - Posted: 30 Dec 2012, 11:13:10 UTC - in response to Message 1322031.  
Last modified: 30 Dec 2012, 11:17:16 UTC

ID: 1322118 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1322148 - Posted: 30 Dec 2012, 12:51:09 UTC - in response to Message 1321988.  

Things seem to be working better now, I woke up this morning and all my uploads had finally completed and I had 102 to crunch.

Just as I was about to go to bed, I noticed this new error;
12/29/2012 10:59:52 PM | SETI@home | Computation for task 08oc12ab.18183.8656.7.10.195_1 finished
12/29/2012 10:59:52 PM | SETI@home | Starting task 08oc12ab.18183.8656.7.10.76_0 using setiathome_enhanced version 609 (cuda23) in slot 3
12/29/2012 10:59:54 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:00:10 PM |  | Project communication failed: attempting access to reference site
12/29/2012 11:00:10 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:00:10 PM | SETI@home | Backing off 3 min 22 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:00:11 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 11:03:04 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:20 PM |  | Project communication failed: attempting access to reference site
12/29/2012 11:03:20 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:03:20 PM | SETI@home | Backing off 4 min 19 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:21 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 11:03:29 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:45 PM |  | Project communication failed: attempting access to reference site
12/29/2012 11:03:45 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:03:45 PM | SETI@home | Backing off 13 min 17 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:46 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 11:06:44 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:07:15 PM |  | Project communication failed: attempting access to reference site
12/29/2012 11:07:15 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:07:15 PM | SETI@home | Backing off 16 min 32 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:07:16 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 11:09:33 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:10:21 PM | SETI@home | Finished upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:10:21 PM | SETI@home | Sending scheduler request: To fetch work.
12/29/2012 11:10:21 PM | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA and ATI
12/29/2012 11:10:23 PM | SETI@home | Computation for task ap_26no12ad_B1_P0_00062_20121227_08468.wu_0 finished
12/29/2012 11:10:23 PM | SETI@home | Starting task ap_25no12ad_B6_P1_00044_20121227_21707.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 2
12/29/2012 11:10:25 PM | SETI@home | Started upload of ap_26no12ad_B1_P0_00062_20121227_08468.wu_0_0
12/29/2012 11:11:02 PM | SETI@home | Finished upload of ap_26no12ad_B1_P0_00062_20121227_08468.wu_0_0
12/29/2012 11:11:15 PM | SETI@home | Scheduler request completed: got 1 new tasks
.....

Since then, all the uploads have completed in under about 30 seconds. It's almost as if someone gave Bruno the reboot. So far all is well.
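To see how the retry delays grew in the log above, the "Backing off" lines can be pulled out with a few lines of Python. This is only a sketch: the four log lines are copied verbatim from the post, while the regular expression and the `backoff_seconds` helper are illustrative names, not part of BOINC.

```python
import re

# Four "Backing off" lines copied verbatim from the log in the post.
LOG = """\
12/29/2012 11:00:10 PM | SETI@home | Backing off 3 min 22 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:20 PM | SETI@home | Backing off 4 min 19 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:45 PM | SETI@home | Backing off 13 min 17 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:07:15 PM | SETI@home | Backing off 16 min 32 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
"""

# Hypothetical pattern for this message format; other BOINC versions may word it differently.
BACKOFF = re.compile(r"Backing off (\d+) min (\d+) sec")


def backoff_seconds(log_text):
    """Return each announced backoff interval in seconds, in log order."""
    return [int(minutes) * 60 + int(seconds)
            for minutes, seconds in BACKOFF.findall(log_text)]


print(backoff_seconds(LOG))  # [202, 259, 797, 992]
```

The announced wait goes from a little over 3 minutes to more than 16 minutes across four failed attempts, which is why a handful of failures can leave an upload parked for a long time.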
ID: 1322148 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1322403 - Posted: 30 Dec 2012, 21:29:40 UTC - in response to Message 1322148.  
Last modified: 30 Dec 2012, 21:29:58 UTC

Still getting lots of Scheduler timeouts & the occasional no header or data, but not nearly as many as before.
The upload problem appears to be gone: uploads start within a couple of seconds & run at 10-15kB/s.
Grant
Darwin NT
ID: 1322403 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1322504 - Posted: 31 Dec 2012, 3:26:00 UTC
Last modified: 31 Dec 2012, 3:27:44 UTC

my 2 cents ...

It looked like a "shortie" storm to me ... all my local computers were running SHORT WU's

One of my computers had 100 WUs, all with an estimated run time of 4 min, down from the usual 24 min.

That would be a .... 6x i/o load increase??
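The 6x figure can be checked with a quick calculation. This sketch assumes every task costs the servers the same number of transfers and scheduler contacts regardless of its run time, which is a simplification; `load_factor` is just an illustrative name.

```python
def load_factor(usual_minutes, shortie_minutes):
    """How many times more tasks a host finishes per hour when run time drops."""
    return usual_minutes / shortie_minutes


# Figures from the post: the usual 24 min estimate versus 4 min for the shorties.
print(load_factor(24, 4))  # 6.0
```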

Maybe a bad tape or 2 or 3 ...

Ed F
ID: 1322504 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1322506 - Posted: 31 Dec 2012, 3:43:00 UTC - in response to Message 1322504.  

It looked like a "shortie" storm to me ... all my local computers were running SHORT WU's

That just exacerbated the problems that already existed before the shorties started coming through.

Grant
Darwin NT
ID: 1322506 · Report as offensive
Rolf

Send message
Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1322549 - Posted: 31 Dec 2012, 9:38:43 UTC

Very good news!
Everything works as it "should":
- Uploads without timeouts
- Downloads as requested
31.12.2012 10:32:19 | SETI@home | Sending scheduler request: To fetch work.
31.12.2012 10:32:19 | SETI@home | Requesting new tasks for ATI
31.12.2012 10:34:10 | SETI@home | Scheduler request completed: got 20 new tasks
31.12.2012 10:34:12 | SETI@home | Started download of 09oc12ad.8746.67.12.10.98

Great last day of this year. Let the next year start like this!
ID: 1322549 · Report as offensive
.clair.

Send message
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1322644 - Posted: 31 Dec 2012, 13:26:47 UTC - in response to Message 1322549.  

Great last day of this year. Let the next year start like this!

You are dreaming, next year starts like this :( NEWS
At least the servers will not be sending us any weird and unintelligible messages ......
ID: 1322644 · Report as offensive
Profile shizaru
Volunteer tester
Avatar

Send message
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1322658 - Posted: 31 Dec 2012, 14:16:47 UTC - in response to Message 1322644.  

...next year starts like this :( NEWS


If next year started any differently, I'd think I had entered the Twilight Zone!

Situation Normal;)

Happy New Year everybody!
ID: 1322658 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1322671 - Posted: 31 Dec 2012, 15:03:14 UTC
Last modified: 31 Dec 2012, 15:14:42 UTC

Even though the upload stalls appear to have been corrected, transfer stalls are still a pain. This morning I woke up to a page of 'Ready to Report' tasks. I had six stalled downloads with a 32 minute wait time and 22 files waiting to be reported and replaced. The machine had gone through over 20% of its GPU cache in a few hours waiting on stalled downloads. I don't think there should be a transfer timeout longer than about 10 minutes. After 10 minutes the stalled activity starts becoming a problem. Maybe a rework of the transfer timeouts is in order? I kinda like timeouts of 2, 4, 6, and 10, with 10 minutes being the maximum timeout period.

Fortunately, everything corrected itself rather quickly after the Retry button was used a few times...
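The capped schedule suggested above (2, 4, 6, then 10 minutes as the ceiling) could be sketched like this. It only illustrates the proposal in this post; it is not how the BOINC client actually computes its backoff, and `retry_wait_minutes` is a made-up name.

```python
# Proposed waits from the post, in minutes; the last entry is the maximum.
SCHEDULE_MINUTES = (2, 4, 6, 10)


def retry_wait_minutes(failures):
    """Wait before the next retry after `failures` consecutive failed transfers."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    # Once the schedule is exhausted, keep using its last (largest) value.
    index = min(failures, len(SCHEDULE_MINUTES)) - 1
    return SCHEDULE_MINUTES[index]


print([retry_wait_minutes(n) for n in range(1, 7)])  # [2, 4, 6, 10, 10, 10]
```

With a schedule like this, a stalled download would never sit idle for more than 10 minutes between attempts, instead of the 32 minute wait described above.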
ID: 1322671 · Report as offensive
BarryAZ

Send message
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 1322721 - Posted: 31 Dec 2012, 17:16:02 UTC - in response to Message 1322693.  

One of the troublesome things for me with this project is that when the scheduler gets sick (which it does periodically) one of its 'sick modes' can be obstructive of other projects in terms of reporting, updating, uploading, etc. That is, the scheduler sometimes in its failure mode holds the workstation in 'reporting mode' exclusively (no other project can communicate with the workstation) for as much as 10 minutes. Ideally when the scheduler is in 'I'm confused' mode, it would simply issue a quick time out (say at 1 minute or 2 minutes) instead of putting things on hold for 10 minutes.

There have been times when (if I have no SETI work on a workstation to work on or report), that after several minutes, I simply detach so other projects can report without SETI in obstructive mode. When I do that, eventually I'll rejoin that workstation to SETI. However, 'eventually' is defined as a solid week for SETI -- and that often doesn't happen. Looks like it might not happen at all this coming month with the nth effort to correct lab electrical issues, plus the nth effort to correct the air conditioning in the server closet.
ID: 1322721 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1322742 - Posted: 31 Dec 2012, 17:51:59 UTC - in response to Message 1322721.  

One of the troublesome things for me with this project is that when the scheduler gets sick (which it does periodically) one of its 'sick modes' can be obstructive of other projects in terms of reporting, updating, uploading, etc. That is, the scheduler sometimes in its failure mode holds the workstation in 'reporting mode' exclusively (no other project can communicate with the workstation) for as much as 10 minutes. Ideally when the scheduler is in 'I'm confused' mode, it would simply issue a quick time out (say at 1 minute or 2 minutes) instead of putting things on hold for 10 minutes.

Actually, the server doesn't hold on to anything - it simply doesn't send a reply at all. The timeout is how long your client is prepared to wait - and (in recent versions), it's configurable.

If you're running v6.12.27 or later, check out <http_transfer_timeout> in client configuration - options.

Note that this will affect uploads/downloads as well, and that sometimes both scheduler contacts and data transfers do eventually work after a long pause. Use at your own discretion.
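For anyone who wants to try it, here is a rough sketch that builds a minimal configuration file containing only that option. It assumes the usual layout of the client's cc_config.xml, with `<http_transfer_timeout>` inside `<options>` and a value in seconds; the 120 is purely illustrative, not a recommendation, so check the client configuration documentation for your version before using it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(timeout_seconds):
    """Return a minimal cc_config.xml body that sets only http_transfer_timeout."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Assumed to be a number of seconds; verify against the BOINC documentation.
    ET.SubElement(options, "http_transfer_timeout").text = str(timeout_seconds)
    return ET.tostring(root, encoding="unicode")


# 120 is an illustrative value only.
print(build_cc_config(120))
```

If you already have a cc_config.xml, merge the option into the existing `<options>` section rather than overwriting the file, then have the client re-read its config file (or restart it).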
ID: 1322742 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1322974 - Posted: 1 Jan 2013, 3:13:54 UTC
Last modified: 1 Jan 2013, 3:18:44 UTC

are we up??

Ed F

[edit] I guess the post got here ok but the graph looks like we are down

Ed F

12/31/2012 10:17:05 PM | SETI@home | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU
12/31/2012 10:17:11 PM | | Project communication failed: attempting access to reference site
12/31/2012 10:17:11 PM | SETI@home | Scheduler request failed: Failure when receiving data from the peer
12/31/2012 10:17:12 PM | | Internet access OK - project servers may be temporarily down.
ID: 1322974 · Report as offensive
.clair.

Send message
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1323003 - Posted: 1 Jan 2013, 4:01:07 UTC

It's broken,
Yup the cricket graph has run out of green ink.
the cricket is dead.
Is that the server's way of saying `happy new year` #'"^&:(*&!.......
ID: 1323003 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1323023 - Posted: 1 Jan 2013, 4:31:39 UTC - in response to Message 1323003.  

It's broken,
Yup the cricket graph has run out of green ink.
the cricket is dead.
Is that the server's way of saying `happy new year` #'"^&:(*&!.......

But the servers stopped speaking at 02:50, couldn't last the course at the New Year's party.
ID: 1323023 · Report as offensive
.clair.

Send message
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1323026 - Posted: 1 Jan 2013, 4:39:23 UTC

The SSP still looks good so eye dunO what's up with it all.
It's the millennium bug just a bit late
or the unix bug or excel
or
ID: 1323026 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1323033 - Posted: 1 Jan 2013, 4:56:37 UTC - in response to Message 1323026.  

The SSP still looks good so eye dunO what's up with it all.
It's the millennium bug just a bit late
or the unix bug or excel
or

The SSP hasn't updated since 2:50:23 UTC; subtracting the 8 hour Berkeley offset gives 18:50:23 PST. That matches quite closely the time when the Cricket graph fell.
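The conversion is easy to double-check. This sketch assumes the timestamp falls on 1 Jan 2013 UTC (inferred from the dates of the surrounding posts) and uses a fixed UTC-8 offset for PST:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset; fine for late December, when Berkeley is on standard time.
PST = timezone(timedelta(hours=-8), "PST")

# Date assumed from the surrounding posts; the time is the one quoted above.
last_update_utc = datetime(2013, 1, 1, 2, 50, 23, tzinfo=timezone.utc)
last_update_pst = last_update_utc.astimezone(PST)

print(last_update_pst.isoformat())  # 2012-12-31T18:50:23-08:00
```

So in Berkeley the page stopped updating at 18:50:23 on New Year's Eve.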

Maybe it's the end of the world a little late?
Joe
ID: 1323033 · Report as offensive
Keith White
Avatar

Send message
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1323041 - Posted: 1 Jan 2013, 5:16:42 UTC
Last modified: 1 Jan 2013, 5:17:01 UTC

The repairs were scheduled for the 4th so of course the plug is pulled now.

:D
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1323041 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1323057 - Posted: 1 Jan 2013, 6:04:15 UTC

Dive, Dive, Dive!
ID: 1323057 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.