Panic Mode On (80) Server Problems?
Author | Message |
---|---|
Team kizb Send message Joined: 8 Mar 01 Posts: 219 Credit: 3,709,162 RAC: 0 |
|
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Maybe the maxed-out internet connection of SAH is because of the stock AP GPU app for NV and ATI? Now all GPUs out there can crunch AP WUs. Does the app work correctly on all systems? Maybe the AP WUs fail (or the results don't match the wingman's results) and have to be sent to another PC. If this happens more than once, you can imagine how the internet connection gets maxed out by sending the same AP WUs again and again to different PCs. Just an idea. (8 MB/AP WU) [EDIT: 27% of the AP WUs in my BOINC are > x_1 (x_2 & x_3).] * Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Sutaru Tsureku wrote: (...)
OK, I looked at all the x_2 and x_3 AP WUs and found the following wingmen whose PCs produce a lot of errors, or only errors, with the stock AP GPU app:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5304693
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5369208
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5465293
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5810180
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6028483
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6201705
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6247733
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6616302
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6704517
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6757607
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6795698
Why do all these PCs produce errors with the stock AP GPU app? If there are many more PCs out there like the examples above, it's no wonder that the SAH internet connection is continuously maxed out.
Two with a wrong CPU app?
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2750818
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6708061
* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Things seem to be working better now, I woke up this morning and all my uploads had finally completed and I had 102 to crunch. Just as I was about to go to bed, I noticed this new error;
12/29/2012 10:59:52 PM | SETI@home | Computation for task 08oc12ab.18183.8656.7.10.195_1 finished
12/29/2012 10:59:52 PM | SETI@home | Starting task 08oc12ab.18183.8656.7.10.76_0 using setiathome_enhanced version 609 (cuda23) in slot 3
12/29/2012 10:59:54 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:00:10 PM | | Project communication failed: attempting access to reference site
12/29/2012 11:00:10 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:00:10 PM | SETI@home | Backing off 3 min 22 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:00:11 PM | | Internet access OK - project servers may be temporarily down.
12/29/2012 11:03:04 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:20 PM | | Project communication failed: attempting access to reference site
12/29/2012 11:03:20 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:03:20 PM | SETI@home | Backing off 4 min 19 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:21 PM | | Internet access OK - project servers may be temporarily down.
12/29/2012 11:03:29 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:45 PM | | Project communication failed: attempting access to reference site
12/29/2012 11:03:45 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:03:45 PM | SETI@home | Backing off 13 min 17 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:03:46 PM | | Internet access OK - project servers may be temporarily down.
12/29/2012 11:06:44 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:07:15 PM | | Project communication failed: attempting access to reference site
12/29/2012 11:07:15 PM | SETI@home | Temporarily failed upload of 08oc12ab.18183.8656.7.10.195_1_0: can't resolve hostname
12/29/2012 11:07:15 PM | SETI@home | Backing off 16 min 32 sec on upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:07:16 PM | | Internet access OK - project servers may be temporarily down.
12/29/2012 11:09:33 PM | SETI@home | Started upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:10:21 PM | SETI@home | Finished upload of 08oc12ab.18183.8656.7.10.195_1_0
12/29/2012 11:10:21 PM | SETI@home | Sending scheduler request: To fetch work.
12/29/2012 11:10:21 PM | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA and ATI
12/29/2012 11:10:23 PM | SETI@home | Computation for task ap_26no12ad_B1_P0_00062_20121227_08468.wu_0 finished
12/29/2012 11:10:23 PM | SETI@home | Starting task ap_25no12ad_B6_P1_00044_20121227_21707.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 2
12/29/2012 11:10:25 PM | SETI@home | Started upload of ap_26no12ad_B1_P0_00062_20121227_08468.wu_0_0
12/29/2012 11:11:02 PM | SETI@home | Finished upload of ap_26no12ad_B1_P0_00062_20121227_08468.wu_0_0
12/29/2012 11:11:15 PM | SETI@home | Scheduler request completed: got 1 new tasks
.....
Since then, all the Uploads have completed in less than around 30 seconds. It's almost as if someone gave Bruno the reboot. So far all is well. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13715 Credit: 208,696,464 RAC: 304 |
Still getting lots of Scheduler timeouts & the occasional "no header or data" error, but not nearly as many as before. The upload problem appears to be gone: uploads start within a couple of seconds & run at 10-15kB/s. Grant Darwin NT |
EdwardPF Send message Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
My 2 cents ... It looked like a "shortie" storm to me: all my local computers were running short WUs. One of my computers had 100 WUs, all with an estimated run time of 4 min, down from the usual 24 min. That would be a ... 6x I/O load increase? Maybe a bad tape or 2 or 3 ... Ed F |
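A rough sketch of the arithmetic behind that 6x figure. It assumes every task costs one download and one upload no matter how long it runs, so traffic scales with the number of tasks a host turns over per hour; the function name is made up for illustration.

```python
# Rough estimate of how shorter tasks multiply server traffic.
# Assumption: each task costs one download and one upload regardless of runtime.

def io_load_factor(usual_minutes: float, short_minutes: float) -> float:
    """Return how many times more tasks a host turns over in the same time."""
    if usual_minutes <= 0 or short_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return usual_minutes / short_minutes


factor = io_load_factor(usual_minutes=24, short_minutes=4)
print(f"Transfers per host rise by about {factor:.0f}x")
```

With the 24 min and 4 min estimates from the post, this prints a factor of 6, which matches Ed's guess.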
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13715 Credit: 208,696,464 RAC: 304 |
EdwardPF wrote: It looked like a "shortie" storm to me ... all my local computers were running SHORT WU's
That just exacerbated the problems that already existed before the shorties started coming through. Grant Darwin NT |
Rolf Send message Joined: 16 Jun 09 Posts: 114 Credit: 7,817,146 RAC: 0 |
Very good news! Everything works as it "should":
- Uploads without timeouts
- Downloads as requested
31.12.2012 10:32:19 | SETI@home | Sending scheduler request: To fetch work.
31.12.2012 10:32:19 | SETI@home | Requesting new tasks for ATI
31.12.2012 10:34:10 | SETI@home | Scheduler request completed: got 20 new tasks
31.12.2012 10:34:12 | SETI@home | Started download of 09oc12ad.8746.67.12.10.98
Great last day of this year. Let the next year start like this! |
.clair. Send message Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
Rolf wrote: Great last day of this year. Let the next year start like this!
You are dreaming, next year starts like this :( NEWS
At least the servers will not be sending us any weird and unintelligible messages ...... |
shizaru Send message Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
.clair. wrote: ...next year starts like this :( NEWS
If next year started any differently, I'd think I had entered the Twilight Zone! Situation Normal ;) Happy New Year everybody! |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Even though the Upload Stalls appear to have been corrected, transfer stalls are still a pain. This morning I woke up to a page of 'Ready to Reports'. I had six stalled downloads with a 32 minute wait time and 22 files waiting to be reported and replaced. The machine had gone through over 20% of its GPU cache in a few hours waiting on stalled downloads. I don't think there should be a transfer timeout of more than about 10 minutes. After 10 minutes the stalled activity starts becoming a problem. Maybe a rework of the transfer timeouts is in order? I kinda like timeouts of 2, 4, 6, and 10, with 10 minutes being the maximum timeout period. Fortunately, everything corrected itself rather quickly after the Retry button was used a few times... |
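For what it's worth, the 2, 4, 6, 10 idea could be sketched like this. It is only an illustration of the proposal in the post, not the client's actual backoff logic; `RETRY_MINUTES` and `retry_delay` are hypothetical names.

```python
# Hypothetical capped retry schedule, in minutes, as proposed in the post.
# This is NOT how the BOINC client computes its backoff; it only illustrates the idea.
RETRY_MINUTES = (2, 4, 6, 10)


def retry_delay(attempt: int) -> int:
    """Delay in minutes before retry number `attempt` (1-based), capped at the last step."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    index = min(attempt, len(RETRY_MINUTES)) - 1
    return RETRY_MINUTES[index]


print([retry_delay(n) for n in range(1, 7)])  # [2, 4, 6, 10, 10, 10]
```

Under this schedule a stalled transfer would never wait longer than 10 minutes between attempts, compared with the 32 minute wait mentioned above.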
BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
One of the troublesome things for me with this project is that when the scheduler gets sick (which it does periodically) one of its 'sick modes' can be obstructive of other projects in terms of reporting, updating, uploading, etc. That is, the scheduler sometimes in its failure mode holds the workstation in 'reporting mode' exclusively (no other project can communicate with the workstation) for as much as 10 minutes. Ideally when the scheduler is in 'I'm confused' mode, it would simply issue a quick time out (say at 1 minute or 2 minutes) instead of putting things on hold for 10 minutes. There have been times when (if I have no SETI work on a workstation to work on or report), that after several minutes, I simply detach so other projects can report without SETI in obstructive mode. When I do that, eventually I'll rejoin that workstation to SETI. However, 'eventually' is defined as a solid week for SETI -- and that often doesn't happen. Looks like it might not happen at all this coming month with the nth effort to correct lab electrical issues, plus the nth effort to correct the air conditioning in the server closet. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14645 Credit: 200,643,578 RAC: 874 |
BarryAZ wrote: One of the troublesome things for me with this project is that when the scheduler gets sick (which it does periodically) one of its 'sick modes' can be obstructive of other projects in terms of reporting, updating, uploading, etc. That is, the scheduler sometimes in its failure mode holds the workstation in 'reporting mode' exclusively (no other project can communicate with the workstation) for as much as 10 minutes. Ideally when the scheduler is in 'I'm confused' mode, it would simply issue a quick time out (say at 1 minute or 2 minutes) instead of putting things on hold for 10 minutes.
Actually, the server doesn't hold on to anything - it simply doesn't send a reply at all. The timeout is how long your client is prepared to wait - and (in recent versions), it's configurable. If you're running v6.12.27 or later, check out <http_transfer_timeout> in client configuration - options. Note that this will affect uploads/downloads as well, and that sometimes both scheduler contacts and data transfers do eventually work after a long pause. Use at your own discretion. |
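A minimal sketch of what such a setting might look like, written as a small Python helper that builds the XML text. Only the `<http_transfer_timeout>` tag and the "options" section come from the post above; the `cc_config.xml` file name, the assumption that the value is in seconds, and the 120 used here are my assumptions, so check the client configuration documentation for your version before using it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(timeout_seconds: int) -> str:
    """Build the text of a minimal client config with an <http_transfer_timeout> option.

    Assumptions: the file is cc_config.xml in the BOINC data directory and the
    value is given in seconds.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return (
        "<cc_config>\n"
        "  <options>\n"
        f"    <http_transfer_timeout>{timeout_seconds}</http_transfer_timeout>\n"
        "  </options>\n"
        "</cc_config>\n"
    )


config_text = build_cc_config(120)  # 120 is an illustrative value, not a recommendation
ET.fromstring(config_text)          # raises ParseError if the XML is malformed
print(config_text)
```

As noted in the post, a shorter timeout also cuts off slow uploads and downloads that would eventually have succeeded, so a very small value may do more harm than good.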
EdwardPF Send message Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
are we up?? Ed F [edit] I guess the post got here ok but the graph looks like we are down Ed F 12/31/2012 10:17:05 PM | SETI@home | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU |
.clair. Send message Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
It's broken. Yup, the Cricket graph has run out of green ink. The cricket is dead. Is that the servers' way of saying `happy new year`? #'"^&:(*&!....... |
W-K 666 Send message Joined: 18 May 99 Posts: 18996 Credit: 40,757,560 RAC: 67 |
.clair. wrote: It's broken,
But the servers stopped speaking at 02:50; they couldn't last the course at the New Year's party. |
.clair. Send message Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
The SSP still looks good, so I dunno what's up with it all. It's the millennium bug just a bit late, or the unix bug, or excel, or |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
.clair. wrote: The SSP still looks good so eye dunO whats up with it all.
The SSP hasn't updated since 2:50:23 UTC; subtracting the 8 hour Berkeley offset gives 18:50:23 PST. That closely matches the time when Cricket fell. Maybe it's the end of the world a little late? Joe |
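The offset arithmetic can be checked with a few lines of Python. The calendar date is an assumption (the post only gives the time of day; 1 Jan 2013 UTC is inferred from the surrounding New Year posts), and a fixed UTC-8 offset stands in for Pacific Standard Time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset for Pacific Standard Time (no daylight saving in winter).
PST = timezone(timedelta(hours=-8), "PST")

# Assumed date: the post gives only 2:50:23 UTC, taken here to be 1 Jan 2013.
last_update_utc = datetime(2013, 1, 1, 2, 50, 23, tzinfo=timezone.utc)


def utc_to_pst(moment_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC datetime to the fixed PST offset."""
    return moment_utc.astimezone(PST)


print(utc_to_pst(last_update_utc).strftime("%Y-%m-%d %H:%M:%S %Z"))
# 2012-12-31 18:50:23 PST
```

Under that date assumption the last SSP update falls on the evening of 31 December in Berkeley, which fits the New Year's Eve timing joked about in the thread.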
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
The repairs were scheduled for the 4th so of course the plug is pulled now. :D "Life is just nature's way of keeping meat fresh." - The Doctor |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Dive, Dive, Dive! |