Panic Mode On (79) Server Problems?
Author | Message |
---|---|
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"So, are these files transferred for each request? And are they being transferred as-is or with some compression?" Any change to compress sched_request files would have to be done in the client, so there would be considerable delay. In addition, the upload side of the pipe isn't congested, though having fewer packets to ack through the download side could improve the reliability. OTOH, sending sched_reply with gzip or deflate compression by using mod_gzip or mod_deflate on the scheduling server might be a very minimal change. Clients already advertise they can accept that: 20/11/2012 00:05:54 | SETI@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip That's from an earlier post in this thread by Richard Haselgrove; the BOINC 6.10.58 I'm using also sends that header when contacting the Scheduler. An interesting post on Taka's RunLoop blog gives some idea of the time taken to compress a file versus how long it takes to transfer it on a 100Mbps link. I think Synergy would take less time for compression than the Atom being used as a server there :^) Joe |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"If anyone wishes to use it you can find it here:"
Every user may have a different Date format (I needed to change that part, my Date format is: 27.11.2012) (obviously this .bat will not be very useful at the start of a month (it can't go back to the previous month) so use it now, it will be too late in December ;) )
Since I changed too many places here is my edited .bat
@ECHO OFF
cls
pushd %~dp0
set check_days=20
@ECHO %date%
::: --- Change the following lines to match your DATE format:
set cdy=%date:~0,2%
set cmo=%date:~3,2%
set cyr=%date:~6,4%
set edychk1=-
set edychk2=0
set Start_date=%cyr%-%cmo%-%cdy%
@ECHO Start date: %Start_date%
set Out_file=$com_check_full__%computername%__%Start_date%.txt
@ECHO Out file: %Out_file%
@ECHO.
pause
REM.> %Out_file%
findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >$sched_failures-%computername%.txt
findstr /C:"[SETI@home] Scheduler request failed: Timeout was reached" stdoutdae.txt >$sched_failures_timeout-%computername%.txt
findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >$sched_successes-%computername%.txt
:start
set /a lct=%lct%+1
findstr /C:"%cdy%-" $sched_failures-%computername%.txt >sched_failures-%computername%_%cyr%-%cmo%-%cdy%.txt
findstr /C:"%cdy%-" $sched_failures_timeout-%computername%.txt >sched_failures_timeout-%computername%_%cyr%-%cmo%-%cdy%.txt
findstr /C:"%cdy%-" $sched_successes-%computername%.txt >sched_successes-%computername%_%cyr%-%cmo%-%cdy%.txt
REM ---start output section
for /f %%a in ('Find /V /C "" ^< sched_failures-%computername%_%cyr%-%cmo%-%cdy%.txt') do set fails=%%a
for /f %%a in ('Find /V /C "" ^< sched_failures_timeout-%computername%_%cyr%-%cmo%-%cdy%.txt') do set time_fails=%%a
for /f %%a in ('Find /V /C "" ^< sched_successes-%computername%_%cyr%-%cmo%-%cdy%.txt') do set success=%%a
@ECHO Results for calendar date: %cyr%-%cmo%-%cdy%
@ECHO Results for calendar date: %cyr%-%cmo%-%cdy% >> %Out_file%
set /a schdreqcnt=%fails%+%success%
@ECHO Scheduler Requests: %schdreqcnt%
@ECHO Scheduler Requests: %schdreqcnt% >> %Out_file%
set /a schdsuccesspct=%success%*100/%schdreqcnt%
@ECHO Scheduler Success: %schdsuccesspct% %%
@ECHO Scheduler Success: %schdsuccesspct% %% >> %Out_file%
set /a schdfailspct=%fails%*100/%schdreqcnt%
@ECHO Scheduler Failure: %schdfailspct% %%
@ECHO Scheduler Failure: %schdfailspct% %% >> %Out_file%
set /a schdtofailspct=%time_fails%*100/%schdreqcnt%
@ECHO Scheduler Timeout: %schdtofailspct% %% of total
@ECHO Scheduler Timeout: %schdtofailspct% %% of total >> %Out_file%
set /a schdtmfailspct=%time_fails%*100/%fails%
@ECHO Scheduler Timeout: %schdtmfailspct% %% of failures
@ECHO Scheduler Timeout: %schdtmfailspct% %% of failures >> %Out_file%
REM ---end output section
set /a cdy=%cdy%-1
if %cdy:~0,1%==%edychk1% @ECHO Date makes no sense. I Can't math! & goto end
if %cdy:~0,1%==%edychk2% @ECHO Date makes no sense. I Can't math! & goto end
if %lct%==%check_days% goto end
goto start
:end
@ECHO.
@ECHO End date: %cyr%-%cmo%-%cdy%
@ECHO.
@ECHO *** All the above text you can find now in file:
@ECHO %Out_file%
pause
del sched_*-%computername%_*.txt
%Out_file%
:pause
And my Results for 20 days:
Results for calendar date: 2012-11-27 Scheduler Requests: 52 Scheduler Success: 67 % Scheduler Failure: 32 % Scheduler Timeout: 0 % of total Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-26 Scheduler Requests: 193 Scheduler Success: 25 % Scheduler Failure: 74 % Scheduler Timeout: 1 % of total Scheduler Timeout: 1 % of failures
Results for calendar date: 2012-11-25 Scheduler Requests: 117 Scheduler Success: 17 % Scheduler Failure: 82 % Scheduler Timeout: 1 % of total Scheduler Timeout: 2 % of failures
Results for calendar date: 2012-11-24 Scheduler Requests: 90 Scheduler Success: 0 % Scheduler Failure: 100 % Scheduler Timeout: 1 % of total Scheduler Timeout: 1 % of failures
Results for calendar date: 2012-11-23 Scheduler Requests: 78 Scheduler Success: 44 % Scheduler Failure: 55 % Scheduler Timeout: 0 % of total Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-22 Scheduler Requests: 103 Scheduler Success: 80 % Scheduler Failure: 19 % Scheduler Timeout: 0 % of total Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-21 Scheduler Requests: 134 Scheduler Success: 45 % Scheduler Failure: 54 % Scheduler Timeout: 13 % of total Scheduler Timeout: 24 % of failures
Results for calendar date: 2012-11-20 Scheduler Requests: 185 Scheduler Success: 72 % Scheduler Failure: 27 % Scheduler Timeout: 4 % of total Scheduler Timeout: 18 % of failures
Results for calendar date: 2012-11-19 Scheduler Requests: 29 Scheduler Success: 82 % Scheduler Failure: 17 % Scheduler Timeout: 6 % of total Scheduler Timeout: 40 % of failures
Results for calendar date: 2012-11-18 Scheduler Requests: 72 Scheduler Success: 61 % Scheduler Failure: 38 % Scheduler Timeout: 38 % of total Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-17 Scheduler Requests: 25 Scheduler Success: 48 % Scheduler Failure: 52 % Scheduler Timeout: 48 % of total Scheduler Timeout: 92 % of failures
Results for calendar date: 2012-11-16 Scheduler Requests: 38 Scheduler Success: 65 % Scheduler Failure: 34 % Scheduler Timeout: 34 % of total Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-15 Scheduler Requests: 39 Scheduler Success: 61 % Scheduler Failure: 38 % Scheduler Timeout: 33 % of total Scheduler Timeout: 86 % of failures
Results for calendar date: 2012-11-14 Scheduler Requests: 56 Scheduler Success: 82 % Scheduler Failure: 17 % Scheduler Timeout: 17 % of total Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-13 Scheduler Requests: 53 Scheduler Success: 77 % Scheduler Failure: 22 % Scheduler Timeout: 9 % of total Scheduler Timeout: 41 % of failures
Results for calendar date: 2012-11-12 Scheduler Requests: 217 Scheduler Success: 92 % Scheduler Failure: 7 % Scheduler Timeout: 5 % of total Scheduler Timeout: 75 % of failures
Results for calendar date: 2012-11-11 Scheduler Requests: 35 Scheduler Success: 54 % Scheduler Failure: 45 % Scheduler Timeout: 8 % of total Scheduler Timeout: 18 % of failures
Results for calendar date: 2012-11-10 Scheduler Requests: 19 Scheduler Success: 89 % Scheduler Failure: 10 % Scheduler Timeout: 10 % of total Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-9 Scheduler Requests: 41 Scheduler Success: 70 % Scheduler Failure: 29 % Scheduler Timeout: 19 % of total Scheduler Timeout: 66 % of failures
Results for calendar date: 2012-11-8 Scheduler Requests: 87 Scheduler Success: 59 % Scheduler Failure: 40 % Scheduler Timeout: 37 % of total Scheduler Timeout: 94 % of failures
 - ALF - "Find out what you don't do well ..... then don't do it!" :)  |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Most of the time I'm now getting a response from the Scheduler within 7 seconds, sometimes within 3. However, those 3 second responses are to tell me there is no work available. And there are still the odd Scheduler "Couldn't connect to server" errors. The server status page hasn't updated many of the numbers in 13 hours. The one it does appear to be updating is the rate of MB splitting. So close to 0 we might as well call it that. Grant Darwin NT |
WezH Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
Okay, Tuesday maintenance has been done and Server Status shows almost everything up and running. Let's see what happens in the coming week. Hmmm... Server Status shows that there are no AP splitters on Synergy. Vader, lando, marvin and georgem are now splitters. "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address - |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Server status page shows one of the MB splitters isn't running; however, the result creation rate is still only slightly more than 0. So something's wrong with those that are actually running. Scheduler requests no longer result in an error, they just result in me being told there is no work available. Grant Darwin NT |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
I guess shoving that wad of $20.00s under the lab door didn't hurt. :)
11/28/2012 12:15:13 AM | SETI@home | Sending scheduler request: To fetch work.
11/28/2012 12:15:13 AM | SETI@home | Reporting 2 completed tasks
11/28/2012 12:15:13 AM | SETI@home | Requesting new tasks for NVIDIA
11/28/2012 12:15:15 AM | SETI@home | Scheduler request completed: got 6 new tasks
11/28/2012 12:15:18 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.87
11/28/2012 12:15:18 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.93
11/28/2012 12:15:20 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.87
11/28/2012 12:15:20 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.93
11/28/2012 12:15:20 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.99
11/28/2012 12:15:20 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.105
11/28/2012 12:15:21 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.99
11/28/2012 12:15:21 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.105
11/28/2012 12:15:21 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.111
11/28/2012 12:15:21 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.117
11/28/2012 12:15:23 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.111
11/28/2012 12:15:23 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.117
11/28/2012 12:20:22 AM | SETI@home | Sending scheduler request: To fetch work.
11/28/2012 12:20:22 AM | SETI@home | Requesting new tasks for CPU and NVIDIA
11/28/2012 12:20:24 AM | SETI@home | Scheduler request completed: got 13 new tasks
11/28/2012 12:20:26 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.182
11/28/2012 12:20:26 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.107
11/28/2012 12:20:29 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.182
11/28/2012 12:20:29 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.107
11/28/2012 12:20:29 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.1
11/28/2012 12:20:29 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.249
11/28/2012 12:20:33 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.1
11/28/2012 12:20:33 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.249
11/28/2012 12:20:33 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.113
11/28/2012 12:20:33 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.79
11/28/2012 12:20:35 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.113
11/28/2012 12:20:35 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.221
11/28/2012 12:20:36 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.79
11/28/2012 12:20:36 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.227
11/28/2012 12:20:39 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.221
11/28/2012 12:20:39 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.233
11/28/2012 12:20:40 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.227
11/28/2012 12:20:40 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.239
11/28/2012 12:20:42 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.233
11/28/2012 12:20:42 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.239
11/28/2012 12:20:42 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.245
11/28/2012 12:20:42 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.119
11/28/2012 12:20:45 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.119
11/28/2012 12:20:45 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.251
11/28/2012 12:20:47 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.245
11/28/2012 12:20:48 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.251 |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
MB splitting has increased overnight, and my systems managed to reach the limits (as low as they are), but still not enough is being produced to build up any sort of Ready to Send buffer, and that's with longer running WUs going through the system. If another bunch of shorties come through, people will run out of work again even if the rest of the system doesn't fall over. Grant Darwin NT |
WezH Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
It seems that Oscar is not splitting MB units now, and several channels have errors. Positive side: Cricket is not maxed out and Synergy responds really fast to scheduling requests :) "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address - |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Haven't had any Scheduler Timeouts (that I can see) for a couple of days now. Still getting patches of "Couldn't connect to server" and even one "Failure when receiving data from the peer". But so far since the weekly outage there have only been a few of them, then they clear up for a while, long enough to get what little work is possible. Grant Darwin NT |
Sirius B Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7 |
& we're back! Thanks guys. |
WezH Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
Yep, thanks guys, at least forums are up. "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address - |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Well this is exciting. I noticed the crickets were chirping again and allowed network comms again. Two failures were then followed by: 2012-12-05 13:54:40 SETI@home Reporting 38 completed tasks, requesting new tasks 2012-12-05 13:54:40 SETI@home [sched_op_debug] CPU work request: 2512099.31 seconds; 0.00 CPUs 2012-12-05 13:55:37 SETI@home Scheduler request completed: got 4 new tasks 2012-12-05 13:55:37 SETI@home [sched_op_debug] estimated total CPU job duration: 165149 seconds Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Gary Charpentier Joined: 25 Dec 00 Posts: 30650 Credit: 53,134,872 RAC: 32 |
Beta is down hard 12/5/2012 11:04:01 AM | SETI@home Beta Test | [error] No scheduler URLs found in master file |
S@NL Etienne Dokkum Joined: 11 Jun 99 Posts: 212 Credit: 43,822,095 RAC: 0 |
"Beta is down hard" Yep, noticed that too... well, at least I had 400+ beta tasks to keep the main cruncher happy during this outage. As far as luck goes: I would run out of beta tasks by tomorrow morning. Don't bother abusing the retry button, as that won't help Eric glue the database together again ;-) |
ivan Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
"Well this is exciting. I noticed the crickets were chirping again and allowed network comms again. Two failures were then followed by:" Looks like my home machine managed to get allocated 97 GPU WUs, but the response never got through, so the ghost population just increased. I continue to get "Project communication failed": |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Glad we are back up, but all I'm getting from the Berkeley Boyz now is Scheduler Request Failed: Error 403 Any idea what that means? |
shizaru Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
"Looks like my home machine managed to get allocated 97 GPU WUs -- but the response never got through so the ghost population just increased. Continue to get "Project communication failed":" Good catch there, Ivan. After you posted that I noticed I had 51 ghosts that weren't there a while ago. I just set to No New Tasks... |
Vipin Palazhi Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
All my crunchers ran out of work during the outage and I switched them all off except one. If others too have done the same, then I guess the electricity consumption all over the world might have seen a dip during these days. Good to see that things are up and running, however, all I am seeing is - Scheduler request failed: Failure when receiving data from the peer, and Scheduler request failed: Timeout was reached. |
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13 |
I can't remember how to set a limit for reporting completed tasks, because if you have a lot of them, you can't report them all at once... |
WezH Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
"Looks like my home machine managed to get allocated 97 GPU WUs -- but the response never got through so the ghost population just increased. Continue to get "Project communication failed":" Well, all of my tasks "In progress" were sent 29 Nov 2012.... None of them are on my hosts... So now we have a major ghost unit problem on our hands..... ETA: Wrong info, now it all looks correct. |
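Going back to Josef W. Segur's point earlier in the thread that clients already send `Accept-Encoding: deflate, gzip`, here is a rough Python sketch of how much a repetitive XML reply might shrink under those two encodings. Everything in it is an assumption for illustration: the tag names in `build_fake_reply` are made up and are not the real sched_reply schema, the function names are hypothetical, and the sizes it reports say nothing about actual SETI@home traffic.

```python
import gzip
import zlib


def build_fake_reply(n_results: int) -> bytes:
    """Build a repetitive XML-like payload.

    The tag names are invented for this sketch; they are NOT the real
    sched_reply format.
    """
    parts = ["<scheduler_reply>\n"]
    for i in range(n_results):
        parts.append(
            "  <result>\n"
            f"    <name>wu_{i:05d}_0</name>\n"
            "    <report_deadline>1355000000</report_deadline>\n"
            "  </result>\n"
        )
    parts.append("</scheduler_reply>\n")
    return "".join(parts).encode("utf-8")


def compression_report(payload: bytes) -> dict:
    """Compress the payload both ways and check that it round-trips."""
    gz = gzip.compress(payload, compresslevel=6)
    # HTTP "deflate" content-coding is the zlib container, which is what
    # zlib.compress produces.
    df = zlib.compress(payload, 6)
    return {
        "raw": len(payload),
        "gzip": len(gz),
        "deflate": len(df),
        "gzip_roundtrip_ok": gzip.decompress(gz) == payload,
        "deflate_roundtrip_ok": zlib.decompress(df) == payload,
    }


if __name__ == "__main__":
    print(compression_report(build_fake_reply(200)))
```

Highly repetitive markup like this compresses very well, which fits Joe's suggestion that server-side compression could cut download-side traffic without any client change. Whether that holds for real replies would need measuring on real files.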
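BilBg noted earlier in the thread that the .bat can't go back to the previous month. For anyone who wants the same per-day tallies across a month boundary, a minimal Python sketch follows. It assumes each stdoutdae.txt line starts with a timestamp like `27-Nov-2012 00:05:54` (the .bat's `findstr /C:"%cdy%-"` suggests a day-first format, but that is not confirmed here) and that month abbreviations are English. The names `tally` and `report` are hypothetical, and the sample lines in the demo are made up, not taken from a real log.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Same search strings the .bat passes to findstr.
FAILED = "[SETI@home] Scheduler request failed: "
TIMEOUT = FAILED + "Timeout was reached"
COMPLETED = "[SETI@home] Scheduler request completed: "


def tally(lines):
    """Count successes, failures and timeouts per calendar date."""
    stats = defaultdict(lambda: {"success": 0, "fails": 0, "timeouts": 0})
    for line in lines:
        try:
            # Assumed timestamp prefix, e.g. "27-Nov-2012 00:05:54".
            day = datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S").date()
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if COMPLETED in line:
            stats[day]["success"] += 1
        elif FAILED in line:
            stats[day]["fails"] += 1
            if TIMEOUT in line:
                stats[day]["timeouts"] += 1
    return stats


def report(stats, start, check_days=20):
    """Return one summary row per day, walking backwards from `start`."""
    rows = []
    for offset in range(check_days):
        day = start - timedelta(days=offset)  # handles month/year rollover
        s = stats.get(day)
        if not s:
            continue
        total = s["success"] + s["fails"]
        if total == 0:
            continue
        fails = s["fails"]
        rows.append({
            "date": day.isoformat(),
            "requests": total,
            # Integer division, like `set /a` in the .bat.
            "success_pct": s["success"] * 100 // total,
            "failure_pct": fails * 100 // total,
            "timeout_pct_of_total": s["timeouts"] * 100 // total,
            "timeout_pct_of_failures": s["timeouts"] * 100 // fails if fails else 0,
        })
    return rows


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "30-Nov-2012 23:59:00 [SETI@home] Scheduler request completed: got 6 new tasks",
        "01-Dec-2012 00:05:00 [SETI@home] Scheduler request failed: Timeout was reached",
        "01-Dec-2012 00:10:00 [SETI@home] Scheduler request failed: Couldn't connect to server",
        "01-Dec-2012 00:15:00 [SETI@home] Scheduler request completed: got 0 new tasks",
    ]
    for row in report(tally(sample), start=date(2012, 12, 1), check_days=2):
        print(row)
```

To run it against a real log, something like `tally(open("stdoutdae.txt", encoding="utf-8", errors="replace"))` should work, with the caveat that the timestamp format may need adjusting. Unlike the .bat, days with no scheduler requests are skipped rather than causing a divide by zero.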
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.