Panic Mode On (79) Server Problems?



Message boards : Number crunching : Panic Mode On (79) Server Problems?

Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3543
Credit: 46,137,942
RAC: 30,609
United States
Message 1310617 - Posted: 26 Nov 2012, 22:57:21 UTC

Start date: 2012-11-26
Results for calendar date: 2012-11-26
Scheduler Requests: 117
Scheduler Success: 53 %
Scheduler Failure: 46 %
Scheduler Timeout: 3 % of total
Scheduler Timeout: 7 % of failures
Results for calendar date: 2012-11-25
Scheduler Requests: 106
Scheduler Success: 44 %
Scheduler Failure: 55 %
Scheduler Timeout: 4 % of total
Scheduler Timeout: 8 % of failures
Results for calendar date: 2012-11-24
Scheduler Requests: 11
Scheduler Success: 9 %
Scheduler Failure: 90 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-23
Scheduler Requests: 2
Scheduler Success: 0 %
Scheduler Failure: 100 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-22
Scheduler Requests: 52
Scheduler Success: 75 %
Scheduler Failure: 25 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
End date: 2012-11-21
Press any key to continue . . .
____________

Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,167,460
RAC: 86,093
United Kingdom
Message 1310629 - Posted: 27 Nov 2012, 0:15:01 UTC
Last modified: 27 Nov 2012, 0:16:20 UTC

I find these files in my BOINC area:
----------+ 1 Compaq_Owner None 388941 Nov 26 23:46 sched_request_setiathome.berkeley.edu.xml
----------+ 1 Compaq_Owner None 46092 Nov 26 23:48 sched_reply_setiathome.berkeley.edu.xml

From comments I have seen, it would appear that the former is sent with each request to inform the scheduler which workunits the computer thinks it has, so that it can be compared with the master database (to identify ghost units, for example); the latter must be the reply sent out after a successful request.

My question is whether these are sent "in the clear" or compressed. I was involved in sending configuration data to and from the configuration database for the MICE experiment two years ago, and I found that XML data is so redundant that it compresses well. Even gzip was great, but bz2 doubled the compression ratio on the data I was sending. Trying gzip on the above two files, I find:
----------+ 1 Compaq_Owner None 15665 Nov 26 23:46 sched_request_setiathome.berkeley.edu.xml.gz
----------+ 1 Compaq_Owner None 3432 Nov 26 23:57 sched_reply_setiathome.berkeley.edu.xml.gz

i.e. compression ratios of up to 25:1. Given that these figures are for a system under the current 100+100 WU limit, and that when it was unfettered it had a stash of a few thousand WUs, the amount of data being transferred can be enormous if uncompressed.

So, are these files transferred for each request? And are they being transferred as-is or with some compression?
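
For anyone who wants to repeat the comparison, here is a quick Python sketch (illustrative only; it assumes Python 3 and the two files above sitting in the current directory):

# Sketch: compare gzip vs bz2 on the two scheduler XML files.
import bz2
import gzip

FILES = [
    "sched_request_setiathome.berkeley.edu.xml",
    "sched_reply_setiathome.berkeley.edu.xml",
]

for name in FILES:
    raw = open(name, "rb").read()
    gz = gzip.compress(raw)      # gzip at its default (maximum) level
    bz = bz2.compress(raw)
    print(f"{name}: {len(raw)} bytes raw, "
          f"gzip {len(gz)} ({len(raw) / len(gz):.1f}:1), "
          f"bz2 {len(bz)} ({len(raw) / len(bz):.1f}:1)")
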
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,941,356
RAC: 13,658
United Kingdom
Message 1310634 - Posted: 27 Nov 2012, 0:24:49 UTC - in response to Message 1310629.

So, are these files transferred for each request? And are they being transferred as-is or with some compression?

For each request? Yes, one in each direction.
Are they compressed? You would need to ask David Anderson, but I believe not.

Remember to factor in the time overhead for compressing and decompressing. At the volunteer's end? It doesn't matter, we provide the hardware and the time. At the server end? Choose an algorithm which runs efficiently on Linux.

It might make more sense to compress the request files than the reply files, given the sizes.
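
To get a rough feel for that overhead, a sketch along these lines (illustrative only; it assumes Python 3 and ivan's request file in the current directory) times the standard zlib levels against the size they achieve:

# Sketch: speed-vs-ratio of zlib levels on one request file.
# The real cost on the project's servers would need measuring there.
import time
import zlib

data = open("sched_request_setiathome.berkeley.edu.xml", "rb").read()

for level in (1, 6, 9):                      # fastest, default, best compression
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - t0
    print(f"level {level}: {len(packed)} bytes "
          f"({len(data) / len(packed):.1f}:1) in {elapsed * 1000:.2f} ms")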

Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,167,460
RAC: 86,093
United Kingdom
Message 1310641 - Posted: 27 Nov 2012, 0:42:04 UTC - in response to Message 1310634.
Last modified: 27 Nov 2012, 0:57:39 UTC

So, are these files transferred for each request? And are they being transferred as-is or with some compression?

For each request? Yes, one in each direction.

OK, so I'm not completely barking up the wrong tree.


Are they compressed? You would need to ask David Anderson, but I believe not.

Remember to factor in the time overhead for compressing and decompressing. At the volunteer's end? It doesn't matter, we provide the hardware and the time. At the server end? Choose an algorithm which runs efficiently on Linux.

Yes, that's an obvious question to ask. At the moment "some" of us are more convinced that the problem is network congestion rather than hardware response, so it might be that the reduction in network traffic outweighs the extra computation. Hard to say until you set up a test case.

It might make more sense to compress the request files than the reply files, given the sizes.

Undoubtedly, go for the lowest-hanging fruit first!

Unfortunately the code I worked with for MICE was rather gnarly; I don't remember how much of that was to do with the libraries involved and how much was about the transfer protocols. I can probably recover it (it might even be Googlable[1] if anyone cares) if there's interest.

[1] In fact this looks publicly available:
http://indico.cern.ch/getFile.py/access?contribId=94&sessionId=6&resId=1&materialId=slides&confId=116711
Apologies if it isn't.
____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4134
Credit: 1,003,719
RAC: 231
United States
Message 1310666 - Posted: 27 Nov 2012, 3:20:14 UTC - in response to Message 1310641.

So, are these files transferred for each request? And are they being transferred as-is or with some compression?

For each request? Yes, one in each direction.

OK, so I'm not completely barking up the wrong tree.


Are they compressed? You would need to ask David Anderson, but I believe not.

Remember to factor in the time overhead for compressing and decompressing. At the volunteer's end? It doesn't matter, we provide the hardware and the time. At the server end? Choose an algorithm which runs efficiently on Linux.

Yes, that's an obvious question to ask. At the moment "some" of us are more convinced that the problem is network congestion rather than hardware response, so it might be that the reduction in network traffic outweighs the extra computation. Hard to say until you set up a test case.

It might make more sense to compress the request files than the reply files, given the sizes.

Undoubtedly, go for the lowest-hanging fruit first!

Any change to compress sched_request files would have to be done in the client, so there would be a considerable delay. In addition, the upload side of the pipe isn't congested, though having fewer packets to ACK through the download side could improve reliability.

OTOH, sending sched_reply with gzip or deflate compression by using mod_gzip or mod_deflate on the scheduling server might be a very minimal change. Clients already advertise they can accept that:

20/11/2012 00:05:54 | SETI@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip

That's from an earlier post in this thread by Richard Haselgrove; the BOINC 6.10.58 client I'm using also sends that header when contacting the Scheduler.
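
In rough terms, the client side of that negotiation would look something like the Python sketch below; the URL is a placeholder, and whether the server actually applies mod_deflate/mod_gzip is exactly the open question:

# Sketch of the negotiation: the client offers gzip/deflate and only has
# to decompress if the server actually chose to use it.
import gzip
import urllib.request
import zlib

SCHEDULER_URL = "http://scheduler.example/cgi"   # placeholder, not the real URL

request_xml = open("sched_request_setiathome.berkeley.edu.xml", "rb").read()
req = urllib.request.Request(
    SCHEDULER_URL,
    data=request_xml,                            # scheduler contact is an HTTP POST of the XML
    headers={"Accept-Encoding": "deflate, gzip"},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    encoding = resp.headers.get("Content-Encoding", "")
    if encoding == "gzip":                       # what a mod_deflate/mod_gzip reply would carry
        body = gzip.decompress(body)
    elif encoding == "deflate":
        body = zlib.decompress(body)
open("sched_reply_setiathome.berkeley.edu.xml", "wb").write(body)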

An interesting post on Taka's RunLoop blog gives some idea of the time taken to compress a file versus how long it takes to transfer it over a 100 Mbps link. I think Synergy would take less time to compress than the Atom being used as a server there :^)
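
As a purely illustrative back-of-envelope (the file sizes are ivan's measurements above; the link speed and compression cost are assumptions):

LINK_BPS = 100e6 / 8        # 100 Mbit/s link = 12.5 MB/s, assuming we get the whole pipe
RAW = 46092                 # uncompressed sched_reply size from ivan's listing, bytes
PACKED = 3432               # gzipped size from ivan's listing, bytes
COMPRESS_S = 0.001          # assumed server CPU cost per reply, ~1 ms (a guess, not a measurement)

saved = (RAW - PACKED) / LINK_BPS
print(f"transfer time saved: {saved * 1000:.2f} ms, compression cost: {COMPRESS_S * 1000:.2f} ms")
# On a congested shared link the effective per-client bandwidth is much lower,
# so the saving per reply would be correspondingly larger.
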
Joe

Profile BilBg
Joined: 27 May 07
Posts: 2457
Credit: 5,412,556
RAC: 7,878
Bulgaria
Message 1310686 - Posted: 27 Nov 2012, 4:18:54 UTC - in response to Message 1310509.

If anyone wishes to use it you can find it here:
http://www.hal6000.com/seti/_com_check_full.txt
Right click & save, depending on your browser, rename to .bat. Then place in the folder where your stdoutdae.txt is located, or modify the script to point to the location of the file.

Change the number on the line "set check_days=" for the number of days you wish to view.


Every user may have a different date format (I needed to change that part; my date format is 27.11.2012).
(Obviously this .bat is not very useful at the start of a month, since it can't go back into the previous month, so use it now - it will be too late in December ;) )

Since I changed it in too many places, here is my edited .bat:
@ECHO OFF
cls
pushd %~dp0
set check_days=20
@ECHO %date%
::: --- Change the following lines to match your DATE format:
set cdy=%date:~0,2%
set cmo=%date:~3,2%
set cyr=%date:~6,4%
set edychk1=-
set edychk2=0
set Start_date=%cyr%-%cmo%-%cdy%
@ECHO Start date: %Start_date%
set Out_file=$com_check_full__%computername%__%Start_date%.txt
@ECHO Out file: %Out_file%
@ECHO.
pause
REM.> %Out_file%
findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >$sched_failures-%computername%.txt
findstr /C:"[SETI@home] Scheduler request failed: Timeout was reached" stdoutdae.txt >$sched_failures_timeout-%computername%.txt
findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >$sched_successes-%computername%.txt
:start
set /a lct=%lct%+1
findstr /C:"%cdy%-" $sched_failures-%computername%.txt >sched_failures-%computername%_%cyr%-%cmo%-%cdy%.txt
findstr /C:"%cdy%-" $sched_failures_timeout-%computername%.txt >sched_failures_timeout-%computername%_%cyr%-%cmo%-%cdy%.txt
findstr /C:"%cdy%-" $sched_successes-%computername%.txt >sched_successes-%computername%_%cyr%-%cmo%-%cdy%.txt
REM ---start output section
for /f %%a in ('Find /V /C "" ^< sched_failures-%computername%_%cyr%-%cmo%-%cdy%.txt') do set fails=%%a
for /f %%a in ('Find /V /C "" ^< sched_failures_timeout-%computername%_%cyr%-%cmo%-%cdy%.txt') do set time_fails=%%a
for /f %%a in ('Find /V /C "" ^< sched_successes-%computername%_%cyr%-%cmo%-%cdy%.txt') do set success=%%a
@ECHO Results for calendar date: %cyr%-%cmo%-%cdy%
@ECHO Results for calendar date: %cyr%-%cmo%-%cdy% >> %Out_file%
set /a schdreqcnt=%fails%+%success%
@ECHO Scheduler Requests: %schdreqcnt%
@ECHO Scheduler Requests: %schdreqcnt% >> %Out_file%
set /a schdsuccesspct=%success%*100/%schdreqcnt%
@ECHO Scheduler Success: %schdsuccesspct% %%
@ECHO Scheduler Success: %schdsuccesspct% %% >> %Out_file%
set /a schdfailspct=%fails%*100/%schdreqcnt%
@ECHO Scheduler Failure: %schdfailspct% %%
@ECHO Scheduler Failure: %schdfailspct% %% >> %Out_file%
set /a schdtofailspct=%time_fails%*100/%schdreqcnt%
@ECHO Scheduler Timeout: %schdtofailspct% %% of total
@ECHO Scheduler Timeout: %schdtofailspct% %% of total >> %Out_file%
set /a schdtmfailspct=%time_fails%*100/%fails%
@ECHO Scheduler Timeout: %schdtmfailspct% %% of failures
@ECHO Scheduler Timeout: %schdtmfailspct% %% of failures >> %Out_file%
REM ---end output section
set /a cdy=%cdy%-1
if %cdy:~0,1%==%edychk1% @ECHO Date makes no sense. I Can't math! & goto end
if %cdy:~0,1%==%edychk2% @ECHO Date makes no sense. I Can't math! & goto end
if %lct%==%check_days% goto end
goto start
:end
@ECHO.
@ECHO End date: %cyr%-%cmo%-%cdy%
@ECHO.
@ECHO *** All the above text you can find now in file:
@ECHO %Out_file%
pause
del sched_*-%computername%_*.txt %Out_file%
:pause
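
For comparison, here is a locale-agnostic sketch of the same tally in Python; only the file name and search strings are taken from the .bat above, everything else is an assumption and untested:

# Sketch only: group by whatever date string starts each log line, so it
# doesn't care about the local date format or about month boundaries.
from collections import Counter, defaultdict

FAILED = "[SETI@home] Scheduler request failed: "
TIMEOUT = "[SETI@home] Scheduler request failed: Timeout was reached"
SUCCESS = "[SETI@home] Scheduler request completed: "

stats = defaultdict(Counter)
with open("stdoutdae.txt", encoding="utf-8", errors="replace") as log:
    for line in log:
        parts = line.split(maxsplit=1)
        if not parts:
            continue
        date = parts[0]
        if SUCCESS in line:
            stats[date]["success"] += 1
        elif TIMEOUT in line:        # timeouts also count as failures, as in the .bat
            stats[date]["fail"] += 1
            stats[date]["timeout"] += 1
        elif FAILED in line:
            stats[date]["fail"] += 1

for date, c in stats.items():
    total = c["success"] + c["fail"]
    print(f"{date}: {total} requests, "
          f"{100 * c['success'] // total} % success, "
          f"{100 * c['fail'] // total} % failure, "
          f"{100 * c['timeout'] // total} % timeouts")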



And my Results for 20 days:

Results for calendar date: 2012-11-27
Scheduler Requests: 52
Scheduler Success: 67 %
Scheduler Failure: 32 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-26
Scheduler Requests: 193
Scheduler Success: 25 %
Scheduler Failure: 74 %
Scheduler Timeout: 1 % of total
Scheduler Timeout: 1 % of failures
Results for calendar date: 2012-11-25
Scheduler Requests: 117
Scheduler Success: 17 %
Scheduler Failure: 82 %
Scheduler Timeout: 1 % of total
Scheduler Timeout: 2 % of failures
Results for calendar date: 2012-11-24
Scheduler Requests: 90
Scheduler Success: 0 %
Scheduler Failure: 100 %
Scheduler Timeout: 1 % of total
Scheduler Timeout: 1 % of failures
Results for calendar date: 2012-11-23
Scheduler Requests: 78
Scheduler Success: 44 %
Scheduler Failure: 55 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-22
Scheduler Requests: 103
Scheduler Success: 80 %
Scheduler Failure: 19 %
Scheduler Timeout: 0 % of total
Scheduler Timeout: 0 % of failures
Results for calendar date: 2012-11-21
Scheduler Requests: 134
Scheduler Success: 45 %
Scheduler Failure: 54 %
Scheduler Timeout: 13 % of total
Scheduler Timeout: 24 % of failures
Results for calendar date: 2012-11-20
Scheduler Requests: 185
Scheduler Success: 72 %
Scheduler Failure: 27 %
Scheduler Timeout: 4 % of total
Scheduler Timeout: 18 % of failures
Results for calendar date: 2012-11-19
Scheduler Requests: 29
Scheduler Success: 82 %
Scheduler Failure: 17 %
Scheduler Timeout: 6 % of total
Scheduler Timeout: 40 % of failures
Results for calendar date: 2012-11-18
Scheduler Requests: 72
Scheduler Success: 61 %
Scheduler Failure: 38 %
Scheduler Timeout: 38 % of total
Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-17
Scheduler Requests: 25
Scheduler Success: 48 %
Scheduler Failure: 52 %
Scheduler Timeout: 48 % of total
Scheduler Timeout: 92 % of failures
Results for calendar date: 2012-11-16
Scheduler Requests: 38
Scheduler Success: 65 %
Scheduler Failure: 34 %
Scheduler Timeout: 34 % of total
Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-15
Scheduler Requests: 39
Scheduler Success: 61 %
Scheduler Failure: 38 %
Scheduler Timeout: 33 % of total
Scheduler Timeout: 86 % of failures
Results for calendar date: 2012-11-14
Scheduler Requests: 56
Scheduler Success: 82 %
Scheduler Failure: 17 %
Scheduler Timeout: 17 % of total
Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-13
Scheduler Requests: 53
Scheduler Success: 77 %
Scheduler Failure: 22 %
Scheduler Timeout: 9 % of total
Scheduler Timeout: 41 % of failures
Results for calendar date: 2012-11-12
Scheduler Requests: 217
Scheduler Success: 92 %
Scheduler Failure: 7 %
Scheduler Timeout: 5 % of total
Scheduler Timeout: 75 % of failures
Results for calendar date: 2012-11-11
Scheduler Requests: 35
Scheduler Success: 54 %
Scheduler Failure: 45 %
Scheduler Timeout: 8 % of total
Scheduler Timeout: 18 % of failures
Results for calendar date: 2012-11-10
Scheduler Requests: 19
Scheduler Success: 89 %
Scheduler Failure: 10 %
Scheduler Timeout: 10 % of total
Scheduler Timeout: 100 % of failures
Results for calendar date: 2012-11-9
Scheduler Requests: 41
Scheduler Success: 70 %
Scheduler Failure: 29 %
Scheduler Timeout: 19 % of total
Scheduler Timeout: 66 % of failures
Results for calendar date: 2012-11-8
Scheduler Requests: 87
Scheduler Success: 59 %
Scheduler Failure: 40 %
Scheduler Timeout: 37 % of total
Scheduler Timeout: 94 % of failures


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5563
Credit: 51,324,429
RAC: 40,106
Australia
Message 1310724 - Posted: 27 Nov 2012, 8:11:15 UTC


Most of the time I'm now getting a response from the Scheduler within 7 seconds, sometimes within 3. However, those 3-second responses are just telling me there is no work available. And there are still the odd Scheduler "Couldn't connect to server" errors.

The server status page hasn't updated many of the numbers in 13 hours.
The one it does appear to be updating is the rate of MB splitting, which is so close to 0 we might as well call it that.
____________
Grant
Darwin NT.

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 78
Credit: 2,922,847
RAC: 13,246
Finland
Message 1310755 - Posted: 27 Nov 2012, 18:25:48 UTC
Last modified: 27 Nov 2012, 18:31:00 UTC

Okay, the Tuesday maintenance has been done, and Server Status shows almost everything up and running.

Let's see what happens in the coming week.

Hmmm... Server Status shows that there are no AP splitters on Synergy. Vader, lando, marvin and georgem are now splitters.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5563
Credit: 51,324,429
RAC: 40,106
Australia
Message 1310923 - Posted: 28 Nov 2012, 5:58:03 UTC - in response to Message 1310755.
Last modified: 28 Nov 2012, 5:58:23 UTC

The server status page shows one of the MB splitters isn't running; however, the result creation rate is still only slightly more than 0, so something's wrong with the ones that are actually running.

Scheduler requests no longer result in an error; they just tell me there is no work available.
____________
Grant
Darwin NT.

bill
Joined: 16 Jun 99
Posts: 848
Credit: 20,578,407
RAC: 15,454
United States
Message 1310959 - Posted: 28 Nov 2012, 8:28:29 UTC - in response to Message 1310923.

I guess shoving that wad of $20.00s under the lab door didn't hurt. :)

11/28/2012 12:15:13 AM | SETI@home | Sending scheduler request: To fetch work.
11/28/2012 12:15:13 AM | SETI@home | Reporting 2 completed tasks
11/28/2012 12:15:13 AM | SETI@home | Requesting new tasks for NVIDIA
11/28/2012 12:15:15 AM | SETI@home | Scheduler request completed: got 6 new tasks
11/28/2012 12:15:18 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.87
11/28/2012 12:15:18 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.93
11/28/2012 12:15:20 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.87
11/28/2012 12:15:20 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.93
11/28/2012 12:15:20 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.99
11/28/2012 12:15:20 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.105
11/28/2012 12:15:21 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.99
11/28/2012 12:15:21 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.105
11/28/2012 12:15:21 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.111
11/28/2012 12:15:21 AM | SETI@home | Started download of 12ja11ab.5457.6372.140733193388035.10.117
11/28/2012 12:15:23 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.111
11/28/2012 12:15:23 AM | SETI@home | Finished download of 12ja11ab.5457.6372.140733193388035.10.117
11/28/2012 12:20:22 AM | SETI@home | Sending scheduler request: To fetch work.
11/28/2012 12:20:22 AM | SETI@home | Requesting new tasks for CPU and NVIDIA
11/28/2012 12:20:24 AM | SETI@home | Scheduler request completed: got 13 new tasks
11/28/2012 12:20:26 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.182
11/28/2012 12:20:26 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.107
11/28/2012 12:20:29 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.182
11/28/2012 12:20:29 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.107
11/28/2012 12:20:29 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.1
11/28/2012 12:20:29 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.249
11/28/2012 12:20:33 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.1
11/28/2012 12:20:33 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.249
11/28/2012 12:20:33 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.113
11/28/2012 12:20:33 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.79
11/28/2012 12:20:35 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.113
11/28/2012 12:20:35 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.221
11/28/2012 12:20:36 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.79
11/28/2012 12:20:36 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.227
11/28/2012 12:20:39 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.221
11/28/2012 12:20:39 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.233
11/28/2012 12:20:40 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.227
11/28/2012 12:20:40 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.239
11/28/2012 12:20:42 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.233
11/28/2012 12:20:42 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.239
11/28/2012 12:20:42 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.245
11/28/2012 12:20:42 AM | SETI@home | Started download of 12au10ad.28819.8460.140733193388035.10.119
11/28/2012 12:20:45 AM | SETI@home | Finished download of 12au10ad.28819.8460.140733193388035.10.119
11/28/2012 12:20:45 AM | SETI@home | Started download of 12ja11ab.5457.8826.140733193388035.10.251
11/28/2012 12:20:47 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.245
11/28/2012 12:20:48 AM | SETI@home | Finished download of 12ja11ab.5457.8826.140733193388035.10.251

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5563
Credit: 51,324,429
RAC: 40,106
Australia
Message 1311094 - Posted: 28 Nov 2012, 18:17:03 UTC - in response to Message 1310959.


MB splitting has increased overnight, and my systems managed to reach the limits (as low as they are), but still not enough is being produced to build up any sort of Ready-to-Send buffer, and that's with longer-running WUs going through the system. If another bunch of shorties comes through, people will run out of work again even if the rest of the system doesn't fall over.
____________
Grant
Darwin NT.

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 78
Credit: 2,922,847
RAC: 13,246
Finland
Message 1311100 - Posted: 28 Nov 2012, 18:33:27 UTC - in response to Message 1311094.


MB splitting has increased overnight, and my systems managed to reach the limits (as low as they are), but still not enough is being produced to build up any sort of Ready-to-Send buffer, and that's with longer-running WUs going through the system. If another bunch of shorties comes through, people will run out of work again even if the rest of the system doesn't fall over.


It seems that Oscar is not splitting MB units now, and several channels are showing errors.

On the positive side: Cricket is not maxed out, and Synergy responds really fast to scheduler requests :)
____________

Profile CLYDE
Volunteer tester
Joined: 9 Aug 99
Posts: 770
Credit: 17,454,176
RAC: 33,096
United States
Message 1311123 - Posted: 28 Nov 2012, 19:27:58 UTC - in response to Message 1309829.

Help to make the SETI dilithium crystals stronger!

Get your green star now!



:) :) :) :) :)
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5563
Credit: 51,324,429
RAC: 40,106
Australia
Message 1311294 - Posted: 29 Nov 2012, 7:52:13 UTC - in response to Message 1311123.


Haven't had any Scheduler Timeouts (that I can see) for a couple of days now. Still getting patches of "Couldn't connect to server" and even one "Failure when receiving data from the peer". But so far since the weekly outage there have only been a few of them, and then they clear up for a while - long enough to get what little work is possible.
____________
Grant
Darwin NT.

Profile Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 9295
Credit: 1,361,414
RAC: 1,591
United Kingdom
Message 1311356 - Posted: 5 Dec 2012, 18:16:28 UTC

& we're back! Thanks guys.
____________

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 78
Credit: 2,922,847
RAC: 13,246
Finland
Message 1311359 - Posted: 5 Dec 2012, 18:22:46 UTC - in response to Message 1311356.

Yep, thanks guys, at least the forums are up.



____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2203
Credit: 8,012,699
RAC: 4,304
United States
Message 1311391 - Posted: 5 Dec 2012, 18:57:48 UTC

Well, this is exciting. I noticed the crickets were chirping again, so I allowed network comms again. Two failures were then followed by:

2012-12-05 13:54:40 SETI@home Reporting 38 completed tasks, requesting new tasks
2012-12-05 13:54:40 SETI@home [sched_op_debug] CPU work request: 2512099.31 seconds; 0.00 CPUs
2012-12-05 13:55:37 SETI@home Scheduler request completed: got 4 new tasks
2012-12-05 13:55:37 SETI@home [sched_op_debug] estimated total CPU job duration: 165149 seconds

____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 1311401 - Posted: 5 Dec 2012, 19:06:00 UTC

Beta is down hard
12/5/2012 11:04:01 AM | SETI@home Beta Test | [error] No scheduler URLs found in master file

____________

Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 155
Credit: 12,797,922
RAC: 24,916
Netherlands
Message 1311404 - Posted: 5 Dec 2012, 19:15:15 UTC - in response to Message 1311401.

Beta is down hard
12/5/2012 11:04:01 AM | SETI@home Beta Test | [error] No scheduler URLs found in master file


Yep, noticed that too... well, at least I had 400+ beta tasks to keep the main cruncher happy during this outage. As far as luck goes: I would have run out of beta tasks by tomorrow morning.

Don't bother abusing the retry button, as that won't help Eric glue the database back together ;-)
____________

Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,167,460
RAC: 86,093
United Kingdom
Message 1311409 - Posted: 5 Dec 2012, 19:24:49 UTC - in response to Message 1311391.

Well this is exciting. I noticed the crickets were chirping again and allowed network comms again. Two failures were then followed by:

2012-12-05 13:54:40 SETI@home Reporting 38 completed tasks, requesting new tasks
2012-12-05 13:54:40 SETI@home [sched_op_debug] CPU work request: 2512099.31 seconds; 0.00 CPUs
2012-12-05 13:55:37 SETI@home Scheduler request completed: got 4 new tasks
2012-12-05 13:55:37 SETI@home [sched_op_debug] estimated total CPU job duration: 165149 seconds

Looks like my home machine managed to get allocated 97 GPU WUs -- but the response never got through, so the ghost population just increased. I continue to get "Project communication failed" messages.

____________


