Panic Mode On (114) Server Problems?

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1970201 - Posted: 13 Dec 2018, 22:39:45 UTC

Seems to be happening again,
Thu Dec 13 16:48:52 2018 | SETI@home | Project has no tasks available
Thu Dec 13 16:54:06 2018 | SETI@home | Project has no tasks available
Thu Dec 13 16:59:20 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:04:35 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:09:49 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:15:04 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:20:14 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:25:23 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:30:42 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:35:56 2018 | SETI@home | Project has no tasks available

Cache is down by about 150 tasks already.
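A minimal sketch of how the retry spacing in the log above could be measured, assuming the event log lines keep the `timestamp | project | message` layout shown in the post. The function name `retry_gaps` is made up for illustration; the roughly five-minute spacing is only what these timestamps show, not a documented BOINC setting.

```python
from datetime import datetime

# Event log lines copied from the post above.
LOG = """\
Thu Dec 13 16:48:52 2018 | SETI@home | Project has no tasks available
Thu Dec 13 16:54:06 2018 | SETI@home | Project has no tasks available
Thu Dec 13 16:59:20 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:04:35 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:09:49 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:15:04 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:20:14 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:25:23 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:30:42 2018 | SETI@home | Project has no tasks available
Thu Dec 13 17:35:56 2018 | SETI@home | Project has no tasks available
"""


def retry_gaps(log_text, needle="Project has no tasks available"):
    """Return the seconds between consecutive log lines whose message is `needle`."""
    stamps = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[2] == needle:
            stamps.append(datetime.strptime(parts[0], "%a %b %d %H:%M:%S %Y"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


gaps = retry_gaps(LOG)
print(gaps)                   # [314, 314, 315, 314, 315, 310, 309, 319, 314]
print(sum(gaps) / len(gaps))  # average gap in seconds, a little over five minutes
```

Ten failed requests spread over about 47 minutes is consistent with the client retrying roughly every five minutes while the cache drains.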
ID: 1970201 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1970202 - Posted: 13 Dec 2018, 22:43:35 UTC

I'm getting work again.
ID: 1970202 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1970203 - Posted: 13 Dec 2018, 22:49:38 UTC - in response to Message 1970202.  
Last modified: 13 Dec 2018, 22:52:29 UTC

Right after I posted a couple machines got work. Another is still down by 200 tasks or so.
Ah, now it just received 150 new tasks...nice.
That didn't take long.
ID: 1970203 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1970209 - Posted: 13 Dec 2018, 23:29:45 UTC

Not sure if it is just hiccups or a sign of a bigger problem. RTS dropped to 488k. There are blc files and an Arecibo file to split... hopefully the system will self-correct.
ID: 1970209 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1970238 - Posted: 14 Dec 2018, 1:49:57 UTC - in response to Message 1970209.  

. . Either way it is nearly crisis day again, that is the last day of the week when we run out of fresh work. Hopefully the team have a nice juicy batch of work to load up tomorrow and keep us busy over the weekend ...

Stephen

ID: 1970238 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1970240 - Posted: 14 Dec 2018, 1:58:42 UTC - in response to Message 1970238.  

. . Either way it is nearly crisis day again, that is the last day of the week when we run out of fresh work. Hopefully the team have a nice juicy batch of work to load up tomorrow and keep us busy over the weekend ...

Stephen


I hope we can get through the following tapes before more work is added. All tapes are 10.47 GB.
    blc05_2bit_guppi_58406_33654_DIAG_3C249_1_0120
    blc05_2bit_guppi_58406_33734_DIAG_3C249_1_0121
    blc06_2bit_guppi_58406_33654_DIAG_3C249_1_0120
    blc06_2bit_guppi_58406_33734_DIAG_3C249_1_0121


ID: 1970240 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970243 - Posted: 14 Dec 2018, 2:14:32 UTC

The 05dc and 13dc tasks from Arecibo should slow things down a bit. But I see I have a bunch of VHAR 05dc tasks that are blowing through in under 30 seconds too, so that won't help.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970243 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1970250 - Posted: 14 Dec 2018, 2:59:04 UTC

They just gave us a bunch of blc14 files. That should keep us busy for a while.
ID: 1970250 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13769
Credit: 208,696,464
RAC: 304
Australia
Message 1970264 - Posted: 14 Dec 2018, 5:54:40 UTC

Validators appear to be keeping up, but aren't reducing the existing backlog.
Grant
Darwin NT
ID: 1970264 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1970279 - Posted: 14 Dec 2018, 8:58:36 UTC - in response to Message 1970240.  

I hope we can get through the following tapes before more work is added. All tapes are 10.47 GB.
    blc05_2bit_guppi_58406_33654_DIAG_3C249_1_0120
    blc05_2bit_guppi_58406_33734_DIAG_3C249_1_0121
    blc06_2bit_guppi_58406_33654_DIAG_3C249_1_0120
    blc06_2bit_guppi_58406_33734_DIAG_3C249_1_0121


. . Nope they went and added a nice batch of Blc14 tapes.

. . The weekend is saved :)

Stephen

:)
ID: 1970279 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970328 - Posted: 14 Dec 2018, 17:08:36 UTC

Same old glitch where the RTS is full but the schedulers aren't sending anything out.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970328 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1970351 - Posted: 14 Dec 2018, 19:56:00 UTC - in response to Message 1970279.  


. . Nope they went and added a nice batch of Blc14 tapes.

. . The weekend is saved :)

Stephen

:)

Yes, I see that. It sounds like it doesn't work on a first-tape-in, first-tape-out priority.
ID: 1970351 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1970367 - Posted: 14 Dec 2018, 20:53:11 UTC

Not sure how it decides which order to split the files in... it definitely goes by date (58405/6) first, but if they are all from the same day then it seems to "sort of" go by the last number in a range. Very weird, but no big deal, as it will split the files eventually.
ID: 1970367 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13769
Credit: 208,696,464
RAC: 304
Australia
Message 1970371 - Posted: 14 Dec 2018, 21:24:45 UTC - in response to Message 1970328.  
Last modified: 14 Dec 2018, 21:26:47 UTC

Same old glitch where the RTS is full but the schedulers aren't sending anything out.

Looking at the graphs, there also appear to be several periods (sometimes one, sometimes three) per day where the servers stop sending out work, regardless of how much is ready to go. You can see all the little inverted spikes in the In-progress by Week trace.
Other than the weekly outage, when running well the line is smooth and either mostly level or with slight, smooth rises and falls as the WU runtimes change significantly. Lately it's been anything but smooth.

And some of these splitter glitches appear to be occurring around the same time as the Scheduler not sending out work (or not having any to send. Feeder issues?). There are some spikes around those times in the database Master-queries-per-second numbers, yet there are similar (or larger spikes) at other times when there is no effect on work distribution or splitting.

At least the Validators have started clearing their backlog (allowing the deleters to clear their backlog & file purging to take place).
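
For anyone wanting to pick those inverted spikes out of the numbers rather than by eye, here is a rough sketch. It flags a sample as a dip when it falls more than a chosen fraction below the median of the few samples before it. The function name `find_dips`, the window size, the 2% threshold, and the data are all made up for illustration; none of it comes from the project's graphs.

```python
from statistics import median


def find_dips(values, window=4, drop=0.02):
    """Return indexes where a value is more than `drop` below the
    median of the preceding `window` values."""
    dips = []
    for i in range(window, len(values)):
        baseline = median(values[i - window:i])
        if values[i] < baseline * (1 - drop):
            dips.append(i)
    return dips


# Synthetic "in progress" samples (NOT real server data): a mostly level
# line with one short dip and one two-sample dip.
in_progress = [5000, 5010, 5005, 5012, 4800, 5008,
               5015, 5011, 5009, 4750, 4790, 5013]

print(find_dips(in_progress))  # [4, 9, 10]
```

Using the median as the baseline keeps a single dip from dragging the baseline down, so the samples right after it are not flagged by mistake.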
Grant
Darwin NT
ID: 1970371 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970372 - Posted: 14 Dec 2018, 21:28:54 UTC - in response to Message 1970371.  

Yes, I have seen the same thing in the Haveland graphs. No positive correlation that I can see. I just surmise there is some background process that gets higher priority than servicing scheduler requests for 15 minutes to half an hour, then everything goes back to normal.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970372 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1970392 - Posted: 14 Dec 2018, 23:41:30 UTC - in response to Message 1970351.  


. . Nope they went and added a nice batch of Blc14 tapes.
. . The weekend is saved :)
Stephen
:)

Yes, I see that. It sounds like it doesn't work on a first-tape-in, first-tape-out priority.


. . Apparently it does not indeed. I am informed that the tapes are processed in order of age. That is, the older tapes are processed first. This results in the behaviour that when a new series is loaded that matches the time range of the series already in place but which is largely completed, the earlier tapes in the new series predate the remaining tapes of the current series and are therefore processed first. When these earlier tapes are finished the splitters begin to process both series of tapes. Since this matches the behaviour I have observed I have no reason to doubt that it is the case.

Stephen

:)
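
A minimal sketch of the "oldest tape first" ordering described above. It assumes the fourth underscore-separated field of a tape name is the observation day (the 58405/58406 date mentioned earlier in the thread) and that the fifth field is a time within that day; the second part is a guess from the file names, not something confirmed here. The helper `tape_key` and the blc14 name are made up for illustration, and the real splitter logic may well differ.

```python
def tape_key(name):
    """Sort key: (day, time-of-day field, recorder prefix)."""
    fields = name.split("_")
    day = int(fields[3])          # e.g. 58406, the date field noted in the thread
    time_of_day = int(fields[4])  # ASSUMPTION: a time within that day
    return (day, time_of_day, fields[0])


tapes = [
    # The four tapes listed earlier in the thread.
    "blc05_2bit_guppi_58406_33654_DIAG_3C249_1_0120",
    "blc05_2bit_guppi_58406_33734_DIAG_3C249_1_0121",
    "blc06_2bit_guppi_58406_33654_DIAG_3C249_1_0120",
    "blc06_2bit_guppi_58406_33734_DIAG_3C249_1_0121",
    # HYPOTHETICAL name for a newly loaded tape from an earlier day.
    "blc14_2bit_guppi_58405_80000_HYPOTHETICAL_0001",
]

order = sorted(tapes, key=tape_key)
for name in order:
    print(name)
```

With this key the newly loaded but older blc14 tape jumps ahead of the remaining 58406 tapes, which is the behaviour described in the post.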
ID: 1970392 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1970562 - Posted: 16 Dec 2018, 0:04:23 UTC

Any particular reason known why the replica database is almost 10 hours behind the Master DB, as of 4PM Berkeley time on 12/15/18?
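
For reference, a small sketch of the time arithmetic behind that observation. It assumes Berkeley was on Pacific Standard Time (UTC-8) in mid-December and rounds "almost 10 hours" up to exactly 10; both are labeled in the code.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: Berkeley is on Pacific Standard Time (UTC-8) in December.
PST = timezone(timedelta(hours=-8), "PST")

observed = datetime(2018, 12, 15, 16, 0, tzinfo=PST)  # 4 PM Berkeley time
observed_utc = observed.astimezone(timezone.utc)

replica_lag = timedelta(hours=10)  # "almost 10 hours", rounded up
replica_as_of = observed_utc - replica_lag

print(observed_utc.isoformat())   # 2018-12-16T00:00:00+00:00
print(replica_as_of.isoformat())  # 2018-12-15T14:00:00+00:00
```

So at the time of the post the replica was showing data from around 14:00 UTC (6 AM Berkeley time) on 15 December.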
.

Hello, from Albany, CA!...
ID: 1970562 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13769
Credit: 208,696,464
RAC: 304
Australia
Message 1970570 - Posted: 16 Dec 2018, 0:41:21 UTC - in response to Message 1970562.  

Any particular reason known why the replica database is almost 10 hours behind the Master DB, as of 4PM Berkeley time on 12/15/18?

The whole system has been struggling for several weeks now.
Grant
Darwin NT
ID: 1970570 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1970674 - Posted: 16 Dec 2018, 14:49:17 UTC - in response to Message 1970570.  


The whole system has been struggling for several weeks now.

Was about to say the same. Now I'm getting "no tasks available" or "can't connect to server", and the forum is slow.
ID: 1970674 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1970930 - Posted: 18 Dec 2018, 20:03:31 UTC

Less than 7 hrs today, nice.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1970930 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.