The Server Issues / Outages Thread - Panic Mode On! (119)

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041293 - Posted: 29 Mar 2020, 12:06:10 UTC

I realize it is nearly an hour old, but:

Workunits waiting for db purging	0	67,703	737,741	58m
Results waiting for db purging	0	140,726	1,576,766	58m


Am I wrong in thinking that this should be low-hanging fruit to help clean up the system? What exactly has to happen to trigger this cleanup?
ID: 2041293
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041294 - Posted: 29 Mar 2020, 12:09:04 UTC

Never mind, that was a dumb question. Still waiting on wingmen.
ID: 2041294
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041295 - Posted: 29 Mar 2020, 12:26:17 UTC
Last modified: 29 Mar 2020, 12:28:51 UTC

This is going to be another long Sunday... Due to the lockdown, every day is Sunday. And S@H is not helping us.

29-Mar-2020 07:06:12 [SETI@home] Sending scheduler request: To fetch work.
29-Mar-2020 07:06:12 [SETI@home] Reporting 57 completed tasks
29-Mar-2020 07:06:12 [SETI@home] Requesting new tasks for CPU and NVIDIA GPU
29-Mar-2020 07:06:36 [SETI@home] Started upload of 06ap10ab.29609.7429.6.33.58_0_r1137296510_0
29-Mar-2020 07:06:36 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 07:06:36 [SETI@home] [sched_op] Server version 709
29-Mar-2020 07:06:36 [SETI@home] Project has no tasks available
29-Mar-2020 07:06:36 [SETI@home] Project requested delay of 303 seconds

Draining 57 WUs on each scheduler call and getting nothing back will deplete the cache very soon.

Since there are 14 Arecibo tapes and the pfb splitters are running (at least 6 of them), I'm just wondering where all those newly split Arecibo WUs are going.
ID: 2041295
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2041300 - Posted: 29 Mar 2020, 12:41:30 UTC - in response to Message 2041295.  
Last modified: 29 Mar 2020, 12:42:18 UTC

According to the SSP they are not splitting at all, or at least not very much. I guess this has something to do with the splitter_throttle_sah process. This is a good thing: the replica is catching up constantly, and the other processes should follow too. They should have slowed down the splitters a long time ago.
ID: 2041300
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041301 - Posted: 29 Mar 2020, 12:43:23 UTC - in response to Message 2041295.  


Since there are 14 Arecibo tapes and the pfb splitters are running (at least 6 of them), I'm just wondering where all those newly split Arecibo WUs are going.

29-Mar-2020 04:12:12 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:17:21 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:22:28 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:32:46 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:37:11 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:42:17 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:47:29 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:52:36 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:57:43 [SETI@home] Scheduler request completed: got 3 new tasks
29-Mar-2020 05:02:50 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:15:04 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:20:10 [SETI@home] Scheduler request completed: got 2 new tasks
29-Mar-2020 05:36:26 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:41:34 [SETI@home] Scheduler request completed: got 0 new tasks

I'm getting a trickle.
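
As a rough way to quantify a trickle like this, the sketch below tallies "got N new tasks" lines from a client event log. It assumes the exact line format quoted above; the name `summarize` is made up for illustration, and parsing the month abbreviation assumes an English locale.

```python
import re
from datetime import datetime

# Matches event-log lines in the format quoted in this post, e.g.
# "29-Mar-2020 04:17:21 [SETI@home] Scheduler request completed: got 1 new tasks"
LINE_RE = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[SETI@home\] "
    r"Scheduler request completed: got (\d+) new tasks$"
)


def summarize(log_text):
    """Return (requests, tasks, span_minutes) over all matching lines."""
    times, counts = [], []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # %b is locale dependent; "Mar" parses under an English locale.
            times.append(datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S"))
            counts.append(int(m.group(2)))
    span = (max(times) - min(times)).total_seconds() / 60 if times else 0.0
    return len(counts), sum(counts), span
```

Applied to the 14 lines above, this gives 8 tasks over roughly 89 minutes.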
ID: 2041301
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041303 - Posted: 29 Mar 2020, 12:50:15 UTC - in response to Message 2041301.  
Last modified: 29 Mar 2020, 12:51:57 UTC


Since there are 14 Arecibo tapes and the pfb splitters are running (at least 6 of them), I'm just wondering where all those newly split Arecibo WUs are going.

29-Mar-2020 04:12:12 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:17:21 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:22:28 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:32:46 [SETI@home] Scheduler request completed: got 1 new tasks
29-Mar-2020 04:37:11 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:42:17 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:47:29 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:52:36 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 04:57:43 [SETI@home] Scheduler request completed: got 3 new tasks
29-Mar-2020 05:02:50 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:15:04 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:20:10 [SETI@home] Scheduler request completed: got 2 new tasks
29-Mar-2020 05:36:26 [SETI@home] Scheduler request completed: got 0 new tasks
29-Mar-2020 05:41:34 [SETI@home] Scheduler request completed: got 0 new tasks

I'm getting a trickle.

Probably what you get are resends.

Look at the task names in your history file, e.g. 19mr10af.13048.12751.14.41.3_2.

If there is a _2, _3, etc. at the end, they are resends, not newly split WUs.
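
The suffix rule described here can be sketched in a few lines. This assumes a task name always ends in `_<n>` and that `_0` and `_1` are the initial pair, so anything from `_2` upward is a resend; the function and constant names are hypothetical.

```python
import re

# Assumption for this sketch: the first two replicas (_0 and _1) are the
# initial pair, so an index of 2 or higher marks a resend.
INITIAL_REPLICAS = 2


def replica_index(task_name):
    """Return the numeric suffix after the last underscore."""
    m = re.search(r"_(\d+)$", task_name)
    if m is None:
        raise ValueError(f"no replica suffix in {task_name!r}")
    return int(m.group(1))


def is_resend(task_name):
    """True if the task name carries a _2, _3, ... suffix."""
    return replica_index(task_name) >= INITIAL_REPLICAS
```

This applies to task names as they appear in the history file, not to upload file names, which carry extra suffixes.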
ID: 2041303
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2041309 - Posted: 29 Mar 2020, 13:13:15 UTC

For me, it's the same as yesterday - a very short burst of new work, then nothing except the occasional resend until the next burst. These were all the same machine:

29/03/2020 12:58:44 | SETI@home | Scheduler request completed: got 4 new tasks
29/03/2020 13:03:54 | SETI@home | Scheduler request completed: got 8 new tasks
29/03/2020 13:09:04 | SETI@home | Scheduler request completed: got 6 new tasks
Only one of those was a resend. Nothing but big fat zeroes before and after.
ID: 2041309
Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2041318 - Posted: 29 Mar 2020, 14:00:14 UTC
Last modified: 29 Mar 2020, 14:23:21 UTC

Greetings,

This website is running at a sluggish snail's pace. Stats haven't updated, at least mine, since midday yesterday (March 28th) or earlier.

G o i n g    d  o  w  n      f    a    s    t    e    r        a       n       d           f           a            s            t            e            r .   .   .   .     .


[edit]
And my only hosts that are still working are my Pis and laptop. My main and my other Linux PC are idle. Perhaps I should just shut down the other Linux PC. Yeah, sounds like a good idea.
[/edit]

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2041318
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041319 - Posted: 29 Mar 2020, 14:01:08 UTC - in response to Message 2041300.  

According to the SSP they are not splitting at all, or at least not very much. I guess this has something to do with the splitter_throttle_sah process. This is a good thing: the replica is catching up constantly, and the other processes should follow too. They should have slowed down the splitters a long time ago.
The replica is catching up at about 0.41 seconds per second. If it keeps that up, it'll take almost two weeks to catch up.
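
The catch-up estimate is simple division, sketched below. The post does not state the replica's actual lag, so the lag here is only what the "almost two weeks" figure implies; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def catch_up_days(lag_seconds, gain_per_second=0.41):
    """Days needed to close a lag when the replica gains a steady amount per second."""
    return lag_seconds / gain_per_second / SECONDS_PER_DAY


def implied_lag_days(days_to_catch_up, gain_per_second=0.41):
    """Lag (in days) implied by a given catch-up time at a steady gain."""
    return days_to_catch_up * gain_per_second
```

With these numbers, "almost two weeks" corresponds to a replica lag of a little under 5.74 days (14 × 0.41).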

All the other numbers still look very bad. The result count in the database is still going up, the assimilation queue stays extremely high without moving either way, and validation, which had been running without problems for a long time despite all the other problems, is now falling behind fast.

And all this is happening when the return rate is at half the normal value due to most of the big crunchers having run out of tasks to crunch.
ID: 2041319
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041320 - Posted: 29 Mar 2020, 14:06:43 UTC - in response to Message 2041303.  
Last modified: 29 Mar 2020, 14:13:18 UTC

[quote]
Since there are 14 Arecibo tapes and the pfb splitters are running (at least 6 of them), I'm just wondering where all those newly split Arecibo WUs are going.


Probably what you get are resends.

Look at the task names in your history file, e.g. 19mr10af.13048.12751.14.41.3_2.

If there is a _2, _3, etc. at the end, they are resends, not newly split WUs.

Actually no resends came through after I purged them last night.

Edit: I have two left on one system that I marked last night, and they're about 30% complete; after those, I have no resends.
ID: 2041320
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041322 - Posted: 29 Mar 2020, 14:14:38 UTC - in response to Message 2041320.  
Last modified: 29 Mar 2020, 14:21:01 UTC

Actually no resends came through after I purged them last night.

All the ones I received (very few, BTW) were resends.
How can I know for sure? Because my host is programmed to start resends AFAP, so they are downloaded and returned in the following scheduler call. That makes them easy to track.
ID: 2041322
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2041324 - Posted: 29 Mar 2020, 14:20:45 UTC

Scheduler requests are now failing, even with NNT set. Seems like the entire system is starting to seize up.
ID: 2041324
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2041326 - Posted: 29 Mar 2020, 14:40:16 UTC - in response to Message 2041319.  

The replica is catching up at about 0.41 seconds per second. If it keeps that up, it'll take almost two weeks to catch up.

Still better than falling behind all the time.

And all this is happening when the return rate is at half the normal value due to most of the big crunchers having run out of tasks to crunch.

Well, like I said, they should have slowed down the splitters a long time ago. They know how many results the database can hold before swapping to disk starts and everything becomes slow. I don't get why, after running this project for over 20 years, they still let such things happen again and again.
ID: 2041326
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041327 - Posted: 29 Mar 2020, 14:44:59 UTC

I haven't received any tasks in 90 minutes now. I was lucky to be in the splitters as they were splitting a lot of files, and one AstroPulse file. I haven't received any GPU tasks since then. Got about 8-10 AP GPU tasks, and 6-7 AP CPU tasks among all three hosts. About the same number of SETI GPU tasks. Was nice while it lasted.
ID: 2041327
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2041334 - Posted: 29 Mar 2020, 15:32:48 UTC

It looks like a repeat of yesterday. Machines out of work, not receiving any new work, most times on the SSP over 4 Hours old, Results received in last hour down to 70k. I guess it's time to go out and wander around the yard again...
ID: 2041334
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20343
Credit: 7,508,002
RAC: 20
United Kingdom
Message 2041348 - Posted: 29 Mar 2020, 17:03:40 UTC - in response to Message 2041280.  

Windows 10 Outperforming Linux On A ~$5000 Laptop
Using a brand new top end laptop is a bad way to compare the performances of different operating systems. What you really end up comparing is how fast the support for the latest proprietary hardware quirks gets added to different operating systems. Laptop power saving features are a notoriously fast moving target...

Indeed that is the suspicion:

There is some new/custom ACPI/'power-saving' that has kept that particular laptop in power-saving mode throughout the Linux tests.

Regardless, pretty good all round for the results seen!


IT is what we make it...
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2041348
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041354 - Posted: 29 Mar 2020, 17:32:42 UTC

Looks like the validators have stopped validating. SSP hasn't updated those values for hours but my RAC is dropping fast.
ID: 2041354
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041363 - Posted: 29 Mar 2020, 18:13:12 UTC

And now my stats have stopped updating at all. Credits and RAC displayed by the manager don't change when the client does scheduler requests. The numbers have stayed the same for almost an hour.

Both numbers staying unchanged is a clear indication that the data doesn't update. If credits stayed the same, then RAC should drop fast and if RAC stayed the same, then credits should grow at a steady rate.
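
The reasoning can be illustrated with a small sketch. It assumes RAC behaves as an exponentially decaying average of daily credit with a half-life of one week; that half-life is an assumption and is not stated in the thread, and the function names are made up.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumption: one-week half-life for the average


def decayed_rac(rac, idle_days, half_life_days=HALF_LIFE_DAYS):
    """RAC after `idle_days` in which no new credit is granted."""
    return rac * math.exp(-idle_days * math.log(2) / half_life_days)


def steady_credit(total, rac, days):
    """Total credit after `days` if credit keeps arriving at `rac` per day."""
    return total + rac * days
```

Under that assumption, an hour with no new credit lowers RAC by roughly 0.4%, while an hour at a steady RAC adds about 1/24 of the RAC to the total. Both values standing perfectly still fits neither case.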
ID: 2041363
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041365 - Posted: 29 Mar 2020, 18:19:24 UTC

The end is closing in.
ID: 2041365
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13747
Credit: 208,696,464
RAC: 304
Australia
Message 2041366 - Posted: 29 Mar 2020, 18:20:42 UTC

Getting very slow Scheduler responses, and getting errors on some responses.
Grant
Darwin NT
ID: 2041366

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.