The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041092 - Posted: 28 Mar 2020, 16:06:08 UTC - in response to Message 2041089.  
Last modified: 28 Mar 2020, 16:07:27 UTC

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???

Edit: My room is so much colder now.
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22221
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2041095 - Posted: 28 Mar 2020, 16:12:21 UTC

No point in prioritising _1 tasks; they are one half of the initial pair sent out.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041096 - Posted: 28 Mar 2020, 16:13:19 UTC - in response to Message 2041092.  

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???
There's no point in prioritizing _1s. Only _2s and up. _0 and _1 are the initial replication, not resends.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2041097 - Posted: 28 Mar 2020, 16:18:53 UTC - in response to Message 2041089.  
Last modified: 28 Mar 2020, 16:28:53 UTC

Where is the newly split data going?
I'm getting some, in short bursts every few hours. It's just the luck of asking at exactly the right time.

Edit - I didn't get everything I wanted, but it'll do for now. These were all new work, not resends.

28/03/2020 16:26:07 | SETI@home | Reporting 1 completed tasks
28/03/2020 16:26:07 | SETI@home | [sched_op] NVIDIA GPU work request: 36067.85 seconds; 0.00 devices
28/03/2020 16:26:10 | SETI@home | Scheduler request completed: got 31 new tasks
28/03/2020 16:26:10 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 15004 seconds
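A quick sanity check on a burst like this is to divide the estimated total duration by the number of tasks received. A sketch in Python (the figures are copied from the log above; the variable names are mine):

```python
# Figures copied from the scheduler log above.
requested_seconds = 36067.85       # GPU work requested from the scheduler
tasks_received = 31                # new tasks actually granted
estimated_total_seconds = 15004    # scheduler's estimate for those tasks

per_task = estimated_total_seconds / tasks_received
shortfall = requested_seconds - estimated_total_seconds

print(f"average estimate per task: {per_task:.0f} s")   # 484 s
print(f"request still unfilled by: {shortfall:.0f} s")  # 21064 s
```

So even a "lucky" burst of 31 tasks filled well under half of the ten hours of work requested.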
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2041098 - Posted: 28 Mar 2020, 16:20:52 UTC

I wish the system were making some progress on assimilation, since it isn't doing much splitting at the moment, if any. I know the Status page data is about an hour out of date, but I'd still like to see the system recovering a bit.

I have been prioritizing _2s as well. I used to do it to help the db size, but recently I have been doing it because they are usually short-run WUs that I could return and get a "meatier" WU in their place. Since I'm no longer topped off, and thus have space should any request actually get something, I'm no longer prioritizing anything and I'm running in normal FIFO order. I think I'm good for another 2 days, so I'm hanging out for a bit longer.

Glad to have made it over 2 million credit between this and the previous Mac mini.
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041100 - Posted: 28 Mar 2020, 16:34:44 UTC

Well then...the _2s are running. Not much help on my end.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2041103 - Posted: 28 Mar 2020, 16:42:11 UTC

All my machines are out of work and not receiving much of anything. I think it's time to do something a little more constructive, I dunno, maybe finish burning the leaves or something...
[TA]Assimilator1
Joined: 16 Oct 99
Posts: 52
Credit: 8,551,146
RAC: 50
United Kingdom
Message 2041104 - Posted: 28 Mar 2020, 16:43:54 UTC
Last modified: 28 Mar 2020, 16:44:19 UTC

I seem to be having the opposite problem to most: so far my faster rig hasn't run out of WUs (it got some at ~11:30, 14:30 & 15:30), but the cache is dwindling.
My 2nd rig's CPU ran out of WUs yesterday, bar 1 AP WU which is going to take over a day!! Its GPU was getting WUs sporadically but is now out.

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind-down?
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RX 580 8GB, 500GB Samsung 970 Evo+, Win 10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR2 1866, HD 7870 XT 3GB (DS), Win7 64bit
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041109 - Posted: 28 Mar 2020, 17:08:42 UTC - in response to Message 2041104.  

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind down?
The database server is out of RAM and running in snail mode and the splitters are being heavily throttled in an attempt to reduce the size of the database so it could recover.
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2041113 - Posted: 28 Mar 2020, 17:18:38 UTC - in response to Message 2041092.  

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???

Edit: My room is so much colder now.


_1 and _0 are original, normal tasks

_2 or greater are the re-sends.
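That suffix rule is easy to mechanise when sorting through a task list. A hypothetical helper in Python (the `_N` replication-number convention is as described above; the function name and example task names are made up for illustration):

```python
def is_resend(task_name: str) -> bool:
    """Return True if the trailing _N replication number marks a resend.

    _0 and _1 are the initial replication pair sent to two hosts;
    _2 and above are resends issued after a timeout or invalid result.
    """
    suffix = task_name.rsplit("_", 1)[-1]
    return suffix.isdigit() and int(suffix) >= 2

print(is_resend("24mr20aa.12345.678.wu_1"))  # False: half of the initial pair
print(is_resend("24mr20aa.12345.678.wu_3"))  # True: a resend
```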
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041117 - Posted: 28 Mar 2020, 17:34:43 UTC - in response to Message 2041097.  

Where is the newly split data going?
I'm getting some, in short bursts every few hours. It's just the luck of asking at exactly the right time.

Edit - I didn't get everything I wanted, but it'll do for now. These were all new work, not resends.

28/03/2020 16:26:07 | SETI@home | Reporting 1 completed tasks
28/03/2020 16:26:07 | SETI@home | [sched_op] NVIDIA GPU work request: 36067.85 seconds; 0.00 devices
28/03/2020 16:26:10 | SETI@home | Scheduler request completed: got 31 new tasks
28/03/2020 16:26:10 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 15004 seconds


Just got:
28-Mar-2020 12:29:30 [SETI@home] Scheduler request completed: got 33 new tasks

Does that indicate we both won the S@H new-WU lotto?
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041120 - Posted: 28 Mar 2020, 17:46:26 UTC

7 splitters splitting
6 machines doing nothing
5 extraterrestrial rings
4 servers burning
3 clients crashing
2 servers smashing
and an adult beverage to be enjoyed now.
[TA]Assimilator1
Joined: 16 Oct 99
Posts: 52
Credit: 8,551,146
RAC: 50
United Kingdom
Message 2041126 - Posted: 28 Mar 2020, 18:12:48 UTC - in response to Message 2041109.  
Last modified: 28 Mar 2020, 18:14:43 UTC

Lol! :D

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind down?
The database server is out of RAM and running in snail mode and the splitters are being heavily throttled in an attempt to reduce the size of the database so it could recover.


Ok, thanks :) (not that I can remember what the splitters do :o ...oh wait, do they split up the large data blocks from the observatory(ies) into the smaller chunks we crunch?)
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RX 580 8GB, 500GB Samsung 970 Evo+, Win 10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR2 1866, HD 7870 XT 3GB (DS), Win7 64bit
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041130 - Posted: 28 Mar 2020, 18:26:34 UTC - in response to Message 2041126.  

Ok, thanks :) (not that I can remember what the splitters do :o, ........oh wait, do they split up the large data blocks from the observatory(ies?) to the smaller chunks we crunch?
They produce the tasks for us to download and crunch. If they are not running and the RTS buffer is empty, all we get are occasional resends, and even to get those our computers have to win the lottery against thousands of other hungry computers.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041146 - Posted: 28 Mar 2020, 19:28:56 UTC

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2041159 - Posted: 28 Mar 2020, 20:11:18 UTC
Last modified: 28 Mar 2020, 20:25:24 UTC

Still barely any work, so it looks like the Servers have finally ground to a halt.
"Results out in the field" is only 5 million now. Dropping away as work comes in but little goes out.
"Results returned and awaiting validation" 20.2 million and "Workunits waiting for assimilation" 6.85 million.

I think they might as well shut things down now.
Leave just the BOINC master database, SETI@home science database, data-driven web pages, the Transitioners and the sah Assimilators running, but shut down all other functions (incl. AP, the Scheduler, upload & download servers, Replica etc). Just let the Assimilators clear that 6.85 million WU backlog.
Once that is done, then start up the Deleters to delete all the Tasks & WUs that have now been marked for deletion. Once that backlog clears, then run the Purgers to actually remove everything marked for Deletion. And once that is done, time for one final official weekly outage to compact the database.

It would be nice to think this could all be done in a few days, but given the state of the database I can see it taking a week (or more). But once done, the database shouldn't have any further issues, with 6.8 million fewer WUs and at least 13.6 million fewer Results in it, along with much smaller indexes.

Fire up everything except for the Scheduler, upload server (and of course the splitters). Let things run for a bit (after a week there should be plenty of resends to go out), then start up the Scheduler, then a bit later the upload server, and let the WU cleanup lottery commence.
And I would really, really, really hope they finally implement the revised deadlines for resends: set it to 2 days & get the worst of the outstanding work cleared up in 3.5 months, instead of the 10+ months it will take (if all work is to be cleared completely).
Grant
Darwin NT
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2041161 - Posted: 28 Mar 2020, 20:24:27 UTC - in response to Message 2041146.  

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.

I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2041162 - Posted: 28 Mar 2020, 20:26:29 UTC - in response to Message 2041161.  

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Depends how many GPUs you have, and what type. High-end cards can do a WU in 30 sec or so.
Grant
Darwin NT
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041165 - Posted: 28 Mar 2020, 20:40:25 UTC - in response to Message 2041162.  
Last modified: 28 Mar 2020, 21:29:19 UTC

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Depends how many GPUs you have, and what type. High end cards can do a WU in 30sec or so.

Yes, you are right. This host now has 4 GPUs (could be more) and it crunches a WU in about 70 secs (some in the range of 30 secs, the slower ones in about 2:10), so 6 WUs normally last 1-1:30 minutes. But the WUs currently produced are mainly shorties or noise bombs, so they crunch in about 1 min only.
I measure the crunching performance by how many WUs are reported on each scheduler call. It goes from 22 WUs to 60 every 5 min. Then just do the math.
Be aware I'm running 1070s & 2070s; on a 2080 Ti or Titan that time is cut by a factor of 2 or 3.
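Doing that math, a rough sketch (the per-call rates are taken from the post above; real throughput varies with the task mix):

```python
# Reported-per-call figures from the post above.
low_per_call, high_per_call = 22, 60   # WUs reported per scheduler call
call_interval_min = 5                  # one scheduler call every 5 minutes

calls_per_hour = 60 // call_interval_min
low_rate = low_per_call * calls_per_hour
high_rate = high_per_call * calls_per_hour
print(f"{low_rate}-{high_rate} WUs per hour")  # 264-720 WUs per hour
```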
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2041166 - Posted: 28 Mar 2020, 20:40:40 UTC - in response to Message 2041159.  

Once that is done, then start up the Deleters to delete all the Tasks & WUs that have now been marked for deletion. Once that backlog clears, then run the Purgers to actually remove everything marked for Deletion. And once that is done, time for one final official weekly outage to compact the database.

Unless the above can be done remotely, it will not get done until Berkeley is out of lockdown.



©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.