The Server Issues / Outages Thread - Panic Mode On! (118)
| Author | Message |
|---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
And now and again out at Parkes. Or maybe just at the Breakthrough Listen office down on Campus - that's where I found him back in July. Yeah Matt has been over at the Breakthrough Listen project for a couple of years now. Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago. Wait, I've been out of the loop for a while. Matt left? . . Not recently I don't think :( Stephen ? ? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
A word of warning- if you do manage to score some work, be prepared to have to Retry pending transfers a few 100 (it feels like a thousand) times. . . been there, done that, over and over and over and ... oh what the heck ... Stephen :( |
|
Dave Stegner Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27
|
Just ran across something new. I looked at my pending page and found a workunit at random. It said "completed, waiting validation". But when I clicked on it to see the status, I found this https://setiathome.berkeley.edu/workunit.php?wuid=3863183873 Reported by 3 units and validated. Something is borked. Dave |
betreger Joined: 29 Jun 99 Posts: 11451 Credit: 29,581,041 RAC: 66
|
That's not unusual. The first 2 tasks did not match but were close. The 3rd was close enough to validate the first 2. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
Reported by 3 units and validated. Side effect of the minimum 3 quorum for early overflows and the flakey AMD card problem. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I am getting a very small flow of Seti@Home tasks. So I have NNTed both E@H and WCG in hopes of sucking more down :) Some of the weather tasks run more than half a day so it will take a while to whittle about half of them down. Tom A proud member of the OFA (Old Farts Association). |
|
Dave Stegner Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27
|
I guess I was not clear. Looking at my pending page, it says that the workunit is "completed, waiting validation" YET looking at the workunit, it says validated. Just ran across something new. Dave |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
Just ran across something new. . . Not sure what your issue is with that WU. Normally a WU will linger in the system for approx 24 hours after validation; currently, with the problems everyone has been discussing and is very concerned about, they are hanging about for much longer. That unit has only just validated so I would not expect to wave goodbye to it for a day or 3 yet ... . . OH, maybe you missed the discussion of the change that was introduced because of NAVI cards whereby overflow results are being triple checked for validation. {edit} . . The misleading listing on your stats page may be due to the lag with the replica database?? Stephen :( |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
Yes, things are borked. That is what this thread has been discussing for the past two weeks. Also the replica database is 8000 seconds behind. So what you see on your page is already 2 hours old. There is nothing normal about the current situation so no reason to expect normal classifications. I would just not worry about it since there is nothing you can do on your end to change anything. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22
|
There will also be weirdness from the replica db not being caught up with the main db. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340
|
Tired of this current snafu. Shut my crunchers down and will hold off until SETI has a couple of days of normal work flow. Good crunching, all. |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
I am seeing no reductions in the size of the database with all the task counts at all time highs. Nothing is going to happen until we fall below the magic 20M number. We fell below that at 02:50 UTC, but nothing has happened yet at the splitters. Assimilation queue seems to be slowly going down now. Two steps down, one up. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13985 Credit: 208,696,464 RAC: 304
|
Assimilation queue seems to be slowly going down now. Two steps down, one up. Until we can get "Results returned and awaiting validation" down to around 3.5 million (given the present amount of Work in progress- so 7 million to go), and the "Workunits waiting for assimilation" back down to 0 (3.7 million to go), any new work just causes those numbers to climb. And ideally we'd want the purge numbers to be within about 500k of the In progress numbers (I think that was the general ball park when things were working). Grant Darwin NT |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
Until we can get "Results returned and awaiting validation" down to around 3.5 million (given the present amount of Work in progress- so 7 million to go), and the "Workunits waiting for assimilation" back down to 0 (3.7 million to go), any new work just causes those numbers to climb. If the underlying problem is not fixed, the numbers will just start growing again no matter how low they were driven. Apparently the splitters are occasionally running in such short bursts that the SSP can't catch them. I got a small bunch of freshly split _0s and _1s. Mostly noise bombs. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13985 Credit: 208,696,464 RAC: 304
|
If the underlying problem is not fixed, the numbers will just start growing again no matter how low they were driven. Yep. It appears we've just about finished all the BLC35 noise bombs***. And there is now a fix for the AMD RX 5000 card issues. While the increased serverside limits didn't help things, it was those 2 issues that really brought things undone- as the way to stop dodgy results getting in to the science database was to require more than 1 wingman to verify a noisy WU result. Combined with files that were producing almost nothing but noise bombs, the size of the database exploded as the hardware just couldn't keep up with the load. And there may have been other performance related issues that have contributed to the initial database rapid expansion & the corresponding excruciatingly slow recovery. Having said that, it shows that we really do need new hardware in order to meet (not too distant) future workloads (let alone the continuing upload & download server issues). Edit- *** Having said that, there's still a big heap of them still to come (there were that many noisy files there). Grant Darwin NT |
Peter Joined: 12 Feb 14 Posts: 19 Credit: 1,385,738 RAC: 6
|
Yeaaaaah, a lot of tasks for CPU and CPU+GPU are now waiting :) |
|
Kiska Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0
|
Edit: Except for the replica, which is now 5,91 hours behind, and it's getting worse for each update of the SSP. :-( Fun time, I just config'd graphs for replica: https://munin.kiska.pw/munin/Munin-Node/Munin-Node/replica_setiathome.html This should make Grant happy :D |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13985 Credit: 208,696,464 RAC: 304
|
Yeaaaaah, a lot of tasks for CPU and CPU+GPU are now waiting :) It's nice to get work, but it would have been nicer (given how things are at present) for the backlogs to be a few more million down before that happened. Grant Darwin NT |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13985 Credit: 208,696,464 RAC: 304
|
This should make Grant happy :D Very nice. Now, if the "Results returned and awaiting validation" were on the same graph as the "Results out in the field" for both MB & AP it'd be perfect (they're the same order of magnitude as each other- millions for MB and hundreds of thousands for AP, whereas the Assimilation & Deletion numbers are (when things aren't broken) usually around 0 so with the values in their millions there it makes it harder to see what's been going on with the smaller values). Oh, and the "Workunits waiting for db purging" and "Results waiting for db purging" could also go on the "Results returned and awaiting validation" and "Results out in the field" graph (or have their own). Pretty please. Pretty please with a cherry on top. Grant Darwin NT |
|
Kiska Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0
|
This should make Grant happy :D Very nice. Once it starts populating :D https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathomev8_in_progress_validation.html Remind me to do the other stuff later |
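
For anyone wanting to check the numbers quoted on this page, here is a rough sketch of the arithmetic. It only uses figures posted in the thread (the replica being "8000 seconds behind", and the "3.5 million" target with "7 million to go" and "3.7 million to go"). The function and constant names are made up for illustration and are not part of BOINC or the SETI@home server status page.

```python
# Figures quoted in the thread; all names here are illustrative assumptions.
REPLICA_LAG_SECONDS = 8000        # "the replica database is 8000 seconds behind"
VALIDATION_TARGET = 3_500_000     # target for "Results returned and awaiting validation"
VALIDATION_TO_GO = 7_000_000      # "so 7 million to go"
ASSIMILATION_TARGET = 0           # "Workunits waiting for assimilation" back down to 0
ASSIMILATION_TO_GO = 3_700_000    # "3.7 million to go"


def lag_hours(seconds: float) -> float:
    """Convert a replica lag given in seconds to hours."""
    return seconds / 3600


def implied_backlog(target: int, to_go: int) -> int:
    """Current queue size implied by a target and the amount still to clear."""
    return target + to_go


if __name__ == "__main__":
    print(f"Replica lag: {lag_hours(REPLICA_LAG_SECONDS):.2f} hours")
    print(f"Awaiting validation now: {implied_backlog(VALIDATION_TARGET, VALIDATION_TO_GO):,}")
    print(f"Waiting for assimilation now: {implied_backlog(ASSIMILATION_TARGET, ASSIMILATION_TO_GO):,}")
```

So "8000 seconds" is a little over the "2 hours" mentioned, and the quoted targets imply roughly 10.5 million results awaiting validation at the time of posting.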
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.