The Server Issues / Outages Thread - Panic Mode On! (118)
| Author | Message |
|---|---|
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"But it says 'not running' which means they are out of work (obviously not the case) or failed. So apparently the splitters try to run but soon crash. And then stay in the 'not running' state until someone manually restarts them. Just to crash again after a few minutes." They automatically stop when the Ready-to-send buffer is full, and their status becomes "Not running". Grant Darwin NT |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"Yeah, but instead of slowing down their output it's stopping it dead. And then it takes a while for them to decide to start up again as the Ready-to-send buffer is a long way from full. Then they finally start cranking up the output, and then stop again, well before the Ready-to-send buffer even makes triple digits." And the splitters are running again. Grant Darwin NT |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
The situation seems to be a bit better than yesterday. Yesterday it was very rare to catch them in running state and when that happened, it never lasted more than one ssp update cycle. Now they have been running for several cycles in a row. A few dropped out in the last cycle but the rest keep running. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"The situation seems to be a bit better than yesterday. Yesterday it was very rare to catch them in running state and when that happened, it never lasted more than one ssp update cycle. Now they have been running for several cycles in a row. A few dropped out in the last cycle but the rest keep running." I think it's just a case of the status not being updated again. Many of the numbers haven't changed in over half an hour. Things are very much borked. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
"If they are intentionally stopped, the status says 'disabled'. But it says 'not running' which means they are out of work (obviously not the case) or failed. So apparently the splitters try to run but soon crash. And then stay in the 'not running' state until someone manually restarts them. Just to crash again after a few minutes." There's been a fairly recent change (a few months ago), where the splitters show 'not running' when the automatic limiter kicks in. That was when the 'ready to send' limit was around 600K, and the limiters were regularly kicking in and out in the course of an average day. I've not worked out exactly what Eric did to implement "To that end we are throttling work generation to a rate at which the table size is shrinking." - it seems like that change is still having repercussions, perhaps because of the excessive numbers of overflow tasks recently. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"Many of the numbers haven't changed in over half an hour." And now they have: no work ready to go, no work being produced, all splitters not running. Grant Darwin NT |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"I've not worked out exactly what Eric did to implement 'To that end we are throttling work generation to a rate at which the table size is shrinking.' - it seems like that change is still having repercussions, perhaps because of the excessive numbers of overflow tasks recently." Could be; although the In progress numbers are way down, the Validation & Assimilation backlogs haven't really improved much at all (any reduction in Validation numbers just results in an increase in Assimilation numbers). Grant Darwin NT |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
"I think it's just a case of the status not being updated again." All the numbers that were supposed to change did change during those five consecutive ssp updates that showed running splitters. The result generation rate, for example, was 6.3545, 61.8328, 77.8624, 78.4933, 91.5302 on those five updates. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
"'Many of the numbers haven't changed in over half an hour.' And now they have- no work ready to go, no work being produced- all splitters not running." But an awful lot of 'Results returned and awaiting validation'. That's the table Eric was trying to get down to manageable size - maybe he's put an extra term in the throttle trigger? If we're still running an extra check on overflow tasks, all those BLC35s will go straight into that category, and stay there a while. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"The result generation rate for example was 6.3545, 61.8328, 77.8624, 78.4933, 91.5302 on those five updates." Not when I was refreshing the Server status page; it just showed as 91 or so with 6 ready to send over that 30 min or so. The time stamp on the server status page changed with each refresh; the "As of" time & the status numbers didn't. *shrug* It's broken, and I don't see it getting sorted till Monday; I don't see it being a quick fix. Let them have the weekend to relax and get stuck in to it next week. A chance for me to further reduce my power bill. Grant Darwin NT |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"If we're still running an extra check on overflow tasks, all those BLC35s will go straight into that category, and stay there a while." Just picked up a few more WUs on one system: 80%+ BLC35s. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
I'm trying to throw my high-replication tasks like WU 3835497267 back as quickly as possible, so they can start their purdah in the 24-hour purge queue as soon as possible. Perhaps Eric could lower that delay for the time being? |
rob smith Joined: 7 Mar 03 Posts: 22941 Credit: 416,307,556 RAC: 380
|
OK, it's my fault - I was planning on putting a couple more RPi on stream.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
"'The result generation rate for example was 6.3545, 61.8328, 77.8624, 78.4933, 91.5302 on those five updates.' Not when i was refreshing the Server status page, it just showed as 91 or so with 6 ready to send over that 30min or so." Ready to send was 13, 10, 0, 41, 41. Only on the last two of the five updates did it stay the same. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
|
"I'm trying to throw my high-replication tasks like WU 3835497267 back as quickly as possible, so they can start their purdah in the 24-hour purge queue as soon as possible. Perhaps Eric could lower that delay for the time being?" But first they have to be validated (which has barely reduced in size), then they have to be assimilated (which is still growing in size as things eventually do get validated), then they can be deleted, then purged. It's the initial "Results returned and awaiting validation" & then the "Workunits waiting for assimilation" that just aren't making any sort of dent in their backlogs at present. Once they clear, then we'll see how well the Purger is or isn't coping (and it probably won't cope as it's on Bruno, which is the upload server, which has been having issues for months now). Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
Well, at least that tie-breaker validated immediately, and knocked another five off the 'waiting for validation' list. Sure, there are more stages to complete - but it's on its way again now. |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
I wonder how much the database would shrink if setiathome reduced their ridiculously long deadlines. My oldest task that is still waiting for validation I returned in October. Astropulse was added a couple of years later so its deadline is more reasonable, but the deadline of normal tasks is a relic from the nineties. Computers (even Raspberry Pis) are orders of magnitude faster now. When the tasks linger in the database for months, we eventually reach the point where maintaining those database rows has consumed more computer resources than what would have been needed if the servers crunched the tasks themselves. |
Schatten Joined: 12 Oct 02 Posts: 18 Credit: 14,047,388 RAC: 9
|
Getting some Workunits, but many of the VLARs are very short (I hope they really are, or I have a problem). That's a bit sad. Disclaimer: I am using the new Driver 20.1.3, and also the updated apps (since the 21st of January 2020). I know that some bad invalid tasks will show up sooner or later from the time before I used the new apps. I am sorry for that. :-/ |
rob smith Joined: 7 Mar 03 Posts: 22941 Credit: 416,307,556 RAC: 380
|
The size reduction wouldn't be that much. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
|
"The size reduction wouldn't be that much." 56% of the tasks in my 'Validation pending' list are ones I returned over 1 week ago. |
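
For anyone wanting to check a figure like the 56% in the last post against their own task list, a tally along these lines would do it. This is only a rough sketch: the function name `share_older_than`, the list of return dates and the "now" date are all made up for illustration, and nothing here reads from the project's pages or database.

```python
from datetime import datetime, timedelta


def share_older_than(returned_at, now, days=7):
    """Fraction of tasks whose return time is more than `days` before `now`."""
    if not returned_at:
        return 0.0
    cutoff = now - timedelta(days=days)
    old = sum(1 for t in returned_at if t < cutoff)
    return old / len(returned_at)


# Illustrative data only: hypothetical return dates, not from any real task list.
now = datetime(2020, 1, 25)
pending = [
    datetime(2019, 10, 14),
    datetime(2020, 1, 10),
    datetime(2020, 1, 20),
    datetime(2020, 1, 24),
]

print(f"{share_older_than(pending, now):.0%} returned over a week ago")
```

In practice the dates would have to be copied by hand (or scraped) from your own 'Validation pending' list; how to fetch them is left out here on purpose.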