Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13960 · Credit: 208,696,464 · RAC: 304
> > The Results-ready-to-send are what we call Work Units. There is only 1 copy of any given WU. It is downloaded as many times as required.
>
> We are not talking about workunits (unique sets of data processed by multiple hosts) but actual tasks given to individual hosts. The term the SSP is using is 'result' (even when it's still an uncrunched task to be sent to a host, not an actual crunched result yet). And because RRTS stands for 'Results ready to send', this suggests that this duplicating is counted in it.

Work Units are what are allocated & downloaded by each system (generally referred to as being sent) to process (they are also called tasks).

> It is still a separate row in the database even when sharing the same file on the download server, and database performance seems to be the issue here, not the disk capacity on the download servers.

No, it's not a disk capacity issue, it's a workload issue, and the more work the system has to keep track of, the greater the server problems are. So the more times a WU has to be processed before it's declared Valid, Invalid or an Error, the greater the load on the database.

> One thing that seems weird is the very high ratio between 'Results returned and awaiting validation' and 'Workunits waiting for validation' on the SSP - currently about 6.4.

The number you are quoting isn't related to the 2 terms you quoted. When things are working, 'Results returned and awaiting validation' is usually less than 'Results out in the field', quite a bit less. And when things are working properly, 'Workunits waiting for validation' and 'Workunits waiting for assimilation' are effectively 0; they are handled as they occur, and you might occasionally see a dozen or so as a value there, but that's not very often. When things are working properly...

Grant
Darwin NT
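The workunit-versus-result distinction argued over above can be sketched in a few lines. This is a hedged illustration, not the real BOINC server schema: the class and field names here are invented for the example. The point it shows is that one workunit (the unique piece of data) fans out into several 'result' rows, one per task issued to a host, so each extra replication multiplies the rows the database must track even though the input file on the download server is shared.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    """One task issued to (or waiting for) a single host."""
    workunit_id: int
    state: str  # e.g. 'unsent', 'in_progress', 'awaiting_validation'

@dataclass
class Workunit:
    """One unique set of data split from the tape."""
    id: int
    results: list = field(default_factory=list)

    def replicate(self, n: int) -> None:
        """Create n result rows, all sharing this workunit's input file."""
        self.results += [Result(self.id, 'unsent') for _ in range(n)]

wu = Workunit(1)
wu.replicate(2)   # initial replication of 2: tasks _0 and _1
wu.replicate(1)   # a _2 replacement after a validation mismatch

# One input file on disk, but three result rows in the database -
# the workload multiplier described in the post above.
print(len(wu.results))  # 3
```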
Speedy · Joined: 26 Jun 04 · Posts: 1647 · Credit: 12,921,799 · RAC: 89
> I think one thing that could help the server load is if they could release a big bunch of Astropulse work. Astropulse tasks take many times longer to crunch, so if a bigger proportion of tasks were them, then the number of tasks 'in flight' at any given time would be lower.

I agree this would certainly help, but it would only help while that work was being processed; then we would go back to being in the same situation, unless they were able to upgrade the servers to cope with the capacity.
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> > Nope. The splitters produce the actual work unit, this is then copied to send to each participating host.
>
> Nothing is "sent" to the hosts. Hosts ask for work at their own pace, and this pace is independent of the rate at which the splitters can produce work as long as it is the limiting bottleneck. The extra triplicates will just wait in the RRTS queue until the hosts can digest them.

. . I give up, have it your way, and enjoy your little world.

Stephen
:(
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13960 · Credit: 208,696,464 · RAC: 304
And once again the splitters take a rest & we're out of work.

Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13960 · Credit: 208,696,464 · RAC: 304
Even with the splitters shut down and the In-progress falling like a stone, the Validation, Assimilation & Deletion backlog remains, barely touched.

Grant
Darwin NT
rob smith · Joined: 7 Mar 03 · Posts: 22817 · Credit: 416,307,556 · RAC: 380
With the validators being very slow and picky it is very difficult to see how well my latest RPi is doing. My existing computers appear to be having tasks validated, while for the new one they are being ignored. But this may be down to the fact that the backup database is now over 8 hours behind the main one.... The backup is obviously running, as it is making progress, but not at one hour per hour. It's running down, so somebody needs to wind its clock up again.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
Although no new workunits are being split, I've been getting substantial numbers of replacement _2 tasks. That implies that the validators are working, and failing substantial numbers of matches (it may be that many of my _2s turn out to be overflows, and vanish in a flash).

Somebody mentioned 'initial replication'. I have a dim memory that we discussed this years ago, and found that the number should more accurately be called 'current replication' - the figure you see may not necessarily be the true initial number. It would be hard to check that until the databases sync up, and we can check newly-split work (ha!) in real time.

Edit - see Initial Replication of FOUR?? Any comments
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13960 · Credit: 208,696,464 · RAC: 304
I see the splitters fired up for a short while there, then gave up again.

The Validation backlog has dropped by a few hundred thousand (it needs to drop by a good 4 million). The Deletion backlog has almost cleared, but that's probably because the Assimilator backlog has increased.

Reducing the server side limits doesn't seem to have helped things much, if at all.

Grant
Darwin NT
Joined: 1 Apr 13 · Posts: 1859 · Credit: 268,616,081 · RAC: 1,349
Back to Einstein already ...
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
> Reducing the server side limits doesn't seem to have helped things much, if at all.

The effect of reducing the limits will only be seen in days (or weeks), due to the way SETI works. As you can see on the SSP, the total WU count is about 28-29 million, and the daily production of the entire SETIverse is about 3 million WUs. So even if all of those WUs were validated and cleared (something not realistic), it would take more than 4 days to bring the numbers down to a more manageable size (< 20 million WUs).

That is one of the side effects when you mess with the size of a DB: increasing is fast, decreasing is slow. In the real world we can expect the changes to start making a difference by the end of this week, or sooner if some housekeeping on the DB can be done, maybe at the next "mal outrage" day.

My 0.02
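The drain-time estimate above can be checked with back-of-envelope arithmetic. The 28-29 million total and ~3 million results/day are the figures quoted in the post; the splitter rate below is an assumed number added purely for illustration, since the net drain rate shrinks whenever the splitters keep adding new work at the same time.

```python
total_results = 28.5e6   # rough current database size (from the post)
target        = 20e6     # the "more manageable" size mentioned above
clear_per_day = 3e6      # daily production/validation of the SETIverse

# Best case: nothing new is being split while the backlog drains.
days_min = (total_results - target) / clear_per_day
print(f"lower bound: {days_min:.1f} days")   # ~2.8 days

# Assumed (hypothetical) splitter rate: new work arriving cuts the
# net drain, which is how the ">4 days" estimate becomes plausible.
splitter_per_day = 1.5e6
days_realistic = (total_results - target) / (clear_per_day - splitter_per_day)
print(f"with splitting: {days_realistic:.1f} days")  # ~5.7 days
```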
Joined: 28 Nov 02 · Posts: 5126 · Credit: 276,046,078 · RAC: 462
Just got told I had a stalled download after I did a manual update, because I was running out of tasks on my most active box. Retried the download. Waiting for the next scheduler request to kick off.

Sat 18 Jan 2020 08:42:40 AM CST | SETI@home | Scheduler request completed: got 0 new tasks

Maybe later. Alligator?

Tom
A proud member of the OFA (Old Farts Association).
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> Even with the splitters shut down and the In-progress falling like a stone, the Validation, Assimilation & Deletion backlog remains, barely touched.

Validation backlog did shed a digit. It is under 10 million now.
rob smith · Joined: 7 Mar 03 · Posts: 22817 · Credit: 416,307,556 · RAC: 380
...and my latest RPi is now showing some credit.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
After a while, Crocodile.
Ian&Steve C. · Joined: 28 Sep 99 · Posts: 4267 · Credit: 1,282,604,591 · RAC: 6,640
Replica continues to climb in time behind master. Up to over 10 hrs now.

I'm also not getting very many tasks. Most requests end up with "project has no tasks". Occasionally I'll get 1 task. Nothing substantial for several hours now.

Seti@Home classic workunits: 29,492
CPU time: 134,419 hours
Lewbylews6 · Joined: 17 Jan 20 · Posts: 1 · Credit: 29,096 · RAC: 0
On my account it states that I currently have 232 tasks in progress, but my BOINC Manager isn't processing anything. When I request an update, the most recent entries in my Event Log say the following:

18/01/2020 15:48:59 | SETI@home | update requested by user
18/01/2020 15:49:01 | SETI@home | Sending scheduler request: Requested by user.
18/01/2020 15:49:01 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
18/01/2020 15:49:04 | SETI@home | Scheduler request completed: got 0 new tasks
18/01/2020 15:49:04 | SETI@home | Project has no tasks available

Can anyone help in getting me more work to process? This is from my account:

All tasks for Lewbylews6
State: All (241) · In progress (232) · Validation pending (0) · Validation inconclusive (0) · Valid (9) · Invalid (0) · Error (0)
Application: All (241) · AstroPulse v7 (0) · SETI@home v8 (241)

Thanks, L
Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3
We're all in that boat. The project has problems with its servers; you can check that on this page: https://setiathome.berkeley.edu/show_server_status.php. The Results ready to send are incredibly low, while all the other numbers are high, so something is wrong with the back-end, and we'll just have to wait until that's fixed.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> Although no new workunits are being split, I've been getting substantial numbers of replacement _2 tasks. That implies that the validators are working, and failing substantial numbers of matches (it may be that many of my _2s turn out to be overflows, and vanish in a flash).

Wow, that thread goes back a long way. And the issue is with the linguistic choice of the proper definition of "initial". In my previous example of WU https://setiathome.berkeley.edu/workunit.php?wuid=3756812713 it is apparent that the initial replication was 2, as evidenced by the sequential task numbers originally generated. Only when the second original was not returned in time did the replacement get generated, and then the IR number was bumped. So it looks like the code that generates the IR is ancient and has been discussed in antiquity.

So I take back my assertion that the recent AMD code change instantly created 3 sequential tasks in initial replication. Now just to wait and see if all of the recent changes have ANY effect on the database size.

Seti@Home classic workunits: 20,676
CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
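The "initial" versus "current" replication behaviour described above can be sketched as a counter that is bumped whenever a replacement task is issued. This is a hedged illustration with invented names, not the actual BOINC transitioner code: it only shows why the displayed figure reflects the workunit's current replication rather than the value it was created with.

```python
class Workunit:
    """Toy model: the page's replication figure tracks current tasks."""
    def __init__(self, initial_replication: int):
        # tasks _0 .. _(n-1) created at split time
        self.tasks = [f"_{i}" for i in range(initial_replication)]
        self.displayed_replication = initial_replication

    def issue_replacement(self) -> None:
        """A host timed out or validation failed: create task _N."""
        self.tasks.append(f"_{len(self.tasks)}")
        self.displayed_replication += 1  # the counter is bumped here

wu = Workunit(2)           # split with two tasks: _0 and _1
wu.issue_replacement()     # _1 not returned in time, so _2 is created

# The page would now show 3, even though the workunit started at 2.
print(wu.displayed_replication)  # 3
print(wu.tasks)                  # ['_0', '_1', '_2']
```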
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> Wow, that thread goes back a long way.

I have a dim memory, but this message board remembers everything. And it has a search tool!
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> Replica continues to climb in time behind master. Up to over 10hrs now

. . Same here, this machine was completely OOW (out of work) on the GPUs, so I had to cannibalise the CPU queue. But it is still "no tasks available" and will be OOW again very soon ....

Stephen
:(
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.