The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
Vurry odd. First, no tasks issued today, the 4th, have been validated. Second, all the tasks validated on the 3rd have been purged, but not those from the 2nd. https://setiathome.berkeley.edu/results.php?userid=8083616&offset=40&show_names=0&state=4&appid=29 |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Quoting myself: "When the reports start working reliably, the assimilation queue will probably hit a new all-time high. The queue has grown rapidly by about 600,000 workunits after each downtime and it is now higher than it has ever been just before that post-downtime growth." I was right. We are there now. The previous record was 4.25 million and now we are at 4.4 million and still going up... |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"First, no tasks issued today the 4th, have been validated." If you want others to be able to open your links, link the computer-specific page. User pages are not visible to other users. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Over 21 million results in the database. Why are the splitters still running? |
Joined: 1 Apr 13 Posts: 1859 Credit: 268,616,081 RAC: 1,349 |
"Over 21 million results in the database. Why are the splitters still running?" They aren't. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Over 21 million results in the database. Why are the splitters still running?" "They aren't." SSP says they are. Edit: Looking at the history now, apparently they were running continuously (or at least whenever the SSP was sampled) up to the 22:10 SSP update. At 22:20 the page didn't update, in the 22:30 and 22:40 updates they were stopped, but at 22:50 they were running again. |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
Quoting myself: Results awaiting validation as over ![]() |
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
"First, no tasks issued today the 4th, have been validated." "If you want others to be able to open your links, link the computer specific page. User pages are not visible to other users." Sorry, I did know that; put it down to a senior moment. https://setiathome.berkeley.edu/show_host_detail.php?hostid=8708959 |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Results awaiting validation as over" There are really only 5.5 million results waiting for validation. The remaining 9.7 million are validated results waiting for assimilation. They are shown together because SSP has no separate field for results waiting for assimilation. I got this 9.7 million by multiplying the 4.45 million workunits waiting for assimilation by the average replication of 2.19. And this 2.19 came from the results waiting for db purging divided by the workunits waiting for db purging. If I take that 9.7 million, add the 180,000 waiting for purging, and then divide this sum by the 5.5 million waiting for validation, I get 1.8, which should mean that there are 1.8 times as many validated results as there are results waiting for validation. If I calculate my own valid result count on the web site and divide it by the sum of the pending and inconclusive counts, I get 1.86. Pretty good match. |
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
I'll agree with that, if only because, in my computer's valid listing, at least 50%, and probably nearer 60%, of the tasks have been visible for 3 days or more, when the norm is that only those validated in the last 24 hours are still visible, i.e. they haven't moved on from validation to assimilation and then been purged 24 hours later. The purged total should be approximately equal to the number of tasks validated in the last 24 hours. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13958 Credit: 208,696,464 RAC: 304 |
"The system is having a hard time with assimilation." "I've been saying that for days, but no one listens, or says 'no, it's the RAM being swamped.'" We listen, then look at the facts, and the timing and order in which things occurred, and so disregard what you are saying. Fix the backlog of Results returned and awaiting validation, and then the Assimilators will be able to do their thing. The Results returned and awaiting validation blew out, then the Assimilator backlog came into existence. The Results returned and awaiting validation blew out even further, and the Assimilator backlog got even larger. Results returned and awaiting validation blew out further, and the Assimilator backlog grew even larger. Cause, effect. Cause, effect. The Effect doesn't cause the cause. Grant Darwin NT |
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
"The system is having a hard time with assimilation." "I've been saying that for days, but no one listens, or says 'no, it's the RAM being swamped.'" "We listen, then look at the facts, and the timing and order in which things occurred, and so disregard what you are saying. Fix the backlog of Results returned and awaiting validation, and then the Assimilators will be able to do their thing." But looking at the tasks on my computer, there are NO tasks awaiting Validation. There are 10 problem tasks where it states they have validated but the wingman hasn't reported, all 30th Jan. The problem is that they haven't been purged, and going by the SS numbers it looks like they are stuck in the Validation process and have not been Assimilated and forwarded to the purging process. The purging numbers for tasks should be equal to the number of tasks validated in the previous 24 hrs, just like it was on 15th Nov. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13958 Credit: 208,696,464 RAC: 304 |
"The problem is that they haven't been purged, and going by the SS numbers it looks like they are stuck in the Validation process, have not been Assimilated and forwarded to the purging process. The purging numbers for tasks should be equal to the number of tasks validated in the previous 24 hrs, just like it was on 15th Nov." Purging is set to occur 24 hours (or so) after a WU (and its result files) have been deleted. No files are Deleted until after the canonical result has been Assimilated. Assimilation cannot occur until the WU has been Validated (or declared dead). A WU cannot be Validated until Quorum has been reached (which may require anywhere from 2 to 10 results to achieve), or the WU is declared dead due to too many errors. There is nothing stuck in the Validation process - they are just WUs waiting for enough results to reach Quorum to Validate, or error out. Whichever occurs first. And the only Purge numbers on the SSP show what is waiting to be processed (from the SSP - Workunits waiting for db purging & Results waiting for db purging). That number will grow or shrink depending on how fast or slow Purging is occurring and how fast or slow WUs & Results are being Deleted. And that Deleted number will grow or shrink depending on how fast Deletion is occurring, and how fast or slow Assimilation is occurring. And all the rest of it i can't be bothered pointing out, as i know it's not making any impact on your misconceptions of how the system works. Suffice to say that expecting a certain value in the purge numbers for a given number of WUs that have been Validated - even though the Purge numbers come from the results of Validation - is doomed to failure, as they are not comparable in any way, shape or form. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13958 Credit: 208,696,464 RAC: 304 |
"But looking at the tasks on my computer, There are NO tasks awaiting Validation." I just looked at your Task list and there are 582 sitting there waiting for Validation. What do you think Validation Pending & Validation Inconclusive mean? What it does mean is that you have processed those WUs, you have returned a Result, but they are still waiting to be Validated. Grant Darwin NT |
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
"But looking at the tasks on my computer, There are NO tasks awaiting Validation." "I just looked at your Task list and there are 582 sitting there waiting for Validation." Did it occur to you to look at those tasks? In the pending list, in not one of them has the 1st wingman reported in as yet. In the Inconclusive list it is similar: each is still waiting for a wingman to report. So of course they haven't been validated. But in the Valid list there are approx 830 tasks that were validated before 09:00 4th March 2020, yesterday (that is the normal 24-hour period), that haven't been purged. These are the tasks causing the bloat. And I guess, as I cannot see the results of the Assimilation process or the Science database, that there are a large number of tasks listed in the Validators and Assimilators, and that there is something blocking the tasks moving out of Validation into Assimilation and also out of Assimilation. This indicates to me that the blockage is in Assimilation, in that tasks there are not clearing and are blocking further tasks from moving out of the Validators. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"The Results returned and awaiting validation blew out, then the Assimilator backlog came in to existence. The Results returned and awaiting validation blew out even further, and the Assimilator backlog got even larger. Results returned and awaiting validation blew out further, and the Assimilator backlog grew even larger." You choose to ignore the fact that results that are validated but waiting for assimilation do not have their number displayed separately on SSP but are included in the 'results waiting for validation' count. There is nothing unusual in the number of tasks that are really waiting for validation. There are now about 5.34 million results that have been returned but not validated yet. It's the 9.66 million results stuck in the assimilation queue that blow up the 'results waiting for validation' count on SSP. When there is a problem in validation, the 'Workunits waiting for validation' count will go up. It lists the workunits that are ready to be validated, i.e. all results returned, but have not been validated yet. That number is very small, and this indicates that validation is working fine. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13958 Credit: 208,696,464 RAC: 304 |
"But looking at the tasks on my computer, There are NO tasks awaiting Validation." "I just looked at your Task list and there are 582 sitting there waiting for Validation." "Did it occur for you to look at those tasks, in the pending list," Why would i? You stated, as i quoted above, and will quote yet again: "But looking at the tasks on my computer, There are NO tasks awaiting Validation." You now have 587 WUs that are waiting for Validation. "in not one of them, as yet, has the 1st wingman reported in. in the Inconclusive list it is similar it is still waiting a wingman to report. So of course they haven't been validated." So WTF would you claim to have no WUs waiting on Validation, if you agree you have WUs waiting on Validation??????? Seriously??????? "But in the Valid list there are approx 830 tasks that were validated before 09:00 4th March 2020, yesterday, (that is the normal 24 period) that haven't been purged." No, they are the symptom of the bloat. The bloat is, and continues to be, Results returned and awaiting validation. Everything else has followed on from that. Even blind Freddy looking at the graphs could see this, but you choose not to. "And I guess as I cannot see the results of the Assimilation process, or the Science database, that there are a large number of tasked listed in the Validators and Assimilators that there is something blocking the tasks moving out of Validation into Assimilation and also out of Assimilation." What is blocking the Tasks moving out of Validation is that they are still waiting to be Validated. The increased Quorum. Remember that? The major cause of the increase in Results returned and awaiting validation. They won't move out of Validation until they are Validated. There is nothing stopping them from moving out of that state, other than waiting for a result to be returned to provide the necessary Quorum. They won't Validate until that Quorum is met. 
Once they Validate, then they will move on to Assimilation, which isn't working as well as it should, due to the bloat in Results returned and awaiting validation. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13958 Credit: 208,696,464 RAC: 304 |
"When there is a problem in validation, the 'Workunits waiting for validation' count will go up. It lists the workunits that are ready to be validated i.e. all results returned, but have not been validated yet. That number is very small and this indicates that validation is working fine." Which i and several others have attempted to point out many times, but we keep being told there is a problem with Validation. We agree - there is no problem with Validation. There never was. There isn't an issue with Validation. Validation is not an issue. "You choose to ignore the fact that results that are validated but waiting for assimilation do not have their number displayed separately on SSP but are included in 'results waiting for validation' count." As to the rest of what you are saying, you need to read the bottom of the Server Status page again to understand what the actual Statuses mean. You are assuming & attributing things based on an incorrect understanding of the meanings of at least one (if not more) of the database status terms. Once a WU has been Validated, the results would no longer be in the Results returned and awaiting validation. But since there isn't a list specifically for Result files waiting for Assimilation, you decided that they must still be included in the Results returned and awaiting validation, even though by the very definition of that, it means they wouldn't be. The fact is they just aren't displayed on the Server Status page. Grant Darwin NT |
W-K 666 Joined: 18 May 99 Posts: 19707 Credit: 40,757,560 RAC: 67 |
Quick one: could the problems be in the transitioner? "Handles state transitions of workunits and results. Basically, the transitioners keep track of the results in progress and makes sure they properly move down the pipeline. It is always asking the questions: Is this workunit ready to send out? Has this result been received yet? Is this a valid result? Can we delete it now?" (quote from the SS page) |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Once a WU has been Validated, the results would no longer be in the the Results returned and awaiting validation. But since there isn't a list specifically for Result files waiting for Assimilation, you decided that it must still be included in in the the Results returned and awaiting validation, even though by the very definition of that, it means they wouldn't be." If they weren't listed at all, then the total number of results in the database would be over 30 million now. Eric specifically said that 20 mil is the limit they want to stay under to avoid the database rows spilling out of RAM. The sum of all the displayed result counts on SSP tracked this exact 20 mil until, after the last downtime, it increased to 21 mil, and it is now tracking that. Perhaps they discovered that 21 mil still fits in RAM and adjusted the splitter throttling. So it is obvious that every result is included in some of the displayed counts. And results waiting for validation is the only count big enough for those nearly 10 million unassimilated results to fit. They also fit there nearly perfectly, leaving the number of results that really are waiting for validation very near its normal historical value. Also look at your own results on the web site. Do you really have way more results in pending or inconclusive states than you normally had before the current problems started? |
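
The back-of-the-envelope estimate Ville Saari describes in the thread can be reproduced with a few lines of Python. This is only a sketch: the inputs are the rounded figures quoted in the posts (in millions), not live Server Status Page data, and all variable names are made up here for illustration. It also assumes, as Ville argues and Grant disputes, that results awaiting assimilation are counted inside the SSP "results returned and awaiting validation" figure.

```python
# Rounded figures quoted in the thread, in millions of rows.
# These are NOT live SSP values; names are hypothetical.
workunits_awaiting_assimilation = 4.45
average_replication = 2.19          # results per workunit, from the purge queue ratio
results_awaiting_validation = 5.5   # results really still waiting for a quorum
results_awaiting_purge = 0.18       # the "180,000 waiting for purging"
displayed_total = 21.0              # sum of all result counts shown on SSP

# Step 1: estimate how many results sit behind the assimilation backlog.
results_awaiting_assimilation = (
    workunits_awaiting_assimilation * average_replication
)

# Step 2: ratio of already-validated results to results still awaiting validation.
validated_to_pending_ratio = (
    results_awaiting_assimilation + results_awaiting_purge
) / results_awaiting_validation

# Step 3: what the database total would be if the assimilation backlog
# were NOT already included in any displayed count.
total_if_not_listed = displayed_total + results_awaiting_assimilation

print(f"results awaiting assimilation: {results_awaiting_assimilation:.2f} million")
print(f"validated / pending ratio:     {validated_to_pending_ratio:.2f}")
print(f"total if not listed anywhere:  {total_if_not_listed:.2f} million")
```

The ratio from step 2 is the number Ville compares against the 1.86 he gets from his own valid, pending and inconclusive counts; step 3 is the figure behind the "over 30 million" remark.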
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.