The Server Issues / Outages Thread - Panic Mode On! (119)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13771 · Credit: 208,696,464 · RAC: 304
> I'm sure the guys don't want to touch these things anymore, but maybe someone should take a look at it and see if there's something they can do to get it to move towards recovery.

Set deadlines for all new work (including AP) to 2 weeks, and set resend deadlines to 3 days. Within a couple of weeks the bloat should be significantly reduced. Within a month, a huge dent. Enough for the Assimilators to do their thing again, at the very least.

Grant
Darwin NT
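A minimal sketch of the deadline policy suggested above, assuming each resend goes out the moment the previous copy times out and ignoring time spent queued as "ready to send". The constants and function names are hypothetical illustrations, not actual SETI@home or BOINC settings.

```python
from datetime import datetime, timedelta

# Hypothetical policy from the suggestion above:
# all new work (including AstroPulse) gets 2 weeks, resends get 3 days.
NEW_WORK_DEADLINE = timedelta(weeks=2)
RESEND_DEADLINE = timedelta(days=3)


def report_deadline(sent_at: datetime, is_resend: bool) -> datetime:
    """Deadline for a task sent at `sent_at` under the proposed policy."""
    return sent_at + (RESEND_DEADLINE if is_resend else NEW_WORK_DEADLINE)


def worst_case_wait(resends: int) -> timedelta:
    """Longest a workunit can sit waiting on unresponsive hosts: one
    new-work deadline plus one short deadline per resend (assumes every
    resend is issued as soon as the previous copy times out)."""
    return NEW_WORK_DEADLINE + resends * RESEND_DEADLINE


if __name__ == "__main__":
    sent = datetime(2020, 3, 1)  # arbitrary example date
    print(report_deadline(sent, is_resend=False))  # 2020-03-15 00:00:00
    print(report_deadline(sent, is_resend=True))   # 2020-03-04 00:00:00
    print(worst_case_wait(2))                      # 20 days, 0:00:00
```

Under these assumptions even a workunit that needs two resends is out of the hosts' hands within 20 days, which is the intuition behind "within a couple of weeks the bloat should be significantly reduced".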
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13771 · Credit: 208,696,464 · RAC: 304
Replica seconds behind master: 66,057

It's reached a new record, and is now setting the bar as high as it can.

Grant
Darwin NT
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> Maybe when you posted you did something, because the splitters are currently running at over 94 a second. It is certainly better than the 3 point something that they were running at. At most, before hibernation we can only have another 2 weekly outages, assuming that they decide to do maintenance.

Splitters start and stop as needed to keep the result table below 21 million rows. When they are running, you get a high result generation rate. When they are not running, you get a low rate from resends only. If they start or stop during the time window that the SSP rate data is gathered from, you get something in between.
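A simplified model of the throttling described above, assuming a plain on/off threshold (the real splitter control may use hysteresis or other conditions). The 94/s figure comes from the quoted post; the 3/s resend-only rate is an illustrative stand-in for "3 point something", and both function names are hypothetical.

```python
RESULT_ROW_LIMIT = 21_000_000  # result-table ceiling quoted in the post


def splitters_should_run(result_rows: int, limit: int = RESULT_ROW_LIMIT) -> bool:
    """Simplified on/off rule: split only while the table is below the limit."""
    return result_rows < limit


def window_average_rate(running_fraction: float,
                        running_rate: float,
                        idle_rate: float) -> float:
    """Rate a status page would report if the splitters ran for
    `running_fraction` of its sampling window and only resends flowed
    for the rest of it."""
    if not 0.0 <= running_fraction <= 1.0:
        raise ValueError("running_fraction must be between 0 and 1")
    return running_fraction * running_rate + (1.0 - running_fraction) * idle_rate


if __name__ == "__main__":
    print(splitters_should_run(20_900_000))  # True
    print(splitters_should_run(21_000_000))  # False
    for f in (0.0, 0.5, 1.0):
        print(f, window_average_rate(f, 94.0, 3.0))
```

Any reading between the two extremes (48.5/s for a half-and-half window, for instance) reflects where the start or stop fell inside the sampling window, not a slow splitter.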
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> Replica seconds behind master: 66,057
>
> It's reached a new record, and is now setting the bar as high as it can.

Not a new record yet. I have seen it way above 100,000 seconds in the past.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13771 · Credit: 208,696,464 · RAC: 304
> Not a new record yet. I have seen it way above 100,000 seconds in the past.

Ah, the current graphs don't go that far back.

Give it time, it's not far from that now.

Grant
Darwin NT
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
Replica seconds behind master: 81,909 and rising. Getting new work is almost a lottery, no stats are being generated, and upload retries are the new normal. Are you sure it's not time to press the panic button? Maybe after breakfast? You know we have a new toaster to debut.
Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3
I promised myself I would stay out of this and just enjoy it with cola and nuts, but I'm out of nuts... so...

> A result is a task, and a task is a result. Two words for the same thing. If there are two of them in the field, they are both tasks, and they are both results. There are, however, two tasks/results (at least) for each WU.

At this project: one workunit == two identical tasks, sent to different hosts.

At some point in the past, BOINC changed things so that what the computer returns is a result file. So: one workunit == two identical tasks, sent to different hosts, which calculate the data therein and each send a result file back. It's easiest to remember it that way: you have to do something to a task before you get a result.

The SSP here doesn't show this because the SSP code here hasn't been changed in absolute ages. https://github.com/BOINC/boinc/blob/master/html/user/server_status.php has:

```php
// Excerpt only: tra(), start_table(), item_html() and end_table()
// are helper functions defined elsewhere in BOINC's web code.
echo "</td><td>\n";
echo "<h3>".tra("Computing status")."</h3>\n";
echo "<h4>".tra("Work")."</h4>\n";
start_table('table-striped');
item_html("Tasks ready to send", $j->results_ready_to_send);
item_html("Tasks in progress", $j->results_in_progress);
item_html("Workunits waiting for validation", $j->wus_need_validate);
item_html("Workunits waiting for assimilation", $j->wus_need_assimilate);
item_html("Workunits waiting for file deletion", $j->wus_need_file_delete);
item_html("Tasks waiting for file deletion", $j->results_need_file_delete);
item_html("Transitioner backlog (hours)", number_format($j->transitioner_backlog, 2));
end_table();
echo "<h4>".tra("Users")."</h4>\n";
start_table('table-striped');
item_html("With credit", $j->users_with_credit);
item_html("With recent credit", $j->users_with_recent_credit);
item_html("Registered in past 24 hours", $j->users_past_24_hours);
end_table();
echo "<h4>".tra("Computers")."</h4>\n";
start_table('table-striped');
item_html("With credit", $j->hosts_with_credit);
item_html("With recent credit", $j->hosts_with_recent_credit);
item_html("Registered in past 24 hours", $j->hosts_past_24_hours);
item_html("Current GigaFLOPS", round($j->flops, 2));
end_table();
```
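To make the "one workunit, two tasks, each returning a result file" relationship concrete, here is a toy data model. The class and field names are hypothetical and far simpler than BOINC's real `workunit` and `result` tables; the replication of 2 is the value described for this project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """One copy of a workunit sent to one host (stored as a 'result' row)."""
    host_id: int
    result_file: Optional[str] = None  # filled in once the host reports back


@dataclass
class Workunit:
    name: str
    replication: int = 2  # two identical tasks per workunit at this project
    tasks: List[Task] = field(default_factory=list)

    def issue_tasks(self, host_ids: List[int]) -> None:
        """Create one task per host; the hosts must all be different."""
        if len(host_ids) != self.replication or len(set(host_ids)) != len(host_ids):
            raise ValueError(f"need exactly {self.replication} distinct hosts")
        self.tasks = [Task(host_id=h) for h in host_ids]

    def results_returned(self) -> int:
        """Tasks that have turned into results (i.e. sent a file back)."""
        return sum(t.result_file is not None for t in self.tasks)


if __name__ == "__main__":
    wu = Workunit(name="example_wu")  # hypothetical name
    wu.issue_tasks([101, 202])
    wu.tasks[0].result_file = "example_wu_0_result"
    print(len(wu.tasks), wu.results_returned())  # 2 1
```

The same row is a "task" while it is out on a host and a "result" once something has been computed and returned, which is why both words keep appearing for it.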
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14656 · Credit: 200,643,578 · RAC: 874
To accompany your snack: *Manager: use "task" rather than "result" in text*
Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3
Yes, because at first it was all called results. If you check the database files, you'll find they still store entries as "result", whether they're tasks or results. You'll also find that SETI has its own entries in various database files. I'm waiting for Windows 10 to finish indexing all the files on my computer so I can search inside them (why isn't this done by default?) before I continue my search.
W-K 666 · Joined: 18 May 99 · Posts: 19161 · Credit: 40,757,560 · RAC: 67
The replica has passed a milestone, now over a day behind: 87,830 s. A day is 86,400 s.
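The arithmetic behind the milestone, as a small helper that converts a "Replica seconds behind master" figure into days, hours, minutes and seconds. `format_lag` is a hypothetical name, not something the server status page provides.

```python
SECONDS_PER_DAY = 86_400  # 24 * 60 * 60


def format_lag(seconds: int) -> str:
    """Render a replica lag in seconds as 'Nd HHh MMm SSs'."""
    days, rem = divmod(seconds, SECONDS_PER_DAY)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    for lag in (66_057, 87_830, 100_000):
        print(f"{lag:,} s -> {format_lag(lag)}")
    # 66,057 s -> 0d 18h 20m 57s
    # 87,830 s -> 1d 00h 23m 50s
    # 100,000 s -> 1d 03h 46m 40s
```

By the same conversion, a lag of 1,000,000 seconds would be 11d 13h 46m 40s.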
Speedy · Joined: 26 Jun 04 · Posts: 1643 · Credit: 12,921,799 · RAC: 89
> The replica has passed a milestone, now over a day behind, 87,830 s. A day is 86,400 s.

I also see that results out in the field have dropped by around 200,000; there are now around 5.8 million.
Joined: 6 Jun 99 · Posts: 233 · Credit: 200,655,462 · RAC: 212
I think we may see a replica lag of 1,000,000 seconds or more, especially if they forgo maintenance until THE END.

Member of the 20 Year Club
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> To accompany your snack: Manager: use "task" rather than "result" in text

A somewhat illogical change, because in that place the word actually refers to the result produced by a completed task. It is the returned results you get credit for, not the tasks.
Ian&Steve C. · Joined: 28 Sep 99 · Posts: 4267 · Credit: 1,282,604,591 · RAC: 6,640
> The replica has passed a milestone, now over a day behind, 87,830 s. A day is 86,400 s.

Mostly due to the splitter output sputtering along; it's not pumping out nearly as many as it was yesterday.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> I also see that results out in the field have dropped by around 200,000; there are now around 5.8 million.
>
> Mostly due to the splitter output sputtering along; it's not pumping out nearly as many as it was yesterday.

The total result count is still holding steady very close to 21 million, so there is no problem in the splitting process. The splitters are just being throttled because the assimilator queue is hogging all the database space.
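A back-of-the-envelope sketch of why a growing assimilator queue throttles the splitters, assuming the splitters can only fill whatever remains under the 21 million result-row ceiling and that each workunit awaiting assimilation still holds at least two result rows. The function name and the sample numbers are hypothetical, not figures from the status page.

```python
RESULT_ROW_LIMIT = 21_000_000  # result-table ceiling discussed in the thread


def rows_left_for_new_work(wus_waiting_assimilation: int,
                           other_result_rows: int,
                           results_per_wu: int = 2,
                           limit: int = RESULT_ROW_LIMIT) -> int:
    """Result rows the splitters may still create.

    `results_per_wu` defaults to 2 (the initial replication); workunits that
    needed resends hold more rows, so this is an optimistic estimate.
    `other_result_rows` lumps together everything else (in the field,
    waiting for validation, waiting for deletion, ...).
    """
    used = wus_waiting_assimilation * results_per_wu + other_result_rows
    return max(0, limit - used)


if __name__ == "__main__":
    # Hypothetical snapshots: same 'other' rows, growing assimilation queue.
    for backlog in (3_000_000, 5_000_000, 7_000_000):
        print(backlog, rows_left_for_new_work(backlog, 7_000_000))
    # 3000000 8000000
    # 5000000 4000000
    # 7000000 0
```

Each extra workunit stuck waiting for assimilation removes at least two rows of headroom, so the total stays pinned near the ceiling while the room left for freshly split work shrinks.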
Joined: 6 Nov 99 · Posts: 716 · Credit: 8,032,827 · RAC: 62
The outage makes the server crazy ^^

3839686335: the initial quorum of 2 was filled after the wingman's task deadline, but the server wasn't programmed for this case...
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
Replica seconds behind master: 94,995. Will the 100K mark be what sets panic mode to ON?
Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3
Nice, an example that has all of them in the correct order:

```cpp
double rsc_disk_bound;
    // upper bound on amount of disk needed (bytes)
    // (including input, output and temp files, but NOT the app)
    // used for 2 purposes:
    // 1) for scheduling (don't send this WU to a host w/ insuff. disk)
    // 2) abort task if it uses more than this disk
bool need_validate;
    // this WU has at least 1 successful result in
    // validate state = INIT
```

Lines 458 and further in https://github.com/BOINC/boinc/blob/master/db/boinc_db_types.h
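The struct comment lists two uses for `rsc_disk_bound`. A minimal Python sketch of both checks follows; the function names are hypothetical, and the real scheduler and client weigh much more than this (disk-usage preferences, other queued tasks, and so on).

```python
from dataclasses import dataclass


@dataclass
class WorkunitInfo:
    # Mirrors two fields from boinc_db_types.h, nothing more.
    rsc_disk_bound: float  # bytes: input + output + temp files, excluding the app
    need_validate: bool    # at least one successful result awaits validation


def can_send_to_host(wu: WorkunitInfo, host_free_disk: float) -> bool:
    """Purpose 1: don't send this WU to a host with insufficient disk."""
    return host_free_disk >= wu.rsc_disk_bound


def should_abort(wu: WorkunitInfo, disk_used: float) -> bool:
    """Purpose 2: abort the task if it uses more disk than the bound."""
    return disk_used > wu.rsc_disk_bound


if __name__ == "__main__":
    wu = WorkunitInfo(rsc_disk_bound=33_554_432, need_validate=False)  # 32 MiB, arbitrary
    print(can_send_to_host(wu, host_free_disk=1e9))  # True
    print(can_send_to_host(wu, host_free_disk=1e6))  # False
    print(should_abort(wu, disk_used=40_000_000))    # True
```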
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> The outage makes the server crazy ^^

That's normal for the BOINC server, not any extra craziness due to the outage or anything. You can return your result after the deadline and still get the credit as long as the workunit is still in the database, even when it has already been assimilated and is waiting to be deleted. Returning the expired result will change its status from error to valid.
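A toy model of the behaviour described above: a result reported after its deadline is still checked and credited as long as its workunit row exists, even if the workunit has already been assimilated and is only waiting for deletion. The names and the boolean inputs are hypothetical simplifications, not BOINC's actual transitioner or validator logic.

```python
from enum import Enum
from typing import Tuple


class Outcome(Enum):
    TIMED_OUT = "error (no reply by deadline)"
    VALID = "valid"
    INVALID = "invalid"


def handle_late_report(wu_still_in_db: bool,
                       matches_canonical: bool) -> Tuple[Outcome, bool]:
    """Return (new outcome, credit granted) for a result reported after its
    deadline, whose status had already been set to TIMED_OUT."""
    if not wu_still_in_db:
        # In this model: workunit purged, so there is nothing to compare against.
        return Outcome.TIMED_OUT, False
    if matches_canonical:
        # The status flips from error to valid and credit is granted.
        return Outcome.VALID, True
    return Outcome.INVALID, False


if __name__ == "__main__":
    print(handle_late_report(wu_still_in_db=True, matches_canonical=True))
    print(handle_late_report(wu_still_in_db=False, matches_canonical=True))
```

The key input is whether the workunit still exists in the database, not whether the deadline has passed, which is why a late quorum like the one reported above is ordinary server behaviour.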
Speedy · Joined: 26 Jun 04 · Posts: 1643 · Credit: 12,921,799 · RAC: 89
It will be interesting to see whether the 5.87 million results out in the field is just an interim situation, or whether it will help clear some backlogs as people move to other projects. I also wonder whether turning the replica database off for a week would help things, letting it catch up while no new work is being sent out. On the other hand, as other people have mentioned, there's not long to go until the project is shut down for hibernation.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.