The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before due date. My CPU is currently crunching stuff with May 8 deadlines and doing them in deadline order. If they start resending those early, they are just needlessly postponing their assimilation because my computer is returning them very soon. Mine is now crunching May 23 tasks and has work until mid-June, so if it's a waste of time, please someone let me know. At least I save some electric power. |
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
It's a miracle! HURRAYYYYY!!!! A proud member of the OFA (Old Farts Association). |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Wow:
18/04/2020 22:06:24 | SETI@home | Scheduler request completed: got 2 new tasks
18/04/2020 22:06:24 | SETI@home | Project requested delay of 1818 seconds
18/04/2020 22:06:26 | SETI@home | Started download of blc73_2bit_guppi_58693_10159_HIP99390_0147.20075.0.22.45.1.vlar
18/04/2020 22:06:26 | SETI@home | Started download of blc64_2bit_guppi_58838_14095_TIC249067445_0058.30016.818.20.29.227.vlar
18/04/2020 22:06:34 | SETI@home | Finished download of blc73_2bit_guppi_58693_10159_HIP99390_0147.20075.0.22.45.1.vlar
18/04/2020 22:06:34 | SETI@home | Finished download of blc64_2bit_guppi_58838_14095_TIC249067445_0058.30016.818.20.29.227.vlar
...and then also VLARs. :/
Aloha, Uli |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13913 Credit: 208,696,464 RAC: 304 |
Well, it does appear to be bringing the "Results returned and awaiting validation" and "Results out in the field" into line with each other. I don't remember the actual values, but my Pendings appear to have almost halved (roughly 1500 to 840), my Valids are over ten times what they were (40 or so to almost 500), and my Inconclusives are almost 3 times higher (roughly 80 to 230). Grant Darwin NT |
Wild6-NJ Joined: 4 Aug 99 Posts: 43 Credit: 100,336,791 RAC: 140 |
Seems to me that the purpose in all of this is to reach quorum quicker on all the pending Workunits. Once this is accomplished, the extra results not yet returned on those WUs will be cancelled by the server (if not started by the client yet). This accomplishes the science quicker while disappointing those with large caches. |
Joined: 27 Dec 15 Posts: 123 Credit: 92,602,985 RAC: 172 |
Sample a couple of the individual tasks by clicking on the task #. You will find that most are re-sends from users who haven't contacted the server since the day the task was sent to them. I got 55 CUDA32s sent to one of my older rigs today. Same story. Most of the users hadn't contacted the server since early March. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Seems to me that the purpose in all of this is to reach quorum quicker on all the pending Workunits. . . The servers do not know if any task 'in progress' is actually started by the client or not. And the system will not terminate outstanding tasks until their deadline is reached, unless there is manual intervention. Stephen . . |
Wild6-NJ Joined: 4 Aug 99 Posts: 43 Credit: 100,336,791 RAC: 140 |
I've had tasks cancelled on me on Rosetta before my client started on them. They hadn't expired. On SETI, when results have started processing and go beyond their expiration, they're allowed to finish. If they haven't started by expiration they're cancelled. Don't know if that's decided on the client side or not. Anyway, I think the server finds out which results are actually in progress when the client checks in. It can then cancel any of the unstarted results at that time. True, if the client goes AWOL, then the solution is more difficult. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I've had tasks cancelled on me on rosetta before my client started on them. They hadn't expired. . . Are you sure? Rosetta has a 72 hour deadline. On Seti, when results have started processing and go beyond their expiration, they're allowed to finish. . . When a deadline is reached a task is expired and a resend is initiated. There does appear to be a 'period of grace' in which a late result will be accepted. I have completed a resent task only to have my result trashed because the original host finally returned a result sometime between my host receiving the resend and my result being returned. But this is a period of only a few hours at most. Anyway, I think the server finds out which results are actually in progress when the client checks-in. It can then cancel any of the unstarted results at that time. . . Nope, to the best of my knowledge there are only two parts to that transaction: the first is the reporting of already uploaded results and the second is the request for more work. The servers are not interested in what is currently running, just what is completed. Stephen . . |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
It doesn't help when you only send one early logjam-breaker, but both original partners are MIA. WU 3926013786.
Server operators have two options available to them:
* Abort if unstarted
* Abort unconditionally
But neither will be used unless the human administrators call them into action - the servers won't do it by themselves. |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
. . Are you sure? Rosetta has a 72 hour deadline. Yes, some of their work has a 72 hour deadline. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Server operators have two options available to them: The server doesn't know which tasks are unstarted and which are not. The client reports the tasks it has on each scheduler request, but it doesn't report their status. Unstarted, running, and completed but not yet reported tasks all look the same. If the server offers the operators the option to abort unstarted tasks, then the only way for that option to work is for the server to tell the client to do the aborting. Which means the aborting won't happen if the client is MIA. |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
I have processed and returned the following week
19/04/2020 7:28:22 PM | SETI@home | Started download of blc64_2bit_guppi_58838_31043_TIC66561343_0116.11277.409.20.29.229.vlar
19/04/2020 7:28:25 PM | SETI@home | Finished download of 08ap11ah.7795.20108.7.34.160
19/04/2020 7:28:26 PM | SETI@home | Finished download of blc64_2bit_guppi_58838_31043_TIC66561343_0116.11277.409.20.29.229.vlar
|
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
. . When a deadline is reached a task is expired and a resend is initiated. There does appear to be 'period of grace' in which a late result will be accepted. I have completed a resent task only to have my result trashed because the original host finally returned a result sometime between my host receiving the resend and my result being returned. But this is a period of only a few hours at most. Expired results are accepted as long as the WU exists in the database. If the WU has been purged, then the returned result gets an error instead of credit. And when a late result is accepted, the resend that is still out won't get automatically aborted. It gets credit too when it is returned, and the WU won't be purged as long as the resend hasn't been returned or timed out. If your result was trashed, then it wasn't because of this but for some other reason. |
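As a rough illustration of the rule described in the post above (late results are accepted while the WU still exists, and a WU is not purged while a resend is still out), here is a minimal Python sketch. It is not the real server code or database schema: `WorkUnit`, `Database`, `report` and `purge` are hypothetical names, and resend timeouts are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    # Result ids that have been sent out and not yet reported (hypothetical model).
    outstanding: set = field(default_factory=set)
    # Result ids that have been reported back.
    returned: set = field(default_factory=set)


class Database:
    """Toy stand-in for the project database, keyed by WU id."""

    def __init__(self):
        self.workunits = {}

    def report(self, wu_id, result_id):
        """Accept a result regardless of its deadline, unless the WU is gone."""
        wu = self.workunits.get(wu_id)
        if wu is None:
            return "error"  # WU already purged: no credit
        wu.outstanding.discard(result_id)
        wu.returned.add(result_id)
        return "accepted"

    def purge(self, wu_id):
        """Purge a WU only when nothing is still out in the field."""
        wu = self.workunits.get(wu_id)
        if wu is None or wu.outstanding:
            return False
        del self.workunits[wu_id]
        return True
```

Note that `report` never looks at a deadline: in this model the only thing that turns a late result into an error is the WU having been purged first.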
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Server operators have two options available to them: The server doesn't know which tasks are unstarted and which are not. Client reports the tasks it has on scheduler request but it doesn't tell their status. Unstarted, running and completed but not yet reported tasks all look the same. If the server offers the operators the option to abort unstarted tasks, then the only way for that option to work is the server telling the client to do this aborting. Which means the aborting won't happen if the client is MIA. Yes, that's right. The client has to initiate the conversation, and the server will send the 'abort if unstarted' message in the reply. Then it's up to the client to check if it has started, and act accordingly. It happens quite a lot if you have a large cache, and you receive a resend of a task which has timed out. If you're running a standard client (without prioritisation of resends), and the original owner gets their report in a little bit late, then the server can send a "didn't need" conditional kill message. Rare to see it for SETI MB tasks - more common for AP, and at other projects. |
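The client-side decision described in the post above can be sketched roughly as follows. This is a minimal Python illustration, not the actual BOINC client: `AbortKind`, `Task` and `apply_abort_requests` are made-up names, and the scheduler reply is assumed to be already parsed into (task name, abort kind) pairs.

```python
from dataclasses import dataclass
from enum import Enum


class AbortKind(Enum):
    IF_UNSTARTED = "abort_if_unstarted"    # hypothetical label
    UNCONDITIONAL = "abort_unconditional"  # hypothetical label


@dataclass
class Task:
    name: str
    started: bool = False
    aborted: bool = False


def apply_abort_requests(tasks, requests):
    """Apply abort requests carried in a scheduler reply.

    tasks: dict mapping task name -> Task (the client's local cache)
    requests: iterable of (task name, AbortKind) pairs from the reply
    Returns the names of the tasks that were actually aborted.
    """
    aborted = []
    for name, kind in requests:
        task = tasks.get(name)
        if task is None or task.aborted:
            continue  # unknown or already aborted: nothing to do
        # Only the client knows whether a task has started, so the
        # conditional check has to happen here, not on the server.
        if kind is AbortKind.UNCONDITIONAL or not task.started:
            task.aborted = True
            aborted.append(name)
    return aborted
```

Because the requests only arrive in a reply to a scheduler request, a host that never contacts the server never runs this function, which is why neither option helps with hosts that are MIA.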
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, that's right. The client has to initiate the conversation, and the server will send the 'abort if unstarted' message in the reply. Then it's up to the client to check if it has started, and act accordingly. . . I had never seen that message until lately; I have seen it several times since the March 30th Script SNAFU. Stephen :( |
BetelgeuseFive Joined: 6 Jul 99 Posts: 158 Credit: 17,117,787 RAC: 19 |
Weird, this https://setiathome.berkeley.edu/show_host_detail.php?hostid=8823418 host has thousands of active tasks. It is returning results for tasks sent today while it has lots of tasks that were sent back in March and have (much) earlier deadlines. Does it have thousands of ghost tasks? Tom |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Go about 500 into his valid list and you’ll see that he’s returning tasks from late March now. His client prioritizes resends and sends those back first. There have been a lot of resends going out in the last few days. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Weird, this https://setiathome.berkeley.edu/show_host_detail.php?hostid=8823418 host has thousands of active tasks. It is returning results for tasks sent today while it has lots of tasks that were sent back in March and have (much) earlier deadlines. Does it have thousands of ghost tasks? No ghosts or anything weird. It just uses a different client that crunches the resends first and then the rest in deadline order, not the regular FIFO order. You can look at my host, which does the same. That helps the server by clearing the resends ASAP. |
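The ordering described in the post above (resends first, then everything else by deadline, instead of FIFO) can be sketched in a few lines of Python. This is only an illustration, not the modified client itself: `CachedTask` and `crunch_order` are hypothetical names, and `is_resend` is assumed to be known for each task.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CachedTask:
    name: str
    deadline: date
    is_resend: bool  # assumed to be known to the client (hypothetical flag)


def crunch_order(tasks):
    """Return tasks with resends first, each group sorted by earliest deadline."""
    # False sorts before True, so `not is_resend` puts resends at the front.
    return sorted(tasks, key=lambda t: (not t.is_resend, t.deadline))
```

With this key, a resend due in June still runs before an original task due in May, which matches a host returning freshly sent tasks while older ones with earlier deadlines wait in its cache.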
Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Well, it does appear to be bringing the "Results returned and awaiting validation" and "Results out in the field" into line with each other. If you use the custom zoom settings on https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathomev8_in_progress_validation.html and https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathomev8_creation.html, you can get some more meaningful insights for the past 48 hours or so. You can see that "Results returned and awaiting validation" is now slightly below "Results out in the field" for SETI@home v8, although AstroPulse results are still a bit behind, probably for different reasons. |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.