The Server Issues / Outages Thread - Panic Mode On! (119)

juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045685 - Posted: 18 Apr 2020, 19:26:36 UTC - in response to Message 2045678.  
Last modified: 18 Apr 2020, 19:27:16 UTC

Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before due date.

edit: some of the early resends I got yesterday have a May 7 deadline. I should finish them today, but I hope it doesn't resend the resends before I do them.
My CPU is currently crunching stuff with May 8 deadlines and doing them in deadline order. If they start resending those early, they are just needlessly postponing their assimilation because my computer is returning them very soon.

Mine is now crunching tasks with May 23 deadlines and has work until mid-June, so if it's a waste of time, someone please let me know. At least I would save some electric power.
ID: 2045685 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2045690 - Posted: 18 Apr 2020, 19:54:18 UTC - in response to Message 2045574.  

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks


HURRAYYYYY!!!!
A proud member of the OFA (Old Farts Association).
ID: 2045690 · Report as offensive     Reply Quote
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 2045694 - Posted: 18 Apr 2020, 20:08:18 UTC

Wow:
18/04/2020 22:06:24 | SETI@home | Scheduler request completed: got 2 new tasks
18/04/2020 22:06:24 | SETI@home | Project requested delay of 1818 seconds
18/04/2020 22:06:26 | SETI@home | Started download of blc73_2bit_guppi_58693_10159_HIP99390_0147.20075.0.22.45.1.vlar
18/04/2020 22:06:26 | SETI@home | Started download of blc64_2bit_guppi_58838_14095_TIC249067445_0058.30016.818.20.29.227.vlar
18/04/2020 22:06:34 | SETI@home | Finished download of blc73_2bit_guppi_58693_10159_HIP99390_0147.20075.0.22.45.1.vlar
18/04/2020 22:06:34 | SETI@home | Finished download of blc64_2bit_guppi_58838_14095_TIC249067445_0058.30016.818.20.29.227.vlar
...and then also VLARs. :/
Aloha, Uli

ID: 2045694 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 2045707 - Posted: 18 Apr 2020, 23:02:38 UTC
Last modified: 18 Apr 2020, 23:07:25 UTC

Well, it does appear to be bringing the "Results returned and awaiting validation" and "Results out in the field" into line with each other.

I don't remember the actual values, but my Pendings appear to have almost halved (roughly 1500 to 840), my Valids are over ten times what they were (40 or so to almost 500), and my Inconclusives are almost 3 times higher (roughly 80 to 230).
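As a rough sanity check on those rounded figures, the ratios can be worked out directly. This is only a sketch using the approximate numbers quoted in the post, not values taken from the server.

```python
# Approximate before/after counts as quoted in the post (rounded by the poster).
before_after = {
    "Pendings": (1500, 840),
    "Valids": (40, 500),
    "Inconclusives": (80, 230),
}

# Ratio of the new count to the old count for each category.
ratios = {name: after / before for name, (before, after) in before_after.items()}

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

A ratio a little above 0.5 matches "almost halved", and the other two match "over ten times" and "almost 3 times".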
Grant
Darwin NT
ID: 2045707 · Report as offensive     Reply Quote
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 43
Credit: 100,336,791
RAC: 140
Message 2045719 - Posted: 19 Apr 2020, 1:06:21 UTC - in response to Message 2045707.  
Last modified: 19 Apr 2020, 1:08:06 UTC

Seems to me that the purpose in all of this is to reach quorum quicker on all the pending Workunits.
Once this is accomplished, the extra results not returned on those WUs will be cancelled by the server (if not started by the client yet).

This accomplishes the science quicker while disappointing those with large caches.
ID: 2045719 · Report as offensive     Reply Quote
Profile Ghan-buri-Ghan Mike

Joined: 27 Dec 15
Posts: 123
Credit: 92,602,985
RAC: 172
United States
Message 2045723 - Posted: 19 Apr 2020, 1:46:38 UTC - in response to Message 2045685.  

Sample a couple of the individual tasks by clicking on the task #. You will find that most are re-sends from users who haven't contacted the server since the day the task was sent to them. I got 55 CUDA32s sent to one of my older rigs today. Same story. Most of the users hadn't contacted the server since early March.
ID: 2045723 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045726 - Posted: 19 Apr 2020, 2:58:05 UTC - in response to Message 2045719.  

Seems to me that the purpose in all of this is to reach quorum quicker on all the pending Workunits.
Once this is accomplished, the extra results not returned on those WUs will be cancelled by the server (if not started by the client yet).

This accomplishes the science quicker while disappointing those with large caches.


. . The servers do not know if any task 'in progress' is actually started by the client or not. And the system will not terminate outstanding tasks until their deadline is reached, unless there is manual intervention.

Stephen

. .
ID: 2045726 · Report as offensive     Reply Quote
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 43
Credit: 100,336,791
RAC: 140
Message 2045727 - Posted: 19 Apr 2020, 3:11:21 UTC - in response to Message 2045726.  

I've had tasks cancelled on me on Rosetta before my client started on them. They hadn't expired.

On SETI, when results have started processing and go beyond their expiration, they're allowed to finish.
If they haven't started by expiration, they're cancelled. I don't know if that's decided on the client side or not.

Anyway, I think the server finds out which results are actually in progress when the client checks in. It can then cancel any of the unstarted results at that time.

True, if the client goes AWOL, then the solution is more difficult.
ID: 2045727 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045733 - Posted: 19 Apr 2020, 3:48:51 UTC - in response to Message 2045727.  

I've had tasks cancelled on me on Rosetta before my client started on them. They hadn't expired.

. . Are you sure? Rosetta has a 72 hour deadline.

On Seti, when results have started processing and go beyond their expiration, they're allowed to finish.
If they haven't started by expiration they're cancelled. Don't know if that's decided on the client-side or not.

. . When a deadline is reached, a task is expired and a resend is initiated. There does appear to be a 'period of grace' in which a late result will be accepted. I have completed a resent task only to have my result trashed because the original host finally returned a result sometime between my host receiving the resend and my result being returned. But this is a period of only a few hours at most.

Anyway, I think the server finds out which results are actually in progress when the client checks in. It can then cancel any of the unstarted results at that time.
True, if the client goes AWOL, then the solution is more difficult.


. . Nope, to the best of my knowledge there are only two parts to that transaction, the first is the reporting of already uploaded results and the second is the request for more work. The servers are not interested in what is currently running, just what is completed.

Stephen

. .
ID: 2045733 · Report as offensive     Reply Quote
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045746 - Posted: 19 Apr 2020, 7:15:26 UTC

It doesn't help when you only send one early logjam-breaker, but both original partners are MIA. WU 3926013786.

Server operators have two options available to them:
* Abort if unstarted
* Abort unconditionally

But neither will be used unless the human administrators call them into action - the servers won't do it by themselves.
ID: 2045746 · Report as offensive     Reply Quote
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2045748 - Posted: 19 Apr 2020, 8:08:18 UTC - in response to Message 2045733.  

. . Are you sure? Rosetta has a 72 hour deadline.

Yes, some of their work has a 72-hour deadline.
ID: 2045748 · Report as offensive     Reply Quote
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045749 - Posted: 19 Apr 2020, 8:29:11 UTC - in response to Message 2045746.  

Server operators have two options available to them:
* Abort if unstarted
* Abort unconditionally
The server doesn't know which tasks are unstarted and which are not. The client reports the tasks it has on each scheduler request, but it doesn't report their status. Unstarted, running, and completed but not yet reported tasks all look the same. If the server offers the operators the option to abort unstarted tasks, then the only way for that option to work is for the server to tell the client to do the aborting. That means the aborting won't happen if the client is MIA.
ID: 2045749 · Report as offensive     Reply Quote
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2045751 - Posted: 19 Apr 2020, 8:55:40 UTC
Last modified: 19 Apr 2020, 9:00:09 UTC

I have processed and returned the following work:
    19/04/2020 7:28:22 PM | SETI@home | Started download of 08ap11ah.7795.20108.7.34.160
    19/04/2020 7:28:22 PM | SETI@home | Started download of blc64_2bit_guppi_58838_31043_TIC66561343_0116.11277.409.20.29.229.vlar
    19/04/2020 7:28:25 PM | SETI@home | Finished download of 08ap11ah.7795.20108.7.34.160
    19/04/2020 7:28:26 PM | SETI@home | Finished download of blc64_2bit_guppi_58838_31043_TIC66561343_0116.11277.409.20.29.229.vlar


No surprise if you guessed that the task with the number 64 (blc64) was a noise bomb: you were right. It ran for 14.08 seconds.


ID: 2045751 · Report as offensive     Reply Quote
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045752 - Posted: 19 Apr 2020, 8:57:14 UTC - in response to Message 2045733.  

. . When a deadline is reached, a task is expired and a resend is initiated. There does appear to be a 'period of grace' in which a late result will be accepted. I have completed a resent task only to have my result trashed because the original host finally returned a result sometime between my host receiving the resend and my result being returned. But this is a period of only a few hours at most.
Expired results are accepted as long as the WU exists in the database. If the WU has been purged, then the returned result won't get credit but an error.

And when that happens, the resend that is still out won't get automatically aborted. It gets credit too when it is returned and the WU won't be purged as long as the resend hasn't been returned or timed out. If your result was trashed, then it wasn't because of this but for some other reason.
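The rule described here can be sketched in a few lines. This is a simplified illustration of the behaviour as stated in this post, not the actual server code; the function name, the set standing in for the database, and the workunit IDs are all made up for the example.

```python
def handle_late_result(workunit_id, workunits_in_db):
    """Decide what happens to a result reported after its deadline.

    workunits_in_db is a hypothetical stand-in for the server database:
    a set of workunit IDs that have not been purged yet.
    """
    if workunit_id in workunits_in_db:
        # The WU still exists, so the late result is accepted as usual.
        return "accepted"
    # The WU has already been purged: no credit, the result is an error.
    return "error"


# Made-up workunit IDs for illustration only.
db = {1001, 1002}
still_present = handle_late_result(1001, db)
already_purged = handle_late_result(9999, db)
print(still_present, already_purged)
```

Note that the deadline itself never appears in the check: in this sketch, only the presence of the WU in the database matters.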
ID: 2045752 · Report as offensive     Reply Quote
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045767 - Posted: 19 Apr 2020, 12:15:45 UTC - in response to Message 2045749.  

Server operators have two options available to them:
* Abort if unstarted
* Abort unconditionally
The server doesn't know which tasks are unstarted and which are not. The client reports the tasks it has on each scheduler request, but it doesn't report their status. Unstarted, running, and completed but not yet reported tasks all look the same. If the server offers the operators the option to abort unstarted tasks, then the only way for that option to work is for the server to tell the client to do the aborting. That means the aborting won't happen if the client is MIA.
Yes, that's right. The client has to initiate the conversation, and the server will send the 'abort if unstarted' message in the reply. Then it's up to the client to check if it has started, and act accordingly.

It happens quite a lot if you have a large cache, and you receive a resend of a task which has timed out. If you're running a standard client (without prioritisation of resends), and the original owner gets their report in a little bit late, then the server can send a "didn't need" conditional kill message. Rare to see it for SETI MB tasks - more common for AP, and at other projects.
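To make the exchange concrete, here is a minimal sketch of how a client might act on such a reply. It is an illustration under assumptions, not the real client code: the `Task` and `TaskState` types, the `apply_server_reply` function, and the task names are all hypothetical. The only point taken from the discussion is that the server just names tasks, and the client decides based on its own local state.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    UNSTARTED = "unstarted"
    RUNNING = "running"
    COMPLETED = "completed"  # finished locally but not yet reported


@dataclass
class Task:
    name: str
    state: TaskState


def apply_server_reply(tasks, abort_if_unstarted=(), abort_unconditionally=()):
    """Split the local cache into (kept, aborted) for one scheduler reply.

    The server cannot see task state, so the conditional abort is
    resolved here, on the client, using local knowledge.
    """
    conditional = set(abort_if_unstarted)
    unconditional = set(abort_unconditionally)
    kept, aborted = [], []
    for task in tasks:
        if task.name in unconditional:
            aborted.append(task)
        elif task.name in conditional and task.state is TaskState.UNSTARTED:
            aborted.append(task)
        else:
            kept.append(task)
    return kept, aborted


# Hypothetical cache: the server asks for two tasks to be aborted if unstarted.
cache = [
    Task("wu_a_1", TaskState.UNSTARTED),
    Task("wu_b_0", TaskState.RUNNING),
    Task("wu_c_2", TaskState.UNSTARTED),
]
kept, aborted = apply_server_reply(cache, abort_if_unstarted=["wu_a_1", "wu_b_0"])
print([t.name for t in aborted])  # ['wu_a_1']
print([t.name for t in kept])     # ['wu_b_0', 'wu_c_2']
```

In this sketch the running task survives the conditional abort, and nothing happens at all unless the client contacts the server first, which is why a host that has gone MIA never sees the message.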
ID: 2045767 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045769 - Posted: 19 Apr 2020, 12:23:24 UTC - in response to Message 2045767.  
Last modified: 19 Apr 2020, 12:23:46 UTC

Yes, that's right. The client has to initiate the conversation, and the server will send the 'abort if unstarted' message in the reply. Then it's up to the client to check if it has started, and act accordingly.
It happens quite a lot if you have a large cache, and you receive a resend of a task which has timed out. If you're running a standard client (without prioritisation of resends), and the original owner gets their report in a little bit late, then the server can send a "didn't need" conditional kill message. Rare to see it for SETI MB tasks - more common for AP, and at other projects.


. . I had never seen that message until lately; I have seen it several times since the March 30th Script SNAFU.

Stephen

:(
ID: 2045769 · Report as offensive     Reply Quote
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 2045815 - Posted: 19 Apr 2020, 13:53:41 UTC

Weird, this https://setiathome.berkeley.edu/show_host_detail.php?hostid=8823418 host has thousands of active tasks. It is returning results for tasks sent today while it has lots of tasks that were sent back in March and have (much) earlier deadlines. Does it have thousands of ghost tasks?

Tom
ID: 2045815 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2045820 - Posted: 19 Apr 2020, 14:06:30 UTC - in response to Message 2045815.  

Go about 500 into his valid list and you’ll see that he’s returning tasks from late March now. His client prioritizes resends and sends those back first. There have been a lot of resends going out in the last few days.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2045820 · Report as offensive     Reply Quote
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045821 - Posted: 19 Apr 2020, 14:06:57 UTC - in response to Message 2045815.  
Last modified: 19 Apr 2020, 14:42:12 UTC

Weird, this https://setiathome.berkeley.edu/show_host_detail.php?hostid=8823418 host has thousands of active tasks. It is returning results for tasks sent today while it has lots of tasks that were sent back in March and have (much) earlier deadlines. Does it have thousands of ghost tasks?

No ghosts or anything weird. It just uses a different client that crunches the resends first and then the rest in deadline order, not the regular FIFO order.
You can see that my host does the same. That helps the server by clearing the resends ASAP.
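For anyone wondering what that ordering looks like next to plain FIFO, here is a rough sketch. It is not the code of that client: the `CachedTask` fields, the `is_resend` flag, and the sample tasks and deadlines are assumptions made up for the illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CachedTask:
    name: str
    received_order: int  # order in which the task arrived (the FIFO key)
    deadline: date
    is_resend: bool      # assumed to be known to the client somehow


def fifo_order(tasks):
    """Regular order: oldest received task first."""
    return sorted(tasks, key=lambda t: t.received_order)


def resend_first_order(tasks):
    """Resends first, then everything by earliest deadline."""
    # False sorts before True, so "not is_resend" puts resends at the front.
    return sorted(tasks, key=lambda t: (not t.is_resend, t.deadline))


# Made-up cache contents for illustration.
tasks = [
    CachedTask("march_task", 1, date(2020, 5, 8), False),
    CachedTask("april_task", 2, date(2020, 5, 23), False),
    CachedTask("todays_resend", 3, date(2020, 6, 10), True),
]
fifo_names = [t.name for t in fifo_order(tasks)]
resend_names = [t.name for t in resend_first_order(tasks)]
print(fifo_names)    # ['march_task', 'april_task', 'todays_resend']
print(resend_names)  # ['todays_resend', 'march_task', 'april_task']
```

With this ordering, a resend received today is returned before March tasks that are still waiting in the cache, which is why the host's task list looks odd at first glance.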
ID: 2045821 · Report as offensive     Reply Quote
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2045835 - Posted: 19 Apr 2020, 15:42:29 UTC - in response to Message 2045707.  

Well, it does appear to be bringing the "Results returned and awaiting validation" and "Results out in the field" into line with each other.

I don't remember the actual values, but my Pendings appear to have almost halved (roughly 1500 to 840), my Valids are over ten times what they were (40 or so to almost 500), and my Inconclusives are almost 3 times higher (roughly 80 to 230).


If you use the custom zoom settings on https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathomev8_in_progress_validation.html and https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathomev8_creation.html, you can get some more meaningful insights for the past 48 hours or so.
This is a custom zoom from one of those pages.
You can see that the "Results returned and awaiting validation" is now slightly below "Results out in the field" for SETI@home v8 although AstroPulse results are still a bit behind, probably for different reasons.
ID: 2045835 · Report as offensive     Reply Quote


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.