Data Chat

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004560 - Posted: 27 Jul 2019, 23:40:35 UTC - in response to Message 2004540.  
Last modified: 27 Jul 2019, 23:44:22 UTC

Already mentioned in the Lost "Ghost" Task Recovery Protocol thread updated a week ago.
https://setiathome.berkeley.edu/forum_thread.php?id=84176&postid=2003336
. . Wait for enough completed and reported tasks to decrease your work cache by at least 80 tasks so you have room for the resends.


. . Ah well I could have guessed someone would know about it before me, but it was still exciting to discover. And now it is known here too ...

. . The first sign I had was recovering 42 lost tasks, which for a short while made me think whoever made the change had a sense of whimsy. I quickly discarded that theory and decided it was the space limit in my cache allowance. So I waited till I had 120 completed tasks, then I got 80 so I knew what the actual limit is. And I like it !!!
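The two observations above (42 tasks recovered at first, then 80 once 120 tasks had been completed) fit a simple model in which the number of resends is capped by both the free room in the cache and a server-side limit. The sketch below is only an illustration of that reasoning, not the scheduler's actual logic; the function name and the ghost count of 500 are made up for the example.

```python
def tasks_resent(ghosts: int, free_cache_slots: int, resend_cap: int = 80) -> int:
    """Resends are limited by the ghosts available, the cache room, and the cap."""
    return max(0, min(ghosts, free_cache_slots, resend_cap))

GHOSTS = 500  # hypothetical number of lost tasks on the host

# With room for only 42 tasks, the cache room is the limiting factor.
print(tasks_resent(GHOSTS, free_cache_slots=42))

# With room for 120 tasks, the resend cap of 80 becomes the limiting factor.
print(tasks_resent(GHOSTS, free_cache_slots=120))
```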

Stephen

:)
ID: 2004560
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2004577 - Posted: 28 Jul 2019, 1:55:16 UTC

I did mention to Eric during my day at the lab, that people had found a way of using ghost recovery even though it was turned off, but that they were frustrated by the 20 limit.

I hope (though I've been on the road - no confirmation) that the uplift was as a result of that chat - so the procedure has Eric's tacit approval, too.
ID: 2004577
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2004578 - Posted: 28 Jul 2019, 2:01:50 UTC - in response to Message 2004577.  
Last modified: 28 Jul 2019, 2:19:05 UTC

I also bothered him directly as well... I can be quite a pest when required. :^)

Edit: Also for those wondering why automatic ghost recovery was turned off, you can thank computers like this one with currently 17,845 tasks "in progress" yet not a single valid one returned. Automatically re-sending ghosts would require the scheduler to check its task allotment list of 17,845 against the list actually present on the host whenever it requested work, i.e. every five minutes.

So, if it were automatic, three hundred machines as bad as this asking for work would be on average one per second. The scheduler would have time and bandwidth for little else.
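The arithmetic behind that estimate can be written out as a rough sketch. The figures are the ones quoted in this post (17,845 tasks, a work request every five minutes, three hundred such hosts); the function name and the idea of counting one lookup per listed task are assumptions made only for illustration.

```python
# Back-of-envelope estimate using the figures quoted in the post.
TASKS_IN_PROGRESS = 17_845   # tasks listed against the problem host
REQUEST_INTERVAL_S = 5 * 60  # each host asks for work every five minutes
BAD_HOSTS = 300              # number of similarly bad hosts in the example

def scheduler_load(hosts: int, interval_s: int, tasks_per_host: int) -> tuple[float, float]:
    """Return (ghost checks per second, task lookups per second)."""
    checks_per_s = hosts / interval_s
    lookups_per_s = checks_per_s * tasks_per_host
    return checks_per_s, lookups_per_s

checks_per_s, lookups_per_s = scheduler_load(BAD_HOSTS, REQUEST_INTERVAL_S, TASKS_IN_PROGRESS)
print(f"{checks_per_s:.1f} ghost checks per second")
print(f"{lookups_per_s:,.0f} task lookups per second")
```

Three hundred hosts spread over a 300-second interval average one full list comparison every second, each covering nearly 18,000 tasks.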
ID: 2004578
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2004582 - Posted: 28 Jul 2019, 2:17:12 UTC - in response to Message 2004578.  

I also bothered him directly as well... I can be quite a pest when required. :^)

Whoever bugged Eric, or rather got his attention, it is most appreciated to have the resend quota raised to 80 now. You can clear your "ghosts" much more quickly now. All for the benefit of the database.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2004582
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2004583 - Posted: 28 Jul 2019, 2:21:35 UTC - in response to Message 2004578.  
Last modified: 28 Jul 2019, 2:23:12 UTC

...computers like this one with currently 17,845 tasks "in progress" yet not a single valid one returned.
That is impressive.
And shows that the system for limiting & releasing work for problem hosts needs work.


So, if it were automatic, three hundred machines as bad as this asking for work would be on average one per second. The scheduler would have time and bandwidth for little else.

It would be good if the BOINC Manager had an option to "Check for ghosts".
Grant
Darwin NT
ID: 2004583
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2004585 - Posted: 28 Jul 2019, 2:27:27 UTC - in response to Message 2004578.  

So, if it were automatic, three hundred machines as bad as this asking for work would be on average one per second. The scheduler would have time and bandwidth for little else.

When I suggested doing it automatically, I was thinking of something less than fully automatic. Something like when we ask for a CPU benchmark: you manually ask to start the recovery protocol, and only then will it run, but the rest of the task is then completed automatically, without the several user interactions we need today.
ID: 2004585
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2004587 - Posted: 28 Jul 2019, 2:37:50 UTC - in response to Message 2004585.  

Well, if you can program it, I can submit it to the developers.
ID: 2004587
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2004590 - Posted: 28 Jul 2019, 2:42:51 UTC - in response to Message 2004587.  
Last modified: 28 Jul 2019, 2:49:14 UTC

Well, if you can program it, I can submit it to the developers.

Maybe if I had access to the old code Grant mentioned, I could try to modify it and make it run. I can't promise anything, but it is possible. My C++ programming is a little rusty.
ID: 2004590
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2004591 - Posted: 28 Jul 2019, 2:48:01 UTC - in response to Message 2004587.  

Well, if you can program it, I can submit it to the developers.
I'm thinking it would require Scheduler changes- When someone uses "Check for Ghosts" on the BOINC Manager, it sets a flag on the server for the Scheduler to run a check just for that particular host, then resets the flag afterwards so it doesn't keep doing it.
And I know they don't like making changes to the Scheduler (and my programming knowledge started and ended with dBase III+).
Grant
Darwin NT
ID: 2004591
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2004592 - Posted: 28 Jul 2019, 2:55:54 UTC

I don't think we can modify the server code. We know that the code is already there, but that it is deliberately turned off (for the reasons that Mr. Kevvy described).

So this would be a change in the client, plus a trigger in the Manager to set it going. And that needs to be a discreet enough trigger that people who need (and know how) to use it can find it, but it doesn't get fired off by every Tom, Dick, or Harriet under the sun.

Getting a lift to my last dinner on American soil in five minutes - I'll think about it and post more later.

Quick thought before I go - the key trigger is "report the same result twice". The easiest way I can think of to invoke that is "don't process ACKs for the next report".

And now dinner calls...
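The "don't process ACKs for the next report" idea can be pictured with a toy model of a client's report queue. This is only a sketch under stated assumptions: the class, its attribute names, and the one-shot `skip_next_ack` switch are all hypothetical and do not reflect the real BOINC client code.

```python
class ClientSketch:
    """Toy model of a client's report queue; not the real BOINC client."""

    def __init__(self) -> None:
        self.ready_to_report: list[str] = []
        self.skip_next_ack = False  # hypothetical one-shot switch set from the Manager

    def build_report(self) -> list[str]:
        """Results that would be included in the next scheduler request."""
        return list(self.ready_to_report)

    def process_acks(self, acked: list[str]) -> None:
        """Drop acknowledged results, unless this round of ACKs is being ignored."""
        if self.skip_next_ack:
            self.skip_next_ack = False  # later ACKs are handled normally again
            return
        self.ready_to_report = [r for r in self.ready_to_report if r not in acked]

client = ClientSketch()
client.ready_to_report = ["result_A"]
client.skip_next_ack = True

first_report = client.build_report()
client.process_acks(first_report)    # ignored, so result_A stays queued
second_report = client.build_report()
client.process_acks(second_report)   # handled normally, so result_A is dropped

print(first_report, second_report, client.build_report())
```

Because the first acknowledgement is skipped, the same result appears in two consecutive reports, which is the trigger described above.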
ID: 2004592
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2004593 - Posted: 28 Jul 2019, 2:57:57 UTC - in response to Message 2004590.  
Last modified: 28 Jul 2019, 3:00:14 UTC

Well, if you can program it, I can submit it to the developers.
Maybe if I had access to the old code Grant mentioned, I could try to modify it and make it run. I can't promise anything, but it is possible. My C++ programming is a little rusty.

On the client side, it'd be a case of adding a Check for Ghosts button on the BOINC Manager under the commands group.
Clicking that would result in a new tag <check_for_ghosts>0</check_for_ghosts> becoming <check_for_ghosts>1</check_for_ghosts> (to be reset to 0 after a successful Scheduler request).

The heavy lifting part would be with the Scheduler, to look for and act on that flag, for just the host that is requesting it.
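A minimal sketch of what that Scheduler-side handling might look like follows, assuming the server keeps a per-host set of tasks it believes are in progress. Everything here is hypothetical: the `HostRecord` structure, the `handle_request` function, and the task names are invented for illustration, and the default limit of 80 is simply the resend quota mentioned earlier in this thread.

```python
from dataclasses import dataclass, field

@dataclass
class HostRecord:
    """Server-side view of one host (hypothetical structure)."""
    in_progress: set[str] = field(default_factory=set)  # tasks the server thinks the host holds
    check_for_ghosts: bool = False                      # one-shot flag, set only on request

def handle_request(host: HostRecord, tasks_on_host: set[str],
                   flag_from_client: bool, resend_limit: int = 80) -> list[str]:
    """Return the ghost tasks to resend, but only when the host asked for a check."""
    if flag_from_client:
        host.check_for_ghosts = True
    if not host.check_for_ghosts:
        return []  # normal request: skip the expensive list comparison
    ghosts = sorted(host.in_progress - tasks_on_host)
    host.check_for_ghosts = False  # reset so the check does not repeat
    return ghosts[:resend_limit]

host = HostRecord(in_progress={"wu_1", "wu_2", "wu_3", "wu_4"})
print(handle_request(host, {"wu_1", "wu_3"}, flag_from_client=True))   # check requested
print(handle_request(host, {"wu_1", "wu_3"}, flag_from_client=False))  # ordinary request
```

The comparison only runs for the one host that set the flag, so ordinary work requests avoid the cost that led to automatic recovery being switched off.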



Edit- or as Richard suggests, to get the Manager to do on request what has to be done manually for the Ghost recovery protocol...
Grant
Darwin NT
ID: 2004593
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004595 - Posted: 28 Jul 2019, 3:23:56 UTC - in response to Message 2004577.  

I did mention to Eric during my day at the lab, that people had found a way of using ghost recovery even though it was turned off, but that they were frustrated by the 20 limit.

I hope (though I've been on the road - no confirmation) that the uplift was as a result of that chat - so the procedure has Eric's tacit approval, too.


. . Mr Kevvy posted a message in the ghost recovery thread that it had been increased to 40 and proved stable so was again increased, this time to 80. It has worked nicely for me.

. . I feel that it was highly likely your feedback to Eric was significant, so thanks for the effort.

Stephen

:)
ID: 2004595
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004596 - Posted: 28 Jul 2019, 3:28:00 UTC - in response to Message 2004583.  

That is impressive.
And shows that the system for limiting & releasing work for problem hosts needs work.
It would be good if the BOINC Manager had an option to "Check for ghosts".


. . +1

. . That would make it easier to keep tabs.

Stephen

. .
ID: 2004596
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004597 - Posted: 28 Jul 2019, 3:31:08 UTC - in response to Message 2004585.  

So, if it were automatic, three hundred machines as bad as this asking for work would be on average one per second. The scheduler would have time and bandwidth for little else.

When I suggested doing it automatically, I was thinking of something less than fully automatic. Something like when we ask for a CPU benchmark: you manually ask to start the recovery protocol, and only then will it run, but the rest of the task is then completed automatically, without the several user interactions we need today.


. . Yes, a user selectable 'option' to recover ghosted WUs would be a very good and friendly solution. I don't imagine (though I could be proven wrong) that machines like the one cited by Mr Kevvy would be using that option at any time.

Stephen

. .
ID: 2004597
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004598 - Posted: 28 Jul 2019, 3:33:51 UTC - in response to Message 2004591.  

Well, if you can program it, I can submit it to the developers.
I'm thinking it would require Scheduler changes- When someone uses "Check for Ghosts" on the BOINC Manager, it sets a flag on the server for the Scheduler to run a check just for that particular host, then resets the flag afterwards so it doesn't keep doing it.
And I know they don't like making changes to the Scheduler (and my programming knowledge started and ended with dBase III+).


. . Been there :)

Stephen

:)
ID: 2004598
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2004599 - Posted: 28 Jul 2019, 3:40:09 UTC - in response to Message 2004578.  

I also bothered him directly as well... I can be quite a pest when required. :^)
So, if it were automatic, three hundred machines as bad as this asking for work would be on average one per second. The scheduler would have time and bandwidth for little else.


. . Keep up the good work. :) And as Grant said, there is need for a process to identify hosts such as the one you cited and exterminate them 'bring them into line'.

Stephen

:)
ID: 2004599
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2004627 - Posted: 28 Jul 2019, 13:50:41 UTC - in response to Message 2004592.  

The key trigger is "report the same result twice". The easiest way I can think of to invoke that is "don't process ACKs for the next report".

And now dinner calls...
Well, the final dinner turned out to be 'on American water', rather than 'soil':



And a fine round-off to my trip it was too. Many thanks to Brian, our host here. Too busy attending to the fresh salmon to think about coding, but I still think the line quoted is the easiest. I'll have to close down soon to pack for the flight back tonight, but we can pick this up again after maintenance Tuesday, when I'm back home.
ID: 2004627
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2004931 - Posted: 30 Jul 2019, 23:48:58 UTC - in response to Message 2003807.  

Regarding the 2 terabytes of SSD storage for the upload server: unless things have changed in the last 4 years, the usable space on a 512 GB SSD is about 464 GB, so you are looking at about 1.856 TB of usable space. It will also be interesting to see how long they last, because they will get written to more than the average drive, unless they are industrial drives.
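The 1.856 TB figure follows if the 2 terabytes are made up of 512 GB drives, which works out to four of them. That drive count is an inference from the numbers in this post rather than something stated about the server, and the 464 GB usable figure is the one quoted above.

```python
DRIVE_NOMINAL_GB = 512   # nominal size of each SSD, as quoted
DRIVE_USABLE_GB = 464    # usable space per drive, as quoted
NOMINAL_TOTAL_GB = 2048  # "2 terabytes", taken here as 2048 GB (assumption)

drive_count = NOMINAL_TOTAL_GB // DRIVE_NOMINAL_GB
usable_total_gb = drive_count * DRIVE_USABLE_GB

print(f"{drive_count} drives, {usable_total_gb} GB usable ({usable_total_gb / 1000:.3f} TB)")
```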
ID: 2004931
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2004951 - Posted: 31 Jul 2019, 3:41:39 UTC

They gave us a sequence for blc35_2bit_guppi_58642_ but some of the bits are missing. There are gaps in the sequence of ending numbers. I'm not sure what this means. Is the data OK? Why are bits missing?
ID: 2004951
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2004968 - Posted: 31 Jul 2019, 5:49:14 UTC - in response to Message 2004951.  

They gave us a sequence for blc35_2bit_guppi_58642_ but some of the bits are missing. There are gaps in the sequence of ending numbers. I'm not sure what this means. Is the data OK? Why are bits missing?

A little confused about your comment. What is missing?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2004968


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.