The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14686 Credit: 200,643,578 RAC: 874 |
Exactly. They seem to be using brute force, rather than nuanced sophistication. Let's see if we can be more sophisticated than them, and help them out by returning any resends as quickly as possible. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Also, what I have read in the Boinc source suggests that Boinc has a mechanism where tasks can be flagged to be sent to 'reliable' hosts only. Reliable meaning a host that doesn't produce a lot of errors or invalids and doesn't have a long turnaround time. That's exactly what should be used for these 'presends' but doesn't seem to be used. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Richard Haselgrove wrote: "and help them out by returning any resends as quickly as possible." I will complain of course, but you need to test whatever bad liquor you are drinking. Yesterday you asked for exactly the inverse. LOL "I'd also say it's time to switch off the 'process resends first' option -" Anyway, the cache level is at 2.8k. About half a day to complete all the tasks, mainly because all that are left are VLARs, mostly Arecibo. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Ville Saari wrote: "Also, what I have read in the Boinc source suggests that Boinc has a mechanism where tasks can be flagged to be sent to 'reliable' hosts only. Reliable meaning a host that doesn't produce a lot of errors or invalids and doesn't have a long turnaround time. That's exactly what should be used for these 'presends' but doesn't seem to be used." Someone will claim this is an elitist measure, and the fire on the thread will restart. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14686 Credit: 200,643,578 RAC: 874 |
Richard Haselgrove wrote: "and help them out by returning any resends as quickly as possible." juan BFP wrote: "I will complain of course, but you need to test whatever bad liquor you are drinking." LOL. Yesterday's remark was aimed at (a limited number of) bunkerers. Today's remark was aimed at the broader population, who don't have a choice because they've only got resends - our first-run tasks ran out three weeks ago! I had some white wine last night, but this morning I've only touched coffee. Hic. |
Sirius B Joined: 26 Dec 00 Posts: 24922 Credit: 3,081,182 RAC: 7 |
Ville Saari wrote: "Also, what I have read in the Boinc source suggests that Boinc has a mechanism where tasks can be flagged to be sent to 'reliable' hosts only. Reliable meaning a host that doesn't produce a lot of errors or invalids and doesn't have a long turnaround time. That's exactly what should be used for these 'presends' but doesn't seem to be used." In the past, that may have been true, but not now. I stopped crunching on the 10th after the last initial task completed. My thoughts were & still are, let the fast hosts get the resends so that the project can get completed. My only fear is that if/when Seti returns, it'll be part of Science United rather than Boinc. :-( |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
I had to turn that option off in my client at the end of March because I had collected enough tasks to reach the deadline limit. Processing resends with deadlines beyond the choke point first would have made me miss some deadlines. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14686 Credit: 200,643,578 RAC: 874 |
Ville Saari wrote: "Also, what I have read in the Boinc source suggests that Boinc has a mechanism where tasks can be flagged to be sent to 'reliable' hosts only. Reliable meaning a host that doesn't produce a lot of errors or invalids and doesn't have a long turnaround time. That's exactly what should be used for these 'presends' but doesn't seem to be used." juan BFP wrote: "Someone will claim this is an elitist measure, and the fire on the thread will restart." I think that our general experience, over the years since Credit New was introduced, is that the 'reliable host' flag is itself unreliable, and too often allows new work to be sent to hosts who haven't yet returned the old work - or have only returned a small proportion of it. We had a sticky thread about people sending PMs to the owners of unreliable hosts, until recently. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Richard Haselgrove wrote: "I've only touched coffee. Hic." Coffee + aspirins for me too. My head hurts. I have no idea why it hurts each morning. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Sirius B wrote: "My only fear is that if/when Seti returns, it'll be part of Science United rather than Boinc. :-(" SU isn't an alternative to Boinc. It's an alternative to account managers. Projects that SU makes your host crunch are normal Boinc projects you can attach 'manually' too. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Ville Saari wrote: "Also, what I have read in the Boinc source suggests that Boinc has a mechanism where tasks can be flagged to be sent to 'reliable' hosts only. Reliable meaning a host that doesn't produce a lot of errors or invalids and doesn't have a long turnaround time. That's exactly what should be used for these 'presends' but doesn't seem to be used." juan BFP wrote: "Someone will claim this is an elitist measure, and the fire on the thread will restart." Richard Haselgrove wrote: "I think that our general experience, over the years since Credit New was introduced, is that the 'reliable host' flag is itself unreliable, and too often allows new work to be sent to hosts who haven't yet returned the old work - or have only returned a small proportion of it. We had a sticky thread about people sending PMs to the owners of unreliable hosts, until recently." Maybe it would be wise in these last days to lower the WU limit a lot more while continuing to send the resends, to something like 15 per device instead of the current 150. That would allow a user to pick up a small batch of files, return them and pick up more, instead of a large batch (like the 300 WUs some received) that takes days or weeks to crunch on regular hosts. By doing this the data would be crunched a lot faster, since the WUs would be spread across a lot more hosts and no WU would rest for a long time on any host. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Richard Haselgrove wrote: "I think that our general experience, over the years since Credit New was introduced, is that the 'reliable host' flag is itself unreliable, and too often allows new work to be sent to hosts who haven't yet returned the old work - or have only returned a small proportion of it." How can the option prove itself unreliable if it hasn't been used? |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Sirius B wrote: "My only fear is that if/when Seti returns, it'll be part of Science United rather than Boinc. :-(" <Panic mode ON> Hope NOT, or we will be definitely doomed. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
juan BFP wrote: "Maybe it would be wise in these last days to lower the WU limit a lot more while continuing to send the resends, to something like 15 per device instead of the current 150. That would allow a user to pick up a small batch of files, return them and pick up more, instead of a large batch (like the 300 WUs some received) that takes days or weeks to crunch on regular hosts. By doing this the data would be crunched a lot faster, since the WUs would be spread across a lot more hosts and no WU would rest for a long time on any host." The main problem is that their script is sending the resends in big bunches and the scheduler request cooldown is 30 minutes. So some 'lucky' hosts will receive a ridiculous amount and the majority gets nothing. They should remove this raffle by making the scheduler buffer a lot smaller, so that a host can get only a few tasks per request and everyone would get a little. If everyone has a little instead of just the lucky ones having a lot, then a much bigger proportion of the setiathome distributed supercomputer capacity would be in use. I'm sure they do have statistics about how many hosts are contacting the server on average in the time between two script runs. If they limit the number of tasks one scheduler request can receive to the number of tasks one script run resends divided by the number of scheduler requests happening between two runs, then everyone would get something and every task would still be taken by someone. If they just reduced the server side limit for max tasks in cache to some small number, then "spoofers" would get most of the stuff. Good for me and my team but not good in general. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Ville Saari wrote: "If they just reduced the server side limit for max tasks in cache to some small number, then 'spoofers' would get most of the stuff. Good for me and my team but not good in general." It's hard for any spoofer to build even a small cache by downloading 15 WUs every 30 minutes, since most of us have fast hosts that crunch those 15 WUs in less than 30 minutes. That is why I suggest a very small number of WUs (maybe 2 for each CPU and 10 for each GPU instead), so the supercomputer could activate all its nodes (hosts) and finish the job ASAP. The idea is to distribute the work to all possible hosts at the same time. As each host returns its work, a new set could be sent to it. |
Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
@Richard H Is there an option in the BOINC code to cancel unstarted tasks which have already met quorum? I'm fairly sure that I have seen this used before on some other projects, possibly LHC. This would help some of the owners of machines with very large, multi-day caches to continue to reduce the backlog quicker, while not needing to process unnecessary work. I have a few examples here: https://setiathome.berkeley.edu/results.php?hostid=8800543&offset=0&show_names=1&state=0&appid= from 31/3. There are only 7, so they are quite easy to see. |
Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Thanks, I thought I had seen it before. I think there may be over 1 million tasks in this condition now: "Results waiting for db purging" out of the total "Results returned and awaiting validation". I'm sure that value has been increasing significantly over the last few days, but I can't find a Munin graph for it. https://munin.kiska.pw/munin/setiathome-day.html |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Is there an option in BOINC code to Cancel unstarted tasks which have already met Quorum ?" The message the client produces when this happens is "Server requested abort of unknown task %s". And the 'unknown' here suggests that this mechanism is used to abort tasks that have no matching result row in the database. So more like a bug resolving mechanism than an option for server admins to use. Edit - actually I misread the code. The message is only printed when the client failed to find the task it was asked to abort. The 'unknown' refers to this. |
Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
I think the Unknowns may be Ghost tasks that have been cancelled. I understand that this will only be helpful for big caches, but it should be helpful to reduce wasted effort, by ensuring that their tasks waiting to be processed are still needed. The remainder will eventually time out, in weeks or months, if they never contact the servers again. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
If they really wanted to clear those results from the database faster, they would have reduced the ridiculously long deadlines. Some of the extra resends they are sending now have deadlines in late July. |
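
For anyone who wants to play with the numbers, here is a minimal sketch of the per-request cap Ville Saari describes in the thread: tasks resent per script run divided by scheduler requests arriving between two runs. The function names and the figures 3000 and 1000 are made up for illustration, and this says nothing about how the real BOINC scheduler is implemented; it only compares a small cap against the 150-per-device limit mentioned above.

```python
def per_request_cap(tasks_per_run: int, requests_between_runs: int) -> int:
    """Hypothetical cap on tasks handed out per scheduler request.

    Rounds up, so requests_between_runs * cap >= tasks_per_run and
    every resend can still be taken by someone.
    """
    if tasks_per_run < 0 or requests_between_runs <= 0:
        raise ValueError("need tasks_per_run >= 0 and requests_between_runs > 0")
    # Integer ceiling division, avoids floating point.
    return (tasks_per_run + requests_between_runs - 1) // requests_between_runs


def distribute(tasks: int, requests: int, cap: int) -> list[int]:
    """Serve requests in arrival order, granting each at most `cap` tasks."""
    grants = []
    remaining = tasks
    for _ in range(requests):
        grant = min(cap, remaining)
        grants.append(grant)
        remaining -= grant
    return grants


if __name__ == "__main__":
    tasks, requests = 3000, 1000  # assumed figures, not project statistics

    cap = per_request_cap(tasks, requests)
    spread = distribute(tasks, requests, cap)
    raffle = distribute(tasks, requests, 150)  # 150 = limit quoted in the thread

    print("cap per request:", cap)
    print("hosts with work (small cap):", sum(1 for g in spread if g > 0))
    print("hosts with work (cap 150):", sum(1 for g in raffle if g > 0))
```

Under these assumed figures the small cap gives every requesting host some work, while the large cap hands everything to the first few requests, which is the "raffle" effect described in the thread.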