Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

> And there is now a fix for the AMD RX 5000 card issues.

They can force only 'vanilla' hosts to upgrade their apps, so they can't really revert the triple-validation kludge for overflow results until enough of the anonymous-platform hosts have updated their apps to make the risk of a task being sent to two bad hosts acceptably small. Unless they can 'blacklist' AMD GPUs from receiving the _1 when the corresponding _0 was sent to one. But I don't think the system supports that, because if it did, they would already have done it instead of using this triple-validation kludge - which isn't even 100% watertight, because there's still the risk of all three replicas going to bad hosts.
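To put a rough, purely illustrative number on that residual risk: if, say, 5% of the active GPU hosts were bad, a two-replica overflow would land on two bad hosts about 0.05 × 0.05 = 0.25% of the time, while all three replicas of a triple-validated one would do so only about 0.05³ ≈ 0.0125% of the time. Much smaller, but not zero.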
Tom M · Joined: 28 Nov 02 · Posts: 5126 · Credit: 276,046,078 · RAC: 462

I am waiting and waiting to have the website confirm that I have a full cache. Everything is running Seti@Home except for three weather forecast tasks from WCG. Eyeballing it, it looks like I have a full set of CPU tasks and a less-than-full set of GPU tasks. But all the GPUs are engaged, and I think I may have 150 GPU tasks, so hopefully it will stay that way.

Apparently the Replica DB is "just a bit behind". It just reported I have 6 tasks in progress. I know I have to take off my shoes to count past 10, but I am sure I have more than "6" :)

Here it is Sunday morning and we are finally getting a steady flow of tasks?

Tom

A proud member of the OFA (Old Farts Association).
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

> Now, if the "Results returned and awaiting validation" were on the same graph as the "Results out in the field" for both MB & AP, it'd be perfect

Actually, one of the more interesting graphs would be the SUM of 'Results ready to send', 'Results out in the field', 'Results returned and awaiting validation' and 'Results waiting for db purging' for both MB & AP - all eight fields in one sum. This would be the number of results in the database: the value that Eric said has to be kept under 20 million to avoid the result table spilling out of RAM. It is now 18.9 million.

Those 71 ancient zombie S@H v7 results appear to have finally been purged!
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

> I am waiting and waiting to have the website confirm that I have a full cache.

Do what I did: write a program that reads client_state.xml and reports the number of tasks for CPU and GPU. That way you can easily see how full your queues are without needing the website, so it works even during the out(r)ages. And the data will always be fresh no matter how far behind the replica db is.
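For anyone who wants to roll their own, here is a minimal sketch of that kind of counter in Python. It assumes a well-formed client_state.xml and that GPU tasks can be recognised by their <plan_class>; the marker strings are guesses, so adjust them for your own apps:

```python
# Rough sketch: count queued CPU and GPU tasks from BOINC's client_state.xml.
# Assumes the file parses as well-formed XML and that GPU plan classes
# contain one of the marker substrings below (an assumption, not a rule).
import xml.etree.ElementTree as ET

GPU_MARKERS = ("cuda", "opencl", "ati", "nvidia")

def count_tasks(path="client_state.xml"):
    root = ET.parse(path).getroot()
    cpu = gpu = 0
    for result in root.iter("result"):
        plan_class = (result.findtext("plan_class") or "").lower()
        if any(marker in plan_class for marker in GPU_MARKERS):
            gpu += 1
        else:
            cpu += 1
    return cpu, gpu

if __name__ == "__main__":
    cpu, gpu = count_tasks()
    print(f"CPU tasks: {cpu}, GPU tasks: {gpu}")
```

Run it against the client_state.xml in the BOINC data directory and it counts every queued result locally, including the ones the replica db hasn't caught up with yet.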
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874

I think BoincTasks can do that, as well.
Jimbocous · Joined: 1 Apr 13 · Posts: 1861 · Credit: 268,616,081 · RAC: 1,349

> I think BoincTasks can do that, as well.

Quite well, in fact.
Jimbocous · Joined: 1 Apr 13 · Posts: 1861 · Credit: 268,616,081 · RAC: 1,349

And, at least for the moment, the floodgates appear to have opened.
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

Something has changed. The floodgates are wide open but the assimilation queue is still getting smaller.
Chris904395093209d · Joined: 1 Jan 01 · Posts: 112 · Credit: 29,923,129 · RAC: 6

I'm not seeing the '71' under the S@H V7 column on the server status page. Did those finally get cleaned up in the dbase?

~Chris
Kissagogo27 · Joined: 6 Nov 99 · Posts: 717 · Credit: 8,032,827 · RAC: 62

UTC+1 ^^
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799

> I'm not seeing the '71' under the S@H V7 column on the server status page. Did those finally get cleaned up in the dbase?

Maybe it's time to cut the deadlines of the WUs and make some changes in the way the work is distributed, like sending the resends to the fastest hosts to clear them ASAP. Or we will be trapped in an endless loop of no new work each time the total reaches 20 MM.
Mr. Kevvy · Joined: 15 May 99 · Posts: 3866 · Credit: 1,114,826,392 · RAC: 3,319

> Or we will be trapped in an endless loop of no new work each time the total reaches 20 MM.

A possible explanation of why this has only been happening recently is here... Briefly: quorum=3 for overflows, coupled with BLC35 files which generate little except overflows.
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

> Maybe it's time to cut the deadlines of the WUs and make some changes in the way the work is distributed, like sending the resends to the fastest hosts to clear them ASAP.

Again, NOT the fastest, but the ones with the shortest average turnaround time. A slow host with a tiny cache can return the result faster than a fast host with a huge spoofed cache.

One thing that could prevent this from happening again is if the system monitored the rate of overflows returned, and when any file being split exceeded some threshold, that file would be heavily throttled so that it continued being split but produced only a small percentage of all the workunits. Or this could even happen without any monitoring, if the different splitters split different files instead of all bunching up on the same file: if some file (or a few files) produced an overflow storm, the storm would be diluted by all the other splitters splitting clean files. But I don't know how this would affect splitter performance - spreading out could be faster or slower than bunching up.
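For illustration, a sketch of what such a per-file overflow monitor could look like. The class name, threshold, and window below are all invented; nothing here is taken from the real splitter or scheduler code:

```python
# Invented sketch of a windowed per-file overflow-rate monitor.
# Thresholds and window size are assumptions chosen for illustration.
from collections import defaultdict, deque
import time

OVERFLOW_THRESHOLD = 0.5  # assumed: throttle when >50% of recent results overflow
WINDOW_SECONDS = 3600     # assumed: judge each file on its last hour of returns
MIN_SAMPLES = 100         # don't judge a file on too few results

class OverflowMonitor:
    """Tracks recently returned results per source file and flags overflow storms."""

    def __init__(self):
        self.history = defaultdict(deque)  # file name -> deque of (timestamp, overflowed)

    def record(self, filename, overflowed):
        now = time.time()
        results = self.history[filename]
        results.append((now, overflowed))
        # Drop entries that have fallen out of the time window.
        while results and results[0][0] < now - WINDOW_SECONDS:
            results.popleft()

    def should_throttle(self, filename):
        results = self.history[filename]
        if len(results) < MIN_SAMPLES:
            return False
        overflow_rate = sum(1 for _, o in results if o) / len(results)
        return overflow_rate > OVERFLOW_THRESHOLD
```

A throttled file would keep being split, just at a small fraction of its normal share, until its recent overflow rate dropped back under the threshold.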
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799

> Again, NOT the fastest, but the ones with the shortest average turnaround time. A slow host with a tiny cache can return the result faster than a fast host with a huge spoofed cache.

Sorry, the meaning was lost in translation. For me the fastest hosts are the ones with the shortest average turnaround time (less than 1 day). They could clear the WUs in very little time and help reduce the DB size. Obviously the WUs must be sent with a very short deadline (less than 3 days in this case). The way it is done now, sending the WUs to any host with a long deadline, just makes the DB size problem even worse.
Speedy · Joined: 26 Jun 04 · Posts: 1648 · Credit: 12,921,799 · RAC: 89

> I think BoincTasks can do that, as well.

I agree. It would be good if BoincTasks or another piece of software could push short tasks to the front of the queue. Does anybody know of any software that does this?
Cruncher-American · Joined: 25 Mar 02 · Posts: 1513 · Credit: 370,893,186 · RAC: 340

Better solution: if you can detect short tasks without running them, why not just abort them? Can BoincTasks do this? Could the servers?
W-K 666 · Joined: 18 May 99 · Posts: 19851 · Credit: 40,757,560 · RAC: 67

> Better solution: if you can detect short tasks without running them, why not just abort them?

The only known way to detect them is to run them, for a short time - like the time taken on a 2060-class GPU or better - for the bomb to be -9ed. We don't know how many tasks are sent per day, but we do know how many are returned per hour:

average tasks returned per hour × 24 × short time on GPU / 86400 (seconds in a day) = GPUs needed
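Plugging invented but plausible numbers into that formula shows the scale involved:

```python
# Both inputs are assumptions for illustration, not measured values.
returned_per_hour = 100_000  # assumed average results returned per hour
bomb_seconds = 30            # assumed time for a noise bomb to -9 on a fast GPU

gpus_needed = returned_per_hour * 24 * bomb_seconds / 86_400
print(round(gpus_needed))  # -> 833 dedicated GPUs with these numbers
```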
Tom M · Joined: 28 Nov 02 · Posts: 5126 · Credit: 276,046,078 · RAC: 462

Sun 02 Feb 2020 06:16:57 PM CST | SETI@home | Scheduler request completed: got 92 new tasks

Yum! Something to crunch ;)

Tom

A proud member of the OFA (Old Farts Association).
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Looks like more trouble. About 30 minutes ago the website got very slow and the scheduler checked out:

Mon Feb 3 01:08:50 2020 | SETI@home | [sched_op] Starting scheduler request

Just when everything was working well...