Strange problem with one host not requesting work

Keith Myers (Volunteer tester)
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1967491 - Posted: 28 Nov 2018, 5:31:24 UTC
Last modified: 28 Nov 2018, 5:32:18 UTC

OK, I've got a strange problem that has cropped up on host 6279633 since the project came back from the outage this afternoon. The host is not requesting the normal amount of GPU work to keep the cache at its allotted size. I should have 400 tasks onboard at all times with 3 GPUs and 1 CPU, but at each scheduler request it asks for only a minuscule number of seconds of work. I don't have any recent error backoffs to account for that either.

SETI@home v8 (anonymous platform, NVIDIA GPU)
Number of tasks completed: 778698
Max tasks per day: 6422
Number of tasks today: 1956
Consecutive valid tasks: 6389
Average processing rate: 2,300.07 GFLOPS
Average turnaround time: 0.12 days

But each request is only asking for 2-14 tasks, and the cache continues to fall.

Pipsqueek
4143 SETI@home 11/27/2018 9:24:07 PM Reporting 13 completed tasks
4144 SETI@home 11/27/2018 9:24:07 PM Requesting new tasks for CPU and NVIDIA GPU
4145 SETI@home 11/27/2018 9:24:07 PM [sched_op] CPU work request: 1220293.91 seconds; 0.00 devices
4146 SETI@home 11/27/2018 9:24:07 PM [sched_op] NVIDIA GPU work request: 906.26 seconds; 0.00 devices
4147 SETI@home 11/27/2018 9:24:09 PM Scheduler request completed: got 13 new tasks
4148 SETI@home 11/27/2018 9:24:09 PM [sched_op] Server version 709
4149 SETI@home 11/27/2018 9:24:09 PM Project requested delay of 303 seconds
4150 SETI@home 11/27/2018 9:24:09 PM [sched_op] estimated total CPU task duration: 1909 seconds
4151 SETI@home 11/27/2018 9:24:09 PM [sched_op] estimated total NVIDIA GPU task duration: 964 seconds

Right now I am down over 225 tasks from my assigned allotment. I am not in any Nvidia backoff either. I have plenty of hard drive space, with over 9GB available to BOINC, so that is not the issue. Can anyone point out what is going on?
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)

Keith Myers (Volunteer tester)
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1967498 - Posted: 28 Nov 2018, 7:25:35 UTC

The issue is that the host just isn't asking for GPU work most of the time. When I backed out the driver and reapplied it, the host was offline for several minutes, so the next scheduler request came after the wait period had passed. I then got over a hundred tasks to fill my cache up. But the cache continues to fall. I got a like-for-like request for work once after the initial filling of the cache, but nothing after that.

Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Sending scheduler request: To fetch work.
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Reporting 13 completed tasks
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] CPU work request: 1220690.01 seconds; 0.00 devices
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 238643.07 seconds; 0.00 devices
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | Scheduler request completed: got 13 new tasks
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] Server version 709
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | Project requested delay of 303 seconds
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] estimated total CPU task duration: 1939 seconds
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 963 seconds

But the next requests for work are only asking for CPU work.

Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] Starting scheduler request
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Sending scheduler request: To fetch work.
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Reporting 15 completed tasks
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Requesting new tasks for CPU
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] CPU work request: 1223402.60 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | Scheduler request completed: got 3 new tasks
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] Server version 709
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | Project requested delay of 303 seconds
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] estimated total CPU task duration: 5805 seconds
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
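For anyone who wants to scan a long event log for the same symptom, here is a rough Python sketch that pulls the requested seconds out of the "[sched_op] ... work request" lines quoted above and counts how often the GPU request was zero. It only assumes the line layout shown in these posts; the function names (work_requests, count_zero_requests) are made up for illustration and are not part of BOINC.

```python
import re

# Matches the "[sched_op] <resource> work request: <seconds> seconds" part of
# an event-log line, whatever timestamp/project prefix comes before it.
WORK_REQUEST = re.compile(
    r"\[sched_op\]\s+(?P<resource>.+?) work request:\s+"
    r"(?P<seconds>\d+(?:\.\d+)?) seconds"
)


def work_requests(log_lines):
    """Yield (resource, seconds) for every work-request line found."""
    for line in log_lines:
        match = WORK_REQUEST.search(line)
        if match:
            yield match.group("resource"), float(match.group("seconds"))


def count_zero_requests(log_lines, resource="NVIDIA GPU"):
    """Count requests in which the client asked for no work for `resource`."""
    return sum(
        1
        for name, seconds in work_requests(log_lines)
        if name == resource and seconds == 0.0
    )


# Lines copied from the posts above (both log layouts are handled).
sample = [
    "4146 SETI@home 11/27/2018 9:24:07 PM [sched_op] NVIDIA GPU work request: 906.26 seconds; 0.00 devices",
    "Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 238643.07 seconds; 0.00 devices",
    "Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] CPU work request: 1223402.60 seconds; 0.00 devices",
    "Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices",
]

for resource, seconds in work_requests(sample):
    print(f"{resource}: {seconds:.2f} s")
print("zero GPU requests:", count_zero_requests(sample))
```

A run of zero-second GPU requests while the GPU cache is visibly short is the pattern described in this post. The sketch only flags it; it does not say why the client decided not to ask.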

Grant (SSSF) (Volunteer tester)
Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
Message 1967499 - Posted: 28 Nov 2018, 7:45:52 UTC

Looking at the Tasks for that system, it is showing 400 tasks in progress: 100 CPU & 300 GPU.

Grant
Darwin NT

Keith Myers (Volunteer tester)
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1967500 - Posted: 28 Nov 2018, 8:08:00 UTC - in response to Message 1967499.

"Looking at the Tasks for that system, it is showing 400 tasks in progress; 100 CPU & 300 GPU."

Now it is. It was down to 75 GPU tasks for a long while. I finally looked properly through the log with work_fetch turned on along with my normal sched_ops and determined that, for some reason, GPUGrid was swamping Seti. I never had any issues in the past, since GPUGrid's resource share is so low compared to Seti's.

Once I suspended GPUGrid, I got a GPU request for Seti and could refill my cache to normal. Once I unsuspended GPUGrid, I was back to no Seti GPU tasks. So I aborted 3 of the 6 GPUGrid tasks, and it still was not getting any Seti GPU work. So I reset the GPUGrid project, and that seems to have restored things to normal. I have my usual 6-task cache for GPUGrid and am able to get my normal Seti GPU cache refills.

So I guess something went tits up with GPUGrid during the outage that affected Seti on that host. None of my other 3 hosts that are attached to GPUGrid and were running GPUGrid tasks during the outage have had any issues. They are plugging along normally, just like always. Very strange.

Keith Myers (Volunteer tester)
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1967615 - Posted: 29 Nov 2018, 1:37:29 UTC

Pretty sure the issue with that host was that it had double the normal amount of GPUGrid work assigned to it because of 6 ghosts. I don't know how that happened, but it did. Instead of my normal 6 tasks in progress, the project shows 12 assigned to the host. I actually only have 6 physical tasks on the host, as expected. I fixed the problem by resetting the project. That cleared out the local client_state file on that machine but, of course, did nothing about what the project scheduler has on record.
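The ghost count above is just the difference between what the project lists as in progress and what is physically on the host. A minimal Python sketch of that comparison follows, assuming you have copied the two lists of task names by hand (from the project's task page and from the local client). The task names and the find_ghosts helper are hypothetical, made up for the example.

```python
def find_ghosts(server_tasks, local_tasks):
    """Return tasks the server lists as in progress that are not on the host."""
    return sorted(set(server_tasks) - set(local_tasks))


# Hypothetical task names: 12 assigned according to the project,
# only 6 actually present on the host, as in the post above.
server_tasks = [f"example_task_{i}" for i in range(12)]
local_tasks = [f"example_task_{i}" for i in range(6)]

ghosts = find_ghosts(server_tasks, local_tasks)
print(f"{len(ghosts)} ghost task(s)")
```

With 12 tasks on the server side and 6 on the host, this reports 6 ghosts, matching the numbers in the post.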
