Strange problem with one host not requesting work

Message boards : Number crunching : Strange problem with one host not requesting work
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1967491 - Posted: 28 Nov 2018, 5:31:24 UTC
Last modified: 28 Nov 2018, 5:32:18 UTC

OK, I've got a strange problem that has cropped up on host 6279633 since the project came back from the outage this afternoon. The host is not requesting the normal amount of GPU work to maintain the allotted cache size.
I should have 400 tasks onboard at all times with 3 GPUs and 1 CPU. But at each server request for work, it asks for only a minuscule number of work seconds. I don't have any recent error backoffs to account for that, either.
SETI@home v8 (anonymous platform, NVIDIA GPU)
Number of tasks completed: 778698
Max tasks per day: 6422
Number of tasks today: 1956
Consecutive valid tasks: 6389
Average processing rate: 2,300.07 GFLOPS
Average turnaround time: 0.12 days

But each request is only asking for 2-14 tasks, and the cache continues to fall.

Pipsqueek

4143 SETI@home 11/27/2018 9:24:07 PM Reporting 13 completed tasks
4144 SETI@home 11/27/2018 9:24:07 PM Requesting new tasks for CPU and NVIDIA GPU
4145 SETI@home 11/27/2018 9:24:07 PM [sched_op] CPU work request: 1220293.91 seconds; 0.00 devices
4146 SETI@home 11/27/2018 9:24:07 PM [sched_op] NVIDIA GPU work request: 906.26 seconds; 0.00 devices
4147 SETI@home 11/27/2018 9:24:09 PM Scheduler request completed: got 13 new tasks
4148 SETI@home 11/27/2018 9:24:09 PM [sched_op] Server version 709
4149 SETI@home 11/27/2018 9:24:09 PM Project requested delay of 303 seconds
4150 SETI@home 11/27/2018 9:24:09 PM [sched_op] estimated total CPU task duration: 1909 seconds
4151 SETI@home 11/27/2018 9:24:09 PM [sched_op] estimated total NVIDIA GPU task duration: 964 seconds

Right now I am down over 225 tasks from my assigned allotment. I am not in any Nvidia backoff either, and with over 9 GB available to BOINC I have plenty of hard drive space, so that is not the issue. Can anyone point out what is going on?
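
For anyone wanting to track the requested work seconds over time rather than eyeballing the Event Log, here is a minimal sketch that pulls the per-resource figures out of the [sched_op] lines. It assumes the line layout shown in the excerpt above; the function name `parse_work_requests` and the regular expression are illustrative and not part of BOINC.

```python
import re

# Matches the work-request part of a line such as:
#   "[sched_op] NVIDIA GPU work request: 906.26 seconds; 0.00 devices"
# The pattern is an assumption based on the log excerpt in this thread.
WORK_REQUEST = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds; (?P<devices>\d+(?:\.\d+)?) devices"
)


def parse_work_requests(log_text):
    """Return a list of (resource, seconds_requested) tuples found in log_text."""
    requests = []
    for line in log_text.splitlines():
        match = WORK_REQUEST.search(line)
        if match:
            requests.append((match.group("resource"), float(match.group("seconds"))))
    return requests


# Two lines copied from the Event Log excerpt above.
sample = """\
4145 SETI@home 11/27/2018 9:24:07 PM [sched_op] CPU work request: 1220293.91 seconds; 0.00 devices
4146 SETI@home 11/27/2018 9:24:07 PM [sched_op] NVIDIA GPU work request: 906.26 seconds; 0.00 devices
"""

for resource, seconds in parse_work_requests(sample):
    print(f"{resource}: {seconds:.2f} s requested")
```

On the two sample lines this shows the GPU request at roughly 900 seconds next to a CPU request of over 1.2 million seconds, which is the imbalance described above.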
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Keith Myers Special Project $250 donor
Message 1967498 - Posted: 28 Nov 2018, 7:25:35 UTC

The issue is that most of the time the host just isn't asking for GPU work. When I backed out of the driver and reapplied it, the host was offline for several minutes, so the next scheduler request came after the wait period had passed. I then got over a hundred tasks to fill my cache up. But the cache continues to fall. I got a like-for-like work request once after the initial filling of the cache, but nothing after that.

Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Sending scheduler request: To fetch work.
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Reporting 13 completed tasks
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] CPU work request: 1220690.01 seconds; 0.00 devices
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 238643.07 seconds; 0.00 devices
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | Scheduler request completed: got 13 new tasks
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] Server version 709
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | Project requested delay of 303 seconds
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] estimated total CPU task duration: 1939 seconds
Tue 27 Nov 2018 10:48:11 PM PST | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 963 seconds

But the next requests for work are only asking for CPU work.

Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] Starting scheduler request
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Sending scheduler request: To fetch work.
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Reporting 15 completed tasks
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | Requesting new tasks for CPU
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] CPU work request: 1223402.60 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | Scheduler request completed: got 3 new tasks
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] Server version 709
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | Project requested delay of 303 seconds
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] estimated total CPU task duration: 5805 seconds
Tue 27 Nov 2018 10:58:26 PM PST | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
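
To spot these CPU-only requests in a longer log, a rough sketch like the following could flag every scheduler request that asked for CPU seconds but zero GPU seconds. It assumes the pipe-delimited "timestamp | project | message" layout shown above and that both work-request lines of one scheduler request share the same timestamp; `gpu_starved_requests` is a made-up helper name, not a BOINC tool.

```python
import re

# Captures the resource name and the seconds requested from a [sched_op] message.
REQUEST = re.compile(r"\[sched_op\] (.+?) work request: ([\d.]+) seconds")


def gpu_starved_requests(log_text, gpu_label="NVIDIA GPU"):
    """Return timestamps of requests that asked for CPU work but 0 GPU seconds."""
    by_time = {}
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        timestamp, _project, message = parts
        match = REQUEST.search(message)
        if match:
            # Assumption: lines with the same timestamp belong to one request.
            by_time.setdefault(timestamp, {})[match.group(1)] = float(match.group(2))
    return [
        timestamp
        for timestamp, req in by_time.items()
        if req.get(gpu_label) == 0.0 and req.get("CPU", 0.0) > 0.0
    ]


# Four lines copied from the two log excerpts above.
sample_log = """\
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] CPU work request: 1220690.01 seconds; 0.00 devices
Tue 27 Nov 2018 10:48:09 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 238643.07 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] CPU work request: 1223402.60 seconds; 0.00 devices
Tue 27 Nov 2018 10:58:24 PM PST | SETI@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
"""

print(gpu_starved_requests(sample_log))
```

With the sample lines, only the 10:58:24 PM request is flagged, matching the CPU-only request shown above.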
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1967499 - Posted: 28 Nov 2018, 7:45:52 UTC

Looking at the Tasks for that system, it is showing 400 tasks in progress; 100 CPU & 300 GPU.
Grant
Darwin NT
Profile Keith Myers Special Project $250 donor
Message 1967500 - Posted: 28 Nov 2018, 8:08:00 UTC - in response to Message 1967499.  

Looking at the Tasks for that system, it is showing 400 tasks in progress; 100 CPU & 300 GPU.

Now it is. It was down to 75 GPU tasks for a long while. I finally looked properly through the log with work_fetch logging on alongside my usual sched_op logging, and determined that for some reason GPUGrid was swamping Seti. I never had any issues in the past, since its resource share is so low compared to Seti.

Once I suspended GPUGrid, I got a GPU request for Seti and could refill my cache to normal. Once I unsuspended GPUGrid, I was back to no Seti GPU tasks. So I aborted 3 of the 6 GPUGrid tasks, and it still was not getting any Seti GPU work. So I reset the GPUGrid project, and that seems to have restored things to normal. I have my usual 6-task cache for GPUGrid and am able to get my normal Seti GPU cache refills. So I guess something went wrong with GPUGrid during the outage that affected Seti on that host. None of my other 3 hosts that are attached to GPUGrid and were running GPUGrid tasks during the outage have had any issues. They are plugging along normally, just like always. Very strange.
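
For readers who want the same kind of logging, the sketch below builds a cc_config.xml fragment with two log flags switched on. It assumes the logging mentioned above corresponds to the BOINC client flags `sched_op_debug` and `work_fetch_debug`; the helper `build_cc_config` is made up for illustration. It only prints the XML and does not write any file, since an existing cc_config.xml may hold other options that should be kept.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return a cc_config XML string with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# Flag names are an assumption about which logging was enabled on this host.
config_xml = build_cc_config(["sched_op_debug", "work_fetch_debug"])
print(config_xml)
```

The client has to re-read its config files (or be restarted) before new flags take effect.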
Profile Keith Myers Special Project $250 donor
Message 1967615 - Posted: 29 Nov 2018, 1:37:29 UTC

Pretty sure the issue with that host was that it had double the normal amount of GPUGrid work assigned to it because of 6 ghosts. Don't know how that happened, but it did. Instead of my normal 6 tasks in progress, the project shows 12 assigned to the host. I actually have only 6 physical tasks on the host, as expected. I fixed the problem by resetting the project, which cleared out the local client_state file on that machine but of course did nothing on the project scheduler side.
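
The ghost count described above is just a set difference between what the project lists as in progress and what is physically on the host. A minimal sketch, with made-up task names standing in for the real ones (both lists would have to be collected by hand or by whatever means you have available):

```python
def find_ghosts(server_tasks, local_tasks):
    """Return tasks the project lists as in progress that are not on the host."""
    return sorted(set(server_tasks) - set(local_tasks))


# Hypothetical task names: 12 assigned according to the project, 6 on the host.
server_side = [f"task_{i:02d}" for i in range(1, 13)]
local_side = [f"task_{i:02d}" for i in range(1, 7)]

ghosts = find_ghosts(server_side, local_side)
print(f"{len(ghosts)} ghost tasks: {ghosts}")
```

With 12 tasks on the server side and 6 on the host, the difference is the 6 ghosts mentioned above.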