How to get one of my computers to ask for CPU work

Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1664329 - Posted: 12 Apr 2015, 2:35:34 UTC
Last modified: 12 Apr 2015, 2:42:01 UTC

Now that things seem to be getting back to normal, I find that one of my machines is only asking for NVIDIA work, while the other asks for NVIDIA and CPU. The first has 200 WUs now (2 GPUs) while the other has 300 (CPU and 2 GPUs). This has been going on for about 9 hours now.

No changes were made to any parameters or files by me since before the shutdown of the past few days, when both machines were getting CPU and GPU work.

Looking at the Event Log, I see that at first it DID ask for CPU or CPU and NVIDIA, but it gave up on CPU after only a few tries.

Is there any likely explanation for this phenomenon?
ID: 1664329 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1664331 - Posted: 12 Apr 2015, 2:51:27 UTC - in response to Message 1664329.  

Click on the Projects Tab & then Properties.
Any values for the CPU work fetch deferred for/interval?
Grant
Darwin NT
ID: 1664331 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1664338 - Posted: 12 Apr 2015, 3:08:35 UTC - in response to Message 1664331.  
Last modified: 12 Apr 2015, 3:18:09 UTC

Click on the Projects Tab & then Properties.
Any values for the CPU work fetch deferred for/interval?


Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none). On the other machine, it has 0:20:00 (which makes sense, since he won't run out in 20 minutes).

BTW: Running BOINC 7.0.64 on that machine. But it didn't do this before, so I don't think the version is relevant.
ID: 1664338 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1664349 - Posted: 12 Apr 2015, 3:52:17 UTC - in response to Message 1664338.  

Click on the Projects Tab & then Properties.
Any values for the CPU work fetch deferred for/interval?


Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none). On the other machine, it has 0:20:00 (which makes sense, since he won't run out in 20 minutes).

BTW: Running BOINC 7.0.64 on that machine. But it didn't do this before, so I don't think the version is relevant.

As of right now both of your machines have 300 tasks.
I'm guessing that BOINC sorted this out on its own?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1664349 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1664352 - Posted: 12 Apr 2015, 4:06:45 UTC - in response to Message 1664338.  

Click on the Projects Tab & then Properties.
Any values for the CPU work fetch deferred for/interval?


Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none)

Actually it does make sense because it has none.
Each time you ask for work, and don't get any, the work request backoff increases. Each time you report completed work, it gets reset.
So during an outage, after about 5-6 requests that return no work, the deferral will be up to 4-5 hours before it asks for work again.
If it gets work then, well and good, if not the backoff starts increasing again with each unsuccessful attempt.
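
The backoff behaviour described above can be sketched as a small simulation. This is an illustration only: the 10-minute starting interval, the doubling rule and the one-day cap are assumptions, chosen because they happen to reproduce the 0:20:00 and 5:20:00 figures quoted in this thread. The real BOINC client may use different constants and may add randomisation. The class and method names are hypothetical.

```python
from datetime import timedelta

# Assumed constants, not taken from the BOINC source.
MIN_BACKOFF = 600      # seconds: hypothetical first deferral (10 minutes)
MAX_BACKOFF = 86400    # seconds: hypothetical cap (1 day)


class ResourceBackoff:
    """Per-resource work-fetch backoff (for example CPU or NVIDIA GPU)."""

    def __init__(self):
        self.interval = 0  # current deferral in seconds; 0 means no backoff

    def request_failed(self):
        """A work request for this resource returned no tasks."""
        if self.interval == 0:
            self.interval = MIN_BACKOFF
        else:
            self.interval = min(self.interval * 2, MAX_BACKOFF)
        return self.interval

    def work_reported(self):
        """Completed work was reported, so the backoff is cleared."""
        self.interval = 0


def format_interval(seconds):
    """Render seconds as H:MM:SS, the way the Properties dialog shows it."""
    return str(timedelta(seconds=seconds))


if __name__ == "__main__":
    cpu = ResourceBackoff()
    for attempt in range(1, 7):
        print(attempt, format_interval(cpu.request_failed()))
    cpu.work_reported()
    print("after reporting work:", format_interval(cpu.interval))
```

Under these assumptions the second failed request gives a deferral of 0:20:00 and the sixth gives 5:20:00; reporting completed work returns the interval to zero.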
Grant
Darwin NT
ID: 1664352 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1664433 - Posted: 12 Apr 2015, 8:18:58 UTC

Reduce your cache size - ask for less work.

Your machines both have the maximum allowed number of workunits in progress. If you request new work without at the same time returning completed work, you'll get nothing. If you do return work, you'll just get enough to replace those returns - and the chances are they'll be GPU tasks.

Enable work fetch debug - just one cycle will do - and note the value here:

12/04/2015 09:13:52 | | [work_fetch] --- state for NVIDIA GPU ---
12/04/2015 09:13:52 | | [work_fetch] shortfall 13163.73 nidle 0.00 saturated 7572.27 busy 0.00

That's the number of seconds before your GPUs need to request more work.

Make your total cache size smaller than that number. When you next finish GPU work, click update: you will request CPU work only (GPU not needed) and you're running again.
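
As a rough sketch of the check suggested above, the snippet below pulls the numbers out of a `[work_fetch]` log line and compares the `saturated` value with a cache setting. It assumes that `saturated` is the value meant, that the line has the layout shown in this post, and that the total cache is the sum of two preference values expressed in days. The function names are made up for illustration.

```python
import re

SECONDS_PER_DAY = 86400


def parse_work_fetch_state(line):
    """Return the name/value pairs from a '[work_fetch] shortfall ...' line."""
    _, _, tail = line.partition("[work_fetch]")
    pairs = re.findall(r"(\w+)\s+(-?\d+(?:\.\d+)?)", tail)
    return {name: float(value) for name, value in pairs}


def cache_is_small_enough(min_days, additional_days, saturated_seconds):
    """True if the total cache (in seconds) is below the saturated time."""
    cache_seconds = (min_days + additional_days) * SECONDS_PER_DAY
    return cache_seconds < saturated_seconds


if __name__ == "__main__":
    line = ("12/04/2015 09:13:52 | | [work_fetch] shortfall 13163.73 "
            "nidle 0.00 saturated 7572.27 busy 0.00")
    state = parse_work_fetch_state(line)
    # Hypothetical cache settings, in days.
    print(cache_is_small_enough(0.05, 0.0, state["saturated"]))
    print(cache_is_small_enough(0.10, 0.0, state["saturated"]))
```

For the line quoted here, 7572.27 seconds is a little over two hours, so the total cache would need to be below roughly 0.087 days.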
ID: 1664433 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1664506 - Posted: 12 Apr 2015, 13:31:25 UTC - in response to Message 1664433.  
Last modified: 12 Apr 2015, 13:37:06 UTC

Thanks, Richard. Next time I get in that fix I will try your solution. As others above noted, BOINC did fix itself eventually (after the 5:20 expired).

I have been ignoring cache size settings since the project started limiting the number of WUs I could have in my queues. The cache limitations became meaningless for my crunchers (or so I thought).

I do have a problem with the interval getting so long. In my case, I had no CPU and the servers certainly had plenty, so the excessive time delay hurt me with no benefit to the project. Perhaps this topic needs more analysis.
ID: 1664506 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1664523 - Posted: 12 Apr 2015, 14:19:18 UTC - in response to Message 1664506.  

Backoffs only really come into play in situations like this: recovery after an outage or work shortage. Once a host has an initial loading, and so long as work continues to be available reasonably consistently (doesn't have to be on every request), the important backoffs are cleared every time a task finishes.

Backoffs are most visible (and people get most irritated by them!) during recovery phases. But, from the point of view of BOINC and the projects it supports, that's probably when they are most needed to divide the limited amount of available work evenly amongst the population of crunchers.
ID: 1664523 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1667069 - Posted: 19 Apr 2015, 1:44:17 UTC

Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both.

This really SUCKS. BOINC is screwed up, in my estimation. I understand the 5 minute interval when he asks for more work, but stretching it out when the system is running and generating work for the GPU is just stupid, even if my particular machine happens not to be getting any. It should only be stretched when there is no work being generated, since then the non-asking makes sense.
ID: 1667069 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1667143 - Posted: 19 Apr 2015, 8:32:19 UTC - in response to Message 1667069.  
Last modified: 19 Apr 2015, 8:32:36 UTC

Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both.

This really SUCKS. BOINC is screwed up, in my estimation. I understand the 5 minute interval when he asks for more work, but stretching it out when the system is running and generating work for the GPU is just stupid, even if my particular machine happens not to be getting any. It should only be stretched when there is no work being generated, since then the non-asking makes sense.

If you must sit on old Boinc versions, Boinc 7.0.64 in this case, you'll get that.

If you update to the current recommended Boinc, 7.4.42 at this time, you'll get this useful changeset:

http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=789637f637753c4e06f7ca58ce2de285d1491cc8

client: request work from backed-off resources if doing RPC anyway


Claggy
ID: 1667143 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1667159 - Posted: 19 Apr 2015, 10:03:28 UTC - in response to Message 1667143.  
Last modified: 19 Apr 2015, 10:17:21 UTC

client: request work from backed-off resources if doing RPC anyway


Maybe because I stick to old versions modified by myself, I don't recognise that. Could you tell me what that means please ? Is it in English ?

[Edit:] ahhh perhaps it's the semantically challenged version of:
client: request work for backed-off resources when doing RPC


One more successful humpty-dumptyism demystification, check.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1667159 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1667170 - Posted: 19 Apr 2015, 10:32:30 UTC - in response to Message 1667159.  

client: request work from backed-off resources if doing RPC anyway


Maybe because I stick to old versions modified by myself, I don't recognise that. Could you tell me what that means please ? Is it in English ?

[Edit:] ahhh perhaps it's the semantically challenged version of:

client: request work for backed-off resources when doing RPC

Yes, the logic is:

If you didn't get work for a given resource last time, slow down the requests. If the project hasn't got an application for your GPU yet, it isn't worth hammering the server every 10 seconds to find out if the programmer has finished writing it yet.

But if your CPU is ready for another task anyway, it doesn't cost anything to tag on a GPU request at the same time.
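
One way to picture this logic is the sketch below. It is not the BOINC client code: the function, its parameters and the `piggyback` flag are hypothetical, and it only models the idea that a backed-off resource can be added to a scheduler request that is being sent anyway.

```python
def resources_to_request(needs_work, backed_off, piggyback=True):
    """Pick the resources to ask work for in the next scheduler request.

    needs_work: set of resource names whose queue is running low
    backed_off: set of resource names still inside their deferral interval
    piggyback:  if True, model the newer behaviour described in the changeset
    """
    # Resources that are allowed to trigger a request on their own.
    ready = needs_work - backed_off
    if not ready:
        return set()          # nothing justifies contacting the server yet
    if piggyback:
        # A request is going out anyway, so ask for backed-off resources too.
        return set(needs_work)
    return ready


if __name__ == "__main__":
    needs = {"CPU", "NVIDIA"}
    deferred = {"NVIDIA"}
    print(sorted(resources_to_request(needs, deferred, piggyback=False)))
    print(sorted(resources_to_request(needs, deferred, piggyback=True)))
```

With `piggyback=False` only the CPU is asked for while the GPU sits out its deferral; with `piggyback=True` the GPU request rides along with the CPU one. If every resource that needs work is backed off, no request is made in either mode.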

That's the general BOINC client picture, which of course is not SETI-specific: other projects are available.

Here, applications are available for most hardware types, so the reason for non-allocation of work is usually different. In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff: and the backoffs for 'quota reached' or 'feeder empty' are exactly the same as the backoffs for 'programmer hasn't finished writing yet'. I did suggest (many years ago) that the backoff algorithm should take account of the reason for non-allocation, but I know I wouldn't want to design such a function myself.

The significant point for this thread is that once you manage to get hold of some work (difficult with the current server gremlins), any backoff caused by failure to receive work when requested is cleared each time you complete any of the tasks you've already got. So, if you keep the cache low, and the work requests "little and often" (which means a low or zero 'additional work' setting), you stand a far better chance of continuous running.
ID: 1667170 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1667174 - Posted: 19 Apr 2015, 10:47:23 UTC - in response to Message 1667170.  

Can't say I'm 100% happy with that changeset. On my i7-2600K/GTX760/HD7770 host, when I get work I tend to get it for one vendor's device only.
For example, the GTX760 can finish an MBv7 shortie in 5 minutes or so; that resets the backoff for the NV device and allows ATI/AMD work to be asked for too.
Then I get ATI/AMD work first and no NV work. It can get very one-sided when trying for APv7 only, because the NV device doesn't always get a chance to ask on its own.
The workaround I use is to lower the cache level to below the amount the ATI/AMD device already has.

Claggy
ID: 1667174 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1667370 - Posted: 19 Apr 2015, 21:41:19 UTC
Last modified: 19 Apr 2015, 21:44:49 UTC

RICHARD:

In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff

But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add).
ID: 1667370 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1667396 - Posted: 19 Apr 2015, 23:40:43 UTC - in response to Message 1667370.  

RICHARD:

In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff

But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add).

I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue.
ID: 1667396 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 7031
Credit: 2,084,789
RAC: 3
Message 1667409 - Posted: 20 Apr 2015, 0:39:32 UTC

The Preferences page needs an update.
ID: 1667409 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1667410 - Posted: 20 Apr 2015, 0:56:02 UTC - in response to Message 1667409.  

The Preferences page needs an update.

The Preference has Just had an update.

Claggy
ID: 1667410 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1667512 - Posted: 20 Apr 2015, 8:12:15 UTC - in response to Message 1667396.  

RICHARD:

In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff

But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add).

I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue.


I am questioning that policy. Since the project went to the 5-minute minimum between allowed requests, I contend there is no need for the backoff when the project is producing work; I was refused NOT because I was at my max WUs onboard but because of an artifact in the way the project queues work - NOT MY FAULT, WHY SHOULD I SUFFER?
ID: 1667512 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1667513 - Posted: 20 Apr 2015, 8:23:02 UTC - in response to Message 1667512.  

RICHARD:

In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff

But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add).

I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue.

I am questioning that policy. Since the project went to the 5-minute minimum between allowed requests, I contend there is no need for the backoff when the project is producing work; I was refused NOT because I was at my max WUs onboard but because of an artifact in the way the project queues work - NOT MY FAULT, WHY SHOULD I SUFFER?

As Claggy said, you would not have 'suffered' (it's a pretty minor sort of suffering, in my view) if you had been running a more recent version of BOINC. People like Claggy and me (and some, but too few, others) pay attention to how BOINC works, and try to get changes made when we see undesirable side-effects from policies which make sense in other parts of the BOINC community.
ID: 1667513 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1667515 - Posted: 20 Apr 2015, 8:28:26 UTC - in response to Message 1667410.  

The Preferences page needs an update.

The Preference has Just had an update.

Claggy

That should have been:

The Preference pages have Just had an update, note the new layout:

Computing preferences

Claggy
ID: 1667515 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.