Work fetch debug enabled - how to read the logs?
Author | Message |
---|---|
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
I have my host configured to accept GPU work for CUDA MB and ATI OpenCL AP. Basically this works fine. But I'm not getting GPU work for AP (only got 1 unit so far!). For AP CPU and MB GPU I'm getting lots of units. Problem seems to be the BOINC scheduler. Here is a piece of log with work_fetch_debug enabled. I just don't know how to interpret this.
- What does shortfall / nidle / saturated / busy / fetchable / runnable mean?
- Why am I not getting GPU AP work?

Help! :)

17/12/2010 12:06:23 [wfd]: work fetch start
17/12/2010 12:06:23 [wfd] ------- start work fetch state -------
17/12/2010 12:06:23 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:06:23 [wfd] CPU: shortfall 3372168.73 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
17/12/2010 12:06:23 SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 480.00 (comm deferred)
17/12/2010 12:06:23 [wfd] NVIDIA GPU: shortfall 0.00 nidle 0.00 saturated 891797.44 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:06:23 SETI@home [wfd] NVIDIA GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:06:23 [wfd] ATI GPU: shortfall 812171.29 nidle 0.00 saturated 48560.68 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:06:23 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 60.00 (comm deferred)
17/12/2010 12:06:23 SETI@home [wfd] overall LTD -7244091.73
17/12/2010 12:06:23 [wfd] ------- end work fetch state -------
17/12/2010 12:06:23 [wfd] No project chosen for work fetch |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Just at the moment, you're not asking for work, so you sure as heck won't get any. You are (comm deferred) at seti: wait until the five-minute server-imposed backoff has ended, or, if that's cleared (see Projects tab in BOINC Manager), click 'update'. [wfd] becomes clearer if you look at an iteration where an actual request is generated. If you're new to extended logging, [sched_op_debug] is easier to start with. |
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
I pressed "Update" and got this (one MB task - still no AP work): How do I read this?

17/12/2010 12:32:02 SETI@home update requested by user
17/12/2010 12:32:02 [wfd] Request work fetch: project updated by user
17/12/2010 12:32:03 [wfd] Request work fetch: Backoff ended for SETI@home
17/12/2010 12:32:04 [wfd]: work fetch start
17/12/2010 12:32:04 SETI@home chosen: idle instance CPU: 2.96 inst, 3374491.91 sec
17/12/2010 12:32:04 [wfd] ------- start work fetch state -------
17/12/2010 12:32:04 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:32:04 [wfd] CPU: shortfall 3374491.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 100.00 runnable 0.00
17/12/2010 12:32:04 SETI@home [wfd] CPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 [wfd] NVIDIA GPU: shortfall 30187.42 nidle 0.00 saturated 834676.58 busy 0.00 RS fetchable 100.00 runnable 100.00
17/12/2010 12:32:04 SETI@home [wfd] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 [wfd] ATI GPU: shortfall 813286.97 nidle 0.00 saturated 47215.93 busy 0.00 RS fetchable 100.00 runnable 100.00
17/12/2010 12:32:04 SETI@home [wfd] ATI GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 SETI@home [wfd] overall LTD -6914445.25
17/12/2010 12:32:04 [wfd] ------- end work fetch state -------
17/12/2010 12:32:04 SETI@home [wfd] request: 3374491.91 sec CPU (3374491.91 sec, 2.96) NVIDIA GPU (30187.42 sec, 0.00) ATI GPU (813286.97 sec, 0.00)
17/12/2010 12:32:04 SETI@home Sending scheduler request: Requested by user.
17/12/2010 12:32:04 SETI@home Requesting new tasks for CPU and GPU
17/12/2010 12:32:13 SETI@home Scheduler request completed: got 1 new tasks
17/12/2010 12:32:13 [wfd] Request work fetch: RPC complete
17/12/2010 12:32:15 SETI@home Started download of 31my10ab.29811.397821.8.10.248
17/12/2010 12:32:18 [wfd]: work fetch start
17/12/2010 12:32:18 [wfd] ------- start work fetch state -------
17/12/2010 12:32:18 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:32:18 [wfd] CPU: shortfall 3373769.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
17/12/2010 12:32:18 SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 [wfd] NVIDIA GPU: shortfall 12328.37 nidle 0.00 saturated 852535.63 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:32:18 SETI@home [wfd] NVIDIA GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 [wfd] ATI GPU: shortfall 813279.33 nidle 0.00 saturated 47223.66 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:32:18 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 SETI@home [wfd] overall LTD -6996498.80
17/12/2010 12:32:18 [wfd] ------- end work fetch state -------
17/12/2010 12:32:18 [wfd] No project chosen for work fetch

Is there maybe a FAQ that explains all those terms (LTD, etc.)? |
Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0 |
FAQ? Debug messages are for the developer ;)

"I pressed "Update" and got this (one MB task - still no AP work):"

OK, let me try, from what I gleaned from running that while trying to prove a bug...

17/12/2010 12:32:04 [wfd] ------- start work fetch state -------

First there are your cache settings in seconds: 'target work buffer'. Then, per resource:
- 'shortfall': how much is missing (taking into account projected runtimes of tasks on board)
- 'nidle': how many cores have run out of work
- 'saturated': how much work is left (corrected for on_frac, active_frac and ncpu)
- 'busy': how long it expects to run in EDF
- 'RS fetchable and runnable': uh. Uncorrected sum of resource share of the projects which can fetch/run (probably taking into account LTD)

Carola ------- I'm multilingual - I can misunderstand people in several languages! |
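To make the field list above easier to apply, here is a minimal Python sketch that splits one per-resource `[wfd]` line into named numbers. The regular expression is an assumption based only on the line shape shown in this thread's logs; other BOINC client versions may print different fields, and `parse_resource_line` is a hypothetical helper name, not part of BOINC.

```python
import re

# Shape of the per-resource summary line as it appears in this thread, e.g.
# "[wfd] CPU: shortfall 3374491.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 100.00 runnable 0.00"
# (assumption: other client versions may format this line differently)
RESOURCE_LINE = re.compile(
    r"\[wfd\] (?P<resource>[A-Za-z ]+): "
    r"shortfall (?P<shortfall>[\d.]+) "
    r"nidle (?P<nidle>[\d.]+) "
    r"saturated (?P<saturated>[\d.]+) "
    r"busy (?P<busy>[\d.]+) "
    r"RS fetchable (?P<fetchable>[\d.]+) "
    r"runnable (?P<runnable>[\d.]+)"
)


def parse_resource_line(line):
    """Return a dict of the fields, or None if the line has another shape."""
    match = RESOURCE_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    resource = fields.pop("resource")
    parsed = {name: float(value) for name, value in fields.items()}
    parsed["resource"] = resource
    return parsed


sample = ("17/12/2010 12:32:04 [wfd] ATI GPU: shortfall 813286.97 nidle 0.00 "
          "saturated 47215.93 busy 0.00 RS fetchable 100.00 runnable 100.00")
info = parse_resource_line(sample)
print(info["resource"], info["shortfall"], info["saturated"])
```

Per-project lines (the ones with `fetch share` and `LTD`) do not match this pattern and come back as `None`, so they can be filtered out or handled separately.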
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
"'RS fetchable and runnable' uh. uncorrected sum of resource share of the projects which can fetch/run (probably taking into account LTD)"

- I am running only S@H - nothing else.
- This "fetchable 100.00 runnable 0.00" is not really clear to me. Can you explain some more? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Is there maybe a FAQ that explains all those terms (LTD, etc.)?"

LTD is 'Long Term Debt', used to prioritise work fetch between projects when you're running more than one.

"This "fetchable 100.00 runnable 0.00" is not really clear to me. Can you explain some more?"

Fetchable - you could potentially ask for work (not backed off, in comms deferral, blocked by preferences, or anything like that): "100" is the raw resource share.

Runnable - would be the raw resource share if it could be run. Since it's 0, it's telling you that it can't be run - because you haven't got any work allocated at the moment, in this case. |
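The fetchable/runnable reading above can be sketched as a tiny Python helper. This is only an illustration of the explanation given in this thread, not BOINC's own logic; `describe_resource` is a hypothetical name, and treating any value above zero as "yes" is an assumption.

```python
def describe_resource(fetchable, runnable):
    """Turn the two 'RS' numbers for one resource into a plain-English note."""
    # Assumption: a non-zero value means at least one project qualifies.
    can_fetch = fetchable > 0   # some project may ask for work for this resource
    can_run = runnable > 0      # some project has work on hand for this resource
    fetch_note = "may request work" if can_fetch else "cannot request work right now"
    run_note = "has work to run" if can_run else "has no work to run"
    return f"{fetch_note}; {run_note}"


# CPU line at 12:32:04: "RS fetchable 100.00 runnable 0.00"
print(describe_resource(100.00, 0.00))
# ATI GPU line at 12:32:18, logged while the project was (comm deferred)
print(describe_resource(0.00, 100.00))
```

With a single project at resource share 100, each number is either 0 or 100, which is why the logs in this thread only ever show those two values.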
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
So ... the question is still: Why am I not getting work? "Results ready to send" has been between 3500 and 4000 for several hours. |
Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
'Cause you're still in "comm deferred" mode:

17/12/2010 12:32:18 [wfd] ATI GPU: shortfall 813279.33 nidle 0.00 saturated 47223.66 busy 0.00 RS fetchable 0.00 runnable 100.00

Your ATI GPU is saturated for the next 47223.66 seconds. So... are you sure you're not running Collatz on that system as well? Or Milkyway? Or both? |
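The raw second counts in that line are easier to judge once converted to hours and days. The short Python sketch below does only that arithmetic on numbers copied from the 12:32:18 log; the helper names are made up for illustration, and the interpretation of each field follows the explanations given earlier in this thread.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def to_hours(seconds):
    return seconds / SECONDS_PER_HOUR


def to_days(seconds):
    return seconds / SECONDS_PER_DAY


# Numbers copied from the 12:32:18 work fetch state
target_buffer = 864.00 + 864000.00   # "target work buffer" line
ati_saturated = 47223.66             # work on hand for the ATI GPU
ati_shortfall = 813279.33            # what is still missing for the ATI GPU

print(f"target buffer: {to_days(target_buffer):.2f} days")
print(f"ATI saturated: {to_hours(ati_saturated):.1f} hours")
print(f"ATI shortfall: {to_days(ati_shortfall):.2f} days")
```

Read this way, the ATI GPU holds roughly 13 hours of work against a buffer of about 10 days, so the client still reports a shortfall of more than 9 days for it.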
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
As a workaround, I re-scheduled ALL my CPU AP tasks to ATI GPU.
- Why I only got 1 AP ATI task and about 30 AP CPU tasks is beyond my comprehension. Since GPU is much faster than CPU ... I don't even wanna know :)
- So after the re-schedule, CPU is completely dry. And hence it should request work, no? But it doesn't ... |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... I'll tell you anyhow :) BOINC doesn't assume GPU is faster. The design for stock applications is that the project app_plan code will define relative speeds, and for anonymous platform applications the app_info.xml <flops> fields serve the same purpose. If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first. Joe |
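The selection rule described above can be pictured with a toy Python model. It is not BOINC scheduler code: `pick_best_app`, the version labels, and the flops numbers are all hypothetical, and treating a missing `<flops>` as one shared default speed is an assumption. It only illustrates "highest speed wins, and with equal speeds the first entry wins".

```python
DEFAULT_FLOPS = 1.0  # assumption: every version without <flops> gets the same speed


def pick_best_app(app_versions):
    """app_versions: list of (label, flops or None) in app_info.xml order."""
    def speed(entry):
        _, flops = entry
        return DEFAULT_FLOPS if flops is None else flops

    # max() returns the first of several equal maxima, so file order breaks ties.
    label, _ = max(app_versions, key=speed)
    return label


# No <flops> anywhere: the entry listed first is picked (hypothetical labels).
print(pick_best_app([("astropulse_cpu", None), ("astropulse_ati", None)]))
print(pick_best_app([("astropulse_ati", None), ("astropulse_cpu", None)]))

# With <flops> on both entries, the faster one is picked regardless of order
# (the flops values here are made-up illustrations, not measured speeds).
print(pick_best_app([("astropulse_cpu", 2.0e9), ("astropulse_ati", 4.0e10)]))
```

Whether the order in app_info.xml really reaches the Scheduler unchanged is a separate question that this sketch does not settle.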
Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
Thanks :) |
Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0 |
"If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first."

So, until I manage to put some flops entries into app_info, I can preferentially get GPU work by swapping the entries around? Where's my editor...

Carola ------- I'm multilingual - I can misunderstand people in several languages! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first."

I hope so, at times when there isn't enough available work to satisfy a work request fully. I'll note that I did not chase the logic back all the way; my statement assumes the order from app_info.xml will control the order in the request to the Scheduler.

Joe |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"I'll note that I did not chase the logic back all the way, my statement assumes the order from app_info.xml will control the order in the request to the Scheduler. Joe"

I thought we tested that one out during one of the previous outages? Negative? I would need time to track down the reference - I think it was more than a year ago. Possibly at Beta? |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.