Work fetch debug enabled - how to read the logs?

Message boards : Number crunching : Work fetch debug enabled - how to read the logs?

Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056936 - Posted: 17 Dec 2010, 11:09:38 UTC
Last modified: 17 Dec 2010, 11:11:29 UTC

I have my host configured to accept GPU work for CUDA MB and ATI OpenCL AP. Basically this works fine.

But I'm not getting GPU work for AP (only got 1 unit so far!). For AP CPU and MB GPU I'm getting lots of units.

Problem seems to be the BOINC scheduler.

Here is a piece of log with work_fetch_debug enabled.

I just don't know how to interpret this.

- What do shortfall / nidle / saturated / busy / fetchable / runnable mean?

- Why am I not getting GPU AP work?

Help! :)

17/12/2010 12:06:23 [wfd]: work fetch start
17/12/2010 12:06:23 [wfd] ------- start work fetch state -------
17/12/2010 12:06:23 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:06:23 [wfd] CPU: shortfall 3372168.73 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
17/12/2010 12:06:23 SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 480.00 (comm deferred)
17/12/2010 12:06:23 [wfd] NVIDIA GPU: shortfall 0.00 nidle 0.00 saturated 891797.44 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:06:23 SETI@home [wfd] NVIDIA GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:06:23 [wfd] ATI GPU: shortfall 812171.29 nidle 0.00 saturated 48560.68 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:06:23 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 60.00 (comm deferred)
17/12/2010 12:06:23 SETI@home [wfd] overall LTD -7244091.73
17/12/2010 12:06:23 [wfd] ------- end work fetch state -------
17/12/2010 12:06:23 [wfd] No project chosen for work fetch
ID: 1056936
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1056939 - Posted: 17 Dec 2010, 11:23:17 UTC - in response to Message 1056936.  

Just at the moment, you're not asking for work, so you sure as heck won't get any.

You are (comm deferred) at seti: wait until the five-minute server-imposed backoff has ended, or, if that's cleared (see Projects tab in BOINC Manager), click 'update'.

[wfd] becomes clearer if you look at an iteration where an actual request is generated. If you're new to extended logging, [sched_op_debug] is easier to start with.
ID: 1056939
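To make the point in the reply above concrete, here is a minimal Python sketch that scans one [wfd] iteration and reports whether a request was generated and which resources were flagged (comm deferred). The sample lines are copied from the first log in this thread. The function name `summarize_iteration` and the regular expression are illustrative assumptions and not part of BOINC.

```python
import re

# Sample lines copied from the first log posted in this thread.
LOG = """\
17/12/2010 12:06:23 [wfd]: work fetch start
17/12/2010 12:06:23 SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 480.00 (comm deferred)
17/12/2010 12:06:23 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 60.00 (comm deferred)
17/12/2010 12:06:23 [wfd] No project chosen for work fetch
"""

# Matches per-project lines such as "<date> <time> SETI@home [wfd] CPU: fetch share ...".
PROJECT_LINE = re.compile(
    r"^\S+ \S+ (?P<project>\S+) \[wfd\] (?P<resource>[^:]+): fetch share"
)


def summarize_iteration(text):
    """Return which (project, resource) pairs are comm deferred and whether
    the iteration contains a '[wfd] request:' line."""
    deferred = []
    requested = False
    for line in text.splitlines():
        match = PROJECT_LINE.match(line)
        if match and line.rstrip().endswith("(comm deferred)"):
            deferred.append((match.group("project"), match.group("resource")))
        if "[wfd] request:" in line:
            requested = True
    return {"deferred": deferred, "requested": requested}


print(summarize_iteration(LOG))
```

For the sample above, the summary lists CPU and ATI GPU as deferred and shows that no request was made, which matches the "No project chosen for work fetch" line.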
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056941 - Posted: 17 Dec 2010, 11:33:19 UTC - in response to Message 1056939.  
Last modified: 17 Dec 2010, 12:04:46 UTC

I pressed "Update" and got this (one MB task - still no AP work):

How do I read this?

17/12/2010 12:32:02 SETI@home update requested by user
17/12/2010 12:32:02 [wfd] Request work fetch: project updated by user
17/12/2010 12:32:03 [wfd] Request work fetch: Backoff ended for SETI@home
17/12/2010 12:32:04 [wfd]: work fetch start
17/12/2010 12:32:04 SETI@home chosen: idle instance CPU: 2.96 inst, 3374491.91 sec
17/12/2010 12:32:04 [wfd] ------- start work fetch state -------
17/12/2010 12:32:04 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:32:04 [wfd] CPU: shortfall 3374491.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 100.00 runnable 0.00
17/12/2010 12:32:04 SETI@home [wfd] CPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 [wfd] NVIDIA GPU: shortfall 30187.42 nidle 0.00 saturated 834676.58 busy 0.00 RS fetchable 100.00 runnable 100.00
17/12/2010 12:32:04 SETI@home [wfd] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 [wfd] ATI GPU: shortfall 813286.97 nidle 0.00 saturated 47215.93 busy 0.00 RS fetchable 100.00 runnable 100.00
17/12/2010 12:32:04 SETI@home [wfd] ATI GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
17/12/2010 12:32:04 SETI@home [wfd] overall LTD -6914445.25
17/12/2010 12:32:04 [wfd] ------- end work fetch state -------
17/12/2010 12:32:04 SETI@home [wfd] request: 3374491.91 sec CPU (3374491.91 sec, 2.96) NVIDIA GPU (30187.42 sec, 0.00) ATI GPU (813286.97 sec, 0.00)
17/12/2010 12:32:04 SETI@home Sending scheduler request: Requested by user.
17/12/2010 12:32:04 SETI@home Requesting new tasks for CPU and GPU
17/12/2010 12:32:13 SETI@home Scheduler request completed: got 1 new tasks
17/12/2010 12:32:13 [wfd] Request work fetch: RPC complete
17/12/2010 12:32:15 SETI@home Started download of 31my10ab.29811.397821.8.10.248
17/12/2010 12:32:18 [wfd]: work fetch start
17/12/2010 12:32:18 [wfd] ------- start work fetch state -------
17/12/2010 12:32:18 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:32:18 [wfd] CPU: shortfall 3373769.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
17/12/2010 12:32:18 SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 [wfd] NVIDIA GPU: shortfall 12328.37 nidle 0.00 saturated 852535.63 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:32:18 SETI@home [wfd] NVIDIA GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 [wfd] ATI GPU: shortfall 813279.33 nidle 0.00 saturated 47223.66 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:32:18 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
17/12/2010 12:32:18 SETI@home [wfd] overall LTD -6996498.80
17/12/2010 12:32:18 [wfd] ------- end work fetch state -------
17/12/2010 12:32:18 [wfd] No project chosen for work fetch


Is there maybe a FAQ that explains all those terms (LTD, etc.)?
ID: 1056941
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1056950 - Posted: 17 Dec 2010, 12:28:17 UTC - in response to Message 1056941.  

FAQ? Debug messages are for the developer ;)

I pressed "Update" and got this (one MB task - still no AP work):

How do I read this?


OK, let me try, based on what I gleaned from running that while trying to prove a bug...

17/12/2010 12:32:04 [wfd] ------- start work fetch state -------
17/12/2010 12:32:04 [wfd] target work buffer: 864.00 + 864000.00 sec
17/12/2010 12:32:04 [wfd] CPU: shortfall 3374491.91 nidle 2.96 saturated 0.00 busy 0.00 RS fetchable 100.00 runnable 0.00


First there are your cache settings in seconds: 'target work buffer'.

'shortfall' - how much work is missing, in seconds (taking into account the projected runtimes of the tasks on board)
'nidle' - how many cores have run out of work
'saturated' - how much work is left, in seconds (corrected for on_frac, active_frac and ncpu)
'busy' - how long it expects to run in EDF (earliest deadline first)
'RS fetchable' and 'runnable' - uh, the uncorrected sum of the resource shares of the projects which can fetch/run (probably taking LTD into account)
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1056950
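The field names listed in the reply above can be pulled out of a resource line mechanically. The following is a rough Python sketch that does so for the CPU line quoted in that reply. The regular expression is written against the log format as it appears in this thread only, and `parse_resource_line` is a made-up helper name.

```python
import re

# One resource line copied from the log quoted in this thread.
LINE = (
    "17/12/2010 12:32:04 [wfd] CPU: shortfall 3374491.91 nidle 2.96 "
    "saturated 0.00 busy 0.00 RS fetchable 100.00 runnable 0.00"
)

# Assumption: the field order is fixed, as in every resource line posted here.
RESOURCE_LINE = re.compile(
    r"\[wfd\] (?P<resource>[^:]+): "
    r"shortfall (?P<shortfall>[\d.]+) nidle (?P<nidle>[\d.]+) "
    r"saturated (?P<saturated>[\d.]+) busy (?P<busy>[\d.]+) "
    r"RS fetchable (?P<fetchable>[\d.]+) runnable (?P<runnable>[\d.]+)"
)


def parse_resource_line(line):
    """Return (resource name, {field: value}) or None if the line does not match."""
    match = RESOURCE_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    resource = fields.pop("resource")
    return resource, {name: float(value) for name, value in fields.items()}


resource, fields = parse_resource_line(LINE)
for name, value in fields.items():
    print(f"{resource} {name}: {value}")
```

Run over a whole log, this makes it easier to compare iterations side by side, for example to see 'fetchable' switch between 0 and 100 as the comm deferral comes and goes.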
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056951 - Posted: 17 Dec 2010, 12:45:02 UTC - in response to Message 1056950.  

'RS fetchable and runnable' uh. uncorrected sum of resource share of the projects which can fetch/run (probably taking into account LTD)


- I am running only S@H - nothing else.

- This "fetchable 100.00 runnable 0.00" is not really clear to me. Can you explain some more?
ID: 1056951
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1056952 - Posted: 17 Dec 2010, 12:52:39 UTC - in response to Message 1056941.  

Is there maybe a FAQ that explains all those terms (LTD, etc.) ?

LTD is 'Long Term Debt', used to prioritise work fetch between projects when you're running more than one.

This "fetchable 100.00 runnable 0.00" is not really clear to me. Can you explain some more?

Fetchable - you could potentially ask for work (you're not backed off, in comms deferral, blocked by preferences, or anything like that). "100" is the raw resource share.

Runnable - would be the raw resource share if it could be run. Since it's 0, it's telling you that it can't be run - because you haven't got any work allocated at the moment, in this case.
ID: 1056952
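As a small worked example of the reading given in the reply above, the two 'RS' numbers can be turned into plain sentences. This Python sketch only encodes that explanation; the function name `describe_share` and the wording of the messages are assumptions made for illustration.

```python
def describe_share(fetchable, runnable):
    """Translate the 'RS fetchable' and 'runnable' values of one resource line
    into sentences, following the explanation given in this thread."""
    notes = []
    if fetchable > 0:
        notes.append("work can be requested for this resource")
    else:
        notes.append("work cannot be requested right now (backoff, comm deferral, preferences, ...)")
    if runnable > 0:
        notes.append("there is work on hand that can run on this resource")
    else:
        notes.append("there is no work on hand for this resource")
    return notes


# Values from the CPU line discussed above: "RS fetchable 100.00 runnable 0.00".
for note in describe_share(100.00, 0.00):
    print(note)
```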
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056956 - Posted: 17 Dec 2010, 13:14:54 UTC - in response to Message 1056952.  


Fetchable - you could potentially ask for work ...

Runnable - ... it's telling you that it can't be run - because you haven't got any work allocated at the moment, in this case.


So ... the question is still: Why am I not getting work? "Results ready to send" has been between 3500 and 4000 for several hours.
ID: 1056956
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1056957 - Posted: 17 Dec 2010, 13:23:57 UTC - in response to Message 1056956.  

Cause you're still in "comm deferred" mode:
17/12/2010 12:32:18 [wfd] ATI GPU: shortfall 813279.33 nidle 0.00 saturated 47223.66 busy 0.00 RS fetchable 0.00 runnable 100.00
17/12/2010 12:32:18 SETI@home [wfd] ATI GPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)

Your ATI GPU is saturated for the next 47223.66 seconds.

So.. are you sure you're not running Collatz on that system as well? Or Milkyway? or both?
ID: 1056957
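The second counts in these logs are easier to judge in hours and days. Here is a short Python sketch of the arithmetic, using only numbers from the logs posted in this thread. The last two lines are an observation about these particular figures and should not be read as a statement of how BOINC computes them.

```python
def seconds_to_hours(seconds):
    return seconds / 3600.0


def seconds_to_days(seconds):
    return seconds / 86400.0


saturated_ati = 47223.66                        # ATI GPU line, 12:32:18 iteration
min_buffer, extra_buffer = 864.00, 864000.00    # "target work buffer" line

print(f"ATI saturated for about {seconds_to_hours(saturated_ati):.1f} hours")
# prints: ATI saturated for about 13.1 hours
print(f"work buffer: {seconds_to_days(min_buffer):.2f} + {seconds_to_days(extra_buffer):.2f} days")
# prints: work buffer: 0.01 + 10.00 days

# Observation only: in the 12:32:18 iteration, shortfall + saturated for the
# NVIDIA GPU equals the total buffer, while the ATI GPU sum is a little lower.
print(f"NVIDIA: {12328.37 + 852535.63:.2f}  buffer: {min_buffer + extra_buffer:.2f}")
print(f"ATI:    {813279.33 + 47223.66:.2f}")
```

So the ATI GPU has roughly 13 hours of work queued against a cache setting of about 10 days, which is why its shortfall is so large.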
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056962 - Posted: 17 Dec 2010, 13:42:56 UTC - in response to Message 1056957.  


Your ATI GPU is saturated for the next 47223.66 seconds.


As a workaround, I re-scheduled ALL my CPU AP tasks to ATI GPU.

- Why I only got 1 AP ATI task and about 30 AP CPU tasks is beyond my comprehension. Since GPU is much faster than CPU ... I don't even wanna know :)

- So after the re-schedule, CPU is completely dry. And hence should request work, no? But it doesn't ...
ID: 1056962
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1057052 - Posted: 17 Dec 2010, 16:42:52 UTC - in response to Message 1056962.  

...
- Why I only got 1 AP ATI task and about 30 AP CPU tasks is beyond my comprehension. Since GPU is much faster than CPU ... I don't even wanna know :)...

I'll tell you anyhow :)

BOINC doesn't assume GPU is faster. The design for stock applications is that the project app_plan code will define relative speeds, and for anonymous platform applications the app_info.xml <flops> fields serve the same purpose.

If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first.
                                                                   Joe
ID: 1057052
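The selection rule described in the reply above can be sketched in a few lines of Python: the version with the highest <flops> wins, and when no <flops> entries exist the first one listed wins. This illustrates the rule as stated in that post and is not the Scheduler's actual code. The app name, plan class names and flops values in the fragment are made up, and treating a missing <flops> as 1.0 is an assumption of this sketch (the mixed case, where only some entries have <flops>, is not covered by the post).

```python
import xml.etree.ElementTree as ET

# Hypothetical app_info.xml fragment: the names and flops values are invented.
APP_INFO = """
<app_info>
  <app_version>
    <app_name>example_app</app_name>
    <plan_class>cpu_example</plan_class>
    <flops>1.0e10</flops>
  </app_version>
  <app_version>
    <app_name>example_app</app_name>
    <plan_class>gpu_example</plan_class>
    <flops>2.0e11</flops>
  </app_version>
</app_info>
"""

DEFAULT_FLOPS = 1.0  # assumption: stands in for "same speed" when <flops> is absent


def pick_best(xml_text):
    """Return the plan_class of the version that would be preferred under the
    rule described in the post: highest flops, file order breaks ties."""
    versions = []
    for app_version in ET.fromstring(xml_text).iter("app_version"):
        flops_text = app_version.findtext("flops")
        flops = float(flops_text) if flops_text else DEFAULT_FLOPS
        versions.append((app_version.findtext("plan_class", default=""), flops))
    # max() returns the first of several equal maxima, so file order decides ties.
    return max(versions, key=lambda version: version[1])[0]


print(pick_best(APP_INFO))
```

With the <flops> lines removed from the fragment, the same function returns the first entry in the file, which is the ordering effect discussed in this thread.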
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1057056 - Posted: 17 Dec 2010, 16:49:50 UTC - in response to Message 1057052.  


I'll tell you anyhow :)


Thanks :)
ID: 1057056
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1057067 - Posted: 17 Dec 2010, 17:12:40 UTC - in response to Message 1057052.  

If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first.


So, until I manage to put some flops entries into app_info, I can preferentially get GPU work by swapping the entries around? Where's my editor...
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1057067
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1057257 - Posted: 17 Dec 2010, 22:57:18 UTC - in response to Message 1057067.  

If there are no <flops> entries, BOINC assumes each application is the same speed, and (in that case only) the order they're in the app_info.xml will determine which is chosen by the Scheduler code as the "best" application to which available work will be sent first.


So, until I manage to put some flops entries into app_info, I can preferentially get GPU work by swapping the entries around? Where's my editor...

I hope so, at times when there isn't enough available work to satisfy a work request fully.

I'll note that I did not chase the logic back all the way; my statement assumes the order from app_info.xml will control the order in the request to the Scheduler.
                                                                   Joe
ID: 1057257
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1057282 - Posted: 18 Dec 2010, 0:00:20 UTC - in response to Message 1057257.  

I'll note that I did not chase the logic back all the way, my statement assumes the order from app_info.xml will control the order in the request to the Scheduler.
                                                                   Joe

I thought we tested that one out during one of the previous outages? Negative?

I would need time to track down the reference - I think it was more than a year ago. Possibly at Beta?
ID: 1057282

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.