Concerned over cuda wu completion times??



Message boards : Number crunching : Concerned over cuda wu completion times??

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030015 - Posted: 1 Sep 2010, 19:09:36 UTC
Last modified: 1 Sep 2010, 19:12:52 UTC

Is this normal or a symptom of a problem?

I'm seeing some CUDA WUs take an hour to complete!
I have no VLARs or VHARs assigned to the GPUs.
GPU 1 = GTX 480
GPUs 2 & 3 = GTX 470

They were clocked a lot higher, but I backed off the clocks as temps were getting a bit high.
Is there an easy way to find out which card is crunching the longer units, to see if it's the same card each time?
GPU usage is in general around 85-90% on average.


____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030017 - Posted: 1 Sep 2010, 19:18:34 UTC

You may have been swamped with a bunch of VLAR resends, like I was after the prior outage ended. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPUs (well, not supposed to, anyway).

Here, I clock the 480 @ 801 MHz; 'normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, they were more like an hour-plus for two.

I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030019 - Posted: 1 Sep 2010, 19:32:17 UTC - in response to Message 1030017.

You may have been swamped with a bunch of VLAR resends, like I was after the prior outage ended. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPUs (well, not supposed to, anyway).

Here, I clock the 480 @ 801 MHz; 'normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, they were more like an hour-plus for two.

I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)

Jason


Using Fred's rescheduler, it shows me as having 14 VLARs, all assigned to the CPU. I assumed that meant I had no VLARs on the GPUs?
I had both my 470s clocked to 875 @ 1037 mV, but the 480 won't go above 750 without producing errors.

____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030023 - Posted: 1 Sep 2010, 19:44:11 UTC - in response to Message 1030019.
Last modified: 1 Sep 2010, 19:47:26 UTC

Using Fred's rescheduler, it shows me as having 14 VLARs, all assigned to the CPU. I assumed that meant I had no VLARs on the GPUs?
I had both my 470s clocked to 875 @ 1037 mV, but the 480 won't go above 750 without producing errors.


Fair enough. MSI Afterburner doesn't show downclocking in your image either, and you're using the newer driver.

With that many cards running, I would back off to 2 tasks per card if running x32f; but that shouldn't account for that much difference either, unless the PCIe bus is under heavy contention or something.

You can look in the slot directories for stderr output to see the angle range & selected GPU for tasks in progress. client_state.xml holds the stderr text of completed tasks ready for upload/report, IIRC.

Otherwise, if none of that reveals anything obvious, I'd do just what you're doing: winding things back to stock & checking everything bit by bit.
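The slot-directory check described above can be scripted. A rough sketch - it assumes a standard BOINC layout with `slots/N/stderr.txt` files and greps the same lines shown in the dumps later in this thread; the path is an assumption, adjust for your install:

```python
import re
from pathlib import Path

def scan_slots(base):
    """Report (slot, device, angle range) for each slot dir holding a stderr.txt."""
    rows = []
    for stderr in sorted(Path(base).glob("*/stderr.txt")):
        text = stderr.read_text(errors="replace")
        # Both patterns come from the stderr output of the CUDA multibeam app.
        device = re.search(r"using CUDA accelerated device (.+)", text)
        ar = re.search(r"WU true angle range is\s*:\s*([\d.]+)", text)
        rows.append((stderr.parent.name,
                     device.group(1).strip() if device else "?",
                     ar.group(1) if ar else "?"))
    return rows

if __name__ == "__main__":
    # Hypothetical path -- point it at your BOINC data directory's slots folder.
    for slot, dev, ar in scan_slots(r"C:\ProgramData\BOINC\slots"):
        print(f"slot {slot}: {dev}, AR {ar}")
```

Run it while tasks are in progress to see which card each slot is using and whether a slow slot is always the same device.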

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030029 - Posted: 1 Sep 2010, 20:04:56 UTC - in response to Message 1030023.
Last modified: 1 Sep 2010, 20:17:06 UTC

OK, cheers for that - will see what I can find..

edit

I have 18 slot folders!!

Folder 0 is seti
1 is MW@home
2 is seti

Example output from stderr in folder 0:

setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.423779

will carry on looking at the others..

OK - I realise that each slot represents a task in progress - how can I link this to one of the tasks that took 57 mins?
I presume that will be in client_state.xml? What do I need to look for to identify one of the long-running tasks in client_state.xml?
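One way to pull the long runners out of client_state.xml programmatically - a rough sketch, assuming the file parses as ordinary XML and that completed results carry a <final_elapsed_time> element (verify against your own file before trusting it):

```python
import xml.etree.ElementTree as ET

def long_runners(path, threshold_s=3000):
    """Return (result name, elapsed seconds) for results exceeding the threshold."""
    root = ET.parse(path).getroot()
    hits = [(r.findtext("name", ""), float(r.findtext("final_elapsed_time", "0")))
            for r in root.iter("result")]
    # Longest first, so the 50-minute-plus tasks surface at the top.
    return sorted([h for h in hits if h[1] > threshold_s], key=lambda t: -t[1])

if __name__ == "__main__":
    # Hypothetical path -- point it at your BOINC data directory.
    for name, secs in long_runners(r"C:\ProgramData\BOINC\client_state.xml"):
        print(f"{name}: {secs / 60:.1f} min")
```

The result name maps back to the workunit name minus the trailing `_N`, so you can then grep the matching <result> block for its stderr and see which device ran it.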
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030037 - Posted: 1 Sep 2010, 20:32:18 UTC

Not sure if I am going down the right road here, but taking the name of a task that took 56:53, I searched client_state.xml, and below is every reference I found for that task...

task = 01jn10ac.760.68450.16.10.141



<file_info>
<name>01jn10ac.760.68450.16.10.141</name>
<nbytes>375194.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<md5_cksum>9eba64fc7c47fbe267c18ee2ffcf4c11</md5_cksum>
<status>1</status>
<url>http://boinc2.ssl.berkeley.edu/sah/download_fanout/97/01jn10ac.760.68450.16.10.141</url>
</file_info>


<file_info>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<nbytes>23643.000000</nbytes>
<max_nbytes>65536.000000</max_nbytes>
<md5_cksum>c8b1eca16e1cb3faa0f36919b32ed4ac</md5_cksum>
<generated_locally/>
<status>1</status>
<upload_when_present/>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
<persistent_file_xfer>
<num_retries>0</num_retries>
<first_request_time>1283370201.441504</first_request_time>
<next_request_time>1283370201.441504</next_request_time>
<time_so_far>0.000000</time_so_far>
<last_bytes_xferred>0.000000</last_bytes_xferred>
</persistent_file_xfer>
<signed_xml>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>65536</max_nbytes>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
</signed_xml>
<xml_signature>
c0700120c10e4b2b270e774536f4ef042e8e6856f68f47729bda526b73ad4c16
6b499873b789a5f25cc9e8d79a85943af1a78a177f05418a23b3abfac5e0a6e2
9cf00bbb0128e3ca2250641f4e3f45e05b9d1366ea2f06d3a2e396cf50b964b6
48e8b94b8a4f13d6551742cbb220b86215531266d4ff82bf066399a5a9fb5152


<workunit>
<name>01jn10ac.760.68450.16.10.141</name>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<rsc_fpops_est>4082840316064.331500</rsc_fpops_est>
<rsc_fpops_bound>40828403160643.312000</rsc_fpops_bound>
<rsc_memory_bound>33554432.000000</rsc_memory_bound>
<rsc_disk_bound>33554432.000000</rsc_disk_bound>
<file_ref>
<file_name>01jn10ac.760.68450.16.10.141</file_name>
<open_name>work_unit.sah</open_name>
</file_ref>
</workunit>




</result>
<result>
<name>01jn10ac.760.68450.16.10.141_0</name>
<final_cpu_time>327.508500</final_cpu_time>
<final_elapsed_time>3413.123220</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.419844

Flopcounter: 33576273578419.793000

Spike count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0
called boinc_finish

____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030041 - Posted: 1 Sep 2010, 20:41:24 UTC
Last modified: 1 Sep 2010, 20:42:49 UTC

OK, that is choosing the 480, and running a mid range task. I see no problem with execution itself, but time should be more like 8 minutes (if running a single task, or 12-15 for two at a time).

Anything else using the card or preempting BOINC? Like an AV program interfering, or an automatic backup or something? Anything else using heavy CPU in Task Manager while the CUDA tasks are running, besides AKv8b etc.?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030044 - Posted: 1 Sep 2010, 20:50:39 UTC - in response to Message 1030041.

OK - here is another - at least it isn't the 480 this time.

<result>
<name>01jn10ac.760.68450.16.10.91_1</name>
<final_cpu_time>324.201300</final_cpu_time>
<final_elapsed_time>3667.743784</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 470 is okay
SETI@home using CUDA accelerated device GeForce GTX 470
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.419844

Flopcounter: 33576273578467.793000

Spike count: 2
Pulse count: 1
Triplet count: 0
Gaussian count: 0
called boinc_finish

I can't see any heavy processes running, but maybe it could be one of two things?

Either I haven't allowed enough CPU to each GPU?
(currently set to <avg_ncpus>0.111120</avg_ncpus>)

I have a quad-core Phenom II; the GPUs take a total of 1.00008 CPU cores, and I crunch a mixture of MW@home, Einstein@home and Rosetta on the remaining 3 cores, so the CPU is pretty much maxed out all the time.
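For anyone following along: <avg_ncpus> is set per <app_version> in app_info.xml when running the anonymous platform. A minimal illustrative fragment - the element names are standard BOINC, but the values and the 0.33 GPU count (for roughly 3 WUs per card) are assumptions for this kind of setup, not the poster's actual file:

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>0.111120</avg_ncpus>
    <max_ncpus>0.111120</max_ncpus>
    <coproc>
        <type>CUDA</type>
        <!-- 0.33 of a GPU per task lets BOINC schedule ~3 tasks per card -->
        <count>0.33</count>
    </coproc>
</app_version>
```

Raising <avg_ncpus> tells the scheduler to hold back more CPU time for each GPU task, which is exactly the knob being tuned later in this thread.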

____________

Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1030047 - Posted: 1 Sep 2010, 21:00:56 UTC - in response to Message 1030037.

It might be a combination of WU variability, and you were possibly using your PC for something else at the time the longer-running units were being processed.

Looking back over the last 5 days' worth of GPU crunching on one of my 260s, I have a variation of 1:19 through to 23:20 elapsed time - usually between 12 and 17 minutes.

If you are crunching more than one of the "longer"-running WUs at once, then there is an additional overhead.

If you also look at the names of the WUs that are long runners, you can see (in your screenshot) they are all from the same "batch".

I usually see batches running very similar times, so I suspect you have just got a batch of long runners.

If you use BoincTasks you could order the WUs by completed time and see if they are the latest-run WUs or not - if not, I suspect you do not have a problem.

Anyway, just a few ideas and observations.
____________
Tim

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030051 - Posted: 1 Sep 2010, 21:13:26 UTC - in response to Message 1030044.

Either I haven't allowed enough CPU to each GPU?
(currently set to <avg_ncpus>0.111120</avg_ncpus>)

I have a quad-core Phenom II; the GPUs take a total of 1.00008 CPU cores, and I crunch a mixture of MW@home, Einstein@home and Rosetta on the remaining 3 cores, so the CPU is pretty much maxed out all the time.


Just a theory then:
I think it's 'fighting' with the other projects. For testing/isolation purposes, make each multibeam instance grab a whole GPU & a whole CPU core each. See if task times improve... the 480 should be somewhere around 8 minutes for mid AR...

If they improve but not 'enough', then suspend the other projects & see what happens. My guess is that 'something' is being greedy, probably one of the other projects.

Rather than initiate a process-priority war between BOINC project developers, it'd probably be better to see if you can remedy the situation using something like 'Process Lasso' to sanitise the priorities of the culprit. 'Process Explorer' would be enough to reveal which processes may be running elevated & so blocking the CUDA threads.

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030052 - Posted: 1 Sep 2010, 21:14:09 UTC

Thanks Tim..

This machine is purely used for crunching.
I don't think they are all from the same range.

I will keep an eye on it for the next day and see how things go...
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030415 - Posted: 3 Sep 2010, 11:26:50 UTC

OK, it's early days, but after 2 days of testing I think I might have got to the bottom of my GPU workunits taking 40-50 mins.
I went down the whole reducing-clock-speeds route, which did diddly squat.
Watching Task Manager, all 4 CPU cores were maxed out at 100%, and the clue came from watching the load time for the CPU to pass a WU to the GPU: on some WUs, it was taking 2 mins before I saw any sign of crunching. This, coupled with some earlier comments over GPU usage levels (mine were all over the place!), made me think that the CPU was possibly the cause.
Playing around with the <avg_ncpus> value, it would seem that the long workunit times were down to a CPU bottleneck: one CPU core couldn't keep the GPUs fed fast enough.
After playing around with multiple WU/card combos, I have set the following:

<avg_ncpus>0.223333</avg_ncpus>
This means I have 2 CPU cores crunching and 2 feeding the GPUs.

I lose an additional processor, but in 12 hrs I have not had a WU take longer than 28:49 :-) at stock clocks.
Task Manager is showing average CPU usage of around 92-95%; that is with 2 CPU cores crunching and 2 feeding the GPUs, which are running 3 WUs each.
My GPU usage is also steady at around 90%, though it does dip to the 60% mark every now and then.

This means I have slightly more playing around to do with the <avg_ncpus> value, as there is 5% headroom which is wasted at the moment.
I will of course be ramping the clocks on the GPUs right back up to the 800s as soon as I can.

Thank god for that - it was really bugging me!
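As a sanity check on the figures in this thread (1.00008 cores earlier, roughly 2 cores now): BOINC reserves avg_ncpus worth of CPU per running GPU task, summed over all of them - 3 cards running 3 WUs each means 9 concurrent CUDA tasks here. A quick sketch of the arithmetic:

```python
# CPU reservation = avg_ncpus summed over all concurrent GPU tasks.
tasks = 3 * 3  # 3 cards x 3 WUs per card
for avg_ncpus in (0.111120, 0.223333):
    reserved = tasks * avg_ncpus
    print(f"avg_ncpus={avg_ncpus}: ~{reserved:.5f} of 4 cores reserved for feeding")
```

0.111120 across 9 tasks gives the 1.00008 figure quoted earlier; 0.223333 gives ~2.01, i.e. two cores held back for feeding and two left for CPU projects.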
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030422 - Posted: 3 Sep 2010, 13:11:58 UTC

Might have spoken too soon...

I'm seeing my GPU usage drop back down for extended periods, even though everything seems to be OK.

Is this normal for 3 cards running 3 WUs per card?


____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030462 - Posted: 3 Sep 2010, 16:16:29 UTC - in response to Message 1030422.

Not unless you're running out of work, or something's interfering. My 480 prefers 2 tasks at a time, so maybe reducing to 2 per GPU would help. If not, something is interfering - some other process or project. If you can't find what it is, PM me a HiJackThis log taken during a period when you catch it happening.

Also double-check with CPU-Z that the CPU isn't downclocking.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin


Copyright © 2014 University of California