| Author |
Message |
MadMaC Volunteer tester
Joined: 4 Apr 01 Posts: 201 Credit: 44,526,287 RAC: 61,661

|
|
Is this normal or a symptom of a problem?
I'm seeing some CUDA WUs take an hour to complete!
I have no VLAR or VHARs assigned to the gpu
GPU 1 = gtx480
GPU 2&3 = gtx 470
They were clocked a lot higher, but I dropped the clocks as temps were getting a bit high.
Is there an easy way to find out which card is crunching the longer units, to see if it is the same card each time?
GPU usage is in general around 85-90% on average..
____________
|
|
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 4059 Credit: 60,515,078 RAC: 57,317

|
|
You may have been swamped with a bunch of VLAR resends, like I was after the prior outage was over. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPU (well, not supposed to anyway).
Here, I clock the 480 @ 801MHz. 'Normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, it was more like over an hour for two.
I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)
Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin
|
|
|
MadMaC Volunteer tester

|
You may have been swamped with a bunch of VLAR resends, like I was after the prior outage was over. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPU (well, not supposed to anyway).
Here, I clock the 480 @ 801MHz. 'Normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, it was more like over an hour for two.
I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)
Jason
Using Fred's rescheduler, it is showing me as having 14 VLAR's all assigned to the cpu? I assumed that meant that I had no VLAR's on the gpu?
I had both my 470's clocked to 875 @ 1037mV, but the 480 won't go above 750 without producing errors
____________
|
|
|
jason_gee Volunteer developer Volunteer tester

|
Using Fred's rescheduler, it is showing me as having 14 VLAR's all assigned to the cpu? I assumed that meant that I had no VLAR's on the gpu?
I had both my 470's clocked to 875 @ 1037mV, but the 480 won't go above 750 without producing errors
Fair enough. MSI Afterburner doesn't show downclocking in your image either, and you're using the newer driver.
With that many cards running I would back off to 2 tasks per card if running x32f, but that shouldn't account for that much difference either, unless the PCIe bus is under heavy contention or something.
You can look in the slot directories for stderr output to see the angle range & selected GPU for tasks in progress. client_state holds stderr text contents ready for upload/report IIRC.
Otherwise, if none of that reveals anything obvious, I'd do just what you're doing, winding things back to stock & check everything bit by bit.
Jason
|
|
|
MadMaC Volunteer tester

|
|
OK, cheers for that - will see what I can find..
edit
I have 18 slot folders!!
Folder 0 is seti
1 is MW@home
2 is seti
example output from stderr in folder 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan
) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)
Multibeam x32f Preview, Cuda 3.0
Work Unit Info:
...............
WU true angle range is : 0.423779
will carry on looking at the others..
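The check done by hand on folder 0 can be scripted across every slot folder. This is only a sketch: it assumes each slot folder holds a log named `stderr.txt` and that the log contains lines shaped like the ones in the example output (`CUDA Device 1 specified`, `using CUDA accelerated device ...`, `WU true angle range is : ...`). The function names and the `slots_dir` argument are made up for illustration.

```python
import re
from pathlib import Path

# Patterns copied from lines seen in the example stderr output.
SELECTED_RE = re.compile(r"CUDA Device (\d+) specified")
DEVICE_RE = re.compile(r"using CUDA accelerated device (.+)")
ANGLE_RE = re.compile(r"WU true angle range is\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def summarize_stderr(text):
    """Return (device_index, device_name, angle_range); None where a field is missing."""
    selected = SELECTED_RE.search(text)
    device = DEVICE_RE.search(text)
    angle = ANGLE_RE.search(text)
    return (
        int(selected.group(1)) if selected else None,
        device.group(1).strip() if device else None,
        float(angle.group(1)) if angle else None,
    )


def scan_slots(slots_dir, log_name="stderr.txt"):
    """Summarize the stderr log of each slot folder under slots_dir.

    Assumption: the log file name is stderr.txt; folders without one are skipped.
    """
    rows = []
    for slot in sorted(Path(slots_dir).iterdir()):
        log = slot / log_name
        if log.is_file():
            rows.append((slot.name,) + summarize_stderr(log.read_text(errors="replace")))
    return rows
```

Calling `scan_slots` on the BOINC slots folder would give one row per running task with the device number, so the two 470s can be told apart even though they report the same name.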
OK - I realise that each slot represents a task in progress - how can I link this to one of the tasks that took 57 mins?
I presume that will be in the client_state xml? What do I need to look for to identify one of the long running tasks in the client_state xml?
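One possible way to pick out the long runners, sketched under assumptions: that finished tasks appear in client_state.xml as `<result>` blocks carrying `<name>`, `<final_elapsed_time>` (seconds) and `<plan_class>` elements. The file is scanned with regular expressions rather than an XML parser, since it may not be strictly well-formed. `slow_results` and its threshold are hypothetical names and values, not part of BOINC.

```python
import re

RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def tag(block, name):
    """Return the text of the first <name>...</name> element in block, or None."""
    m = re.search(rf"<{name}>(.*?)</{name}>", block, re.DOTALL)
    return m.group(1).strip() if m else None


def slow_results(state_text, min_seconds=1800.0, plan_class="cuda"):
    """List (result_name, elapsed_seconds) for results at or above min_seconds.

    Pass plan_class=None to include results of any plan class.
    """
    hits = []
    for block in RESULT_RE.findall(state_text):
        elapsed = tag(block, "final_elapsed_time")
        if elapsed is None:
            continue  # task has not finished yet
        if plan_class is not None and tag(block, "plan_class") != plan_class:
            continue
        if float(elapsed) >= min_seconds:
            hits.append((tag(block, "name"), float(elapsed)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Reading the file into a string and passing it to `slow_results` would list the slowest CUDA results first; the names can then be matched against the task list in BOINC Manager.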
____________
|
|
|
MadMaC Volunteer tester

|
|
Not sure if I am going down the right road here, but taking the name of a task that took 56:53 mins, I then searched the client_state.xml and below is every reference I found for that task....
task = 01jn10ac.760.68450.16.10.141
<file_info>
<name>01jn10ac.760.68450.16.10.141</name>
<nbytes>375194.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<md5_cksum>9eba64fc7c47fbe267c18ee2ffcf4c11</md5_cksum>
<status>1</status>
<url>http://boinc2.ssl.berkeley.edu/sah/download_fanout/97/01jn10ac.760.68450.16.10.141</url>
</file_info>
<file_info>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<nbytes>23643.000000</nbytes>
<max_nbytes>65536.000000</max_nbytes>
<md5_cksum>c8b1eca16e1cb3faa0f36919b32ed4ac</md5_cksum>
<generated_locally/>
<status>1</status>
<upload_when_present/>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
<persistent_file_xfer>
<num_retries>0</num_retries>
<first_request_time>1283370201.441504</first_request_time>
<next_request_time>1283370201.441504</next_request_time>
<time_so_far>0.000000</time_so_far>
<last_bytes_xferred>0.000000</last_bytes_xferred>
</persistent_file_xfer>
<signed_xml>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>65536</max_nbytes>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
</signed_xml>
<xml_signature>
c0700120c10e4b2b270e774536f4ef042e8e6856f68f47729bda526b73ad4c16
6b499873b789a5f25cc9e8d79a85943af1a78a177f05418a23b3abfac5e0a6e2
9cf00bbb0128e3ca2250641f4e3f45e05b9d1366ea2f06d3a2e396cf50b964b6
48e8b94b8a4f13d6551742cbb220b86215531266d4ff82bf066399a5a9fb5152
</xml_signature>
</file_info>
<workunit>
<name>01jn10ac.760.68450.16.10.141</name>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<rsc_fpops_est>4082840316064.331500</rsc_fpops_est>
<rsc_fpops_bound>40828403160643.312000</rsc_fpops_bound>
<rsc_memory_bound>33554432.000000</rsc_memory_bound>
<rsc_disk_bound>33554432.000000</rsc_disk_bound>
<file_ref>
<file_name>01jn10ac.760.68450.16.10.141</file_name>
<open_name>work_unit.sah</open_name>
</file_ref>
</workunit>
</result>
<result>
<name>01jn10ac.760.68450.16.10.141_0</name>
<final_cpu_time>327.508500</final_cpu_time>
<final_elapsed_time>3413.123220</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
jason_gee Volunteer developer Volunteer tester

|
|
OK, that is choosing the 480, and running a mid range task. I see no problem with execution itself, but time should be more like 8 minutes (if running a single task, or 12-15 for two at a time).
Anything else using the card or preempting Boinc ? like an AV program interfering or automatic backup or something ? Anything else using heavy CPU in task manager while the Cuda tasks are running, besides AKv8b etc ?
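For scale, the `<final_elapsed_time>` of 3413.123220 seconds reported for that task is the 56:53 quoted earlier, while its `<final_cpu_time>` was only 327.508500 seconds. A small sketch of the arithmetic follows; the helper names are illustrative, and the CPU share on its own does not say why the task was slow.

```python
def mmss(seconds):
    """Format a duration in seconds as minutes:seconds, dropping the fraction."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def cpu_share(cpu_seconds, elapsed_seconds):
    """Fraction of wall-clock time during which the task was charged CPU time."""
    return cpu_seconds / elapsed_seconds


elapsed = 3413.123220   # <final_elapsed_time> from the posted result
cpu = 327.508500        # <final_cpu_time> from the posted result

print(mmss(elapsed))
print(f"{cpu_share(cpu, elapsed):.1%}")
```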
|
|
|
MadMaC Volunteer tester

|
|
Ok - here is another - at least it isn't the 480 this time
<result>
<name>01jn10ac.760.68450.16.10.91_1</name>
<final_cpu_time>324.201300</final_cpu_time>
<final_elapsed_time>3667.743784</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
jason_gee Volunteer developer Volunteer tester

|
Either I haven't allowed enough cpu to each gpu?
(currently set to <avg_ncpus>0.111120</avg_ncpus>)
I have a quad core Phenom II. The GPUs take a total of 1.00008 CPU cores, and I crunch a mixture of MW@home, Einstein@home and Rosetta on the remaining 3 cores, so the CPU is pretty much maxed out all the time.
Just a theory then:
I think it's 'fighting' with the other projects. For testing/isolation purposes: make each multibeam instance grab a whole GPU & whole CPU core - each. see if task times improve... the 480 should be somewhere around 8 minutes for mid AR...
If they improve but not 'enough', then suspend the other projects & see what happens. My guess is that 'something' is being greedy, probably one of the other projects.
Rather than initiate a process priority war between Boinc project developers, it'd probably be better to see if you can remedy the situation using something like 'Process Lasso' to sanitise the priorities of the culprit. 'Process Explorer' would be enough to reveal what processes may be running elevated & so blocking the Cuda threads.
Jason
|
|
|
MadMaC Volunteer tester

|
|
Thanks Tim..
This machine is purely used for crunching..
I don't think they are all from the same range..
I will keep an eye on it for the next day and see how things go...
____________
|
|
|
MadMaC Volunteer tester

|
|
OK, it's early days, but after 2 days of testing I think I might have got to the bottom of my GPU workunits taking 40-50 mins.
I went down the whole reducing clock speeds route, which did diddly squat..
Watching the task manager, all 4 CPU cores were maxed out at 100%. The clue came from watching how long the CPU took to pass a WU to the GPU: on some WUs it was 2 mins before I saw any sign of crunching. This, coupled with some earlier comments about GPU usage levels (mine were all over the place!), made me think that the CPU was possibly the cause.
Playing around with the <avg_ncpus> value, it would seem that the long workunit times were down to a CPU bottleneck: one CPU core couldn't keep the GPUs fed fast enough.
After playing around with multiple wu/card combo's I have set the following
<avg_ncpus>0.223333</avg_ncpus>
This means I have 2 cpu's crunching and 2 feeding the gpu's
I lose an additional processor, but in 12 hrs I have not had a wu take longer than 28:49 :-) at stock clocks
Task manager is showing average cpu usage of around 92-95%, that is with 2 cpu cores crunching and 2 feeding the gpu's which are running 3 wu's each
My gpu usage is also steady at around 90% though it does dip to the 60% mark every now and then
This means I have a little more playing around to do with the <avg_ncpus> value, as there is about 5% headroom which is wasted at the moment.
I will of course be ramping the clocks on the gpu right back up to the 800's as soon as I can..
Thank god for that - it was really bugging me!
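The numbers in this post can be checked with a little arithmetic. The sketch below assumes the CPU reserved for GPU work is simply the per-task value multiplied by the number of GPU tasks running (3 cards with 3 WUs each); the function names are made up for illustration and are not BOINC settings.

```python
def reserved_cpus(gpus, tasks_per_gpu, avg_ncpus):
    """Total CPU cores set aside for GPU tasks, assuming a plain sum over tasks."""
    return gpus * tasks_per_gpu * avg_ncpus


def avg_ncpus_for(target_cpus, gpus, tasks_per_gpu):
    """Per-task value that would reserve target_cpus cores in total."""
    return target_cpus / (gpus * tasks_per_gpu)


# Old setting: 9 tasks at 0.111120 each, about 1 core in total.
print(round(reserved_cpus(3, 3, 0.111120), 6))

# New setting: 9 tasks at 0.223333 each, about 2 cores in total.
print(round(reserved_cpus(3, 3, 0.223333), 6))

# Value that would reserve exactly 2 cores for 9 tasks.
print(round(avg_ncpus_for(2, 3, 3), 6))
```

Under that assumption the old setting matches the 1.00008 cores mentioned earlier, and the new one lands just over 2 cores, which fits the "2 crunching, 2 feeding" split on a quad core.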
____________
|
|
|
MadMaC Volunteer tester

|
|
Might have spoken too soon
I'm seeing my GPU usage drop back down for extended periods, even though everything seems to be OK
Is this normal for 3 cards running 3 wu's per card???
____________
|
|
|
jason_gee Volunteer developer Volunteer tester

|
|
Not unless you're running out of work, or something's interfering. My 480 prefers 2 tasks at a time, so maybe reducing to 2 per GPU would help. If not, something is interfering like some other process or project. If you can't find what, during a period where you catch it, PM me a HiJackThis log taken during that period.
Also double check the CPU isn't downclocking, with CPU-Z.
|
|
|