Concerned over cuda wu completion times??



Message boards : Number crunching : Concerned over cuda wu completion times??

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030015 - Posted: 1 Sep 2010, 19:09:36 UTC
Last modified: 1 Sep 2010, 19:12:52 UTC

Is this normal or a symptom of a problem?

I'm seeing some CUDA WUs take an hour to complete!
I have no VLARs or VHARs assigned to the GPUs.
GPU 1 = GTX 480
GPUs 2 & 3 = GTX 470

They were clocked a lot higher, but I backed off the clocks as temps were getting a bit high.
Is there an easy way to find out which card is crunching the longer units, to see if it's the same card each time?
GPU usage is in general around 85-90% on average.


____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030017 - Posted: 1 Sep 2010, 19:18:34 UTC

You may have been swamped with a bunch of VLAR resends, like I was after the prior outage ended. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPUs (well, not supposed to, anyway).

Here, I clock the 480 @ 801 MHz; 'normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, they were more like an hour-plus for two.

I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030019 - Posted: 1 Sep 2010, 19:32:17 UTC - in response to Message 1030017.

You may have been swamped with a bunch of VLAR resends, like I was after the prior outage ended. Probably ghosts from a while back timing out, since new tasks don't issue VLARs to GPUs (well, not supposed to, anyway).

Here, I clock the 480 @ 801 MHz; 'normal' mid angle ranges take 12-13 mins running 2 at a time. When VLARs hit, they were more like an hour-plus for two.

I decided to just leave the machine to munch through them, since it could handle them OK ... Not much fun, though I seem to be through the worst of them now ;)

Jason


Using Fred's rescheduler, it shows me as having 14 VLARs, all assigned to the CPU. I assumed that meant I had no VLARs on the GPUs?
I had both my 470s clocked to 875 @ 1037 mV, but the 480 won't go above 750 without producing errors.

____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030023 - Posted: 1 Sep 2010, 19:44:11 UTC - in response to Message 1030019.
Last modified: 1 Sep 2010, 19:47:26 UTC

Using Fred's rescheduler, it shows me as having 14 VLARs, all assigned to the CPU. I assumed that meant I had no VLARs on the GPUs?
I had both my 470s clocked to 875 @ 1037 mV, but the 480 won't go above 750 without producing errors.


Fair enough. MSI Afterburner doesn't show downclocking in your image either, and you're using the newer driver.

With that many cards running, I would back off to 2 tasks per card if running x32f; but that shouldn't account for that much difference either, unless the PCIe bus is under heavy contention or something.

You can look in the slot directories for stderr output to see the angle range & selected GPU for tasks in progress. client_state.xml holds the stderr text of completed tasks ready for upload/report, IIRC.

Otherwise, if none of that reveals anything obvious, I'd do just what you're doing: winding things back to stock & checking everything bit by bit.
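The slot-directory check described above can be scripted. A rough sketch - it assumes a standard BOINC layout with `slots/N/stderr.txt` files and greps the same lines shown in the dumps later in this thread; the path is an assumption, adjust for your install:

```python
import re
from pathlib import Path

def scan_slots(base):
    """Report (slot, device, angle range) for each slot dir holding a stderr.txt."""
    rows = []
    for stderr in sorted(Path(base).glob("*/stderr.txt")):
        text = stderr.read_text(errors="replace")
        # Both patterns come from the stderr output of the CUDA multibeam app.
        device = re.search(r"using CUDA accelerated device (.+)", text)
        ar = re.search(r"WU true angle range is\s*:\s*([\d.]+)", text)
        rows.append((stderr.parent.name,
                     device.group(1).strip() if device else "?",
                     ar.group(1) if ar else "?"))
    return rows

if __name__ == "__main__":
    # Hypothetical path -- point it at your BOINC data directory's slots folder.
    for slot, dev, ar in scan_slots(r"C:\ProgramData\BOINC\slots"):
        print(f"slot {slot}: {dev}, AR {ar}")
```

Run it while tasks are in progress to see which card each slot is using and whether a slow slot is always the same device.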

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030029 - Posted: 1 Sep 2010, 20:04:56 UTC - in response to Message 1030023.
Last modified: 1 Sep 2010, 20:17:06 UTC

OK, cheers for that - will see what I can find..

edit

I have 18 slot folders!!

Folder 0 is seti
1 is MW@home
2 is seti

Example output from stderr in folder 0:

setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.423779

will carry on looking at the others..

OK - I realise that each slot represents a task in progress - how can I link this to one of the tasks that took 57 mins?
I presume that will be in client_state.xml? What do I need to look for to identify one of the long-running tasks in client_state.xml?
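One way to pull the long runners out of client_state.xml programmatically - a rough sketch, assuming the file parses as ordinary XML and that completed results carry a <final_elapsed_time> element (verify against your own file before trusting it):

```python
import xml.etree.ElementTree as ET

def long_runners(path, threshold_s=3000):
    """Return (result name, elapsed seconds) for results exceeding the threshold."""
    root = ET.parse(path).getroot()
    hits = [(r.findtext("name", ""), float(r.findtext("final_elapsed_time", "0")))
            for r in root.iter("result")]
    # Longest first, so the 50-minute-plus tasks surface at the top.
    return sorted([h for h in hits if h[1] > threshold_s], key=lambda t: -t[1])

if __name__ == "__main__":
    # Hypothetical path -- point it at your BOINC data directory.
    for name, secs in long_runners(r"C:\ProgramData\BOINC\client_state.xml"):
        print(f"{name}: {secs / 60:.1f} min")
```

The result name maps back to the workunit name minus the trailing `_N`, so you can then grep the matching <result> block for its stderr and see which device ran it.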
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030037 - Posted: 1 Sep 2010, 20:32:18 UTC

Not sure if I am going down the right road here, but taking the name of a task that took 56:53, I searched client_state.xml, and below is every reference I found for that task...

task = 01jn10ac.760.68450.16.10.141



<file_info>
<name>01jn10ac.760.68450.16.10.141</name>
<nbytes>375194.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<md5_cksum>9eba64fc7c47fbe267c18ee2ffcf4c11</md5_cksum>
<status>1</status>
<url>http://boinc2.ssl.berkeley.edu/sah/download_fanout/97/01jn10ac.760.68450.16.10.141</url>
</file_info>


<file_info>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<nbytes>23643.000000</nbytes>
<max_nbytes>65536.000000</max_nbytes>
<md5_cksum>c8b1eca16e1cb3faa0f36919b32ed4ac</md5_cksum>
<generated_locally/>
<status>1</status>
<upload_when_present/>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
<persistent_file_xfer>
<num_retries>0</num_retries>
<first_request_time>1283370201.441504</first_request_time>
<next_request_time>1283370201.441504</next_request_time>
<time_so_far>0.000000</time_so_far>
<last_bytes_xferred>0.000000</last_bytes_xferred>
</persistent_file_xfer>
<signed_xml>
<name>01jn10ac.760.68450.16.10.141_0_0</name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>65536</max_nbytes>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
</signed_xml>
<xml_signature>
c0700120c10e4b2b270e774536f4ef042e8e6856f68f47729bda526b73ad4c16
6b499873b789a5f25cc9e8d79a85943af1a78a177f05418a23b3abfac5e0a6e2
9cf00bbb0128e3ca2250641f4e3f45e05b9d1366ea2f06d3a2e396cf50b964b6
48e8b94b8a4f13d6551742cbb220b86215531266d4ff82bf066399a5a9fb5152


<workunit>
<name>01jn10ac.760.68450.16.10.141</name>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<rsc_fpops_est>4082840316064.331500</rsc_fpops_est>
<rsc_fpops_bound>40828403160643.312000</rsc_fpops_bound>
<rsc_memory_bound>33554432.000000</rsc_memory_bound>
<rsc_disk_bound>33554432.000000</rsc_disk_bound>
<file_ref>
<file_name>01jn10ac.760.68450.16.10.141</file_name>
<open_name>work_unit.sah</open_name>
</file_ref>
</workunit>




</result>
<result>
<name>01jn10ac.760.68450.16.10.141_0</name>
<final_cpu_time>327.508500</final_cpu_time>
<final_elapsed_time>3413.123220</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.419844

Flopcounter: 33576273578419.793000

Spike count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0
called boinc_finish

____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030041 - Posted: 1 Sep 2010, 20:41:24 UTC
Last modified: 1 Sep 2010, 20:42:49 UTC

OK, that is choosing the 480, and running a mid range task. I see no problem with execution itself, but time should be more like 8 minutes (if running a single task, or 12-15 for two at a time).

Anything else using the card or preempting BOINC? Like an AV program interfering, or an automatic backup or something? Anything else using heavy CPU in Task Manager while the CUDA tasks are running, besides AKv8b etc.?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030044 - Posted: 1 Sep 2010, 20:50:39 UTC - in response to Message 1030041.

OK - here is another - at least it isn't the 480 this time.

<result>
<name>01jn10ac.760.68450.16.10.91_1</name>
<final_cpu_time>324.201300</final_cpu_time>
<final_elapsed_time>3667.743784</final_elapsed_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<fpops_cumulative>95706580000000.000000</fpops_cumulative>
<stderr_out>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 480, 1503 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 810000
Device 2: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
Device 3: GeForce GTX 470, 1248 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 810000
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 470 is okay
SETI@home using CUDA accelerated device GeForce GTX 470
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.419844

Flopcounter: 33576273578467.793000

Spike count: 2
Pulse count: 1
Triplet count: 0
Gaussian count: 0
called boinc_finish

I can't see any heavy processes running, but maybe it could be one of two things?

Either I haven't allowed enough CPU to each GPU?
(currently set to <avg_ncpus>0.111120</avg_ncpus>)

I have a quad-core Phenom II; the GPUs take a total of 1.00008 CPU cores, and I crunch a mixture of MW@home, Einstein@home and Rosetta on the remaining 3 cores, so the CPU is pretty much maxed out all the time.
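For anyone following along: <avg_ncpus> is set per <app_version> in app_info.xml when running the anonymous platform. A minimal illustrative fragment - the element names are standard BOINC, but the values and the 0.33 GPU count (for roughly 3 WUs per card) are assumptions for this kind of setup, not the poster's actual file:

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>0.111120</avg_ncpus>
    <max_ncpus>0.111120</max_ncpus>
    <coproc>
        <type>CUDA</type>
        <!-- 0.33 of a GPU per task lets BOINC schedule ~3 tasks per card -->
        <count>0.33</count>
    </coproc>
</app_version>
```

Raising <avg_ncpus> tells the scheduler to hold back more CPU time for each GPU task, which is exactly the knob being tuned later in this thread.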

____________

Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1030047 - Posted: 1 Sep 2010, 21:00:56 UTC - in response to Message 1030037.

It might be a combination of WU variability, and you were possibly using your PC for something else at the time the longer-running units were being processed.

Looking back over the last 5 days' worth of GPU crunching on one of my 260s, I have a variation of 1:19 through to 23:20 elapsed time - usually between 12 and 17 minutes.

If you are crunching more than one of the "longer"-running WUs at once, then there is an additional overhead.

If you also look at the names of the WUs that are long runners, you can see (in your screenshot) they are all from the same "batch".

I usually see batches running very similar times, so I suspect you have just got a batch of long runners.

If you use BoincTasks you could order the WUs by completed time and see if they are the latest-run WUs or not - if not, I suspect you do not have a problem.

Anyway, just a few ideas and observations.
____________
Tim

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030051 - Posted: 1 Sep 2010, 21:13:26 UTC - in response to Message 1030044.

Either I haven't allowed enough CPU to each GPU?
(currently set to <avg_ncpus>0.111120</avg_ncpus>)

I have a quad-core Phenom II; the GPUs take a total of 1.00008 CPU cores, and I crunch a mixture of MW@home, Einstein@home and Rosetta on the remaining 3 cores, so the CPU is pretty much maxed out all the time.


Just a theory then:
I think it's 'fighting' with the other projects. For testing/isolation purposes, make each multibeam instance grab a whole GPU & a whole CPU core each. See if task times improve... the 480 should be somewhere around 8 minutes for mid AR...

If they improve but not 'enough', then suspend the other projects & see what happens. My guess is that 'something' is being greedy, probably one of the other projects.

Rather than initiate a process-priority war between BOINC project developers, it'd probably be better to see if you can remedy the situation using something like 'Process Lasso' to sanitise the priorities of the culprit. 'Process Explorer' would be enough to reveal which processes may be running elevated & so blocking the CUDA threads.

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030052 - Posted: 1 Sep 2010, 21:14:09 UTC

Thanks Tim..

This machine is purely used for crunching.
I don't think they are all from the same range.

I will keep an eye on it for the next day and see how things go...
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030415 - Posted: 3 Sep 2010, 11:26:50 UTC

OK, it's early days, but after 2 days of testing I think I might have got to the bottom of my GPU workunits taking 40-50 mins.
I went down the whole reducing-clock-speeds route, which did diddly squat.
Watching Task Manager, all 4 CPU cores were maxed out at 100%, and the clue came from watching the load time for the CPU to pass a WU to the GPU: on some WUs, it was taking 2 mins before I saw any sign of crunching. This, coupled with some earlier comments over GPU usage levels (mine were all over the place!), made me think that the CPU was possibly the cause.
Playing around with the <avg_ncpus> value, it would seem that the long workunit times were down to a CPU bottleneck: one CPU core couldn't keep the GPUs fed fast enough.
After playing around with multiple WU/card combos, I have set the following:

<avg_ncpus>0.223333</avg_ncpus>
This means I have 2 CPU cores crunching and 2 feeding the GPUs.

I lose an additional processor, but in 12 hrs I have not had a WU take longer than 28:49 :-) at stock clocks.
Task Manager is showing average CPU usage of around 92-95%; that is with 2 CPU cores crunching and 2 feeding the GPUs, which are running 3 WUs each.
My GPU usage is also steady at around 90%, though it does dip to the 60% mark every now and then.

This means I have slightly more playing around to do with the <avg_ncpus> value, as there is 5% headroom which is wasted at the moment.
I will of course be ramping the clocks on the GPUs right back up to the 800s as soon as I can.

Thank god for that - it was really bugging me!
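As a sanity check on the figures in this thread (1.00008 cores earlier, roughly 2 cores now): BOINC reserves avg_ncpus worth of CPU per running GPU task, summed over all of them - 3 cards running 3 WUs each means 9 concurrent CUDA tasks here. A quick sketch of the arithmetic:

```python
# CPU reservation = avg_ncpus summed over all concurrent GPU tasks.
tasks = 3 * 3  # 3 cards x 3 WUs per card
for avg_ncpus in (0.111120, 0.223333):
    reserved = tasks * avg_ncpus
    print(f"avg_ncpus={avg_ncpus}: ~{reserved:.5f} of 4 cores reserved for feeding")
```

0.111120 across 9 tasks gives the 1.00008 figure quoted earlier; 0.223333 gives ~2.01, i.e. two cores held back for feeding and two left for CPU projects.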
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1030422 - Posted: 3 Sep 2010, 13:11:58 UTC

Might have spoken too soon...

I'm seeing my GPU usage drop back down for extended periods, even though everything seems to be OK.

Is this normal for 3 cards running 3 WUs per card?


____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4954
Credit: 72,899,803
RAC: 11,531
Australia
Message 1030462 - Posted: 3 Sep 2010, 16:16:29 UTC - in response to Message 1030422.

Not unless you're running out of work, or something's interfering. My 480 prefers 2 tasks at a time, so maybe reducing to 2 per GPU would help. If not, something is interfering - some other process or project. If you can't find what it is, PM me a HiJackThis log taken during a period when you catch it happening.

Also double-check with CPU-Z that the CPU isn't downclocking.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin


Copyright © 2014 University of California