Message boards : Number crunching : New AstroPulse for GPU (ATi & NV) released (r1316)
Author | Message |
---|---|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
My 9800GTX+ was still using almost 100% CPU with the 301.24 and 302.59 drivers and the r1305 app (but I was purposely leaving a core free). I haven't tried the r1316 app on it yet; I'll try it once my present CUDA testing is complete. Claggy |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
I did the same, leaving 1 'thread' free, which ups the GPU load by 5-7%; also, the 2nd GPU has an even higher load than the first!? 98%-99% doing 2 MB WUs. This was with MB work, but I noticed the same with AstroPulse work. Unfortunately no AstroPulse tasks are available. In the meantime, also with no 610 work, I did some MW work; with a resource share of 650 (SETI) and 75 (MW) it's OK. AMD ATI 5870 GPUs. But it crashed (heat? I don't know); I immediately received a survey from HP asking what had happened and wanting to help?! Now doing 4 610 tasks on the GPUs. But some, better a lot of, AstroPulse work would be nice since rev. 1316, which really is faster; 2 results out of 2 were validated. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Please stay on topic here. With each crime and every kindness we birth our future. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
With the 275.33 NVIDIA driver and one APv6 6.04 (r1316) application/work unit on my GPU, one CPU thread had ~10% usage/support. When I changed to 2 work units per GPU, CPU-thread usage/support increased to ~40-50% for each application/work unit. I guess this OpenCL bug doesn't happen (or isn't as big) when there is only one work unit on my system (Intel Core2 Duo E7600 with NVIDIA GeForce GTX260-216 O/C, WinXP 32bit). * Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Yes, Sutaru. We figured that pre-Fermi cards are not affected that much. Maybe it's because your 260 has many more compute units than a 460, for example. But that's only my imagination. |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
Ah, so maybe that's the key? I'm only running one WU at a time on the GTX 260 Core 216 as well - it seems that multiple simultaneous WUs don't scale well unless on Fermi or later cards. For my Fermi and Kepler cards, I'm running two at a time. And yes, I also noticed that the pre-Fermi G200-era GPU designs have several more compute units than Fermi and Kepler. Different architecture, different implementation. Edit: This is interesting... with the current lack of AP WUs, one of my Fermi cards received only one AP WU to work on... even though it was processing a MB/CUDA WU simultaneously, the AP WU it was working on exhibited lower CPU usage similar to that seen on the pre-Fermi card. So perhaps the contributing factor to the high CPU usage isn't the GPU architecture, but rather how many OpenCL tasks you try to run on it simultaneously. MB/CUDA WUs don't appear to suffer high CPU usage. Soli Deo Gloria |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
it seems that multiple simultaneous WUs don't scale well unless on Fermi or later cards. Yes, that's the way they were built. Tasks don't actually run 'simultaneously' (any more than they do on a single-core CPU under a multi-tasking operating system). The hardware is switched from task to task on a scale of milliseconds, perhaps even microseconds. Fermis and later have specialised silicon to handle this task-switching at high speed: earlier GPUs don't. |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
Ah, thanks for the clarification. I thought the workload would be split among the numerous processing cores on the GPU, so clearly I was mistaken there. Well, it would seem, then, that for OpenCL tasks at least, this 'high-speed switching' consumes excessive CPU resources with the current NV drivers. |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
"I made 2 posts about the current situation with driver support for OpenCL on both vendors' forums recently:"

I thought I'd do a little light reading tonight and check out the NVIDIA thread mentioned above (since all I run are their cards), and got this notice when I went to their site:

"Posted July 12, 2012. NVIDIA suspended operations today of the NVIDIA Developer Zone (developer.nvidia.com). We did this in response to attacks on the site by unauthorized third parties who may have gained access to hashed passwords. We are investigating this matter and working around the clock to ensure that secure operations can be restored. As a precautionary measure, we strongly recommend that you change any identical passwords that you may be using elsewhere. NVIDIA does not request sensitive information by email. Do not provide personal, financial or sensitive information (including new passwords) in response to any email purporting to be sent by an NVIDIA employee or representative. We will post updates about this matter here. For any questions, email us at devzoneupdate@nvidia.com. For technical support, go to www.nvidia.com/support."

Bummer. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, I received an e-mail with the same message; NV was hacked recently. Someone got tired of their bugs, perhaps ;) Well, the latest reports do imply a change in the latest NV drivers. When I reproduced this bug on my GPU I saw a CPU usage increase on the GTX 260 with a single MB task. So apparently something has changed since then. I will update from the rock-stable 263.06 drivers (no CPU usage bug there at all, even with 2 AP tasks running) to the latest and check what we really have now. EDIT: Our internal investigation is starting to show that GPU-host synching is a very complex matter, not only on NV but on ATi hardware too. The CPU time increase on low-end ATi GPUs with r1316 over r1305 I now attribute to a change in synching mode inside the APP runtime when the wait time increases. I will post these observations on the ATi forums and post a link here, so anyone interested in the topic can follow the discussion with ATi specialists (in case they bother to answer, of course). |
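[Editor's note] The spin-wait vs. blocking-wait distinction behind this sync-mode discussion can be illustrated without any OpenCL at all. Below is a minimal Python sketch; the "kernel" here is just a timer thread standing in for the GPU finishing work, and the two wait functions mimic the two driver sync modes:

```python
import threading
import time

def wait_spin(done_event):
    # Spin-wait: poll the completion flag in a tight loop.
    # Burns a CPU core for the whole wait, like a driver in spin-sync mode.
    while not done_event.is_set():
        pass

def wait_blocking(done_event):
    # Blocking wait: the OS parks the thread until the event fires.
    # Near-zero CPU time, like a driver in blocking-sync mode.
    done_event.wait()

def measure(wait_fn, kernel_time=0.2):
    # Return the CPU time consumed while waiting kernel_time seconds
    # for the simulated "kernel" to complete.
    done = threading.Event()
    threading.Timer(kernel_time, done.set).start()
    t0 = time.process_time()
    wait_fn(done)
    return time.process_time() - t0

spin_cpu = measure(wait_spin)
block_cpu = measure(wait_blocking)
print(f"spin-wait CPU time:     {spin_cpu:.3f} s")
print(f"blocking-wait CPU time: {block_cpu:.3f} s")
```

Both waits last the same wall time, but only the spin version charges it to the CPU, which is one plausible mechanism for the "one full core per OpenCL task" reports in this thread.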
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is the link to the synchronization issue discussion on AMD site: http://devgurus.amd.com/message/1282663#1282663 |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
I don't fully understand all the details of that, but I think I get the gist of it. I suppose the lower-end GPUs have different implementations which are more sensitive to the differences in the two algorithms you tried. AP WUs are being (slowly) distributed again. Hopefully I can get more testing done, provided the blanking percentages don't vary too wildly (85+% blanking is really horrible, especially on slower CPUs). |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"I don't fully understand all the details of that, but I think I get the gist of it. I suppose the lower end GPUs have different implementations which are more sensitive to the differences in the two algorithms you tried." Not quite... low-end GPUs just manifest it very noticeably. Mike tested on his quite fast GPU: he got 4 s CPU time vs 10 s CPU time... The values are low, so more testing will be needed, but the difference is very big in relative terms... On the other hand, I think it's some switch between 2 sync modes inside the driver itself, depending on the time to wait. If that's true, a high-end GPU might still not pass the threshold for such a switch and so not show such a big increase in CPU time... we will see, more testing is needed. |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
Edit: I only just saw your private message (after making the post below). I feel pretty stupid now. x.x --- <GLaDOS>Continue testing...</GLaDOS> Ten seconds, wow. What card is that, out of curiosity? There were no instructions, so it took me a while, but I finally managed to work out how to run your dummy work unit in stand-alone mode; here are my results. I kept parameters the same across all executions (for the same GPU) and also made sure that the binary caches were built before making any timings (so run time should be just the actual GPU processing, not skewed by the time for the CPU to create those caches). I'm including CPUs in this summary for reference, though I don't expect them to have too great an influence on the times.

Intel Core 2 Q9550 + Radeon HD 6950: r555 0:46, r1305 0:50, r1316 0:43
Athlon 64 X2 6400+ + Radeon HD 5670: r555 3:24, r1305 3:10, r1316 2:47
Pentium 4 HT 3.06 + Radeon HD 4670: r555 5:54, r1305 6:01, r1316 4:03
Fusion C-50 (Radeon HD 6250): r555 10:36, r1305 9:11, r1316 8:42

Not sure what to make of these trends, but hopefully they'll be of some use to you. Maybe. I suppose only one kind of work unit doesn't reflect performance increases or decreases as much as we'd like; I'm not sure the differences are significant enough to overcome the margin of error. |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
All right, tried it again with the APbench tool...

Intel Core 2 Q9550 + Radeon HD 6950
AP6_win_x86_SSE2_OpenCL_ATI_r1305.exe -unroll 11 : Elapsed 49.028 secs CPU 20.734 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -unroll 11 : Elapsed 42.324 secs CPU 18.938 secs
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -unroll 11 : Elapsed 45.582 secs CPU 16.375 secs

Athlon 64 X2 6400+ + Radeon HD 5670
AP6_win_x86_SSE2_OpenCL_ATI_r1305.exe -unroll 5 : Elapsed 191.719 secs CPU 26.016 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -unroll 5 : Elapsed 169.219 secs CPU 14.766 secs
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -unroll 5 : Elapsed 210.141 secs CPU 34.609 secs

Pentium 4 HT 3.06 + Radeon HD 4670
AP6_win_x86_SSE2_OpenCL_ATI_r1305.exe -unroll 4 : Elapsed 695.047 secs CPU 188.047 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -unroll 4 : Elapsed 513.266 secs CPU 88.875 secs
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -unroll 4 : Elapsed 711.469 secs CPU 204.234 secs

Fusion C-50 (Radeon HD 6250)
AP6_win_x86_SSE2_OpenCL_ATI_r1305.exe -unroll 2 : Elapsed 1379.849 secs CPU 511.418 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -unroll 2 : Elapsed 969.014 secs CPU 558.187 secs
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -unroll 2 : Elapsed 983.876 secs CPU 575.191 secs

The times seem to confirm my manual testing for the mid-range and high-end cards. Not sure what the script does differently such that the other two GPUs took twice as long compared with my manual testing, but the results should still be relative to each other here. |
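[Editor's note] Taking r555 as the baseline, the relative speedups hiding in those elapsed times can be worked out with a small helper script. The numbers are copied from the APbench post above; the dict layout is just for illustration:

```python
# Elapsed times (seconds) from the APbench runs quoted above.
elapsed = {
    "HD 6950": {"r555": 45.582,  "r1305": 49.028,   "r1316": 42.324},
    "HD 5670": {"r555": 210.141, "r1305": 191.719,  "r1316": 169.219},
    "HD 4670": {"r555": 711.469, "r1305": 695.047,  "r1316": 513.266},
    "C-50":    {"r555": 983.876, "r1305": 1379.849, "r1316": 969.014},
}

for gpu, times in elapsed.items():
    # Speedup of the new r1316 build relative to the old r555 baseline.
    speedup = times["r555"] / times["r1316"]
    print(f"{gpu}: r1316 runs at {speedup:.2f}x the speed of r555")
```

On these figures the HD 4670 gains the most from r1316 (roughly 1.4x over r555), while the C-50's r1305 result is the outlier in the other direction.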
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for testing.
1) Strange result for the very first GPU, r555 looks faster %) [C-50: the same...]
2) The results for the C-50 resemble results from my own C-60; it looks like it likes r1305 more.
3) "Twice as long": no idea, it's worth understanding why. So could you upload the full TestData directory, archived, somewhere? Or if you prefer, just e-mail me the archive. I will upload a modified version soon and then post a link here. It should have the same performance as r1305 & r1316 but is able to switch between different modes of execution via a command-line switch.
EDIT: BTW, for the C-60 I am receiving very inconsistent results; the time fluctuations are big. So it's worth making a few copies of the same task and running them all, or just repeating the run a few times, to get an estimate of the error range. BTW, did you test with the CPU free, or was the CPU busy with BOINC tasks? |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
No worries. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Wedge, offline tests should always be made under the same conditions: that means the CPU always busy, or always idle. Especially for speed comparisons, BOINC should be turned off. |
Wedge009 Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
Raistmer, I did another series of tests and I'm satisfied with these results, so I won't be doing any more testing until the next build you want tested. These times are the average of three runs; all tests had BOINC fully suspended, and all parameters (on a given host) were kept the same throughout. The variance in times seems to be no more than about 5% for all tests, in most cases even less (this includes the C-50), so I'm confident these times are accurate for my particular hosts.

Times are elapsed / CPU seconds:
HD 6950 (Cayman): r555 46.386 / 16.318, r1305 48.799 / 20.490, r1316 42.653 / 18.979
HD 5670 (Redwood): r555 200.969 / 30.463, r1305 185.953 / 23.380, r1316 164.547 / 13.104
HD 4670 (RV730): r555 588.982 / 105.443, r1305 599.636 / 109.239, r1316 490.278 / 40.162
Fusion C-50 (HD 6250, Wrestler): r555 960.101 / 573.803, r1305 865.306 / 549.155, r1316 778.431 / 589.679

In conclusion, I still think r1316 is the overall winner of these three versions across a broad range of GPU architectures. About the only disadvantage I can see is the increase in CPU usage on the C-50, even though it produced the shortest overall run times. Now I'm curious to know how GCN compares... x.x I'm still limited to WinXP / Catalyst 12.1 at the moment, so no HD 7000-series for me just yet. |
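[Editor's note] The "no more than about 5% variance" claim can be checked mechanically, in the spirit of Raistmer's advice to repeat runs and estimate the error range. A small sketch; the per-run times below are made up, only the method is the point:

```python
from statistics import mean

def rel_spread(times):
    """Relative spread of repeated runs: (max - min) / mean."""
    return (max(times) - min(times)) / mean(times)

# Hypothetical elapsed times (seconds) for three runs of one build on one host.
runs = [42.1, 43.0, 42.6]
avg = mean(runs)
print(f"average: {avg:.3f} s, spread: {rel_spread(runs):.1%}")
```

If the spread stays under ~5%, differences between builds larger than that are probably real; smaller differences are within run-to-run noise.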
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks! Some slowdown of r1305 on the fast GPU can come from a defaults change: the default FFA block size was decreased vs r555. If you want to change this, try adding the -ffa_block N and -ffa_block_fetch N params (where N should be the same for all revisions, of course). |
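[Editor's note] A hypothetical invocation following that advice might look like this. The value 8192 is purely a placeholder, not a recommended setting; the point is only that the same N must be passed to every revision being compared:

```shell
# Example only: the -ffa_block / -ffa_block_fetch values are placeholders.
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe  -unroll 11 -ffa_block 8192 -ffa_block_fetch 8192
AP6_win_x86_SSE2_OpenCL_ATI_r1305.exe -unroll 11 -ffa_block 8192 -ffa_block_fetch 8192
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -unroll 11 -ffa_block 8192 -ffa_block_fetch 8192
```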
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.