Are we ATI Guys being got at?

Message boards : Number crunching : Are we ATI Guys being got at?
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14532
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1095400 - Posted: 9 Apr 2011, 20:40:17 UTC - in response to Message 1095396.  

Well, you use 4 GPUs in a single host... AMD's OpenCL runtime still has trouble with such configs. Try either updating to the recently released SDK 2.4, or try to find info (I can't recall the exact wording now) about a special environment variable that changes the syncing style for the ATI OpenCL runtime.
[EDIT: Morten uses many GPUs in a single host too and AFAIK found that setting helpful; look/search in the Lunatics ATI-related threads]

The environment variable I set for better usage of both GPUs in the HD 5970 is "GPU_USE_SYNC_OBJECTS=1". This enables about 85% GPU usage for both GPUs, as opposed to very low and erratic usage without the setting.

This environment variable is undocumented, and can therefore be removed by AMD in a future release.
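For anyone wanting to try this, a minimal sketch of setting the variable so that the BOINC client and its GPU apps inherit it (the variable name is taken from Morten's post above; since it's undocumented, whether it helps on your setup may vary):

```shell
# Windows: persist the setting for new processes, then restart BOINC:
#   setx GPU_USE_SYNC_OBJECTS 1
# Current shell session only (e.g. when starting the client by hand):
export GPU_USE_SYNC_OBJECTS=1
echo "GPU_USE_SYNC_OBJECTS=$GPU_USE_SYNC_OBJECTS"
```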

That sounds very similar to the SWAN_SYNC=0 environment variable used by GPUGrid to invoke polling mode. In their case, the increased GPU usage comes at the expense of increased CPU usage too.

jravin seems to have got the increased CPU usage already, and (as mentioned in message 1095331) 80%-95% GPU usage.

Err, you couldn't have set the environment variable deliberately at some point in the past, could you?

Morten, what CPU usage are you getting?
ID: 1095400
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1095415 - Posted: 9 Apr 2011, 21:10:04 UTC - in response to Message 1095400.  
Last modified: 9 Apr 2011, 21:20:20 UTC

I'm currently running 2 tasks per GPU, and most tasks are around 2% CPU utilization, while about 3 tasks are around 6-7% CPU utilization.

EDIT: After looking at this for a while, it seems to level out with all tasks around 5%, and 3 out of 8 CPU cores fully servicing these tasks.
Morten Ross
ID: 1095415
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1095520 - Posted: 10 Apr 2011, 0:01:40 UTC

Well, with my 4 HD 5550s, I am getting the following (all AP, but I believe it applies to MB also): 4 tasks running, CPU 100%; 3 tasks, 75%; 2 tasks, 50%; and 1 task (WU) running, CPU = 25%. That is on a quad-core 3GHz Phenom II processor. So 1 GPU requires 1 CPU core. Gah!

Interestingly, it takes several hours (4 - 8) to run an AP task, so I am getting roughly 100 credits/hour, or about 2000-2500 a day - about half what my GT 240s get. This is with BOINC saying GT 240: max 257 GFlops; HD 5550: max 352 GFlops.

So still no solution to my "overCPUing"; I will try backing off to Cat 11.2 tomorrow. Right now I have to make sure 3 old dual Athlon MP machines are working, as I am giving them away tomorrow morning...
ID: 1095520
JLConawayII

Joined: 2 Apr 02
Posts: 188
Credit: 2,840,460
RAC: 0
United States
Message 1095521 - Posted: 10 Apr 2011, 0:23:56 UTC

I was thrilled to get 486 WUs from MW@home. I thought, finally they got rid of the core-based limitation. Then the project went *splat*, and I was sad. :o(
ID: 1095521
unnefer

Joined: 23 Jan 11
Posts: 5
Credit: 83,513
RAC: 0
Australia
Message 1095590 - Posted: 10 Apr 2011, 6:34:47 UTC - in response to Message 1095336.  
Last modified: 10 Apr 2011, 6:53:20 UTC

unnefer, your app_info and preferences are fine; you just have to wait. AP is fairly rare - you might ask for work 20 times and get none, then get 20 AP tasks on the 21st attempt.

Ah ok thanks for that.

I just added the MultiBeam 6.10 r177 GPU app as well, and now it's at least downloading the MultiBeam enhanced GPU tasks :)

I changed the "count" to 0.5 so it would crunch 2 workunits per GPU, and for me, that seems to offer the best estimated speed per instance.

I also had an issue with it using 100% CPU usage and only ~70% GPU usage, so I changed "instances per device" to 1 and "period_iterations_num" to 4 and that dropped CPU usage down to 55% and increased GPU usage to ~94% :D

For anyone else who wants to crunch GPU-only and never quite got it working, here is my app_info.xml file for reference:
<app_info> 

<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</name>
<executable/>
</file_info>
<file_info>
<name>MultiBeam_Kernels.cl</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>610</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>0.05</max_ncpus>
<plan_class>ati13ati</plan_class>
<cmdline>-period_iterations_num 4 -instances_per_device 1</cmdline>
<flops>20987654321</flops>
<file_ref>
<file_name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>MultiBeam_Kernels.cl</file_name>
<copy_file/>
</file_ref>
<coproc>
<type>ATI</type>
<count>0.5</count>
</coproc>
</app_version>

<app>
<name>astropulse_v505</name>
</app>
<file_info>
<name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</name>
<executable/>
</file_info>
<file_info>
<name>AstroPulse_Kernels.cl</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v505</app_name>
<version_num>506</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>0.05</max_ncpus>
<plan_class>ati13ati</plan_class>
<cmdline>-instances_per_device 1 -hp -unroll 10 -ffa_block 4096 -ffa_block_fetch 2048</cmdline>
<flops>30987654321</flops>
<file_ref>
<file_name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</file_name>
<main_program/>                           
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels.cl</file_name>
<copy_file/>
</file_ref>
<coproc>
<type>ATI</type>
<count>0.5</count>
</coproc>
</app_version>

</app_info> 


For it to work, you'll need to download the r177 MultiBeam GPU-only app files from HERE and the r516 AstroPulse GPU-only app files from HERE, then add them to your SETI project folder.

Also note, I have ATI HD5850 GPUs, so I am using the HD5 version of the app (MB_6.10_win_SSE3_ATI_HD5_r177.exe) for MultiBeam. If you don't have an HD5-series or HD69XX-series GPU, or you experience errors or driver restarts, use the default app (MB_6.10_win_SSE3_ATI_r177.exe) for MultiBeam instead.

Make sure you have this line in your startup messages:

09/04/2011 19:42:34 SETI@home Found app_info.xml; using anonymous platform

and that you're actually asking for GPU work,

Claggy

Cheers Claggy. Yeah, I was getting that message, but nothing ever downloaded. As soon as I added the r177 MultiBeam GPU app, though, it started downloading tasks and crunching :)

How long should a MultiBeam GPU workunit take on average on an HD5850?

Cheers,
Leslie
ID: 1095590
unnefer

Joined: 23 Jan 11
Posts: 5
Credit: 83,513
RAC: 0
Australia
Message 1095597 - Posted: 10 Apr 2011, 7:47:27 UTC - in response to Message 1095590.  
Last modified: 10 Apr 2011, 7:48:57 UTC


I changed the "count" to 0.5 so it would crunch 2 workunits per GPU, and for me, that seems to offer the best estimated speed per instance.

I also had an issue with it using 100% CPU usage and only ~70% GPU usage, so I changed "instances per device" to 1 and "period_iterations_num" to 4 and that dropped CPU usage down to 55% and increased GPU usage to ~94% :D

I made three errors in the comments above.


  1. It seems the main thing that drops CPU usage is the "instances per device" value. When set to 1, CPU usage drops to ~27% per GPU; I have 2 GPUs, so total CPU usage ends up at ~55%. When set to 2, CPU usage sat at 100% and GPU usage actually dropped to ~70%!
  2. Changing "period_iterations_num" to 4 did not do anything. I also tried values from 1 through 10 and again didn't notice any difference.
  3. Even though I set "count" to 0.5 and 4 workunits appeared to be crunching, because I also had "instances per device" set to 1, only 1 workunit per GPU was actually being crunched.



So anyhoo, if anyone knows how to decrease CPU usage so that I can run 2x instances on each GPU without topping out at 100% CPU usage, that would be great.

ID: 1095597
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1095605 - Posted: 10 Apr 2011, 8:30:27 UTC - in response to Message 1095597.  
Last modified: 10 Apr 2011, 8:34:26 UTC


I changed the "count" to 0.5 so it would crunch 2 workunits per GPU, and for me, that seems to offer the best estimated speed per instance.

I also had an issue with it using 100% CPU usage and only ~70% GPU usage, so I changed "instances per device" to 1 and "period_iterations_num" to 4 and that dropped CPU usage down to 55% and increased GPU usage to ~94% :D

I made three errors in the comments above.


  1. It seems the main thing that drops CPU usage is the "instances per device" value. When set to 1, CPU usage drops to ~27% per GPU; I have 2 GPUs, so total CPU usage ends up at ~55%. When set to 2, CPU usage sat at 100% and GPU usage actually dropped to ~70%!
  2. Changing "period_iterations_num" to 4 did not do anything. I also tried values from 1 through 10 and again didn't notice any difference.
  3. Even though I set "count" to 0.5 and 4 workunits appeared to be crunching, because I also had "instances per device" set to 1, only 1 workunit per GPU was actually being crunched.



So anyhoo, if anyone knows how to decrease CPU usage so that I can run 2x instances on each GPU without topping out at 100% CPU usage, that would be great.


Getting two WUs to run on a device is a two-part operation: the count value and the -instances_per_device value both need to be changed.

The period_iterations_num value splits the single longest PulseFind kernel calls; if you get a laggy GUI or driver restarts, increase this value.
There is no need to change to the non-HD5 version if you get driver restarts.

There is no way of lowering CPU usage by adjusting settings; the app will use what it needs.
The high CPU usage might be a byproduct of the speedups in SDK 2.4 RC1 - that's why I suggested jravin try Cat 11.2/SDK 2.3 to see if that reduced his CPU usage.
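In app_info.xml terms, the two matched changes would look something like this (a fragment only, based on unnefer's file earlier in the thread; both values must agree for two tasks per GPU to actually run):

```xml
<!-- count 0.5 tells BOINC to schedule two tasks per GPU;
     -instances_per_device 2 tells the app to actually process two at once. -->
<cmdline>-period_iterations_num 4 -instances_per_device 2</cmdline>
<coproc>
  <type>ATI</type>
  <count>0.5</count>
</coproc>
```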

Claggy
ID: 1095605
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33590
Credit: 79,922,639
RAC: 80
Germany
Message 1095644 - Posted: 10 Apr 2011, 12:51:59 UTC
Last modified: 10 Apr 2011, 12:52:44 UTC

Would make sense.

New version is called APP instead of SDK

AMD parallel processing


With each crime and every kindness we birth our future.
ID: 1095644
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 57
United States
Message 1095646 - Posted: 10 Apr 2011, 13:08:32 UTC - in response to Message 1095644.  

Would make sense.

New version is called APP instead of SDK

AMD parallel processing

SDK normally stands for Software Development Kit.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1095646
unnefer

Joined: 23 Jan 11
Posts: 5
Credit: 83,513
RAC: 0
Australia
Message 1095647 - Posted: 10 Apr 2011, 13:10:44 UTC
Last modified: 10 Apr 2011, 13:15:08 UTC

np Claggy. I'm happy with a single workunit per GPU and ~55% CPU usage. The workunits are being processed pretty quickly - my HD5850s complete a workunit in 30-40 mins.

Interestingly, I also have an HD4870 running the HD5 app and it is working fine - it completes a workunit in 60-80 mins. I didn't think it would run the HD5 app, tbh, but it's working without any errors or driver restarts for now.

I'm not too bothered with the higher CPU usage and can still crunch some CPU only workunits without too much impact now.
ID: 1095647
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1095648 - Posted: 10 Apr 2011, 13:37:35 UTC - in response to Message 1095644.  

Would make sense.

New version is called APP instead of SDK

AMD parallel processing

Yep, it's called APP (Accelerated Parallel Processing) now, and it's the AMD APP SDK now, instead of the ATI Stream SDK:

Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1332
Version: OpenCL 1.1 AMD-APP-SDK-v2.4-rc1 (595.9)

Claggy
ID: 1095648
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1095676 - Posted: 10 Apr 2011, 15:54:48 UTC - in response to Message 1095602.  



C'mon Mark, you know me better than that; no doubt you saw the smiley after my first post! I just thought we might get some interesting comments ;-)

I note that a third ATI project also went down yesterday for a while, namely PrimeGrid. Not a good week for ATI folks, but I expect it'll sort itself out. Anyhow, these are the SETI boards, so I guess we'd better stick to SETI stuff.

But of course....
I was just poking too.
No doubt the ATI cards will come into their own on Seti as development work on the apps for them continues.
The main reason NV has a leg up on them right now is the development work that NV helped with to get things rolling.
Many claim that the ATI cards are superior on other projects.
Only a matter of time until they are able to show what they can do here.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1095676
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1095703 - Posted: 10 Apr 2011, 16:53:09 UTC

@unnefer
change count back to 1 or you will eventually get a -177 error.
Right now you are telling BOINC to run 2 instances per GPU while telling the app to process only a single task at once.
That is, 2 tasks are launched: one processes while the second sits suspended. BOINC increases elapsed time for both, so at best you will get incorrect run times, or -177 errors if the second is suspended for too long.
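A fragment sketching the consistent single-instance settings described here (values as in unnefer's file earlier in the thread, corrected so BOINC and the app agree):

```xml
<!-- One task per GPU: BOINC's count and the app's instances_per_device match. -->
<cmdline>-period_iterations_num 4 -instances_per_device 1</cmdline>
<coproc>
  <type>ATI</type>
  <count>1</count>
</coproc>
```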
ID: 1095703
unnefer

Send message
Joined: 23 Jan 11
Posts: 5
Credit: 83,513
RAC: 0
Australia
Message 1095994 - Posted: 11 Apr 2011, 6:23:04 UTC - in response to Message 1095703.  

@unnefer
change count back to 1 or you will eventually get a -177 error.
Right now you are telling BOINC to run 2 instances per GPU while telling the app to process only a single task at once.
That is, 2 tasks are launched: one processes while the second sits suspended. BOINC increases elapsed time for both, so at best you will get incorrect run times, or -177 errors if the second is suspended for too long.

Cheers, yeah I noticed that after I initially set up my app_info.xml file as per my original post - I have since changed it so that my "count" is 1 and my "instances_per_device" is 1 :)

So now it only works on a single workunit per GPU and everything is working ok now :)


I tried to edit my earlier post to correct those couple of things, but it won't let me edit it?
ID: 1095994
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33590
Credit: 79,922,639
RAC: 80
Germany
Message 1096009 - Posted: 11 Apr 2011, 7:10:46 UTC

You can only edit a post within 1 hour of when it was created.



With each crime and every kindness we birth our future.
ID: 1096009
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1096979 - Posted: 13 Apr 2011, 22:37:40 UTC - in response to Message 1096009.  
Last modified: 13 Apr 2011, 22:47:18 UTC

Is it better to run 2 WUs on one HD5870? I have two, plus an i7-2600. (rev. 177?)
Getting a little late - 'summer time' is nice in the summer, not now!
Will take a look at Lunatics and try to choose the right apps; should be no trouble, only time.
And a complete app_info.xml file: running the 0.37 installer gives the optimized apps, but also the wrong app for an OpenCL HD5870, so I will add this info.
I run BOINC 6.10.60 (64-bit). A VLAR-marked WU goes to the CPU, but if it's an older WU and doesn't have the mark, what happens? Or can r177 and a 5000-series GPU also crunch a VLAR?
ID: 1096979
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33590
Credit: 79,922,639
RAC: 80
Germany
Message 1096980 - Posted: 13 Apr 2011, 22:41:33 UTC - in response to Message 1096979.  

Is it better to run 2 WUs on one HD5870? I have two, plus an i7-2600.
Getting a little late - 'summer time' is nice in the summer, not now!
Will take a look at Lunatics and try to choose the right apps; should be no trouble, only time.


I would say so Fred.

I run 2 on my 5850 with no problems.



With each crime and every kindness we birth our future.
ID: 1096980
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1096983 - Posted: 13 Apr 2011, 22:51:04 UTC - in response to Message 1096980.  

Well, then there should be no single reason not to put these apps in their proper place.
I'll put Collatz C. on N.N.T. Sometimes one has to set one's priorities ;).


ID: 1096983


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.