CPU/ATI GPU hybrid AstroPulse for Windows released


Message boards : Number crunching : CPU/ATI GPU hybrid AstroPulse for Windows released

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 945477 - Posted: 6 Nov 2009, 19:43:07 UTC

http://lunatics.kwsn.net/12-gpu-crunching/cpu-ati-gpu-hybrid-astropulse-for-windows-released.msg23108.html#msg23108

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 945508 - Posted: 6 Nov 2009, 22:43:54 UTC - in response to Message 945477.
Last modified: 7 Nov 2009, 4:09:11 UTC

Very nice. I just wish I could get an AP WU once in a while.

/edit
The last line of your code should be </app_info>, not <app>. BOINC kept kicking back a syntax error on the app_info, and it took me a minute to notice the problem.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Timi
Volunteer tester
Joined: 7 Oct 99
Posts: 25
Credit: 5,986,312
RAC: 0
Greece
Message 945618 - Posted: 7 Nov 2009, 6:38:53 UTC
Last modified: 7 Nov 2009, 6:39:21 UTC

Nice one Raistmer!!!

Let's see now how lucky we are going to be at getting an Astropulse WU.
Haven't got one since the 8th of June 2009!!! :(
____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 945641 - Posted: 7 Nov 2009, 9:55:32 UTC - in response to Message 945508.

Very nice. I just wish I could get an AP WU once in a while.

/edit
The last line of your code should be </app_info>, not <app>. BOINC kept kicking back a syntax error on the app_info, and it took me a minute to notice the problem.

Ah, thanks. I copied the section from a live app_info and probably took too much in the copy range :)

GrandMasterD
Joined: 14 Jan 03
Posts: 34
Credit: 27,280,159
RAC: 0
United Kingdom
Message 945771 - Posted: 7 Nov 2009, 21:44:28 UTC

Been running this overnight as I had 5 AP units saved up :D

My Q6600 @ 3.2 GHz can do an AP unit by itself in 10-12 hours, depending on how much the PC is being used. Most take 11+ hours.

With this ATI GPU app I can now do an AP unit in around 8-10 hours! Most of the AP units I have completed were done in about 8 hours!

I noticed that the way this GPU app works is that MOST of the work is done on the CPU, and it uses the GPU every now and again. It will use the GPU for a short burst and then for a longer burst. IT DOES NOT USE THE GPU CONSTANTLY! In the longer burst of GPU usage I was seeing about 15 seconds to do 0.500%!

I have also noticed that BOINC doesn't seem to see the GPU app as using the CPU at all. It will try running 4 MB + 2 AP + 1 CUDA units, and six units running on my quad brought it all to a crawl... I had to reduce BOINC down to 25% core usage, which meant I had 1 MB + 2 AP + 1 CUDA app running... Everything then went swimmingly, with 80-85% usage on the CPU.
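
Editor's sketch: the task counts above can be reproduced with a very rough model. It assumes that BOINC rounds its CPU budget down to whole cores and only takes a core away from CPU tasks once the declared avg_ncpus of the GPU-assisted tasks adds up to a whole core. That rule is an assumption chosen to match the observations in this post, not BOINC's actual scheduler. The 0.1 value is the avg_ncpus from the app_info posted later in this thread; the 0.05 for the CUDA app is a made-up placeholder, and the function name is hypothetical.

```python
import math

# Toy model only: BOINC's real scheduling logic is more involved than this.
def cpu_tasks_started(ncpus, gpu_task_cpu_fractions, usage_limit=1.0):
    """Estimate how many full-core CPU (MB) tasks start next to GPU-assisted tasks.

    ncpus: number of CPU cores on the host
    gpu_task_cpu_fractions: declared avg_ncpus of each running GPU-assisted task
    usage_limit: fraction of the cores BOINC may use (1.0 means 100%)
    """
    budget = math.floor(ncpus * usage_limit + 1e-9)            # whole cores available
    reserved = math.floor(sum(gpu_task_cpu_fractions) + 1e-9)  # whole cores claimed by GPU tasks
    return max(0, budget - reserved)

ap_tasks = [0.1, 0.1]   # two hybrid AP tasks, avg_ncpus 0.1 each
cuda_tasks = [0.05]     # placeholder value, not taken from the thread

print(cpu_tasks_started(4, ap_tasks + cuda_tasks))                    # 4
print(cpu_tasks_started(4, ap_tasks + cuda_tasks, usage_limit=0.25))  # 1
```

Under these assumptions the quad runs 4 MB tasks beside the 2 AP and 1 CUDA tasks at 100%, and 1 MB task at 25%, which is what the post describes.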

I'm so glad to see ATI being used with SETI at last, if anyone EVER needs an ATI Beta Tester come to me!


Comp Specs:

Q6600 @ 3.24GHz
4GB RAM
4870 512MB
8800GT 512MB

Any questions please ask!

I will add that it did take me a while to get the app_info working correctly. I will post my app_info in a mo.

Cheers for getting this going and looking forward to a whole AP being done on the GPU!

Doug
____________


Please Visit The Seti City

GrandMasterD
Joined: 14 Jan 03
Posts: 34
Credit: 27,280,159
RAC: 0
United Kingdom
Message 945776 - Posted: 7 Nov 2009, 22:19:37 UTC

This is what I use in my APP_INFO that gets the app working for me. THIS HAS TO OVERWRITE THE AP 5.05 part that already exists in the APP_INFO.


</app>
<file_info>
    <name>ap_5.05_win_x86_SSE3_BROOK_r280.exe</name>
    <executable/>
</file_info>
<app_version>
    <app_name>astropulse_v505</app_name>
    <version_num>505</version_num>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
        <type>ATI</type>
        <count>0.5</count>
    </coproc>
    <file_ref>
        <file_name>ap_5.05_win_x86_SSE3_BROOK_r280.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>

I have changed the amount of the ATI card the app can use to .5 (the <count> value). I'm not sure if this has made any difference, but I had a problem in the beginning which this seemed to fix.
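
Editor's sketch: to make the two numbers in that fragment concrete, the snippet below parses a trimmed copy of the <app_version> block with Python's standard xml.etree module. It reads <count> as the share of one GPU each task claims, so 0.5 would let two AP tasks share the card, and <avg_ncpus> as the CPU share BOINC budgets per task. That is my understanding of the two fields, and the function name gpu_share is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <app_version> block from the post above.
APP_VERSION = """
<app_version>
    <app_name>astropulse_v505</app_name>
    <version_num>505</version_num>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
        <type>ATI</type>
        <count>0.5</count>
    </coproc>
</app_version>
"""

def gpu_share(app_version_xml):
    """Return (GPU share per task, CPU share per task, tasks that fit on one GPU)."""
    root = ET.fromstring(app_version_xml)
    count = float(root.findtext("coproc/count"))
    avg_ncpus = float(root.findtext("avg_ncpus"))
    tasks_per_gpu = int(1 / count + 1e-9)  # e.g. 0.5 of a GPU each means 2 tasks per GPU
    return count, avg_ncpus, tasks_per_gpu

print(gpu_share(APP_VERSION))  # (0.5, 0.1, 2)
```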

Cheers,

Doug
____________


Please Visit The Seti City

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 945878 - Posted: 8 Nov 2009, 6:42:14 UTC - in response to Message 945771.
Last modified: 8 Nov 2009, 6:42:48 UTC

IT DOES NOT USE THE GPU CONSTANTLY!

Yes, that's why it's called a CPU/GPU hybrid app. Technically, it does the FFA (fast folding algorithm) on the GPU.


I had to reduce BOINC down to 25% core usage, which meant I had 1 MB + 2 AP + 1 CUDA app running... Everything then went swimmingly, with 80-85% usage on the CPU.

So you did the very same thing I advised against.
You switched to a setup with <100% CPU load; this will probably reduce host performance instead of increasing it. The CPU should be overcommitted to get a noticeable performance benefit.
Unfortunately, without a steady flow of AP tasks this is hard to measure, but try running different configs and choose the one that gives the best overall performance. While the GPU does the FFA, the CPU core sits idle in a reduced-core-usage config. In an overcommitted config that CPU time can be used to make progress on some other CPU task, but in a "reduced core usage" config it is simply wasted.
My measurements on a quad Q9450 + ATI HD4870 showed that the slowdown from CPU overcommitting overhead is considerably less than the gain from full CPU usage.
The only unknown left in this "equation" is your CUDA GPU.
You need to investigate how the CUDA GPU's performance changed. If it has not changed, I recommend returning to 100% CPU usage.

Peter M. Ferrie
Volunteer tester
Joined: 28 Mar 03
Posts: 85
Credit: 9,399,402
RAC: 0
United States
Message 945908 - Posted: 8 Nov 2009, 8:12:34 UTC

This could also be the first step towards using a multi-core CPU to do one Astropulse workunit on all cores, so units could be completed in a fraction of the time... not that we have many Astropulse WUs.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 945914 - Posted: 8 Nov 2009, 8:47:21 UTC - in response to Message 945908.

This could also be the first step towards using a multi-core CPU to do one Astropulse workunit on all cores, so units could be completed in a fraction of the time... not that we have many Astropulse WUs.

This app is still single-threaded as far as task processing is concerned. There is only a single worker thread, and it calls Sleep() while waiting for results from the GPU. That's why an overcommitted CPU (running, let's say, AKv8 on each core in the system) will give better overall performance (even if some MB tasks take much longer than usual).

GrandMasterD
Joined: 14 Jan 03
Posts: 34
Credit: 27,280,159
RAC: 0
United Kingdom
Message 945940 - Posted: 8 Nov 2009, 13:14:21 UTC

I agree that if I only had 4 MB + 2 GPU AP WUs running then the performance hit wouldn't have been as big, but when I had a CUDA WU running as well everything took a much bigger hit, the CUDA unit in particular, which makes up most of my WUs ;)

I had to have a spare CPU core, or at least only 4 CPU tasks running on the CPU instead of 6, to not see the massive hit in performance on the CUDA app, as it seems that, in my case at least, I need a little CPU left to feed the 8800GT.

After playing around, I can have the CPU @ 100% and everything is OK, but I can only have 4 WUs running on my CPU to not see a performance hit. Seeing as these use mainly the CPU, I have to reduce my BOINC usage to 50% when two GPU AP WUs are running, 75% when only one is running and 100% when I have no AP units...

Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU?

Hope this helps and makes sense!

Doug
____________


Please Visit The Seti City

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,745
RAC: 38
Germany
Message 945949 - Posted: 8 Nov 2009, 14:27:26 UTC - in response to Message 945940.

Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU?

I guess that would be by changing
<avg_ncpus>0.1</avg_ncpus>
from your second post in this thread. However, it's really a guess, based on the name of the option.

Regards,
Gundolf
____________
Computers aren't everything in life. (Just a little joke)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 946103 - Posted: 9 Nov 2009, 9:35:27 UTC - in response to Message 945940.

but when I had a CUDA WU running as well everything took a much bigger hit, the CUDA unit in particular, which makes up most of my WUs ;)

1) What CUDA app do you use?
2) It seems you have an i7 CPU, right?
3) Could you try setting the CUDA app's process affinity (via Task Manager in Windows) to the last 2 CPUs (and check that the AP processes take the first 4 CPUs only), then check whether there is a noticeable change in processing time for the CUDA app? (It has to be done for each new task, though, so do it for just a few tasks to find out whether it helps or not.)
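
Editor's sketch: if the affinity from point 3 is set through a tool that takes a bitmask instead of Task Manager's checkboxes, "the last 2 CPUs" becomes a mask in which bit i stands for CPU i, as in Windows affinity masks. The helper below only does that arithmetic. Its name is hypothetical and it does not touch any process.

```python
def last_cpus_mask(total_cpus, n_last):
    """Bitmask selecting the n_last highest-numbered CPUs out of total_cpus (bit i = CPU i)."""
    if not 0 < n_last <= total_cpus:
        raise ValueError("n_last must be between 1 and total_cpus")
    all_cpus = (1 << total_cpus) - 1                # every CPU selected
    first_cpus = (1 << (total_cpus - n_last)) - 1   # the CPUs to leave to other processes
    return all_cpus & ~first_cpus

print(hex(last_cpus_mask(8, 2)))  # 0xc0: CPUs 6 and 7 of an 8-thread CPU
print(bin(last_cpus_mask(4, 2)))  # 0b1100: CPUs 2 and 3 of a quad core
```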


Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU?

Doug

Well, indeed, you can set <avg_ncpus> to 1; then BOINC will allocate a full CPU for each AP instance. I'm not sure it will bring an overall performance benefit for your config, though.
Unfortunately, there were no reports about ATI/CUDA app interactions during the app testing phase, so such configs are terra incognita. I will see if I can reproduce such a config on my dual PCI-E duo....
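
Editor's sketch: for anyone who prefers to script the <avg_ncpus> change instead of editing app_info.xml by hand, the snippet below shows one way to do it with Python's standard xml.etree module. The SAMPLE document is a cut-down, hypothetical app_info (a real one also carries <file_info> and <file_ref> entries), and set_avg_ncpus is an illustrative helper name. As far as I know, the BOINC client has to be restarted before it picks up a changed app_info.xml.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical app_info used only to demonstrate the edit.
SAMPLE = """<app_info>
    <app>
        <name>astropulse_v505</name>
    </app>
    <app_version>
        <app_name>astropulse_v505</app_name>
        <version_num>505</version_num>
        <avg_ncpus>0.1</avg_ncpus>
        <max_ncpus>1</max_ncpus>
    </app_version>
</app_info>"""

def set_avg_ncpus(app_info_xml, app_name, value):
    """Set <avg_ncpus> in every <app_version> of app_name; return (new XML, number changed)."""
    root = ET.fromstring(app_info_xml)
    changed = 0
    for app_version in root.iter("app_version"):
        if app_version.findtext("app_name") != app_name:
            continue
        node = app_version.find("avg_ncpus")
        if node is not None:
            node.text = str(value)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

new_xml, changed = set_avg_ncpus(SAMPLE, "astropulse_v505", 1)
print(changed)                                # 1
print("<avg_ncpus>1</avg_ncpus>" in new_xml)  # True
```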

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 946104 - Posted: 9 Nov 2009, 9:49:57 UTC - in response to Message 946103.

And another experiment:
If you run one of my CUDA MB builds, try raising the CUDA MB process priority to "high" or "normal" in Windows Task Manager. Does it improve CUDA task timings?

Careface
Joined: 6 Jun 03
Posts: 115
Credit: 11,626,751
RAC: 0
New Zealand
Message 946259 - Posted: 10 Nov 2009, 1:15:37 UTC - in response to Message 946104.

And another experiment:
If you run one of my CUDA MB builds, try raising the CUDA MB process priority to "high" or "normal" in Windows Task Manager. Does it improve CUDA task timings?


I'm guessing you mean just the CUDA MB build (i.e. not using an ATI/CUDA hybrid system)? If so, then yes, raising the priority to High gives a bit of a boost, but setting it to Realtime cuts ~20-30 seconds off my normal WU times (usually around 11 min 30 s to 12 min for 0.39 AR on a GTX216).

Bear in mind this is on a single-core CPU, with CPU crunching disabled.

Sorry if this isn't what you're looking for. I've been up for over 30 hours now, and just got home from one of my end-of-year exams... I'm tired! :( lol

Also, thanks for the app mate :) I know a lot of people on the overclockers forum I frequent are very grateful for this :)

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 946296 - Posted: 10 Nov 2009, 4:03:42 UTC - in response to Message 946259.

In this particular case I'm interested in whether setting affinity or raising priority for the CUDA app will solve its slowdown caused by the ATI/CPU app.
So the question is specific to hosts with ATI/nVidia combos.
This slowdown may come from the elevated priority of the hybrid app. It needs that priority for the same reason as the CUDA app: to get the CPU quickly when it needs to feed the GPU. That is, it competes directly with CUDA MB for the CPU (CPU MB is out of the competition because of its idle priority setting).

GrandMasterD
Joined: 14 Jan 03
Posts: 34
Credit: 27,280,159
RAC: 0
United Kingdom
Message 946597 - Posted: 11 Nov 2009, 22:10:23 UTC

I have a Q6600 @ 3.2 GHz. I normally run 4 MB + 1 CUDA MB on an 8800GT.

When I am only running the above, it takes around 2 hours per MB WU and around 30 minutes (or less) to run a CUDA MB.

When I then began running the CPU/ATI hybrid, the above times jumped from 2 hours to 3+ hours per MB WU, and the CUDA MB jumped to around 1 hour 30 minutes.

I have tried setting the priority higher on the CUDA app and this does help, although it only helps for that WU; as soon as it finishes that one and starts another I have to change the priority again... This also doesn't help the rest of the WUs that now take longer to run.

With this it seems to me that it might be better to set these tasks so BOINC thinks they are CPU tasks. That way the CPU will continue doing the same amount of work it was doing before this app was installed ;)

The way I see this app is that around 80% of the work is still done on the CPU, with bits and pieces being offloaded at intervals. I have noticed the drop in CPU usage when the GPU is crunching, but this normally lasts no longer than 20 seconds. (On my 4870 at least)

I guess maybe having 5 WUs going at once would make sense, and out of those 2 could be AP and 3 could be MB, as then the other tasks would pick up the unused CPU cycles from the 2 AP when they are using the GPU.

With my setup, though, it seems the only way I can get it to run with minimal disruption to my CUDA card is to give the MB_6.08_CUDA_V12_VLARKILL app a higher priority every time a new WU starts... or to keep changing the number of CPUs BOINC can use (which has a very bad effect if I forget to change it!)

Is there a way I can automatically set the CUDA app to a higher priority (normal seems to work OK), or set it so these new hybrid WUs take the place of an MB WU?

If I am wrong anywhere in what I have typed please correct me as I am here to learn!

Hope this helps and everyone can make sense of it!

Cheers,

Doug
____________


Please Visit The Seti City

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 888,257
RAC: 0
United States
Message 946649 - Posted: 12 Nov 2009, 2:56:49 UTC - in response to Message 946597.

Try changing the <avg_ncpus>0.1</avg_ncpus> line in your app_info section for the hybrid CPU/ATI app to <avg_ncpus>1</avg_ncpus>.
____________


Donate with your searches and online buys:
http://www.goodsearch.com/toolbar/university-of-california-setihome

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 946756 - Posted: 12 Nov 2009, 19:36:55 UTC - in response to Message 946597.


With this it seems to me that it might be better to set these tasks so BOINC thinks they are CPU tasks. That way the CPU will continue doing the same amount of work it was doing before this app was installed ;)

No, then BOINC will preempt other CPU tasks to run AP, and the CPU will sit idle when it could be used for other work.


I guess maybe having 5 WUs going at once would make sense, and out of those 2 could be AP and 3 could be MB, as then the other tasks would pick up the unused CPU cycles from the 2 AP when they are using the GPU.

If you have a quad core, then 2 AP + 4 MB is better. When both AP tasks are busy with the GPU (one will actually be using the GPU; the second will wait until the first one frees it), one CPU core will still be idle in a 2 AP + 3 MB configuration.
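
Editor's sketch: the core counting in that reply can be written out as a toy model. It assumes each MB task keeps exactly one core busy and that an AP task needs a core only while it is not waiting on the GPU; real tasks are not that tidy. The function name is hypothetical.

```python
def idle_cores(ncores, mb_tasks, ap_tasks, ap_waiting_on_gpu):
    """Cores left with nothing to run in the toy model described above."""
    runnable = mb_tasks + (ap_tasks - ap_waiting_on_gpu)  # tasks that want a core right now
    return max(0, ncores - runnable)

# Quad core, both AP tasks busy with (or queued for) the GPU:
print(idle_cores(4, 3, 2, 2))  # 1: one core idles with 2 AP + 3 MB
print(idle_cores(4, 4, 2, 2))  # 0: 2 AP + 4 MB keeps every core busy

# Same hosts while both AP tasks are in their CPU phase:
print(idle_cores(4, 4, 2, 0))  # 0: six runnable tasks share four cores (overcommitted)
```

The last line is the price of the overcommitted setup: while both AP tasks are on the CPU, six tasks compete for four cores.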


With my setup, though, it seems the only way I can get it to run with minimal disruption to my CUDA card is to give the MB_6.08_CUDA_V12_VLARKILL app a higher priority every time a new WU starts

Is there a way I can automatically set the CUDA app to a higher priority (normal seems to work OK), or set it so these new hybrid WUs take the place of an MB WU?


Yes, it's possible. Try using the Process Lasso application for now.
Later I will add the same affinity-splitting algorithm to CUDA MB that is used in ATI AP now.
If all GPU-related apps are scheduled to run on their own cores, there should be much less CPU contention.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3993
Credit: 109,869,243
RAC: 135,180
United States
Message 948192 - Posted: 19 Nov 2009, 4:42:15 UTC

I'm guessing that, with a name like ap_5.05_win_x86_SSE3_BROOK_r280.exe, this requires a CPU that has SSE3 support? I was looking at running this on one of my old boxen with an ATI 3850, 5004381, but since it's an older P4 it only has SSE2.
I always figure it's better to ask before sticking it in and getting failed WUs.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,227,647
RAC: 7,591
Russia
Message 948337 - Posted: 19 Nov 2009, 21:39:15 UTC - in response to Message 948192.

I'm guessing that, with a name like ap_5.05_win_x86_SSE3_BROOK_r280.exe, this requires a CPU that has SSE3 support? I was looking at running this on one of my old boxen with an ATI 3850, 5004381, but since it's an older P4 it only has SSE2.
I always figure it's better to ask before sticking it in and getting failed WUs.


Yes, it's an SSE3-and-up-only app.
I'm aware that some older AMD Athlon 64s exist that are SSE2-only but have a PCI-E bus.
So there are some P4s like that too. I will see; maybe it's worth doing an SSE2 (actually, SSE) build too.


Copyright © 2014 University of California