CPU/ATI GPU hybrid AstroPulse for Windows released

Author	Message
Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 945477 - Posted: 6 Nov 2009, 19:43:07 UTC http://lunatics.kwsn.net/12-gpu-crunching/cpu-ati-gpu-hybrid-astropulse-for-windows-released.msg23108.html#msg23108 ID: 945477 ·

skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 945508 - Posted: 6 Nov 2009, 22:43:54 UTC - in response to Message 945477. Last modified: 7 Nov 2009, 4:09:11 UTC Very nice. I just wish I could get an AP WU once in a while. /edit: the last line of your code should be </app_info>, not <app>. BOINC kept kicking a syntax error back on the app_info, and it took me a minute to notice the problem. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope ID: 945508 ·

Timi Volunteer tester Send message Joined: 7 Oct 99 Posts: 25 Credit: 6,533,108 RAC: 0	Message 945618 - Posted: 7 Nov 2009, 6:38:53 UTC Last modified: 7 Nov 2009, 6:39:21 UTC Nice one Raistmer!!! Let's see now, how lucky are we going to be for getting one Astropulse WU. Haven't got one since the 8th of June 2009!!! :( ID: 945618 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 945641 - Posted: 7 Nov 2009, 9:55:32 UTC - in response to Message 945508. "Very nice. I just wish I could get an AP WU once in a while. /edit: the last line of your code should be </app_info>, not <app>. BOINC kept kicking a syntax error back on the app_info, and it took me a minute to notice the problem." Ah, thanks, I copied the section from a live app_info and probably took too much in the copy range :) ID: 945641 ·
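
A stray or missing tag like the one reported above can be caught before restarting BOINC by checking that the file parses as XML. The following is a minimal sketch using only the Python standard library; the helper name check_app_info and the two sample strings are illustrative assumptions, not part of the released package.

```python
import xml.etree.ElementTree as ET


def check_app_info(text):
    """Return None if text is well-formed XML rooted at <app_info>, else an error string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    if root.tag != "app_info":
        return f"unexpected root element <{root.tag}>"
    return None


# Illustrative sample, much shorter than a real app_info.xml.
GOOD = """<app_info>
  <app>
    <name>astropulse_v505</name>
  </app>
</app_info>"""

# Same sample with the last line mistyped as <app>, the slip described above.
BAD = GOOD.replace("</app_info>", "<app>")

print(check_app_info(GOOD))
print(check_app_info(BAD))
```

This only checks well-formedness and the root element; it does not verify that BOINC will accept the contents.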

GrandMasterD Send message Joined: 14 Jan 03 Posts: 34 Credit: 31,236,412 RAC: 0	Message 945771 - Posted: 7 Nov 2009, 21:44:28 UTC
Been running this overnight as I had 5 AP units saved up :D
My Q6600 @ 3.2GHz can do an AP unit by itself in 10-12 hrs depending on usage of the PC. Most take 11+ hours. With this ATI GPU app I can now do an AP unit in around 8-10 hours! Most of the AP units I have completed have been done in about 8 hours!
I noticed that the way this GPU app works is that MOST of the work is done on the CPU and it uses the GPU every now and again. It will use the GPU for a short burst and then for a longer burst. IT DOES NOT USE THE GPU CONSTANTLY! In the longer burst of GPU usage I was looking at 15 secs to do 0.500%!
I have also noticed that BOINC doesn't seem to see the GPU app as using the CPU at all. It will try running 4 MB + 2 AP + 1 CUDA units, and six units running on my quad brought it all to a crawl... I had to reduce the BOINC app down to 25% core usage, as this then meant I had 1 MB + 2 AP + 1 CUDA app running... Everything then went swimmingly with 80-85% usage on the CPU.
I'm so glad to see ATI being used with SETI at last. If anyone EVER needs an ATI beta tester, come to me!
Comp Specs:
Q6600 @ 3.24GHz
4GB RAM
4870 512MB
8800GT 512MB
Any questions, please ask! I will add that it did take me a while to get the app_info working correctly. I will post my app_info in a mo.
Cheers for getting this going, and looking forward to a whole AP being done on the GPU!
Doug
Please Vist The Seti City
ID: 945771 ·

GrandMasterD Send message Joined: 14 Jan 03 Posts: 34 Credit: 31,236,412 RAC: 0	Message 945776 - Posted: 7 Nov 2009, 22:19:37 UTC
This is what I use in my APP_INFO that gets the app working for me. THIS HAS TO OVERWRITE THE AP 5.05 part that already exists in the APP_INFO.
</app>
<file_info>
    <name>ap_5.05_win_x86_SSE3_BROOK_r280.exe</name>
    <executable/>
</file_info>
<app_version>
    <app_name>astropulse_v505</app_name>
    <version_num>505</version_num>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
        <type>ATI</type>
        <count>0.5</count>
    </coproc>
    <file_ref>
        <file_name>ap_5.05_win_x86_SSE3_BROOK_r280.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
I have changed the amount the ATI card can use to .5. I'm not sure if this has made any difference, but I had a problem in the beginning which doing this seemed to fix.
Cheers, Doug
Please Vist The Seti City
ID: 945776 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 945878 - Posted: 8 Nov 2009, 6:42:14 UTC - in response to Message 945771. Last modified: 8 Nov 2009, 6:42:48 UTC
"IT DOES NOT USE THE GPU CONSTANTLY!"
Yes, that's why it's called a CPU/GPU hybrid app. Technically, it does FFA on the GPU.
"I had to reduce the BOINC app down to 25% core usage, as this then meant I had 1 MB + 2 AP + 1 CUDA app running... Everything then went swimmingly with 80-85% usage on the CPU."
So you did the very same thing I advised against. You switched to a setup with <100% CPU load; this will probably drop host performance instead of increasing it. The CPU should be overcommitted to get a noticeable performance benefit. Unfortunately, without a steady flow of AP tasks it will be hard to measure, but try running in different configs and choose the one that gives the best overall performance.
When the GPU does FFA, the CPU core will sit idle in the reduced core usage config. This CPU time could be used to make progress on some other CPU task in an overcommitted config, but will just be wasted in the "reduced core usage" config. My measurements for a quad Q9450 + ATI HD4870 showed that the slowdown from CPU overcommitting overhead is considerably less than the gain from full CPU usage.
The only thing still unknown in this "equation" is your CUDA GPU. You need to investigate how CUDA GPU performance changed. If it has not changed, I recommend returning to 100% CPU usage.
ID: 945878 ·

Peter M. Ferrie Volunteer tester Send message Joined: 28 Mar 03 Posts: 86 Credit: 9,967,062 RAC: 0	Message 945908 - Posted: 8 Nov 2009, 8:12:34 UTC This could also be the first step to using a multi-core CPU to do 1 Astropulse workunit using all cores, so they could be completed in a fraction of the time... not that we have many Astropulse WUs. ID: 945908 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 945914 - Posted: 8 Nov 2009, 8:47:21 UTC - in response to Message 945908.
"This could also be the first step to using a multi-core CPU to do 1 Astropulse workunit using all cores, so they could be completed in a fraction of the time... not that we have many Astropulse WUs."
This app is still single-threaded with regard to task processing. There is only a single worker thread, which issues Sleep() while waiting for results from the GPU. That's why an overcommitted CPU (by running, let's say, AKv8 on each core in the system) will give better overall performance (even if some MB tasks take much longer than usual).
ID: 945914 ·

GrandMasterD Send message Joined: 14 Jan 03 Posts: 34 Credit: 31,236,412 RAC: 0	Message 945940 - Posted: 8 Nov 2009, 13:14:21 UTC
I agree that if I only had 4 MB + 2 GPU AP WUs running then the performance hit wouldn't have been as big, but when I had a CUDA WU running as well everything took a much bigger hit, the CUDA unit in particular, which makes up most of my WUs ;)
I had to have a spare CPU, or at least only 4 CPU tasks running on the CPU instead of 6, to not see the massive hit in performance on the CUDA app, as it seems that, in my case at least, I need a little CPU left to feed the 8800GT.
After playing around, I can have the CPU @ 100% and everything is OK, but I can only have 4 WUs running on my CPU to not see a performance hit. Seeing as these use mainly the CPU, I have to reduce my BOINC usage to 50% when two GPU AP WUs are running, 75% when only 1 is running and 100% when I have no AP units...
Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU?
Hope this helps and makes sense!
Doug
Please Vist The Seti City
ID: 945940 ·
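
The three percentages in the post above follow from treating each running hybrid AP task as if it occupied one full core of the quad. A small sketch of that arithmetic, where the function name and the one-core-per-AP-task assumption are illustrative and not a BOINC feature:

```python
def boinc_cpu_percent(cores, hybrid_ap_tasks):
    """Percentage of cores left for BOINC if each hybrid AP task is counted as one core."""
    if not 0 <= hybrid_ap_tasks <= cores:
        raise ValueError("hybrid_ap_tasks must be between 0 and cores")
    return 100 * (cores - hybrid_ap_tasks) // cores


# Quad core with 0, 1 and 2 hybrid AP tasks running.
for ap in (0, 1, 2):
    print(ap, boinc_cpu_percent(4, ap))
```

This reproduces the manual settings only; whether reducing the limit actually helps throughput is the point under debate in the thread.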

Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 945949 - Posted: 8 Nov 2009, 14:27:26 UTC - in response to Message 945940. "Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU?" I guess that would be by changing <avg_ncpus>0.1</avg_ncpus> from your second post in this thread. However, it's really a guess, based on the name of the option. Regards, Gundolf Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours ID: 945949 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 946103 - Posted: 9 Nov 2009, 9:35:27 UTC - in response to Message 945940.
"but when I had a CUDA WU running as well everything took a much bigger hit, the CUDA unit in particular, which makes up most of my WUs ;)"
1) What CUDA app do you use?
2) It seems you have an i7 CPU, right?
3) Could you try setting the CUDA app's process affinity (via Task Manager in Windows) to the last 2 CPUs (and check that the AP processes take the first 4 CPUs only), then check whether there is a noticeable change in processing time for the CUDA app? (This has to be done for each new task, so do it for only a few tasks to find out whether it helps or not.)
"Is there any way to make BOINC think that this is mainly running on the CPU whilst still being able to offload parts to the GPU? Doug"
Well, you can indeed set <avg_ncpus> to 1; then BOINC will allocate a full CPU for each AP instance. I'm not sure it will bring an overall performance benefit for your config, though.
Unfortunately, there were no reports regarding ATI/CUDA app interactions during the app testing phase; that is, such configs are terra incognita. I will see if I can reproduce such a config on my dual PCI-E duo....
ID: 946103 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 946104 - Posted: 9 Nov 2009, 9:49:57 UTC - in response to Message 946103. And another experiment: if you run one of my CUDA MB builds, try raising the CUDA MB process priority to "high" or "normal" in Windows Task Manager. Does it improve CUDA task timings? ID: 946104 ·

Careface Send message Joined: 6 Jun 03 Posts: 128 Credit: 16,561,684 RAC: 0	Message 946259 - Posted: 10 Nov 2009, 1:15:37 UTC - in response to Message 946104.
"And another experiment: if you run one of my CUDA MB builds, try raising the CUDA MB process priority to "high" or "normal" in Windows Task Manager. Does it improve CUDA task timings?"
I'm guessing you mean just the CUDA MB build? (i.e. not using an ATI/CUDA hybrid system) If so, then yes, raising priority to High gives a bit of a boost, but setting it to Realtime cuts ~20-30 seconds off my normal WU times (usually around 11 min 30 s to 12 min for 0.39 AR on a GTX216). Bear in mind this is on a single-core CPU, with CPU crunching disabled.
Sorry if this isn't what you're looking for. I've been up for over 30 hours now, and just got home from one of my end-of-year exams... I'm tired! :( lol
Also, thanks for the app, mate :) I know a lot of people on the overclockers forum I frequent are very grateful for this :)
ID: 946259 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 946296 - Posted: 10 Nov 2009, 4:03:42 UTC - in response to Message 946259. In this particular case I'm interested in whether setting affinity or raising priority for the CUDA app will solve its slowdown caused by the ATI/CPU app. So the question is specific to hosts with ATI/nVidia combos. This slowdown can come from the elevated priority of the hybrid app. It needs such priority for the same reason as the CUDA app: to get the CPU quickly when needed to feed the GPU. That is, it will directly compete with CUDA MB for the CPU (CPU MB is out of the competition due to its idle priority setting). ID: 946296 ·

GrandMasterD Send message Joined: 14 Jan 03 Posts: 34 Credit: 31,236,412 RAC: 0	Message 946597 - Posted: 11 Nov 2009, 22:10:23 UTC
I have a Q6600 @ 3.2GHz. I normally run 4 MB + 1 CUDA MB on an 8800GT. When I am only running the above, it takes around 2 hrs per MB WU and around 30 mins (or quicker) to run a CUDA MB.
When I then began running the CPU/ATI hybrid, the above times jumped from 2 hrs to around 3+ hrs per MB WU, and the CUDA MB jumped to around 1:30 hrs.
I have tried setting the priority higher on the CUDA app and this does help, although it only helps for that WU. As soon as it finishes that one and starts another I have to change the priority again... This also doesn't help the rest of the WUs, which still take longer to run.
With this, it seems to me that it might be better setting these tasks so BOINC thinks they are CPU tasks. That way the CPU will continue doing the same amount of work it was before this app was installed ;)
The way I see this app is that around 80% of the work is still done on the CPU, with bits and pieces being offloaded at intervals. I have noticed the drop in CPU usage when the GPU is crunching, but this is normally no longer than 20 secs. (On my 4870 at least.)
I guess maybe having 5 WUs going at once would make sense, and out of those 2 could be AP and 3 could be MB, as then the other tasks would pick up the unused CPU cycles from the 2 AP when they are using the GPU.
With my setup, though, it seems the only way I can get it to run, with minimal disruption to my CUDA card, is to make the MB_6.08_CUDA_V12_VLARKILL app higher priority every time a new WU starts... or keep changing the number of CPUs BOINC can use (which has a very bad effect if I forget to change it!)
Is there a way I can automatically set the CUDA app to a higher priority? (Normal seems to work OK.) Or set it so these new hybrid WUs take the place of an MB WU?
If I am wrong anywhere in what I have typed, please correct me, as I am here to learn!
Hope this helps and everyone can make sense of it!
Cheers, Doug
Please Vist The Seti City
ID: 946597 ·
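
The manual priority change described above could in principle be scripted and re-run whenever a new task starts. Below is a rough sketch that builds a call to the Windows wmic tool; it assumes wmic is present on the host, and the executable name shown is a hypothetical placeholder, since the real file name of the CUDA MB binary is not given in the thread.

```python
import subprocess
import sys

# Win32 priority class values, as accepted by the setpriority method of wmic process.
PRIORITY_CLASSES = {
    "idle": 64,
    "below normal": 16384,
    "normal": 32,
    "above normal": 32768,
    "high": 128,
}


def priority_command(exe_name, level="normal"):
    """Build the wmic argument list that sets the priority of every process named exe_name."""
    value = PRIORITY_CLASSES[level]
    return ["wmic", "process", "where", f"name='{exe_name}'",
            "call", "setpriority", str(value)]


def raise_priority(exe_name, level="normal"):
    """Run the command; only meaningful on Windows with wmic installed (an assumption)."""
    if sys.platform != "win32":
        raise RuntimeError("wmic is only available on Windows")
    return subprocess.run(priority_command(exe_name, level), check=False).returncode


# Hypothetical executable name; substitute the real CUDA MB binary.
print(" ".join(priority_command("cuda_mb_app.exe")))
```

Because the setting applies per process, the command would have to be repeated (for example from a scheduled loop) each time BOINC launches a new CUDA task.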

Wembley Volunteer tester Send message Joined: 16 Sep 09 Posts: 429 Credit: 1,844,293 RAC: 0	Message 946649 - Posted: 12 Nov 2009, 2:56:49 UTC - in response to Message 946597. try changing the <avg_ncpus>0.1</avg_ncpus> line in your app_info section for the hybrid cpu/ati app to <avg_ncpus>1</avg_ncpus> ID: 946649 ·
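
For anyone who prefers to script this change rather than edit by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. The fragment is copied from the app_info section posted earlier in the thread (without the file_ref part); the function name set_avg_ncpus is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# <app_version> section from the thread, shortened for the example.
FRAGMENT = """<app_version>
  <app_name>astropulse_v505</app_name>
  <version_num>505</version_num>
  <avg_ncpus>0.1</avg_ncpus>
  <max_ncpus>1</max_ncpus>
  <coproc>
    <type>ATI</type>
    <count>0.5</count>
  </coproc>
</app_version>"""


def set_avg_ncpus(app_version_xml, value):
    """Return the <app_version> XML with its <avg_ncpus> text replaced by value."""
    node = ET.fromstring(app_version_xml)
    avg = node.find("avg_ncpus")
    if avg is None:
        raise ValueError("no <avg_ncpus> element found")
    avg.text = str(value)
    return ET.tostring(node, encoding="unicode")


updated = set_avg_ncpus(FRAGMENT, 1)
print(updated)
```

Only the avg_ncpus value changes; the coproc settings are left as they were.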

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 946756 - Posted: 12 Nov 2009, 19:36:55 UTC - in response to Message 946597.
"With this, it seems to me that it might be better setting these tasks so BOINC thinks they are CPU tasks. That way the CPU will continue doing the same amount of work it was before this app was installed ;)"
No, then BOINC will preempt other CPU tasks to do AP, and the CPU will be idle when it could be used for other work.
"I guess maybe having 5 WUs going at once would make sense, and out of those 2 could be AP and 3 could be MB, as then the other tasks would pick up the unused CPU cycles from the 2 AP when they are using the GPU."
If you have a quad core, then 2 AP + 4 MB is better. When both AP tasks are busy with the GPU (one will actually use the GPU, the second will wait until the first one frees it), one CPU core will still be idle in the 2 AP + 3 MB configuration.
"With my setup, though, it seems the only way I can get it to run, with minimal disruption to my CUDA card, is to make the MB_6.08_CUDA_V12_VLARKILL app higher priority every time a new WU starts. Is there a way I can automatically set the CUDA app to a higher priority? (Normal seems to work OK.) Or set it so these new hybrid WUs take the place of an MB WU?"
Yes, it's possible. Try using the Process Lasso application for now. Later I will add the same affinity splitting algorithm to CUDA MB that is used in ATI AP now. If all GPU-related apps are scheduled to run on their own cores, there should be much less CPU contention.
ID: 946756 ·
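
The 2 AP + 3 MB versus 2 AP + 4 MB comparison in the reply above can be checked with a quick count of runnable tasks. This is a simplified sketch that assumes an AP task waiting on the GPU uses no CPU at all and that every other task can fill one core; the function name is illustrative.

```python
def idle_cores(cores, mb_tasks, ap_tasks, ap_waiting_on_gpu):
    """Cores left idle at a moment when some hybrid AP tasks are asleep waiting on the GPU."""
    if not 0 <= ap_waiting_on_gpu <= ap_tasks:
        raise ValueError("ap_waiting_on_gpu must be between 0 and ap_tasks")
    runnable = mb_tasks + (ap_tasks - ap_waiting_on_gpu)
    return max(0, cores - runnable)


# Quad core, both AP tasks busy with (or queued for) the GPU.
print(idle_cores(4, 3, 2, 2))  # 2 AP + 3 MB
print(idle_cores(4, 4, 2, 2))  # 2 AP + 4 MB
```

With three MB tasks one core has nothing to run during GPU bursts, while with four MB tasks every core stays busy, at the cost of six tasks competing for four cores whenever the AP tasks are back on the CPU.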

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 948192 - Posted: 19 Nov 2009, 4:42:15 UTC I'm guessing that with a name like ap_5.05_win_x86_SSE3_BROOK_r280.exe this requires a CPU that has SSE3 support? I was looking at this to run on one of my old boxen with an ATI 3850, 5004381, but since it's an older P4 it only has SSE2. I always figure it's better to ask before sticking it in and getting failed WUs. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) ID: 948192 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 948337 - Posted: 19 Nov 2009, 21:39:15 UTC - in response to Message 948192.
"I'm guessing that with a name like ap_5.05_win_x86_SSE3_BROOK_r280.exe this requires a CPU that has SSE3 support? I was looking at this to run on one of my old boxen with an ATI 3850, 5004381, but since it's an older P4 it only has SSE2. I always figure it's better to ask before sticking it in and getting failed WUs."
Yes, it's an SSE3-and-up only app. I'm aware that some older AMD Athlon 64s exist that are SSE2-only but have a PCI-E bus. So there are some such P4s too. I will see; maybe it's worth doing an SSE2 (actually, SSE) build too.
ID: 948337 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.