OpenCL NV MultiBeam v8 SoG edition for Windows
Author | Message |
---|---|
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Yeah, I've been running my Macs on one of TBar's builds with pretty good success for a while, so I have a good baseline for comparison. I'll see if he will compile one with the SoG switch. Thanks, Chris |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
The right kernel (MultiBeam_Kernels_r3381.cl) seems to lower my CPU usage :) More FYI: a -v 8 switch test with the right kernel (MultiBeam_Kernels_r3381.cl). Hopefully some better data. I'll be going back to rev. 3366, because it's faster with less CPU usage on my system, with or without the -v 8 switch. Someone with CPU cores to spare might be better off with the newer revision (rev. 3381). If you need more testing, such as trying different parameters based on the test results or a different revision, just let me know. Joe With -v 8 switch: http://setiathome.berkeley.edu/workunit.php?wuid=2075019060 http://setiathome.berkeley.edu/workunit.php?wuid=2075019072 http://setiathome.berkeley.edu/workunit.php?wuid=2075018859 http://setiathome.berkeley.edu/workunit.php?wuid=2074685866 Without -v 8 switch: http://setiathome.berkeley.edu/workunit.php?wuid=2073773400 http://setiathome.berkeley.edu/workunit.php?wuid=2074976160 http://setiathome.berkeley.edu/workunit.php?wuid=2074975852 http://setiathome.berkeley.edu/workunit.php?wuid=2074936776 Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Next time you will see such coherence please record AR of involved tasks also. When I started this last night I was looking at the angles to see what effect the AR had, since these were about 2 minutes slower. AR was 0.42-0.44, which tends to be the majority of what I normally see. I've been reviewing those that processed overnight and, except for a few high-angle ones, most are within the range of 0.42-0.44. I've never got the hang of offline testing, so I stuck an exclusion line into cc_config, prevented SETI from running on 3 of the 4 GPUs, and currently have only GPU 0 running. I restarted it with all the instances at the same time. It's using anywhere from 22-25% CPU utilization of 16 hyperthreaded cores (it looks like the load is spread across 6-7 cores). As they approach 60% complete, they begin to increase CPU demand, up to a final 30% total CPU utilization. Once all complete and new tasks start, the value drops back to 22-25% total CPU utilization. I had a series of 20dec10 tasks with AR 1.36; those used significantly less CPU (8-10% total CPU) than the other work units. Once those cleared, I ran with 2 GPUs next: CPU utilization started at a low of 27%, rising to 46% total CPU as completion approaches. I'm going to remove the -v 8 as I can't find the AR with it in there. I've been forced to look at my wingman's stderr for the AR. Think I'm going to switch back to r3366 for now. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I can't see any indication the build is actually using the SoG feature. However, it's giving an Unknown Error the other builds didn't, so I suppose the -DUSE_SIGNALS_ON_GPU option worked: ERROR: Available memory buffer of 128MB too small for PulseFind (168.7MB required), increase -sbs N value; exiting... It seems it's a little slower than the normal non-SoG build and uses a little more CPU. I haven't tried the nVidia SoG build yet... |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Yeah, "signals_on_gpu" shows up in the build features of the Windows app... Doesn't look like it's getting included for some reason. Chris |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, something is causing it to require more SBS, and the only change was adding signals_on_gpu to the configure line. It is a newer repository version, if that somehow matters. The nVidia version isn't any different: Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit. It also doesn't show much change from the non-SoG version... I wonder if it would be better with just -DSIGNALS_ON_GPU instead of: ...-DUSE_OPENCL -DUSE_OPENCL_HD5xxx -DUSE_SIGNALS_ON_GPU -DUSE_SSSE3 -DUSE_FFTWF -DSETI7 -DSETI8 -DOCL_CHIRP3 -DASYNC_SPIKE... |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
SIGNALS_ON_GPU is just one of the possible paths; other defines control the other paths. Switching config lines is not too hard, actually. And don't build from head: it's currently under development. Use the same rev as the published Windows build. It's more stable, though it lacks recently added features. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hmmm, it seems -DSIGNALS_ON_GPU has awoken the beast. But now it says: #error: SIGNALS_ON_GPU path currently implemented only for ZERO_COPY path From my experience, ZERO_COPY slows down a shorty by about a minute, and ASYNC_SPIKE is good for a few seconds faster times... We'll see. Now I'm seeing the same errors as with the last build: sah_v7_opt/src/counters.h:251:9: error: use of undeclared identifier '__rdtsc' Is there an easy way to fix these errors? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I gave up on the counters and just slashed out the offending lines... as before. If I knew how, I'd just disable the counters entirely, similar to the older builds that don't have the counters. The ATI app was just a little slower than the non-SoG build; the nVidia app was quite a bit slower. I went back to the non-SoG builds. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I updated the whole set of builds. Better CU loading in PulseFind is now implemented, which may result in a faster app. Worth checking. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Where can we download them from? Link to the new apps? |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Good deal. I was already getting outstanding performance on my 570 just running one WU at a time. Would the new PulseFind implementation show up on a fresh compile of the non-SoG AMD app as well, or just the SoG version? Thanks, Chris |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
|
Mike Joined: 17 Feb 01 Posts: 34381 Credit: 79,922,639 RAC: 80 |
Good deal. I was already getting outstanding performance on my 570 just running one WU at a time. Would the new PulseFind implementation show up on a fresh compile of the non-SoG AMD app as well, or just the SoG version? Yes, both the SoG and non-SoG builds have the new PulseFind. With each crime and every kindness we birth our future. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Good deal. I was already getting outstanding performance on my 570 just running one WU at a time. Would the new PulseFind implementation show up on a fresh compile of the non-SoG AMD app as well, or just the SoG version? Yes. And the link is: https://cloud.mail.ru/public/DMkN/x4BRCYuAV |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
FYI, here are some SoG 3401 WUs: http://setiathome.berkeley.edu/workunit.php?wuid=2086366753 http://setiathome.berkeley.edu/workunit.php?wuid=2086366685 http://setiathome.berkeley.edu/workunit.php?wuid=2086358459 http://setiathome.berkeley.edu/workunit.php?wuid=2086374320 Joe Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Rasputin42 Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
I am getting very spiky utilization, and it takes much longer than r3366. What are the requirements for r3401? |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Yeah, I'm seeing large continuous kernel usage. Times look to be about 4 minutes slower than version 3366. Tomorrow I'm going to try the non-SoG version and see what that does in regard to CPU and times. |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
Version 3401 is using more CPU. My CPU usage is 85-100%, mostly closer to 100%. I'll run all night with this version. Hopefully it gives some good data. My GTX 560 Ti is now using SoG version 3366. I'm also running 1 CPU WU. Joe Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
If you see a slowdown versus r3366, please try playing with these parameters: -pref_wg_size N: a new one; the older default would correspond to -pref_wg_size 128 for ATi and 32 for NV. Now the default for ATi is 64 (for NV it should be the same 32, but maybe the defaults are screwed up, so try -pref_wg_size from 32 to 256 in steps of 64 for ATi and 32 for NV). It's better to do this offline, because with some configs high WG sizes caused a total OS freeze (yeah, we have had a "truly preemptive multitasking OS" called Windows all these years :/ ). -sbs N: the default is 128; try different values around it, not necessarily in 64MB steps (!). This value is used when deciding how many WGs there will be, and a non-standard size could change that decision to a speedier one. Also, it would be good to use the -v 8 option and note what WG numbers are formed in r3366 and r3401 for similar PulseFind launches. r3401 should load all available CUs, and load them more fully when memory is limited, but this can have the side effect of different memory access patterns. It's quite possible that the new memory access pattern causes more slowdown than a few idle CUs would in the previous revision. And that memory access pattern can be changed to some extent with these 2 options. |
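
For anyone who wants to work through the -pref_wg_size / -sbs suggestions above systematically, here is a minimal Python sketch that just prints the option strings to try, one per offline run. Several things in it are assumptions rather than anything stated in the thread: the ATi sweep is read as multiples of 64 (64 to 256) and the NV sweep as multiples of 32 (32 to 256), the -sbs candidates (96, 128, 160, 192) are arbitrary values picked around the default of 128, and the helper names `wg_sizes` and `sweep` are hypothetical. It does not launch the app or know where your setup reads its options from.

```python
from itertools import product

# Assumed reading of the advice: ATi in steps of 64, NV in steps of 32,
# both up to 256. Adjust if a different start or step was intended.
WG_STEP = {"ati": 64, "nv": 32}
WG_MAX = 256


def wg_sizes(vendor):
    """Return candidate -pref_wg_size values for 'ati' or 'nv'."""
    step = WG_STEP[vendor]
    return list(range(step, WG_MAX + 1, step))


def sweep(vendor, sbs_values=(96, 128, 160, 192), verbose=True):
    """Build one option string per (-pref_wg_size, -sbs) combination.

    sbs_values are arbitrary illustrative sizes around the default 128;
    verbose=True prepends -v 8 so WG numbers show up in stderr.
    """
    prefix = "-v 8 " if verbose else ""
    return [
        f"{prefix}-pref_wg_size {wg} -sbs {sbs}"
        for wg, sbs in product(wg_sizes(vendor), sbs_values)
    ]


if __name__ == "__main__":
    for options in sweep("nv"):
        print(options)
```

Each printed line is one candidate configuration. Given the OS-freeze warning for high WG sizes, it is probably safest to start from the small values and run them offline rather than on live work.
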
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.