I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-use_sleep to reduce CPU time? (if implemented on OS X) The slower the GPU, the less negative impact -use_sleep will have on its performance, and it could save some CPU cycles. You also need to check the GPU counters, which I don't see enabled in the OS X build you use. They are usually ignored, but they can be a good indication of why CPU time increases a lot. It is possible that your GPU returns wrong results with some of the GPU search kernels but doesn't damage the data array. In that case the app as a whole will return valid results but will spend much more CPU time than usual, being a "semi-CPU" one (CPU processing will fix the errors the GPU made). This will show up as a sharp increase in some search misses in the counters. Look, for example, at this Windows result: http://setiathome.berkeley.edu/result.php?resultid=4692482310 Here are the counters I speak of:
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Also, when all else fails, look at how similar machines are running: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=77941&offset=60 Note how that machine has much more consistent times than yours. That is usually the result of having a freed CPU core. Without a freed core you get inconsistent times and results: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=77827&offset=20 So, try freeing a CPU core and use the same settings as Beta: -sbs 128 -oclfft_tune_gr 64 -oclfft_tune_wg 64 -period_iterations_num 32 -no_caching |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Any idea why my max work group size is not getting updated to 256? Here's the top of my stderr for the following WU: http://setiathome.berkeley.edu/result.php?resultid=4709802139 The max work group size still seems to be 64. On my Windows machine that line reads 256, as I would expect.
<stderr_txt>
Running on device number: 0
DATA_CHUNK_UNROLL set to:18
oclFFT plan class overrides requested: global radix 256; local radix 16; max workgroup size 256
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
OpenCL platform detected: Apple
Number of OpenCL devices found : 2
BOINC assigns slot on device #0.
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 32
Single buffer allocation size: 256MB
Total device global memory: 6144MB
max WG size: 64
local mem type: Real |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'd say there is something in the code causing that WG setting. If you remember, about a year ago there was a problem compiling the first series of Apple MB apps because Apple uses 1024 for the WG size. The code didn't allow for a work group setting of 1024, only 256. There was some workaround that wasn't included in the code that people such as myself download. I just compile what is present in the repository; if there is something somewhere else, I don't see or receive it. If you look at the CLinfo program in Mountain Lion, which is where the app was compiled, you can see it correctly lists the work group size as 1024. So I'd say there is some workaround in place *somewhere* that changes the correct Apple Mountain Lion work group size on ATI GPUs from 1024 to 256 or, in the case of the files in the repository, to 64. It would be nice if the SETI code used the correct Apple WG size on ATI GPUs; I just compile it. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Before marking something correct or incorrect, it is worth looking back to see why the additional workaround was needed. If such a restriction was added, then apparently it was needed. It means that particular config operates correctly only with that WG size. |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Oh yeah, I remember that problem with the WG size. I had totally forgotten about that. It kinda hinders performance compared to Windows, but not by a whole lot. Maybe a comparable fix is possible and will work its way into the repository in the not too distant future. :) Thanks! Chris |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I think I remember it was 1024 in Mountain Lion but changed to 256 in Mavericks. The current app was compiled in Mountain Lion, as I still get the linker error when trying to compile in any higher OSX version. So, any idea how to fix it? Apparently there is a workaround somewhere, as the stock app says WG 256. Either it's reporting wrong, or something is being used that isn't in the files I'm downloading from the repository. I've got someplace to be, so I'll have to get back to it later... Before I go, one last look. It is set for WG 128, but reports WG 256? http://setiathome.berkeley.edu/result.php?resultid=4707438342
Maximum single buffer size set to:128MB
oclFFT global radix override set to:256
oclFFT max WG size override set to:128
Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 128MB
Total device global memory: 1024MB
max WG size: 256 ??? |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The oclFFT WG size used for the FFT plan class and the WG size used throughout the app itself are different things. And the size reported by Apple's runtime is a third, distinct thing. |
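To keep those three numbers apart when reading a stderr, a small parser can pull each one out under its own label. This is only a sketch written against the log wording pasted in this thread; the function name `wg_sizes` and the dictionary keys are made up for illustration, and the regular expressions are assumptions about the log format rather than anything taken from the app's source. (In OpenCL generally, the device-wide limit and the per-kernel limit are also separate queries, `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `CL_KERNEL_WORK_GROUP_SIZE`, so differing values are not surprising in themselves.)

```python
import re


def wg_sizes(stderr_text):
    """Collect the separate work-group-size values mentioned in a stderr log.

    The patterns follow the log lines quoted in this thread and are
    assumptions about the format, not a documented interface.
    """
    found = {}

    # 1) oclFFT plan setting (two wordings appear in the pasted logs).
    m = re.search(r"oclFFT max WG size override set to:\s*(\d+)", stderr_text)
    if m is None:
        m = re.search(r"max workgroup size\s+(\d+)", stderr_text)
    if m is not None:
        found["oclfft_plan"] = int(m.group(1))

    # 2) Per-kernel sizes chosen by the app's own tuning.
    kernels = re.findall(
        r"TUNE: kernel (\d+) now has workgroup size of \((\d+),(\d+),(\d+)\)",
        stderr_text,
    )
    if kernels:
        found["kernels"] = {
            int(k): (int(x), int(y), int(z)) for k, x, y, z in kernels
        }

    # 3) Value printed under "Used GPU device parameters".
    m = re.search(r"max WG size:\s*(\d+)", stderr_text)
    if m is not None:
        found["device_reported"] = int(m.group(1))

    return found
```

Fed the log with the 128 override and the reported 256, this returns the two values under different keys, which makes it easier to see that they never had to agree.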
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Ok, so I shouldn't expect the WG settings to show up in the "Used GPU device parameters" field. After looking over the post by Chris, I NOW see he is referring to the stock AP7r2750. Hmmm, I had nothing to do with that one. I do remember seeing the WG 64, but it doesn't seem to be causing any problems. In fact, whatever was causing that was apparently gone by AP7r2934, as the AP app I built has the WG listed as 256: http://setiathome.berkeley.edu/result.php?resultid=4710552601
Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 224MB
Total device global memory: 1024MB
max WG size: 256
Comparing the builds on my machine, there isn't any noticeable difference between r2750 and r2934; they perform about the same. All it would take, then, is to compile the app with a more recent source version to have it show WG 256. Whether it would make any difference on a different machine is unknown. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is a real problem with the OSX ATI AP app though. It seems the signal strength, or whatever, is just enough off that it misses a signal every now and then. Usually it's only noticeable when there is only one signal or none found, as in this task: http://setiathome.berkeley.edu/workunit.php?wuid=2054897008. It's been happening forever and happens in both r2750 and r2934. It is the reason I built r2934 and the ones before it. I see it on other machines as well. Any way to fix that?
single pulses: 0 repetitive pulses: 1 percent blanked: 0.00
single pulses: 0 repetitive pulses: 0 percent blanked: 0.00 |
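For spotting this kind of mismatch without eyeballing every inconclusive, the two summary lines can be compared mechanically. A minimal sketch, assuming the summary always has the "single pulses / repetitive pulses / percent blanked" wording shown above; `pulse_summary` and `differing_counts` are hypothetical helper names, not part of any SETI@home tool.

```python
import re

# Matches the summary wording shown in the post above (an assumption
# about the format, taken only from those two lines).
SUMMARY = re.compile(
    r"single pulses:\s*(\d+)\s+repetitive pulses:\s*(\d+)\s+"
    r"percent blanked:\s*([\d.]+)"
)


def pulse_summary(text):
    """Return the pulse counts from a result summary, or None if absent."""
    m = SUMMARY.search(text)
    if m is None:
        return None
    return {
        "single": int(m.group(1)),
        "repetitive": int(m.group(2)),
        "blanked": float(m.group(3)),
    }


def differing_counts(a, b):
    """List the pulse types on which two summaries disagree."""
    return [key for key in ("single", "repetitive") if a[key] != b[key]]


host_a = pulse_summary("single pulses: 0 repetitive pulses: 1 percent blanked: 0.00")
host_b = pulse_summary("single pulses: 0 repetitive pulses: 0 percent blanked: 0.00")
print(differing_counts(host_a, host_b))
```

For the two lines quoted above, the only disagreement is in the repetitive pulse count.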
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file.
Tom, Urs - I have run a few WUs on Beta on my iMac with the ATI Radeon HD 4670. It runs them, but not well. The first one it ran took a long time and got a computation error. After that it seems to run them correctly. The problem is that it creates rectangles of messed-up screen image randomly around the iMac's screen. I suspect the GPU just doesn't have enough VRAM to run the app properly. Here are three that have run so far:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606611 - Computation error
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606866
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606854
- Tom |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
There is a real problem with the OSX ATI AP app though. It seems the signal strength, or whatever, is just enough off that it misses a signal every now and then. Usually it's only noticeable when there is only one signal or none found, as in this task: http://setiathome.berkeley.edu/workunit.php?wuid=2054897008. It's been happening forever and happens in both r2750 and r2934. It is the reason I built r2934 and the ones before it. I see it on other machines as well. Any way to fix that?
Failed task:
ffa total=1.967E+12 , N=999 , <>=1.969E+09 , min=8.032E+08 , max=1.192E+10
correct result:
class T_ffa: total=1.32e+012, N=999, <>=1.32e+009, min=5.48e+008, max=7.55e+009
FFA blocks counters:
class T_FFA_fetch: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_tt_build: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_compare: total=1.80e+007, N=8, <>=2.26e+006, min=5.30e+005, max=6.25e+006
class T_FFA_coadd: total=5.70e+008, N=124771, <>=4.57e+003, min=2.86e+003, max=3.01e+005
class T_FFA_stride_add: total=9.07e+004, N=7, <>=1.30e+004, min=1.06e+004, max=1.69e+004
class T_GPU_buffer_read_backs: total=0, N=0, <>=0, min=0 max=0
correct result 2:
ffa total=2.838E+12 , N=999 , <>=2.841E+09 , min=1.005E+09 , max=1.665E+10
FFA blocks counters:
FFA_fetch total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_tt_build total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_compare total=9.362E+06 , N=8 , <>=1.170E+06 , min=2.410E+05 , max=3.157E+06
FFA_coadd total=9.261E+09 , N=242209 , <>=3.823E+04 , min=2.521E+04 , max=1.631E+07
FFA_stride_add total=3.891E+05 , N=7 , <>=5.559E+04 , min=5.390E+04 , max=5.771E+04
GPU_buffer_read_backs total=0.0000E+00, N=0 , <>=0 , min=0 , max=0
So the failed task never tried to find a signal after the pre-compute. |
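The same check can be scripted when going through many results. The sketch below parses both counter layouts quoted above (the "class T_..." one and the plain one) and asks whether any FFA block counter fired at all. The names `parse_counters` and `ran_ffa_search` are hypothetical, and reading "no FFA block counter with N > 0" as "never searched after the pre-compute" is an interpretation of the post above, not something taken from the app's code.

```python
import re

# One counter per line: optional "class T_" prefix, a name, an optional
# colon, then "total=<number> , N=<count>". The rest of the line is ignored.
LINE = re.compile(r"^(?:class\s+T_)?(\w+):?\s+total=([^,\s]+)\s*,\s*N=(\d+)")


def parse_counters(log_text):
    """Map lower-cased counter name -> (total, N) for each counter line."""
    counters = {}
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            name = m.group(1).lower()
            counters[name] = (float(m.group(2)), int(m.group(3)))
    return counters


def ran_ffa_search(counters):
    """True if any FFA block counter (name starting 'ffa_') was hit."""
    return any(
        name.startswith("ffa_") and n > 0
        for name, (_total, n) in counters.items()
    )


failed = "ffa total=1.967E+12 , N=999 , <>=1.969E+09 , min=8.032E+08 , max=1.192E+10"
good = """ffa total=2.838E+12 , N=999 , <>=2.841E+09 , min=1.005E+09 , max=1.665E+10
FFA_compare total=9.362E+06 , N=8 , <>=1.170E+06 , min=2.410E+05 , max=3.157E+06
FFA_coadd total=9.261E+09 , N=242209 , <>=3.823E+04 , min=2.521E+04 , max=1.631E+07"""

print(ran_ffa_search(parse_counters(failed)), ran_ffa_search(parse_counters(good)))
```

Matching on names in lower case lets the Windows-style and the other layout be compared with the same keys.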
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is a real problem with the OSX ATI AP app though. It seems the signal strength, or whatever, is just enough off that it misses a signal every now and then. Usually it's only noticeable when there is only one signal or none found, as in this task: http://setiathome.berkeley.edu/workunit.php?wuid=2054897008. It's been happening forever and happens in both r2750 and r2934. It is the reason I built r2934 and the ones before it. I see it on other machines as well. Any way to fix that?
...and the fix is? Note that the cards (I have 3 of them) don't have this problem in Linux or Windows. They only fail with the zero- or one-signal tasks in OSX. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
...and the fix is?
Try to catch such a task for offline benchmarking. In the offline run, try -v 3 -skip_ffa_precompute and save the log for comparison with another build (Windows/Linux) running with the same parameters. It is preferable to set -ffa_block and the other values the same in both runs to simplify the comparison. Then provide both logs for analysis. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'll keep an eye on the inconclusives and see if another suspect shows. Usually though I only see one every couple hundred tasks or so. It might be a while. Right now I don't see any, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=0&show_names=0&state=3&appid=20 |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file.
Tom,
Ok Tom. First, thanks for running the beta test app on the ATI HD 4670. The run times for the results are what to expect from these GPUs given their core frequency and RAM type, similar to the run times I get on Linux with a somewhat higher-clocked ATI HD 4670. Now, could you test how using less RAM influences the screen garbage you reported seeing? Please add only " -sbs 88" to the command line, nothing else, the same way you did before for the no-caching option. _\|/_ U r s |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
...and the fix is?
This is another strange one: http://setiathome.berkeley.edu/result.php?resultid=4698650636 That was a HD 7750 in Linux. The same machine now has a 6850 that spent a couple of years in the Mac. The basic count was the same on all three:
single pulses: 2 repetitive pulses: 0 percent blanked: 4.98
But the 7750 got an Invalid. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Ok Tom.
Urs - I ran the Beta app on both of my iMacs last night with "-sbs 88" in the command line. It worked with no screen glitches.
iMac with the ATI Radeon HD 4670:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606662
It is currently 6.5 hours into a task and at only 25%. I suspect it will be another computation error.
iMac with the ATI Radeon HD 4850:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22664176
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22664113
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663993
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663984
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663954
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663952
I will keep running tasks on the 4670 and report back. - Tom |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
iMac with the ATI Radeon HD 4670:
My fear is that the long run time still points to a low-memory problem. Could you try reducing to " -sbs 80", let some tasks (10+) run one by one, and watch whether another overly long-running WU happens on your ATI HD 4670? If it happens again, try reducing further, until the times for WUs with similar angle ranges are always similar. That will be the new default for this app. _\|/_ U r s |
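One way to make "times for similar angle ranges are always similar" checkable is to bucket finished tasks by angle range and flag buckets whose slowest run is far above the fastest. This is only a sketch: the function name, the 0.1-wide buckets, the 1.5x threshold and the sample numbers are all assumptions chosen for illustration, not project defaults or real results.

```python
from collections import defaultdict


def inconsistent_groups(tasks, ar_step=0.1, max_ratio=1.5):
    """Flag angle-range buckets with widely spread run times.

    tasks: iterable of (angle_range, run_time_seconds) pairs.
    ar_step and max_ratio are illustrative assumptions.
    """
    groups = defaultdict(list)
    for angle_range, seconds in tasks:
        groups[round(angle_range / ar_step)].append(seconds)

    flagged = {}
    for bucket, times in groups.items():
        fastest = min(times)
        if len(times) > 1 and fastest > 0 and max(times) / fastest > max_ratio:
            flagged[round(bucket * ar_step, 6)] = sorted(times)
    return flagged


# Made-up sample: three tasks near AR 0.4 (one far slower), two near AR 1.2.
sample = [(0.42, 9800), (0.44, 9900), (0.41, 29000), (1.20, 4000), (1.21, 4100)]
print(inconsistent_groups(sample))
```

If the returned dictionary is empty after ten or more tasks at a given -sbs value, that value would pass the test described above; otherwise the flagged bucket points at the WUs worth a closer look.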
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I will try -sbs 80. Adding -sbs 88 made a big difference. The screen glitches went away, and it is completing most tasks in just under 10k seconds. I think the 29k-second task was partway through when I added -sbs 88. The results are here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=78060 I will keep running WUs when the computer is available and watch the results with -sbs 80. - Tom |