I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Quote: "What about App_Config.xml??? (I'm currently using one with the CUDA75 App to run 2 Units at a time...) I'm assuming that I want to ONLY run one Unit at a time...??? (Unless the OpenCL App will allow me to run two Units at a time, like CUDA...???)"
Running 2 GPU WUs at a time using SoG with anything other than high end hardware will result in even less work being done. And each WU being processed requires 1 CPU core to support it. Grant Darwin NT |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Quote: "What about App_Config.xml??? (I'm currently using one with the CUDA75 App to run 2 Units at a time...) I'm assuming that I want to ONLY run one Unit at a time...??? (Unless the OpenCL App will allow me to run two Units at a time, like CUDA...???)"
OK; so, (for me), the Max_Concurrent Line would be "2". What other changes would need to be made to the App_Config that I've pasted here? I'm assuming that the two CommandLine Parameters need to be changed to work with my two 750TI SC cards, right? TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Ah, you should be running One OpenCL task at a time, you are Not running any CPU tasks. There isn't any need for an App_Config. None. Just use the files in the download with the changes I posted. Either rename the App_Config so it isn't used or remove it from the folder. If you want to try a speedup, just lower the cmdline setting -period_iterations_num 16 to something lower, say -period_iterations_num 10. BTW, behold... the fastest Mac App at SETI: https://setiathome.berkeley.edu/results.php?hostid=8424399&offset=220 Now that is a Fast Mac. |
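TBar's speedup tip turns on what -period_iterations_num does. The option is usually described as splitting the longest PulseFind kernel calls into N separate calls: a larger N means more, shorter launches (gentler on the display), a smaller N means fewer, longer launches with less per-launch overhead. A minimal C sketch of that splitting, assuming the work is simply cut into N near-equal chunks; `split_pulsefind` is a hypothetical name and this is not the app's actual code.

```c
#include <stddef.h>

/* Hypothetical sketch: divide `num_periods` units of PulseFind work into
 * at most `iterations` chunks, one kernel launch per chunk. Chunk lengths
 * are written to `chunk_len` (capacity `cap`); the return value is the
 * number of launches. */
int split_pulsefind(int num_periods, int iterations, int *chunk_len, size_t cap)
{
    if (num_periods <= 0)
        return 0;                  /* nothing to launch */
    if (iterations < 1)
        iterations = 1;
    if (iterations > num_periods)
        iterations = num_periods;  /* never launch an empty chunk */

    int base = num_periods / iterations;
    int rem = num_periods % iterations;

    for (int i = 0; i < iterations; i++) {
        /* The first `rem` chunks take one extra period each. */
        if ((size_t)i < cap)
            chunk_len[i] = base + (i < rem ? 1 : 0);
    }
    return iterations;
}
```

For example, 1000 units of work with N = 16 become 16 launches of 62 or 63 units each, while N = 10 gives 10 launches of 100.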
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Quote: "Ah, you should be running One OpenCL task at a time, you are Not running any CPU tasks. There isn't any need for an App_Config. None. Just use the files in the download with the changes I posted. Either rename the App_Config so it isn't used or remove it from the folder."
OK - no App_Config... Once my CUDA75 Units complete, I'll remove the file. Now, from the setiathome.berkeley.edu Folder, I need to remove the x41zi-CUDA75 File and the App_Config, and then what??? Copy and paste the five files from the r3709 Extracted Folder...??? (Three "exec" Files, one .txt (Command Line File), and the App_Info.xml File.) Let me know if this is right. Thanks. :-) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, I found this http://setiathome.berkeley.edu/result.php?resultid=6412085953 An NVIDIA Quadro 4000 is experiencing difficulties when launching a kernel. Looks like it is an official cuda 7.5 version (not an anonymous platform). "Too many resources requested" indicates too much shared mem, blocks, registers or threads requested at launch. First guess: If it is not asking for too much shared mem, then there may be a __launch_bounds__(threads, blocks) directive in the source code for the kernel requesting too many simultaneous blocks. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
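Petri's list of launch resources can be turned into a simple pre-launch check. A simplified C sketch, assuming per-block limits for a compute capability 2.0 (Fermi) card like the Quadro 4000: 1024 threads and 48 KiB of shared memory per block, plus the regsPerBlock of 32768 that the Quadro's own stderr reports further down this thread. The struct and function names are hypothetical, and real hardware also rounds register and shared-memory allocations up to a granularity, so passing this check does not guarantee the launch succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-block limits of a device (hypothetical struct). */
typedef struct {
    int max_threads_per_block;
    int regs_per_block;
    size_t shared_mem_per_block;
} device_limits;

/* A kernel launch as the runtime would see it (hypothetical struct). */
typedef struct {
    int threads_per_block;
    int regs_per_thread;
    size_t shared_mem_bytes;
} launch_config;

/* Simplified check mirroring "too many resources requested": ignores
 * allocation granularity, so a launch passing here can still fail. */
bool launch_fits(const device_limits *dev, const launch_config *cfg)
{
    if (cfg->threads_per_block > dev->max_threads_per_block)
        return false;   /* too many threads per block */
    if ((long)cfg->threads_per_block * cfg->regs_per_thread > dev->regs_per_block)
        return false;   /* not enough registers for the whole block */
    if (cfg->shared_mem_bytes > dev->shared_mem_per_block)
        return false;   /* too much shared memory */
    return true;
}
```

With the Fermi limits above, a 256-thread block at 63 registers per thread fits (16128 registers), while a 1024-thread block at 40 registers per thread (40960 registers) does not.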
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There's too much wrong with that scenario to even bother with:
1) The Quadro 4000 is a Fermi GPU, which we know doesn't work with CUDA 7.5. The CUDA 6.0 Special App is for the older Kepler CC 3.5 GPUs that might not work well with CUDA 7.5...
2) Fermi GPUs are supposed to be Blocked from the 7.5 App; I added a compute capability >= 3.0 requirement to the plan class.
3) It might still work on that Fermi GPU if He were using something close to the Correct Driver:
a) He is using Driver 4600.61
b) He should be using Driver 5243.59
I gave up on that machine a while back. It's similar to the ATI HD4 GPUs that keep being sent the HD5 Apps. There doesn't seem to be a way to Stop it. Kinda like how All the versions of the CUDA Special zi3x Apps I've built keep producing Invalids.... ;-) |
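The plan-class restriction TBar describes amounts to a single comparison on compute capability. A minimal C sketch, assuming capabilities are encoded as major*100 + minor (so 3.0 becomes 300); `compcap` and `eligible_for_cuda75` are hypothetical helpers, not BOINC's scheduler code.

```c
#include <stdbool.h>

/* Encode compute capability major.minor as one integer, e.g. 3.5 -> 305. */
static int compcap(int major, int minor)
{
    return major * 100 + minor;
}

/* Hypothetical plan-class gate: only GPUs with compute capability 3.0 or
 * higher (Kepler and newer) qualify for the CUDA 7.5 app. */
bool eligible_for_cuda75(int major, int minor)
{
    return compcap(major, minor) >= compcap(3, 0);
}
```

With this gate the Quadro 4000 (compute capability 2.0) is rejected and a GTX 750 Ti (5.0) is accepted.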
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
@TBar, Just finished crunching the last of the CUDA75 Units. I still need to know about the five extracted files... (Three "Exec" Files, one App_Info.xml, and the CommandLine-".txt" file.) Do I place all five of these files into the "setiathome.berkeley.edu" Project Folder??? I'd like to start crunching with the new OpenCL App as soon as possible. :-) Thanks. :-) TL
[EDIT:] Out of 200 Units assigned to Andromeda/Hackintosh, 150 are now Valid, 48 are Pending, and 2 are marked as Inconclusive. (CUDA75 App.) Even though this is a slower App, these numbers are an improvement over what I had when I was on 10.11.4. I'm looking forward to even greater improvement with the new OpenCL App. TL
[EDIT 2:] [Modified App_Info.xml]
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3709_NV_ssse3_x86_64-apple-darwin</name>
    <executable/>
  </file_info>
  <file_info>
    <name>MultiBeam_Kernels_r3709.cl</name>
  </file_info>
  <file_info>
    <name>mb_cmdline_mac_OpenCL_NV_sah.txt</name>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-apple-darwin</platform>
    <version_num>819</version_num>
    <plan_class>opencl_nvidia_mac</plan_class>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>MBv8_8.22r3709_NV_ssse3_x86_64-apple-darwin</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>MultiBeam_Kernels_r3709.cl</file_name>
    </file_ref>
    <file_ref>
      <file_name>mb_cmdline_mac_OpenCL_NV_sah.txt</file_name>
      <open_name>mb_cmdline.txt</open_name>
    </file_ref>
  </app_version>
</app_info>
Is this right, now??? I think I've taken out the CPU Section properly, but I would like you to double check what I've done. (This is my first "hack" at an App_Info.xml file; I'm more proficient with App_Config.xml...) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
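For double-checking an App_Info.xml like the one above, one quick test is that every file the app needs is both declared in a `<file_info>` block and referenced by a `<file_ref>` inside `<app_version>`. A crude C sketch of that check, assuming plain substring matching on `<name>...</name>` and `<file_name>...</file_name>`; it is not an XML parser, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Crude text check: `file` must appear as <name>file</name> (declared in a
 * <file_info>) and as <file_name>file</file_name> (referenced in a <file_ref>). */
bool file_declared_and_referenced(const char *xml, const char *file)
{
    char tag[512];

    snprintf(tag, sizeof tag, "<name>%s</name>", file);
    if (strstr(xml, tag) == NULL)
        return false;   /* no <file_info> entry */

    snprintf(tag, sizeof tag, "<file_name>%s</file_name>", file);
    return strstr(xml, tag) != NULL;   /* needs a <file_ref> too */
}

/* Returns the index of the first file failing the check, or -1 if all pass. */
int first_missing(const char *xml, const char *const files[], int n)
{
    for (int i = 0; i < n; i++)
        if (!file_declared_and_referenced(xml, files[i]))
            return i;
    return -1;
}
```

Run against the file above with the three names MBv8_8.22r3709_NV_ssse3_x86_64-apple-darwin, MultiBeam_Kernels_r3709.cl and mb_cmdline_mac_OpenCL_NV_sah.txt, `first_missing` should return -1, since each name appears in both places.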
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
[Update:] 2-17-2018 at 6:45 PM PST: I took a chance and copied and pasted the modified App_Info.xml, the CommandLine.txt File, and two of the three "Exec" Files. (I did NOT copy over the CPU "Exec" File.) I Resumed SETI processing, but Event Log states No Tasks Available... No errors in the Event Log, though; I think everything is OK... TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Quote: "I Resumed SETI processing, but Event Log states No Tasks Available..."
Seti is still having problems getting data from Green Bank & Arecibo, hence no data to split, and so no work at the moment. Grant Darwin NT |
Wiggo Joined: 24 Jan 00 Posts: 34754 Credit: 261,360,520 RAC: 489 |
Quote: "I Resumed SETI processing, but Event Log states No Tasks Available..."
T.L., you should know to check the current Panic Mode On thread 1st before posting. :-O Anyhow, there's more fodder to chew on now. :-D Cheers. |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
[Update:] StdErr Report on a recent OpenCL Unit:
Task ID: 6414974466
Name: blc12_2bit_guppi_58137_27028_HIP45688_0013.4244.0.22.45.197.vlar_1
Workunit: 2865978324
Created: 18 Feb 2018, 8:25:53 UTC
Sent: 18 Feb 2018, 8:25:54 UTC
Report deadline: 12 Apr 2018, 13:25:36 UTC
Received: 18 Feb 2018, 10:14:15 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 7952666
Run time: 13 min 44 sec
CPU time: 5 min 4 sec
Validate state: Valid
Credit: 62.68
Device peak FLOPS: 1,605.76 GFLOPS
Application version: SETI@home v8 Anonymous platform (NVIDIA GPU)
Peak working set size: 64.48 MB
Peak swap size: 3,205.29 MB
Peak disk usage: 0.04 MB
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Maximum single buffer size set to:256MB
SpikeFind FFT size threshold override set to:2048
TUNE: kernel 1 now has workgroup size of (64,1,4)
Number of period iterations for PulseFind set to 16
OpenCL platform detected: Apple
Number of OpenCL devices found : 2
BOINC assigns slot on device #1 of 2 devices.
Info: BOINC provided OpenCL device ID used
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
CPU : Intel(R) Core(TM)2 Extreme CPU X9650 @ 3.00GHz
GenuineIntel x86, Family 6 Model 23 Stepping 6
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1
OpenCL-kernels filename : MultiBeam_Kernels_r3709.cl
ar=0.007971 NumCfft=94343 NumGauss=0 NumPulse=24008612992 NumTriplet=36950910368
Currently allocated 313 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
OS X optimized setiathome_v8 application
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x OS X 64bit Build 3709 , Ported by : Raistmer, JDWhale, Urs Echternacht
OpenCL version by Raistmer, r3709
Number of OpenCL platforms: 1
OpenCL Platform Name: Apple
Number of devices: 2
  Max compute units: 5
  Max work group size: 1024
  Max clock frequency: 1254Mhz
  Max memory allocation: 536870912
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 9
  Local memory type: Scratchpad
  Local memory size: 49152
  Queue properties:
    Out-of-Order: No
  Name: GeForce GTX 750 Ti
  Vendor: NVIDIA
  Driver version: 10.11.14 346.03.15f12
  Version: OpenCL 1.2
  Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422
  Max compute units: 5
  Max work group size: 1024
  Max clock frequency: 1254Mhz
  Max memory allocation: 536870912
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 9
  Local memory type: Scratchpad
  Local memory size: 49152
  Queue properties:
    Out-of-Order: No
  Name: GeForce GTX 750 Ti
  Vendor: NVIDIA
  Driver version: 10.11.14 346.03.15f12
  Version: OpenCL 1.2
  Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.007971
Used GPU device parameters are:
  Number of compute units: 5
  Single buffer allocation size: 256MB
  Total device global memory: 2048MB
  max WG size: 1024
  local mem type: Real
  LotOfMem path: no
  LowPerformanceGPU path: no
  HighPerformanceGPU path: no
period_iterations_num=16
Triplet: peak=11.30171, time=30.11, period=17.81, d_freq=6904594590.93, chirp=-28.995, fft_len=128
Pulse: peak=5.341069, time=45.99, period=12.35, d_freq=6904598221.83, score=1.009, chirp=-34.258, fft_len=4k
Pulse: peak=7.414871, time=45.99, period=21.36, d_freq=6904602743.84, score=1, chirp=-42.308, fft_len=4k
Pulse: peak=3.600743, time=45.99, period=9.037, d_freq=6904595177.28, score=1.006, chirp=-47.361, fft_len=4k
Triplet: peak=13.23665, time=49.21, period=31.9, d_freq=6904602060.89, chirp=-53.529, fft_len=1024
Triplet: peak=11.6078, time=58.25, period=8.73, d_freq=6904600604.13, chirp=-59.105, fft_len=128
Pulse: peak=0.3184023, time=45.82, period=0.1789, d_freq=6904601057.76, score=1.025, chirp=-84.755, fft_len=128
Pulse: peak=3.855636, time=45.9, period=7.807, d_freq=6904602425.63, score=1.058, chirp=87.264, fft_len=2k
Best spike: peak=22.21977, time=42.95, d_freq=6904600030.2, chirp=17.943, fft_len=64k
Best autocorr: peak=17.60487, time=85.9, delay=4.9566, d_freq=6904600619.42, chirp=27.752, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=3.855636, time=45.9, period=7.807, d_freq=6904602425.63, score=1.058, chirp=87.264, fft_len=2k
Best triplet: peak=13.23665, time=49.21, period=31.9, d_freq=6904602060.89, chirp=-53.529, fft_len=1024
Spike count: 0
Autocorr count: 0
Pulse count: 5
Triplet count: 3
Gaussian count: 0
Time cpu in use since last restart: 304.5 seconds
GPU device sync requested... ...GPU device synched
02:07:25 (4231): called boinc_finish(0)
</stderr_txt>
]]>
------------------------------------------------- BREAK ------------------------------------------------
I guess this is good...??? It is MUCH faster than the old CUDA75. Units seem to be finishing at 12.5 to 13.5 Min each. :-) :-D Now have 200 active in queue. One Inconclusive still listed; but, this is from the old CUDA75 App. Also, something like 84+ Units in Pending. I still have 20+ Tabs open in FF Quantum 58.0.2. :-) I'm glad to be contributing more, now. :-) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
@TBar, A friend and I just built another Hackintosh, (for him), and it has an integrated HD630 on an Intel i7 7770 CPU. Is there an Intel OpenCL App, yet??? He could make use of it, if there is one... Thanks in advance. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Ah, it's Right below the App you downloaded earlier:
* ATi5r3710&CPU-AVX.7z (1897.05 kB - downloaded 5 times.)
* ATi5r3710&CPU-AVX2.7z (2398.86 kB - downloaded 9 times.)
* nVidia_r3709&CPUr3711.7z (1269.87 kB - downloaded 8 times.)
* Intel_r3708&CPUr3711.7z (1269.68 kB - downloaded 3 times.)
If you look above that, you will also find: "The Intel iGPU App is the same as the nVidia GPU App and is completely Untested. Please Report if it is better than the App on SETI Main." If you look at earlier posts in this thread, going back over a Year, you will see that the OSX 10.11.4 Update Broke OpenCL on nVidia; I think you should remember that? Since 10.11.4 the only OpenCL build that works decently on the nVidia cards is the iGPU build. For over a Year now, the Mac nVidia App has been the Intel iGPU build...it's in this thread. Whatever... Besides that, Most of the iGPUs work Badly using the iGPU App, even though the App works very well on the nVidia GPUs. In Most cases you Should Not Use the Intel iGPU, as it not only produces around 50% Inconclusive tasks, it also Slows down the CPU tasks by 100%, i.e. the CPU tasks take Twice as long when using the iGPU. About Half of my Inconclusive Results are from People using the Mac Intel iGPU App. If you still insist on trying the New 3708 version, I suggest you stop using it if it produces 50% Inconclusive Results. A 7770 should have AVX2; you would be much better off just running the AVX2 App on the CPU. One of the New iMac Pros switched from the Stock Mac CPU App to the AVX2 App and so far his RAC has increased by 10,000 and is still rising; it's a good thing he can't use an iGPU: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8427868 |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
TBar, Sorry, I typed in too many "7's"... His CPU is a 7700, not a 7770... I assume the 7700 is still capable of AVX2 support... I've e-Mailed him Links to obtain Keka, and the AVX2 App. If he's still interested, I'll help him get set up on his Hackintosh. Thanks, TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
They need to run multiple WU’s if they can. That’s the only way I get better output from the ATI cards on the Mac. Right now I’m running roughly the same average times they are getting, by running 3 at a time, and my cards only have 32 compute units. |
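When several tasks share one GPU, each task's run time goes up, so the fair comparison is per-task throughput. A small C sketch, assuming a steady state in which the GPU always holds `concurrent` tasks that each take `wall_seconds` from start to finish; the function names are hypothetical and the example numbers are made up, not Chris's actual times.

```c
/* Effective seconds per task when `concurrent` tasks run side by side and
 * each takes `wall_seconds` from start to finish (steady state assumed). */
double effective_seconds_per_task(double wall_seconds, int concurrent)
{
    return concurrent > 0 ? wall_seconds / concurrent : 0.0;
}

/* Tasks completed per hour under the same steady-state assumption. */
double tasks_per_hour(double wall_seconds, int concurrent)
{
    return wall_seconds > 0.0 ? 3600.0 * concurrent / wall_seconds : 0.0;
}
```

For instance, three concurrent tasks at 30 minutes each work out to 10 minutes per task, or 6 tasks per hour: the same throughput as running one task at a time that finishes in 10 minutes.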
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I was going to suggest using different cmdline settings, as some people are getting much better results with the AMD GPUs using detailed cmdlines. The problem is, some settings work on some platforms but not on others, and the wrong settings can cause the App to not start or create false overflows. Then there is the problem with having to reset the file permissions after editing the cmdline file when you restart BOINC. It would be best to make all the changes to the cmdline file before restarting BOINC so you only have to reset the file permissions once. This machine running the AMD Vega seems to be getting nice run-times, but has a very long list of settings, and it only takes One bad setting to cause problems: https://setiathome.berkeley.edu/result.php?resultid=6427152128 It's much more complicated than the CUDA Special App, where about the only setting is to either use a full CPU or not. BTW, the first change I would make is to change the 'Number of period iterations for PulseFind set to 16' to 'Number of period iterations for PulseFind set to 1' and see how that worked. It's possible that One setting will result in most of the gains possible: https://setiathome.berkeley.edu/result.php?resultid=6426684584 |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hi, It seems the Server Finally sent him another CUDA 42 task, and it seems he has updated to a suitable driver. The result is the same as with the One other CUDA 42 task he has been sent; it errored out immediately with: CUFFT error in file 'cuda/cudaAcc_fft.cu' in line 37. Now, that Fermi Quadro should work with the CUDA 42 App. The CUDA 42 App works with All my CUDA cards, everything from an 8800 GT to a GTX 1060. The closest I have to a Fermi card is a GTS 250, and the GTS 250 works fine with the CUDA 42 App in every OS that supports CUDA on the GTS 250. Recently even a GTX 1070 has worked with the CUDA 42 App: https://setiathome.berkeley.edu/result.php?resultid=6399215767
So, I really don't think it's the App; it has Passed through even SETI Beta. The full stderr is here, just so it doesn't disappear again, because so far the Server has only sent the CUDA 42 task to the Quadro Twice:
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>process exited with code 1 (0x1, -255)</message>
<stderr_txt>
v8 task detected
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: Quadro 4000, 2047 MiB, regsPerBlock 32768
     computeCap 2.0, multiProcs 8
     pciBusID = 5, pciSlotID = 0
     clockRate = 950 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: Quadro 4000 is okay
SETI@home using CUDA accelerated device Quadro 4000
setiathome enhanced x41zi (baseline v8), Cuda 4.20
setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.010184
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
CUFFT error in file 'cuda/cudaAcc_fft.cu' in line 37.
</stderr_txt>
]]>
So, why is this Quadro failing with the CUDA 42 App when all the other GPUs that I know of don't?
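The code around line 37 reads:
}

void cudaAcc_execute_dfts(int FftNum)
{
    // Inverse complex-to-complex FFT for plan FftNum (input: dev_cx_ChirpDataArray, output: dev_WorkData)
    CUFFT_SAFE_CALL( (cufftExecC2C(fft_analysis_plans[FftNum][0], dev_cx_ChirpDataArray, dev_WorkData, CUFFT_INVERSE)) );
} |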
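The "CUFFT error in file ... in line 37." message in that stderr comes from a wrapper like `CUFFT_SAFE_CALL`, whose definition isn't shown here. A plain-C sketch of the usual pattern such macros follow, assuming a hypothetical `fft_status` type in place of cuFFT's `cufftResult` (where 0 means success) and a hypothetical `fake_exec` in place of `cufftExecC2C`. The real app apparently prints a similar line and exits with code 1; this version records the message and returns the error instead, so it runs without a GPU.

```c
#include <stdio.h>

/* Hypothetical stand-in for cufftResult: 0 means success. */
typedef int fft_status;
#define FFT_OK 0

/* Last error message, kept so callers can inspect it. */
static char last_fft_error[256];

/* Evaluate `call` once; on failure, record the file and line of the call
 * site and return the status from the enclosing function. */
#define FFT_SAFE_CALL(call)                                          \
    do {                                                             \
        fft_status status_ = (call);                                 \
        if (status_ != FFT_OK) {                                     \
            snprintf(last_fft_error, sizeof last_fft_error,          \
                     "CUFFT error in file '%s' in line %d.",         \
                     __FILE__, __LINE__);                            \
            return status_;                                          \
        }                                                            \
    } while (0)

/* Pretend FFT executor: fails with a nonzero code for unknown plans. */
static fft_status fake_exec(int plan)
{
    return (plan >= 0 && plan < 4) ? FFT_OK : 3;
}

/* Hypothetical analogue of cudaAcc_execute_dfts. */
fft_status execute_dft(int plan)
{
    FFT_SAFE_CALL(fake_exec(plan));
    return FFT_OK;
}
```

Because `__FILE__` and `__LINE__` expand where the macro is used, the message names the line of the wrapped call rather than the macro's definition, which is how a report like "cudaAcc_fft.cu line 37" points at the `cufftExecC2C` call itself.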
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
[Update:] Been crunching with the new OpenCL App since the 18th. :-) :-D <<------- RAC is CLIMBING!!! :-) Have broken 4K in 5 Days and a few hours. (Still ONLY crunching from 6 PM to 9 AM - PST.) Things are looking MUCH brighter, now. :-) Thanks TBar!!! :-) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
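RAC keeps climbing for weeks after a speedup because BOINC's Recent Average Credit is an exponentially decaying average with a half-life of one week. A simplified C sketch, assuming credit is granted once per day (BOINC actually updates the average at each credit grant with continuous decay); `rac_after_days` is a hypothetical helper.

```c
/* Per-day decay factor for a one-week half-life, i.e. 0.5^(1/7).
 * Hardcoded so the sketch needs no math library. */
#define RAC_DAILY_DECAY 0.9057236641

/* Simplified daily-step model of Recent Average Credit: each day the old
 * average decays and that day's credit is blended in. With a constant
 * daily credit C, the average approaches C. */
double rac_after_days(double start_rac, double daily_credit, int days)
{
    double rac = start_rac;
    for (int d = 0; d < days; d++) {
        /* Yesterday's average fades; today's credit is mixed in. */
        rac = rac * RAC_DAILY_DECAY + (1.0 - RAC_DAILY_DECAY) * daily_credit;
    }
    return rac;
}
```

Under this model a host that starts from zero and earns the same credit every day reaches about half its eventual RAC after 7 days and about 99.9% of it after 10 weeks.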
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hey Chris, CreditFew is going to give you a hard time with that setup. Since CF works from an App APR average, it gives the faster card fewer credits. You would think it would give the slower cards more credit, but what usually happens is the slower GPUs get the same as usual and the faster gets less. I don't know if the Tahiti would work in the Mac, but that would probably be better due to the AMD card using a different App with a different APR. Of course, it would be better with Two 1080 Ti by themselves. Anyway, let's see how far it will go. |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Quote: "Hey Chris,"
Yeah, I was noticing that going on. I was half hoping the lower APR due to the 2 750ti's in there would give me higher credit on to 1080 ti’s work. I also thought about getting an external Thunderbolt enclosure for the 2013 Mac Pro, but it is on High Sierra (required for some of my software) and the last time I tried the special app on High Sierra it barfed... I’ll see how this goes and see if I’m “forced” to get another 1080ti in there. =) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.