I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, I'm seeing a <2% inconclusive-to-pending ratio here on the Windows host, which bodes well for project health and app accuracy across the board. My personal v8 design goal was better than 5%. I can't see any reason it shouldn't hold for Mac and Linux too. Now, with the replica database all caught up, the wheels can start turning again. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Any more success stories with the Mac CUDA app? Judging from my GTS 250 results with cuda42 (http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71141) and the results from the GT 650M on El Capitan (http://setiathome.berkeley.edu/results.php?hostid=7366840&offset=300), the GT 650M should be up to twice as fast and the GT 750M up to four times as fast with the cuda65 app. Depending on your mobile GPU, you could see similar results. It also appears the AVX CPU app is faster than the stock CPU app on the i7-3635QM CPU @ 2.40GHz, and the sse41 CPU app is certainly faster than the stock app on my Xeon E5472 @ 3.00GHz. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I've been dealing with some unrelated issues here (an acquaintance's funeral on short notice). If the project holds up this week, I see no reason some builds wouldn't go to beta (depending on how busy Eric is, and on any unexpected things that might crop up before Tuesday). |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Another success story. A GT 750M went from: Run time: 1 hours 1 min 54 sec; CPU time: 4 min 27 sec; ar=2.585027. To: SETI@home v8 Multibeam Cuda 6.50, Run time: 16 min 45 sec; CPU time: 3 min 9 sec; ar=2.726943. The CPU also enjoyed a triple-digit percentage increase in performance. I just posted an update to the AVX apps (http://www.arkayn.us/forum/index.php?topic=191.msg4369#msg4369). Anyone using the older versions might want to try the newer versions and see if they are any better. |
Gianfranco Lizzio Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
Quoting TBar: "Anyone using the older versions might want to try the newer versions and see if they are any better." CPU: Intel(R) Core(TM) i7-4770K @ 3.70GHz (running 8 instances of SETI), AVX build 3352 vs. AVX build 3366. Build 3352: Run time: 1 h 54 min 52 sec; CPU time: 1 h 48 min 7 sec; VLAR=0.010316. Build 3366: Run time: 1 h 45 min 40 sec; CPU time: 1 h 40 min 10 sec; VLAR=0.010306. Build 3366 is 8.6% faster! I don't want to believe, I want to know! |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Quoting TBar: "Anyone using the older versions might want to try the newer versions and see if they are any better." Yup, I'm seeing an 8-11% boost depending on the AR as well. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
TBar - The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, it's nice to hear that problem is finally fixed. It would be even nicer to see the problem with the Mac nVidia laptops fixed, as an even easier solution is already available. This is a typical nVidia laptop: http://setiathome.berkeley.edu/results.php?hostid=7601028&state=3. The problem is that a good number of them just don't work very well with the OpenCL app, especially in El Capitan, although for some models the problem has existed since Mavericks. The solution is simple: the CUDA apps work very well on these laptops, and they not only solve the inconclusive problem but also bring performance up to 'nearly normal'. Here is an example: a GTX 775M should not take 2 hours 4 min 32 sec for a task with an AR of 0.44 (http://setiathome.berkeley.edu/result.php?resultid=4701934370). The task should take well under 30 minutes on that GPU. Here's another example, where shorties should take around 18 minutes and 0.44 ARs around 30 minutes: http://setiathome.berkeley.edu/results.php?hostid=7413462&state=3 http://setiathome.berkeley.edu/results.php?hostid=6956650&state=3 |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
Quoting Tom Rinehart: "The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file." Tom, could you also try it on your ATI Radeon HD 4670? I need to be convinced that the current beta 8.06 works on that lower-class GPU, too. Other testers at beta with an HD 4670 seem to have trouble finishing work units with valid results. Maybe there is some other problem... _\|/_ U r s |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Quoting Urs Echternacht: "Tom, could you try also on your ATI Radeon HD 4670?" I will. That GPU does struggle, since it only has 256MB of VRAM, but it completed a number of tasks using TBar's builds. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
TBar - I've been running your ATI OpenCL app (MBv8_8.4r3323_clGPU_ssse3_x86_64-apple-darwin) for a while on my 27" iMac with the ATI Radeon HD 4850 512MB. It works well. It finishes a WU in about the same time as a CPU task using your SSE4.1 app (MBv8_8.05r3344_sse41_x86_64-apple-darwin), so it is like having an extra 1/2 core in my machine: I can run 9 WUs at a time instead of 8. For an individual WU, the GPU app run time is a little over twice as long as the reported CPU time, and it doesn't seem to matter whether I reserve a core for the GPU process or not; I get the same result either way. I typically run 8 CPU processes and the one GPU process. Is there anything I can try to improve the GPU run time? My mb_cmdline_mac_OpenCL_sah.txt contains the following: `-sbs 64 -oclfft_tune_gr 128 -oclfft_tune_wg 64 -period_iterations_num 64 -no_caching`. I'm not sure what each parameter does, so I don't know what to change to get a better result. Thanks. - Tom |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You might get better results by doubling the 3 main settings. I'm not sure those higher settings will work on the 4850, so it would be best to suspend all but 1 GPU task; that way, if it fails, it will only fail on 1 task. First try: `-sbs 64 -oclfft_tune_gr 256 -oclfft_tune_wg 64 -period_iterations_num 64 -no_caching`. If that works, then try increasing the other 2 to `-sbs 128` and `-oclfft_tune_wg 128`. Those settings will probably be the best for that GPU if it will accept them. In other news, the CUDA SuperCode has been added to the repository: https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha/PetriR_raw. Those who have the ability to use such things will know what to do... |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Quoting TBar: "You might be able to receive better results by doubling the 3 main settings. I'm not sure if those higher settings will work on the 4850, so, it would be best if you suspended all but 1 GPU task in case it fails it will only fail on 1 task." We (I) just need a Windows version of his app, and all will be well in my little kingdom of crunchers, lol. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Patience :) Xbranch worked out by playing the 'long game' (the space-suited tortoise in my website graphic isn't there by accident, lol). [The straight build has a lot of caveats/issues to iron out before a widescale release.] Integration of that, and other unspecified stuff, is there as a proving ground for some new technologies and techniques. Once the v8 transition dust settles (without servers blowing up every week), you'll get a roadmap :) |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Quoting jason_gee: "Patience :) Xbranch worked out by playing the 'long game' (spacesuited tortoise on my website graphic isn't there by accident, lol)." Oh, I know. I see all the inconclusives it makes, and I know y'all wanna get that sorted out before it even hits beta. In the meantime, I'm keeping myself occupied by trying to eke out max performance from my new machine, lol. Chris |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, I get as excited as the next person to see the 980 tear it up, lol, and there's a lot more to come, some of it already tried and some not even tried yet. I know Petri knows what's going on and continues working on things too :D. I think the next big cheek clench will be when GBT/Breakthrough data starts to flow; then we get to find out if the v8 apps even hold up (let alone the servers). |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Once Multibeam has proven itself with the new data, will Astropulse get the same treatment to work with the new data sources, or is it going to be relegated to Arecibo data? I'm mainly hoping that one day there will be copious amounts of work for my ATI cards, since they shine best on those WUs, lol. Chris |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
*Probably* it will eventually, though in the world of science funding, making presumptions can be dangerous, unless you live in Germany, where the Chancellor holds a physics doctorate and so knows the deal. At the very least, I'd be throwing in similar precision refinements as I make CUDA support for AP, which can trickle back to other applications. I suppose a lot will depend on the nature of these other telescope searches, though, of which I have no knowledge other than 'bigger data'. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Quoting TBar: "You might be able to receive better results by doubling the 3 main settings. I'm not sure if those higher settings will work on the 4850, so, it would be best if you suspended all but 1 GPU task in case it fails it will only fail on 1 task." I tried both `-sbs 64 -oclfft_tune_gr 256 -oclfft_tune_wg 64 -period_iterations_num 64 -no_caching` and `-sbs 128 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -period_iterations_num 64 -no_caching` (http://setiathome.berkeley.edu/result.php?resultid=4708804590). They don't seem to make a difference. The run time is still about twice the CPU time. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hmmm, it looks as though it's maxed out. You could lower the PulseFind setting (`-period_iterations_num`), but that may introduce screen lag. Lower numbers usually save a few seconds: `-sbs 128 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -period_iterations_num 32 -no_caching` |
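The run-time comparisons in this thread (the GT 750M result before and after switching to SETI@home v8 Multibeam Cuda 6.50, Gianfranco's AVX build 3352 vs. build 3366, and TBar's GTX 775M task that he expects to finish in about 30 minutes) come down to simple ratio arithmetic. The Python below is only an illustrative sketch: `parse_duration`, `speedup`, `percent_faster` and `percent_time_saved` are hypothetical helpers, not part of any SETI@home or BOINC tool, and it assumes run times are copied in the "1 h 54 min 52 sec" style shown in the posts. The compared tasks have slightly different ARs, so the ratios are approximate.

```python
import re

# Seconds per unit for run-time strings as quoted in the thread,
# e.g. "1 h 54 min 52 sec" or "1 hours 1 min 54 sec" (an assumed format).
UNIT_SECONDS = {"h": 3600, "hour": 3600, "hours": 3600, "min": 60, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert a quoted run time into a whole number of seconds."""
    pairs = re.findall(r"(\d+)\s*([a-z]+)", text)
    if not pairs:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in pairs)


def speedup(old: str, new: str) -> float:
    """How many times faster the new run is than the old one."""
    return parse_duration(old) / parse_duration(new)


def percent_faster(old: str, new: str) -> float:
    """Speedup expressed as a percentage gain: (old / new - 1) * 100."""
    return (speedup(old, new) - 1.0) * 100.0


def percent_time_saved(old: str, new: str) -> float:
    """Reduction in run time relative to the old run, in percent."""
    return (1.0 - parse_duration(new) / parse_duration(old)) * 100.0


if __name__ == "__main__":
    # GT 750M: earlier result vs. Cuda 6.50 (ARs 2.585 vs. 2.727, not identical).
    print(f"GT 750M speedup: {speedup('1 hours 1 min 54 sec', '16 min 45 sec'):.2f}x")
    # i7-4770K: AVX build 3352 vs. build 3366, wall-clock run time.
    old, new = "1 h 54 min 52 sec", "1 h 45 min 40 sec"
    print(f"Build 3366: {percent_faster(old, new):.1f}% faster, "
          f"{percent_time_saved(old, new):.1f}% less run time")
    # GTX 775M at AR 0.44: observed time divided by the ~30 minute expectation.
    ratio = parse_duration("2 hours 4 min 32 sec") / parse_duration("30 min")
    print(f"GTX 775M took {ratio:.2f}x the expected time")
```

Run as a script, it reports a GT 750M speedup of about 3.70x, in line with the "up to four times as fast" estimate, and the GTX 775M task taking roughly 4.15x the expected time. For build 3366 it gives about 8.7% faster by run-time ratio, or 8.0% less run time; the 8.6% quoted in the thread sits between the two, so the exact figure depends on which baseline is used.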
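TBar's tuning advice for Tom's HD 4850 amounts to a short sequence of edits to the single line in mb_cmdline_mac_OpenCL_sah.txt: double `-oclfft_tune_gr` first, then also double `-sbs` and `-oclfft_tune_wg`, and finally try a lower `-period_iterations_num` (which he warns may introduce screen lag). A minimal Python sketch that generates those candidate lines follows. `build_cmdline` and `tuning_steps` are hypothetical helper names; the parameter values are the ones posted in the thread, and the sketch assumes nothing about what each parameter does inside the app. As TBar suggests, each candidate is best tested with all but one GPU task suspended.

```python
# Tom's starting settings from mb_cmdline_mac_OpenCL_sah.txt. Dicts keep
# insertion order (Python 3.7+), so the rendered line keeps this order.
BASE_SETTINGS = {
    "-sbs": 64,
    "-oclfft_tune_gr": 128,
    "-oclfft_tune_wg": 64,
    "-period_iterations_num": 64,
}


def build_cmdline(settings: dict, flags: tuple = ("-no_caching",)) -> str:
    """Render name/value settings plus bare flags as one command-line string."""
    parts = []
    for name, value in settings.items():
        parts.extend([name, str(value)])
    parts.extend(flags)
    return " ".join(parts)


def tuning_steps(base: dict) -> list:
    """Candidate settings in the order suggested in the thread.

    Each step returns a new dict; the base settings are never modified.
    """
    # Step 1: double -oclfft_tune_gr only.
    step1 = {**base, "-oclfft_tune_gr": base["-oclfft_tune_gr"] * 2}
    # Step 2: if step 1 works, also double -sbs and -oclfft_tune_wg.
    step2 = {**step1,
             "-sbs": base["-sbs"] * 2,
             "-oclfft_tune_wg": base["-oclfft_tune_wg"] * 2}
    # Step 3: halve -period_iterations_num (may add screen lag, per TBar).
    step3 = {**step2,
             "-period_iterations_num": base["-period_iterations_num"] // 2}
    return [step1, step2, step3]


if __name__ == "__main__":
    print("current:", build_cmdline(BASE_SETTINGS))
    for number, settings in enumerate(tuning_steps(BASE_SETTINGS), start=1):
        print(f"step {number}:", build_cmdline(settings))
```

The three generated lines match the ones TBar and Tom posted. Keeping the settings in a dict rather than editing the string by hand makes it harder to drop or duplicate a parameter while experimenting; in Tom's report, steps 1 and 2 made no noticeable difference on the 4850.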
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.