I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Running the Baseline App 2 at a time... Now that's what I call S-L-O-W. I just clear the Active GPU tasks from client_state.xml every time I restart, it takes about 15 seconds. Another method would be to just Suspend all the non-running GPU tasks when you want to Stop, once the GPU tasks finish, quit. Or best yet, just Don't stop crunching. If you want to run the Much slower App, fine. If you want to remind yourself you made the right decision, just check these numbers every once in a while. It's a similar machine running similar cards, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=340 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
That is, they run for a couple seconds, their Estimated Times may still have say 10 Min of crunching time left; BUT the Units "Finish" at the point of Resuming. In viewing Tasks on the Web, these Units immediately show up in Inconclusives. This means that the checkpointing mechanism is broken in that build. If the app targets high-end GPUs, maybe it's OK to go without checkpoints at all. But in that case it is better to state this directly: omit the state.sah write/read and exit cleanly with an error state (computational error) if the app detects a resume attempt. Reporting an invalid result is the worst way. SETI apps news We're not gonna fight them. We're gonna transcend them. |
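The "no checkpoint, but fail loudly on resume" idea above can be sketched roughly as follows. This is only an illustration, not code from any SETI@home app: the marker file, `check_startup`, and `StartupAction` are hypothetical names, and the sketch assumes the task's working directory survives a restart so that a marker written on the first launch is still there on the second.

```cpp
#include <fstream>
#include <string>

// Hypothetical outcome of the startup check; not taken from the real app.
enum class StartupAction { StartFresh, AbortResume };

// Assumption: the app writes no checkpoint at all, only an empty marker.
// If the marker already exists at startup, an earlier launch of the same
// task was interrupted, so this launch is a resume attempt.
StartupAction check_startup(const std::string& marker_path) {
    {
        std::ifstream existing(marker_path);
        if (existing.good()) {
            // Caller should exit with an error status here instead of
            // computing and reporting a result that may be invalid.
            return StartupAction::AbortResume;
        }
    }
    // First launch: leave the marker so a later restart can be detected.
    std::ofstream marker(marker_path);
    marker << "started\n";
    return StartupAction::StartFresh;
}
```

On `AbortResume` the caller would stop with a non-zero exit status, so the task shows up as a computation error rather than as an inconclusive or invalid result.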
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears to be broken in the current builds as well. Any quick and easy fixes you can offer would be appreciated. I think Petri's busy with the AutoCorr problem right now, but, it would be nice to have a working checkpoint. The Older x41p_zi App has basically been surpassed by the newer zi3 versions. The Older version is the best version of the Mac Special App available right now though, and it is listed as 'available for testing'. It also says, "See the Notes in the docs folder...". Note 5) Restarted tasks could produce Incorrect Results. That's about as much as I can do. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Any quick and easy fixes you can offer would be appreciated. I'm afraid there is no quick and easy one, because any fix would mean diving into the new code and all the changes that were made. That's not something I can afford right now, being in the depths of the new TwichChirp OpenCL path. SETI apps news We're not gonna fight them. We're gonna transcend them. |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
It appears to be broken in the current builds as well. Any quick and easy fixes you can offer would be appreciated. I think Petri's busy with the AutoCorr problem right now, but, it would be nice to have a working checkpoint. The Older x41p_zi App has basically been surpassed by the newer zi3 versions. The Older version is the best version of the Mac Special App available right now though, and it is listed as 'available for testing'. It also says, "See the Notes in the docs folder...". Well, I didn't seem to see this issue in the "Regular" CUDA75 App. Both the SETI Beta Testing App and the Cruncher's Anonymous App for use here at SETI Main seem to be working fine on my Hackintosh. The system can be Suspended and Resumed at any point in time, and no issues occur for me. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I thought there were some people interested in getting the fastest possible CUDA binaries deployed on main for OS X :/ http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2334 SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yep, the last time I looked at the Mac Hosts at Beta all the Hosts idled by the block on Darwin 15.x are just sitting there doing nothing. They are either not aware they could be testing the new Apps or they don't want to be like everyone else and install a nVidia driver to run CUDA on their Mac. It won't hurt people, everyone running Windows and Linux also have to install a Driver to run SETI work. Just install the latest driver and update it when you update the OS, http://www.nvidia.com/object/mac-driver-archive.html If you're running an older card in Mountain Lion or Lion install this one, http://www.nvidia.com/object/macosx-cuda-5.5.47-driver.html

I built another OpenCL App earlier, it's not any better than the last one in Darwin 15.6;

Running on TomsMacPro.local at Thu Sep 15 13:57:29 2016
Starting benchmark run...
Listing wu-file(s) in /testWUs : reference_work_unit_r3215.wu sniff.wu
Listing executable(s) in /APPS : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin
Listing executable in /REF_APPs : MBv8_8.05r3344_sse41_x86_64-apple-darwin

Current WU: reference_work_unit_r3215.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: 2110 seconds
Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
Elapsed Time : 418 seconds
Speed compared to default : 504 %
Comparing results

Signal | R1:R2 Exact | R1:R2 Super | R1:R2 Tight | R1:R2 Good | R1:R2 Bad | R2:R1 Exact | R2:R1 Super | R2:R1 Tight | R2:R1 Good | R2:R1 Bad |
---|---|---|---|---|---|---|---|---|---|---|
Spike | 0 | 9 | 11 | 13 | 0 | 0 | 9 | 11 | 13 | 0 |
Autocorr | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
Gaussian | 0 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 1 | 5 |
Pulse | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
Triplet | 0 | 1 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | 1 |
Best Spike | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
Best Autocorr | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
Best Gaussian | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
Best Pulse | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
Best Triplet | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
Total | 0 | 14 | 16 | 20 | 7 | 0 | 14 | 16 | 20 | 10 |

Unmatched signal(s) in R1 at line(s) 499 526 580 607 634 694 720
Unmatched signal(s) in R2 at line(s) 482 509 526 569 595 649 676 703 763 789
For R1:R2 matched signals only, Q= 7.885%
Result : Weakly similar.
Done with reference_work_unit_r3215.wu.

Current WU: sniff.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: 199 seconds
Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
Elapsed Time : 25 seconds
Speed compared to default : 796 %
Comparing results

Signal | R1:R2 Exact | R1:R2 Super | R1:R2 Tight | R1:R2 Good | R1:R2 Bad | R2:R1 Exact | R2:R1 Super | R2:R1 Tight | R2:R1 Good | R2:R1 Bad |
---|---|---|---|---|---|---|---|---|---|---|
Spike | 0 | 2 | 5 | 10 | 1 | 0 | 2 | 5 | 10 | 0 |
Autocorr | 0 | 1 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | 0 |
Gaussian | 0 | 0 | 0 | 7 | 4 | 0 | 0 | 0 | 7 | 4 |
Pulse | 0 | 1 | 1 | 1 | 2 | 0 | 1 | 1 | 1 | 2 |
Triplet | 2 | 2 | 2 | 2 | 0 | 2 | 2 | 2 | 2 | 0 |
Best Spike | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
Best Autocorr | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
Best Gaussian | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
Best Pulse | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
Best Triplet | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
Total | 3 | 7 | 11 | 25 | 9 | 3 | 7 | 11 | 25 | 8 |

Unmatched signal(s) in R1 at line(s) 554 613 738 765 792 808 834 894 920
Unmatched signal(s) in R2 at line(s) 586 695 738 765 792 818 878 904
For R1:R2 matched signals only, Q= ????
Result : Weakly similar.

Bad juju going on there with MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin. The CUDA App is looking much better, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959 |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Yep, the last time I looked at the Mac Hosts at Beta all the Hosts idled by the block on Darwin 15.x are just sitting there doing nothing. They are either not aware they could be testing the new Apps or they don't want to be like everyone else and install a nVidia driver to run CUDA on their Mac. It won't hurt people, everyone running Windows and Linux also have to install a Driver to run SETI work. Just install the latest driver and update it when you update the OS, http://www.nvidia.com/object/mac-driver-archive.html If you're running an older card in Mountain Lion or Lion install this one, http://www.nvidia.com/object/macosx-cuda-5.5.47-driver.html Hi, should anything show up with zi3i as a no-go or make it stop working, then how about, for MAC, rigging the old and reliable zi with unroll? I need your cudaAcceleration.cu and cudaAcceleration.h plus cudaAcc_pulsefind.cu and confsettings.cpp in addition to main.cpp. With those files (in a zip to my email) I'll return a zi+ version to test with MAC. (I hope you're not yet too tired of testing, testing, testing, ... "Is this thing even on", testing, ..., "now it works. So ..") That would give a nice guppi speed boost and hopefully maintain usability, I guess. Should this give you any kind of a headache I will not mention it again :) To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
ok, getting ready for the weekend run, the Mac Pro here is on no new tasks for main, and project reset on beta to see what it gets there. Probably will get a bit too toasty if the Radeon kicks on as well, and confuse the Cuda testing, so will exclude it a bit later "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
...For R1:R2 matched signals only, Q= ???? I'll send you the RAW folder from r3470. It's the last sah_v7_opt.zip I have that still has the PetriR_raw folder. The problem with x41p_zi is that it fails on most resumed tasks and doesn't have the Device Selection fix. Otherwise, it works on my Mac the same way it works on this Mac, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=340 Right now the x41p_zi3 Apps fail to work correctly with my 750Ti cards with the current 7.5.30 driver. Using the Beta 8.0.29 driver appears to work in Darwin 15.4, but I'm still getting about twice as many Inconclusives with x41p_zi3i as with x41p_zi. Of course, the GPUs older than Compute Capability 3.2 will still have to use the Baseline Apps currently at Beta. Most of the Mac nVidia GPUs are in Laptops & iMacs and have Compute Capability 3.0, so they can only use the Baseline Apps. |
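The compute capability rule in the post above (3.2 and newer can run the special builds, anything older stays on baseline) could be expressed as a simple version comparison. This is a minimal sketch under that stated rule only; `ComputeCapability`, `meets_minimum`, and `pick_build` are hypothetical names, and real code would read the capability from the CUDA runtime rather than take it as an argument.

```cpp
#include <string>
#include <utility>

// Hypothetical holder for a GPU's compute capability, e.g. {3, 0} for 3.0.
struct ComputeCapability {
    int major_rev;
    int minor_rev;
};

// True when cc is at least min, comparing major first and then minor.
bool meets_minimum(ComputeCapability cc, ComputeCapability min) {
    return std::make_pair(cc.major_rev, cc.minor_rev) >=
           std::make_pair(min.major_rev, min.minor_rev);
}

// Applies the rule from the post: 3.2+ may use the special app,
// everything older falls back to the baseline app.
std::string pick_build(ComputeCapability cc) {
    const ComputeCapability special_min{3, 2};
    return meets_minimum(cc, special_min) ? "special" : "baseline";
}
```

With this, a compute capability 3.0 card such as the laptop and iMac GPUs mentioned above maps to "baseline", while a 5.0 card maps to "special". Comparing the pair rather than a single decimal number avoids mistakes such as treating 3.10 as smaller than 3.2.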
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Ah, yeah, two things that may help: First, @petri33, a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp. Second, my Linux machine is a GTX 680 (Kepler class but only compute capability 3.0). I'll be pretty determined to refactor the 1 or 2 kernels in alpha that demand compute capability 3.2+ early, since dividing support in the middle of a major compute capability is bound to cause issues in the same way BOINC does, by changing behaviour in the middle of a major version. Probably at least the second issue will remain an issue until I can take the time to inject the proper preprocessor controls in the .cu files; however, that's probably a comparatively minor issue compared to the major ones likely to crop up soon. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
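As a rough idea of what "preprocessor controls in the .cu files" might look like: nvcc defines `__CUDA_ARCH__` while compiling device code, with the value 320 for compute capability 3.2, so a kernel can keep a newer code path and an older fallback side by side. The sketch below is an assumption about the general pattern, not the actual alpha code; `USE_CC32_PATH` and `sum4` are hypothetical, and both branches deliberately compute the same sum so the guard is the only point of interest. Built as ordinary host C++, it simply takes the fallback branch.

```cpp
// __CUDA_ARCH__ is only defined during nvcc's device compilation pass,
// as major * 100 + minor * 10 (320 for compute capability 3.2).
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 320)
#define USE_CC32_PATH 1
#else
#define USE_CC32_PATH 0
#endif

// Hypothetical helper standing in for a kernel body with two variants.
inline float sum4(const float* v) {
#if USE_CC32_PATH
    // Placeholder for a variant that relies on 3.2+ only instructions.
    return (v[0] + v[1]) + (v[2] + v[3]);
#else
    // Fallback that compiles for older devices and for the host.
    float total = 0.0f;
    for (int i = 0; i < 4; ++i) {
        total += v[i];
    }
    return total;
#endif
}
```

Keeping the guard in one macro means a kernel that has been refactored to work on 3.0 can drop the check in a single place.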
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
@TBar Hi, I'm working on the 'zi plus unroll' right now. Some adjustment needed to get the unroll going. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Ah, yeah, two things that may help: First @petri33 a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp. Thanks, will do. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Ah, yeah, two things that may help: First @petri33 a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp. Also, the older x41p_zi needs the Blocking Sync. I built a version last week with the Older BS changes, but it was producing infrequent Overflows with 30 Gaussians. I built another without the BS on the Gaussian line, but haven't had a chance to test it. I suppose I could try it now even though about all I've got are GUPPIs. It would be nice to have a few Arecibo tasks to break up all these GUPPIs. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Ah, yeah, two things that may help: First @petri33 a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp. Yeah the blocking sync things behave (slightly) differently on each of the 3 platforms afaict. That's, logically, probably an artefact of OS+driver differences, so where present it will probably be imperfect until further down the road. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I built another OpenCL App earlier, it's not any better than the last one in Darwin 15.6;

Here's the same MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin running in Darwin 14.5;

Starting benchmark run...
Listing wu-file(s) in /testWUs : reference_work_unit_r3215.wu sniff.wu
Listing executable(s) in /APPS : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin setiathome_8.11_x86_64-apple-darwin__cuda75_mac setiathome_x41p_zi_x86_64-apple-darwin_cuda75
Listing executable in /REF_APPs : MBv8_8.05r3344_sse41_x86_64-apple-darwin

Current WU: reference_work_unit_r3215.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: 2198 seconds

Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
248.18 real 57.34 user 85.84 sys
Elapsed Time : 248 seconds
Speed compared to default : 886 %
Comparing results
Result : Strongly similar, Q= 99.49%

Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac -device 2
224.89 real 42.92 user 31.60 sys
Elapsed Time : 225 seconds
Speed compared to default : 976 %
Comparing results
Result : Strongly similar, Q= 99.82%

Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75 -device 2
115.73 real 32.38 user 9.21 sys
Elapsed Time : 116 seconds
Speed compared to default : 1894 %
Comparing results
Result : Strongly similar, Q= 99.81%

Done with reference_work_unit_r3215.wu.
Result : Strongly similar, Q= 99.49%

Quite a bit different than in Darwin 15.6. The x41p_zi in that test is the newer build with the blocking sync. The setiathome_8.11_x86_64-apple-darwin__cuda75_mac in the test is the App at Beta. |
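For anyone wondering how the "Speed compared to default" figures relate to the times: the benchmark script itself isn't shown in the thread, so the following is only a sketch that happens to reproduce the printed numbers. It assumes the measured `real` time is rounded to whole seconds and that the percentage is the reference time over the elapsed time with the fraction truncated; `rounded_seconds` and `speed_percent` are hypothetical names.

```cpp
#include <cmath>

// Rounds a measured wall-clock time to whole seconds,
// e.g. "224.89 real" is reported as an elapsed time of 225 seconds.
long rounded_seconds(double real_seconds) {
    return std::lround(real_seconds);
}

// Reference time as a percentage of the elapsed time, truncated
// to a whole percent by integer division.
long speed_percent(long reference_seconds, long elapsed_seconds) {
    return reference_seconds * 100 / elapsed_seconds;
}
```

With the 2198 second reference run, `speed_percent(2198, 248)` gives 886, `speed_percent(2198, 225)` gives 976, and `speed_percent(2198, 116)` gives 1894, matching the three results above. Rounding to the nearest percent instead would give 977 and 1895 for the last two, which is why truncation is assumed here.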
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Ah, yeah, two things that may help: First @petri33 a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp. I'll add the blocking sync -bs flag and the device selection thingy too. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The last x41p_zi build is working very well. Since starting it around 1500 UTC I haven't received a single Inconclusive running it in Darwin 14.5 with driver 7.5.27. No problems with the 750Ti, or anything else. I used this for the blocking sync;

cudaEventCreateWithFlags(&chirpDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&fftDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&summaxDoneEvent, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&powerspectrumDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&autocorrelationDoneEvent, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&autocorrelationRepackDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&ac_reduce_partialEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&tripletsDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&tripletsDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&pulseDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&pulseDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&gaussDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&gaussDoneEvent2, cudaEventDisableTiming|cudaEventBlockingSync);

As usual, I commented out the last cudaDeviceReset to keep from getting the SIGBUS errors;

#if(CUDART_VERSION >= 4000)

This is much better than what happens with the 750Ti running x41p_zi3i with driver 7.5.x; http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5

Task | Work unit | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
5156682094 | 2264940596 | 14 Sep 2016, 18:51:33 UTC | 15 Sep 2016, 14:59:46 UTC | Completed, marked as invalid | 40.43 | 13.49 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU) |
5156675797 | 2264937406 | 14 Sep 2016, 18:46:25 UTC | 15 Sep 2016, 14:59:46 UTC | Completed, marked as invalid | 10.30 | 3.82 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU) |
5156634063 | 2264917766 | 14 Sep 2016, 18:15:14 UTC | 15 Sep 2016, 5:19:06 UTC | Completed, marked as invalid | 234.82 | 167.19 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU) |
5156620485 | 2264911671 | 14 Sep 2016, 18:04:57 UTC | 15 Sep 2016, 5:05:30 UTC | Completed, marked as invalid | 104.69 | 76.00 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU) |
5156600475 | 2264902185 | 14 Sep 2016, 17:49:27 UTC | 15 Sep 2016, 14:41:07 UTC | Completed, marked as invalid | 289.13 | 89.06 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU) |

Hopefully the unroll feature will result in the x41p_zi being as fast as x41p_zi3i with the GUPPIs. |
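The event list above hard-codes which events get `cudaEventBlockingSync`. One way a `-bs` style switch could drive the same choice is sketched below in host-only C++ so that it builds without the CUDA toolkit. Everything here is illustrative: `event_flags` is a hypothetical helper, and the two constants are stand-ins that mirror the values the CUDA runtime headers give `cudaEventBlockingSync` (0x01) and `cudaEventDisableTiming` (0x02); real code would include `cuda_runtime.h` and pass the result to `cudaEventCreateWithFlags`.

```cpp
#include <set>
#include <string>

// Stand-ins mirroring the CUDA runtime flag values (assumption: real code
// would use cudaEventBlockingSync and cudaEventDisableTiming directly).
constexpr unsigned kBlockingSync = 0x01;
constexpr unsigned kDisableTiming = 0x02;

// Returns the creation flags for a named event. Timing is always disabled;
// blocking sync is added only when the option is on and the event is one of
// the five that carry it in the list posted above.
unsigned event_flags(const std::string& event_name, bool blocking_sync_enabled) {
    static const std::set<std::string> blocking_events = {
        "summaxDoneEvent", "autocorrelationDoneEvent", "tripletsDoneEvent1",
        "pulseDoneEvent1", "gaussDoneEvent2"};
    unsigned flags = kDisableTiming;
    if (blocking_sync_enabled && blocking_events.count(event_name) > 0) {
        flags |= kBlockingSync;
    }
    return flags;
}
```

Keeping the set of blocking events in one place would also make it easy to test a build with blocking sync removed from a single line, such as the Gaussian one, without touching the other calls.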
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
@TBar I've now got a running zi+ with -unroll and -bs. Preliminary tests show that it is working, at least on my 1080 Linux system. The newer zi3 is a bit faster, but I'll let you run the MAC 750Ti tests. Then we will know if the -unroll and -bs work as intended. Now I'll put the device selection into it. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
@TBar you've got email. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |