I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Eric deployed new OpenCL builds for OS X on beta: Mac OS X/64-bit Intel 8.10 (opencl_ati_mac), 7 Apr 2016, 1:01:54 UTC, 17 GigaFLOPS; Mac OS X/64-bit Intel 8.10 (opencl_nvidia_SoG_mac), 7 Apr 2016, 1:01:54 UTC, 2 GigaFLOPS; Mac OS X/64-bit Intel 8.11 (cuda42_mac), 6 Aug 2016, 4:12:08 UTC, 33 GigaFLOPS; Mac OS X/64-bit Intel 8.11 (cuda75_mac), 6 Aug 2016, 4:12:08 UTC, 44 GigaFLOPS; Mac OS X/64-bit Intel 8.19 (opencl_ati5_mac), 8 Nov 2016, 23:03:25 UTC, 0 GigaFLOPS; Mac OS X/64-bit Intel 8.19 (opencl_intel_gpu_sah), 8 Nov 2016, 23:03:25 UTC, 0 GigaFLOPS; Mac OS X/64-bit Intel 8.19 (opencl_nvidia_mac), 8 Nov 2016, 23:03:25 UTC, 0 GigaFLOPS; Mac OS X/64-bit Intel 8.20 (opencl_ati5_SoG_mac), 8 Nov 2016, 23:03:25 UTC, 6 GigaFLOPS. But I'm not sure whether the older ones (8.10) should remain or be deprecated. Any comments on that? BTW, the anticipated release date for CUDA75/42 is this week. Fingers crossed... SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The new Apps are experiencing Download errors. I was able to supply the needed DL myself to get the SoG App to work on my machine. It seems all the Apps are having problems with at least one needed file, and I was never sent the nVidia App at all. If the 8.19 (opencl_ati5_mac) & 8.19 (opencl_nvidia_mac) Apps work with Darwin 11.4.2 there shouldn't be any need for the older Apps. There is a major problem with the CUDA App, as people are not using the correct CUDA Drivers. It's pretty simple: just go to the Dock, open System Preferences, then use the CUDA Preference Pane to update to the latest CUDA Driver. It will not work using the outdated Driver with a newer version of OS X. The people using the correct Drivers aren't having any trouble. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"The new Apps are experiencing Download errors. I was able to supply the needed DL myself ..." Please identify exactly which file is needed but not being downloaded. Look at the <app_version> file declarations to identify what the problem is, and tell Eric your findings so that he can correct the problem at source and enable others to test the app as intended. Simply posting here without analysis isn't going to get anything solved. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It seems Beta is down. The missing file(s) show up in the Tasks labeled as Errors. For setiathome_8.20_x86_64-apple-darwin__opencl_ati5_SoG_mac, the missing file is setiathome-8.20-opencl_ati5_SoG_mac_darwin_README_OPENCL. The setiathome_8.19_x86_64-apple-darwin__opencl_ati5_mac App is missing MultiBeam_Kernels_r3552.cl. The MBv8_8.19r3553_Intel_ssse3_x86_64-apple-darwin App is missing MultiBeam_Kernels_r3553.cl. I couldn't convince the Server to send MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"If the 8.19 (opencl_ati5_mac) & 8.19 (opencl_nvidia_mac) Apps work with Darwin 11.4.2 there shouldn't be any need for the older Apps." I see. So before any deprecation we should ensure the new builds work on Darwin 11.4.2. I'll mail Eric about the download issues. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
That's why the NV OpenCL build is still needed for OS X hosts, even though it is slower. Otherwise we could just lose that computing power altogether. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So far it seems 5 out of 6 Apps at Beta are working well. The only exception is the notorious Intel iGPU App on an iGPU. The same App works quite well on an nVidia GPU. Even the people that can't seem to update the CUDA driver are finding success with the OpenCL App: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=69829. The nVidia GPUs are working nicely with the iGPU OpenCL App; check these times: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=80723&offset=60 The ATI/AMD Apps are working well and the nVidia App is working well, so it looks promising. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Prompted by Petri's queries about efficiency/GFlops, I requested some numbers from Eric, which he graciously supplied. That's to try to make sense of some missing fpops. It would help to backtrack/correlate with some observations. Would any or all of the following statements about stock CPU apps ring (fully or partially) true? (1) The 8.00 and 8.05 (Windows x86, and Linux in both bitnesses) appear 'reasonably good'; (2) the 8.03 (darwin/OSX intel) app is half to two-thirds the performance it 'should be'; (3) out of those apps, the x86_64-linux build seems considerably more efficient than the others (for whatever reasons); (4) the PPC (8.03) & Arm(8.02) builds aren't particularly slow for the devices they are running on (compared to what these devices are capable of anyway). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
JLDun Joined: 21 Apr 06 Posts: 573 Credit: 196,101 RAC: 0 |
"{snip}& Arm(8.02) builds aren't particularly slow for the devices they are running on (compared to what these devices are capable of anyway)" I've "gathered", based on personal experience, that the ARM app will run for xx hours, and CPU time will be within an hour of that (xx minus at most 1 hour) when the device is in use, provided there aren't a lot of restarts. Don't know about slow, but not especially inefficient. As for xx, the WUs have lately been in the 25-35 hour range (at most), whereas an Android/x86_64 device will run them in under 13 hours; the CPU seems to be a big influence in this case. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"{snip}& Arm(8.02) builds aren't particularly slow for the devices they are running on (compared to what these devices are capable of anyway)" Good to know, thanks. Never had much luck with the Android variants myself. If you have fairly steady APRs for these apps, would you say the ARM app GFlops is roughly 50% of Boinc Whetstone, while the x86_64 variant is <20% of its Boinc Whetstone, despite being noticeably more efficient? [Or the other way around, perhaps?] [Edit:] Looking at some of yours seems to suggest the other way around indeed, with APRs of 2*Boinc_Whetstone [for the Arm], if I'm looking right. Will have to look for where APR is calculated (i.e. that's not supposed to happen). |
JLDun Joined: 21 Apr 06 Posts: 573 Credit: 196,101 RAC: 0 |
Not technically minded, so 'lost in the lingo', but... to point out some specific details: Current Android Handset: Host 8100533. What I'm currently using as a phone. Android/x86_64: Host 8038053. Regardless of usage, (Run Time)-(CPU Usage) is (almost always) under 60 minutes. [Edit] Mixed use; used to be my phone handset: Host 7915058. (Sorry. Had problems editing. I've been sending feedback to Google/Chrome about it.) The other two Android entries USED to be phone handsets, but I 'ignore' them now. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Not technically minded, so 'lost in the lingo', but... to point out some specific details:" Yeah, very strange numbers. Will see what TBar says on the Mac side of things. Your phone: Measured floating point speed: 803.53 million ops/sec (~0.8 GFlops). Application details: Average Processing Rates (~1.5-1.6 GFlops). I'm wondering if they took the neon/vfp Boinc whetstone code out of the arm/android clients (quite possible). [Edit:] Looks like no vectorised form of the whetstone was completed. Though the android client code comments imply vfp/neon, there isn't any vfp/neon code in it. It just means the GFlops numbers will be 'upside down', like the Intel ones. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'm not sure what you're looking for. From my experience the current stock CPU App works differently on different CPUs. On the Older CPUs, such as mine, it can be almost half as fast as the optimized App compiled from the AKv8 folder. Seems to be a lot of variance with different CPUs. Mine says 3633.84 million ops/sec and the App section says 25.68 GFLOPS using the SSE41 App. A machine running stock with CPU W3580 @ 3.33GHz is showing 3879.41 million ops/sec and 12.58 GFLOPS. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I'm not sure what you're looking for. From my experience the current stock CPU App works differently on different CPUs. On the Older CPUs, such as mine, it can be almost half as fast as the optimized App compiled from the AKv8 folder. Seems to be a lot of variance with different CPUs. Mine says 3633.84 million ops/sec and the App section says 25.68 GFLOPS using the SSE41 App. A machine running stock with CPU W3580 @ 3.33GHz is showing 3879.41 million ops/sec and 12.58 GFLOPS." Exactly that ratio, thanks. ~3.9 'Device Peak GFlops' (supposed), running with an APR of 12.58 GFlops actual. (A discrepancy of >3x, which more or less fills a missing piece of a 3 year old puzzle. 2x in the ARM case.) Many of the questions I had are now fairly moot, as probing since I posted turned up some things. I've been able to verify from the source (the PHP of the application details page) that the higher APR number is the 'truer' value, and that the 'Device Peak Flops' is derived from Boinc Whetstone (CPU) and fudge factors. It has ramifications for scheduling new apps or hosts coming online that have been problematic in specific situations, and I've passed on a suggestion to Eric and Richard, should either consider it worthwhile looking at any deeper. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"It has ramifications for scheduling new apps or hosts coming online that have been problematic in specific situations, and I've passed on a suggestion to Eric and Richard, should either consider it worthwhile looking at any deeper." Ack receipt of that email, but it's going to take a while (and much coffee) to get my head back to where we were two years ago. We're only going to get one chance at this, so let's make sure we get it right first time (and get it right for all the other projects, too). |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"It has ramifications for scheduling new apps or hosts coming online that have been problematic in specific situations, and I've passed on a suggestion to Eric and Richard, should either consider it worthwhile looking at any deeper." Yeah, slow, steady and carefully, acknowledged. It isn't about credit here. This is about correctness of estimates first (which controls the whole scheduling chain). It's a little odd that the right numbers seem to be there for visual display but are not propagated to function. There has been a schism, most likely at a similar point where past attempts stopped. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"It has ramifications for scheduling new apps or hosts coming online that have been problematic in specific situations, and I've passed on a suggestion to Eric and Richard, should either consider it worthwhile looking at any deeper." In broad terms, I've not seen any problem with runtime estimates, once the two separate onramp stages have been successfully negotiated (the initial conditions stages are complete cobblers, of course). And that's paying fairly close attention to runtime estimates, both under Anonymous Platform here, and under stock running at other projects - except at projects where, despite protestations, the administrators have acknowledged that "our automated work submission tools" are incapable of adjusting rsc_fpops_est to the {known a priori, deterministic} task performance. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, you won't see the problem since normalisation fixes that. The discrepancy is purely the two different GFlops Numbers [In Plain sight]. One connected to what you see, and the other connected to the actual backend drive scheduling. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Yeah, you won't see the problem since normalisation fixes that. The discrepancy is purely the two different GFlops Numbers [In Plain sight]. One connected to what you see, and the other connected to the actual backend drive scheduling." OK, I'll finish lunch and head downstairs to code-walk the line numbers in your email. That may take some time... |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Yeah, you won't see the problem since normalisation fixes that. The discrepancy is purely the two different GFlops Numbers [In Plain sight]. One connected to what you see, and the other connected to the actual backend drive scheduling." If you can explain two different GFlops estimates for the same device as anything better than 'WTF', then I will owe you even more respect than I already grant you. If you can explain to me why we should deliberately underestimate by a factor of four or more, then that's bonus points. |
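
For anyone wanting to check the "two different GFlops numbers" discussed above, here is a minimal Python sketch that divides each host's APR by its Boinc Whetstone figure, using only the numbers quoted in the thread. The dictionary labels and the function name are made up for illustration, and the ARM APR of 1.55 GFlops is an assumed midpoint of the "~1.5-1.6" quoted above. It is not how the server computes anything, just the arithmetic behind the ">3x" and "2x" remarks.

```python
# Figures quoted in the thread:
# (Boinc Whetstone in million ops/sec, APR in GFLOPS).
hosts = {
    "W3580 stock": (3879.41, 12.58),
    "TBar SSE41 optimized": (3633.84, 25.68),
    # APR was quoted as ~1.5-1.6 GFlops; 1.55 is an assumed midpoint.
    "Android phone (ARM)": (803.53, 1.55),
}


def apr_to_whetstone_ratio(whetstone_mops, apr_gflops):
    """Return APR divided by the Whetstone figure, both expressed in GFLOPS."""
    whetstone_gflops = whetstone_mops / 1000.0
    return apr_gflops / whetstone_gflops


ratios = {
    name: apr_to_whetstone_ratio(mops, apr)
    for name, (mops, apr) in hosts.items()
}

for name, ratio in ratios.items():
    print(f"{name}: APR is {ratio:.2f}x the Whetstone figure")
```

With these inputs the stock W3580 host comes out a little above 3x and the ARM phone a little under 2x, which matches the discrepancies described in the posts; the optimized SSE41 app on TBar's host shows a larger gap still.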