Linux CUDA 'Special' App finally available, featuring Low CPU use

Author	Message
Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1856887 - Posted: 20 Mar 2017, 19:08:39 UTC - in response to Message 1856864. . . I am not privy to what happens within Nvidia Corp or the manufacturing companies, but I feel confident that they have access to much lower-level routines to effect fan and clock control than a user-level interface such as xserver. But I found the documentation ironic and that amused me; if you are unable to appreciate the irony in that, I am sorry. You have been very helpful and it was never my intention to offend you. Just curious. Since the Coolbits options are not included in the Public release of nVidia Settings, it would appear they are not intended for the general public. So, who do you think they were intended for? Now that the Vendors are releasing Software that uses the built-in nVidia tweaks, the choice nVidia has is to either make them available to those that want them, or see those people use someone else's software not under Nvidia's control. It's somewhat similar to an Automobile that can do well over 100mph. You will almost never be able to reach over 100mph as in most cases you will be penalized for trying. You can play around with Coolbits all you wish, but if you go to extremes and damage the hardware, there will be penalties. Nothing unusual there. I just leave the BOINC Manager on my desktop, sometimes with NVIDIA-SMI displayed. It works for most people. . . Well I guess then it is safe to say, I am not most people, or perhaps I am? Stephen .. ID: 1856887 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1856939 - Posted: 20 Mar 2017, 23:10:49 UTC It actually wouldn't be that hard to script a fan control. SMI can output a steady stream of temp #s from its options. One could average the last 10 readings and increase/decrease fan speed 2% (or proportional to the difference) to keep them at a desired temp (the same as any software company does). Sure it would work, as long as it is bulletproof coding. ID: 1856939 ·
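A rough sketch of the control loop described in the post above: keep the last 10 temperature readings, average them, and nudge the fan speed in proportion to how far the average is from the target. The names (`FanController`, `next_fan_speed`), the gain of 2% per degree, the 70 C target and the 30-100% clamp are all assumptions made for illustration, not values from the thread. Reading temperatures from nvidia-smi and actually applying the speed to the card are deliberately left out.

```python
from collections import deque


def next_fan_speed(temps, current_speed, target_temp,
                   gain=2.0, min_speed=30, max_speed=100):
    """Return a new fan speed (%) from recent temperature readings.

    Proportional step: each degree of average error moves the fan
    by `gain` percent. All default values are illustrative assumptions.
    """
    avg = sum(temps) / len(temps)
    error = avg - target_temp              # positive when running too hot
    new_speed = current_speed + gain * error
    # Clamp so a bad reading can never stop the fan or exceed 100%.
    return int(round(max(min_speed, min(max_speed, new_speed))))


class FanController:
    """Hypothetical helper that keeps a sliding window of readings."""

    def __init__(self, target_temp=70, window=10, start_speed=50):
        self.target_temp = target_temp
        self.readings = deque(maxlen=window)   # oldest reading drops off
        self.speed = start_speed

    def update(self, temp_c):
        """Record one reading and return the fan speed to apply."""
        self.readings.append(temp_c)
        self.speed = next_fan_speed(self.readings, self.speed,
                                    self.target_temp)
        return self.speed


if __name__ == "__main__":
    controller = FanController()
    for reading in (70, 80, 82, 79, 75):
        print(reading, "->", controller.update(reading), "%")
```

The clamp is the "bulletproof" part in miniature: whatever the readings do, the requested speed stays inside a safe range. A real script would also need to handle missing or malformed readings before trusting the average.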

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1856946 - Posted: 21 Mar 2017, 0:14:04 UTC - in response to Message 1856939. It actually wouldn't be that hard to script a fan control. SMI can output a steady stream of temp #s from its options. One could average the last 10 readings and increase/decrease fan speed 2% (or proportional to the difference) to keep them at a desired temp (the same as any software company does). Sure it would work, as long as it is bulletproof coding. . . Then I guess that leaves an amateur like me out of the running Stephen :) ID: 1856946 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1858230 - Posted: 29 Mar 2017, 12:32:18 UTC . . Hi guys, . . For what it is worth I seem to be running at about 8% to 9% inconclusives, but still zero invalids. I have corrected the PCIe config on the Pentium and the runtimes are now even across the two GPUs and regular. Stephen :) ID: 1858230 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1858609 - Posted: 31 Mar 2017, 15:58:11 UTC Petri, Are you seeing a 40% increase in performance with the 1080 Ti going from 20 to 28 CU's? I'm curious, been window shopping :) ID: 1858609 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1858900 - Posted: 1 Apr 2017, 9:27:08 UTC - in response to Message 1858609. Hi, nearly. Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLAR. See my results. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1858900 ·
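For anyone wanting to put a number on "nearly", the two run times can be turned into a speedup and compared with the 20 to 28 CU ratio from the question. This is only arithmetic on the figures quoted in the thread. Since the 1080 time was given as "140+", treating it as exactly 140 seconds is an assumption, so the result is a lower bound. The variable names are made up for the example.

```python
# Figures quoted in the thread; 140 is a lower bound ("140+ seconds").
runtime_1080 = 140.0      # seconds per VLAR task on the 1080
runtime_1080ti = 107.0    # seconds per VLAR task on the 1080 Ti
units_1080 = 20           # CU count from the question
units_1080ti = 28

measured_speedup = runtime_1080 / runtime_1080ti   # about 1.31
ideal_speedup = units_1080ti / units_1080          # 1.40 if scaling were perfect
scaling_efficiency = measured_speedup / ideal_speedup

print(f"measured speedup:   {measured_speedup:.2f}x")
print(f"ideal speedup:      {ideal_speedup:.2f}x")
print(f"scaling efficiency: {scaling_efficiency:.0%}")
```

So roughly a 31% gain against a theoretical 40%, which fits the "nearly" answer.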

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1858906 - Posted: 1 Apr 2017, 10:20:32 UTC - in response to Message 1858900. Hi, nearly. Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLAR. See my results. Once we get to 12 seconds then we're obsolete, since that is the task observation time, and it might be cheaper to crowdfund 1080 Tis to Berkeley for realtime analysis, for Arecibo/multibeam anyway. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1858906 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1858992 - Posted: 1 Apr 2017, 19:26:40 UTC - in response to Message 1858906. Last modified: 1 Apr 2017, 20:12:31 UTC Hi, nearly. Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLAR. See my results. Once we get to 12 seconds then we're obsolete, since that is the task observation time, and it might be cheaper to crowdfund 1080 Tis to Berkeley for realtime analysis, for Arecibo/multibeam anyway. The ~~12 second~~ 1.4 second version is running on my test host. It uses 4 Tis simultaneously for one task. Three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes). For shorties it balances the load differently: 2 for pulsefind, 2 for all the rest. CPU is used for chirping and it uses AVX2. EDIT: got the run time wrong. EDIT 2: makes me feel like a fool. I meant 4.1. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1858992 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1859010 - Posted: 1 Apr 2017, 20:36:28 UTC - in response to Message 1858609. Petri, Are you seeing a 40% increase in performance with the 1080 Ti going from 20 to 28 CU's? I'm curious, been window shopping :) Hi again, The performance scales quite well. The wattage does not do nearly as well. See image .. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1859010 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1859013 - Posted: 1 Apr 2017, 20:49:19 UTC - in response to Message 1858992. The ~~12 second~~ 1.4 second version is running on my test host. It uses 4 Tis simultaneously for one task. Three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes). For shorties it balances the load differently: 2 for pulsefind, 2 for all the rest. CPU is used for chirping and it uses AVX2. EDIT: got the run time wrong. EDIT 2: makes me feel like a fool. I meant 4.1. Petri That brings to mind a saying ... "The difference between men and boys is the price of their toys!" ID: 1859013 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1859015 - Posted: 1 Apr 2017, 21:05:11 UTC - in response to Message 1859013. Last modified: 1 Apr 2017, 21:10:35 UTC The ~~12 second~~ 1.4 second version is running on my test host. It uses 4 Tis simultaneously for one task. Three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes). For shorties it balances the load differently: 2 for pulsefind, 2 for all the rest. CPU is used for chirping and it uses AVX2. EDIT: got the run time wrong. EDIT 2: makes me feel like a fool. I meant 4.1. Petri That brings to mind a saying ... "The difference between men and boys is the price of their toys!" Yeah! It IS fun to play even in this age and time -- regardless of the recent outages. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1859015 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1859037 - Posted: 1 Apr 2017, 22:14:16 UTC - in response to Message 1859013. Last modified: 1 Apr 2017, 22:16:59 UTC The ~~12 second~~ 1.4 second version is running on my test host. It uses 4 Tis simultaneously for one task. Three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes). For shorties it balances the load differently: 2 for pulsefind, 2 for all the rest. CPU is used for chirping and it uses AVX2. EDIT: got the run time wrong. EDIT 2: makes me feel like a fool. I meant 4.1. Petri That brings to mind a saying ... "The difference between men and boys is the price of their toys!" . . So when do we get our Lamborghinis ?? :) Stephen :) ID: 1859037 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1859140 - Posted: 2 Apr 2017, 6:08:41 UTC Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1859140 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1859144 - Posted: 2 Apr 2017, 6:31:37 UTC - in response to Message 1859140. I ran across another one that is pretty simple. Zero signals and a Bad Best Pulse, https://setiathome.berkeley.edu/workunit.php?wuid=2488317742 Ran the task on my CPU and got; Best pulse: peak=4.564702, time=67.24, period=0.5079, d_freq=1420128564.62, score=0.8974, chirp=71.618, fft_len=64 http://boinc2.ssl.berkeley.edu/sah/download_fanout/3ba/16fe08aa.12502.25021.6.33.13 I've been running different builds since the outage, we'll see how they go. ID: 1859144 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34748 Credit: 261,360,520 RAC: 489	Message 1859158 - Posted: 2 Apr 2017, 8:18:59 UTC - in response to Message 1859140. Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri That other rig is spitting out garbage on its 560 Ti which you came up against there on that w/u. ;-) Cheers. ID: 1859158 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1859160 - Posted: 2 Apr 2017, 8:30:50 UTC - in response to Message 1859140. Last modified: 2 Apr 2017, 8:36:36 UTC Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri Used to run a 560 Ti, and early factory OC models were shipped with insufficient core voltage by default. They also tend to get pretty toasty. My feeling is we'll have to eventually embed some monitoring (e.g. NVML sensors during run) and possibly some lightweight spot checks. Another potentially handy thing, where you use padding, might be to use 0xDEADDEAD instead of zeroes, then throw in some extra threads with a conditional, such that the extras look for the hex value, and either set a flag or throw an exception on corruption detection. Not exactly rigorous, but low cost and better than nothing. [I plan something along those lines for the generic version, more oriented to the automated tuning, however that's further off.] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1859160 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1859163 - Posted: 2 Apr 2017, 11:20:10 UTC - in response to Message 1859160. Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri Used to run a 560 Ti, and early factory OC models were shipped with insufficient core voltage by default. They also tend to get pretty toasty. My feeling is we'll have to eventually embed some monitoring (e.g. NVML sensors during run) and possibly some lightweight spot checks. Another potentially handy thing, where you use padding, might be to use 0xDEADDEAD instead of zeroes, then throw in some extra threads with a conditional, such that the extras look for the hex value, and either set a flag or throw an exception on corruption detection. Not exactly rigorous, but low cost and better than nothing. [I plan something along those lines for the generic version, more oriented to the automated tuning, however that's further off.] Yup, Writing and checking for 0xDEAD (or whatever binary code) in between buffers would reveal buffer under/overflows. I'll see if I have time to implement that some evening next week. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1859163 ·
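A minimal sketch of the guard-word idea from the two posts above, written as host-side Python rather than as extra GPU threads, so it only illustrates the layout and the check. It is not how either app does it, and all of the names (`CANARY`, `allocate_with_guards`, `find_corrupted_guards`) and the guard width of 4 words are assumptions for the example. Buffers are laid out in one flat block with runs of 0xDEADDEAD between them. A later pass looks for any guard word that no longer holds the pattern, which would indicate an under- or overflow from a neighbouring buffer.

```python
CANARY = 0xDEADDEAD   # pattern suggested in the thread, used instead of zero padding


def allocate_with_guards(sizes, guard_words=4):
    """Lay out buffers in one flat list with guard regions around each.

    Returns (memory, offsets), where offsets[i] is the start index of
    buffer i. The flat list stands in for a device allocation.
    """
    memory = [CANARY] * guard_words          # leading guard catches underflow
    offsets = []
    for size in sizes:
        offsets.append(len(memory))
        memory.extend([0] * size)            # the usable buffer
        memory.extend([CANARY] * guard_words)  # trailing guard catches overflow
    return memory, offsets


def find_corrupted_guards(memory, offsets, sizes, guard_words=4):
    """Return the indices of guard words that no longer hold CANARY."""
    guard_starts = [0] + [off + size for off, size in zip(offsets, sizes)]
    corrupted = []
    for start in guard_starts:
        for index in range(start, start + guard_words):
            if memory[index] != CANARY:
                corrupted.append(index)
    return corrupted


if __name__ == "__main__":
    sizes = [8, 8]
    memory, offsets = allocate_with_guards(sizes)
    memory[offsets[0] + sizes[0]] = 123      # simulate writing one past the end
    print("corrupted guard words at:", find_corrupted_guards(memory, offsets, sizes))
```

As noted in the thread, this is not rigorous: a stray write that lands inside another buffer, or that happens to write the same pattern, goes unnoticed. It is cheap, though, and it reports where the damage is.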

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1859198 - Posted: 2 Apr 2017, 14:36:59 UTC - in response to Message 1859163. A little more info on the False overflows. It seems all of these are occurring with the Low Angle range Arecibo tasks, mainly around 0.248126 & 0.148085. How many overflows depends upon how many of those angle range tasks you get. Seeing as how the previous versions had False overflows with the VLARs, it would appear this is a leftover from that problem. It still doesn't like the Low Angle ranges. I haven't found any at the higher angle ranges. https://setiathome.berkeley.edu/results.php?hostid=8215300&state=5 https://setiathome.berkeley.edu/results.php?hostid=7769537&state=5 https://setiathome.berkeley.edu/results.php?hostid=8136063&state=5 etc, etc... ID: 1859198 ·

Kissagogo27 Send message Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62	Message 1859455 - Posted: 4 Apr 2017, 9:38:26 UTC under W7, bad WU with ATI ? https://setiathome.berkeley.edu/workunit.php?wuid=2486259533 ID: 1859455 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1859491 - Posted: 4 Apr 2017, 14:40:22 UTC - in response to Message 1859455. Last modified: 4 Apr 2017, 14:43:36 UTC under W7, bad WU with ATI ? https://setiathome.berkeley.edu/workunit.php?wuid=2486259533 The task says; WARNING: This application needs newer GPU, at least ATI Radeon HD 5000 needed, exiting ! That is because SETI doesn't have an App that will work on the AMD Radeon HD 4850 in your machine. They tried one, but couldn't find a way to assign it to just the HD 4000 GPUs, so, they just removed it rather than send it to All the machines. You need to Uncheck Use ATI GPU in your Preferences, https://setiathome.berkeley.edu/prefs.php?subset=project Another way would be to Install the SSE41 CPU App on your machine. The Package only has the CPU App in the app_info.xml, so, it will only ask for CPU tasks. As a Bonus, the SSE41 App will be much faster on your machine than the Stock CPU App, SSE41_CPUr3344.zip ID: 1859491 ·
