Linux CUDA 'Special' App finally available, featuring Low CPU use

Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use

Previous · 1 . . . 18 · 19 · 20 · 21 · 22 · 23 · 24 . . . 83 · Next

Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1856887 - Posted: 20 Mar 2017, 19:08:39 UTC - in response to Message 1856864.  

. . I am not privy to what happens within NVIDIA Corp or the manufacturing companies, but I feel confident that they have access to much lower-level routines to effect fan and clock control than a user-level interface such as the X server. But I found the documentation ironic and that amused me; if you are unable to appreciate the irony in that, I am sorry. You have been very helpful and it was never my intention to offend you.
Just curious. Since the Coolbits options are not included in the public release of NVIDIA Settings, it would appear they are not intended for the general public. So, who do you think they were intended for? Now that the vendors are releasing software that uses the built-in NVIDIA tweaks, NVIDIA's choice is either to make them available to those who want them, or to see those people use someone else's software not under NVIDIA's control. It's somewhat similar to an automobile that can do well over 100 mph: you will almost never be able to go over 100 mph, as in most cases you will be penalized for trying. You can play around with Coolbits all you wish, but if you go to extremes and damage the hardware, there will be penalties. Nothing unusual there.

I just leave the BOINC Manager on my desktop, sometimes with NVIDIA-SMI displayed. It works for most people.


. . Well I guess then it is safe to say, I am not most people, or perhaps I am?

Stephen

..
ID: 1856887
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1856939 - Posted: 20 Mar 2017, 23:10:49 UTC

It actually wouldn't be that hard to script fan control. SMI can output a steady stream of temperature readings using its options. One could average the last 10 readings and increase or decrease the fan speed by 2% (or proportionally to the difference) to hold the GPU at a desired temperature (the same as any software company does).

Sure, it would work, as long as the code is bulletproof.
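A minimal Python sketch of the averaging controller described above. It assumes temperatures arrive one reading at a time (for example one line per interval from `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -l 5`). The class name `FanController`, the 65 °C target, the clamp limits and the fail-safe behaviour are illustrative assumptions, and actually applying the returned fan percentage (e.g. through nvidia-settings with Coolbits enabled) is deliberately left out.

```python
from collections import deque


class FanController:
    """Averaging fan controller (hypothetical sketch; every default is an assumption)."""

    def __init__(self, target_c=65.0, window=10, step=2.0, gain=None,
                 min_pct=30.0, max_pct=100.0, start_pct=50.0):
        self.target_c = target_c
        self.readings = deque(maxlen=window)  # only the last `window` temps are kept
        self.step = step          # fixed nudge (in % fan) per reading
        self.gain = gain          # if set, use proportional control instead
        self.min_pct = min_pct
        self.max_pct = max_pct
        self.fan_pct = start_pct

    def update(self, temp_c):
        """Feed one temperature reading (None = read failed); return the new fan %."""
        if temp_c is None:
            # Fail safe: if the sensor can't be read, run the fan flat out.
            self.fan_pct = self.max_pct
            return self.fan_pct
        self.readings.append(temp_c)
        error = sum(self.readings) / len(self.readings) - self.target_c
        if self.gain is not None:
            delta = self.gain * error      # proportional to the difference
        elif error > 0:
            delta = self.step              # too hot: speed up by a fixed step
        elif error < 0:
            delta = -self.step             # too cool: slow down by a fixed step
        else:
            delta = 0.0
        # Clamp so the fan never stops and never exceeds 100%.
        self.fan_pct = max(self.min_pct, min(self.max_pct, self.fan_pct + delta))
        return self.fan_pct
```

Most of the "bulletproof" part lives in the clamp and the fail-safe path: a bad reading or a runaway average can never push the fan below the floor or leave it stuck low.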
ID: 1856939
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1856946 - Posted: 21 Mar 2017, 0:14:04 UTC - in response to Message 1856939.  

It actually wouldn't be that hard to script fan control. SMI can output a steady stream of temperature readings using its options. One could average the last 10 readings and increase or decrease the fan speed by 2% (or proportionally to the difference) to hold the GPU at a desired temperature (the same as any software company does).

Sure, it would work, as long as the code is bulletproof.


. . Then I guess that leaves an amateur like me out of the running.

Stephen

:)
ID: 1856946
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1858230 - Posted: 29 Mar 2017, 12:32:18 UTC

. . Hi guys,

. . For what it is worth, I seem to be running at about 8% to 9% inconclusives, but still zero invalids. I have corrected the PCIe configuration on the Pentium, and the run times are now even and consistent across the two GPUs.

Stephen

:)
ID: 1858230
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1858609 - Posted: 31 Mar 2017, 15:58:11 UTC

Petri, are you seeing a 40% increase in performance with the 1080 Ti, going from 20 to 28 CUs?

I'm curious, been window shopping :)
ID: 1858609
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1858900 - Posted: 1 Apr 2017, 9:27:08 UTC - in response to Message 1858609.  

Hi, nearly.
Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLARs. See my results.
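A quick worked check of these numbers against the 40% asked about, as a Python sketch. It treats the 140 s figure as exact (Petri says 140+, so the true speedup is at least this), takes the 20 vs 28 units from the question, and the helper name `scaling` is hypothetical.

```python
def scaling(t_old_s, t_new_s, units_old, units_new):
    """Compare a measured speedup with the ideal speedup from adding compute units."""
    speedup = t_old_s / t_new_s             # 140 / 107, about 1.31x
    ideal = units_new / units_old           # 28 / 20 = 1.4x, the hoped-for +40%
    realized = (speedup - 1) / (ideal - 1)  # share of the ideal gain actually seen
    return speedup, ideal, realized


speedup, ideal, realized = scaling(140, 107, 20, 28)
print(f"speedup {speedup:.2f}x vs ideal {ideal:.2f}x, {realized:.0%} of the ideal gain")
```

With these inputs the gain is roughly 31%, a little over three quarters of the ideal 40%, which fits "nearly". Unit count is not the whole story: the two cards also run at different clocks.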
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1858900
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1858906 - Posted: 1 Apr 2017, 10:20:32 UTC - in response to Message 1858900.  

Hi, nearly.
Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLARs. See my results.


Once we get to 12 seconds, then we're obsolete, since that is the task observation time, and it might be cheaper to crowdfund 1080 Tis to Berkeley for real-time analysis, for Arecibo/multibeam anyway.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1858906
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1858992 - Posted: 1 Apr 2017, 19:26:40 UTC - in response to Message 1858906.  
Last modified: 1 Apr 2017, 20:12:31 UTC

Hi, nearly.
Run time with the 1080 is 140+ seconds and with the Ti it is 107 seconds. That is for VLARs. See my results.


Once we get to 12 seconds, then we're obsolete, since that is the task observation time, and it might be cheaper to crowdfund 1080 Tis to Berkeley for real-time analysis, for Arecibo/multibeam anyway.


The 12 second 1.4. seconds version is running on my test host. It uses four Tis simultaneously for one task: three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes).
For shorties it balances the load differently: two for pulse finding, two for all the rest. The CPU is used for chirping, and it uses AVX2.

EDIT: got the run time wrong.
EDIT 2: makes me feel like a fool. I meant 4.1.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1858992
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1859010 - Posted: 1 Apr 2017, 20:36:28 UTC - in response to Message 1858609.  

Petri, are you seeing a 40% increase in performance with the 1080 Ti, going from 20 to 28 CUs?

I'm curious, been window shopping :)


Hi again,

The performance scales quite well.
The wattage does not scale nearly as well.

See image ..

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1859010
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1859013 - Posted: 1 Apr 2017, 20:49:19 UTC - in response to Message 1858992.  

The 12 second 1.4. seconds version is running on my test host. It uses four Tis simultaneously for one task: three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes).
For shorties it balances the load differently: two for pulse finding, two for all the rest. The CPU is used for chirping, and it uses AVX2.

EDIT: got the run time wrong.
EDIT 2: makes me feel like a fool. I meant 4.1.

Petri
That brings to mind a saying ...

"The difference between men and boys, is the price of their toys!"
ID: 1859013
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1859015 - Posted: 1 Apr 2017, 21:05:11 UTC - in response to Message 1859013.  
Last modified: 1 Apr 2017, 21:10:35 UTC

The 12 second 1.4. seconds version is running on my test host. It uses four Tis simultaneously for one task: three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes).
For shorties it balances the load differently: two for pulse finding, two for all the rest. The CPU is used for chirping, and it uses AVX2.

EDIT: got the run time wrong.
EDIT 2: makes me feel like a fool. I meant 4.1.

Petri
That brings to mind a saying ...

"The difference between men and boys, is the price of their toys!"


Yeah!
It IS fun to play even in this age and time, regardless of the recent outages.

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1859015
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1859037 - Posted: 1 Apr 2017, 22:14:16 UTC - in response to Message 1859013.  
Last modified: 1 Apr 2017, 22:16:59 UTC

The 12 second 1.4. seconds version is running on my test host. It uses four Tis simultaneously for one task: three for long pulse finds and one for all the rest (Gauss, Triplet, Autocorrelations and Spikes).
For shorties it balances the load differently: two for pulse finding, two for all the rest. The CPU is used for chirping, and it uses AVX2.

EDIT: got the run time wrong.
EDIT 2: makes me feel like a fool. I meant 4.1.

Petri
That brings to mind a saying ...

"The difference between men and boys, is the price of their toys!"


. . So when do we get our Lamborghinis ?? :)

Stephen

:)
ID: 1859037
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1859140 - Posted: 2 Apr 2017, 6:08:41 UTC

Hi,
Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762
The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1859140
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1859144 - Posted: 2 Apr 2017, 6:31:37 UTC - in response to Message 1859140.  

I ran across another one that is pretty simple. Zero signals and a Bad Best Pulse, https://setiathome.berkeley.edu/workunit.php?wuid=2488317742
Ran the task on my CPU and got: Best pulse: peak=4.564702, time=67.24, period=0.5079, d_freq=1420128564.62, score=0.8974, chirp=71.618, fft_len=64
http://boinc2.ssl.berkeley.edu/sah/download_fanout/3ba/16fe08aa.12502.25021.6.33.13

I've been running different builds since the outage, we'll see how they go.
ID: 1859144
Wiggo
Joined: 24 Jan 00
Posts: 34748
Credit: 261,360,520
RAC: 489
Australia
Message 1859158 - Posted: 2 Apr 2017, 8:18:59 UTC - in response to Message 1859140.  

Hi,
Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762
The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too.

Petri

That other rig is spitting out garbage on its 560 Ti, which is what you came up against on that w/u. ;-)

Cheers.
ID: 1859158
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1859160 - Posted: 2 Apr 2017, 8:30:50 UTC - in response to Message 1859140.  
Last modified: 2 Apr 2017, 8:36:36 UTC

Hi,
Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762
The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too.

Petri


I used to run a 560 Ti, and early factory-OC models were shipped with insufficient core voltage by default. They also tend to get pretty toasty. My feeling is we'll eventually have to embed some monitoring (e.g. NVML sensors during the run) and possibly some lightweight spot checks.

Another potentially handy thing, where you use padding, might be to fill it with 0xDEADDEAD instead of zeroes, then throw in some extra threads with a conditional, such that the extras look for the hex value and either set a flag or throw an exception when they detect corruption. Not exactly rigorous, but low cost and better than nothing.
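A host-side Python sketch of the padding idea, assuming 32-bit words and a guard band of `PAD` words on each side of a buffer. `PAD`, `alloc_padded`, `write_payload` and `check_guards` are hypothetical names for illustration. On the GPU the guard scan would be done by the extra threads described above; here a list comprehension plays that role and returns the corrupted indices (an empty list means the flag stays clear) instead of raising.

```python
from array import array

CANARY = 0xDEADDEAD  # sentinel value suggested in the post
PAD = 8              # hypothetical guard width, in 32-bit words


def alloc_padded(n):
    """Payload of n zeroed words with PAD canary words on each side."""
    buf = array("L", [CANARY]) * (n + 2 * PAD)
    for i in range(PAD, PAD + n):
        buf[i] = 0
    return buf


def write_payload(buf, count):
    """Stand-in for a kernel: writes `count` words starting at the payload."""
    for i in range(count):
        buf[PAD + i] = i + 1


def check_guards(buf, n):
    """Return indices of guard words that no longer hold CANARY ([] = intact)."""
    guards = list(range(PAD)) + list(range(PAD + n, n + 2 * PAD))
    return [i for i in guards if buf[i] != CANARY]


buf = alloc_padded(100)
write_payload(buf, 101)             # off-by-one: one word past the payload
corrupted = check_guards(buf, 100)  # -> flags the first guard word after the payload
```

In the same "not exactly rigorous" spirit, a stray write that happens to store 0xDEADDEAD, or one that jumps past the whole guard band, goes unnoticed.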

[I plan something along those lines for the generic version, more oriented toward automated tuning; however, that's further off.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1859160
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1859163 - Posted: 2 Apr 2017, 11:20:10 UTC - in response to Message 1859160.  

Hi,
Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762
The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too.

Petri


I used to run a 560 Ti, and early factory-OC models were shipped with insufficient core voltage by default. They also tend to get pretty toasty. My feeling is we'll eventually have to embed some monitoring (e.g. NVML sensors during the run) and possibly some lightweight spot checks.

Another potentially handy thing, where you use padding, might be to fill it with 0xDEADDEAD instead of zeroes, then throw in some extra threads with a conditional, such that the extras look for the hex value and either set a flag or throw an exception when they detect corruption. Not exactly rigorous, but low cost and better than nothing.

[I plan something along those lines for the generic version, more oriented toward automated tuning; however, that's further off.]


Yup,
Writing and checking for 0xDEAD (or whatever bit pattern) in between buffers would reveal buffer under/overflows.
I'll see if I have time to implement that some evening next week.
Petri
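Extending the same idea to several buffers laid out back to back, as suggested here, a damaged guard band also hints at which neighbour misbehaved: damage after buffer k points to an overflow of k or an underflow of k+1. This is again a Python sketch; the arena layout, the `GAP` size and the names `build_arena` and `find_damage` are assumptions, with 0xDEAD used as the guard value as in the post.

```python
from array import array

GUARD = 0xDEAD  # guard value from the post; any rarely-occurring pattern works
GAP = 4         # hypothetical number of guard words before, between and after buffers


def build_arena(sizes):
    """Lay buffers out back to back, separated by GAP guard words.
    Returns the arena and the start offset of each buffer."""
    total = sum(sizes) + GAP * (len(sizes) + 1)
    arena = array("L", [GUARD]) * total
    offsets = []
    pos = GAP
    for size in sizes:
        offsets.append(pos)
        for i in range(pos, pos + size):
            arena[i] = 0
        pos += size + GAP
    return arena, offsets


def find_damage(arena, offsets, sizes):
    """For each damaged guard band, list the neighbouring buffers that may be at fault."""
    # Band k sits just before buffer k; the extra final band sits after the last buffer.
    band_starts = [off - GAP for off in offsets] + [offsets[-1] + sizes[-1]]
    reports = []
    for k, start in enumerate(band_starts):
        if any(arena[i] != GUARD for i in range(start, start + GAP)):
            suspects = []
            if k > 0:
                suspects.append(f"overflow of buffer {k - 1}")
            if k < len(sizes):
                suspects.append(f"underflow of buffer {k}")
            reports.append((k, suspects))
    return reports


sizes = [10, 20, 5]
arena, offsets = build_arena(sizes)
arena[offsets[0] + sizes[0]] = 7  # buffer 0 writes one word too many
damage = find_damage(arena, offsets, sizes)
```

Keeping one band per gap means a single scan over the guard words covers every buffer in the arena.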
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1859163
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1859198 - Posted: 2 Apr 2017, 14:36:59 UTC - in response to Message 1859163.  

A little more info on the false overflows. It seems all of these are occurring with the low angle range Arecibo tasks, mainly around 0.248126 and 0.148085. How many overflows you see depends on how many tasks in those angle ranges you get. Since previous versions had false overflows with VLARs, this would appear to be a leftover from that problem: it still doesn't like the low angle ranges. I haven't found any at the higher angle ranges.
https://setiathome.berkeley.edu/results.php?hostid=8215300&state=5
https://setiathome.berkeley.edu/results.php?hostid=7769537&state=5
https://setiathome.berkeley.edu/results.php?hostid=8136063&state=5
etc, etc...
ID: 1859198
Kissagogo27 · Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1859455 - Posted: 4 Apr 2017, 9:38:26 UTC

Under W7, bad WU with ATI?

https://setiathome.berkeley.edu/workunit.php?wuid=2486259533
ID: 1859455
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1859491 - Posted: 4 Apr 2017, 14:40:22 UTC - in response to Message 1859455.  
Last modified: 4 Apr 2017, 14:43:36 UTC

Under W7, bad WU with ATI?

https://setiathome.berkeley.edu/workunit.php?wuid=2486259533
The task says: WARNING: This application needs newer GPU, at least ATI Radeon HD 5000 needed, exiting !
That is because SETI doesn't have an app that will work on the AMD Radeon HD 4850 in your machine.
They tried one, but couldn't find a way to assign it to just the HD 4000 GPUs, so they removed it rather than send it to all machines.
You need to uncheck "Use ATI GPU" in your preferences, https://setiathome.berkeley.edu/prefs.php?subset=project
Another way would be to install the SSE41 CPU app on your machine. The package only lists the CPU app in its app_info.xml, so it will only ask for CPU tasks.
As a bonus, the SSE41 app will be much faster on your machine than the stock CPU app: SSE41_CPUr3344.zip
ID: 1859491


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.