Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Shame you can't find a cheaper method. For me, I was able to reuse my existing DDR2 RAM, CPU cooler and power supply. You can find a Core2 Quad on eBay for around $20. So, all it took to go from a dual core CPU was the $22 board and $20 CPU. Much cheaper than a new board, CPU, RAM, etc. It runs the GPUs just fine, and with the Special App I don't worry much about CPU tasks. You can edit or create the xorg.conf manually to add Fan control and Overclocking. Open NVIDIA X Server Settings and select X Server Display Configuration. At the bottom select Save to X Configuration File. In the dialogue window select Show preview; saving directly usually fails, so select all the text and copy it. Then open a Terminal and enter gksu nautilus, navigate to the file, /etc/X11/xorg.conf, and open it with gedit. Paste the copied Configuration over whatever is there and add the Option "Coolbits" "12" in the location(s) shown below. You have to add the Option to each Screen section; ... Section "Screen" Identifier "Screen0" Device "Device0" Monitor "Monitor0" DefaultDepth 24 Option "Coolbits" "12" Option "Stereo" "0" Option "nvidiaXineramaInfoOrder" "DFP-0" Option "metamodes" "nvidia-auto-select +0+0" Option "SLI" "Off" Option "MultiGPU" "Off" Option "BaseMosaic" "off" SubSection "Display" Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen1" Device "Device1" Monitor "Monitor1" DefaultDepth 24 Option "Coolbits" "12" Option "Stereo" "0" Option "nvidiaXineramaInfoOrder" "CRT-0" Option "metamodes" "nvidia-auto-select +0+0" Option "SLI" "Off" Option "MultiGPU" "Off" Option "BaseMosaic" "off" SubSection "Display" Depth 24 EndSubSection EndSection Section "Screen" Identifier "Screen2" Device "Device2" Monitor "Monitor2" DefaultDepth 24 Option "Coolbits" "12" Option "Stereo" "0" Option "nvidiaXineramaInfoOrder" "CRT-0" Option "metamodes" "nvidia-auto-select +0+0" Option "SLI" "Off" Option "MultiGPU" "Off" Option "BaseMosaic" "off" SubSection "Display" Depth 24 EndSubSection EndSection Save the file, then restart the machine. Coolbits 12 will give Fan control and OCing. Works for me. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus" . . Thanks Petri, . . I should have checked that twice, I missed the x before config. . . Now I can cool those 1060s :) . . Oops. That ran but did not open the app to change the fan speed etc. It reported that it used /etc/X11/xorg.conf and added the option thermal configuration check = true to screens 0 & 1, then backed up the file. . . Is there another command to bring up that lovely nVidia app that lets you set things for the video card and screen? Stephen :) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
. . Also one other point. You were correct about the problem updating cc_config.xml settings. With this installation in /home/BOINC and that setting in cc_config.xml set to 1 it is now running as 20 and 0 instead of 30 and 10. Now that it's running at normal priority have you noticed any difference in GPU Utilization in nvidia-smi? On my machine the Arecibo tasks run in the low-90% range, the BLC13s run in the mid-90% range with spikes up to 100%. The x41p_zi3t1b version seems to be working well on both the Ubuntu machines and the Mac. The single GPU machine still has just a handful of legitimate inconclusives while the other two 3 GPU machines finally dropped into the 40s. The Mac caught a number of instant overflows and went back up for now. I suspect it will take over a month before they drop into the 30s as there are a lot of old inconclusives to wait on. Hmmm, perhaps a new App release tomorrow, maybe even one for OSX. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
"sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus" nvidia-settings is one command to set things. Without options it opens a window where you can set them manually, but you have to redo that each time the machine is rebooted. To automate things you can create a script to do something like this: root@Linux1:~/Downloads/BOINC# cat schedb.sh #!/bin/bash /usr/bin/nvidia-smi -pm 1 #gtx 1080s #prefer fastest mode nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1" #may not work on some GPUs nvidia-settings -a "[GPU:0]/GPUOverVoltageOffset=16000" #do some overclocking /usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:0]/GPUGraphicsClockOffset[3]=190" #set fan speed percentage nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=96" #set power limit in Watts /usr/bin/nvidia-smi -i 0 -pl 215 #set application clocks /usr/bin/nvidia-smi -i 0 -ac 5005,1911 # set another GPU nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1" nvidia-settings -a "[GPU:1]/GPUOverVoltageOffset=16000" /usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:1]/GPUGraphicsClockOffset[3]=190" nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=96" /usr/bin/nvidia-smi -i 1 -pl 215 /usr/bin/nvidia-smi -i 1 -ac 5005,1911 nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1" nvidia-settings -a "[GPU:2]/GPUOverVoltageOffset=16000" /usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:2]/GPUGraphicsClockOffset[3]=190" nvidia-settings -a "[gpu:2]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=100" /usr/bin/nvidia-smi -i 2 -pl 215 /usr/bin/nvidia-smi -i 2 -ac 5005,1911 nvidia-settings -a "[gpu:3]/GPUPowerMizerMode=1" nvidia-settings -a "[GPU:3]/GPUOverVoltageOffset=16000" /usr/bin/nvidia-settings -a "[gpu:3]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:3]/GPUGraphicsClockOffset[3]=180" nvidia-settings -a "[gpu:3]/GPUFanControlState=1" -a "[fan:3]/GPUTargetFanSpeed=96" /usr/bin/nvidia-smi -i 3 -pl 215 /usr/bin/nvidia-smi -i 3 -ac 5005,1911 # gtx980s #/usr/bin/nvidia-smi -i 0 -pl 230 #/usr/bin/nvidia-smi -i 2 -pl 230 #/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=200" -a "[gpu:1]/GPUGraphicsClockOffset[3]=30" #/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=200" -a "[gpu:2]/GPUGraphicsClockOffset[3]=20" #/usr/bin/nvidia-smi -i 0 -ac 3605,1321 #/usr/bin/nvidia-smi -i 2 -ac 3605,1324 #/usr/bin/nvidia-settings -a "[GPU:1]/GPUOverVoltageOffset=16000" #/usr/bin/nvidia-settings -a "[GPU:2]/GPUOverVoltageOffset=16000" #nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=90" #nvidia-settings -a "[gpu:2]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=90" for (( ; ; )) do schedtool -a 1,2,3,4 `pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda65` schedtool -a 1,2,3,4 `pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8` schedtool -a 1,2,3,4 `pidof ap_7.01r2793_sse3_clGPU_x86_64` schedtool -a 6,7,8,9,10,11 `pidof MBv8_8.05r3345_avx_linux64` schedtool -a 6,7,8,9,10,11 `pidof setiathome_8.04_i686-pc-linux-gnu` schedtool -a 6,7,8,9,10,11 `pidof ap_7.05r2728_avx_linux32e` schedtool -a 5 `pidof compiz` sleep 2 rmdir ~petri/Downloads/BOINC/slots/1?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/2?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/3?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/4?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/5?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/6?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/7?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/8?* 2>/dev/null rmdir ~petri/Downloads/BOINC/slots/9?* 2>/dev/null done The schedtool calls pin the apps to specific ranges of CPUs. The rmdir lines are a relic that removes any excess slot directories from the BOINC folder. I had 600 of them once. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Also one other point. You were correct about the problem updating cc_config.xml settings. With this installation in /home/BOINC and that setting in cc_config.xml set to 1 it is now running as 20 and 0 instead of 30 and 10. Now that it's running at normal priority have you noticed any difference in GPU Utilization in nvidia-smi? On my machine the Arecibo tasks run around the low-90%, the BLC13s run around the mid-90% range with spikes up to 100%. . . The top output PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 8196 stephen 20 0 22.642g 620056 327636 R 43.3 15.3 1:28.22 setiathome+ 8226 stephen 20 0 22.589g 564216 327304 R 43.3 13.9 0:27.53 setiathome+ 2335 stephen 20 0 608916 43772 26960 R 18.6 1.1 103:15.51 boincmgr . . The nvidia-smi output (Fri Mar 17 23:04:55 2017, NVIDIA-SMI 375.39, Driver Version 375.39): GPU 0, GeForce GTX 106..., Persistence-M Off, Bus-Id 0000:01:00.0, Disp.A On, ECC N/A, Fan 40%, Temp 69C, Perf P2, Pwr 55W / 120W, Memory 1893MiB / 6068MiB, GPU-Util 80%, Compute M. Default. GPU 1, GeForce GTX 106..., Persistence-M Off, Bus-Id 0000:02:00.0, Disp.A Off, ECC N/A, Fan 31%, Temp 58C, Perf P2, Pwr 55W / 120W, Memory 1730MiB / 6072MiB, GPU-Util 58%, Compute M. Default. Processes (GPU, PID, Type, Process name, GPU Memory Usage): 0, 1009, G, /usr/lib/xorg/Xorg, 118MiB; 0, 2067, G, compiz, 43MiB; 0, 8226, C, ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80, 1727MiB; 1, 8263, C, ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80, 1727MiB. . . So it seems even with those NI values it doesn't come close to utilising them fully. . . FWIW, runtimes for Arecibo normals are 3.5 to 4.25 mins, Blc13's are 4.75 to 5.5 mins, halflings (VHAR) and overflow tasks are from 1.25 to 2.5 mins. . . Both GPUs are crunching Arecibo tasks (NARAs) ATM so those are the readings for that. . . Not that it matters but the times are wrong, the clock is running 4hrs and 7 mins behind and I cannot fix it. Not even by selecting the manual setting for the clock. Stephen . |
Joined: 6 Jun 99 Posts: 1 Credit: 32,436,908 RAC: 0 |
Can anyone tell me where to find the CUDA8 MB Linux apps modified by Petri? I've looked around and can't seem to find where to download them. Thanks! |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is a Crunchers Anonymous link in the first post of this thread, the App can be found there. Seems the recent x41p_zi3t1b build is producing unmatched Overflows on both platforms with the Arecibo tasks. So far I haven't seen any problems with the x41p_zi3k+ build, so, I guess we'll be staying with the x41p_zi3k+ build for now. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
There is a Crunchers Anonymous link in the first post of this thread, the App can be found there. An overflow is an overflow (noisy packet). It may contain 60 spikes, autocorrelations and pulses. They are searched in different work queues on GPU. Results are checked and reported. A different implementation (parallel) can find them in any order. No order can be said best. The rate of inconclusive results can be found here: http://setiathome.berkeley.edu/results.php?hostid=7475713&offset=0&show_names=0&state=4&appid=29 Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 |
An overflow is an overflow (noisy packet). It may contain 60 spikes, autocorrelations and pulses. They are searched in different work queues on GPU. Results are checked and reported. A different implementation (parallel) can find them in any order. No order can be said best. But if they can be reported in an order that matches what is considered to be the reference application, then Inconclusives are significantly reduced and the load on the database is reduced as the WU doesn't need processing once more & people get their Credit sooner. Grant Darwin NT |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
As always, Start with reproducing under bench with precisely the same parameters against stock/win32 CPU reference, then again with unroll set to 1. Assuming you reproduce mismatch with higher unroll, and then matches without unroll, then you just confirm what we already know. The Application is fast but broken, as it does not match the serial variants. Anyone that claims it is OK to choose whatever signals they like from the full dataset, has no idea what they are talking about, and you should point them out so I can ridicule them with computer science. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@petri33. It is Still not OK to mismatch serial CPU on the overflows. This is because CPU serial has no choice, but you do, but are choosing to be lazy and not do a full reduction. Do it properly or don't bother contributing further. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
As always, Start with reproducing under bench with precisely the same parameters against stock/win32 CPU reference, then again with unroll set to 1. Assuming you reproduce mismatch with higher unroll, and then matches without unroll, then you just confirm what we already know. The Application is fast but broken, as it does not match the serial variants. Anyone that claims it is OK to choose whatever signals they like from the full dataset, has no idea what they are talking about, and you should point them out so I can ridicule them with computer science. a) If a packet is full of crap and the decision is made solely based on the number of signals found, say 30, then it is completely irrelevant what sort of crap the packet contains. b) If a packet has an acceptable number of signals they must be reported correctly. c) If a packet does not have reportable signals, finding the best non-reportable ones wastes cycles on cosmetics. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
As always, Start with reproducing under bench with precisely the same parameters against stock/win32 CPU reference, then again with unroll set to 1. Assuming you reproduce mismatch with higher unroll, and then matches without unroll, then you just confirm what we already know. The Application is fast but broken, as it does not match the serial variants. Anyone that claims it is OK to choose whatever signals they like from the full dataset, has no idea what they are talking about, and you should point them out so I can ridicule them with computer science. You are wrong. Make your code match serial CPU on overflows or I will abandon it entirely. Your Choice. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
:) To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Good to see we're on the same page ;) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is a Crunchers Anonymous link in the first post of this thread, the App can be found there. What I meant by Unmatched Overflow is that the zi3t1b App overflowed but the WingPeople didn't. Sorta like the problems with the last few versions. http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5 https://setiathome.berkeley.edu/results.php?hostid=7769537&state=5 http://setiathome.berkeley.edu/workunit.php?wuid=2471201891 Three different machines, same problem. The difference is the previous versions Overflowed on the VLARs whereas these are Arecibos. I've been testing different versions on the Mac and it appears the current setiathome_x41p_zi3k+_x86_64-apple-darwin_cuda80 build, using the CUDA 7.5 libraries, is the best. I'll see about replacing the old Special_CUDA75-ComputeCode3.2+ at C.A. with the newer version. |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
There is a Crunchers Anonymous link in the first post of this thread, the App can be found there. Both Petri and I know Exactly what it is, and I believe we have an understanding of how it's going to go. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi Jason, Those are good examples of an error in the program. That executable reports finding spikes when it clearly should not. I'll send you and TBar a link to the latest source code. A number of people are running more recent executables without errors. The latest source has an option to set -unroll autotune. Usage: exename -unroll autotune -pfb M, where M is 4, 8, 16 ... or something. To me an unmatched overflow is when my version reports 7 autocorr and 23 spikes and the wingman reports 25 spikes and 5 autocorr or vice versa, both 30 signals, i.e. a bad packet. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Good, keep that to work with and please email also. No reason not to match CPU reference other than dirty parallelism with no reductions. [or faulty reductions...] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
PM sent. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
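
The per-GPU tuning script quoted earlier in the thread repeats the same block of commands for each card. Below is a minimal sketch of the same idea as a reusable function. It covers only a subset of the settings, with the attribute names and values copied from that script. The function name `tune_gpu` and its argument order are hypothetical, and the function only prints the commands (a dry run), so nothing is applied to the hardware.

```shell
#!/bin/bash
# Dry-run sketch: print the per-GPU tuning commands instead of executing them.
# Attribute names and numeric values come from the script posted in the thread;
# tune_gpu and its argument order are hypothetical names used for illustration.
tune_gpu() {
  local gpu=$1 fan_pct=$2 core_offset=$3
  echo "nvidia-settings -a [gpu:${gpu}]/GPUPowerMizerMode=1"
  echo "nvidia-settings -a [gpu:${gpu}]/GPUMemoryTransferRateOffset[3]=1100 -a [gpu:${gpu}]/GPUGraphicsClockOffset[3]=${core_offset}"
  echo "nvidia-settings -a [gpu:${gpu}]/GPUFanControlState=1 -a [fan:${gpu}]/GPUTargetFanSpeed=${fan_pct}"
  echo "nvidia-smi -i ${gpu} -pl 215"
  echo "nvidia-smi -i ${gpu} -ac 5005,1911"
}

# GPU index, fan percentage, core clock offset: the values used in the thread.
tune_gpu 0 96 190
tune_gpu 1 96 190
tune_gpu 2 100 190
tune_gpu 3 96 180
```

To apply the settings for real, one option would be to review the printed lines first and then pipe them to a shell. Whether each attribute is accepted depends on the driver and card, as the original script's own comments note.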
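
On the overflow debate above: one way a parallel search could report the same signals as a serial one is to tag each candidate with the position at which a serial scan would have encountered it, then sort by that tag before cutting off at the reporting limit. The sketch below only illustrates that ordering idea and is not the method used in any of the apps discussed. The record format, the `serial_index` field, the function name `report_first_n` and the limit of 3 are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical candidate records, one per line: "<serial_index> <signal_type>".
# serial_index is assumed to be the position at which a serial scan would
# have met the signal; a real application would have to derive it somehow.
report_first_n() {
  local limit=$1
  # Sort numerically on the serial index, then keep only the first <limit>.
  sort -n -k1,1 | head -n "$limit"
}

# The same four candidates, discovered in two different orders.
run_a=$(printf '%s\n' "3 spike" "1 pulse" "2 autocorr" "4 spike" | report_first_n 3)
run_b=$(printf '%s\n' "4 spike" "2 autocorr" "3 spike" "1 pulse" | report_first_n 3)

echo "$run_a"
if [ "$run_a" = "$run_b" ]; then
  echo "reports match"
fi
```

Because the cut-off is applied after sorting, the discovery order no longer affects which signals are reported. The cost is that all candidates must be collected before anything is truncated.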
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.