Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, here is an error in the software: https://setiathome.berkeley.edu/result.php?resultid=5594593817. It happens, just not very often. Pulse finding detects multiple 'pulses' at a certain time. Something gets overwritten, I guess. It is hard to debug since I do not have that WU on my computer. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi, Superficially it looks like some of those pulses would [normally] coalesce in the serial CPU case, possibly across where you unroll periods. If so, that's likely the missing reduction I mentioned. If the issue disappears without unroll (unroll 1), then it's just a reproduction of the race condition I've been describing. The trick will probably be to drag pulse reporting out of the kernels, and either reduce from multiple pulse tables, or alternatively put atomics on the pulse results and only update if the score is higher. Not sure which is easier, but pulling it all out will probably be better for the next generation, which will be even bigger. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
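The "reduce from multiple pulse tables, only update if the score is higher" idea from the post above can be sketched on the host side roughly as follows. This is a minimal illustration in Python, not the application's code: the `Pulse` fields, the choice of `time_bin` as the merge key, and the function name `reduce_pulse_tables` are all hypothetical, and the real kernels would need the equivalent done with a proper reduction or atomics.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Pulse:
    period: int     # hypothetical: folding period that produced the candidate
    time_bin: int   # hypothetical: location used as the merge key
    score: float    # hypothetical: higher means a better candidate


def reduce_pulse_tables(tables: Iterable[List[Pulse]]) -> List[Pulse]:
    """Merge per-chunk pulse tables, keeping the best-scoring pulse per location."""
    best: Dict[int, Pulse] = {}
    for table in tables:
        for pulse in table:
            current = best.get(pulse.time_bin)
            # Only update if the score is strictly higher. Tables are walked
            # in a fixed order, so ties always resolve the same way.
            if current is None or pulse.score > current.score:
                best[pulse.time_bin] = pulse
    # Return the survivors in a stable order.
    return [best[key] for key in sorted(best)]


# Two chunks (for example, two unrolled period ranges) report a pulse at the
# same location; only the higher-scoring one should survive the merge.
chunk_a = [Pulse(period=10, time_bin=3, score=1.2),
           Pulse(period=11, time_bin=7, score=0.9)]
chunk_b = [Pulse(period=12, time_bin=3, score=1.5)]
merged = reduce_pulse_tables([chunk_a, chunk_b])
```

Because the merge happens in one place and in a fixed order, the result does not depend on which chunk finished first, which is the property the race condition breaks.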
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Maybe a better alternative description: The pulses aren't at one single 'spot', and the pulsefinding algorithm is looking for the nicest 'blob'. If you did it visually, it might be less like looking through a static image and more like an 'Enhance...Pan', like Harrison Ford in the Blade Runner intro. [Edit:] https://youtu.be/qHepKd38pr0 "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings. Okay, here is a workunit that has gone to 3 different users but is a -9 overflow (spike count 30): https://setiathome.berkeley.edu/workunit.php?wuid=2471498679 There might be some info there for you to look at. Update: I have checked my pv status on my GTX 780 machine running x41p_zi3t1b: 70 tasks, with at least 14 with the -9 overflow error. I have some tasks still in the cache to be done, but I have set it to No New Tasks. Regards, Mark |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
The second host there looks like it broke entirely. It's unclear why yours and the first host mismatch, though your 30 pulses might mean you need a reboot or something. [Edit:] Inconclusives/invalids seem very high, even for the experimental apps. I'd look at doing the coolbits thing to get the fan up and keep temps in check. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings Jason. I seem to be having trouble with the BLC workunits. I have bumped the fan up. I am also going to reboot and see if that helps. The invalids/errors were on a previous build. Regards |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Greetings Jason Great. Fingers crossed nothing serious. The 780's are troopers. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . An inconclusive rate of 2.5% seems not out of place; the error rate is a little more significant, but at 2/3 of one percent it is still not dramatic. Neither degrades the quality of the pool of result data, and both are low enough to have only a very small impact on the work flow/database size. . . But then I am sure there is an assessment process to define the fitness for purpose. Stephen . |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoted: "As always, Start with reproducing under bench with precisely the same parameters against stock/win32 CPU reference, then again with unroll set to 1. Assuming you reproduce mismatch with higher unroll, and then matches without unroll, then you just confirm what we already know. The Application is fast but broken, as it does not match the serial variants. Anyone that claims it is OK to choose whatever signals they like from the full dataset, has no idea what they are talking about, and you should point them out so I can ridicule them with computer science." . . The first two points make sense, but what if one process says there are too many signals and another says there are not too many? Would it not be better to eliminate mismatches in this grey area to be sure of results? On point three, maybe one day those "best" results of what are deemed insignificant results now may be important, if a signal is confirmed at the very limit of what is seen as being significant now. Just my take. Stephen ? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoted: "The second host there looks like it broke entirely. It'll be unclear on why yours and the first host mismatch, though your 30 pulses might mean you need a reboot or something." . . Now this is a subject that is of concern to me. I have run coolbits on the Pentium machine to keep the 1060s cool, but it doesn't do anything except report that an option value has been set to true for both devices and that the config file has been backed up. No matter what I try, it does not open the nvidia graphical app to control things, so my fans are still running way low and the cards way hot. Any idea how to fix this issue? . . I want to continue with this app but I don't want to damage the 1060s. . . Of course with a Windows version I would have the tools I need to take care of this :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . In addition to the previous post ...

    Sun Mar 19 00:13:53 2017
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 0000:01:00.0      On |                  N/A |
    | 36%   63C    P2    53W / 120W |   1867MiB /  6068MiB |     43%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 106...  Off  | 0000:02:00.0     Off |                  N/A |
    | 29%   55C    P2    60W / 120W |   1730MiB /  6072MiB |     41%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0      1027    G   /usr/lib/xorg/Xorg                             106MiB |
    |    0      2192    G   compiz                                          30MiB |
    |    0      4476    C   ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80  1727MiB |
    |    1      4444    C   ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80  1727MiB |
    +-----------------------------------------------------------------------------+

. . It would seem the temps have induced throttling on both GPUs, they are both cooler now but very low power consumption and very low utilisation. :(

. . the nice values are AOK.

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     4444 stephen   20   0 22.562g 536112 327560 R  62.1 13.2   1:47.61 setiathome+
     4476 stephen   20   0 22.592g 567568 327468 R  52.6 14.0   1:12.15 setiathome+

. . But this is what I get when I run this command ...

    sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus

    Using X configuration file: "/etc/X11/xorg.conf".
    Option "ThermalConfigurationCheck" "True" added to Screen "Screen0".
    Option "ThermalConfigurationCheck" "True" added to Screen "Screen1".
    Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
    New X configuration file written to '/etc/X11/xorg.conf'

. . very depressing :( Stephen :( |
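For anyone who wants to keep an eye on fan speed and temperature without reading the table by hand, the per-GPU status rows of the nvidia-smi output quoted above can be pulled apart with a regular expression. This is only a rough sketch: the function name `parse_gpu_status` is made up for illustration, and the pattern assumes the exact row layout shown in this post (driver 375.39), which other driver versions may not share.

```python
import re
from typing import Dict, List

# Matches a status row such as:
# | 36%   63C    P2    53W / 120W |   1867MiB /  6068MiB |     43%      Default |
STATUS_ROW = re.compile(
    r"\|\s*(?P<fan>\d+)%\s+(?P<temp>\d+)C\s+(?P<perf>P\d+)\s+"
    r"(?P<power>\d+)W\s*/\s*(?P<cap>\d+)W\s*\|\s*"
    r"(?P<mem_used>\d+)MiB\s*/\s*(?P<mem_total>\d+)MiB\s*\|\s*"
    r"(?P<util>\d+)%"
)


def parse_gpu_status(smi_text: str) -> List[Dict[str, object]]:
    """Return one dict per GPU status row found in the given nvidia-smi text."""
    readings = []
    for match in STATUS_ROW.finditer(smi_text):
        readings.append({
            "fan_pct": int(match.group("fan")),
            "temp_c": int(match.group("temp")),
            "perf_state": match.group("perf"),
            "power_w": int(match.group("power")),
            "power_cap_w": int(match.group("cap")),
            "util_pct": int(match.group("util")),
        })
    return readings


# The two status rows from the post above, as they appear in the output.
sample = """
|   0  GeForce GTX 106...  Off  | 0000:01:00.0      On |                  N/A |
| 36%   63C    P2    53W / 120W |   1867MiB /  6068MiB |     43%      Default |
|   1  GeForce GTX 106...  Off  | 0000:02:00.0     Off |                  N/A |
| 29%   55C    P2    60W / 120W |   1730MiB /  6072MiB |     41%      Default |
"""
readings = parse_gpu_status(sample)
for index, reading in enumerate(readings):
    print(index, reading["fan_pct"], reading["temp_c"], reading["util_pct"])
```

The name rows are skipped automatically because they contain no fan percentage, so the list index lines up with the GPU order in the table.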
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
That says the Option has been added. All you have to do is open "NVIDIA X Server Settings" and go to "Thermal Settings". Make sure "Enable GPU Fan Settings" is checked and you should have a Slider control. Slide it to the desired setting and click Apply. If you don't have the Slider, then Coolbits isn't working and you should try my suggestion on editing the xorg.conf file. You may also need to edit the gpu-manager.conf file to keep the xorg.conf from being overwritten on reboot. Both those items are discussed earlier in this thread. https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1835926#1835926 https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1855894 |
Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings. I boosted the fan speed and also made some adjustments to the command line in the app_info file. I also rebooted, and so far so good on the BLC workunits. I will have a better understanding tomorrow once I have sifted through the latest round of workunits. @Stephen, I am using coolbits option 4, but I also just realised that you have two GPUs in the machine. Regards |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quoted: "...Also rebooted and so far so good on the BLC workunits..." I just had to check ;-) It appears you're suffering the same problem I was having with x41p_zi3t1b. You are receiving Overflows where your WingPeople aren't: https://setiathome.berkeley.edu/workunit.php?wuid=2472661131 I just got zi3t1d running on the Mac; we'll see how that works. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13915 Credit: 208,696,464 RAC: 304 |
Quoted: ". . It would seem the temps have induced throttling on both GPUs, they are both cooler now but very low power consumption and very low utilisation. :(" 63°C on a video card (or CPU) I would describe as warm, certainly not hot nor even close to causing thermal throttling. Thermal throttling results when they reach their maximum rated temperatures, and before that point their fans would be running at 100%. The low temperatures and power consumption would be due to the low load. Grant Darwin NT |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoted: "That says the Option has been added. All you have to do is Open 'NVIDIA X Server Settings' and go to 'Thermal Settings'." . . Aaaah, and there's the rub. What is the actual command to <open "NVIDIA X Server Settings">? I have found the file xorg.conf, and the thermalconfigurationcheck option is there and "true". But I cannot find a gpu-manager file, and no variation of ./xserver or ./nvidia-xserver runs anything. :( . . I guess I am dumber than I thought, but there is nothing about this that makes sense to me. Stephen ? |
Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings. I have gone back through my results since I made the changes, and I have found only 3 workunits with the -9 overflow still pending, out of a possible 60 workunits that have been done. I then checked my valid workunits and found two -9 overflow workunits that have validated. Is this normal? https://setiathome.berkeley.edu/workunit.php?wuid=2472679069 https://setiathome.berkeley.edu/workunit.php?wuid=2472696759 Regards |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoted: ". . It would seem the temps have induced throttling on both GPUs, they are both cooler now but very low power consumption and very low utilisation. :(" . . Thanks Grant, but that was after throttling. The temps were hitting the 80 mark the last time I had checked, then boom, down to those numbers, and the runtimes had doubled. I shut it down and let it cool a little, then fired it back up, and running exactly the same WUs the usage went up to 88% and the temps climbed back up to the high 70s. . . Maybe it wasn't throttling, but if not I would like to know what it was that turned down the wick on the GPUs rather than turning up the fans. The default fan profiles are waaaayyy too adventurous for my liking. Too much pre-occupation with low noise and not enough with good cooling. . . Where is a Linux version of Afterburner ..... wwwwaaahhhhhhh! Stephen :( |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hit the Top Search button in the launcher and enter nv; it should show the App. Once running, lock it to the Launcher so it's easy to find. Quoted: "That says the Option has been added. All you have to do is Open 'NVIDIA X Server Settings' and go to 'Thermal Settings'." Quoted reply: ". . Aaaah, and there's the rub. What is the actual command to <open 'NVIDIA X Server Settings'>? I have found the file xorg.cnf and the thermalconfigurationcheck option is there and 'true'. But I cannot find a gpu-manager file and no variation of ./xserver or ./nvidia-xserver runs anything. :(" If NVIDIA X Server Settings is in the Search box, you click it. If NVIDIA X Server Settings is in the Launcher, you just click it. Pretty straightforward, and it appears you have managed to open it previously. Open NVIDIA X Server Settings and see if you have the Fan Control Slider before doing anything else. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Thanks TBar, . . If the app had been in the Launcher I would have had no problems. The nv search didn't find it, but an nvidia search did, and that solved the problem. Thanks for that. . . The 1060s thank you as well, they are now running at around 60 deg. Stephen :) |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.