Message boards : Number crunching : Setting up Linux to crunch CUDA90 and above for Windows users
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874
I noticed references in the news to Netflix's energy consumption:

For 2018, Netflix's operations used 51,000 megawatt hours, whereas serving content used 194,000 megawatt hours, for a total of 245,000 megawatt hours. Last year, operations energy usage increased to 94,000 megawatt hours while serving content required 357,000 megawatt hours, for a total of 451,000 megawatt hours.

I was surprised that distribution - 'serving content' - was the dominant energy consumer. Clearly internet downloading is not a free resource, in climate terms. We're doing it all the time, of course, because the data is constantly changing. But perhaps we should try to be a little kinder to the planet with our application downloads?
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
would I be right in assuming that the huge size of the app (230 MB!) implies that it's a full static build with all the libraries like FFT linked in?

That's correct. The special app has been built this way since around V0.96 or V0.97, and it was one of the major breakthroughs in speed. Based on that file size I'm guessing you're referring to the 10.2 mutex build that I compiled, which has essentially everything included. TBar removed support for Maxwell cards in his 10.2 version, and it also lacks the mutex code, so the file size is a bit smaller, just under 200 MB. The CUDA 10.2 package is really big; the same app compiled with CUDA 10.0 instead (with Maxwell support and with mutex) is only 185 MB.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
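For anyone curious whether a given binary really is such a static build, a quick check on Linux; this is a minimal sketch using ldd, where the script and binary names are just examples:

```python
import subprocess
import sys

def linkage_report(binary: str) -> str:
    """Report whether an executable looks statically linked. `ldd` on a
    static build prints 'not a dynamic executable'; on a dynamic build
    it lists the shared libraries the binary needs at runtime."""
    proc = subprocess.run(["ldd", binary], capture_output=True, text=True)
    out = proc.stdout + proc.stderr
    if "not a dynamic executable" in out:
        return "static build: libraries (cuFFT etc.) are baked in"
    return "dynamic build, runtime dependencies:\n" + out

if __name__ == "__main__":
    # e.g. python linkage_report.py ./setiathome_x41p_V0.98b1
    print(linkage_report(sys.argv[1]))
```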
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
I noticed references in the news to Netflix's energy consumption:

For Netflix, serving content is so energy-heavy because they have to transcode the content to different resolutions on the fly. Netflix is more than likely retaining only the highest-quality files in their library (rather than holding, say, 3-4 different copies of the same movie at different resolutions). So when someone watches content but can't play it at full resolution (due to internet speed, capabilities of the player, native screen resolution, etc.), the Netflix servers have to convert the file to what the client supports. This is very resource intensive. It has little to do with the energy used just moving data over the network.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
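As a toy illustration of that trade-off (all resolutions and behaviour below are invented for the example), compare serving from a pre-encoded ladder of renditions with transcoding a single master on the fly:

```python
# Toy sketch: storing one master copy forces a per-view transcode for any
# client that can't play it, while a ladder of pre-encoded renditions
# turns serving back into a plain file transfer.

MASTER_HEIGHT = 2160          # the one high-quality copy kept (per the post)
LADDER = [2160, 1080, 720, 480]  # hypothetical pre-encoded renditions

def serve(client_max_height: int, have_ladder: bool) -> str:
    """Describe how a request is satisfied for a client limited to
    `client_max_height` lines of resolution."""
    if client_max_height >= MASTER_HEIGHT:
        return "send master as-is (cheap: network only)"
    if have_ladder:
        # Best pre-encoded rendition the client can play.
        best = max((h for h in LADDER if h <= client_max_height),
                   default=LADDER[-1])
        return f"send pre-encoded {best}p file (cheap: network only)"
    # No matching copy stored: decode + re-encode for every view.
    return f"transcode {MASTER_HEIGHT}p -> {client_max_height}p on the fly (CPU/GPU heavy)"

print(serve(720, have_ladder=False))  # the energy-hungry path described above
print(serve(720, have_ladder=True))   # what storing multiple copies would buy
```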
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Found one. They are hard to find, you have to catch the machine a day or two after a reboot, and of course it has to be a machine having the problem;
https://setiathome.berkeley.edu/result.php?resultid=8531480003

Coprocessors : [2] NVIDIA GeForce GTX 1060 3GB (3019MB) driver: 390.11 OpenCL: 1.2
Operating System: Linux LinuxMint

Device 1: GeForce GTX 1060 3GB is okay
SETI@home using CUDA accelerated device GeForce GTX 1060 3GB
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
setiathome v8 enhanced x41p_V0.98b1, Cuda 9.00 special
Modifications done by petri33, compiled by TBar
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.014356
Sigma 50
Sigma > GaussTOffsetStop: 50 > 14
Thread call stack limit is: 1k
Triplet: peak=13.90288, time=39.26, period=36.91, d_freq=7804726873.49, chirp=8.2782, fft_len=512

setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 1060 3GB, 3019 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 9
pciBusID = 1, pciSlotID = 0
Device 2: GeForce GTX 1060 3GB, 3019 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 9
pciBusID = 6, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 1060 3GB is okay
SETI@home using CUDA accelerated device GeForce GTX 1060 3GB
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
setiathome v8 enhanced x41p_V0.98b1, Cuda 9.00 special
Modifications done by petri33, compiled by TBar
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.014356
Sigma 50
Sigma > GaussTOffsetStop: 50 > 14
Thread call stack limit is: 1k
Triplet: peak=13.90288, time=39.26, period=36.91, d_freq=7804726873.49, chirp=8.2782, fft_len=512
Spike: peak=24.56104, time=28.63, d_freq=7804726816.76, chirp=25.393, fft_len=128k
Autocorr: peak=18.47104, time=62.99, delay=3.5495, d_freq=7804720013.21, chirp=-28.877, fft_len=128k
Best spike: peak=24.56104, time=28.63, d_freq=7804726816.76, chirp=25.393, fft_len=128k
Best autocorr: peak=18.47104, time=62.99, delay=3.5495, d_freq=7804720013.21, chirp=-28.877, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, score=0, chirp=0, fft_len=0
Best triplet: peak=13.90288, time=39.26, period=36.91, d_freq=7804726873.49, chirp=8.2782, fft_len=512
Spike count: 1
Autocorr count: 1
Pulse count: 0
Triplet count: 1
Gaussian count: 0

The correct result is:
Best pulse: peak=3.622246, time=45.9, period=7.337, d_freq=7804724066.9, score=1.136, chirp=31.396, fft_len=2k

It missed All the Pulses... just as the Macs do.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Another one, this time on a Single Maxwell;
https://setiathome.berkeley.edu/result.php?resultid=8533813354

Device 1: GeForce GTX 980 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 980 Ti
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
setiathome v8 enhanced x41p_V0.98b1, Cuda 10.1 special
Modifications done by petri33, compiled by TBar
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.012727
Sigma 57
Sigma > GaussTOffsetStop: 57 > 7
Thread call stack limit is: 1k
Pulse: peak=0.5492414, time=45.82, period=0.4259, d_freq=7798171867.95, score=1.019, chirp=-12.898, fft_len=256
Pulse: peak=10.97148, time=45.82, period=29.91, d_freq=7798172920.24, score=1.049, chirp=-18.226, fft_len=256
Pulse: peak=0.8585057, time=45.82, period=0.8991, d_freq=7798172371.26, score=1.005, chirp=30.284, fft_len=64
Triplet: peak=11.81117, time=55.96, period=18.49, d_freq=7798171194.57, chirp=32.527, fft_len=64
Triplet: peak=10.5076, time=76.24, period=14.32, d_freq=7798179418.41, chirp=37.267, fft_len=8k
Pulse: peak=5.103323, time=45.82, period=10.08, d_freq=7798172116.34, score=1.099, chirp=-38.695, fft_len=256
Pulse: peak=2.727972, time=45.86, period=5.19, d_freq=7798176682.61, score=1.036, chirp=54.328, fft_len=1024
Pulse: peak=10.47997, time=45.86, period=28.14, d_freq=7798171571.82, score=1.054, chirp=-60.777, fft_len=1024
Pulse: peak=10.49335, time=45.86, period=28.1, d_freq=7798171576.59, score=1.055, chirp=-60.917, fft_len=1024

setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 980 Ti, 6082 MiB, regsPerBlock 65536
computeCap 5.2, multiProcs 22
pciBusID = 3, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 980 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 980 Ti
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
setiathome v8 enhanced x41p_V0.98b1, Cuda 10.1 special
Modifications done by petri33, compiled by TBar
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.012727
Sigma 57
Sigma > GaussTOffsetStop: 57 > 7
Thread call stack limit is: 1k
Triplet: peak=11.81117, time=55.96, period=18.49, d_freq=7798171194.57, chirp=32.527, fft_len=64
Triplet: peak=10.5076, time=76.24, period=14.32, d_freq=7798179418.41, chirp=37.267, fft_len=8k
Triplet: peak=12.77178, time=56.79, period=27.97, d_freq=7798171418.07, chirp=86.363, fft_len=64
Best spike: peak=23.75427, time=5.727, d_freq=7798176878.49, chirp=-0.80218, fft_len=128k
Best autocorr: peak=17.71228, time=28.63, delay=4.5664, d_freq=7798175714.88, chirp=-3.3839, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, score=0, chirp=0, fft_len=0
Best triplet: peak=12.77178, time=56.79, period=27.97, d_freq=7798171418.07, chirp=86.363, fft_len=64
Spike count: 0
Autocorr count: 0
Pulse count: 0
Triplet count: 3
Gaussian count: 0

The correct result:
Best pulse: peak=5.103332, time=45.82, period=10.08, d_freq=7798172116.34, score=1.099, chirp=-38.695, fft_len=256

Missed it by that much...
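For readers checking their own hosts, here is a hypothetical helper (plain Python, not part of any of the builds discussed) that scans a task's saved stderr text for the signature shown in the two results above:

```python
import re
import sys

def missed_pulses(stderr_text: str) -> bool:
    """Heuristic check for the failure signature in the posts above:
    a final 'Pulse count: 0' combined with a zeroed 'Best pulse' line.
    (Many healthy tasks legitimately find no pulses, so treat a hit
    as a prompt to compare against the canonical result.)"""
    pulse_count = None
    best_pulse_zero = False
    for line in stderr_text.splitlines():
        m = re.match(r"\s*Pulse count:\s*(\d+)", line)
        if m:
            pulse_count = int(m.group(1))      # keep the last value seen
        if line.strip().startswith("Best pulse:"):
            best_pulse_zero = "peak=0," in line and "score=0," in line
    return pulse_count == 0 and best_pulse_zero

if __name__ == "__main__":
    # Usage: python check_stderr.py stderr.txt (text copied from a result page)
    text = open(sys.argv[1]).read()
    print("missed pulses!" if missed_pulses(text) else "looks normal")
```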
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
Guess my single-GPU system isn't out of the sample after all. Yet it's never happened, and it has GDDR5 memory. I still don't think the memory type is one of the factors.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
That's interesting. Care to share your theory as to why a machine with two 1080Tis and three 1070s only has the problem on the 1070s? BTW, another machine with two GPUs has the problem on a 750Ti but not the 1080Ti.
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
some idiosyncrasy with the specific environment(s) having the issue would be my guess. maybe even a hardware issue. or would you rather jump to conclusions and ignore all the people here telling you we don't have the issue and aren't able to replicate it?

care to share your theory as to why, if you're so certain it's due to GDDR5 memory, none of the people here have been able to replicate it on their GDDR5 cards? correlation is not causation.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
I've been working on this for 2 years. Believe it or not, my knowledge of this issue vastly overshadows yours. I really don't care what you think. This issue has taken out a whole platform, probably two if you consider how Windows might respond to this. Pardon me if I ignore you. I've got work to do.
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
not sure what Windows has to do with an issue of the Linux special app. but I digress.

first the issue was Firefox, and you were sure of it. then it was something to do with the monitor, and you were sure of it. and now it's GDDR5 memory, and you're sure of it. no one was able to replicate these claims.

no one is denying that the issue is happening. you've clearly demonstrated that it's happening. but you're jumping to conclusions about the cause. IIRC Juan claimed to fix his issue when it was happening to him by clearing his slots, i.e. a software corruption. which makes a LOT more sense than the theories presented thus far.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874
IIRC Juan claimed to fix his issue when it was happening to him by clearing his slots, i.e. a software corruption. which makes a LOT more sense than the theories presented thus far.

Please try not to allow further obfuscation to creep into what is already a very opaque and ill-defined issue. 'Slots' can refer to two very different concepts.

There's the software folder used as temporary workspace while BOINC runs a task. BOINC does a pretty good job of deleting everything in a slot folder after the task has finished, but we did have to apply a bugfix relatively recently because some files were being missed. If any (detectable - that was part of the problem) files are left in the folder, BOINC won't reuse it for the next task - it'll create a new one. If the BOINC client in use on the affected machine is built from new enough sources, that shouldn't be a problem. [I'll try and clarify the vague terms like 'bugfix' and 'new enough' when I've tracked them down]

Or 'slots' can refer to the physical electrical connection between GPU and motherboard, with or without an intermediate cable. 'Clearing' (or cleaning) a slot might involve a solvent spray or a pencil eraser.

We should perhaps ask Juan to repost details of the incident in question.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
What Ian and Juan refer to as "cleaning his slots" means deleting the nv cache directory of the GPU compute kernel primitives. That is located at C:\Users\{username}\AppData\Roaming\NVIDIA\ComputeCache for Windows, and at /home/{username}/.nv/ComputeCache for Linux.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
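For anyone who wants to script that, here is a minimal sketch in Python, assuming the default cache locations Keith lists and that BOINC and any GPU apps are stopped first:

```python
import os
import platform
import shutil

def clear_nv_compute_cache() -> None:
    """Delete NVIDIA's JIT compute cache; the driver recreates it on the
    next CUDA/OpenCL launch. Stop BOINC and any GPU apps before running."""
    if platform.system() == "Windows":
        cache = os.path.join(os.environ["APPDATA"], "NVIDIA", "ComputeCache")
    else:  # Linux
        cache = os.path.expanduser("~/.nv/ComputeCache")
    if os.path.isdir(cache):
        shutil.rmtree(cache)
        print(f"removed {cache}")
    else:
        print(f"nothing to do: {cache} not found")

if __name__ == "__main__":
    clear_nv_compute_cache()
```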
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874
What Ian and Juan refer to "cleaning his slots" means deleting the nv cache directory of the gpu compute kernel primitives.

Ah - thanks. Yet another possibility, but one which makes a lot more sense in this context. For the record, I was thinking about 'fix bug when delete > 4GB file' (May 2015 - Windows only), so it probably doesn't apply to SETI tasks.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
When's the last time you've seen the exact same bug on two different platforms and had it not be the code?
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
IIRC Juan claimed to fix his issue when it was happening to him by clearing his slots, i.e. a software corruption. which makes a LOT more sense than the theories presented thus far.

Please try not to allow further obfuscation to creep into what is already a very opaque and ill-defined issue.

sorry, I was just using the same terminology that Juan used in his post. Keith is likely right, it's really the compute cache. I agree the issue is very ill-defined. unfortunately asking for a definition is met with resistance and snark. meh.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Ok, I will try to explain again, but please forgive my bad English.

'Slots' here are the working directories created on disk by BOINC for each running WU; they run from slot 0 to slot 11 in my case (12 WUs crunched at a time). Not related to the PCIe physical slots or the NV cache. They are located in the BOINC directory too.

The problem appeared a few years ago, when a few of us had 4 GPUs on our hosts (I have 4x1070, coincidence?). That was back in the day when we didn't have the spoofed client and needed to use the rescheduler to try to keep our WU cache filled. What we discovered at the time was that something went wrong when the rescheduler killed the running processes. Some kind of timing error (I don't know a better word) made BOINC close BEFORE it was able to clear the slot, leaving the boinc_lockfile in the slot, and that made a huge mess when BOINC was started again.

It took some time to discover this, mainly because IIRC only one other SETIzen reported the same problem (he had 4 GPUs too) and the problem only appeared on Linux-based hosts. But one thing was clear: the problem was random, could happen maybe a couple of times in a day, and we were unable to find a way to replicate it. What was certain is that it was related to the way the rescheduler killed BOINC under some weird condition (we never discovered what).

We solved the problem by adding some extra wait time at the end of the BOINC shutdown process, and for extra safety we added a clean-all-slots subroutine at the beginning of the rescheduler, after it kills the running client, to force the removal of the file. Just in case. And as suddenly as the problem appeared, it disappeared; I haven't seen it for a long time. I don't use the rescheduler anymore either.

The problem reported in this thread looks similar to that one from years ago. I believe some timing issue could be the trigger for it too. Hope I was able to explain... I will try to find the related thread, but that will take time, it was a few years ago.
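A minimal sketch of the cleanup Juan describes; the BOINC data directory path is an assumption (adjust for your install), and the client must already be stopped, e.g. with boinccmd --quit, since deleting lockfiles under a running client would be unsafe:

```python
import glob
import os

# Hypothetical BOINC data directory; adjust for your install.
BOINC_DIR = "/var/lib/boinc-client"

def clean_slot_lockfiles(boinc_dir: str = BOINC_DIR) -> None:
    """Remove stale boinc_lockfile entries left in slot directories
    after an unclean shutdown. Only run this with the client stopped."""
    for lock in glob.glob(os.path.join(boinc_dir, "slots", "*", "boinc_lockfile")):
        os.remove(lock)
        print(f"removed stale lock: {lock}")

if __name__ == "__main__":
    clean_slot_lockfiles()
```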
elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141
Gents, I've been having lots of issues lately with machines. GPUs are showing as missing. nvidia-smi is telling me the GPU is missing, please reboot.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
That's a problem with the card or the slot on the motherboard it is plugged into. Try moving the card to a different slot. Check the PCIe power connectors on the card for burned pins. Change the PCIe power cables. Try a different power supply.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Jimbocous Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349
Gents, I've been having lots of issues lately with machines. GPUs are showing as missing. nvidia-smi is telling me the GPU is missing, please reboot.

If you're using risers at all, it could also very well be the USB signal cables. Dunno why they use such cheap cables {sigh}. Otherwise, as mentioned, it's almost always power issues. NV cards act like teenagers when the power is borderline.
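Before tearing into risers and power cables, a small watchdog sketch can at least timestamp when a card drops off the bus. It assumes nvidia-smi is on the PATH and simply counts the devices it lists:

```python
import subprocess
import time

def gpu_count() -> int:
    """Count GPUs reported by `nvidia-smi -L` (one line per device).
    Returns -1 if nvidia-smi itself fails, which also indicates trouble."""
    try:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=True).stdout
        return sum(1 for line in out.splitlines() if line.startswith("GPU "))
    except (subprocess.CalledProcessError, FileNotFoundError):
        return -1

if __name__ == "__main__":
    expected = gpu_count()
    print(f"watching {expected} GPU(s)...")
    while True:
        n = gpu_count()
        if n != expected:
            print(f"{time.ctime()}: GPU count changed: {expected} -> {n}")
            expected = n  # keep watching at the new baseline
        time.sleep(60)
```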
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
The problem with missed pulses after boot has been narrowed down to the type of memory on the card, seems it disappears when you go from a 1070 to a 1080Ti.

That sounds like a highly dubious claim. I can only assume that it implies a race condition: different kernels finishing at different times, data not being available at times when the code assumes that it is ready. That's a synchronisation problem which it is the duty of the programmer to solve.

Working on a suggestion from Chris, I connected All 3 of the NV cards on the machine which uses the Intel GPU for the Main monitor, and tested it. Normally this machine will have 2 out of 3 GPUs Miss all pulses after a reboot. With the GPUs connected to active monitors at boot, None missed any pulses, just as with Chris's machine. After that though, on a Mac, you have to turn Off the monitors connected to the NV GPUs or eventually they will start missing pulses, which means you need the Main monitor connected to a 1080Ti, which doesn't have that problem, or another type of GPU such as an ATI or Intel.

I really don't see how you can blame this on 'kernel synchronisation' considering it will run for months once you solve the initial startup problem. To me this appears to be a conflict between the Video Driver and the App fighting over the same vram space at startup. Since it happens on TWO DIFFERENT Platforms, I don't think the Video Driver is at fault. It's really nice after all this time to find a GPU that doesn't have this problem; too bad the 1080Ti is still rather expensive.
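For readers wondering what the quoted 'synchronisation problem' means in practice, here is a deliberately simplified sketch in Python (not the special app's CUDA code) of that failure mode: reading results before the worker has finished can silently yield empty output, and an explicit join, the analogue of a device synchronise, fixes it:

```python
import threading
import time

results = []

def find_pulses():
    """Stand-in for a GPU kernel: takes a little while, then appends results."""
    time.sleep(0.01)          # simulated kernel runtime
    results.append("pulse @ t=45.82")

worker = threading.Thread(target=find_pulses)
worker.start()

# BUG: reading before the worker is done is a race; on a fast run the
# result is there, on a slow one the read silently "misses all the pulses".
print("unsynchronised read:", results)

worker.join()                 # the fix: explicit synchronisation
print("synchronised read:  ", results)
```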