High performance Linux clients at SETI

Message boards : Number crunching : High performance Linux clients at SETI

Previous · 1 . . . 7 · 8 · 9 · 10 · 11 · 12 · 13 . . . 20 · Next

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990621 - Posted: 19 Apr 2019, 8:35:19 UTC - in response to Message 1990613.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.
ID: 1990621
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1990626 - Posted: 19 Apr 2019, 9:33:34 UTC - in response to Message 1990613.  

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?
Grant
Darwin NT
ID: 1990626
W3Perl Project Donor
Volunteer tester

Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1990645 - Posted: 19 Apr 2019, 13:26:35 UTC - in response to Message 1990565.  

. . Hi Laurent,

. . Those numbers surprised me, I have better times on Arecibo tasks than Blc(32) on all 4 boxes and GPU types (GTX1050, 1050ti, 970 and 1060), but I am still running v0.97 on the 2 Linux boxes and SoG on the Windows boxes (the x2 indicates running 2 concurrent tasks). Perhaps where you say Arecibo they are VLAR tasks?

1050 (SoG x 2) : Arecibo => 20 to 21 mins : Blc32 => 28 to 29 mins
1060 (SoG x 2) : Arecibo => 11 to 13 mins : Blc32 => 15 to 17 mins

1050ti (0.97) : Arecibo => 235 to 245 secs : Blc32 => 260 to 265 secs
970 . . . (0.97) : Arecibo => 135 to 140 secs : Blc32 => 160 to 165 secs

Stephen

? ?


The difference between SoG and petri's software is so impressive! We need to compare the same Arecibo WU; the latest ones I have received are computed in 88-90 sec.
About unroll: I used to change the value according to the GPU RAM available. The lower the unroll was set, the less GPU RAM was needed. This is no longer needed, as the memory footprint has been reduced to the minimum.

Does anyone know the meaning of pfb? It may be useful for understanding why some users show improvements and others don't.
ID: 1990645
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990649 - Posted: 19 Apr 2019, 14:16:12 UTC - in response to Message 1990626.  
Last modified: 19 Apr 2019, 14:35:51 UTC

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?


I am running a 1600 PSU on both systems. A fresh "nvidia-smi" shows none of the gpus are pegging their power limits. Or running too hot.

I have a few BIOS settings that I can return to "auto" on this system. I was trying out a faster RAM speed setting and some manually controlled memory interleaving. I have reset those to their default values of "auto".

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

The gpus are being run at "stock" settings. I have shown myself incompetent to "tune" them.
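For reference, that kind of spot check can be scripted instead of eyeballing the full nvidia-smi output. A sketch (the 95% threshold is an arbitrary choice of mine, not anything Tom used):

```shell
# Hypothetical spot check: flag any GPU drawing more than 95% of its power limit.
# With --format=nounits, power.draw and power.limit are plain numbers in watts.
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits |
awk -F', ' '{ if ($2 > $3 * 0.95) print "GPU " $1 " near power limit: " $2 "/" $3 " W" }'
```

No output means no GPU is pegging its limit at that instant; repeating it under load gives a better picture.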


Tom
A proud member of the OFA (Old Farts Association).
ID: 1990649
W3Perl Project Donor
Volunteer tester

Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1990657 - Posted: 19 Apr 2019, 14:41:48 UTC - in response to Message 1990649.  

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?


I am running a 1600 PSU on both systems. A fresh "nvidia-smi" shows none of the gpus are pegging their power limits. Or running too hot.

I have a few BIOS settings that I can return to "auto" on this system. I was trying out a faster RAM speed setting and some manually controlled memory interleaving. I have reset those to their default values of "auto".

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

The gpus are being run at "stock" settings. I have shown myself incompetent to "tune" them.


Tom


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!
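That round trip can be scripted end to end; a sketch (the 300 MB size and /tmp paths are arbitrary, and a checksum comparison is added so a corruption shows up mechanically rather than by eye):

```shell
# Sketch of the RAM sanity check: compress and decompress a few hundred MB,
# then verify the round trip with a checksum. A mismatch (or a gzip CRC
# error) suggests backing off the memory clocks.
head -c 300M /dev/urandom > /tmp/ramtest.bin
sha256sum /tmp/ramtest.bin > /tmp/ramtest.sha
gzip -f /tmp/ramtest.bin          # replaces the file with /tmp/ramtest.bin.gz
gunzip /tmp/ramtest.bin.gz        # decompress back to /tmp/ramtest.bin
sha256sum -c /tmp/ramtest.sha     # "OK" means the round trip was clean
```

Running it a few times back to back stresses the RAM more consistently than a single pass.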
ID: 1990657
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990662 - Posted: 19 Apr 2019, 14:58:13 UTC - in response to Message 1990657.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990662
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1990663 - Posted: 19 Apr 2019, 15:00:23 UTC - in response to Message 1990645.  

Does anyone know what is the meaning of pfb ? It may useful to understand why some users show improvements and others dont.

https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/PetriR_0.97/client/confsettings.cpp#L73
It is the exact same setting as in the Windows CUDA app:

if (gCudaDevProps.major < 2) fprintf(stderr,"pulsefind: blocks per SM %d %s\n", pfBlocksPerSM, (pfBlocksPerSM == def) ? "(Pre-Fermi default)" ...

If you look at the same cpp in XBranch you'll see it is the same. I never got those settings to do anything in Windows, and I don't think anyone else had success either.
ID: 1990663
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990668 - Posted: 19 Apr 2019, 15:20:13 UTC - in response to Message 1990621.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.

I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990668
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990669 - Posted: 19 Apr 2019, 15:23:21 UTC - in response to Message 1990645.  

Does anyone know what is the meaning of pfb ? It may useful to understand why some users show improvements and others dont.

I just answered this yesterday for somebody else. It's a throwback to the x41zi days with the CUDA 42 and CUDA 50 apps.

pfblockspersm = (1-16) pulse finding blocks per SM
pfperiodsperlaunch = (1-1000) pulse finding periods per launch


Has to do with the FFT setup for the compute kernel.
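For anyone wondering where those parameters are actually set with the anonymous-platform apps, they are passed as command-line flags. A hypothetical app_info.xml fragment (the app name, version number, and flag values shown are illustrative only, not recommendations):

```xml
<!-- Hypothetical fragment: flags for the special app go in <cmdline>
     inside the matching <app_version> of app_info.xml. -->
<app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>810</version_num>
    <cmdline>-unroll 2 -pfb 8</cmdline>
</app_version>
```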
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990669
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990671 - Posted: 19 Apr 2019, 15:30:52 UTC - in response to Message 1990662.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom

If you are doing anything other than JEDEC 2133 or stock XMP setup for the RAM, you need to always test for memory errors. You can run memtest86, Google's stressapptest, or Prime95 to check for errors. I would say that if you can pass an hour of stressapptest using 90% of your memory, and an hour of small 8K-24K FFT Prime95 with no errors detected, then you are both cpu and gpu stable enough to run Seti.

sudo apt install stressapptest

https://www.mersenne.org/download/
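A sketch of sizing that stressapptest run (the 90%-of-RAM figure is Keith's rule of thumb; -s, -M, and -W are stressapptest's documented seconds, megabytes, and more-stressful-copy options):

```shell
# Compute ~90% of total RAM in MB from /proc/meminfo, then hand it to
# stressapptest for a one-hour run.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_mb=$(( mem_kb * 90 / 100 / 1024 ))
echo "Testing ${mem_mb} MB for one hour"
# Uncomment once stressapptest is installed:
# stressapptest -s 3600 -M "$mem_mb" -W
```

A non-zero exit status or any "Hardware Error" lines in the output mean the memory settings are not stable.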
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990671
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990673 - Posted: 19 Apr 2019, 16:00:35 UTC - in response to Message 1990621.  
Last modified: 19 Apr 2019, 16:22:28 UTC

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.


. . When I looked it was at 206 total but only 100 were from the current period (i.e. the same time frame as the valid tasks shown) which with 5500 valid tasks is an inconclusive rate of 1.7%. Not where we would all want it but not in any way terrible.

. . With 100 inc's to check I didn't bother looking at the wingmen, so apple-darwin hosts could be the culprits in many of those cases.

Stephen

. .
ID: 1990673
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990676 - Posted: 19 Apr 2019, 16:29:21 UTC - in response to Message 1990668.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.
I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
I agree about the Apple app(s). If only we had an Apple developer...

But I checked the first 20 on the same link just now: 8 Apple, 10 Windows, 2 Linux/Intel GPU (the same anonymous machine in both cases). You can't blame it all on the Apple app.
ID: 1990676
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1990706 - Posted: 19 Apr 2019, 20:41:07 UTC - in response to Message 1990649.  

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

When it comes to voltages, I only trust a good multimeter, not some software's interpretation of some motherboard hardware of indeterminate accuracy.
Grant
Darwin NT
ID: 1990706
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1990709 - Posted: 19 Apr 2019, 21:27:21 UTC - in response to Message 1990579.  

I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The difference in run_times were hundredths of a second. Likely measurement error.

I think I will experiment with -unroll values of 2 next.


. . As Ian said, Petri stated that for 0.98 (CUDA 10.1) there was negligible difference between unroll 1 and unroll 2, so your data will help clarify that. But that would indicate that higher values are relatively meaningless, as your posted data verifies.

Stephen

. .

The key is what Petri said in this quote.

The pulse find algorithm search stage was completely rewritten. It does not need any buffer for temporary values.
The scan is fully unrolled to all SM units but does not require any memory to store data.


That is why playing around with -pfb and -unroll is fruitless.


Thank You Keith,

You can read "my English".

-pfb and -unroll are needed only when a pulse (or a suspect) is found. That is a rare event; only then is the old code run. On noisy data a larger unroll may help by a second or two, but the likelihood of an error is greater. A noisebomb is a noiseblimb/blumb/blomb, and it is pure chance which data is reported from the very beginning of a data packet. Unroll 1 gives the best equivalent to the CPU version.


Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1990709
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990710 - Posted: 19 Apr 2019, 21:31:49 UTC - in response to Message 1990706.  

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

When it comes to voltages, I only trust a good multimeter and not some software's interpretation of some motherboard hardware of indeterminate accuracy,

Yes, there can be quite a discrepancy between the real voltages on a motherboard and what the BIOS or other monitoring software displays. But it is also dependent on the motherboard's SIO monitoring chip and where the values are probed on the board. If a voltage is monitored right at the source supply, it does not account for the voltage drop after traversing an unknown length of board traces. Different SIO chips have different A-D conversion accuracy depending on whether the analog voltage is sampled at 8 bits or 10 bits, so the accuracy may only be the least significant bit, which may equate to 20 mV or so.

But it can be dicey measuring a voltage on the backside of a cpu socket, since one slip with the meter probe can short things out. Better is a board that has dedicated measuring points in an easily accessible area, like some of the ASUS motherboards. Then again, those measuring points are some distance away from the actual component, so they will show a higher voltage that does not account for the drop across the trace length.
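Keith's point about sampling resolution can be made concrete. A quick sketch (the 3.3 V reference range is illustrative; real SIO chips use various references and scalings) of the smallest voltage step an ideal 8-bit vs 10-bit ADC can resolve:

```shell
# LSB step size of an ideal ADC over a 0-3.3 V sensing range (illustrative).
awk 'BEGIN {
  vref = 3.3
  printf "8-bit LSB:  %.2f mV\n", vref / 256  * 1000   # 2^8 steps
  printf "10-bit LSB: %.2f mV\n", vref / 1024 * 1000   # 2^10 steps
}'
```

So at 8 bits a reading is only good to roughly +/-13 mV even before board-trace drops and reference-voltage error are considered.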
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990710
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990713 - Posted: 19 Apr 2019, 21:38:33 UTC - in response to Message 1990662.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom


I have two "inconclusives" for today. But they may have been in the pipeline before I made the changes. If I get a couple more with tomorrow's date, I will try increasing the cpu voltage a tiny bit.

"I taught I saw a puddy cat, I did, I did, I did (Picture of Tweedy Bird from your memory)"

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990713
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990714 - Posted: 19 Apr 2019, 21:39:01 UTC - in response to Message 1990709.  

Thank You Keith,

You can read "my English".

-pfb and -unroll are needed only when a pulse (or a suspect) is found. That is a rare event; only then is the old code run. On noisy data a larger unroll may help by a second or two, but the likelihood of an error is greater. A noisebomb is a noiseblimb/blumb/blomb, and it is pure chance which data is reported from the very beginning of a data packet. Unroll 1 gives the best equivalent to the CPU version.


Petri

Yes, thank you for the further explanation of when the parameters are actually used. My takeaway on the new app is that it completely eliminates invalids on short or late overflows, which were caused by the difference in how the stock cpu and gpu SoG apps count the pulses in the noise bombs. That is GREATLY appreciated.

Good job, Petri.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990714
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990727 - Posted: 19 Apr 2019, 22:44:15 UTC - in response to Message 1990676.  

I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
I agree about the Apple app(s). If only we had an Apple developer...
But I checked the first 20 on the same link just now: 8 Apple, 10 Windows, 2 Linux/Intel GPU (the same anonymous machine in both cases). You can't blame it all on the Apple app.


. . Seems like a roughly 50/50 split so far: 8 apple/darwin and 1 dud Linux host against 10 Windows. Considering the much higher number of Windows users, that weights it pretty heavily against the apple/darwin subset. I wonder, though, where someone might find an Apple developer?

Stephen

:)
ID: 1990727
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990729 - Posted: 19 Apr 2019, 23:09:48 UTC

The AMD 2700 went temporarily south. In the process of reverting almost all the way back to default, the gpus started doing the lovely "task postponed" song again.
I just backgraded to Nvidia driver 410 and to the previous version of the all-in-one.

Everything seems to be happy again.

Tom X <- crossed fingers
A proud member of the OFA (Old Farts Association).
ID: 1990729
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990792 - Posted: 20 Apr 2019, 11:34:59 UTC

My Intel box has run through a string of "time limit exceeded"/"aborted by user" gpu tasks here lately.
https://setiathome.berkeley.edu/results.php?hostid=8676008&offset=0&show_names=0&state=6&appid=

I know I added one more used gtx 1060 3GB a couple of days ago. Otherwise I am clueless. Anyone got an idea? Yes, I know I can backgrade to the "90" gpu app via an app_info.xml change. I am trying to decide if I have hardware dying or something else.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990792



 
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.