High performance Linux clients at SETI
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer. Memory issues, or more likely Power Supply issues with the increased load of the new application? Grant Darwin NT |
W3Perl Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606 |
The difference between SoG and petri's software is so impressive! We need to compare the same Arecibo WU; the latest ones I have received are computed in 88-90 sec. About unroll, I used to change the value according to the GPU RAM available. The lower the unroll was set, the less GPU RAM was needed. This is not needed anymore as the memory footprint has been reduced to the minimum level. Does anyone know what the meaning of pfb is? It may be useful to understand why some users show improvements and others don't. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer. I am running a 1600 PSU on both systems. A fresh "nvidia-smi" shows none of the gpus are pegging their power limits or running too hot. I have a few BIOS settings that I can return to "auto" on this system. I was trying out a faster RAM speed setting and some manually controlled memory interleaving. I have reset those to their default values of "auto". I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software-based instrumentation on this Intel box, it certainly is possible I can't "see" it. The gpus are being run at "stock" settings. I have shown myself incompetent to "tune" them. Tom A proud member of the OFA (Old Farts Association). |
W3Perl Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606 |
What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer. It's quite easy to get 'inconclusive' with overclocked RAM. If you want to check that your overclocking is safe, try compressing and then uncompressing a large file (hundreds of megabytes). If you get errors, it's time to decrease the clock settings! |
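A minimal sketch of the compress/uncompress check suggested above, assuming Python 3 and doing the round trip in memory instead of on a real file. The function name `roundtrip_ok` and the default sizes are illustrative choices, not part of any SETI tool. A clean pass is only a quick sanity check and does not prove the RAM is stable.

```python
import hashlib
import os
import zlib


def roundtrip_ok(size_mb: int = 256, chunk_mb: int = 16) -> bool:
    """Compress then decompress test data chunk by chunk and compare SHA-256 digests."""
    original = hashlib.sha256()
    restored = hashlib.sha256()
    compressor = zlib.compressobj(level=6)
    decompressor = zlib.decompressobj()
    half = chunk_mb * 512 * 1024  # half a chunk, in bytes
    for _ in range(size_mb // chunk_mb):
        # Half random bytes, half zeros, so the compressor does real work.
        chunk = os.urandom(half) + bytes(half)
        original.update(chunk)
        packed = compressor.compress(chunk)
        restored.update(decompressor.decompress(packed))
    # Push out whatever is still buffered on both sides.
    restored.update(decompressor.decompress(compressor.flush()))
    restored.update(decompressor.flush())
    return original.digest() == restored.digest()
```

Calling `roundtrip_ok()` a few times while the machine is under its normal crunching load is the idea; a `False` result or a `zlib.error` exception would be a hint to back off the memory clocks.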
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z" which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted. So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock causes the issues to go away. Tom A proud member of the OFA (Old Farts Association). |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Does anyone know what the meaning of pfb is? It may be useful to understand why some users show improvements and others don't. https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/PetriR_0.97/client/confsettings.cpp#L73 It is the exact same setting as in the Windows CUDA app: if (gCudaDevProps.major < 2) fprintf(stderr,"pulsefind: blocks per SM %d %s\n", pfBlocksPerSM, (pfBlocksPerSM == def) ? "(Pre-Fermi default) If you look at the same cpp in XBranch you'll see it is the same. I never got those settings to do anything in Windows, and I don't think anyone else had success either. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high. I always discount any task crunched by apple-darwin hosts of which the majority are the reason for any inconclusive. That app should be removed in my opinion. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Does anyone know what the meaning of pfb is? It may be useful to understand why some users show improvements and others don't. I just answered this yesterday for somebody else. It is a throwback to the x41zi days with CUDA 42 and CUDA 50. pfblockspersm = (1-16): pulse finding blocks per SM. It has to do with the FFT setup for the compute kernel. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you are doing anything other than JEDEC 2133 or a stock XMP setup for the RAM, you always need to test for memory errors. You can run memtest86, Google's stressapptest or Prime95 to check for errors. I would state that if you can pass an hour of stressapptest using 90% of your memory and pass an hour of small 8K-24K FFT Prime95 with no errors detected, then you are both cpu and gpu stable enough to run Seti. To install stressapptest: sudo apt install stressapptest. Prime95 is available at https://www.mersenne.org/download/ Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
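A rough sketch of how the "one hour at 90% of memory" stressapptest run could be put together, assuming a Linux box where `/proc/meminfo` reports `MemTotal` in kB. The helper name `stressapptest_command` is made up for illustration, and sizing against `MemTotal` (not the currently free memory) is an assumption about what "90% of your memory" means. `-M` (megabytes to test) and `-s` (seconds to run) are standard stressapptest options.

```python
def stressapptest_command(meminfo_text: str, percent: int = 90, seconds: int = 3600) -> list:
    """Build a stressapptest command line that tests `percent` of total RAM."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])  # /proc/meminfo reports kB
            break
    else:
        raise ValueError("MemTotal not found in meminfo text")
    # Integer arithmetic only: percentage of total, then kB -> MB.
    megabytes = total_kb * percent // 100 // 1024
    return ["stressapptest", "-M", str(megabytes), "-s", str(seconds)]
```

Feeding it the contents of `/proc/meminfo` and joining the result with spaces gives a command to paste into a terminal. If the machine starts swapping heavily, a lower percentage is probably the safer choice.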
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high. . . When I looked it was at 206 total but only 100 were from the current period (i.e. the same time frame as the valid tasks shown) which with 5500 valid tasks is an inconclusive rate of 1.7%. Not where we would all want it but not in any way terrible. . . With 100 inc's to check I didn't bother looking at the wingmen, so apple-darwin hosts could be the culprits in many of those cases. Stephen . . |
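For anyone wanting to redo the percentage on their own host, here is a small sketch of the arithmetic, assuming the rate is taken as inconclusives over inconclusives plus valids for the same period. That choice of denominator is an assumption; dividing by valid tasks alone gives a slightly higher number. With the 100 and 5500 quoted above, both versions land a little under 2%.

```python
def inconclusive_rate(inconclusive: int, valid: int) -> float:
    """Percentage of tasks in the period that are still inconclusive."""
    return 100.0 * inconclusive / (inconclusive + valid)


# Figures quoted in the post above: 100 inconclusive, 5500 valid.
rate = inconclusive_rate(100, 5500)
print(f"{rate:.2f}%")
```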
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I agree about the Apple app(s). If only we had an Apple developer... I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high. I always discount any task crunched by apple-darwin hosts of which the majority are the reason for any inconclusive. That app should be removed in my opinion. But I checked the first 20 on the same link just now: 8 Apple, 10 Windows, 2 Linux/Intel GPU (the same anonymous machine in both cases). You can't blame it all on the Apple app. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it. When it comes to voltages, I only trust a good multimeter and not some software's interpretation of some motherboard hardware of indeterminate accuracy. Grant Darwin NT |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The differences in run times were hundredths of a second. Likely measurement error. Thank You Keith, You can read "my English". -pfb and -unroll are needed only when a pulse (or a suspect) is found. That is a rare event. Only then is the old code run. On noisy data a larger unroll may help by a second or two, but the likelihood of an error is bigger. A noisebomb is a noiseblimb/blumb/blomb and it is pure chance which data is reported from the very beginning of a data packet. Unroll 1 gives the best equivalent to the CPU version. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it. Yes, there can be quite the discrepancy between the real voltages on a motherboard and what the BIOS or other monitoring software displays. But it is also dependent on the motherboard SIO monitoring chip and where the values are probed on the motherboard. If a voltage being monitored is just at the source supply, it does not account for voltage drop after traversing an unknown length of board traces. Different SIO chips have different A-D conversion accuracy depending on whether the analog voltage is sampled at 8 bits or 10 bits. So accuracy may only be to the least significant bit, which may equate to 20 mV or so. But it can be dicey measuring a voltage on the backside of a cpu socket since one slip with the meter probe can short things out. Better is a board that has dedicated measuring points in an easily accessible area like some of the ASUS motherboards. Then again, those measuring points are a distance away from the actual component, so they will show a higher voltage that does not account for voltage drop across the trace length. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
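The "20 mV or so" figure can be sanity-checked with a little arithmetic. This is a sketch assuming an ideal converter where one least significant bit equals the full-scale range divided by 2^bits. The 5120 mV full-scale range is a hypothetical value picked so that 8 bits works out to 20 mV; it is not the spec of any particular SIO chip.

```python
def lsb_millivolts(full_scale_mv: int, bits: int) -> float:
    """Size of one ADC step in millivolts for an ideal converter."""
    return full_scale_mv / (2 ** bits)


# Hypothetical 5120 mV input range, sampled at 8 and at 10 bits.
for bits in (8, 10):
    print(bits, "bits:", lsb_millivolts(5120, bits), "mV per step")
```

Under that assumption, going from 8 to 10 bits shrinks the step by a factor of four, which is why two boards can report noticeably different readings for the same rail.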
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I have two "inconclusives" for today. But they may have been in the pipeline before I made the changes. If I get a couple more on tomorrow's date, I will try increasing the cpu voltage a tiny bit. "I taught I saw a puddy cat, I did, I did, I did (Picture of Tweety Bird from your memory)" Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thank You Keith, Yes, thank you for the further explanation of when the parameters are actually used. My take on the new app is a complete elimination of invalids on short or late overflows because of the difference in methods of counting the pulses in the noise bombs from the stock cpu and gpu SoG apps. That is GREATLY appreciated. Good job, Petri. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I always discount any task crunched by apple-darwin hosts of which the majority are the reason for any inconclusive. That app should be removed in my opinion. I agree about the Apple app(s). If only we had an Apple developer... . . Seems like a roughly 50/50 split so far, 8 apple/darwin and 1 dud Linux host against 10 windows. Considering the much higher number of Windows users that weights it pretty heavily against the apple/darwin subset. I wonder though where someone might find an apple developer? Stephen :) |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
The AMD 2700 went temporarily south. In the process of reverting almost all the way back to default, the gpus started doing the lovely "task postponed" song again. I just backgraded to Nvidia driver 4.10 and to the previous version of the all-in-one. Everything seems to be happy again. Tom X <- crossed fingers A proud member of the OFA (Old Farts Association). |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
My Intel box has run through a string of "time limit exceeded"/"aborted by user" gpu tasks here lately. https://setiathome.berkeley.edu/results.php?hostid=8676008&offset=0&show_names=0&state=6&appid= I know I added one more used gtx 1060 3GB a couple of days ago. Otherwise I am clueless. Anyone got an idea? Yes, I know I can backgrade to the "90" gpu app via an app_info.xml change. I am trying to decide whether I have dying hardware or something else. Tom A proud member of the OFA (Old Farts Association). |