High performance Linux clients at SETI

Message boards : Number crunching : High performance Linux clients at SETI

Previous · 1 . . . 7 · 8 · 9 · 10 · 11 · 12 · 13 . . . 20 · Next

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990621 - Posted: 19 Apr 2019, 8:35:19 UTC - in response to Message 1990613.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.
ID: 1990621
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1990626 - Posted: 19 Apr 2019, 9:33:34 UTC - in response to Message 1990613.  

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?
Grant
Darwin NT
ID: 1990626
W3Perl Project Donor
Volunteer tester

Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1990645 - Posted: 19 Apr 2019, 13:26:35 UTC - in response to Message 1990565.  

. . Hi Laurent,

. . Those numbers surprised me, I have better times on Arecibo tasks than Blc(32) on all 4 boxes and GPU types (GTX1050, 1050ti, 970 and 1060), but I am still running v0.97 on the 2 Linux boxes and SoG on the Windows boxes (the x2 indicates running 2 concurrent tasks). Perhaps where you say Arecibo they are VLAR tasks?

1050 (SoG x 2) : Arecibo => 20 to 21 mins : Blc32 => 28 to 29 mins
1060 (SoG x 2) : Arecibo => 11 to 13 mins : Blc32 => 15 to 17 mins

1050ti (0.97) : Arecibo => 235 to 245 secs : Blc32 => 260 to 265 secs
970 . . . (0.97) : Arecibo => 135 to 140 secs : Blc32 => 160 to 165 secs

Stephen

? ?


The difference between SoG and petri's software is so impressive! We need to compare the same Arecibo WU; the latest ones I have received are computed in 88-90 sec.
About unroll: I used to change the value according to the GPU RAM available. The lower the unroll was set, the less GPU RAM was needed. This is no longer needed, as the memory footprint has been reduced to the minimum.

Does anyone know the meaning of pfb? It may be useful for understanding why some users show improvements and others don't.
ID: 1990645
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990649 - Posted: 19 Apr 2019, 14:16:12 UTC - in response to Message 1990626.  
Last modified: 19 Apr 2019, 14:35:51 UTC

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?


I am running a 1600 PSU on both systems. A fresh "nvidia-smi" shows none of the gpus are pegging their power limits. Or running too hot.

I have a few BIOS settings that I can return to "auto" on this system. I was trying out a faster RAM speed setting and some manually controlled memory interleaving. I have reset those to their default values of "auto".

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

The gpus are being run at "stock" settings. I have shown myself incompetent to "tune" them.
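For reference, that kind of spot check can be scripted instead of eyeballing the full nvidia-smi output. A sketch (the 95% threshold is an arbitrary choice of mine, not anything Tom used):

```shell
# Hypothetical spot check: flag any GPU drawing more than 95% of its power limit.
# With --format=nounits, power.draw and power.limit are plain numbers in watts.
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits |
awk -F', ' '{ if ($2 > $3 * 0.95) print "GPU " $1 " near power limit: " $2 "/" $3 " W" }'
```

No output means no GPU is pegging its limit at that instant; repeating it under load gives a better picture.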


Tom
A proud member of the OFA (Old Farts Association).
ID: 1990649
W3Perl Project Donor
Volunteer tester

Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1990657 - Posted: 19 Apr 2019, 14:41:48 UTC - in response to Message 1990649.  

What has the gpu app to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives you need to back down settings on both parts of the computer.

Memory issues, or more likely Power Supply issues with the increased load of the new application?


I am running a 1600 PSU on both systems. A fresh "nvidia-smi" shows none of the gpus are pegging their power limits. Or running too hot.

I have a few BIOS settings that I can return to "auto" on this system. I was trying out a faster RAM speed setting and some manually controlled memory interleaving. I have reset those to their default values of "auto".

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

The gpus are being run at "stock" settings. I have shown myself incompetent to "tune" them.


Tom


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!
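That round trip can be scripted end to end; a sketch (the 300 MB size and /tmp paths are arbitrary, and a checksum comparison is added so a corruption shows up mechanically rather than by eye):

```shell
# Sketch of the RAM sanity check: compress and decompress a few hundred MB,
# then verify the round trip with a checksum. A mismatch (or a gzip CRC
# error) suggests backing off the memory clocks.
head -c 300M /dev/urandom > /tmp/ramtest.bin
sha256sum /tmp/ramtest.bin > /tmp/ramtest.sha
gzip -f /tmp/ramtest.bin          # replaces the file with /tmp/ramtest.bin.gz
gunzip /tmp/ramtest.bin.gz        # decompress back to /tmp/ramtest.bin
sha256sum -c /tmp/ramtest.sha     # "OK" means the round trip was clean
```

Running it a few times back to back stresses the RAM more consistently than a single pass.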
ID: 1990657
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990662 - Posted: 19 Apr 2019, 14:58:13 UTC - in response to Message 1990657.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990662
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1990663 - Posted: 19 Apr 2019, 15:00:23 UTC - in response to Message 1990645.  

Does anyone know what is the meaning of pfb ? It may useful to understand why some users show improvements and others dont.

https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/PetriR_0.97/client/confsettings.cpp#L73
It is the exact same setting as in the Windows CUDA app:

if (gCudaDevProps.major < 2) fprintf(stderr,"pulsefind: blocks per SM %d %s\n", pfBlocksPerSM, (pfBlocksPerSM == def) ? "(Pre-Fermi default)" ...

If you look at the same cpp in XBranch you'll see it is the same. I never got those settings to do anything in Windows, and I don't think anyone else had success either.
ID: 1990663
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990668 - Posted: 19 Apr 2019, 15:20:13 UTC - in response to Message 1990621.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.

I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990668
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990669 - Posted: 19 Apr 2019, 15:23:21 UTC - in response to Message 1990645.  

Does anyone know what is the meaning of pfb ? It may useful to understand why some users show improvements and others dont.

I just answered this yesterday for somebody else. It's a throwback to the x41zi days with the CUDA 42 and CUDA 50 apps.

pfblockspersm = (1-16) pulse finding blocks per SM
pfperiodsperlaunch = (1-1000) pulse finding periods per launch


Has to do with the FFT setup for the compute kernel.
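For anyone wondering where those parameters are actually set with the anonymous-platform apps, they are passed as command-line flags. A hypothetical app_info.xml fragment (the app name, version number, and flag values shown are illustrative only, not recommendations):

```xml
<!-- Hypothetical fragment: flags for the special app go in <cmdline>
     inside the matching <app_version> of app_info.xml. -->
<app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>810</version_num>
    <cmdline>-unroll 2 -pfb 8</cmdline>
</app_version>
```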
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990669
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990671 - Posted: 19 Apr 2019, 15:30:52 UTC - in response to Message 1990662.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom

If you are doing anything other than JEDEC 2133 or stock XMP setup for the RAM, you need to always test for memory errors. You can run memtest86, Google's stressapptest, or Prime95 to check for errors. I would say that if you can pass an hour of stressapptest using 90% of your memory, and an hour of small 8K-24K FFT Prime95 with no errors detected, then you are both cpu and gpu stable enough to run Seti.

sudo apt install stressapptest

https://www.mersenne.org/download/
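A sketch of sizing that stressapptest run (the 90%-of-RAM figure is Keith's rule of thumb; -s, -M, and -W are stressapptest's documented seconds, megabytes, and more-stressful-copy options):

```shell
# Compute ~90% of total RAM in MB from /proc/meminfo, then hand it to
# stressapptest for a one-hour run.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_mb=$(( mem_kb * 90 / 100 / 1024 ))
echo "Testing ${mem_mb} MB for one hour"
# Uncomment once stressapptest is installed:
# stressapptest -s 3600 -M "$mem_mb" -W
```

A non-zero exit status or any "Hardware Error" lines in the output mean the memory settings are not stable.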
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990671
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990673 - Posted: 19 Apr 2019, 16:00:35 UTC - in response to Message 1990621.  
Last modified: 19 Apr 2019, 16:22:28 UTC

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.


. . When I looked it was at 206 total but only 100 were from the current period (i.e. the same time frame as the valid tasks shown) which with 5500 valid tasks is an inconclusive rate of 1.7%. Not where we would all want it but not in any way terrible.

. . With 100 inc's to check I didn't bother looking at the wingmen, so apple-darwin hosts could be the culprits in many of those cases.

Stephen

. .
ID: 1990673
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1990676 - Posted: 19 Apr 2019, 16:29:21 UTC - in response to Message 1990668.  

I did a spot check on Validation inconclusive tasks for computer 8676008. 207 total at the time of writing: 14 on CPU, and 193 on NVidia (Cuda 10.1 special). It's still a low proportion of the huge throughput, but I'd still say it was too high.
I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
I agree about the Apple app(s). If only we had an Apple developer...

But I checked the first 20 on the same link just now: 8 Apple, 10 Windows, 2 Linux/Intel GPU (the same anonymous machine in both cases). You can't blame it all on the Apple app.
ID: 1990676
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1990706 - Posted: 19 Apr 2019, 20:41:07 UTC - in response to Message 1990649.  

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

When it comes to voltages, I only trust a good multimeter, not some software's interpretation of some motherboard hardware of indeterminate accuracy.
Grant
Darwin NT
ID: 1990706
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1990709 - Posted: 19 Apr 2019, 21:27:21 UTC - in response to Message 1990579.  

I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The difference in run_times were hundredths of a second. Likely measurement error.

I think I will experiment with -unroll values of 2 next.


. . As Ian said, Petri stated that for 0.98 (CUDA 10.1) there was negligible difference between unroll 1 and unroll 2, so your data will help clarify that. But that would indicate that higher values are relatively meaningless, as your posted data verifies.

Stephen

. .

The key is what Petri said in this quote.

The pulse find algorithm search stage was completely rewritten. It does not need any buffer for temporary values.
The scan is fully unrolled to all SM units but does not require any memory to store data.


That is why playing around with -pfb and -unroll is fruitless.


Thank You Keith,

You can read "my English".

-pfb and -unroll are needed only when a pulse (or a suspect) is found. That is a rare event; only then is the old code run. On noisy data a larger unroll may help by a second or two, but the likelihood of an error is greater. A noisebomb is a noiseblimb/blumb/blomb, and it is pure chance which data is reported from the very beginning of a data packet. Unroll 1 gives the best equivalent to the CPU version.


Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1990709
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990710 - Posted: 19 Apr 2019, 21:31:49 UTC - in response to Message 1990706.  

I find it hard to believe that I don't have enough "power." But since I have had trouble getting good software based instrumentation on this Intel box it certainly is possible I can't "see" it.

When it comes to voltages, I only trust a good multimeter and not some software's interpretation of some motherboard hardware of indeterminate accuracy,

Yes, there can be quite a discrepancy between the real voltages on a motherboard and what the BIOS or other monitoring software displays. But it is also dependent on the motherboard's SIO monitoring chip and where the values are probed on the board. If a voltage is monitored right at the source supply, it does not account for the voltage drop after traversing an unknown length of board traces. Different SIO chips have different A-D conversion accuracy depending on whether the analog voltage is sampled at 8 bits or 10 bits, so the accuracy may only be the least significant bit, which may equate to 20 mV or so.

But it can be dicey measuring a voltage on the backside of a cpu socket, since one slip with the meter probe can short things out. Better is a board that has dedicated measuring points in an easily accessible area, like some of the ASUS motherboards. Then again, those measuring points are some distance away from the actual component, so they will show a higher voltage that does not account for the drop across the trace length.
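Keith's point about sampling resolution can be made concrete. A quick sketch (the 3.3 V reference range is illustrative; real SIO chips use various references and scalings) of the smallest voltage step an ideal 8-bit vs 10-bit ADC can resolve:

```shell
# LSB step size of an ideal ADC over a 0-3.3 V sensing range (illustrative).
awk 'BEGIN {
  vref = 3.3
  printf "8-bit LSB:  %.2f mV\n", vref / 256  * 1000   # 2^8 steps
  printf "10-bit LSB: %.2f mV\n", vref / 1024 * 1000   # 2^10 steps
}'
```

So at 8 bits a reading is only good to roughly +/-13 mV even before board-trace drops and reference-voltage error are considered.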
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990710
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990713 - Posted: 19 Apr 2019, 21:38:33 UTC - in response to Message 1990662.  


It's quite easy to get 'inconclusive' with overclocked ram.
If you want to check that your overclock is safe, try compressing and then uncompressing a large file (a few hundred megabytes). If you get errors, it's time to decrease the clock settings!


That might be the issue. I fired up an Intel utility that Keith introduced me to called "i7z", which allows me to "see" a lot of what my cpus are doing. For the first time in my memory, the cpus were actually doing "C0" halts. Previously they never halted.

So I re-enabled the C3/C6 and the "halt limit" parameters in the BIOS and looked again. Yup, they appear to be halting even a little more. I am going back to my previous settings, since I want to see if running the memory completely stock makes the issues go away.

Tom


I have two "inconclusives" for today. But they may have been in the pipeline before I made the changes. If I get a couple more with tomorrow's date, I will try increasing the cpu voltage a tiny bit.

"I taught I saw a puddy cat, I did, I did, I did (Picture of Tweedy Bird from your memory)"

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990713
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990714 - Posted: 19 Apr 2019, 21:39:01 UTC - in response to Message 1990709.  

Thank You Keith,

You can read "my English".

-pfb and -unroll are needed only when a pulse (or a suspect) is found. That is a rare event; only then is the old code run. On noisy data a larger unroll may help by a second or two, but the likelihood of an error is greater. A noisebomb is a noiseblimb/blumb/blomb, and it is pure chance which data is reported from the very beginning of a data packet. Unroll 1 gives the best equivalent to the CPU version.


Petri

Yes, thank you for the further explanation of when the parameters are actually used. My takeaway on the new app is that it completely eliminates invalids on short or late overflows, which were caused by the difference in how the stock cpu and gpu SoG apps count the pulses in the noise bombs. That is GREATLY appreciated.

Good job, Petri.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990714
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990727 - Posted: 19 Apr 2019, 22:44:15 UTC - in response to Message 1990676.  

I always discount any task crunched by apple-darwin hosts; they account for the majority of the inconclusives. That app should be removed, in my opinion.
I agree about the Apple app(s). If only we had an Apple developer...
But I checked the first 20 on the same link just now: 8 Apple, 10 Windows, 2 Linux/Intel GPU (the same anonymous machine in both cases). You can't blame it all on the Apple app.


. . Seems like a roughly 50/50 split so far: 8 apple/darwin and 1 dud Linux host against 10 Windows. Considering the much higher number of Windows users, that weights it pretty heavily against the apple/darwin subset. I wonder, though, where someone might find an Apple developer?

Stephen

:)
ID: 1990727
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990729 - Posted: 19 Apr 2019, 23:09:48 UTC

The AMD 2700 went temporarily south. In the process of reverting almost all the way back to default, the gpus started doing the lovely "task postponed" song again.
I just backgraded to Nvidia driver 410 and to the previous version of the all-in-one.

Everything seems to be happy again.

Tom X <- crossed fingers
A proud member of the OFA (Old Farts Association).
ID: 1990729
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990792 - Posted: 20 Apr 2019, 11:34:59 UTC

My Intel box has run through a string of "time limit exceeded"/"aborted by user" gpu tasks here lately.
https://setiathome.berkeley.edu/results.php?hostid=8676008&offset=0&show_names=0&state=6&appid=

I know I added one more used gtx 1060 3GB a couple of days ago. Otherwise I am clueless. Anyone got an idea? Yes, I know I can backgrade to the "90" gpu app via an app_info.xml change. I am trying to decide if I have hardware dying or something else.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990792



 
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.