SETIv8 for Linux skylake-avx512 available

Message boards : Number crunching : SETIv8 for Linux skylake-avx512 available
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1938388 - Posted: 6 Jun 2018, 7:51:32 UTC - in response to Message 1938351.  

Despite the name, "branches/sah_v7_opt/AKv8/client" is probably the one, but take advice.

Yep, for an optimized CPU MBv8 build it's the right choice.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1938388 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1938389 - Posted: 6 Jun 2018, 7:53:44 UTC - in response to Message 1938353.  

Thanks, I've tried compiling that version with no success. There are multiple errors on Linux; it appears to be meant for Windows. Any other advice would be greatly appreciated.


These are cross-platform sources. Look for the configure lines in the repo. It's known to be buildable on Linux (for x86), and that was checked on quite recent revisions.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1938389 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1938390 - Posted: 6 Jun 2018, 7:55:42 UTC - in response to Message 1938356.  


All that work was done by Raistmer - he'll probably discover this thread in the morning. I'm on UK time, so I'll sign off for tonight.

:D ;D
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1938390 · Report as offensive
luz

Joined: 3 Jun 18
Posts: 18
Credit: 11,853
RAC: 0
Message 1938396 - Posted: 6 Jun 2018, 10:20:23 UTC - in response to Message 1938375.  

OK, I'm replacing the compiled app with the stock one now. I'll report results once it has done some work.
ID: 1938396 · Report as offensive
luz

Joined: 3 Jun 18
Posts: 18
Credit: 11,853
RAC: 0
Message 1938399 - Posted: 6 Jun 2018, 10:29:20 UTC - in response to Message 1938396.  

From the BOINC Manager estimated completion times, I'm seeing that the stock app runs about 4 times as fast, completing 16 tasks an hour. What could be the problem? I assumed that compiling for skylake-avx512 would speed it up, not slow it down. I would really appreciate any help. I'll look into ./configure options, maybe the default ./configure is no good.
ID: 1938399 · Report as offensive
luz

Joined: 3 Jun 18
Posts: 18
Credit: 11,853
RAC: 0
Message 1938401 - Posted: 6 Jun 2018, 10:36:08 UTC - in response to Message 1938399.  

I see that ./configure includes --enable-[instruction] options for various instruction sets. I'd assumed those were enabled by default, but I'm now compiling with explicit directives. Hopefully this will solve the issue.
ID: 1938401 · Report as offensive
luz

Joined: 3 Jun 18
Posts: 18
Credit: 11,853
RAC: 0
Message 1938402 - Posted: 6 Jun 2018, 10:45:24 UTC - in response to Message 1938401.  

I've built configuring with all the instructions enabled and editing the makefiles so that the only -march that appears in the flags is skylake-avx512 (./configure adds various -march flags automatically), but the estimated completion times are still long. I'll let it run and see what happens, maybe the estimates are wrong and will improve after completing units.
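Before spending more time on flags, it may be worth confirming which SIMD extensions the CPU actually advertises. The following is only a sketch, assuming an x86 Linux host where /proc/cpuinfo has a "flags" line; the function names are made up for illustration and are not part of the SETI sources or of ./configure. It says nothing about whether a given build will be faster, only whether the instructions are available.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported(cpuinfo_text, wanted=("sse4_1", "sse4_2", "avx", "avx2", "avx512f")):
    """Map each wanted flag name to True/False depending on whether the CPU lists it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for name, ok in supported(text).items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If avx512f shows "no", a binary built with -march=skylake-avx512 should not be expected to run correctly on that machine at all.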
ID: 1938402 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1938403 - Posted: 6 Jun 2018, 10:47:52 UTC - in response to Message 1938402.  

maybe the estimates are wrong and will improve after completing units.
Yes, estimates are useless for testing purposes - they know nothing about the performance of your new app.
ID: 1938403 · Report as offensive
luz

Joined: 3 Jun 18
Posts: 18
Credit: 11,853
RAC: 0
Message 1938404 - Posted: 6 Jun 2018, 10:49:10 UTC - in response to Message 1938403.  

Good to know, I'll look out for run times of completed units.
ID: 1938404 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22204
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1938472 - Posted: 6 Jun 2018, 18:03:58 UTC - in response to Message 1938402.  

As Richard says, the "estimates" are, initially, little more than guesswork and of no real use for what you are trying to do.

There are two places to see how the times are panning out:
First is your "Pending" list of tasks - these are tasks that have been completed, have not errored out and have not yet been validated.
Second is your "Valid" tasks list - this will be a few hours out of date, and only contains tasks that have validated.

In case you don't know where to find these two invaluable lists:
Pending list:
https://setiathome.berkeley.edu/results.php?hostid=8524703&offset=0&show_names=0&state=2&appid=

Valid list:
https://setiathome.berkeley.edu/results.php?hostid=8524703&offset=0&show_names=0&state=4&appid=
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1938472 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22204
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1938605 - Posted: 7 Jun 2018, 16:56:57 UTC

If you want to compare your computer with "something comparable", I've set one of mine, with an 8-core/16-thread Ryzen 1700 under Mint, to do CPU-only tasks for a bit - you can see its progress here:
https://setiathome.berkeley.edu/results.php?hostid=8317875
A quick look at the times on your computer and mine shows that your execution times are between 3 and 5 times mine - which means I'm returning between 1.5 and 2.5 times as many tasks per hour as you are. There is obviously something amiss with your i9 :-(
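The step from "3 to 5 times the run time" to "1.5 to 2.5 times the tasks per hour" depends on how many tasks each host runs at once. The figures line up if the slower host runs about twice as many tasks concurrently, which is an assumption here and not something stated in the thread. A rough sketch of the arithmetic, with hypothetical function names:

```python
def tasks_per_hour(concurrent_tasks, seconds_per_task):
    """Tasks returned per hour when `concurrent_tasks` run side by side."""
    return concurrent_tasks * 3600.0 / seconds_per_task


def throughput_ratio(time_ratio, concurrency_ratio):
    """How many times more tasks per hour the faster host returns.

    time_ratio: slow host run time divided by fast host run time.
    concurrency_ratio: slow host concurrent tasks divided by fast host
    concurrent tasks (assumed, not measured).
    """
    return time_ratio / concurrency_ratio


if __name__ == "__main__":
    # Assumed 2:1 concurrency ratio, applied to the 3x and 5x run-time ratios.
    for time_ratio in (3, 5):
        print(time_ratio, throughput_ratio(time_ratio, 2))
```

With equal task counts on both hosts the tasks-per-hour gap would simply equal the run-time gap, so the concurrency figure matters as much as the per-task time.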
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1938605 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1938613 - Posted: 7 Jun 2018, 19:47:33 UTC

The lunatics.kwsn.info benchmark script was a very useful utility to compare stock app run times with the optimized versions. And it would be perfect for comparing your own compiled code since it does a "compare results" stage to catch any catastrophic errors. I did my own benchmarking of the avx, avx2, sse41, and sse42 optimized versions before settling on the sse41 for my own use. The results are dependent upon the internal cpu instruction efficiency so it's necessary to do your own benchmarking on your own cpu. (The benchmark script suspends all other Seti activity, while it runs, so you'll get times for the test app unaffected by thread/cache contention.) My system is a Ryzen 7 1700 very similar to the one @Grant has opened for you. Feel free to browse mine also. At the moment I've scaled back the thread limit to only allow 10 concurrent BOINC tasks, mainly to hold cpu temperatures to the low 70's during the heat of the day. (The 1060 GPU pumps a lot of heat into the case!) My recent experiments with hyper-threading are described in an earlier thread
https://setiathome.berkeley.edu/forum_thread.php?id=82965
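For anyone without the benchmark script to hand, the timing half of the idea can be sketched in a few lines. This is not the Lunatics script: it does not suspend BOINC and it has no "compare results" stage, so it only shows which command finishes first. The commands below are placeholders; substituting the real app binaries and a test workunit is left to the reader.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run `cmd` `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def fastest(candidates, repeats=3):
    """Time every named command and return (name of the quickest, all timings)."""
    timings = {name: time_command(cmd, repeats) for name, cmd in candidates.items()}
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    # Placeholder commands standing in for two builds of the app.
    candidates = {
        "build_a": [sys.executable, "-c", "pass"],
        "build_b": [sys.executable, "-c", "import time; time.sleep(0.2)"],
    }
    winner, timings = fastest(candidates, repeats=2)
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.3f} s")
    print("fastest:", winner)
```

Taking the best of several runs reduces noise from other activity on the machine, but it is no substitute for stopping other work while testing, as the benchmark script does.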

Always happy to see another Linux host on-line! And have fun with your optimizing efforts.
ID: 1938613 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22204
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1938618 - Posted: 7 Jun 2018, 20:16:29 UTC

That's a useful comparison Gene - From memory I'm running the AVX app, which might explain why my times are about double yours.
I'm about due to do the annual dust bunny eviction on that computer as it is starting to get a bit warm over here, and I'll take the opportunity to hook up a monitor and sort out a couple of annoyances. I may even stuff some more RAM into it - currently it's only got 8 GB....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1938618 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1938627 - Posted: 7 Jun 2018, 22:16:08 UTC

Both Juan and myself also did extensive testing of the available CPU apps with the Lunatics benchmarking tools. I too settled on the SSE41 app as best for my Ryzens. Where Juan and I differed was that I found the SSE41 app also ran best and fastest on my i7-6850K system. Juan on the other hand always found the AVX2 app the fastest on his i7-6850K system. We never could figure out why the difference with only very minor differences in cpu and memory clocks and installed RAM.

We both used the Crunchers Anonymous TBar compiled apps for the SSE41 and AVX2 applications. The TBar SSE41 app is compiled differently than the Lunatics SSE41 resource mentioned earlier. The AVX2 app is new, as that SIMD instruction set didn't exist at the time that Urs compiled all the Linux apps at Lunatics.

In general, both the Intel and Ryzen systems do a BLC CPU task in 32 minutes and a standard Arecibo task in around 51 minutes. These are the systems I am referring to:

Linux Intel Host 8480062
Windows10 Host 8030022
Linux Ryzen 1800 Host 8306366

To make it easier to find CPU tasks among all the GPU tasks, here are 3 representative tasks.

http://setiathome.berkeley.edu/result.php?result_name=blc13_2bit_guppi_58185_66975_And_X_0022.12991.1636.21.44.148.vlar_0

http://setiathome.berkeley.edu/result.php?result_name=blc13_2bit_guppi_58185_64974_And_XI_off_0019.15308.0.22.45.83.vlar_1

http://setiathome.berkeley.edu/result.php?result_name=30no17ab.31828.11928.7.34.28.vlar_2
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1938627 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1938720 - Posted: 8 Jun 2018, 17:10:46 UTC

@Keith
I peeked at your host 8306366 and I think our respective cpu run times are reasonably comparable; your 1800X has a higher clock than my 1700 and I'm "mildly" multi-threading, with 10 tasks in 8 cores. We both seem to be using the 8.22r3711 sse41 app.
@rob
I (also) peeked at your host 8317875. If/when you do your dust-cleanout routine it might be a convenient time to do some quick benchmarking of the avx app you're running (appears to be the 3345 build) compared to the sse41 (3711 build) that Keith and I are using. If you're multi-threading all 16 potential threads that will "almost double" the cpu run times. It does increase the overall throughput but the side effect is that each task's run time is longer. (When the kernel allocates a task to a thread it has no way to measure the in-core delays for resource contention and has to assume elapsed time = cpu time.) Making allowance for all these effects it does seem your run times, about 7200 seconds, are longer than I would expect. --Another thought... is the cpu temp high enough to induce automatic clock throttling?? For Linux I use the xsensors package with the k10temp module to monitor the temp - running 69.2 C now with 9 BOINC + Firefox running.
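A crude way to spot clock throttling without extra packages is to look at the per-core "cpu MHz" values in /proc/cpuinfo while the tasks are running. This is only a sketch, assuming an x86 Linux kernel that reports that field; the threshold is a hypothetical value to be replaced with something sensible for the CPU in question, and a low reading on an idle core just reflects normal frequency scaling.

```python
def core_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for each logical CPU, in file order."""
    mhz = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            mhz.append(float(value))
    return mhz


def slow_cores(mhz, floor_mhz):
    """Indices of logical CPUs currently clocked below `floor_mhz`."""
    return [i for i, value in enumerate(mhz) if value < floor_mhz]


if __name__ == "__main__":
    FLOOR_MHZ = 3000.0  # hypothetical threshold, adjust for your CPU
    try:
        with open("/proc/cpuinfo") as f:
            readings = core_mhz(f.read())
    except OSError:
        readings = []  # not Linux, or /proc is unavailable
    print("logical CPUs seen:", len(readings))
    print("below threshold:", slow_cores(readings, FLOOR_MHZ))
```

If most cores sit well below the expected clock while fully loaded and the temperature is high, throttling becomes a plausible explanation for long run times.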

<<not meaning to hijack this thread... hope the OP luz will find useful comments anyway>>
ID: 1938720 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1938762 - Posted: 8 Jun 2018, 23:36:19 UTC

@Gene. I basically run 8 CPU + 3 GPU tasks on my Ryzens. They all support 3 GPU cards. Only the Intel i7-6850K system supports 4 cards. I run 4 + 4 on that system. I prefer to run CPU tasks on the physical cores. It's a bias from my AMD FX systems that I have always had. I know the Ryzens aren't the same and actually can support two threads on each core with each thread having an FPU register to itself, unlike the FX processors. But the mindset carried over to the Ryzens.

I use a script to assign cpu task affinities to the physical cores and leave the HT cores to support the gpu tasks. I think both task types run best that way.
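The general idea of pinning to one logical CPU per physical core can be sketched as below. This is not the script mentioned above, just an illustration under stated assumptions: a Linux host that exposes thread_siblings_list under /sys/devices/system/cpu, and helper names invented for the example.

```python
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0,8' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_thread_per_core(sibling_lists):
    """Pick the lowest-numbered logical CPU from each distinct sibling group."""
    groups = {tuple(parse_cpu_list(s)) for s in sibling_lists}
    return sorted(group[0] for group in groups if group)


def read_sibling_lists(base="/sys/devices/system/cpu"):
    """Read every cpuN/topology/thread_siblings_list found under `base`."""
    lists = []
    for name in os.listdir(base):
        path = os.path.join(base, name, "topology", "thread_siblings_list")
        if name.startswith("cpu") and name[3:].isdigit() and os.path.exists(path):
            with open(path) as f:
                lists.append(f.read())
    return lists


if __name__ == "__main__":
    if os.path.isdir("/sys/devices/system/cpu"):
        print("one logical CPU per core:", one_thread_per_core(read_sibling_lists()))
```

On Linux, the resulting list could then be passed to os.sched_setaffinity(pid, cpus) for each CPU task's process ID, leaving the remaining sibling threads free for the processes that feed the GPUs.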

I too apologize for hijacking the thread. Hopefully luz saw my post about the compiling for OSX/CUDA thread, which is showing some progress with compiling on the newer distributions for Ryzen/Threadripper.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1938762 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.