Lunatics optimization for Ryzen, any plans?
Author | Message |
---|---|
M_M Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8 |
As the title says, are there any plans for this? As I understand it, Ryzen is a pretty different architecture than Intel's, so it would make sense to get an optimized code path for it, especially since there are more and more Ryzen and Threadripper systems out there... |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I sorta doubt it. Our developers are stretched thin as it is. The legacy Intel code path of the current apps is competent enough to keep Ryzen and Threadripper in the conversation. I too wish there were more choices for CPU apps in the Windows environment. There are lots more CPU app choices in the Linux environment. The SSE4.1 app for Linux is head and shoulders faster than the fastest AVX Windows CPU app. I wish I had an SSE4.x app in Windows. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Keith Myers wrote: "The SSE4.1 app for Linux is head and shoulders faster than the fastest AVX Windows cpu app. Wish I had a SSE4.x app in Windows." It's kind of "all who drank vodka in 1850 are dead now, hence vodka was poisoned in 1850" reasoning. The Linux SSE4.1 app is faster (I'll take that for granted) not because it's SSE4.1 rather than AVX but because different compilers were used to build the binaries. There is no SSE4.1 path in the opt CPU app at all. I rechecked that recently while reconstructing the build environment on a new device. And because the VC++ compiler has only SSE2, AVX and AVX2 code generation options, there will be no SSE4.1 binary again. [There will be an SSE3 one though, because there is a hand-coded SSE3 path in the source code that will emit SSE3 machine ops in the binary.] One can attempt to grab an "SSE4.1" FFTW library DLL (if there is such a thing) and use it with the app for a speedup. Or one can use a completely different toolchain to build a real SSE4.1 binary, in the hope that the compiler of that toolchain has a good enough auto-optimizer. But again, there is no separate SSE4.1 path in the opt app (and, as I recall, none in stock either). SETI apps news We're not gonna fight them. We're gonna transcend them. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, I guess my request should be directed at Urs, asking him either to port the AVX, AVX2, SSE41 and SSE42 code to Windows or to loan his machine and whatever compilers it has on it to somebody with the Windows code. There is a complete stock of every CPU variation at Lunatics. The fftwfloat335.7z file is available, and it has all the libfftw library variations to match up with the compiled CPU app. So why does the Lunatics installer have a specific radio button to choose the SSE4.1 app for AMD systems, and even suggest running that for better performance over the stock SSE3 app? I have two basically identical Ryzen systems running at the same frequency or close to it. One is running Linux and the other Windows 10. The Linux system runs the SSE41 app and the Windows 10 system runs the AVX app. BLC25 tasks with standard AR run for 45-48 minutes on the Linux system, and the same BLC25 tasks with standard AR run for 55-62 minutes on the Windows 10 system. [Addendum] I forgot to mention that I was previously using the AVX app on the Linux system after it was built, until someone in the forums suggested the SSE41 app as faster. I tried it, changed to the SSE41 app, and have validated that claim. So I have baselines for each app in Linux. |
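To put the run times quoted in this post into relative terms, here is a small sketch of the arithmetic. The function name `slowdown_percent` is made up for illustration; only the minute ranges come from the post, and the result says nothing about why the two systems differ.

```python
def slowdown_percent(fast_minutes, slow_minutes):
    """Return how much longer the slow run takes, as a percentage of the fast run."""
    return (slow_minutes / fast_minutes - 1.0) * 100.0


# Run-time ranges reported in the post (minutes per BLC25 task).
linux_sse41 = (45, 48)
windows_avx = (55, 62)

# Smallest gap: slowest Linux run against fastest Windows run.
low = slowdown_percent(linux_sse41[1], windows_avx[0])
# Largest gap: fastest Linux run against slowest Windows run.
high = slowdown_percent(linux_sse41[0], windows_avx[1])

print(f"Windows AVX tasks take roughly {low:.0f}% to {high:.0f}% longer")
```

With the quoted ranges this works out to roughly 15% to 38% longer on the Windows AVX system.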
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hm. Well, today is a day of repetition, it seems. OK, I'll repeat: there is no SSE4.1 code path in the opt app. New statement: the processing code for Linux and Windows is the same. Code paths differ only in OS-specific details. Repeat: the difference comes from the toolchains. New statement: on Linux Urs used GCC, AFAIK; on Windows I used VC++. So, try to find a volunteer to cross-compile with GCC under Windows. Or try to find a volunteer to restore the MinGW config Joe Segur used for the previous family of opt apps. They definitely were faster (at that time a direct comparison was possible), so it is something worth doing. Regarding the Lunatics installer: most probably it's just a leftover from those times. There was an SSE4.1 SETI7 app. BTW, while you're in a testing mood, try running the SSE3 app under Linux. How big is the difference in speed, if any? To be precise, this one http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=481 over this one http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=482 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And regarding the thread title: what makes Ryzen different? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, let me be clear: I am not a developer and do not have the skills or software necessary to compile apps. I still do not understand your statement that all the apps use the same code. Did Urs simply create one file, the optimized SSE3 app, and then make four more exact copies of the file and give them different names indicating that they use a different SSE code path? What would the purpose be? Why do the SSE2, SSSE3, SSE41 and SSE42 r3306 apps exist at all, then? I do see that all the r3306 apps have the same file size. Is there truth in your statement that all the apps are the same? I do see that the r3345 AVX app has a different file size, so I would conclude it DOES have different code in it compared to the r3306 apps. This is the original app I was running after I built the Linux system back in July. I ran it exclusively up until the third week of October, when I switched to the SSE41 app. On my system, the SSE41 app is indisputably faster than the AVX app. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer wrote: "And regarding thread title - what makes Ryzen to be different?" It is a completely new and different architecture from the previous FX processors and also differs in many respects from current Intel architecture. The entire gaming industry is having to learn how to code for Ryzen now. They have learned that the Intel code pathways they have been using for the past ten years are sub-optimal for the Ryzen architecture, and that if no changes are made to their games, a huge amount of performance is left on the table. Read the very good synopsis of the Ryzen architecture at PC Perspective: AMD-Ryzen-7-1800X-Review-Now-and-Zen |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer wrote: "OK, I'll repeat - there is no SSE4.1 code path in opt app." I'm sorry, Raistmer, you are going to have to dumb it way down for me. I just do not comprehend how an app that has SSE41 function calls can work on a CPU that only has SSE3 capabilities. My Ryzens have all the MMX and SSE capabilities up to AVX2, excluding AVX512, which is a recent development in the latest Intel generation. At Lunatics, you are warned to choose the right app for your operating system (32-bit or 64-bit) and also to verify exactly which SSE functions your CPU supports before downloading the app. I am speaking only about the Linux environment, since that is the only one that has all possible SSE type apps available. I am only comparing the different Linux CPU apps. Referring back to my original post in this thread, I was only wishing for an SSE41 Windows app, since I have seen such a large improvement over the AVX app. I was just stating that I would like to see the same choices of SSE function apps on Windows. |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
As I understand it (and Raistmer is free to correct me), the basic code is the same for many of the apps. When you compile the code, you ask the compiler to try to optimize for specific instruction sets, and it will try to change some of the code so that it uses newer instructions whenever possible and beneficial. The end result of this kind of optimization relies heavily on the compiler's ability to do this well, and probably to some extent on the person running the compiles and their ability to use the correct parameters. That last part probably comes down to a lot of trial and error. In the code itself, there is some handwritten, very well optimized code that the compiler is told to keep its hands off. That is probably the SSE3 code base? It would be nice to have a more Ryzen-specific code path, but it's probably not a small job, and there would have to be a lot of experimentation. I once tested for such a project on older AMD hardware at Lunatics, where the resulting AMD-specific code was 5-10% faster. It can probably be done, and there could be large benefits, but it takes a lot of time. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
No. There are high-level language (C/C++) sources (the program) and machine-code binary instructions (the binary executable). The compiler translates one into the other. What I'm saying is that high-level code specific to SSE4.1 is absent. There are such code paths for SSE and SSE3, for example, so the compiler takes different source to make the SSE and SSE3 builds. In the case of SSE41, the difference in speed comes not from any SSE4.1-specific optimizations done by programmers but just from differences between compilers: how they translate high-level code into machine code. GCC can emit SSE4.1 instructions on its own. VC++ doesn't. So, with VC++, if the programmer didn't explicitly ask for an SSE4.1 instruction (by using so-called intrinsics, for example), the resulting binary will not use SSE4.1 machine instructions at all. So, Urs' binaries are different, but only because of the compiler/toolchain used. If one finds a correspondingly good compiler for Windows (we used ICC before, for example), the Windows binaries become faster too. And it doesn't require any additional development (that is, source code writing), just adaptation of the build process to the new toolset. In other words, the codebase should be ported to the new toolset. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I'll try, thanks. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It will not. It will call the SSE3-based ones instead (how stock works), or one should use another binary (how the opt app works).
Exactly.
And I am trying to explain that the reason for the speed difference lies not in the SSE level of the app per se. BTW, the speed improvement on your host (AMD?) may simply come from a poor implementation of the AVX instruction set. We had that before, when SSE2 binaries were faster than SSE3 ones on early AMD processors. Still, the proposed experiment will be quite interesting: try replacing your current SSE4.1 Linux binary with the SSE3 one (links posted above) and run it for a few days. Will it be slower, faster or approximately the same on Ryzen? |
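The "call the SSE3-based ones instead" behaviour can be pictured as picking the best hand-written path the CPU supports. The following is only a sketch of that idea, not the actual SETI@home code: the `CODE_PATHS` list, the path names and both function names are assumptions made for illustration. The flag names are the ones Linux reports in `/proc/cpuinfo`, where SSE3 appears as `pni`.

```python
from pathlib import Path

# Hypothetical list of hand-written code paths, best first. Each entry pairs a
# Linux /proc/cpuinfo flag with a path name ("pni" is how Linux reports SSE3).
# There is deliberately no "sse4_1" entry, mirroring the point made above.
CODE_PATHS = [("avx", "avx"), ("pni", "sse3"), ("sse", "sse")]


def read_cpu_flags(cpuinfo="/proc/cpuinfo"):
    """Return the CPU feature flags Linux reports, or an empty set elsewhere."""
    path = Path(cpuinfo)
    if not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_code_path(cpu_flags, code_paths=CODE_PATHS):
    """Pick the first (best) hand-written path whose CPU flag is present."""
    for flag, name in code_paths:
        if flag in cpu_flags:
            return name
    return "generic"  # plain fallback when no listed SIMD flag is found


# A CPU with SSE4.1 but no AVX still lands on the SSE3 path.
print(pick_code_path({"sse", "sse2", "pni", "ssse3", "sse4_1"}))
# Whatever this machine reports (falls back to "generic" off Linux/x86).
print(pick_code_path(read_cpu_flags()))
```

The first call prints `sse3`: having SSE4.1 available changes nothing when no source path was written for it.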
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Karsten Vinding wrote: "As I understand it (and Raistmer is free to correct me), the basic code is the same for many of the apps." Yep. I hope your English will beat mine in explanatory strength :) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, I can do that. I ran exactly one task using the r3305 SSSE3 Linux app that TBar included in his zi3v Linux package when I first built the system and installed Linux. I then updated to the r3345 AVX Linux app, simply because the Windows AVX app was faster than the optimized SSE3 app on my first Windows Ryzen system and I expected the same from the Ryzen Linux system. So you want the basic SSE3 app run for a few days for comparison, correct? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, so there are a lot of benchmark tools at Lunatics. I even see a lot of files attributed to you, Raistmer. I thought you only worked in the Windows environment. Should I run some of the benchmark tools and test tasks with the suggested SSE41 and SSE3 apps, using the Lunatics test tools? Or do you want just the SETI Main tasks that get sent to my host? Which method would provide the best comparisons and information for you? I don't have any experience with the Lunatics test tools. It looks like I need to MAKE some of the files myself. I have never compiled anything in Linux so far. I would probably need some handholding. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I did some benches on my Ryzen under Linux after I built the system. SSE4.1 was fastest, followed by AVX, so I don't think your bench would be much different. With each crime and every kindness we birth our future. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Keith Myers wrote: "Which method would provide the best comparisons and information for you. [...] It looks like I need to MAKE some of the files myself." A benchmark would be more precise, until one collects big statistics for online runs. And no, it doesn't need any compilation. Just put the needed files in the needed places and run the script. |
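One way to see why online runs need "big statistics" is to look at the standard error of the mean, which only shrinks as more runs are collected. The sketch below assumes nothing about the Lunatics test tools; the helper name `mean_and_stderr` and the run times are made up purely for illustration and are not measured data.

```python
import math
import statistics


def mean_and_stderr(run_times):
    """Return (mean, standard error of the mean) for a list of run times."""
    mean = statistics.mean(run_times)
    stderr = statistics.stdev(run_times) / math.sqrt(len(run_times))
    return mean, stderr


# Made-up run times in minutes, for illustration only (not measured data).
app_a = [46.0, 47.5, 45.5, 48.0, 46.5]
app_b = [47.0, 46.0, 48.5, 47.5, 46.5]

for name, times in (("app A", app_a), ("app B", app_b)):
    mean, stderr = mean_and_stderr(times)
    print(f"{name}: {mean:.2f} +/- {stderr:.2f} min over {len(times)} runs")
```

If the gap between the two means is smaller than the standard errors, more runs (or a benchmark on identical test tasks) are needed before calling one app faster.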
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Mike wrote: "I did some benches on my Ryzen under Linux after i built the system." By what degree? What % difference? |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Just 1% - 2%. On some tasks even SSSE3 was faster than AVX. You can view the results on Lunatics; I posted them in my Mint thread. |