GPU FLOPS: Theory vs Reality

juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1970175 - Posted: 13 Dec 2018, 20:11:06 UTC

@Shaggie76

Nice work. We have a new winner in performance per watt, as expected, at least with OpenCL.
But one question is unanswered: what happens if the same analysis is done for the hosts that run the Linux special builds?
Most of the top SETI hosts are actually running these CUDA "special sauce" builds.
ID: 1970175 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1970196 - Posted: 13 Dec 2018, 22:18:46 UTC - in response to Message 1970155.  

1. I feel like there is a lot of background behind how you compile this data. For example, are the credit/hour and credit/watt-hour figures calculated all-time, or within a timeframe? Do you have a running list of notes somewhere?
It's all open source on GitHub. The credit/hour is based on data crawled from the SETI task web pages, and the power usage is a coarse estimate based on the published TDP for stock cards (I took the data from the Wikipedia tables).
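Roughly speaking, the per-card arithmetic is just this (a sketch with made-up numbers, not the script's actual code or output):

    # Illustrative only: credit/hour and credit/watt-hour for one card.
    tasks = [
        {"credit": 85.2, "run_seconds": 1290.0},   # one completed GPU task
        {"credit": 120.7, "run_seconds": 1834.0},  # another task from the same card
    ]
    tdp_watts = 185.0  # published TDP for the card, from the Wikipedia tables

    total_credit = sum(t["credit"] for t in tasks)
    total_hours = sum(t["run_seconds"] for t in tasks) / 3600.0

    credit_per_hour = total_credit / total_hours
    credit_per_watt_hour = credit_per_hour / tdp_watts

    print(f"{credit_per_hour:.1f} credit/hr, {credit_per_watt_hour:.3f} credit/watt-hour")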

2. I think I browsed one of your earlier posts, and a different NVIDIA card (the 970?) at one point had a higher credit/watt-hour rating. I'm curious what changed there.
It depends on the mix of short vs. long work units at the time of the scan -- Arecibo work tends to be much faster, but as we shift to more Green Bank data the performance characteristics change.

3. Do we know what CPU was used in tandem with the GPU credits? I know the CPU plays a minor role in the crunching of the WU, but I wonder if there is a significant difference between one CPU and another.
I'm not tracking that data, but the core script tracks the CPU used by the GPU work units if you run it manually. If I recall correctly, AMD OpenCL and NVIDIA CUDA builds use less CPU than NVIDIA OpenCL builds, but this may not matter so much with hyperthreading: even though the GPU task's CPU side is busy polling, another thread can get work done on the same core because the ALU/FPU units aren't very busy.

Personally I prefer 100% GPU where possible because the work/watt is more favorable (and power is the limiting factor for me).
ID: 1970196 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970204 - Posted: 13 Dec 2018, 22:53:07 UTC

Hi Shaggie76, I just looked through the github repository and believe there is no reason why you couldn't scan for anonymous hosts running the special app like Juan is always requesting. I see the -anon parameter option listed along with referencing a unique host ID.

If you just passed a known anonymous host ID running the special app through to aggregate.pl that would be a proof of concept I would think. Do I have the correct grasp on the program?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970204 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1970265 - Posted: 14 Dec 2018, 6:05:03 UTC - in response to Message 1970134.  

At last I have data:

Looks like I might get myself an RTX 2070 for my birthday in a few months.
GTX 1080 Ti (or better) crunching performance, at a maximum of 185W v 250W.

Interesting to see that the GTX 750Ti is still in the top 10 for efficiency.
Grant
Darwin NT
ID: 1970265 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1970295 - Posted: 14 Dec 2018, 13:39:54 UTC - in response to Message 1970204.  
Last modified: 14 Dec 2018, 13:40:23 UTC

Hi Shaggie76, I just looked through the github repository and believe there is no reason why you couldn't scan for anonymous hosts running the special app like Juan is always requesting. I see the -anon parameter option listed along with referencing a unique host ID.

If you just passed a known anonymous host ID running the special app through to aggregate.pl that would be a proof of concept I would think. Do I have the correct grasp on the program?
Sure, you can do whatever you want, but like I've said before I am not interested in ranking this app until it's official (and when it is it'll just show up in my charts as the CUDA app like some of the vintage NVIDIA cards already do).

If it doesn't produce the same results as the official build then using it is prioritizing "winning internet points" over "doing science" and I think that's losing sight of the real purpose of this project.
ID: 1970295 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970335 - Posted: 14 Dec 2018, 17:54:39 UTC - in response to Message 1970295.  

Hi Shaggie76, I just looked through the github repository and believe there is no reason why you couldn't scan for anonymous hosts running the special app like Juan is always requesting. I see the -anon parameter option listed along with referencing a unique host ID.

If you just passed a known anonymous host ID running the special app through to aggregate.pl that would be a proof of concept I would think. Do I have the correct grasp on the program?
Sure, you can do whatever you want, but like I've said before I am not interested in ranking this app until it's official (and when it is it'll just show up in my charts as the CUDA app like some of the vintage NVIDIA cards already do).

If it doesn't produce the same results as the official build then using it is prioritizing "winning internet points" over "doing science" and I think that's losing sight of the real purpose of this project.

The developer of the app has gone to great lengths to ensure the app DOES in fact produce the same results as any official build. Just like the Lunatics apps, which weren't considered "official" builds initially... until they were promoted to be the stock apps. The "special" apps have also been vetted countless times with the standard benchmark/validation tools to show they produce the same results as the stock apps; I have done so myself, as have many others. We even have a new developer who has created a modernized benchmark tool that is very easy to use and proves that the special apps validate against the stock apps every time. To see it for yourself, just look at any host running the special app: you will see zero or very few invalid tasks, at the same level as any stock app.

So the special app produces just as valid "science" as any stock application.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970335 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20989
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1970346 - Posted: 14 Dec 2018, 19:21:11 UTC - in response to Message 1970335.  

So the special app produces just as valid "science" as any stock application.

Just to add:

The "special apps" or "optimized apps" are just that: Optimized.

They give the same results as far as is possible while making more efficient use of the hardware. An easy example is how the new code takes advantage of more recent CPU features and makes better use of the compute features of GPUs.

(And to do that, there's been some spectacular work done on the maths and testing...)


Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1970346 · Report as offensive
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1973135 - Posted: 3 Jan 2019, 5:28:06 UTC - in response to Message 1963448.  

A good start for Vega would be:

-sbs 2048 -period_iterations_num 1 -spike_fft_thresh 4096 -high_perf -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64


@Mike,
What would be a good start for the GPU side of the AMD Ryzen 5 2400G (integrated CPU/GPU) under Windows 10 (stock SETI apps)?

I have a few tasks done. I think I now have the GPU set to its highest overclock, although it wasn't earlier. It looks like it will run maybe 40 minutes per GPU task; at least, that's what the latest one getting ready to upload suggests.

Here is a proposed command line.

-sbs 192 -spike_fft_thresh 2048 -tune 1 2 1 16 -period_iterations_num 10 -high_perf -high_prec_timer -tt 1600 -hp


The spike/tune values are from the "entry level" examples in the documentation. I can't figure out what a good -sbs value would be for an "entry level" GPU. The rest of the parameters are from my NVIDIA experience.
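(For reference, my understanding is that with the stock apps a command line like this is supplied through an app_config.xml file in the project folder. The sketch below is only illustrative; the plan_class is a placeholder I would replace with whatever the host's application details actually report.)

    <app_config>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <!-- placeholder plan class: substitute the one this host actually uses -->
        <plan_class>opencl_ati5_SoG</plan_class>
        <cmdline>-sbs 192 -spike_fft_thresh 2048 -tune 1 2 1 16 -period_iterations_num 10 -high_perf -high_prec_timer -tt 1600 -hp</cmdline>
      </app_version>
    </app_config>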

Presuming the GPU is still the fastest processor in this setup, I want to raise its productivity as high as possible. I will work on tweaking the CPU production later.

I am not even sure whether "entry level" is the right description for this Vega 11 GPU.

Tom
http://setiathome.berkeley.edu/show_host_detail.php?hostid=8645775
A proud member of the OFA (Old Farts Association).
ID: 1973135 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1973164 - Posted: 3 Jan 2019, 8:30:29 UTC - in response to Message 1970265.  

At last I have data:

Looks like I might get myself an RTX 2070 for my birthday in a few months.
GTX 1080 Ti (or better) crunching performance, at a maximum of 185W v 250W.

Interesting to see that the GTX 750Ti is still in the top 10 for efficiency.

The RTX 2060 is rumoured to be announced at CES 2019 on the 7th of January, with availability mid-January. It should be approximately USD $150 cheaper than the RTX 2070. The Einstein apps, however, don't work with RTX cards at the moment.
BOINC blog
ID: 1973164 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1973167 - Posted: 3 Jan 2019, 8:45:18 UTC - in response to Message 1973164.  
Last modified: 3 Jan 2019, 8:46:43 UTC

The RTX 2060 is rumoured to be announced at CES 2019 on the 7th of January, with availability mid-January. It should be approximately USD $150 cheaper than the RTX 2070. The Einstein apps, however, don't work with RTX cards at the moment.

I'm just hoping the prices start coming down.
$850 is the starting price of RTX 2070s at the moment. The model I'm interested in is $900. I could (probably) go up to $800, but $900?
Sheesh!

Even at $2,000-$2,500 the RTX 2080 Tis are almost permanently out of stock. I'm hoping that over the next few months, as more models come out and production ramps up and demand eases, prices will settle down from ridiculous to merely painful levels.
*fingers crossed*
Grant
Darwin NT
ID: 1973167 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1984636 - Posted: 12 Mar 2019, 1:27:02 UTC

New in this scan: NVIDIA GeForce RTX 2060s, with excellent credit/watt and excellent performance.

There were some GTX 1660 Tis in the scan, but not quite enough to qualify. I'll re-scan in a few weeks and maybe there will be enough by then.

ID: 1984636 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1984689 - Posted: 12 Mar 2019, 6:46:00 UTC

Thanks for the update, greatly appreciated.
Will be interesting to see where the GTX 1660Ti ends up.
Grant
Darwin NT
ID: 1984689 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 26 May 99
Posts: 9956
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1984692 - Posted: 12 Mar 2019, 7:21:17 UTC

Will be interesting to see where the GTX 1660Ti ends up.

Indeed it will; it's the first time I have had a card that is too new to be included ;-)
ID: 1984692 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1984742 - Posted: 12 Mar 2019, 13:20:05 UTC
Last modified: 12 Mar 2019, 13:20:22 UTC

I know I asked before, but is it possible to run the same scan on the hosts that run the Linux builds instead of the OpenCL builds?

Could we expect similar performance, or will the numbers look different due to the way the optimized Linux builds work?

Just curiosity.
ID: 1984742 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1984752 - Posted: 12 Mar 2019, 13:56:09 UTC - in response to Message 1984742.  

One problem with that is that a good majority of the Linux optimized users also run multiple GPUs, which are not shown in the graphs.
ID: 1984752 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1984786 - Posted: 12 Mar 2019, 19:16:57 UTC - in response to Message 1984752.  

One problem with that is that a good majority of the Linux optimized users also run multiple GPUs, which are not shown in the graphs.

In that case we will never really know the real difference on the Linux boxes. Because of the way Petri's highly optimized builds work, they use memory differently from the standard builds, so I imagine the type and speed of the memory could make a huge difference there.
ID: 1984786 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1984788 - Posted: 12 Mar 2019, 19:25:16 UTC - in response to Message 1984752.  

One problem with that is that a good majority of the Linux optimized users also run multiple GPUs, which are not shown in the graphs.


Is he only scanning single-GPU systems?

I can understand the case of having two different GPUs in the system, where you can't just take the average. But in the event that all the GPUs in use were the same type, you could just average them. I understand you won't know whether all the GPUs are the same, or which is which, unless you inspect the stderr.txt file.

Just curious.

It would be nice to see special app comparisons, though.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1984788 · Report as offensive
Profile Kissagogo27 Special Project $75 donor
Avatar

Send message
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1984913 - Posted: 13 Mar 2019, 10:27:31 UTC

I think I remember that he scans only non-anonymous hosts.
ID: 1984913 · Report as offensive
Profile Bill Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1985794 - Posted: 18 Mar 2019, 15:19:47 UTC - in response to Message 1984788.  

Is he only scanning single-GPU systems?
I am only speculating here, but perhaps he's grabbing information from the tasks themselves, which I assume identify which GPU completed the task. I would also speculate that, done that way, he would not know how many GPUs the computer is running. Just a guess, without doing any back-tracking through this thread.

Seeing how the RTX 2060 lines up compared to the RTX 2070 is interesting. In particular, these two GPUs are pretty close on credit/watt, while the RTX 2080/2080 Ti are nowhere near as efficient. I wonder why that is, and I wonder why the low end of the bar is as low as it is for the RTX 2080s. Obviously more data would need to be collected, but I feel like the 2080s are an aberration. I would be curious to see whether the 1660 and 1660 Tis line up more with the 2060/2070 or the 2080/2080 Ti (with respect to credit/watt).
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1985794 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1985802 - Posted: 18 Mar 2019, 15:46:28 UTC - in response to Message 1985794.  

Pretty sure he just scrapes the https://setiathome.berkeley.edu/gpu_list.php web page. So he is not looking at any card's stderr.txt output. No way to see how many cards a host is running. Then he just looks up the card's published TDP specification to match power used to credit produced.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1985802 · Report as offensive