GPU FLOPS: Theory vs Reality

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1997134 - Posted: 7 Jun 2019, 4:55:20 UTC - in response to Message 1997120.  

Thanks for the updated charts and stats, Shaggie76.

Yep, greatly appreciated.

I have my sights set on the 2070 as the best compromise to replace the 1070's.

I'm happy with my RTX 2060: very good performance at a much more affordable price (an RTX 2070 costs at least another $200).
Grant
Darwin NT
ID: 1997134
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997151 - Posted: 7 Jun 2019, 7:41:43 UTC - in response to Message 1997134.  

The 2060 won't beat the performance of my 1070's. So I would be downgrading. The only direction for me is up. I need to get the Threadripper ahead of the Intel host's production. It is being embarrassed, and with its horsepower advantage, it shouldn't be.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997151
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1997153 - Posted: 7 Jun 2019, 8:04:11 UTC - in response to Message 1997151.  

The 2060 won't beat the performance of my 1070's.

For me running SoG, it's a big step up.
Grant
Darwin NT
ID: 1997153
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1997167 - Posted: 7 Jun 2019, 11:42:14 UTC - in response to Message 1997151.  

The 2060 won't beat the performance of my 1070's. So I would be downgrading. The only direction for me is up. I need to get the Threadripper ahead of the Intel host's production. It is being embarrassed, and with its horsepower advantage, it shouldn't be.


How many MB slots do you have available? Maybe just adding GTX 1070's would help. It is possible that one slot and a 1-to-4 expander would make a difference.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1997167
Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1997180 - Posted: 7 Jun 2019, 12:32:31 UTC - in response to Message 1997116.  

New this scan, Radeon RX 590, GeForce GTX 1660 Ti, and GeForce RTX 2060: the 1660 Ti looks to be a new leader in performance/watt and the 2060 looks like an excellent economy option!
Thank you for the updated chart, Shaggie! I was starting to wonder when a new one would come out. I noticed that the GTX 1660 (non-Ti), GTX 1650, and even the Radeon VII are not on this chart. I'm assuming there is not enough of a sample size to include them, or is there another reason?

Also, if/when the GTX 1650 is put on the chart, how will you compute the Credit/Watt-Hour for this card since some run power off the motherboard and others have a power connector?
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1997180
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1997206 - Posted: 7 Jun 2019, 16:54:05 UTC - in response to Message 1997180.  

I noticed that the GTX 1660 (non-Ti), GTX 1650, and even the Radeon VII are not on this chart. I'm assuming there is not enough of a sample size to include them, or is there another reason?

Also, if/when the GTX 1650 is put on the chart, how will you compute the Credit/Watt-Hour for this card since some run power off the motherboard and others have a power connector?

Yeah, not enough 1650's yet. There might have been enough Radeon VIIs, but I couldn't find TDP specs for them, so I left that for the next scan since I was out of time.

Since I can't measure actual power draw, I cross-reference the published TDP specs from Wikipedia, assuming that all vendors and cards have approximately the same variation in overclocked parts etc.
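
To make that concrete, here's a toy version of the calculation in Python (the credit/hour numbers are invented placeholders for illustration, not figures from my scans; the TDPs are the published specs):

# Toy version of the credit/watt-hour estimate described above.
# The credit/hour values are made-up placeholders, NOT real scan data;
# the TDPs are the published board-power specs.
cards = {
    # name: (credit per hour, published TDP in watts)
    "GTX 1660 Ti": (9_000.0, 120.0),
    "RTX 2060":    (11_000.0, 160.0),
}

for name, (credit_per_hour, tdp_watts) in cards.items():
    # An hour at TDP consumes tdp_watts watt-hours, so credit/Wh is
    # simply credit-per-hour divided by the TDP.
    print(f"{name}: {credit_per_hour / tdp_watts:.1f} credit/Wh at TDP")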
ID: 1997206
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997208 - Posted: 7 Jun 2019, 16:59:45 UTC - in response to Message 1997167.  

The 2060 won't beat the performance of my 1070's. So I would be downgrading. The only direction for me is up. I need to get the Threadripper ahead of the Intel host's production. It is being embarrassed, and with its horsepower advantage, it shouldn't be.


How many MB slots do you have available? Maybe just adding GTX 1070's would help. It is possible that one slot and a 1-to-4 expander would make a difference.

Tom

I don't have any slots available, as I prefer to install into native slots. Not ready for the pain of finding what does and does not work with mining hardware. Both you and Ian have convinced me the trouble is to be avoided. The issue is that the two 1070's in the Threadripper can't compete with the two 1080's in the Intel host. The clocks and memory of the 1080's are too much of an advantage. I could level the playing field by replacing the two 1070's with new 2070's.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997208
Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1997209 - Posted: 7 Jun 2019, 17:14:47 UTC - in response to Message 1997206.  

Yeah, not enough 1650's yet. There might have been enough Radeon VIIs, but I couldn't find TDP specs for them, so I left that for the next scan since I was out of time.

Since I can't measure actual power draw, I cross-reference the published TDP specs from Wikipedia, assuming that all vendors and cards have approximately the same variation in overclocked parts etc.
300 watts for the Radeon VII. Thanks for the info!
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1997209
Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 1997218 - Posted: 7 Jun 2019, 18:25:42 UTC
Last modified: 7 Jun 2019, 18:48:45 UTC

My RTX 2080Ti yields 2-3 credits per second with the CUDA 10.1 special app.
That equals 7,200-10,800 credits per hour, or 172,800-259,200 credits per day.
The highest credit per day in 15 days was 183,325 credits (7,638 credits per hour).
This host experiences download hiccups, and I don't use the "hacked" BOINC client, so it can queue only 100 workunits, which is pretty low. The longest WU I could spot took 51 seconds; the average is 19-35-44 seconds. So 100 WUs are processed in about an hour, which is inadequately short, as the download and project backoffs quickly escalate beyond that after a couple of unsuccessful attempts. In addition, the BOINC manager doesn't ask for more work while there are downloads pending; this behavior quite annoyingly restricts the maximum achievable CPD (credits per day).
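
A quick sanity check of those figures in Python (just the arithmetic from above, nothing measured here):

# Sanity check of the figures quoted above.
for cps in (2, 3):  # observed credits/second on the RTX 2080 Ti
    print(f"{cps} credit/s = {cps * 3600:,} credit/h = {cps * 3600 * 24:,} credit/day")

# How long a 100-workunit queue lasts at the observed run times:
for avg_s in (19, 35, 44):
    print(f"avg {avg_s}s/WU -> 100 WUs last ~{100 * avg_s / 60:.0f} minutes")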
ID: 1997218
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997222 - Posted: 7 Jun 2019, 18:52:11 UTC - in response to Message 1997218.  

Well hello stranger. ;-> Good to see you over here in the alt-universe from GPUGrid.

We've been lamenting the low task allocation limits in the world of GPU computing for years. Probably never going to change since the powers that be still think we crunch on 2 core Pentiums.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997222
Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 1997229 - Posted: 7 Jun 2019, 20:03:37 UTC - in response to Message 1997222.  

Judging by the graph, the CUDA 10.1 special client is about 6 times faster than the "official" OpenCL one.
As an "outsider" I don't understand why the project cripples itself on purpose by not making it an official app. (Sorry, I won't dig through the 2 million posts on the forum for a self-justificatory explanation.)
I consider it an awful waste of computing power. Obviously the "powers that be" don't.
ID: 1997229
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997231 - Posted: 7 Jun 2019, 20:23:24 UTC

When and IF the Linux special app code is ported to "mainstream" Windows and tested on Seti Beta, the likelihood of seeing the app here as stock is nil.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997231
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1997246 - Posted: 7 Jun 2019, 21:34:41 UTC

Shaggy,

Thanks for bringing this up. I've only run a cursory check through the thread, but I have seen some discussion of the variances in settings. Here is something I've come across in researching these settings, and I would personally like some clarification of where/how some of these settings come about. First, a little reading which I found intriguing on the subject of the compute performance of AMD processors:

https://www.karlrupp.net/2016/01/gpu-memory-bandwidth-vs-thread-blocks-cuda-workgroups-opencl/

From what I'm reading here, and keeping in mind this is 3 years old, setting workgroups to a larger size appears to degrade the memory bandwidth utilization of the AMD cards, which I believe might explain some of this. The current cards use HBM, which is specifically designed for high bandwidth and very wide parallelization, and which seemingly requires very large numbers of logical threads running on these cards.

The article appears to be looking at the older W9100, which has 4096 physical threads, and finds that the best utilization of memory throughput happens at 32 workgroups of 128 threads. This got me thinking about my RX 580 and Vega 56, because I had been setting my WG size at 256, because larger is better. Right? With this setting, I was running three instances of SETI per card.

After I read that article, I immediately lowered my workgroup size to 64, because we are seemingly limited to this, and I increased my number of instances to 8. What I found over the next 4 days was a nice increase in credits per day. Unfortunately, I'm now in the middle of upgrading my cards to the Vega Frontier Editions, so I can't keep looking at this specific issue with those cards. My quick question would be: what are the default settings for people who do not tinker with their settings, as I'm sure many won't? If the defaults are a low number of workgroups with a large size, we may need to rethink those defaults and the reasoning behind them. The current limitation of the size being no smaller than 64 limits me to 64 workgroups, and won't allow me to take full advantage of the larger HBM on my new Frontier Editions. Where is this limit imposed? Is it Apple? AMD? Seti? BOINC? We could be missing a big opportunity to get the most out of these high-end cards.
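
To make the trade-off concrete, here's a little Python sketch of the arithmetic (assuming a card with 4096 physical threads, like the article's W9100; it doesn't query any real hardware):

# Ways to fill a GPU with 4096 physical threads using different
# workgroup sizes (numbers follow the W9100 example in the article).
PHYSICAL_THREADS = 4096

for wg_size in (64, 128, 256):
    n_groups = PHYSICAL_THREADS // wg_size
    print(f"workgroup size {wg_size:3d} -> {n_groups:3d} workgroups "
          f"({n_groups * wg_size} threads total)")

# The article found peak memory throughput near 32 workgroups of 128
# threads; bigger workgroups mean fewer of them, which can leave an
# HBM card's very wide memory system underutilized.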

Thoughts?
Thanks,
Guy
ID: 1997246
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1997255 - Posted: 7 Jun 2019, 22:07:21 UTC - in response to Message 1997246.  

After I read that article, I immediately lowered my workgroup size to 64, because we are seemingly limited to this, and I increased my number of instances to 8. What I found over the next 4 days was a nice increase in credits per day.

Even if those changes reduced your performance, you would still get a Credit increase, as everyone on the project has been getting for the last few days, due to the re-introduction of Arecibo work.
Run times are the only accurate indicator of performance. Of course, running multiple WUs on a video card reduces the accuracy of that, as different types of WU running at the same time can have anywhere from no impact on each other's run times to a huge one, depending on the types involved (but it's still a better indicator than RAC, or anything else that involves Credit).

RAC is OK for comparing systems; it's just not of much use for gauging the effect of different system settings, due to its inherent variability and dampened rate of variation.
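
For anyone wondering why RAC lags: BOINC smooths credit with an exponential average (a one-week half-life, if I recall the client code correctly; treat that constant as an assumption). A rough Python simulation of the damping:

import math

# Rough simulation of BOINC's Recent Average Credit damping.
# Assumes credit lands once per day and a one-week half-life.
decay = math.exp(-math.log(2) / 7.0)  # per-day decay factor

rac = 50_000.0              # RAC before a settings change
true_daily_credit = 60_000  # actual credit/day after the change

for day in range(1, 15):
    # Exponentially weighted average: old RAC decays, new credit blends in.
    rac = rac * decay + true_daily_credit * (1 - decay)
    print(f"day {day:2d}: RAC = {rac:,.0f}")

# Two weeks later RAC still hasn't converged on 60,000, which is why
# run times beat RAC for judging a settings change.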
Grant
Darwin NT
ID: 1997255
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1997262 - Posted: 7 Jun 2019, 22:26:43 UTC - in response to Message 1997208.  

The 2060 won't beat the performance of my 1070's. So I would be downgrading. The only direction for me is up. I need to get the Threadripper ahead of the Intel host's production. It is being embarrassed, and with its horsepower advantage, it shouldn't be.


How many MB slots do you have available? Maybe just adding GTX 1070's would help. It is possible that one slot and a 1-to-4 expander would make a difference.

Tom

I don't have any slots available, as I prefer to install into native slots. Not ready for the pain of finding what does and does not work with mining hardware. Both you and Ian have convinced me the trouble is to be avoided. The issue is that the two 1070's in the Threadripper can't compete with the two 1080's in the Intel host. The clocks and memory of the 1080's are too much of an advantage. I could level the playing field by replacing the two 1070's with new 2070's.


Or swapping the 1080's and the 1070's
;)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1997262
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997264 - Posted: 7 Jun 2019, 22:31:43 UTC - in response to Message 1997262.  

Or swapping the 1080's and the 1070's
;)

Unfortunately, the 1080's are EVGA AIO cards. I don't have the physical room for the radiators in the custom loop chassis as the twin 360 radiators already commandeer all the available mounting space.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997264
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1997278 - Posted: 7 Jun 2019, 23:43:23 UTC - in response to Message 1997264.  

Or swapping the 1080's and the 1070's
;)

Unfortunately, the 1080's are EVGA AIO cards. I don't have the physical room for the radiators in the custom loop chassis as the twin 360 radiators already commandeer all the available mounting space.


Darn. What you need is a "TEXAS SIZED" MB, one with a LOT of space for your "steers". :)

I wonder if one of these X570's would be a reasonable slot-spacing upgrade?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1997278
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1997281 - Posted: 7 Jun 2019, 23:46:53 UTC - in response to Message 1997255.  
Last modified: 7 Jun 2019, 23:53:04 UTC

Even if those changes reduced your performance, you would still get a Credit increase, as everyone on the project has been getting for the last few days, due to the re-introduction of Arecibo work.

I'm not sure I've seen any Arecibo work. If you're referring to Astropulse work, I know I haven't seen it; I am still getting the "unknown app" error from app_config when I reload the config files. There really isn't a good metric to judge all of this on outside of credit, unfortunately, either daily credit or the lagging RAC calculation. I wish I had a better metric to judge this on, perhaps FLOPS/day or something along those lines.

I'm not the guy with the PhD in computational science, though, and his arguments make sense. We tend to look at a lot of different metrics as a single number, while he seems to take more of a vector view; we use FLOPS, speed, or some other single metric without taking the whole picture into account.

**Edit
Regardless, I'll never know on those cards. The RX 580 is out of the picture now. Just got my second Breakaway Box 650, awaiting the second Frontier Edition to arrive to replace the Vega 56. It'll be nice to have two identical cards on the system for a change.
ID: 1997281
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997283 - Posted: 7 Jun 2019, 23:54:44 UTC - in response to Message 1997278.  

Or swapping the 1080's and the 1070's
;)

Unfortunately, the 1080's are EVGA AIO cards. I don't have the physical room for the radiators in the custom loop chassis as the twin 360 radiators already commandeer all the available mounting space.


Darn. What you need is a "TEXAS SIZED" MB, one with a LOT of space for your "steers". :)

I wonder if one of these X570's would be a reasonable slot-spacing upgrade?

Tom

The motherboard isn't the problem; the computer case is. There's no more room to mount the AIO radiators. The cards are standard two-slot-wide reference-design cards, just with 120mm radiators attached. I have both the Thermaltake X5 and X9 cases, so they are about as big as they come.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1997283
Retvari Zoltan
Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 1997284 - Posted: 7 Jun 2019, 23:55:30 UTC - in response to Message 1997247.  
Last modified: 7 Jun 2019, 23:57:08 UTC

I made some power measurements.

The power consumption of my system:
idle: 130W
running SETI@home v8 8.22 (opencl_nvidia_SoG) blc25_2bit_guppi: 260W (average) 210~275W, most of the time in the 250~275W range
running SETI@Home V8 8.01 (CUDA10.1)          blc25_2bit_guppi: 360W (average) 350~385W, most of the time in the 355~365W range

Let's calculate how much energy is needed to process a similar workunit:
SETI@home v8 8.22 (opencl_nvidia_SoG) blc25_2bit_guppi: 260W*217s=56,420J 
SETI@Home V8 8.01 (CUDA10.1)          blc25_2bit_guppi: 360W* 35s=12,600J

The ratio of the energy used for a single workunit (aka the ratio of their credit/watt-hour):
56,420J/12,600J=4.478
To process a WU, the CUDA 10.1 client uses less than a quarter of the energy the Official OpenCL client uses.
To put it another way: the CUDA 10.1 client earns 4.478 times the credit for the same watt-hours that the Official OpenCL client earns.
So using the Official OpenCL client on NVidia cards is also an awful waste of energy.
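
The same arithmetic as a Python script, for anyone who wants to plug in their own measurements (the CUDA line uses the 360W average from above):

# Reproduce the energy-per-workunit comparison above.
measurements = {
    # app: (average system power in watts, run time in seconds)
    "SoG (OpenCL) 8.22": (260, 217),
    "CUDA 10.1 special": (360, 35),
}

energy = {app: w * s for app, (w, s) in measurements.items()}
for app, joules in energy.items():
    print(f"{app}: {joules:,} J per workunit")

ratio = energy["SoG (OpenCL) 8.22"] / energy["CUDA 10.1 special"]
print(f"OpenCL uses {ratio:.3f}x the energy of the CUDA special app per WU")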
ID: 1997284