Posts by M_M

1) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828969)
Posted 7 Nov 2016 by Profile M_M
Post:
To make it even worse, on Pascal nVidia is limiting compute to the P2 power state, i.e. throttling the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

And easy to overcome with available tools like NvidiaInspector.


I have tried it (Win10 and GTX1080) but I couldn't make it work.

This was possible on Maxwell but not on Pascal, I think...
2) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828885)
Posted 6 Nov 2016 by Profile M_M
Post:
To make it even worse, on Pascal nVidia is limiting compute to the P2 power state, i.e. throttling the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(
3) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828803)
Posted 6 Nov 2016 by Profile M_M
Post:
Thanks Shaggie.

Any ideas on 980ti/1080 case?

According to nVidia, the 980ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In raw memory bandwidth they are almost the same, but the 1080 should also have the benefit of better memory compression, around 20% as claimed by nVidia.
4) Message boards : Number crunching : MB v8: CPU vs GPU (in terms of efficiency) (Message 1816463)
Posted 11 Sep 2016 by Profile M_M
Post:
I think even the cheap power meters ($15-20) should be accurate enough to measure average power consumption, so why not try? My measurements at the wall socket are below (I also have an APC SmartUPS that adds some 5% on top of the figures shown below).

My PC at idle (i.e. ordinary desktop work, web surfing etc.) with a 24" LCD is around 170W (100W at real idle with the monitor sleeping).
With S&H running just on the CPU, power draw is around 275W (i7-2600k, overclocked to 4.5GHz).
With S&H running on CPU + GTX1080, power draw is around 390W. So the GTX1080 is responsible for around 115W of draw, which is around 64% of its TDP, close to what GPU-Z reports as average power consumption.
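The arithmetic behind that estimate can be sketched as follows (a minimal sketch; the wall readings are the ones quoted above, and 180W is the nVidia reference TDP for the GTX1080):

```python
# Estimate GPU power draw by differencing wall-socket measurements.
cpu_only_w = 275      # S&H running on CPU only
cpu_plus_gpu_w = 390  # S&H running on CPU + GTX1080
gpu_tdp_w = 180       # GTX1080 reference TDP (assumption, nVidia spec)

gpu_draw_w = cpu_plus_gpu_w - cpu_only_w  # power attributed to the GPU
tdp_fraction = gpu_draw_w / gpu_tdp_w     # fraction of TDP actually used

print(gpu_draw_w, round(tdp_fraction * 100))  # 115 64
```

Note this attributes the whole delta to the GPU, ignoring PSU efficiency losses, so the real figure is slightly lower.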
5) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1816323)
Posted 11 Sep 2016 by Profile M_M
Post:
One more observation: in raw processing power (cores, but also nVidia's declared TFLOPS) the GTX1060 is basically "half" a GTX1080, but here it achieves around 80% of the 1080's processing speed. Yet in games it averages just 60-65% at most, meaning that games more easily take advantage of high-end GPUs.

Also, Cr/Wh as calculated and presented here is a rough picture, since we have seen that actual TDP usage differs between cards. The GTX750Ti often averages above 80% of TDP, while for example the GTX1080 with the current application stays below 65% of its TDP, regardless of CPU and the number of GPU instances.
6) Message boards : Number crunching : MB v8: CPU vs GPU (in terms of efficiency) (Message 1815848)
Posted 9 Sep 2016 by Profile M_M
Post:
Just to mention: if efficiency is a primary concern, undervolting and underclocking your GPU can significantly boost its power efficiency. For example, if you underclock your GPU by just 10% (and undervolt by another 10-15%, actually by as much as you can while still keeping it 100% stable), your GPU power usage will go down by 25-30%. This is essentially how mobile GPUs are binned: tested at a slightly lower clock and a much lower voltage.
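As a rough sanity check of those numbers: dynamic power scales approximately with frequency times voltage squared (P ∝ f·V²), so a 10% underclock plus a ~12% undervolt lands right in the quoted range. A back-of-the-envelope sketch (ignoring static/leakage power, so real savings will be somewhat smaller):

```python
# Back-of-the-envelope dynamic power scaling: P ∝ f * V^2
# (ignores static/leakage power and fan/board overhead)
def relative_power(freq_scale, volt_scale):
    return freq_scale * volt_scale ** 2

p = relative_power(0.90, 0.88)  # 10% underclock, 12% undervolt
print(f"power reduced by ~{(1 - p) * 100:.0f}%")  # power reduced by ~30%
```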

On the other hand, this means that overclocking (especially with overvolting) significantly decreases power efficiency, which is nothing new but is usually overlooked.

Worth mentioning is also that GPU apps are far from their optimal efficiency, which is not so much the case for CPU apps. For example, Petri33's custom optimized nV GPU application is 2-2.5x more efficient (and 2.5-3x faster) than the stock app, and he is convinced there is still room for further improvement.

The reason is that it is much harder to properly optimize GPU applications, due to GPUs' heavy parallelism and varied architectures.
7) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1812956)
Posted 27 Aug 2016 by Profile M_M
Post:
I can only hope that this means that the modern GPUs are merely under-utilized.


From the discussion in this and other threads, I would say you are right... It seems the current applications cannot fully utilize modern GPUs. We have seen that Petri33's custom Linux binary is about 2.5-3x more efficient than SoG, so room for improvement obviously exists (especially for new and high-end GPUs).

Some patience is needed, but I have no doubt it is just a matter of time until new optimized applications are available...
8) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1812508)
Posted 25 Aug 2016 by Profile M_M
Post:

Thank you for volunteering to test a Linux version.


Is there some similar Windows binary to test? I would gladly volunteer... ;)
9) Message boards : Number crunching : 1080 underclocking (Message 1810722)
Posted 20 Aug 2016 by Profile M_M
Post:
Yep, I noticed this some time ago, and so far there is no known workaround to push it back to P0 during crunching... It seems nVidia purposely locked compute to P2 with a lower memory clock.

So effectively, for compute tasks (where it matters the most) nVidia is limiting memory bandwidth to less than the advertised 320GB/s. Why, I don't know; this was never the case with the 7x0 or earlier GPU series, but it was first seen recently on the 9x0 (workaround using smi possible) and now on the 10x0 (no workaround possible yet).
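For reference, the performance state and memory clock mentioned above can be read with nvidia-smi (shipped with the driver); a sketch assuming a reasonably recent driver version:

```shell
# Show the active performance state (P0/P2/...) and current vs. max memory clock
nvidia-smi --query-gpu=pstate,clocks.mem,clocks.max.mem --format=csv

# The Maxwell-era (9x0) workaround pinned the application clocks:
#   nvidia-smi -ac <memMHz>,<gfxMHz>
# On Pascal GeForce cards this is rejected, as discussed above.
```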
10) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810657)
Posted 20 Aug 2016 by Profile M_M
Post:
Yes, I can see that from your WUs; obviously there is a lot of room for improvement.

I am sure that you, Richard, Raistmer and others are putting in the effort to improve the applications to use this "spare" GPU potential...
11) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810302)
Posted 19 Aug 2016 by Profile M_M
Post:
The i7-2600k has 4 cores/8 threads. I experimented a bit and found this to be the optimal setting on my config for maximum RAC. Even though CPU load shows 100%, the system (Win10) is reasonably responsive for normal work (surfing, office work etc.).



If I set more than 50% CPU with 2 SoG in parallel, the system is a bit less responsive and GPU WU times increase (also visible as lower GPU TDP, which is a better indication of GPU use than the GPU load indicator itself). I have not experimented with the CPU time usage limit; so far it has always been set to 100%...
12) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810299)
Posted 19 Aug 2016 by Profile M_M
Post:
Guppies on my GTX1080 take about 13-14 min each (latest SoG r3500, running 2 in parallel) on my i7-2600k. I have limited CPU usage to 50% to be able to feed the monster properly and still have a reasonably responsive system to work on at the same time.

I am using command line switches as suggested in readme for high end GPUs:
-sbs 384 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

Average TDP with Guppies is around 58% (over 70% with non-Guppies); average GPU load is around 97%.
13) Message boards : Number crunching : OpenCL vs CUDA (Stock) (Message 1809220)
Posted 15 Aug 2016 by Profile M_M
Post:
I somehow expected this; the latest GPUs are not used well by the now-old Cuda 5.0, while OpenCL is somewhat higher-level programming and the OpenCL driver itself does a better job of optimizing the task for the actual higher-end GPU hardware.

Sure, well-written Cuda 7.5/8.0 code would probably be even better, but that requires some additional human effort to be put in...
14) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808728)
Posted 12 Aug 2016 by Profile M_M
Post:
@Shaggie76: Is it possible to get some stats about SoG vs Cuda 5.0 on GTX7xx, GTX9xx and GTX10x0?
15) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808569)
Posted 11 Aug 2016 by Profile M_M
Post:
The generic GPU usage tells only the first SMX usage.

So this is the catch: most developers probably rely fully on the generic GPU usage indication when optimizing their GPU code and judging whether it squeezes the maximum from the GPU, but this is wrong. They should actually rely more on power consumption if they want to optimize code for maximum efficiency and performance...
16) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808446)
Posted 11 Aug 2016 by Profile M_M
Post:
I am also a bit surprised that the difference between the GTX1080 and GTX1070 is so small, since the GTX1080 has 33% more compute units (2560 vs 1920 shaders) and 25% faster memory (10GHz vs 8GHz), and it is even clocked a bit higher, so something is holding the GTX1080 back? Even nVidia advertised the GTX1080 as 8.9 TFLOPS and the GTX1070 as 6.5 TFLOPS.
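The expected gap can be quantified from the figures quoted above:

```python
# Theoretical GTX1080 advantage over GTX1070, from the specs quoted above
shader_ratio = 2560 / 1920  # compute units: ~1.33x
mem_ratio = 10 / 8          # memory clock: 1.25x
tflops_ratio = 8.9 / 6.5    # advertised TFLOPS: ~1.37x

print(round(shader_ratio, 2), mem_ratio, round(tflops_ratio, 2))
# 1.33 1.25 1.37
```

So on paper the 1080 should finish WUs roughly a third faster than the 1070, which is clearly not what the charts show.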

I would guess the current application implementation is not using its extra resources well... Maybe it's time for a new, optimized application?
17) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808366)
Posted 10 Aug 2016 by Profile M_M
Post:
As far as I know, it is 32-bit float, i.e. single precision (SP), where nVidia is in general slightly faster in the same price bracket. However, in DP AMD is usually faster, as nVidia "saves" DP performance for much more expensive dedicated compute cards like the Tesla P100, for example.
18) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808343)
Posted 10 Aug 2016 by Profile M_M
Post:
AMD Fiji is the R9 Fury series, i.e. still AMD's top performance series, waiting to be replaced soon, since it cannot compete with the new nVidia GPUs.

However, the R9 Fury/Fury X is still more powerful than the Ellesmere RX470/480, AMD's new power-efficient mid-range GPUs, whose overall performance level is similar to the AMD Hawaii R9 290/290X.
19) Message boards : Number crunching : Better Maxwell/Pascal support with new Cuda (Message 1808141)
Posted 9 Aug 2016 by Profile M_M
Post:
Probably most of the improvements medium term will come from recent contributions from Petri33. Longer term probably looking at trying to leverage some of the ai targeted features not explored in setiathome code yet (longer term prospect).


Yes, I noticed the efficiency of Petri33's crunching with his custom applications, 2-3x faster than the stock applications. He is obviously a very skilled programmer, so it is very good that he is willing to contribute to the whole community.

For the long term, I agree; we should always be open-minded and willing to explore new features and techniques to improve "the quest".
20) Message boards : Number crunching : Better Maxwell/Pascal support with new Cuda (Message 1808074)
Posted 9 Aug 2016 by Profile M_M
Post:
Any plans for optimized Cuda 7.5/8.0RC binaries for nVidia GPUs with Compute Capability 5.0 or higher? It seems many improvements and optimizations have been put in since Cuda 5.0.


©2016 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.