Posts by M_M

1) Message boards : Number crunching : Lunatics optimization for Ryzen, any plans? (Message 1901927)
Posted 4 days ago by Profile M_M
Post:
As the title says, are there any plans for this?

As I understand it, Ryzen is a quite different architecture from Intel's, so it would make sense to have an optimized code path for it, especially since there are more and more Ryzen and Threadripper systems out there...
2) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1899269)
Posted 18 days ago by Profile M_M
Post:
[quote]
The software being run has to actually be able to take advantage of that potential performance.
Which is why for a given Nvidia card the SoG application leaves the older CUDA applications way behind. And for the same card the Linux special application leaves the SoG application way behind.
[/quote]

So it is about how well the application is suited to a particular architecture, but mostly about how well it is written to use the potential performance. And in the end it seems that currently, in general, nVidia is a bit better in SETI, and AMD is a bit better in coin mining?

BTW, if the Linux special app is so much more efficient, why isn't it ported to Windows? Is the app really so reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?
3) Message boards : Number crunching : AMD EPYC Benchmarks Smash the WinTel Hedgemony?... (Message 1891798)
Posted 25 Sep 2017 by Profile M_M
Post:

Looks like it. Just watched Paul's Hardware YT video where the 16 core and 18 core i9X trounced the 16 core AMD 1950X fairly soundly.
Intel 7980XE and 7960X vs AMD 1950X! 18-Core i9 Benchmarks & Review

Bearing in mind the huge price difference, I would be surprised if that weren't the case...
4) Message boards : Number crunching : Ryzen and win 7 (Message 1891623)
Posted 24 Sep 2017 by Profile M_M
Post:
I also upgraded to a Ryzen R7-1700 recently, on Win10, and no issues so far.

I know that AMD's AVX implementation is not as good as Intel's, so I am using an SSE3 v8 app.

I am wondering whether there are any plans for AMD Ryzen/Threadripper optimized apps in the near future, considering how different the architecture is from Intel's?
5) Message boards : Number crunching : AMD 290X vs RX 480 for seti/ DUAL NVIDIA 1070 vs Single 1080 (Message 1842988)
Posted 19 Jan 2017 by Profile M_M
Post:
If you are building a dedicated SETI cruncher, at this moment the best performance/watt and performance/$ will come from 2x GTX1070 on Linux with the optimized CUDA apps... No such highly optimized apps are available for AMD.
6) Message boards : Number crunching : Question about SOG (Message 1840630)
Posted 7 Jan 2017 by Profile M_M
Post:
Is it in essence mostly about "sleep" and timer accuracy?

If so, can HPET be used in Windows? Sure, it has to be enabled first, as I think it is disabled by default...
7) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828969)
Posted 7 Nov 2016 by Profile M_M
Post:
To make things worse, nVidia limits computation on Pascal to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

And easy to overcome with available tools like NvidiaInspector.


I have tried it (Win10 and GTX1080) but I couldn't make it work.

This was possible on Maxwell, but not on Pascal, I think...
8) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828885)
Posted 6 Nov 2016 by Profile M_M
Post:
To make things worse, nVidia limits computation on Pascal to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(
9) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1828803)
Posted 6 Nov 2016 by Profile M_M
Post:
Thanks Shaggie.

Any ideas on the 980ti/1080 case?

According to nVidia, the 980ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In raw memory bandwidth they are almost the same, but the 1080 should also benefit from better memory compression, of around 20% as claimed by nVidia.
10) Message boards : Number crunching : MB v8: CPU vs GPU (in terms of efficiency) (Message 1816463)
Posted 11 Sep 2016 by Profile M_M
Post:
I think even the cheap power meters ($15-20) should be accurate enough to measure average power consumption, so why not try? My measurements at the wall socket are below (I also have an APC SmartUPS, which accounts for some 5% of the figures shown below).

My PC at idle (i.e. ordinary desktop work, web surfing etc.) with a 24" LCD draws around 170W (100W at real idle with the monitor sleeping).
With S&H running just on the CPU, power draw is around 275W (i7-2600k, overclocked to 4.5GHz).
With S&H running on CPU + GTX1080, power draw is around 390W. So the GTX1080 is responsible for around 115W of draw, which is around 64% of its TDP, close to the average power consumption that GPU-Z reports.
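The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the two wall-socket readings come from the post, while the 180 W TDP is an assumption (the reference-board figure for the GTX1080) and the function name is made up for the example. Wall readings also include PSU losses, so the card's own draw is probably somewhat lower.

```python
# Wall-socket readings quoted in the post (watts).
CPU_ONLY_W = 275      # SETI@home running on the CPU only
CPU_PLUS_GPU_W = 390  # SETI@home running on CPU + GTX1080

# Assumption: 180 W is the reference-board TDP of the GTX1080.
GTX1080_TDP_W = 180


def gpu_share(cpu_only_w, cpu_plus_gpu_w, tdp_w):
    """Return (extra watts attributed to the GPU, that draw as a fraction of TDP)."""
    gpu_w = cpu_plus_gpu_w - cpu_only_w
    return gpu_w, gpu_w / tdp_w


gpu_w, tdp_fraction = gpu_share(CPU_ONLY_W, CPU_PLUS_GPU_W, GTX1080_TDP_W)
print(f"GPU draw: {gpu_w} W, {tdp_fraction:.0%} of TDP")
```

With the numbers above this gives 115 W and roughly 64% of TDP, matching the figures in the post.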
11) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1816323)
Posted 11 Sep 2016 by Profile M_M
Post:
One more observation: in raw processing power (cores, but also the TFLOPS declared by nVidia) the GTX1060 is basically "half" of a GTX1080, yet here it achieves around 80% of the GTX1080's processing speed. In games it achieves on average just 60-65% at most, meaning that games take advantage of high-end GPUs more easily.

Also, Cr/Wh as calculated and presented here is only a rough picture, since we have seen that actual TDP usage differs between cards. The GTX750Ti often goes above 80% average TDP usage, while the GTX1080, for example, stays below 65% of its TDP with the current application, regardless of the CPU and the number of GPU instances.
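A rough sketch of how much this matters, assuming the published Cr/Wh figures were computed from each card's nominal TDP (an assumption, not something confirmed in the thread). In that case the real figure is the nominal one divided by the fraction of TDP actually drawn. The usage fractions are the approximate bounds quoted above; the function name is hypothetical.

```python
# Average draw as a fraction of TDP, taken from the approximate bounds in the post.
TDP_USAGE = {"GTX750Ti": 0.80, "GTX1080": 0.65}


def corrected_cr_per_wh(nominal_cr_per_wh, tdp_usage):
    """Rescale a Cr/Wh figure computed from nominal TDP to the measured draw."""
    if tdp_usage <= 0:
        raise ValueError("tdp_usage must be positive")
    return nominal_cr_per_wh / tdp_usage


# Passing 1.0 as the nominal figure yields the correction factor itself.
for card, usage in TDP_USAGE.items():
    print(f"{card}: nominal Cr/Wh x {corrected_cr_per_wh(1.0, usage):.2f}")
```

Under that assumption the GTX1080's efficiency would be understated by a larger factor than the GTX750Ti's, so the gap between the two cards in the table would shift in the GTX1080's favour.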
12) Message boards : Number crunching : MB v8: CPU vs GPU (in terms of efficiency) (Message 1815848)
Posted 9 Sep 2016 by Profile M_M
Post:
Just to mention, if efficiency is the primary concern, undervolting and underclocking your GPU can significantly boost its power efficiency. For example, if you underclock your GPU by just 10% (and undervolt by another 10-15%, actually as much as you can while still keeping it 100% stable), your GPU power usage will go down by 25-30%. This is the primary way mobile GPUs are selected: by testing them at a slightly lower clock and a much lower voltage.

On the other hand, this means that overclocking (especially with overvolting) significantly decreases power efficiency, which is nothing new but is something people usually overlook.
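The size of that effect can be estimated with the usual first-order model for dynamic power in CMOS logic, P proportional to f x V^2. The sketch below is only that model, not a measurement: it ignores leakage, memory and board power, and it assumes throughput scales linearly with clock. The function names are made up for the example.

```python
def relative_dynamic_power(clock_scale, voltage_scale):
    """Dynamic power relative to stock, using the model P ~ f * V^2."""
    return clock_scale * voltage_scale ** 2


def relative_efficiency(clock_scale, voltage_scale):
    """Work per joule relative to stock, assuming throughput scales with clock."""
    return clock_scale / relative_dynamic_power(clock_scale, voltage_scale)


# 10% underclock combined with a 10% and a 15% undervolt.
for voltage_scale in (0.90, 0.85):
    power = relative_dynamic_power(0.90, voltage_scale)
    gain = relative_efficiency(0.90, voltage_scale)
    print(f"V x{voltage_scale:.2f}: power -{1 - power:.0%}, efficiency x{gain:.2f}")
```

The model gives a saving of roughly 27% to 35% for the dynamic part alone. Since the parts of the card that do not follow this model save less, that is in the same range as the 25-30% quoted above. Passing scales above 1.0 shows the reverse effect for overclocking with overvolting.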

It is also worth mentioning that GPU apps are far from their optimal efficiency, which is much less the case for CPU apps. For example, Petri33's custom optimized nV GPU application is 2-2.5x more efficient (and 2.5-3x faster) than the standard app, and he is convinced there is still room for further improvement.

The reason is that it is much harder to properly optimize GPU applications, due to the GPUs' heavy parallelism and varied architectures.
13) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1812956)
Posted 27 Aug 2016 by Profile M_M
Post:
I can only hope that this means that the modern GPUs are merely under-utilized.


From the discussion in this and other threads, I would say you are right... It seems that current applications cannot fully utilize modern GPUs. We have seen that Petri33's custom Linux binary is about 2.5-3x more efficient than SoG, so obviously room for improvement exists (especially for new and high-end GPUs).

Some patience is needed, but I have no doubt that it is just a matter of time before new optimized applications become available...
14) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1812508)
Posted 25 Aug 2016 by Profile M_M
Post:

Thank you for volunteering to test a Linux version.


Is there a similar Windows binary to test? I would gladly volunteer... ;)
15) Message boards : Number crunching : 1080 underclocking (Message 1810722)
Posted 20 Aug 2016 by Profile M_M
Post:
Yep, I noticed this some time ago, and so far there is no known workaround to push it back to P0 during crunching... It seems nVidia purposely locked compute to P2 with a lower memory clock.

So effectively, for compute tasks (where it matters the most) nVidia is limiting memory bandwidth to less than the advertised 320GB/sec. Why, I don't know. This was never the case with the 7x0 or earlier GPU series, but was first seen recently on the 9x0 (workaround using smi possible) and now on the 10x0 (no workaround possible yet).
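To put a number on it, peak memory bandwidth is bus width in bytes times the per-pin data rate. The sketch below assumes the GTX1080's published 256-bit bus and 10 Gbps GDDR5X (which is where the advertised 320GB/sec comes from), plus the roughly 10% P2 memory-clock reduction reported elsewhere in these posts. It is an estimate, not a measured value, and the function name is made up for the example.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumptions: published GTX1080 specs (256-bit bus, 10 Gbps GDDR5X).
advertised = memory_bandwidth_gbs(256, 10.0)

# Assumption: P2 lowers the memory clock by roughly 10%.
P2_CLOCK_REDUCTION = 0.10
in_p2 = advertised * (1 - P2_CLOCK_REDUCTION)

print(f"advertised: {advertised:.0f} GB/s, in P2: about {in_p2:.0f} GB/s")
```

So under these assumptions a compute task would see around 288GB/sec instead of 320GB/sec.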
16) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810657)
Posted 20 Aug 2016 by Profile M_M
Post:
Yes, I see that from your WUs; obviously there is a lot of room for improvement.

I am sure that you, Richard, Raistmer and others are putting effort into improving the applications to use this "spare" GPU potential...
17) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810302)
Posted 19 Aug 2016 by Profile M_M
Post:
The i7-2600k is a 4-core/8-thread CPU. I have experimented a bit and found this to be the optimal setting on my config for maximum RAC. Even though CPU load is shown as 100%, the system (Win10) is reasonably responsive for normal work (surfing, office work etc.).



If I set more than 50% CPU with 2 SoG tasks in parallel, the system is a bit less responsive and GPU WU times increase (also visible as lower GPU TDP usage, which is a better indication of GPU use than the GPU load indicator itself). I have not experimented with the CPU time usage limit; so far it has always been set to 100%...
18) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810299)
Posted 19 Aug 2016 by Profile M_M
Post:
Guppies on my GTX1080 take about 13-14min each (latest SoG r3500, running 2 in parallel). On my i7-2600k I have limited CPU usage to 50%, to be able to feed the monster properly and still have a reasonably responsive system to work on at the same time.

I am using command line switches as suggested in readme for high end GPUs:
-sbs 384 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

Average TDP with Guppies is around 58% (over 70% with non-Guppies), average GPU load around 97%.
19) Message boards : Number crunching : OpenCL vs CUDA (Stock) (Message 1809220)
Posted 15 Aug 2016 by Profile M_M
Post:
I somehow expected this. The latest GPUs are not used well by the now old CUDA 5.0, while OpenCL is a somewhat higher-level way of programming, and the OpenCL driver itself does a better job of optimizing the task for the actual higher-end GPU hardware.

Sure, well-written code in CUDA 7.5/8.0 would probably be even better, but it requires some additional human effort to be put in...
20) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808728)
Posted 12 Aug 2016 by Profile M_M
Post:
@Shaggie76: Is it possible to get some stats about SoG vs Cuda 5.0 on GTX7xx, GTX9xx and GTX10x0?

