Posts by M_M


1) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1812508)
Posted 1 day ago by Profile M_M

Thank you for volunteering to test a Linux version.


Is there a similar Windows binary to test? I would gladly volunteer... ;)
2) Message boards : Number crunching : 1080 underclocking (Message 1810722)
Posted 6 days ago by Profile M_M
Yep, I noticed this some time ago, and so far there is no known workaround to push it back to P0 during crunching... It seems nVidia purposely locked compute to P2, with its lower memory clock.

So effectively, for compute tasks (where it matters the most) nVidia is limiting memory bandwidth to less than the advertised 320GB/sec. Why, I don't know. This was never the case with the 7x0 or earlier GPU series; it was first seen recently on the 9x0 (a workaround using smi is possible) and now on the 10x0 (no workaround yet).
3) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810657)
Posted 7 days ago by Profile M_M
Yes, I see that from your WUs; obviously there is a lot of room for improvement.

I am sure that you, Richard, Raistmer and others are putting effort into improving the applications to use this "spare" GPU potential...
4) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810302)
Posted 8 days ago by Profile M_M
The i7-2600k is a 4-core/8-thread CPU. I have experimented a bit and found this to be the optimal setting on my config for maximum RAC. Even though CPU load is shown as 100%, the system (Win10) is reasonably responsive for normal work (surfing, office work etc).



If I set more than 50% CPU with 2 SoG tasks in parallel, the system is a bit less responsive and GPU WU times increase (also visible as lower GPU TDP, which is a better indication of GPU use than the GPU load indicator itself). I have not experimented with the CPU time usage limit; so far it has always been set to 100%...
5) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1810299)
Posted 8 days ago by Profile M_M
Guppies on my GTX1080 take about 13-14min each (latest SoG r3500, running 2 in parallel). On my i7-2600k I have limited CPU usage to 50% to be able to feed the monster properly and still have a reasonably responsive system to work on at the same time.

I am using the command line switches suggested in the readme for high end GPUs:
-sbs 384 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

Average TDP with Guppies is around 58% (over 70% with non-Guppies), average GPU load around 97%.
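
When comparing settings like these, it can help to convert task time and concurrency into throughput. A minimal Python sketch using only the figures quoted above (2 tasks in parallel, 13-14 minutes each); the function name is made up for illustration:

```python
def tasks_per_hour(minutes_per_task: float, parallel_tasks: int) -> float:
    """Completed tasks per hour when `parallel_tasks` run side by side."""
    if minutes_per_task <= 0 or parallel_tasks < 1:
        raise ValueError("need a positive task time and at least one task")
    return parallel_tasks * 60.0 / minutes_per_task


# Figures from the post: 2 tasks in parallel, 13-14 minutes each.
slow = tasks_per_hour(14, 2)
fast = tasks_per_hour(13, 2)
print(f"{slow:.1f} to {fast:.1f} Guppies per hour")
```

Running a single task at a time is only worth it if it finishes in less than half the time of the 2-in-parallel case.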
6) Message boards : Number crunching : OpenCL vs CUDA (Stock) (Message 1809220)
Posted 12 days ago by Profile M_M
I somehow expected this. The latest GPUs are not used well by the now old Cuda 5.0, while OpenCL is somewhat higher-level programming and the OpenCL driver itself does a better job of optimizing the task for the actual higher-end GPU hardware.

Sure, well-written code in Cuda 7.5/8.0 would probably be even better, but it requires some additional human effort...
7) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808728)
Posted 14 days ago by Profile M_M
@Shaggie76: Is it possible to get some stats about SoG vs Cuda 5.0 on GTX7xx, GTX9xx and GTX10x0?
8) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808569)
Posted 15 days ago by Profile M_M
The generic GPU usage tells only the first SMX usage.

So this is the catch: most developers probably rely fully on the generic GPU usage indication when optimizing their GPU code and judging whether it is squeezing the maximum from the GPU. That is wrong; they should rely more on power consumption if they want to optimize code for maximum efficiency and performance...
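
One way to watch power draw next to the load figure on nVidia cards is the `nvidia-smi` query interface. The Python sketch below is only an illustration: the query fields are the ones listed by `nvidia-smi --help-query-gpu`, the function names are invented here, and the sample line is hypothetical rather than a real measurement. Some cards report `[N/A]` for power, which this simple parser does not handle.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["utilization.gpu", "power.draw", "power.limit", "clocks.mem", "pstate"]


def parse_sample(csv_line: str) -> dict:
    """Turn one CSV line (noheader, nounits) into a dict with a power ratio."""
    load, draw, limit, mem_clock, pstate = [v.strip() for v in csv_line.split(",")]
    return {
        "gpu_load_pct": float(load),
        "power_pct_of_limit": 100.0 * float(draw) / float(limit),
        "mem_clock_mhz": float(mem_clock),
        "pstate": pstate,
    }


def read_gpu(index: int = 0) -> dict:
    """Ask nvidia-smi for one sample; needs the nVidia driver tools on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_sample(out.splitlines()[0])


# Hypothetical sample line, NOT a real measurement:
sample = parse_sample("97, 104.4, 180.0, 4513, P2")
print(sample)
```

A high load figure combined with a low power percentage would point to the situation described above: the GPU is busy, but far from fully used.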
9) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808446)
Posted 16 days ago by Profile M_M
I am also a bit surprised that the difference between GTX1080 and GTX1070 is so small, since GTX1080 has 33% more compute units (2560 vs 1920 shaders) and 25% faster memory (10GHz vs 8GHz), and it's even clocked a bit higher, so something is holding the GTX1080 back? Even nVidia was advertising GTX1080 as 8.9 TFLOPS and GTX1070 as 6.5 TFLOPS.

I would guess that the current application implementation is not using its extra resources well... Maybe time for a new, optimized application?
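
For reference, the advertised TFLOPS figures can be reproduced from shader count and clock. This is a rough Python sketch: the shader counts come from the post, while the boost clocks (1733 MHz and 1683 MHz) and the 2 FLOPs per shader per clock (one fused multiply-add) are assumptions taken from the reference specifications, not from this thread.

```python
def peak_sp_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical single-precision TFLOPS: 2 FLOPs (one FMA) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


# Shader counts are from the post; boost clocks are assumed reference values.
gtx1080 = peak_sp_tflops(2560, 1733)
gtx1070 = peak_sp_tflops(1920, 1683)
print(f"GTX1080: {gtx1080:.1f} TFLOPS, GTX1070: {gtx1070:.1f} TFLOPS")
print(f"ratio: {gtx1080 / gtx1070:.2f}x")
```

On paper the GTX1080 should therefore be well over a third faster, so a much smaller gap in practice suggests the application, not the hardware, is the limit.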
10) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808366)
Posted 16 days ago by Profile M_M
As far as I know, it is 32-bit float, i.e. single precision (SP), where nVidia is in general slightly faster in the same price bracket. However, in DP AMD is usually faster, as nVidia is "saving" DP performance for much more expensive dedicated compute cards like the Tesla P100, for example.
11) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1808343)
Posted 16 days ago by Profile M_M
AMD Fiji is the R9 Fury series, i.e. still AMD's top performance series, waiting to be replaced soon since it cannot compete with the new nVidia GPUs.

However, the R9 Fury/Fury X is still more powerful than the Ellesmere RX470/480, which are AMD's new power-efficient mid-range GPUs, with an overall performance level similar to the AMD Hawaii R9 290/290x.
12) Message boards : Number crunching : Better Maxwell/Pascal support with new Cuda (Message 1808141)
Posted 17 days ago by Profile M_M
Probably most of the improvements medium term will come from recent contributions from Petri33. Longer term probably looking at trying to leverage some of the ai targeted features not explored in setiathome code yet (longer term prospect).


Yes, I noticed the efficiency of Petri33's crunching with his custom applications, 2-3x faster than the stock applications. He is obviously a very skilled programmer, so it is very good that he is willing to contribute to the whole community.

For the long term, I agree: we should always be open-minded and willing to explore new features and techniques to improve "the quest".
13) Message boards : Number crunching : Better Maxwell/Pascal support with new Cuda (Message 1808074)
Posted 18 days ago by Profile M_M
Any plans for optimized Cuda 7.5/8.0RC binaries for nVidia GPUs with Compute Capability 5.0 or higher? Seems that many improvements and optimizations have been put in since Cuda 5.0.
14) Message boards : Number crunching : Failed Upgrade (Message 1807795)
Posted 19 days ago by Profile M_M
I suspect that nVidia drivers are still a long way from being "bulletproof" and optimized when it comes to Pascal, i.e. the 10x0 family, and especially when it comes to SLI setups, so some patience is needed...
15) Message boards : Number crunching : 1080 underclocking (Message 1807756)
Posted 20 days ago by Profile M_M
@memory downclock issue: I think nVidia should clearly state the reason for the downclock and let users decide. For example, the control panel could offer a choice between P2 for number crunching (which could be the default setting) and P0 (with a warning that calculation accuracy cannot be guaranteed, if this is the reason).

From what I can see, memory controller load for SETI calculations is between 50-70%, so I suspect the memory clock matters and makes a difference in calculation efficiency.
16) Message boards : Number crunching : 1080 underclocking (Message 1807728)
Posted 20 days ago by Profile M_M
I have another downclock observation with the GTX1080: the memory clock drops by around 10% as soon as SETI or any other pure GPU compute load starts, and the card is pushed into P2 mode instead of P0 (max performance mode). This is on Win10x64 with 368.81 drivers. Why nVidia is doing this is unknown...

There are no issues with games; all of them run perfectly in P0, i.e. max performance mode with normal memory and boost GPU clocks.

It seems that as soon as the nVidia driver detects a pure compute load (no graphics output), the card is pushed to P2 mode.

Some others have found this too, and it also happens with Maxwell on recent drivers...
https://devtalk.nvidia.com/default/topic/940304/cuda-programming-and-performance/grim-memory-bandwidth-gtx-1080/1

There is a workaround, but it doesn't work on Pascal, only on Maxwell...
https://devtalk.nvidia.com/default/topic/892842/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/
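
To put the roughly 10% memory downclock into numbers, here is a back-of-the-envelope Python sketch. The 10 Gbps effective memory rate and the advertised 320GB/sec appear elsewhere in these posts; the 256-bit bus width is an assumption taken from the GTX1080 reference specification, and the exact P2 clock will vary.

```python
def bandwidth_gb_s(effective_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return effective_rate_gbps * bus_width_bits / 8


# 10 Gbps effective rate is from the thread; the 256-bit bus is an assumed spec.
p0 = bandwidth_gb_s(10.0, 256)
p2 = bandwidth_gb_s(10.0 * 0.90, 256)  # roughly 10% lower memory clock in P2
print(f"P0: {p0:.0f} GB/s, P2: {p2:.0f} GB/s")
```

Under these assumptions, a card stuck in P2 gives up about 32 GB/s of peak bandwidth during compute.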
17) Message boards : Number crunching : Status of opencl_ati for V8 (Message 1757350)
Posted 18 Jan 2016 by Profile M_M
V8 for ATI HD5 works fine for me on R9 290 with Crimson 15.12 and Win10
18) Message boards : Number crunching : Status of opencl_ati for V8 (Message 1756354)
Posted 14 Jan 2016 by Profile M_M
You can download here.

http://mikesworldnet.de/download.html


No instructions on how to integrate them into app_info.xml? Maybe release a v0.43c?
19) Message boards : Number crunching : No ATI GPU WUs? (Message 1503044)
Posted 12 Apr 2014 by Profile M_M
Thanks, that's it. Lunatic's works...

BTW, it is an R9 290 graphics card with the latest official 13.12 Catalyst.
20) Message boards : Number crunching : No ATI GPU WUs? (Message 1502860)
Posted 11 Apr 2014 by Profile M_M
11.4.2014 18:54:10 | | Starting BOINC client version 7.2.42 for windows_x86_64
11.4.2014 18:54:10 | | log flags: file_xfer, sched_ops, task
11.4.2014 18:54:10 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
11.4.2014 18:54:10 | | Data directory: M:\BOINC
11.4.2014 18:54:10 | | Running under account Korisnik
11.4.2014 18:54:10 | | OpenCL: AMD/ATI GPU 0: Hawaii (driver version 1348.5 (VM), device version OpenCL 1.2 AMD-APP (1348.5), 2048MB, 2048MB available, 3200 GFLOPS peak)
11.4.2014 18:54:10 | | OpenCL CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.5 (sse2,avx), device version OpenCL 1.2 AMD-APP (1348.5))
11.4.2014 18:54:10 | | Host name: Korisnik-PC
11.4.2014 18:54:10 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
11.4.2014 18:54:10 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx tm2 pbe
11.4.2014 18:54:10 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
11.4.2014 18:54:10 | | Memory: 7.97 GB physical, 7.97 GB virtual
11.4.2014 18:54:10 | | Disk: 931.51 GB total, 150.50 GB free
11.4.2014 18:54:10 | | Local time is UTC +2 hours
11.4.2014 18:54:10 | | Config: use all coprocessors
11.4.2014 18:54:10 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5859584; resource share 600
11.4.2014 18:54:10 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2470467; resource share 300
11.4.2014 18:54:10 | SETI@home | General prefs: from SETI@home (last modified 25-Oct-2012 19:51:00)
11.4.2014 18:54:10 | SETI@home | Computer location: home



Copyright © 2016 University of California