Posts by petri33


1) Message boards : Number crunching : Is there a Linux version of a similar program like Tthrottle?! (Message 1784702)
Posted 2 hours ago by petri33
nvidia-settings can set the clocks up or down for the GPU and its memory. You have to enable Coolbits first.

Open a Terminal and type:

cd /etc/X11
sudo nvidia-xconfig --cool-bits=28


This will take effect after a restart.

Then launch nvidia-settings and you will have options to tune all your NVIDIA GPUs up or down.

The bit values are explained in many places; I copied this explanation from one of them.

The Coolbits value is the sum of its component bits in the binary numeral system. The component bits are:

1 (bit 0) - Enables overclocking of older (pre-Fermi) cores on the Clock Frequencies page in nvidia-settings.
2 (bit 1) - When this bit is set, the driver will "attempt to initialize SLI when using GPUs with different amounts of video memory".
4 (bit 2) - Enables manual configuration of GPU fan speed on the Thermal Monitor page in nvidia-settings.
8 (bit 3) - Enables overclocking of Fermi and newer cores on the PowerMizer page in nvidia-settings. Available since version 337.12.
16 (bit 4) - Enables overvoltage of Fermi and newer cores using nvidia-settings CLI options. Available since version 346.16.

To enable multiple features, add the Coolbits values together. For example, to enable overclocking and overvoltage of Fermi cores, set Option "Coolbits" "24".
2) Message boards : Number crunching : What am I missing here? Major RAC diff between 2 machines. (Message 1784235)
Posted 1 day ago by petri33
see http://stats.free-dc.org/stats.php?page=hostbycpid&cpid=311e83fe46e6c6ae2d92e1d3d037110e
and
http://stats.free-dc.org/stats.php?page=hostbycpid&cpid=4c4c2c2ab5e7217bdfff066e2970f4ef

Scroll both of them down. The Titan gets 54,000 a day. It is a new machine and its RAC is still climbing.



The stats say nothing about what GPUs are on these machines, so far as I can see... what am I missing?


You can get to the Free-DC statistics from here: http://setiathome.berkeley.edu/hosts_user.php?userid=24185
It lists your computers, and on the left there are two links to statistics; the other one is BOINC stats.

And to see user GTP's computer statistics, go through his computers.
3) Message boards : Number crunching : What am I missing here? Major RAC diff between 2 machines. (Message 1784186)
Posted 2 days ago by petri33
see http://stats.free-dc.org/stats.php?page=hostbycpid&cpid=311e83fe46e6c6ae2d92e1d3d037110e
and
http://stats.free-dc.org/stats.php?page=hostbycpid&cpid=4c4c2c2ab5e7217bdfff066e2970f4ef

Scroll both of them down. The Titan gets 54,000 a day. It is a new machine and its RAC is still climbing.
4) Message boards : Number crunching : Panic Mode On (102) Server Problems? (Message 1784114)
Posted 2 days ago by petri33
The following lists show the most productive GPU models on different platforms. Relative speeds, measured by average elapsed time of tasks, are shown in parentheses.
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 208
NVIDIA
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175
No GPU tasks reported
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 209
ATI/AMD
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175
No GPU tasks reported
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 210
Intel
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175
No GPU tasks reported
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 211
Generated ---
5) Message boards : Number crunching : GPU Wars 2016: NVIDIA Pascal details (Message 1782651)
Posted 8 days ago by petri33
I'll probably have better luck searching for a unicorn but anybody have numbers for v8 (or v7 if not) on stock app doing one WU at a time?


That was poorly written. What I meant was: does anybody have v8 times for any of these:

- GeForce 9800 GTX - 324mm2 - 140W
- GeForce GTS 250 - 260mm2 - 150W
- GeForce GTX 460 - 332mm2 - 150W or 160W (2 versions)
- GeForce GTX 560 Ti - 332mm2 - 170W
- GeForce GTX 680 - 294mm2 - 195W
- GeForce GTX 770 - 294mm2 - 230W
- GeForce GTX 960 - 227mm2 - 120W
- GeForce GTX 980 - 398mm2 - 165W


You can take a look at my results. Mine is not stock. I've got two 980s and two 780s running, and all cards run one task at a time. nvidia-smi reports 149 W for the 980 when running shorties and 77-140 W (varying) when running VLARs. The 780 does not report power consumption.

http://setiathome.berkeley.edu/results.php?hostid=7475713&offset=0&show_names=0&state=4&appid=29
6) Message boards : Number crunching : Average Credit Decreasing? (Message 1781572)
Posted 11 days ago by petri33
I just had my first GUPPI GPU task validate; http://setiathome.berkeley.edu/result.php?resultid=4877746625
WU true angle range is : 0.306775
Run time: 5 min 48 sec
CPU time: 5 min 40 sec
Validate state: Valid
Credit: 60.22

That is about half of what the Arecibo tasks with that angle range would pay.
Surprisingly, it's also about what I got when I ran a CPU GUPPI on my GPU last week.

I'm afraid things may get much worse very soon...or maybe not.


I have an Arecibo 0.41 task here: http://setiathome.berkeley.edu/result.php?resultid=4878143709 — it ran on a 780 in 3:25 and gave 89.41 credits.
Here is a guppi 0.30: http://setiathome.berkeley.edu/result.php?resultid=4878232504 — its runtime on a 780 is 2:50 and it gave 68.72 credits.

So guppi tasks are faster to process, even with a lower angle range.
7) Message boards : Number crunching : Panic Mode On (102) Server Problems? (Message 1780560)
Posted 14 days ago by petri33
Hi,

My GPU WU queue is empty, I'm running MB v8 NV CUDA.

There are 500,000+ work units ready to send. The CPU app downloads those guppi VLARs just fine.

Do the ATI/AMD OpenCL platforms get guppi VLARs for processing?

If yes, then: what should I set the plan_class to, to get GPU work? (To fake that I'm running AMD OpenCL.)

In the absence of non-VLAR work I could do VLARs, albeit slowly. They take about 14 minutes one at a time.

Petri

EDIT: I just received 20 tasks for my GPU. They are not guppi.
My plan_class is <plan_class>opencl_nvidia_sah</plan_class>
8) Message boards : Number crunching : Nvidia driver versions vs. performance (Message 1774115)
Posted 26 Mar 2016 by petri33
At some point around driver version 337, the drivers started to use 64-bit addressing, representing an address with two 32-bit registers, to support more than 4 GB of memory. That hurts performance: all address calculations need two additions and/or multiplications with carry bit(s).
9) Message boards : Number crunching : V8 CUDA for Linux? (Message 1773635)
Posted 24 Mar 2016 by petri33
Yes, it's a mystery to me why the driver version is not detected/reported by the client on Linux (might do some homework there at some point).


Seems to have been fixed sometime during 7.4 series.


Good to know at least newer clients can use familiar scheduler logic. Will have to nail down a niggling glibc problem for older kernels/distros. Found some ways to force the issue for the widest compatibility, so plenty to tie up on the weekend.


I'm not sure if you have tried
APP_LIBS = -lm -static-libstdc++

I had to add -static-libstdc++ to my link flags so that my version could be run on an older system/kernel.

The user had to update some libs.
10) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772672)
Posted 19 Mar 2016 by petri33
On Linux there is the libsleep.so solution, which has been used in other BOINC projects too.

libsleep.so does help with the 100% CPU usage.

1) Make the code and CUDA use Yield.
2) libsleep.so replaces Yield() with nanosleep.
3) libsleep.so needs to be loaded with LD_PRELOAD on Linux; on a Mac there may be some DYLD_XXX magic that does the same.

How does it work?
Yield gives the timeslice to a process that is ready to run and has the same or higher priority. Sleep and nanosleep give the timeslice to any thread that is ready to run.
LD_PRELOAD loads a library into memory before the program and its libraries, and replaces the Yield() function call in the program and in the libraries (the NVIDIA libs too) with nanosleep().
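A minimal sketch of such a preload shim, under the assumption that the app and driver libraries spin on sched_yield(); the 100 µs interval is an arbitrary illustrative choice, not the value from the posted source:

```c
/* libsleep.c -- sketch of the yield-to-nanosleep shim described above.
 *
 * Build: gcc -shared -fPIC -O2 -o libsleep.so libsleep.c
 * Run:   LD_PRELOAD=./libsleep.so ./your_app
 */
#include <time.h>

/* Our definition shadows the libc sched_yield(), so every yield in the
 * app and its libraries becomes a short sleep, letting ANY ready thread
 * run instead of only same-or-higher-priority ones. */
int sched_yield(void)
{
    struct timespec ts = { 0, 100 * 1000 };  /* 0 s, 100,000 ns */
    nanosleep(&ts, NULL);
    return 0;
}
```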

I have posted the nanosleep instructions and source; they should be easy to find with a search in the forums. Fakesguy is running Linux and has low CPU usage, and so do I.

And maxrregcount is a way to tell the CUDA compiler to allocate more registers to a thread, but the sacrifice is that a kernel (a piece of GPU code) runs with less parallelism. There is always a trade-off between more threads, and more work done per thread with fewer interdependencies or less waiting for memory.
11) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772496)
Posted 18 Mar 2016 by petri33
And Yes,
A faster GPU needs more attention from the CPU.
Btw. Did you specify maxrregcount=64 when not using Makefile?
I guess you did - and if so, then that is not the issue.
Not specifying would have caused some major register spilling and induced a huge performance penalty on the GPU code.
12) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772173)
Posted 17 Mar 2016 by petri33
@TBar
A few posts back, one of the listings says OpenCL 1.0 for your machine.
OpenCL: NVIDIA GPU 0: GeForce GTS 250 (driver version 304.128, device version OpenCL 1.0 CUDA, 1023MB, 844MB available, 705 GFLOPS peak) Thu 17 Mar 2016 01:16:48 AM EDT
OpenCL: NVIDIA GPU 1: GeForce 8800 GT (driver version 304.128, device version OpenCL 1.0 CUDA, 512MB,

so the driver is too old. The plan class on that machine should be something that does not say opencl.


@jason_gee
Yes, a faster app may use more CPU. But I'll check my code where I have those nanosleep loops.
13) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772071)
Posted 17 Mar 2016 by petri33
Yes, All the Apps compiled with the v8 'Baseline' code run with 'normal' CPU usage on a Mac. It's only the 'Special' Code that uses a Full CPU core. After months of looking and prodding it's still the same. Unfortunately I wouldn't know which part of the codes to compare. The code in the 'baseline' section works normally, the code in the Alpha section doesn't.

I decided to go really retro, and BOINC 6.10.56 trashes everything when going from 7.2.33 no matter what settings you use, so, I'll have to play with Plan Classes later. First is to find a BOINC that works with Ubuntu 11.04 and actually updates the counters without having to run the mouse across the screen. But hey, the CUDA 42 App works as expected. Maybe I should update the driver to 304?

The plan is to run the old setup in Beta as Stock, then switch over to the CUDA 42 App and compare the results. However, I Really need counters that work.
I'll also have to see about resurrecting these recent Ghosties...


Hmmm, it did the same as last time. The server doesn't mind resending the GPU tasks, or the normal CPU tasks, but sometimes insist it must expire the VLARs. Oh well, they've already been sent to someone else.


In cudaAcceleration.cu the stock code is
bool cudaAcc_setBlockingSync(int device)
{
//  CUdevice  hcuDevice;
//  CUcontext hcuContext;

/*  CUresult status = cuInit(0);
    if(status != CUDA_SUCCESS)
        return false;

    status = cuDeviceGet( &hcuDevice, device);
    if(status != CUDA_SUCCESS)
        return false;

    status = cuCtxCreate( &hcuContext, 0x4, hcuDevice ); // 0x4 is CU_CTX_BLOCKING_SYNC
    if(status != CUDA_SUCCESS)
        return false; */

#if CUDART_VERSION < 4000
    CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceBlockingSync),false);
//  CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleYield),false);
#else
    CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync),false);
//  CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleYield),false);
#endif
    return true;
}


My code is different. Try using the same as in stock.

I'll get back in 10 hours or so and tell the other place(s) to look.
14) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1771985)
Posted 16 Mar 2016 by petri33
Hi TBar,

Do you have a CUDA app that does not use a full core on your mac?

If yes, I'd compare the piece of code that sets the cuda driver to yield, poll or whatever the third word is (a temporary dementia/amnesia has hit me).

Another place to look at is the lines in my code that call nanosleep in a loop. They could be replaced with a normal cuda synchronization code for a CPU thread. You can look at the code in the cuda part of the pulsefind when the CPU is waiting for a stream to finish its work.
15) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1771982)
Posted 16 Mar 2016 by petri33
I use <plan_class>opencl_nvidia_100</plan_class> with my cuda 6.5 special.
You might get some more VLAR tasks and depending on your hardware it might be OK or not. I'd leave an entry for the old plan class too if the cache is not empty.
16) Message boards : Number crunching : Update on Linux 64 -Nividia-V8-MB ????? (Message 1771728)
Posted 15 Mar 2016 by petri33
Device peak flops 9638.
App 930-1000 (varying).
What is the real performance?
17) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1771428)
Posted 13 Mar 2016 by petri33
Gianfranco Lizzio had the same problem with cdft. He might be able to help.
18) Message boards : Number crunching : MAC OS X El Capitan and NVIDIA Web Driver. (Message 1771274)
Posted 12 Mar 2016 by petri33
There are some Macs around and some are running stock.
Some have more processing units and some are faster.
Some have compiled their own and some are still waiting.

http://setiathome.berkeley.edu/workunit.php?wuid=2091505693
19) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1770538)
Posted 9 Mar 2016 by petri33
OK. For Linux NV only.
I use the 7.5 compiler but 6.5 libraries. The 7.5 libraries give me the same 15-second slowdown.
The .cs or .cg may depend on HW. My 780 and 980 do well with .cs. I have .cg or .ca in some kernel(s).
.cs does caching but marks the cached data ready to discard after use, so it is best suited for sequential access.
20) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1770377)
Posted 8 Mar 2016 by petri33
So, the Hatanaka test won't work?

Akira Hatanaka 2014-07-21 16:34:44 CDT

Created attachment 12806 [details]
test

The last time you suggested editing bin files, last year sometime, I looked at the Mac bin files and they were nothing similar to the examples you posted. I don't think the files are the same as on other platforms.


The Hatanaka test C file has a deliberate bug in it. The variable is a double (two registers in PTX). It should be a float for "=f", or the "=f" should be "=d" for the double. My code has float and "=f", as it is supposed to be.

For NVIDIA GPUS ONLY:
The bin files are different for each GPU and driver version. If they contain something like this:

ld.global.v4.u32 {%r20, %r21, %r22, %r23}, [%rd4];
changes to
ld.global.cs.nc.v4.u32 {%r20, %r21, %r22, %r23}, [%rd4];


They can be edited by a person who has offline testing capability.
Offline testing is mandatory; otherwise you risk trashing your cached WUs.


And for the greater audience:
IF YOU DO NOT KNOW WHAT YOU ARE DOING DO NOT TRY.



Copyright © 2016 University of California