Posts by Gianfranco Lizzio

21) Message boards : Number crunching : OpenCL WU test on OS X 10.11.5 (Message 1789726)
Posted 23 May 2016 by Gianfranco Lizzio
Post:
What revision you compiled from, BTW?


I compiled the SoG version Rev. 3463.

Gianfranco
22) Message boards : Number crunching : OpenCL WU test on OS X 10.11.5 (Message 1789717)
Posted 23 May 2016 by Gianfranco Lizzio
Post:
The MB OpenCL apps continue to return incorrect values on OS X 10.11.4 and 10.11.5. I ran the test with both the SoG version and the classic one, always getting the same result. Then I compiled the OpenCL apps myself, with the same result! I tested the apps using the test WU downloaded from Lunatics.
None of this happens with the CUDA apps and the MB CPU apps, which return the correct values during the test.
I have kept all the files obtained during the various tests, so I can send the results if necessary.

Gianfranco
23) Message boards : Number crunching : GBT ('guppi') .vlar tasks will be send to GPUs, what you think about this? (Message 1786756)
Posted 11 May 2016 by Gianfranco Lizzio
Post:
Assuming you changed maxregisters up to 64 as you stated elsewhere, try dialling it back to 32.


I'm using maxrregcount=64 in El Capitan without any problem.

Gianfranco
24) Message boards : Number crunching : GBT ('guppi') .vlar tasks will be send to GPUs, what you think about this? (Message 1786742)
Posted 11 May 2016 by Gianfranco Lizzio
Post:
Running 'guppi' tasks 3 at a time on my i7 4770 with AVX-optimized code, they take 42-45 min.
Running 1 at a time on my GTX 960 with Petri's CUDA code, each task takes 17-18 min.

This means that the i7 is about 20% more efficient on 'guppi' tasks than the GTX 960, and it uses 36% less power!
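
A rough check of the 20% figure, as a minimal sketch. It assumes the midpoint of each quoted time range and that the 3 concurrent CPU tasks finish together; the helper name tasks_per_hour is made up for illustration. The power figure is not checked here because the post gives no wattage.

```python
def tasks_per_hour(concurrent_tasks: int, minutes_per_batch: float) -> float:
    """Completed tasks per hour when `concurrent_tasks` run side by side."""
    return concurrent_tasks * 60.0 / minutes_per_batch

# Assumption: use the midpoint of each time range quoted in the post.
cpu_rate = tasks_per_hour(3, (42 + 45) / 2)  # i7 4770, 3 tasks at a time
gpu_rate = tasks_per_hour(1, (17 + 18) / 2)  # GTX 960, 1 task at a time

# Relative throughput advantage of the CPU over the GPU.
cpu_advantage = cpu_rate / gpu_rate - 1.0

print(f"CPU: {cpu_rate:.2f} tasks/h, GPU: {gpu_rate:.2f} tasks/h")
print(f"CPU throughput advantage: {cpu_advantage:.1%}")
```

With these midpoints the advantage comes out at roughly 20%, in line with the figure quoted above.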

Gianfranco
25) Message boards : Number crunching : GBT MESSIER031 work on GPU NVIDIA (Message 1781723)
Posted 23 Apr 2016 by Gianfranco Lizzio
Post:
The interesting thing about the overflows is that the results are not simple spikes but triplets.
26) Message boards : Number crunching : GBT MESSIER031 work on GPU NVIDIA (Message 1781656)
Posted 23 Apr 2016 by Gianfranco Lizzio
Post:
Good news: the Berkeley servers have started distributing guppi work units to NVIDIA CUDA GPUs.
27) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1773057)
Posted 21 Mar 2016 by Gianfranco Lizzio
Post:
Hmmm, so you're using a LapTop with an External video card?


No, Tom... I'm using a self-built Macintosh with performance comparable to a Mac Pro 2013.
28) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1773017)
Posted 21 Mar 2016 by Gianfranco Lizzio
Post:
If you're seeing inconclusives directly as a result of switching maxregcount, then you have Cuda kernels failing silently (and so looking quick) due to launch restrictions in Drivers and hardware.....


The most likely explanation seems to be the use of the Nvidia beta drivers. But there is nothing to be done, because they are the only ones available.
29) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772774)
Posted 20 Mar 2016 by Gianfranco Lizzio
Post:
Increasing the maxrregcount value increases the number of inconclusive results. Now I'm trying to find a good compromise between execution speed and inconclusive results.
30) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772613)
Posted 19 Mar 2016 by Gianfranco Lizzio
Post:
Jason, the same thing happens with your code: setting maxrregcount=64 and testing with the reference work unit available on Lunatics, I measure a performance increase of 3.6%.
The increase is modest compared with the 21% for Petri's code, but it is still there.
31) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772600)
Posted 19 Mar 2016 by Gianfranco Lizzio
Post:
http://setiathome.berkeley.edu/result.php?resultid=4800007576

with maxrregcount=32

http://setiathome.berkeley.edu/result.php?resultid=4800260714

with maxrregcount=64 and the same AR 0.415

Performance increase: 19.8%.

These results are on my GTX 960 with gencode 35, 50 and 52.

32) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772597)
Posted 19 Mar 2016 by Gianfranco Lizzio
Post:
Okay, I just searched the Xbranch folder from the last build and found;
$(cuda_cu_objs): cuda/$(@:.o=.cu)
$(NVCC) -c cuda/$(@:.o=.cu) -o $@ -Icuda $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) -I$(top_srcdir)/db $(BOINC_CFLAGS) --maxrregcount=32 $(NVCCFLAGS) $(CUDA_CFLAGS)

Since changing it has never been mentioned before, All the Apps have been built with the default which appears to be 32.

Hmmm, I might try setting it to 64 the next time I have the urge to pull all the 750s out and boot to Mountain Lion....just to see how it works on the 750s.

Any other suggestions?


Tom, I recompiled the code with maxrregcount=64 and the results are very promising. The client seems to be about 1 minute faster on average-AR tasks; however, CPU usage is always close to 100%.

I will run more tests and let you know.
33) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1772083)
Posted 17 Mar 2016 by Gianfranco Lizzio
Post:

In cudaAcceleration.cu the stock code is

bool cudaAcc_setBlockingSync(int device)
{
    // CUdevice hcuDevice;
    // CUcontext hcuContext;

    /* CUresult status = cuInit(0);
    if(status != CUDA_SUCCESS)
        return false;

    status = cuDeviceGet( &hcuDevice, device);
    if(status != CUDA_SUCCESS)
        return false;

    status = cuCtxCreate( &hcuContext, 0x4, hcuDevice ); //0x4 is CU_CTX_BLOCKING_SYNC
    if(status != CUDA_SUCCESS)
        return false;*/

#if CUDART_VERSION < 4000
    CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceBlockingSync),false);
    // CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleYield),false);
#else
    CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync),false);
    // CUDA_ACC_SAFE_CALL(cudaSetDeviceFlags(cudaDeviceScheduleYield),false);
#endif
    return true;
}


my code is different. Try using the same as in stock.


Petri, I replaced this part of your code with the stock code, but the result is the same: CPU usage is still 100%.
34) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1771505)
Posted 14 Mar 2016 by Gianfranco Lizzio
Post:
Tom, after ./compile ... the problem is in analyzeFuncs.cpp in the client folder. You have to add the following line of code

#include <fft8g.h>

after #endif // USE_IPP

It works successfully for me.
35) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1761657)
Posted 2 Feb 2016 by Gianfranco Lizzio
Post:
Anyone using the older versions might want to try the newer versions and see if they are any better.


CPU Intel(R) Core(TM) i7-4770K @3.70GHz (running 8 instances of SETI)

AVX Build 3352 Vs AVX Build 3366

Build 3352
Run time: 1 h 54 min 52 sec
CPU time: 1 h 48 min 7 sec
VLAR=0.010316

Build 3366
Run time: 1 h 45 min 40 sec
CPU time: 1 h 40 min 10 sec
VLAR=0.010306

Build 3366 is 8.6% faster!
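
The comparison above can be reproduced from the quoted times. This is only a sketch: the helper name to_seconds is made up for illustration, and it shows two common ways to express the gain (speed ratio and time saved), since the post does not say which one was used.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/min/sec duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Run and CPU times quoted in the post for the two AVX builds.
run_3352 = to_seconds(1, 54, 52)
run_3366 = to_seconds(1, 45, 40)
cpu_3352 = to_seconds(1, 48, 7)
cpu_3366 = to_seconds(1, 40, 10)

# "Faster" as a speed ratio: old time divided by new time, minus 1.
run_speedup = run_3352 / run_3366 - 1.0
cpu_speedup = cpu_3352 / cpu_3366 - 1.0

# Alternative view: fraction of the old run time that was saved.
run_time_saved = 1.0 - run_3366 / run_3352

print(f"Run-time speedup: {run_speedup:.1%}")
print(f"CPU-time speedup: {cpu_speedup:.1%}")
print(f"Run time saved:   {run_time_saved:.1%}")
```

By run time the speed ratio works out to about 8.7%, close to the figure quoted above; measured as time saved it is about 8.0%.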
36) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1758271)
Posted 22 Jan 2016 by Gianfranco Lizzio
Post:
Using the computer with the CUDA app running is much smoother than with OpenCL, where I experience slowdowns in on-screen animations.
Has anyone else noticed these slowdowns with OpenCL?
37) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1758222)
Posted 22 Jan 2016 by Gianfranco Lizzio
Post:
TBar, the setiathome_x41zi_x86_64-apple-darwin_cuda65 app is 50% faster than MBv8_8.05r3346_nvidia_ssse3_x86_64-apple-darwin and uses 25% CPU versus 33% for the OpenCL app. It's a very good result.

On my GTX750Ti the shorties are about the same as with OpenCL but the longer tasks are significantly faster with cuda65. Task taking around 1100 secs in OpenCL finish in a little under 800 secs in cuda65. The cuda42 App is a little slower but is still faster than OpenCL on the longer tasks with my 750Ti. Testing the cuda42 App in Mountain Lion with my GTS250 gives about the same times as it was receiving in Windows 8.1. The cuda42 App should work with the Pre-Fermi GPUs in Snow Leopard to Mavericks. CUDA 6.5 required by Yosemite Will Not work with the Pre-Fermi GPUs.


OK, it's the same for me. For high angle ranges the computing time is the same as with the OpenCL app.
38) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1758153)
Posted 22 Jan 2016 by Gianfranco Lizzio
Post:
TBar, the setiathome_x41zi_x86_64-apple-darwin_cuda65 app is 50% faster than MBv8_8.05r3346_nvidia_ssse3_x86_64-apple-darwin and uses 25% CPU versus 33% for the OpenCL app. It's a very good result.
39) Message boards : Number crunching : Solarix 10 on x86 (Message 100802)
Posted 18 Apr 2005 by Gianfranco Lizzio
Post:
How fast is the seti_boinc client running under Solaris 10 (x86 machine) compared with Linux or Windows?




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.