Linux/NVIDIA/AP questions

skildude
Message 1471350 - Posted: 31 Jan 2014, 21:41:16 UTC

You didn't mention whether you installed the NVIDIA Linux drivers or the stock Linux ones that get installed when the system detects the GPU. Installing NVIDIA's drivers might be the first step in getting OpenCL recognized.

I believe arkayn or Mike has a repository of current "Lunatics" apps that work on Linux, Windows and Apple systems. Just look at their websites for more info. Also, don't forget to set the permissions on the files. No permissions = not able to use them.
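A minimal sketch of the permissions check, assuming the app binaries only need the execute bit set. The helper names `non_executable` and `make_executable` are made up for illustration, and only the owner bit is tested, which is a simplification if BOINC runs under a different user account.

```python
import os
import stat


def non_executable(paths):
    """Return the paths that are missing the owner execute bit."""
    missing = []
    for path in paths:
        mode = os.stat(path).st_mode
        if not mode & stat.S_IXUSR:
            missing.append(path)
    return missing


def make_executable(path):
    """Add execute permission for owner, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling `non_executable` on the list of app files in the project directory shows which ones still need `make_executable` (or a plain `chmod a+x`).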


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
Raistmer
Volunteer developer
Volunteer tester
Message 1471369 - Posted: 31 Jan 2014, 22:04:21 UTC
Last modified: 31 Jan 2014, 22:06:32 UTC

Well, perhaps Urs could answer in more detail about Linux configuration, because he ported the app to Linux. I acted mostly as a guinea pig with a portable Linux setup for this release.

But a few comments:
1) BOINC doesn't see OpenCL support. Your BOINC is recent enough, so it's safe to assume that if BOINC doesn't see OpenCL, there is still no runnable OpenCL runtime on your system. How to get OpenCL properly installed is a question for the Linux gurus. The app's stderr confirms this: it says no OpenCL platforms were found, so the host has no properly configured OpenCL runtime.

2) The app can be built from publicly available sources. It's also available on our (Lunatics) site here: http://lunatics.kwsn.net/index.php?module=Downloads;catd=1
Particularly, the NV Linux build: http://lunatics.kwsn.net/index.php?module=Downloads;sa=dlview;id=370

EDIT: yes, different rev number. Perhaps Petri33 built a more recent rev himself.
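To see the "no OpenCL platforms found" condition independently of BOINC, here is a rough sketch that asks the OpenCL loader how many platforms it knows about. It assumes a library named `OpenCL` is what the app links against. `count_opencl_platforms` is a hypothetical helper, and it runs in your own environment, which may differ from the one the BOINC client uses.

```python
import ctypes
import ctypes.util


def count_opencl_platforms(library_name="OpenCL"):
    """Return the number of OpenCL platforms, or None if no OpenCL library loads."""
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    num = ctypes.c_uint(0)
    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    lib.clGetPlatformIDs.restype = ctypes.c_int
    lib.clGetPlatformIDs.argtypes = [
        ctypes.c_uint,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_uint),
    ]
    err = lib.clGetPlatformIDs(0, None, ctypes.byref(num))
    if err != 0:
        # The loader is present but reported an error, for example
        # because no vendor runtime is registered with it.
        return 0
    return num.value
```

A result of `None` suggests the OpenCL library itself is missing; `0` suggests the library loads but no vendor runtime (such as the one shipped with the NVIDIA driver) is registered.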
SETI apps news
We're not gonna fight them. We're gonna transcend them.
spitfire_mk_2
Message 1471380 - Posted: 31 Jan 2014, 22:21:17 UTC

C2D E7200

MMX instructions
SSE / Streaming SIMD Extensions
SSE2 / Streaming SIMD Extensions 2
SSE3 / Streaming SIMD Extensions 3
SSSE3 / Supplemental Streaming SIMD Extensions 3
SSE4.1 / Streaming SIMD Extensions 4.1
EM64T / Extended Memory 64 technology / Intel 64

http://www.cpu-world.com/CPUs/Core_2/Intel-Core%202%20Duo%20E7200%20EU80571PH0613M%20%28BX80571E7200%20-%20BXC80571E7200%29.html#cmp_cpus
petri33
Volunteer tester
Message 1471387 - Posted: 31 Jan 2014, 22:41:23 UTC - in response to Message 1471337.  

[quote]Petri33,[/quote]
I have seen your message, but I'm just about to go to sleep after a family sauna and ... I'm sorry that I have no more juice (in me) to write a reply to you.

Please remind me if I forget tomorrow.

*winding down* Sound THX slowing... #to be awake tomorrow#

--
p.l.z. Chrunch on! I'll nanoSleep(x). 'I know, You'll take time from this till the next reply'

PLZ -2x ()____)_________)))))~~~.

p.s. The ASUS P9X79-E WS is still idle on my desk and the 560Ti and GTX-660 are on the shelf. The Corsair AX-1200 is still being replaced. My 2x780GF and i7-3930K@4.3GHz are purring on AX-650.

Scratch on whatever hide, may it be kibbles Chrunchy, Itchy, WoW!, juicy, milky, new, never seen before, just heard about it, dried out, misbalanced, screwed, badly credited, newly found, worked on, twice thinked, three times checked, beta tested, alpha discussed, I've done that, ....

Waiting for
a) my mama
b) ET
.. to call.

:D
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ausymark
Message 1471427 - Posted: 1 Feb 2014, 2:40:42 UTC - in response to Message 1471337.  

Hi Guy

I think for a start you should just try the standard BOINC/SETI setup, which will get you most of what you are after under Linux, i.e. crunching AP on the GPU and SETI 7.x on your CPU.

Remove the app_info file as well. It's probably best to just do a fresh start in a blank directory under your "/home" directory. (I always run BOINC/SETI this way; it bypasses any OS updates etc.)

Remember that most of what is in the standard SETI programs these days is actually optimised code from the Lunatics team. At this point Linux users (I'm running Ubuntu here) can't process SETI 7 work units on their GPU, but I believe that will change when the Lunatics/SETI team produces the next version of the client, which I believe is a major rewrite of pretty much everything lol.

BTW with your setup you should leave one core free to feed the GPU, leaving the other to crunch SETI 7.x work units.

BTW on my setup it doesn't know what driver I'm using either, yet it does see what CUDA and OpenCL versions are available for crunching.

Just my 2c worth, and just be patient ;)

Cheers

Mark
petri33
Volunteer tester
Message 1471493 - Posted: 1 Feb 2014, 9:09:52 UTC - in response to Message 1471337.  
Last modified: 1 Feb 2014, 9:25:20 UTC

Hi,

Now awake.


I have a very old Linux installation (Fedora 14), so I had to download the source for a newer compiler and build it.

I downloaded the NVIDIA drivers from the NVIDIA downloads site.

The source code is from the svn branch sah_v7_opt (Xbranch and AP from there). Xbranch is for NVIDIA CUDA and works on Linux; AP is the NVIDIA OpenCL Astropulse app and works on Linux too.

I have not touched the version number, and I guess the current numbers from that repository are over 2000. You can look here: Astropulse source

And I downloaded the CUDA developer package from NVIDIA.

I have made some modifications to the code, and after the BOINC log below I attach two examples. I'd like to hear about your experience should you try them.

To build under Linux I had to get the libraries right and the path right, then just run _autosetup, configure and make.

PATH=/usr/local/cuda-5.5/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/cuda/bin:/home/petri/bin


LD_LIBRARY_PATH=/usr/local/cuda-5.5/lib64:/usr/local/cuda-5.5/lib:
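Before running configure it may be worth confirming that the CUDA compiler is actually reachable through those settings. A small sketch, assuming a POSIX system with colon-separated path lists; `find_tool` and `missing_dirs` are hypothetical helper names, and `nvcc` is simply the compiler binary one would expect in the CUDA `bin` directory.

```python
import os
import shutil


def find_tool(name, path_value):
    """Return the full path of `name` found on the given PATH string, or None."""
    return shutil.which(name, path=path_value)


def missing_dirs(path_value):
    """Return entries of a colon-separated path list that are not directories."""
    return [d for d in path_value.split(":") if d and not os.path.isdir(d)]
```

For example, `find_tool("nvcc", os.environ["PATH"])` returning `None`, or `missing_dirs(os.environ.get("LD_LIBRARY_PATH", ""))` returning a non-empty list, would point at a path that needs fixing before the build.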


My configure (too complicated), but works:

#!/bin/sh
export CFLAGS=-mavx
./configure BOINCDIR=/home/petri/boinc_repo --enable-sse3 CFLAGS='-O3 -march=core2 -mtune=core2 -msse2avx -mavx -mpreferred-stack-boundary=8 -fexceptions -fno-rounding-math -fno-signaling-nans  -fcx-limited-range -fno-math-errno  -fno-trapping-math --param inline-unit-growth=3000 -DPINNED -DNDEBUG -DHAVE_STRCASECMP -fpeel-loops -funroll-loops -fgcse-sm -fgcse-las -fweb -I/usr/local/cuda-5.5/include -L/opt/lib-4.12/lib ' LIBS="/opt/lib-4.12/lib/libm.so.6 /opt/lib-4.12/lib/libc.so /opt/lib-4.12/lib/libpthread.so /usr/lib64/libstdc++.so /usr/lib64/lib/libm.so.6"


And ...

Here is my log
Sat 01 Feb 2014 10:29:02 AM EET		Unrecognized tag in cc_config.xml: <http_11_0>
Sat 01 Feb 2014 10:29:02 AM EET		Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
Sat 01 Feb 2014 10:29:02 AM EET		Config: report completed tasks immediately
Sat 01 Feb 2014 10:29:02 AM EET		Config: use all coprocessors
Sat 01 Feb 2014 10:29:02 AM EET		log flags: file_xfer, sched_ops, task, cpu_sched
Sat 01 Feb 2014 10:29:02 AM EET		Libraries: libcurl/7.21.0 NSS/3.12.10.0 zlib/1.2.5 libidn/1.18 libssh2/1.2.4
Sat 01 Feb 2014 10:29:02 AM EET		Data directory: /var/lib/boinc
Sat 01 Feb 2014 10:29:02 AM EET		Processor: 12 GenuineIntel Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz [Family 6 Model 45 Stepping 6]
Sat 01 Feb 2014 10:29:02 AM EET		Processor: 12.00 MB cache
Sat 01 Feb 2014 10:29:02 AM EET		Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmu
Sat 01 Feb 2014 10:29:02 AM EET		OS: Linux: 2.6.35.14-106.fc14.x86_64
Sat 01 Feb 2014 10:29:02 AM EET		Memory: 7.79 GB physical, 9.78 GB virtual
Sat 01 Feb 2014 10:29:02 AM EET		Disk: 47.37 GB total, 30.28 GB free
Sat 01 Feb 2014 10:29:02 AM EET		Local time is UTC +2 hours
Sat 01 Feb 2014 10:29:02 AM EET		NVIDIA GPU 0: GeForce GTX 780 (driver version unknown, CUDA version 6000, compute capability 3.5, 3072MB, 723 GFLOPS peak)
Sat 01 Feb 2014 10:29:02 AM EET		NVIDIA GPU 1: GeForce GTX 780 (driver version unknown, CUDA version 6000, compute capability 3.5, 3071MB, 723 GFLOPS peak)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Found app_info.xml; using anonymous platform
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	URL http://setiathome.berkeley.edu/; Computer ID 5643864; resource share 9100
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	General prefs: from SETI@home (last modified 21-Nov-2013 17:26:39)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Computer location: home
Sat 01 Feb 2014 10:29:02 AM EET		General prefs: using separate prefs for home
Sat 01 Feb 2014 10:29:02 AM EET		Reading preferences override file
Sat 01 Feb 2014 10:29:02 AM EET		Preferences:
Sat 01 Feb 2014 10:29:02 AM EET		   max memory usage when active: 7182.89MB
Sat 01 Feb 2014 10:29:02 AM EET		   max memory usage when idle: 7182.89MB
Sat 01 Feb 2014 10:29:02 AM EET		   max disk usage: 23.68GB
Sat 01 Feb 2014 10:29:02 AM EET		   max CPUs used: 6
Sat 01 Feb 2014 10:29:02 AM EET		   (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Sat 01 Feb 2014 10:29:02 AM EET		Using proxy info from GUI
Sat 01 Feb 2014 10:29:02 AM EET		Not using a proxy
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_21my13ag_B6_P1_00090_20140129_01158.wu_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_21my13ag_B6_P1_00090_20140129_01158.wu_0 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_21my13ag_B6_P1_00146_20140129_01158.wu_1(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_21my13ag_B6_P1_00146_20140129_01158.wu_1 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_22au13ac_B4_P1_00141_20140129_06644.wu_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_22au13ac_B4_P1_00141_20140129_06644.wu_0 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_22au13ac_B4_P1_00178_20140129_06644.wu_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_22au13ac_B4_P1_00178_20140129_06644.wu_0 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_22au13ac_B5_P0_00308_20140129_18763.wu_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_22au13ac_B5_P0_00308_20140129_18763.wu_0 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting ap_22au13ac_B6_P1_00214_20140129_32125.wu_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task ap_22au13ac_B6_P1_00214_20140129_32125.wu_0 using astropulse_v6 version 603
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 11ap13ac.4503.13324.438086664201.12.213_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 11ap13ac.4503.13324.438086664201.12.213_0 using setiathome_v7 version 709
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 08se13aa.19612.16427.438086664206.12.57_1(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 08se13aa.19612.16427.438086664206.12.57_1 using setiathome_v7 version 709
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 08se13aa.19612.16427.438086664206.12.21_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 08se13aa.19612.16427.438086664206.12.21_0 using setiathome_v7 version 709
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 08se13ad.19679.18237.438086664206.12.44_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 08se13ad.19679.18237.438086664206.12.44_0 using setiathome_v7 version 709
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 08se13ad.19679.18237.438086664206.12.38_1(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 08se13ad.19679.18237.438086664206.12.38_1 using setiathome_v7 version 709
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	[cpu_sched] Starting 11ap13ac.4503.17414.438086664201.12.83_0(resume)
Sat 01 Feb 2014 10:29:02 AM EET	SETI@home	Restarting task 11ap13ac.4503.17414.438086664201.12.83_0 using setiathome_v7 version 709


and some code

... from cudaAcc_CalcChirpData.cu ...
// modified by petri33
#define NEWW 1
#ifdef NEWW
#define B 8
#define N_TIMES (B/2)
#define THREADS 192
#else
...
#endif
...

// some new stuff by petri33
#ifdef NEWW
__global__ void __launch_bounds__(THREADS, 4)
cudaAcc_CalcChirpData_kernel_sm13(int NumDataPoints, double ccr, const float2 * const __restrict__ cx_DataArray, float2 * const __restrict__ cx_ChirpDataArray)
//cudaAcc_CalcChirpData_kernel_sm13(int NumDataPoints, double ccr, float2 *cx_DataArray, float2 *cx_ChirpDataArray)
{
  int iblock = blockIdx.x + blockIdx.y * gridDim.x;
  int ix = (iblock * blockDim.x + threadIdx.x) * B;

  double time = ix;
  float time2;
  float time3;

  float4 cx[N_TIMES];

  for(int i = 0; i < N_TIMES; i++) // load
    cx[i] = *(float4 *)(&cx_DataArray[ix + (i<<1)]);

  time = __dmul_rn(time, time);
  time2 = (float)((ix << 1) + 1);
  time3 = __fmul_rn((float)ccr, 2.0f);

  time = __dmul_rn(ccr, time);
  time2 = __fmul_rn((float)ccr, time2);

  time = __dsub_rn(time, __double2int_rd(time));
//  time2 = __fsub_rn(time2, __float2int_rd(time2));
//  time3 = __fsub_rn(time3, __float2int_rd(time3));

  float ft1 = time;
  float ft2 = time2;
  float ft3 = time3;

  ft1 = __fmul_rn(ft1, M_2PIf);
  ft2 = __fmul_rn(ft2, M_2PIf);
  ft3 = __fmul_rn(ft3, M_2PIf);

  float cf, sf, ca, sa, cb, sb;

  __sincosf(ft1, &sf, &cf);
  __sincosf(ft2, &sa, &ca);
  __sincosf(ft3, &sb, &cb);

  float4 tmp = cx[0];
  const float nsb = -sb;

  for(int i = 0; i < N_TIMES; i++) // use f and g to rot
    {
      float tsca, tcca, sg, cg, sacb, cacb, tsa;

      float ft1f = __fmul_rn(tmp.y, -sf);
      float ft2f = __fmul_rn(tmp.y, cf);

      tsca = __fmul_rn(sf, ca); // rot f by a to make g
      tcca = __fmul_rn(cf, ca); //
      sacb = __fmul_rn(sa, cb); // rot a by b
      cacb = __fmul_rn(ca, cb); //

      sg = __fmaf_rn(cf, sa, tsca); //
      cg = __fmaf_rn(sf, -sa, tcca); // rot f to g by a ready
      tsa = sa; //
      sa = __fmaf_rn(ca, sb, sacb); //
      ca = __fmaf_rn(tsa, nsb, cacb); // rot a by b ready

      float ft3g = __fmul_rn(tmp.w, -sg);
      float ft4g = __fmul_rn(tmp.w, cg);

      cx[i].y = __fmaf_rn(tmp.x, sf, ft2f);
      cx[i].x = __fmaf_rn(tmp.x, cf, ft1f);

      tsca = __fmul_rn(sg, ca); // rot g by a to make f
      tcca = __fmul_rn(cg, ca); //

      cx[i].w = __fmaf_rn(tmp.z, sg, ft4g);
      cx[i].z = __fmaf_rn(tmp.z, cg, ft3g);

      if (i + 1 < N_TIMES) // guard: cx[N_TIMES] would be out of bounds on the last pass
        tmp = cx[i+1];

      sacb = __fmul_rn(sa, cb); // rot a by b again
      cacb = __fmul_rn(ca, cb); //

      tsa = sa; //
      sf = __fmaf_rn(cg, sa, tsca); //
      cf = __fmaf_rn(sg, -sa, tcca); // rot g to f by a ready
      sa = __fmaf_rn(ca, sb, sacb); //
      ca = __fmaf_rn(tsa, nsb, cacb); // rot a by b ready
    }

  for(int i = 0; i < N_TIMES; i++) // store
    {
      *(float4 *)&(cx_ChirpDataArray[ix + (i<<1)]) = cx[i];
    }
}
#else
...
#endif
...
// call
  double ccr = 0.5*chirp_rate*recip_sample_rate*recip_sample_rate;                                                                                                                                                                          
  CUDA_ACC_SAFE_LAUNCH( (cudaAcc_CalcChirpData_kernel_sm13<<<grid, block>>>(cudaAcc_NumDataPoints, ccr, dev_cx_DataArray, dev_cx_ChirpDataArray)),true);       
...
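The idea in the kernel above, as far as it can be read from the code, is to evaluate sin/cos only three times per thread and obtain the remaining chirp factors by rotating with the angle-addition formulas. Below is a host-side sketch in double precision for checking that idea. The function names are invented for illustration, and this is not the kernel itself, which works in single precision on pairs of samples.

```python
import math


def chirp_direct(ccr, start, count):
    """cos/sin of 2*pi*ccr*t*t for t = start .. start+count-1, one by one."""
    out = []
    for t in range(start, start + count):
        phase = 2.0 * math.pi * math.fmod(ccr * t * t, 1.0)
        out.append((math.cos(phase), math.sin(phase)))
    return out


def chirp_by_rotation(ccr, start, count):
    """Same values with three sin/cos evaluations; the rest are rotations."""
    two_pi = 2.0 * math.pi
    f = two_pi * math.fmod(ccr * start * start, 1.0)  # phase of first sample
    a = two_pi * ccr * (2 * start + 1)                # step to the next sample
    b = two_pi * ccr * 2.0                            # growth of that step
    cf, sf = math.cos(f), math.sin(f)
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    out = []
    for _ in range(count):
        out.append((cf, sf))
        # rotate f by a: cos(f+a), sin(f+a)
        cf, sf = cf * ca - sf * sa, sf * ca + cf * sa
        # rotate a by b: cos(a+b), sin(a+b)
        ca, sa = ca * cb - sa * sb, sa * cb + ca * sb
    return out
```

The step `a` follows from ccr*(t+1)^2 - ccr*t^2 = ccr*(2t+1), and `b` from the difference between two consecutive steps, 2*ccr. Comparing the two functions over a short run (8 samples per thread in the kernel) gives a feel for how little error the rotations add in double precision; single precision on the GPU will be less forgiving.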



and for AP I can not remember what I touched here ...

inline float4 calc_chirp(float4 cconst, float dm_start, int i)
{
  float4 result;
  float phase1, phase2;

  float phase_const = M_PI * (dm_start + (float)i);

  float freq1 = cconst.x * cconst.x;
  cconst.x *= 0.00176f;

  float freq2 = cconst.y * cconst.y;
  cconst.y *= 0.00176f;

  cconst.x += 1.0f;
  cconst.y += 1.0f;

  cconst.x = 1.0f/cconst.x;
  cconst.y = 1.0f/cconst.y;

  freq1 = freq1 * cconst.x;
  freq2 = freq2 * cconst.y;

  phase1 = phase_const * freq1;
  phase2 = phase_const * freq2;

  result.x = native_cos(phase1);
  result.y = native_sin(phase1);
  result.z = native_cos(phase2);
  result.w = native_sin(phase2);

  result.x *= cconst.z;
  result.y *= cconst.z;
  result.z *= cconst.w;
  result.w *= cconst.w;

  return result;
}

__kernel void dechirp_range1_kernel ( __global float4* gpu_data ,
                                      __global float4* gpu_dechirped,
                                      __const float dm_start)
{
  uint tid = get_global_id(0);
  uint dchunk=get_global_id(1);
  int j = tid*2;
  float4 cconst;
  float nrcp = -rcp_2SigmaSqr;

  float g1, g2;
  float freq1, freq2;
  float multiplier1, multiplier2;

  if(j < (FFT_SIZE/2))
    {
      g1 = (float)j + 0.5f;
      g2 = (float)j + 1.5f;
      float float_j1 = (float)j;
      float float_j2 = (float)(j + 1);

      g1 *= g1;
      g2 *= g2;

      g1 *= nrcp;
      g2 *= nrcp;

      cconst.x = float_j1*(1.0f/32768.0f);
      cconst.y = float_j2*(1.0f/32768.0f);

      multiplier1 = native_sqrt(1.0f - native_exp(g1));
      multiplier2 = native_sqrt(1.0f - native_exp(g2));

      multiplier1 *= normalize_val;
      multiplier2 *= normalize_val;

      cconst.z = multiplier1;
      cconst.w = multiplier2;
    }
  else
    {
      g1 = (float)(16383 -(j - 16384)) + 0.5f;
      int j1 = j - 16384;
      g2 = (float)(16383 -(j - 16384 + 1)) + 0.5f;
      int j2 = j - 16384 + 1;

      g1 *= g1;
      g2 *= g2;

      g1 *= nrcp;
      g2 *= nrcp;

      cconst.x = (float)(j1 + 16384)*(1.0f/32768.0f) -1.0f;
      cconst.y = (float)(j2 + 16384)*(1.0f/32768.0f) -1.0f;

      multiplier1 = native_sqrt(1.0f - native_exp(g1));
      multiplier2 = native_sqrt(1.0f - native_exp(g2));

      multiplier1 *= normalize_val;
      multiplier2 *= normalize_val;

      cconst.z = multiplier1;
      cconst.w = multiplier2;
    }

  //R: each work item will process 2 complex data elements and write 2*2*16 new complex elements into dechirped array

  float4 data=gpu_data[tid+dchunk*(FFT_SIZE/2)];
  float4 cur_chirp1;
  float4 cur_chirp2;
  float4 cur_chirp3;
  float4 cur_chirp4;
  float4 cur_dechirp11, cur_dechirp12;
  float4 cur_dechirp21, cur_dechirp22;
  float4 cur_dechirp31, cur_dechirp32;
  float4 cur_dechirp41, cur_dechirp42;
  float xx, yy, zz, ww, yx, xy, wz, zw;
  float xx2, yy2, zz2, ww2, yx2, xy2, wz2, zw2;
  float xx3, yy3, zz3, ww3, yx3, xy3, wz3, zw3;
  float xx4, yy4, zz4, ww4, yx4, xy4, wz4, zw4;

  for(uint i = 0; i < 16; i += 4)
    {    //R: can be optimized via mad instruction probably

      cur_chirp1 = calc_chirp(cconst, dm_start, i);
      cur_chirp2 = calc_chirp(cconst, dm_start, i+1);
      cur_chirp3 = calc_chirp(cconst, dm_start, i+2);
      cur_chirp4 = calc_chirp(cconst, dm_start, i+3);

      //negative sign
      yy = data.y*cur_chirp1.y;
      xy = data.x*cur_chirp1.y;
      ww = data.w*cur_chirp1.w;
      zw = data.z*cur_chirp1.w;
      yy2 = data.y*cur_chirp2.y;
      xy2 = data.x*cur_chirp2.y;
      ww2 = data.w*cur_chirp2.w;
      zw2 = data.z*cur_chirp2.w;
      yy3 = data.y*cur_chirp3.y;
      xy3 = data.x*cur_chirp3.y;
      ww3 = data.w*cur_chirp3.w;
      zw3 = data.z*cur_chirp3.w;
      yy4 = data.y*cur_chirp4.y;
      xy4 = data.x*cur_chirp4.y;
      ww4 = data.w*cur_chirp4.w;
      zw4 = data.z*cur_chirp4.w;

      cur_dechirp11.x = mad(data.x, cur_chirp1.x,  yy);
      cur_dechirp12.y = mad(data.y, cur_chirp1.x,  xy);
      cur_dechirp11.z = mad(data.z, cur_chirp1.z,  ww);
      cur_dechirp12.w = mad(data.w, cur_chirp1.z,  zw);
      cur_dechirp21.x = mad(data.x, cur_chirp2.x, yy2);
      cur_dechirp22.y = mad(data.y, cur_chirp2.x, xy2);
      cur_dechirp21.z = mad(data.z, cur_chirp2.z, ww2);
      cur_dechirp22.w = mad(data.w, cur_chirp2.z, zw2);

      cur_dechirp31.x = mad(data.x, cur_chirp3.x, yy3);
      cur_dechirp32.y = mad(data.y, cur_chirp3.x, xy3);
      cur_dechirp31.z = mad(data.z, cur_chirp3.z, ww3);
      cur_dechirp32.w = mad(data.w, cur_chirp3.z, zw3);
      cur_dechirp41.x = mad(data.x, cur_chirp4.x, yy4);
      cur_dechirp42.y = mad(data.y, cur_chirp4.x, xy4);
      cur_dechirp41.z = mad(data.z, cur_chirp4.z, ww4);
      cur_dechirp42.w = mad(data.w, cur_chirp4.z, zw4);

      yy = -yy;
      xy = -xy;
      ww = -ww;
      zw = -zw;
      yy2 = -yy2;
      xy2 = -xy2;
      ww2 = -ww2;
      zw2 = -zw2;
      yy3 = -yy3;
      xy3 = -xy3;
      ww3 = -ww3;
      zw3 = -zw3;
      yy4 = -yy4;
      xy4 = -xy4;
      ww4 = -ww4;
      zw4 = -zw4;

      cur_dechirp12.x = mad(data.x, cur_chirp1.x, yy);
      cur_dechirp11.y = mad(data.y, cur_chirp1.x, xy);
      cur_dechirp12.z = mad(data.z, cur_chirp1.z, ww);
      cur_dechirp11.w = mad(data.w, cur_chirp1.z, zw);
      cur_dechirp22.x = mad(data.x, cur_chirp2.x, yy2);
      cur_dechirp21.y = mad(data.y, cur_chirp2.x, xy2);
      cur_dechirp22.z = mad(data.z, cur_chirp2.z, ww2);
      cur_dechirp21.w = mad(data.w, cur_chirp2.z, zw2);

      cur_dechirp32.x = mad(data.x, cur_chirp3.x, yy3);
      cur_dechirp31.y = mad(data.y, cur_chirp3.x, xy3);
      cur_dechirp32.z = mad(data.z, cur_chirp3.z, ww3);
      cur_dechirp31.w = mad(data.w, cur_chirp3.z, zw3);
      cur_dechirp42.x = mad(data.x, cur_chirp4.x, yy4);
      cur_dechirp41.y = mad(data.y, cur_chirp4.x, xy4);
      cur_dechirp42.z = mad(data.z, cur_chirp4.z, ww4);
      cur_dechirp41.w = mad(data.w, cur_chirp4.z, zw4);

      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+0)*(FFT_SIZE/2)+tid] = cur_dechirp11;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+1)*(FFT_SIZE/2)+tid] = cur_dechirp12;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+2)*(FFT_SIZE/2)+tid] = cur_dechirp21;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+3)*(FFT_SIZE/2)+tid] = cur_dechirp22;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+4)*(FFT_SIZE/2)+tid] = cur_dechirp31;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+5)*(FFT_SIZE/2)+tid] = cur_dechirp32;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+6)*(FFT_SIZE/2)+tid] = cur_dechirp41;
      gpu_dechirped[32*(FFT_SIZE/2)*dchunk+(2*i+7)*(FFT_SIZE/2)+tid] = cur_dechirp42;
    }
}



#else	// dechirping via look up table
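For spot-checking the OpenCL output on the host, the arithmetic in `calc_chirp` can be transcribed directly. This is only a sketch of one (cos, sin) pair in double precision; `calc_chirp_ref` and the constant name `K` are made up here, and the value 0.00176 is copied from the kernel as posted.

```python
import math

K = 0.00176  # constant copied from calc_chirp in the kernel source


def calc_chirp_ref(x, gain, dm_start, i):
    """Reference for one (cos, sin) pair produced by calc_chirp.

    x is the normalized frequency (cconst.x or cconst.y) and gain is the
    matching multiplier (cconst.z or cconst.w).
    """
    phase_const = math.pi * (dm_start + i)
    freq = (x * x) / (1.0 + K * x)
    phase = phase_const * freq
    return gain * math.cos(phase), gain * math.sin(phase)
```

Because the kernel uses `native_cos` and `native_sin`, GPU results should only be expected to agree with this reference to within the reduced precision of those functions.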



petri33
Volunteer tester
Message 1471527 - Posted: 1 Feb 2014, 10:46:00 UTC
Last modified: 1 Feb 2014, 10:55:28 UTC

What does your 'ldd' say?
# ldd setiathome_x41zc_x86_64-pc-linux-gnu_cuda55 
	linux-vdso.so.1 =>  (0x00007fff84dff000)
	libm.so.6 => /lib64/libm.so.6 (0x0000003ab9000000)
	libc.so.6 => /var/lib/boinc/projects/setiathome.berkeley.edu/libc.so.6 (0x00007f7714520000)
	libpthread.so.0 => /var/lib/boinc/projects/setiathome.berkeley.edu/libpthread.so.0 (0x00007f7714303000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f7713fc9000)
	libcudart.so.5.5 => /var/lib/boinc/projects/setiathome.berkeley.edu/libcudart.so.5.5 (0x00007f7713d7b000)
	libcufft.so.5.5 => /var/lib/boinc/projects/setiathome.berkeley.edu/libcufft.so.5.5 (0x00007f770f25a000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003ab9800000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003ab7c00000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003ab8800000)
	librt.so.1 => /lib64/librt.so.1 (0x0000003755e00000)


and for AP

# ldd ap_6.07r1952_avx_clGPU_x86_64-pc-linux-gnu 
	linux-vdso.so.1 =>  (0x00007fff891ff000)
	libOpenCL.so.1 => /usr/lib64/libOpenCL.so.1 (0x00007fb50239b000)
	libfftw3f.so.3 => /usr/lib64/libfftw3f.so.3 (0x0000003862e00000)
	libz.so.1 => /lib64/libz.so.1 (0x0000003ab9400000)
	libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003acb800000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb50205f000)
	libm.so.6 => /lib64/libm.so.6 (0x0000003ab9000000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003ab8800000)
	libc.so.6 (0x00007fb501cd0000)
	libpthread.so.0 (0x00007fb501ab3000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003ab7c00000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003ab9800000)
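When comparing ldd output like the above, the lines to look for are the ones marked "not found". A small sketch that pulls those out; `unresolved_libraries` and `check_binary` are hypothetical helper names, and `check_binary` assumes `ldd` is installed and on the PATH.

```python
import subprocess


def unresolved_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return unresolved_libraries(result.stdout)
```

An empty list means every shared library was resolved; a name such as `libOpenCL.so.1` in the list would mean the loader cannot find that library for this binary.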


petri33
Volunteer tester
Message 1471528 - Posted: 1 Feb 2014, 10:51:58 UTC

.. And to compile AP on my host I use this kind of configure script

(again, it works but may be too complicated)

#Pre-prepared configuration for an AstroPulse AMD/ATI-GPU build using OpenCL :
export BOINC_DIR=/petri/boinc_repo
export BOINCDIR=/home/petri/boinc_repo
export INCLUDE=/root/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc
export GPU=NV

#./configure --enable-bitness=64 --build=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-boinc-platform=x86_64-pc-linux-gnu --enable-static --enable-static-client  --enable-avx --disable-shared --disable-graphics --enable-intrinsics CXXFLAGS=" -O3 -march=core2 -mtune=core2 -mfpmath=sse -mavx --param inline-unit-growth=3000 -I/root/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc -I/root/NVIDIA_GPU_Computing_SDK/shared/inc" CPPFLAGS=" -DUSE_FFTW -DUSE_CONVERSION_OPT -DUSE_INCREASED_PRECISION -DSMALL_CHIRP_TABLE -DUSE_OPENCL -DUSE_AVX -DTWIN_FFA -DUSE_OPENCL_NV -DOPENCL_WRITE -DCOMBINED_DECHIRP_KERNEL -DOCL_ZERO_COPY -DAP_CLIENT" LIBS=" -L/usr/lib64 -lOpenCL -L/opt/lib-4.12/lib" LDFLAGS=" -static-libgcc -static-libstdc++" BOINCDIR=" /home/petri/boinc_repo" SETI_BOINC_DIR=" ../../AKv8"  LIBS="/opt/lib-4.12/lib/libm.so.6 /opt/lib-4.12/lib/libc.so /opt/lib-4.12/lib/libpthread.so /usr/lib64/libstdc++.so /usr/lib64/lib/libm.so.6 " 

./configure --enable-bitness=64 --build=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-boinc-platform=x86_64-pc-linux-gnu --enable-static --enable-static-client  --enable-sse2 --disable-shared --disable-graphics --enable-intrinsics CXXFLAGS=" -O3 -march=core2 -mtune=core2 -mfpmath=sse -msse2 --param inline-unit-growth=3000 -I/root/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc -I/root/NVIDIA_GPU_Computing_SDK/shared/inc" CPPFLAGS=" -DUSE_FFTW -DUSE_CONVERSION_OPT -DUSE_INCREASED_PRECISION -DSMALL_CHIRP_TABLE -DUSE_OPENCL -DUSE_SSE2 -DUSE_OPENCL_NV -DOPENCL_WRITE -DCOMBINED_DECHIRP_KERNEL -DOCL_ZERO_COPY -DAP_CLIENT" LIBS=" -L/usr/lib64 -lOpenCL -L/opt/lib-4.12/lib" LDFLAGS=" -static-libgcc -static-libstdc++" BOINCDIR=" /home/petri/boinc_repo" SETI_BOINC_DIR=" ../../AKv8"  LIBS="/opt/lib-4.12/lib/libm.so.6 /opt/lib-4.12/lib/libc.so /opt/lib-4.12/lib/libpthread.so /usr/lib64/libstdc++.so /usr/lib64/lib/libm.so.6 " 

#For Nvidia version please add "-DUSE_OPENCL_NV" to CPPFLAGS above.


petri33
Volunteer tester
Message 1471531 - Posted: 1 Feb 2014, 11:04:12 UTC - in response to Message 1471529.  

...
Forgot to mention, I installed OpenCL from the Fedora repositories and it looks like I have OpenCL version 1.0. Maybe version 1.1 is needed? Also downloaded the SDK and installed it.
...


If I remember correctly I have not downloaded any extra packages to get OpenCL. I have a feeling that it comes with the NVIDIA driver.

I have to install the driver under init 1 on my machine and kill X before installing. Then just reboot.

--
Petri
Juha
Volunteer tester
Message 1471549 - Posted: 1 Feb 2014, 12:20:36 UTC - in response to Message 1471529.  
Last modified: 1 Feb 2014, 12:22:27 UTC

[quote]I like Fedora 14. It has the last of the good windows-like interface.[/quote]

As in GNOME 2? There's MATE, which is a fork of GNOME 2, and I think I read something about GNOME 3 having a "classic" mode. I haven't used either one, but if you want to stay in Red Hat land you could give those a try.

edit: Oh, right. PNI = Prescott New Instructions aka SSE3
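Since Linux lists SSE3 under the name `pni`, a check for it has to look for that token rather than for "sse3". A minimal sketch, assuming an x86 Linux host where `/proc/cpuinfo` has a `flags` line; both function names are invented for illustration.

```python
def has_sse3(flags_line):
    """True if a space-separated CPU flag list contains 'pni' (SSE3 on Linux)."""
    return "pni" in flags_line.split()


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the flag list of the first CPU in /proc/cpuinfo, or '' if absent."""
    with open(cpuinfo_path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1].strip()
    return ""
```

Splitting on whitespace matters here: a plain substring test for "sse3" would also match `ssse3`, which is a different extension.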
arkayn
Volunteer tester
Message 1471630 - Posted: 1 Feb 2014, 16:14:20 UTC - in response to Message 1471536.  



[quote]I just may break down and install the latest version of Ubuntu if I throw in the towel on Fedora. Been with them since Redhat 6, so I probably have an unhealthy attachment to it. I may need to see a therapist to fix that.[/quote]

If you do that, go with Mint instead of Ubuntu.
http://www.linuxmint.com/

petri33
Volunteer tester
Message 1471703 - Posted: 1 Feb 2014, 21:07:38 UTC - in response to Message 1471630.  





In the not-so-distant future I'm going to build a new machine. Which Linux should I pick?

--
Petri
Richard Haselgrove Project Donor
Volunteer tester
Message 1471713 - Posted: 1 Feb 2014, 21:38:06 UTC - in response to Message 1471710.  

setiathome_enhanced is the old v6 application we left behind in the spring of 2013.

You need to be on setiathome_v7 now - global search and replace. I *think* x41g will handle that, but double-check.
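The "global search and replace" can be sketched as below. This only swaps the application name in the text of app_info.xml and keeps a backup copy; whether other entries (version numbers, file names) also need changing is not covered here. The function names and the `.bak` suffix are arbitrary choices for illustration.

```python
import shutil


def rename_app(app_info_text, old="setiathome_enhanced", new="setiathome_v7"):
    """Replace every occurrence of the old application name in the text."""
    return app_info_text.replace(old, new)


def rename_app_in_file(path, old="setiathome_enhanced", new="setiathome_v7"):
    """Rewrite an app_info.xml in place, keeping a backup next to it."""
    shutil.copy2(path, path + ".bak")
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(rename_app(text, old, new))
```

It is probably safest to stop the BOINC client before editing the file and to restart it afterwards so the new app_info.xml is read.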
Richard Haselgrove Project Donor
Volunteer tester
Message 1471715 - Posted: 1 Feb 2014, 21:41:44 UTC - in response to Message 1471710.  

PRE tags.

-rw-rw-r--. 1 guy   guy       2488 Feb  1 14:51 app_info.xml
-rw-rw-r--. 1 guy   guy       2438 Feb  1 14:46 app_info.xml~
-rw-r--r--. 1 boinc boinc    52779 Feb  1 11:21 arecibo_181.png
-rwxr-xr-x. 1 root  root    313872 Feb  1 10:42 libcudart.so.3
-rwxr-xr-x. 1 root  root  28996288 Feb  1 10:42 libcufft.so.3
-rw-r--r--. 1 boinc boinc     2536 Feb  1 11:21 sah_40.png
-rw-r--r--. 1 boinc boinc    25488 Feb  1 11:21 sah_banner_290.png
-rw-r--r--. 1 boinc boinc    35399 Feb  1 11:21 sah_ss_290.png
-rwxr-xr-x. 1 root  root   1564204 Feb  1 14:34 setiathome_7.01_x86_64-pc-linux-gnu
-rwxr-xr-x. 1 root  root   6373944 Feb  1 10:43 setiathome_x41g_x86_64-pc-linux-gnu_cuda32
-rw-r--r--. 1 boinc boinc       71 Feb  1 14:52 slideshow_setiathome_enhanced_00
-rw-r--r--. 1 boinc boinc       72 Feb  1 14:52 slideshow_setiathome_enhanced_01
-rw-r--r--. 1 boinc boinc       75 Feb  1 14:52 slideshow_setiathome_enhanced_02
-rw-r--r--. 1 boinc boinc       67 Feb  1 14:52 stat_icon

(I wish I could figure out a way to maintain the columns for you readers)
arkayn
Volunteer tester
Message 1471722 - Posted: 1 Feb 2014, 21:57:55 UTC - in response to Message 1471713.  



It does, I have an updated app_info with it on my site.
http://www.arkayn.us/forum/index.php?action=tpmod;dl=item24

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.