OpenCL NV MultiBeam v8 SoG edition for Windows

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1765215 - Posted: 15 Feb 2016, 18:21:29 UTC - in response to Message 1765207.  

Tut,

Did you run an AP alongside the OpenCL SoG app to see how they reacted?

Just curious.

Zalster

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765231 - Posted: 15 Feb 2016, 19:27:19 UTC - in response to Message 1765219.  
Last modified: 15 Feb 2016, 19:27:43 UTC

Running 4 CPU work units (VLARs) combined with 4 GPUs shows 99-100% CPU usage.

When the kernel activity reaches these high levels I see a drop-off in GPU usage: momentary drops from 98% down to 92%. I've noticed that the GPU times to complete have increased by about 3-5 minutes.

As kernel activity decreases, the GPU usage goes back up to around 97-98%.

I'll give this a few more hours but will eventually drop CPU MB to 2 and see if that provides better throughput with fewer momentary drops in the GPUs, and how the times adjust.

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1765254 - Posted: 15 Feb 2016, 21:09:32 UTC - in response to Message 1765231.  

> Running 4 CPU work units (VLARs) combined with 4 GPUs shows 99-100% CPU usage.
>
> When the kernel activity reaches these high levels I see a drop-off in GPU usage: momentary drops from 98% down to 92%. I've noticed that the GPU times to complete have increased by about 3-5 minutes.
>
> As kernel activity decreases, the GPU usage goes back up to around 97-98%.
>
> I'll give this a few more hours but will eventually drop CPU MB to 2 and see if that provides better throughput with fewer momentary drops in the GPUs, and how the times adjust.

And what about elapsed times for the CPU apps? Did they drop after you increased their priority?

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1765256 - Posted: 15 Feb 2016, 21:10:24 UTC - in response to Message 1765196.  

> Any idea when there will be a Linux/Nvidia SoG version on beta to test?


?????

No idea for now.

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765262 - Posted: 15 Feb 2016, 21:26:47 UTC - in response to Message 1765254.  


> And what about elapsed times for CPU apps? Did they drop after you increased their priority?



I have not increased their priority yet.

I wanted to get a baseline with the new OpenCL SoG before putting in the command line to increase the priority.

When the current 4 finish, I will put it in place. They have about 30 minutes left

Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34376
Credit: 79,922,639
RAC: 80
Germany
Message 1765263 - Posted: 15 Feb 2016, 21:32:13 UTC
Last modified: 15 Feb 2016, 21:33:41 UTC

Increase -oclfft_tune_bn and -oclfft_tune_cw to 64 on your Titan and change -no_cpu_lock to -cpu_lock.

Should be faster on this host.


With each crime and every kindness we birth our future.

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765276 - Posted: 15 Feb 2016, 22:05:38 UTC - in response to Message 1765263.  

ok, will do that now

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765283 - Posted: 15 Feb 2016, 22:48:26 UTC - in response to Message 1765276.  
Last modified: 15 Feb 2016, 22:53:10 UTC

Mike,

CPU per work unit went down to 0.02. Time to complete has gone up on some.

I'm seeing a weird effect where they slowly progress for anywhere from 6 minutes to 20 minutes, then the progress goes back to 0 and they start again.

Not all of them do this but at least a third do.
I've also noted that 2 of the 4 GPUs dropped out of the P2 state and went to P8, and utilization went to 0, then came back up a few minutes later.

Ignore the 4 aborted work units in my results.

Edit.

I'll leave it for now, will see how they do over a longer period of time

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765341 - Posted: 16 Feb 2016, 2:37:48 UTC - in response to Message 1765283.  
Last modified: 16 Feb 2016, 2:48:01 UTC

So I removed -cpu_lock, as 2/5 of all work units were restarting back at 0.

It's hard to explain: they started with lower CPU usage and went down to low single-digit percentages. They would continue to crunch to different points. 3/5 would continue onward to completion, but the other 2/5 would restart the percentage at 0 and the CPU usage would then move up to 30-40% of a core.

I would say it almost looks like the cpu lock "slipped".

The 2/5 that did this would eventually reach conclusion, but their times to complete were higher than they should have been.

I've tried running 4 CPU work units along with the GPUs.

There are periods where the demand from the CPU forces the GPU to pause. You can see utilization drop as far as 50% for a few moments, then it resumes.

I had to stop CPU work for now, as the screen was flickering and it was starting to stall (spinning icon). I have 2 particular work units that are taking a lot longer to complete than all the rest, currently at 40 minutes for 58% completion. Once one is done I'll look at its angle range to see what it was. I think these 2 work units were stressing the system.

Edit...
http://setiathome.berkeley.edu/workunit.php?wuid=2064540563

WU true angle range is : 0.082070

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765384 - Posted: 16 Feb 2016, 7:04:05 UTC - in response to Message 1765378.  

I believe you.

I remember seeing it on Beta.

What was disconcerting was that normal angles were taking just as abnormally long.

0.4 should take about 20 minutes to crunch; they were taking 42 minutes.

Since the removal of the cpu lock, times have returned to normal.

I still see the 0.01% process once in a while but it's only on the high angle work units.

The only ones now taking an abnormal amount of time to crunch are the low angle ones.

Yes, there is abnormally high CPU usage now, but I counter that by not doing any CPU work. The GPUs more than make up for any I would have done to begin with.

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1765391 - Posted: 16 Feb 2016, 8:21:25 UTC

@Zalster
1. And most important: since you are starting to use CPUlock, don't deceive the app (and it will not deceive you in return, hehe :)). You tell it that there are only 2 instances per GPU but actually run 4, so some of them just wait for a free slot.
Be consistent: if you run N instances, tell the app there are N instances, no more, no less. And take into account that with CPUlock only 64 instances per host can work in the usual regime. What happens with more than 64 instances is unknown (64 is the size of the mutex array that can be waited on in Windows).
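
The consistency rule above can be written down as a small check. This is only a sketch in Python and not part of the app: the function name and its parameters are hypothetical, and the 64-instance ceiling is simply the figure quoted in the post.

```python
# Hypothetical helper, not part of BOINC or the SoG app.
MAX_LOCKED_INSTANCES = 64  # per-host limit quoted in the post above


def check_instances(declared_per_gpu, running_per_gpu, gpu_count):
    """Return a list of warnings for a CPUlock instance configuration."""
    warnings = []
    # The app must be told exactly as many instances as actually run.
    if declared_per_gpu != running_per_gpu:
        warnings.append(
            f"app was told {declared_per_gpu} instances per GPU "
            f"but {running_per_gpu} are running"
        )
    # Beyond 64 instances per host the behaviour is described as unknown.
    total = running_per_gpu * gpu_count
    if total > MAX_LOCKED_INSTANCES:
        warnings.append(
            f"{total} instances on this host exceeds {MAX_LOCKED_INSTANCES}"
        )
    return warnings


# The situation described in the post: 2 declared, 4 running, on 4 GPUs.
print(check_instances(declared_per_gpu=2, running_per_gpu=4, gpu_count=4))
```

An empty list means the declared and actual counts agree and the host stays within the quoted limit.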

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1765457 - Posted: 16 Feb 2016, 13:22:43 UTC - in response to Message 1765378.  

> I'm sure Raistmer can confirm this, but explain the reasons for it, much better, and why that behaviour is more visible in the SoG app, than in other MB apps.
>
> Help me out here Raistmer, the water is too deep for this old man. It's your app, you do the explanations of why it behaves like it does :-)


It was explained elsewhere (on beta?) already, but let's reiterate a little:
1) MultiBeam has an inhomogeneous work distribution inside the task.
The consequence: the progress scale is non-linear.
It's non-linear for ALL MB tasks: CPU, GPU, ATI, CUDA and so on.
But the observed degree of that non-linearity will differ. It depends on how much the algorithms for the different types of work an MB task contains differ in performance.
If the different types of work (different searches) are processed with almost the same performance, one will see almost linear progress, and vice versa.

2) The SoG build is a more unusual case because it is the first to implement real parallel execution of the different searches completely on the GPU device.
Progress marks are embedded in the CPU code, and at high AR that CPU code does only non-blocking kernel enqueuing. This is done very fast, and then the GPU processes the data without interaction with the CPU code. So no more progress marks go to BOINC, hence the highly non-linear behavior on VHARs.

3) BOINC implements its own estimate of progress in case the app doesn't report progress in a timely fashion. So progress will "tick" without ANY connection to real progress. No wonder it will sometimes jump back or forth when the app's reported progress differs from what BOINC thought it should be.


And regarding Zalster's last reports: that was just an example of a misconfigured (as I wrote already) host, with the app supplied with wrong data. When BOINC launches 4 tasks, only 2 execution slots are reserved inside the CPUlock inner mechanism. So 2 instances wait for a free slot in a suspended state, and BOINC continues its "ticks" of progress. But then, when an instance acquires the lock on a free slot and starts real (!) execution, it reports real progress to BOINC... oops, we have zero progress instead of all those false ticks BOINC already made. Progress drops to zero and then starts counting again.
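
The reset described in the last paragraph can be reproduced with a toy model. This is only a sketch: the function name, the one-minute step and the linear tick rate are assumptions made for illustration, not BOINC's actual estimator.

```python
def displayed_progress(wait_minutes, run_minutes, total_minutes, tick_per_minute):
    """Toy model of the progress fraction shown for one task, minute by minute.

    All parameters are illustrative; the linear "tick" is an assumption,
    not BOINC's real estimation formula.
    """
    timeline = []
    # Phase 1: the instance waits for a free CPUlock slot and reports nothing,
    # so only the client's own estimate advances.
    for minute in range(wait_minutes):
        timeline.append(min(1.0, minute * tick_per_minute))
    # Phase 2: the instance acquires a slot and reports its real progress,
    # which starts from zero.
    for minute in range(run_minutes):
        timeline.append(min(1.0, minute / total_minutes))
    return timeline


timeline = displayed_progress(wait_minutes=6, run_minutes=4,
                              total_minutes=20, tick_per_minute=0.01)
for minute, fraction in enumerate(timeline):
    print(f"minute {minute:2d}: {fraction:.0%}")
```

In this model the first value after the wait phase is zero, which is the drop back to 0 seen on the misconfigured host.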

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1765857 - Posted: 17 Feb 2016, 22:57:34 UTC

Ok, finally got some APs.

So now can run them in parallel to see how they interact with the SoGs.

Will be back later with results.

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1766039 - Posted: 18 Feb 2016, 14:18:59 UTC - in response to Message 1765857.  

> Ok, finally got some APs.
>
> So now can run them in parallel to see how they interact with the SoGs.
>
> Will be back later with results.

And please don't forget to take the instance-number corrections into account. It's important for the OpenCL AP app as well as for the MB OpenCL one.

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1766043 - Posted: 18 Feb 2016, 14:49:43 UTC - in response to Message 1766040.  
Last modified: 18 Feb 2016, 14:50:39 UTC

The maximum of the AP and MB instance counts should be used for both apps if a mix is anticipated.
Or each app could just get its own number of instances in its own CMD line (3 for AP and 4 for MB in this particular case). That looks tidier but is effectively the same as providing the maximum to both.
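
As a one-line illustration of the first option, here is a sketch in Python. The function name and the dictionary layout are hypothetical; the counts are the ones from this particular case.

```python
def instances_to_declare(per_app_counts):
    """Pick one instance count to give to every app that may share a GPU."""
    # Using the largest count reserves enough slots for any mix of apps.
    return max(per_app_counts.values())


# 3 AP instances and 4 MB instances per GPU, as in the case discussed here.
print(instances_to_declare({"AP": 3, "MB": 4}))
```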

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1766070 - Posted: 18 Feb 2016, 17:43:07 UTC - in response to Message 1766056.  

The number of instances of MB and AP on each card is the same on my machines (though for lower-series GPUs that is not always the case, e.g. a 750 Ti can do 2 MB or 1 AP depending on the rest of the system).

They worked together perfectly.

I didn't see any slowing down or prolongation of the MB, which is the case when CUDA MB are mixed with OpenCL APs.

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1766093 - Posted: 18 Feb 2016, 19:18:09 UTC - in response to Message 1766070.  

Looking at the time to complete, I've noticed that the time for an OpenCL AP to complete is comparable to running nothing but APs on that GPU.

In other words,

If I ran an AP on a card with MB at the same time, the AP usually completes in 14-18 minutes, while the MBs have their time to complete elongated to as much as an hour or more.

If I ran all APs on the GPU with no MB, the time to complete should be around 50-54 minutes.

Running an AP with SoG MB, the time to complete is now 50 mins. There is no elongation of the time to complete that I can see with the SoG MBs. So that is good news. I have not seen any other issues with the SoG MB when run alongside the APs.

Will continue to check.

Profile Zalster Special Project $250 donor
Volunteer tester
Message 1766237 - Posted: 19 Feb 2016, 4:29:54 UTC

Ok, I have a question.

I put the -cpu_lock back in to see how it does.

One of the things I'm seeing is that when new work is started, it starts with really low CPU usage.

That is fine. It looks like they all decide to work on a few cores rather than spread out over many.

My question is this.

While these work units are crunching, several of the GPUs will stop crunching.

They go from a P2 state to a P8 state and the speed drops down.

This may last several minutes.

Eventually the GPU will start up again as newer work units start.

I've seen this happen on all 4 GPUs at different times.

Can someone explain how this is happening?

Profile Raistmer
Volunteer developer
Volunteer tester
Message 1766275 - Posted: 19 Feb 2016, 7:59:38 UTC - in response to Message 1766237.  

List the complete config (cmd line and app_info/config) you currently use and provide links to results of tasks that were run at the time the GPU downclocked.

Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1767003 - Posted: 23 Feb 2016, 0:47:04 UTC

This is my SoG setup. I will show my complete list of current config files (cmd line and app_info/config). When I make any changes, I will post the changes to that file only. I will show my SoG app_info.xml file first, because I started with just SoG. Once I got things to work, I took my CUDA/AP/CPU app_info.xml and replaced all my cuda50 lines with those from the SoG file to make my current app_info.xml work for SoG/APs and CPU. If you see something that could be improved, or a better way of tuning or testing, please comment :) Note: We don't make mistakes, we make changes.

When I used -sbs 384 I got this: WARNING: can't open binary kernel file for oclFFT plan: C:\SETI\Data/projects/setiathome.berkeley.edu\MB_clFFTplan_GeForceGTX780_1024_gr256_lr16_wg256_tw0_ls512_bn16_cw16_r3366.bin_35330, continue with recompile...

Workunit 2069914747,
http://setiathome.berkeley.edu/result.php?resultid=4744742557

I have the Stderr output file saved. If you need it just ask. All my slots had this warning in the stderr.txt files. Changing -sbs to either 256 or 192 stopped the warnings. I'm using -sbs 192 for now.
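
For anyone who wants to check all slots at once for the warning quoted above, here is a minimal Python sketch. It assumes the usual BOINC layout of a `slots` folder with one numbered subfolder per running task under the data directory; the function name is made up for this example, and the data directory is the one that appears in the warning path.

```python
from pathlib import Path

# Text taken from the warning quoted in this post.
WARNING_TEXT = "can't open binary kernel file for oclFFT plan"


def slots_with_warning(boinc_data_dir):
    """Return the names of slot folders whose stderr.txt contains the warning."""
    slots_dir = Path(boinc_data_dir) / "slots"
    if not slots_dir.is_dir():
        return []
    hits = []
    for stderr_file in sorted(slots_dir.glob("*/stderr.txt")):
        # errors="replace" avoids failing on odd bytes in the log.
        text = stderr_file.read_text(errors="replace")
        if WARNING_TEXT in text:
            hits.append(stderr_file.parent.name)
    return hits


if __name__ == "__main__":
    # Data directory as it appears in the warning path; adjust to your host.
    print(slots_with_warning(r"C:\SETI\Data"))
```

Running it before and after changing -sbs would show whether the warning is gone from every slot.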

My goal is to use as little CPU as possible without starving the GPUs. Sometimes I'm at 100% on all cores, but mostly in the range of 39-90%. 3 at a time seems the best so far. When I tried 4, the CPU usage was between 1-5% with very slow times (like I broke it), so I stopped the test.

I also started working on my MultiBeam_NV_config.xml file to give me some different tuning help for my GTX 960 and 780. I will post the file before using it.
Joe

app_info.xml (SoG only). I recommend this to start, because I believe in KISS.
<app_info>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
</app_info>
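
When merging several app_info.xml files by hand, one easy mistake is a file_ref that has no matching file_info entry, or the reverse. The following Python sketch checks for that. It is an illustration only: the function name and the sample XML are made up, and it looks at nothing beyond the `file_info`/`name` and `file_ref`/`file_name` elements used in the file above.

```python
import xml.etree.ElementTree as ET


def check_file_refs(app_info_xml):
    """Return (missing, unused) file names for an app_info.xml document.

    missing: referenced in a <file_ref> but never declared in a <file_info>.
    unused:  declared in a <file_info> but never referenced.
    """
    root = ET.fromstring(app_info_xml)
    declared = {info.findtext("name") for info in root.findall("file_info")}
    referenced = {
        ref.findtext("file_name")
        for version in root.findall("app_version")
        for ref in version.findall("file_ref")
    }
    return sorted(referenced - declared), sorted(declared - referenced)


# Made-up sample with one deliberate mismatch in each direction.
SAMPLE = """
<app_info>
  <file_info><name>app.exe</name><executable/></file_info>
  <file_info><name>unused.dll</name></file_info>
  <app_version>
    <file_ref><file_name>app.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>missing.txt</file_name></file_ref>
  </app_version>
</app_info>
"""

print(check_file_refs(SAMPLE))
```

For the SoG-only file above, both lists should come back empty, since its three file_ref names match its three file_info names.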

app_info.xml (SoG/AP/CPU) “current”
<app_info>
<app>
<name>setiathome_v7</name>
</app>
<file_info>
<name>AKv8c_r2549_winx86-64_AVXxjfs.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>700</version_num>
<platform>windows_intelx86</platform>
<file_ref>
<file_name>AKv8c_r2549_winx86-64_AVXxjfs.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>700</version_num>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>AKv8c_r2549_winx86-64_AVXxjfs.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>AP7_win_x64_AVX_CPU_r2692.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>ap_cmdline_win_x64_AVX_CPU.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_x86_64</platform>
<plan_class>sse2</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_intelx86</platform>
<plan_class>sse2</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_x86_64</platform>
<plan_class>sse</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_intelx86</platform>
<plan_class>sse</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>701</version_num>
<platform>windows_x86_64</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>701</version_num>
<platform>windows_intelx86</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>700</version_num>
<platform>windows_x86_64</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>700</version_num>
<platform>windows_intelx86</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_info>
<name>AstroPulse_Kernels_r2887.cl</name>
</file_info>
<file_info>
<name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x64_AVX_VS2010_r3330.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>mb_cmdline_win_x64_AVX_VS2010.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_x86_64</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
</app_info>

mb_cmdline_win_x86_SSE3_OpenCL_NV.txt “current”
-no_cpu_lock -sbs 192 -instances_per_device 3 -period_iterations_num 40 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16


app_config.xml “current” (For now I will stay with 1 AP; hopefully I can make it the same number for SoG, so they're both equal with no slowdown.)
<app_config>
<app>
<name>astropulse_v7</name>
<max_concurrent>1</max_concurrent>
<gpu_versions>
<gpu_usage>0.33</gpu_usage>
<cpu_usage>0.33</cpu_usage>
</gpu_versions>
</app>
<app>
<name>setiathome_v8</name>
<max_concurrent>12</max_concurrent>
<gpu_versions>
<gpu_usage>0.33</gpu_usage>
<cpu_usage>0.33</cpu_usage>
</gpu_versions>
</app>
</app_config>
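
As a rough sanity check on the numbers above, here is a minimal Python sketch of how many tasks of each app would fit on one GPU under this app_config.xml. It assumes the usual BOINC reading of these fields: gpu_usage is the fraction of a GPU each task claims, and cpu_usage is the CPU fraction budgeted per GPU task. The helper names (tasks_per_gpu, summarize) are made up for illustration, and the per-app figure ignores mixing AP and MB tasks on the same card.

```python
import math
import xml.etree.ElementTree as ET

# Copy of the app_config.xml from the post, embedded so the sketch is self-contained.
APP_CONFIG = """<app_config>
<app>
<name>astropulse_v7</name>
<max_concurrent>1</max_concurrent>
<gpu_versions>
<gpu_usage>0.33</gpu_usage>
<cpu_usage>0.33</cpu_usage>
</gpu_versions>
</app>
<app>
<name>setiathome_v8</name>
<max_concurrent>12</max_concurrent>
<gpu_versions>
<gpu_usage>0.33</gpu_usage>
<cpu_usage>0.33</cpu_usage>
</gpu_versions>
</app>
</app_config>"""


def tasks_per_gpu(gpu_usage):
    """Hypothetical helper: whole tasks that fit on one GPU at this gpu_usage."""
    # Small epsilon guards against float rounding for values such as 0.2.
    return math.floor(1.0 / gpu_usage + 1e-9)


def summarize(xml_text):
    """Hypothetical helper: per-app summary of the GPU settings in app_config.xml."""
    root = ET.fromstring(xml_text)
    summary = {}
    for app in root.findall("app"):
        gpu = float(app.findtext("gpu_versions/gpu_usage"))
        cpu = float(app.findtext("gpu_versions/cpu_usage"))
        n = tasks_per_gpu(gpu)
        summary[app.findtext("name")] = {
            "max_concurrent": int(app.findtext("max_concurrent")),
            "tasks_per_gpu": n,
            # CPU budgeted when one GPU is full of this app's tasks.
            "cpu_budget_per_gpu": round(n * cpu, 2),
        }
    return summary


if __name__ == "__main__":
    for name, info in summarize(APP_CONFIG).items():
        print(name, info)
```

With gpu_usage at 0.33 this comes out to 3 tasks per GPU, which lines up with the -instances_per_device 3 in the command line above. max_concurrent still caps the total separately, so AP stays at 1 task overall regardless of the per-GPU figure.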

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1767003


 