OpenCL NV MultiBeam v8 SoG edition for Windows

Zalster
Volunteer tester
Message 1779380 - Posted: 15 Apr 2016, 20:05:41 UTC - in response to Message 1779374.
Last modified: 15 Apr 2016, 20:06:05 UTC

Kinda, but in the long run, it is still faster than the CPU version.

Regarding Guppi...

One work unit takes 55 minutes on the CPU vs 14 minutes on the GPU.

Where issues arise is when comparing how fast non-VLARs are crunched, especially when running multiple instances.

I would think a separate plan class would be needed in the app_config if one planned on running the Guppi along with non-VLARs on a low core system, and the configurations would have to be worked out. A rough idea would be along these lines:

1 GPU on a dual core, single instance

1 GPU in a 4 core CPU probably not a problem

2 GPU in a 4 core, ok as long as not running multiple instance

2 GPU in a 8 core, probably ok

3 GPU in a 8 core, might be manageable but limited instances

3 GPU on a 12 core, manageable

4 GPU on a 12 core, wouldn't recommend it other than single instance or limited

4 GPU on a 16 core, might be possible but limited instances

I didn't throw in 6 cores but they would be between the 2 and 3 GPU set ups.
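
For illustration, the rough sizing above comes down to two numbers per app version: how many tasks share one GPU and how much CPU each task reserves. Here is a minimal sketch that writes a BOINC app_config.xml entry for a separate plan class. The app name setiathome_v8, the plan class opencl_nvidia_SoG and the helper build_app_config are assumptions, not taken from this thread, so check the names your own client reports.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, tasks_per_gpu, cpus_per_task, cmdline=""):
    """Return app_config.xml text for one app version (hypothetical helper)."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Each task claims 1/N of a GPU, so N tasks can run on one card.
    ET.SubElement(version, "ngpus").text = f"{1 / tasks_per_gpu:.2f}"
    # CPU share reserved per task; 1.0 means a full core per work unit.
    ET.SubElement(version, "avg_ncpus").text = f"{cpus_per_task:.2f}"
    if cmdline:
        ET.SubElement(version, "cmdline").text = cmdline
    return ET.tostring(root, encoding="unicode")


# Assumed names: single instance per GPU with a full core reserved.
print(build_app_config("setiathome_v8", "opencl_nvidia_SoG",
                       tasks_per_gpu=1, cpus_per_task=1.0))
```

On a dual core with one GPU this would stay at tasks_per_gpu=1. Higher counts only make sense where enough cores are free to feed each instance.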

But here's the good news, with each year we get better equipment that allows us to build upon these things. So who is to say what we can do in 1-2 years time.
ID: 1779380
kittyman
Volunteer tester
Message 1779383 - Posted: 15 Apr 2016, 20:09:03 UTC - in response to Message 1779380.
Last modified: 15 Apr 2016, 20:11:12 UTC

I am running vintage equipment. And I cannot afford to upgrade.
The kitty farm is what it is.
9 old rigs getting older by the day.
Every day I wake up and don't find one crashed is a good day.

If some think that it is OK to spend CPU cycles to support a weak GPU app, I am afraid I cannot agree.
The GPU apps should be better able to stand on their own with minimal CPU support.
That is why they are GPU apps.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1779383
Zalster
Volunteer tester
Message 1779394 - Posted: 15 Apr 2016, 20:35:16 UTC - in response to Message 1779383.

It's the GBT data that requires such a large amount of CPU usage.

On normal MB, the SoG uses very little CPU time.

Actually, it's faster than CUDA on non-VLAR MB.

But its main purpose is those VLARs.

So to crunch Guppi on GPU or not to crunch Guppi on GPU, that is the question, lol....
ID: 1779394
Raistmer
Volunteer developer
Volunteer tester
Message 1779405 - Posted: 15 Apr 2016, 21:08:03 UTC - in response to Message 1779394.

Try with -use_sleep and increased sizes of the PulseFind kernel (-sbs 512, for example).
ID: 1779405
Zalster
Volunteer tester
Message 1779418 - Posted: 15 Apr 2016, 21:59:13 UTC - in response to Message 1779371.
Last modified: 15 Apr 2016, 21:59:30 UTC

So 3 per card takes 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU.

CPU utilization is down to anywhere from 80-92% of a CPU rather than 97%.

The only problem is that this requires a large number of CPU cores to make it happen.


I need to correct this. I was using the command lines supplied by Mike at this point.

When I ran this same experiment without command lines, the times to complete were 46-47 minutes.

Going to try Raistmer's recommendations now...
ID: 1779418
Bruce
Volunteer tester
Message 1779433 - Posted: 15 Apr 2016, 22:53:46 UTC

I'm not seeing the same results that you are, but then I am using AMD CPUs and not Intel.

The r3430_SoG seems to be running just like the r3401_SoG.
The shorties run pretty quickly and use very little CPU resources. When you get past those and into the mid and lower AR, it takes a full core per WU.

This seems to be the same problem that I have always had running APs.

I am still using the same command lines that I used for r3401, so I may need to retune, but I don't expect any reduction in CPU usage.

I just tried a quick test using the sleep switch and it does not seem to work for me; it still used a full core per WU.
Bruce
ID: 1779433
Zalster
Volunteer tester
Message 1779448 - Posted: 15 Apr 2016, 23:56:14 UTC - in response to Message 1779433.

Hey Bruce,

Sorry, I should have specified that I was talking about SoG for Nvidia.

I don't know how they do for ATI.

I'm restarting my test with the -use_sleep and nothing else and will progress over the evening to see how it does.
ID: 1779448
Raistmer
Volunteer developer
Volunteer tester
Message 1779543 - Posted: 16 Apr 2016, 6:04:55 UTC - in response to Message 1779448.


> I'm restarting my test with the -use_sleep and nothing else and will progress over the evening to see how it does.

-use_sleep can be used along with the full tuning line.
ID: 1779543
Zalster
Volunteer tester
Message 1779545 - Posted: 16 Apr 2016, 6:22:47 UTC - in response to Message 1779543.
Last modified: 16 Apr 2016, 6:28:45 UTC

Was just testing them to see how they all combine.

I've found that -use_sleep with -sbs 512, along with the command line Mike gave me, works the best.

At 1 work unit per card: 16 minutes with 3-5% CPU usage.

At 3 work units per card: 38 minutes average with 3% CPU, using -use_sleep -sbs 512 and the command line.
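
To compare the two settings, the run times can be turned into work units per hour per card. This is a minimal sketch, assuming the 38-minute figure is the wall-clock time for each of the three tasks while they run side by side; the function name tasks_per_hour is just for illustration.

```python
def tasks_per_hour(concurrent_tasks, minutes_per_batch):
    """Work units finished per GPU per hour when tasks run side by side."""
    if minutes_per_batch <= 0:
        raise ValueError("minutes_per_batch must be positive")
    return concurrent_tasks * 60.0 / minutes_per_batch


single = tasks_per_hour(1, 16)  # 1 work unit per card, 16 minutes
triple = tasks_per_hour(3, 38)  # 3 work units per card, 38 minutes average

print(f"1 per card: {single:.2f} WU/h")
print(f"3 per card: {triple:.2f} WU/h")
print(f"gain from running 3 at a time: {100 * (triple / single - 1):.0f}%")
```

Under that assumption, running three at a time gives about 4.74 work units per hour against 3.75 for a single task, roughly a quarter more output per card.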

I'm going to try again 1, 2, and 3 at a time per card but without the -use_sleep

Tomorrow I will post the results in a better format.

I've run into a problem this evening with over 12 errors. I'm not sure why it occurred; I was only using -use_sleep and nothing else. Maybe a bad batch of work, not sure. I will have to wait and see if wingmen also error out or if they complete the work units.

Edit...

I just checked those that errored. Some of the wingmen also errored out; only ARM wingmen seemed to have completed them, but in a very, very short time. So I have to doubt those results.
ID: 1779545
Zalster
Volunteer tester
Message 1779548 - Posted: 16 Apr 2016, 6:44:08 UTC
Last modified: 16 Apr 2016, 6:44:59 UTC

Looks like a bad batch of GBT on beta...

They start off normal, then CPU usage quickly goes all the way down to below 1%... and I'm not using -use_sleep on these, so they should be using close to 97% of a core each...

I've seen this happen with all the ones that have errored out tonight.

At first I thought it was the -use_sleep but I removed it and restarted the machine.

But the errors continue to happen, plus I can see some of my wingmen are erroring out too.

Anyone else seeing these?

I'm not currently crunching any GBT CPU work units on Main so don't know if the same thing is happening here...

Edit...

Going to run out of work soon on Beta due to the high number of errors, which restricts the number of work units downloaded.
ID: 1779548
Jimbocous
Volunteer tester
Message 1779559 - Posted: 16 Apr 2016, 7:28:05 UTC - in response to Message 1779548.

> Looks like a bad batch of GBT on beta...
>
> Anyone else seeing these?
>
> I'm not currently crunching any GBT CPU work units on Main so don't know if the same thing is happening here...


On Beta, 10 of ~150 GPU WUs (both SAH & SOG) errored out with
"ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance"
the rest validated OK. Duplicated this on at least 2 of 4 machines crunching beta.
Also saw some errors like that on GBT work done on main (CPU). Guess it's a GUPPI issue, not just GPU or CPU.
Vanilla setup here, no special command line info going on.
ID: 1779559
Richard Haselgrove
Volunteer tester
Message 1779564 - Posted: 16 Apr 2016, 8:24:53 UTC

Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message.

Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course?
ID: 1779564
Mike
Volunteer tester
Message 1779565 - Posted: 16 Apr 2016, 8:31:12 UTC - in response to Message 1779433.

> I'm not seeing the same results that you are, but then I am using AMD cpus and not Intel.
>
> The r3430_SoG seems to be running just like the r3401_SoG.
> The shorties run pretty quick and use very little cpu resources. When you get past those and into the mid and lower AR it takes a full core per WU.
>
> This seems to be the same problem that I have always had running APs.
>
> I am still using the same command lines that I used for r3401, so may need to retune, but I don't expect any reduction in cpu usage.
>
> Just tried a quick test using the sleep switch and it does not seem to work for me, still used a full core per.


First of all, you need to remove -no_cpu_lock.
Also, -period_iterations_num 20 is a little low.
Increase it to 50 or, better, 80 for SoG.


With each crime and every kindness we birth our future.
ID: 1779565
Mike
Volunteer tester
Message 1779567 - Posted: 16 Apr 2016, 8:34:53 UTC - in response to Message 1779564.
Last modified: 16 Apr 2016, 8:39:35 UTC

> Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message.
>
> Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course?


CPU affinity adjustment disabled


It's the same here.

Running multiple instances on GPU requires CPU affinity to be enabled.

A bug in the CPU affinity adjustment has been fixed since r_3391.

So remove -no_cpu_lock.


With each crime and every kindness we birth our future.
ID: 1779567
Raistmer
Volunteer developer
Volunteer tester
Message 1779573 - Posted: 16 Apr 2016, 9:05:00 UTC - in response to Message 1779564.
Last modified: 16 Apr 2016, 9:33:24 UTC

> Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message.
>
> Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course?

I think it's an autocorr protection false positive.
Joe and I debated the level of this protection some time ago, when there were a few false positives in the usual Arecibo data. Unlike all the other types of protection levels, there is no sensible theoretical limit for the autocorr value that we can expect from data.
So the chosen one was only our guess at what it could be. I already increased it some time ago. Apparently, not high enough. Maybe this sanity check should be disabled completely. I'll try to consult with Eric on this topic again.

  //R: sanity check for found result
  if (swi.analysis_cfg.autocorr_fftlen == 131072 && ai.a.peak_power > 135.0) {
    //R: it's possible for good result to have >135 in autocorr though it's rare event
    //so check if too much signals found later
    was_big_autocorr++;
    //boinc_temporary_exit(5*60,"Suspicious autocorr results, host needs reboot or maintenance");
  }



period_iterations_num=40
Spike: peak=26.84568, time=7.158, d_freq=1684106822.19, chirp=0, fft_len=32k
Spike: peak=24.5848, time=10.02, d_freq=1684097740.05, chirp=0, fft_len=32k
Spike: peak=24.25558, time=12.88, d_freq=1684095968.67, chirp=0, fft_len=32k
Spike: peak=29.58042, time=18.61, d_freq=1684105388.54, chirp=0, fft_len=32k
Spike: peak=25.99703, time=21.47, d_freq=1684098330.62, chirp=0, fft_len=32k
Spike: peak=30.16515, time=30.06, d_freq=1684106456.88, chirp=0, fft_len=32k
Spike: peak=24.45428, time=41.52, d_freq=1684105782.14, chirp=0, fft_len=32k
Spike: peak=32.34881, time=8.59, d_freq=1684096755.87, chirp=0, fft_len=64k
Spike: peak=43.98743, time=14.32, d_freq=1684096109.25, chirp=0, fft_len=64k
Spike: peak=46.22211, time=20.04, d_freq=1684105388.36, chirp=0, fft_len=64k
Spike: peak=38.42199, time=25.77, d_freq=1684103954.18, chirp=0, fft_len=64k
Spike: peak=38.20225, time=31.5, d_freq=1684106456.88, chirp=0, fft_len=64k
Spike: peak=34.22185, time=42.95, d_freq=1684104966.65, chirp=0, fft_len=64k
Autocorr: peak=154.4745, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=209.9964, time=17.18, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=240.9282, time=28.63, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=102.848, time=40.09, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=93.54303, time=51.54, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=58.89613, time=62.99, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Spike: peak=44.38844, time=5.727, d_freq=1684105781.87, chirp=0, fft_len=128k
Spike: peak=54.02288, time=17.18, d_freq=1684105782.05, chirp=0, fft_len=128k
Spike: peak=40.94735, time=28.63, d_freq=1684106456.79, chirp=0, fft_len=128k
Spike: peak=31.99356, time=40.09, d_freq=1684097430.88, chirp=0, fft_len=128k
Spike: peak=31.74863, time=51.54, d_freq=1684105388.36, chirp=0, fft_len=128k
Spike: peak=28.48955, time=62.99, d_freq=1684096109.33, chirp=0, fft_len=128k
Autocorr: peak=155.0494, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0.0012693, fft_len=128k
Autocorr: peak=207.8813, time=17.18, delay=0.28451, d_freq=1684101104.01, chirp=0.0012693, fft_len=128k
Autocorr: peak=240.4265, time=28.63, delay=0.28451, d_freq=1684101104.02, chirp=0.0012693, fft_len=128k
Autocorr: peak=101.7792, time=40.09, delay=0.28451, d_freq=1684101104.04, chirp=0.0012693, fft_len=128k
Autocorr: peak=94.91109, time=51.54, delay=0.28451, d_freq=1684101104.05, chirp=0.0012693, fft_len=128k
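
To see how often a task would trip the check quoted above, the same threshold can be re-applied to stderr lines in the format of this listing. This is only a sketch for reading logs, not the app's code; the regular expression and the name count_big_autocorr are assumptions made for illustration.

```python
import re

# Matches lines such as "Autocorr: peak=154.4745, ..., fft_len=128k".
AUTOCORR_LINE = re.compile(r"^Autocorr: peak=([0-9.]+),.*fft_len=(\w+)$")


def count_big_autocorr(lines, threshold=135.0, fft_len="128k"):
    """Count Autocorr lines at `fft_len` whose peak exceeds `threshold`."""
    count = 0
    for line in lines:
        match = AUTOCORR_LINE.match(line.strip())
        if match and match.group(2) == fft_len and float(match.group(1)) > threshold:
            count += 1
    return count


sample = [
    "Spike: peak=44.38844, time=5.727, d_freq=1684105781.87, chirp=0, fft_len=128k",
    "Autocorr: peak=154.4745, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k",
    "Autocorr: peak=209.9964, time=17.18, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k",
    "Autocorr: peak=102.848, time=40.09, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k",
]
print(count_big_autocorr(sample))
```

Applied to the full listing above, six of the eleven Autocorr lines are over 135, all at the same delay of 0.28451.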
ID: 1779573
Bruce
Volunteer tester
Message 1779587 - Posted: 16 Apr 2016, 10:57:39 UTC - in response to Message 1779565.

> First of all you need to remove -no_cpu_lock.
> Also period_iterations_num 20 is a little low.
> Increase it to 50 or better 80 for SoG.


Hi Mike.

I do not use -no_cpu_lock in my command line; I run a Titan-Z and not an ATI video card.
My offline testing has shown that, with my setup, -period_iterations_num 20 seems to work better than the default of 50 or higher numbers.

Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64.
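
A long tuning line like this is easy to mistype, so here is a minimal sketch that groups each switch with its values and rejects anything unexpected. It assumes every value is a plain non-negative integer, which holds for the switches quoted in this thread but may not hold for others. The function name parse_switches is made up for the example.

```python
def parse_switches(cmdline):
    """Group a tuning line into {switch: [values]} and reject stray tokens."""
    options = {}
    current = None
    for token in cmdline.split():
        if token.startswith("-") and token[1:2].isalpha():
            # A new switch such as -sbs or -tune starts a fresh value list.
            current = token
            options[current] = []
        elif not token.isdigit():
            # Assumption: values are integers, so anything else is a typo.
            raise ValueError(f"unexpected token {token!r}")
        elif current is None:
            raise ValueError(f"value {token!r} appears before any switch")
        else:
            options[current].append(token)
    return options


line = ("-sbs 384 -pref_wg_size 128 -period_iterations_num 20 "
        "-spike_fft_thresh 2048 -tune 1 64 1 4")
for switch, values in parse_switches(line).items():
    print(switch, values)
```

A switch typed with a leading underscore instead of a dash, for example, is reported as an unexpected token rather than being silently attached to the previous switch.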

Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors (FX-74s) act differently than Intel CPUs, so I have the usual problems with OpenCL.
Bruce
ID: 1779587
Raistmer
Volunteer developer
Volunteer tester
Message 1779596 - Posted: 16 Apr 2016, 12:50:56 UTC - in response to Message 1779587.


> Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64.


What other numbers have you already tried for the bolded values?
ID: 1779596
Mike
Volunteer tester
Message 1779601 - Posted: 16 Apr 2016, 13:50:53 UTC - in response to Message 1779587.
Last modified: 16 Apr 2016, 13:51:30 UTC

> > First of all you need to remove -no_cpu_lock.
> > Also period_iterations_num 20 is a little low.
> > Increase it to 50 or better 80 for SoG.
>
> Hi Mike.
>
> I do not use -no_cpu_lock in my command line, I run a Titan-Z and not a ATI video card.
> My off line testing has shown, that with my setup, period_iterations_num 20 seems to work better than the default of 50 or higher numbers.
>
> Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64.
>
> Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors -FX-74's - act differently than Intel CPUs, so I have the usual problems with open_cl.


So check with Task Manager which cores r3430 is pinned to.
You are missing this line:

CPU affinity adjustment enabled
.
.
Info: CPU affinity mask used: 2; system mask is ff

CPU affinity should be enabled by default.
It is important for FX CPUs.
I have no clue if the NV app is different in this case.

So I suggest adding -cpu_lock to your command line switches.
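
For reading the "CPU affinity mask used" line, each set bit in a mask stands for one logical CPU. Here is a minimal sketch that decodes such a mask, assuming both values in the log are printed in hexadecimal (the "ff" clearly is; the "2" is assumed to be). The helper name cores_in_mask is made up for the example.

```python
def cores_in_mask(mask_hex):
    """Return the zero-based logical CPU indexes whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print("task pinned to CPUs:", cores_in_mask("2"))
print("CPUs in the system: ", cores_in_mask("ff"))
```

Read this way, a mask of 2 pins the task to the second logical CPU (index 1), and a system mask of ff covers eight logical CPUs.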


With each crime and every kindness we birth our future.
ID: 1779601
W3Perl
Volunteer tester
Message 1779656 - Posted: 16 Apr 2016, 16:30:30 UTC

Hi,

I also have some computation errors using opencl_nvidia_sah with blc WUs.

https://setiweb.ssl.berkeley.edu/beta//results.php?userid=38948&offset=0&show_names=0&state=6&appid=

I use:
-use_sleep_ex 2 -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4
with my GTX 950.

Hope this helps.
ID: 1779656
Zalster
Volunteer tester
Message 1779660 - Posted: 16 Apr 2016, 16:48:38 UTC - in response to Message 1779565.


> First of all you need to remove -no_cpu_lock.
> Also period_iterations_num 20 is a little low.
> Increase it to 50 or better 80 for SoG.


Thanks Mike, will make that change.

Will also try with and without the -no_cpu_lock just to see how they do.

Looks like another day of full testing to see how they go.



Mike, here is the new command line I will use. Does it look OK?

-sbs 512 -period_iterations_num 80 -spike_fft_thresh 8192 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp

Alright back to testing...
ID: 1779660