OpenCL NV MultiBeam v8 SoG edition for Windows

Author	Message
Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779380 - Posted: 15 Apr 2016, 20:05:41 UTC - in response to Message 1779374. Last modified: 15 Apr 2016, 20:06:05 UTC Kinda, but in the long run, it is still faster than the CPU version. Regarding Guppi: 1 work unit takes 55 minutes on CPU vs 14 minutes on GPU. Where issues arise is when comparing how fast non-VLARs are crunched, especially when running multiple instances. I would think a separate plan class would be needed in the app_config if one planned on running the Guppi along with non-VLARs on a low core system, and the configurations would have to be worked out. A rough idea would be along these lines:
1 GPU on a dual core: single instance
1 GPU in a 4 core CPU: probably not a problem
2 GPU in a 4 core: OK as long as not running multiple instances
2 GPU in an 8 core: probably OK
3 GPU in an 8 core: might be manageable but limited instances
3 GPU on a 12 core: manageable
4 GPU on a 12 core: wouldn't recommend it other than single instance or limited
4 GPU on a 16 core: might be possible but limited instances
I didn't throw in 6 cores, but they would be between the 2 and 3 GPU setups. But here's the good news: with each year we get better equipment that allows us to build upon these things. So who is to say what we can do in 1-2 years' time. ID: 1779380 ·
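
The rough guide above can be turned into a quick back-of-the-envelope check. This is only a sketch under a stated assumption: each GUPPI task on the GPU ties up close to a full CPU core (the ~97% figure reported later in this thread). The function names are made up for illustration, and the result is a lower bound that ignores the OS and any CPU-only tasks.

```python
def cpu_cores_needed(gpus, instances_per_gpu, cpu_per_task=0.97):
    """Estimate CPU cores consumed just feeding the GPU tasks.

    cpu_per_task is an assumption: the fraction of one core that a single
    GUPPI GPU task keeps busy (about 0.97 without -use_sleep, per the thread).
    """
    return gpus * instances_per_gpu * cpu_per_task


def fits(cpu_cores, gpus, instances_per_gpu, cpu_per_task=0.97):
    """True if the host has at least as many cores as the GPU tasks need."""
    return cpu_cores_needed(gpus, instances_per_gpu, cpu_per_task) <= cpu_cores


if __name__ == "__main__":
    # 2 GPUs on a 4-core host: one instance each fits, three each does not.
    print(fits(4, 2, 1))
    print(fits(4, 2, 3))
```

A configuration that only just fits leaves nothing for CPU work units, which is consistent with the more cautious entries in the list.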

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1779383 - Posted: 15 Apr 2016, 20:09:03 UTC - in response to Message 1779380. Last modified: 15 Apr 2016, 20:11:12 UTC I am running vintage equipment. And I cannot afford to upgrade. The kitty farm is what it is. 9 old rigs getting older by the day. Every day I wake up and don't find one crashed is a good day. If some think that it is OK to spend CPU cycles to support a weak GPU app, I am afraid I cannot agree. The GPU apps should be better able to stand on their own with minimal CPU support. That is why they are GPU apps. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1779383 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779394 - Posted: 15 Apr 2016, 20:35:16 UTC - in response to Message 1779383. It's the GBT data that is requiring such a large amount of CPU usage. On normal MB, the SoG uses very little CPU time. Actually, it's faster than cuda on non-VLAR MB. But its main purpose is those VLARs. So to crunch Guppi on GPU or not to crunch Guppi on GPU, that is the question, lol.... ID: 1779394 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1779405 - Posted: 15 Apr 2016, 21:08:03 UTC - in response to Message 1779394. Try with -use_sleep and increased sizes of PulseFind kernel (-sbs 512 for example) ID: 1779405 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779418 - Posted: 15 Apr 2016, 21:59:13 UTC - in response to Message 1779371. Last modified: 15 Apr 2016, 21:59:30 UTC So 3 per card generates 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU. CPU utilization is down to anywhere from 80-92% of a CPU rather than 97%. Only problem is, this requires a large number of CPU cores to make this happen. I need to correct this: I was using the command lines supplied by Mike at this point. When I ran this same experiment without command lines, the times to complete were 46-47 minutes. Going to try Raistmer's recommendations now... ID: 1779418 ·

Bruce Volunteer tester Send message Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11	Message 1779433 - Posted: 15 Apr 2016, 22:53:46 UTC I'm not seeing the same results that you are, but then I am using AMD cpus and not Intel. The r3430_SoG seems to be running just like the r3401_SoG. The shorties run pretty quick and use very little cpu resources. When you get past those and into the mid and lower AR it takes a full core per WU. This seems to be the same problem that I have always had running APs. I am still using the same command lines that I used for r3401, so may need to retune, but I don't expect any reduction in cpu usage. Just tried a quick test using the sleep switch and it does not seem to work for me, still used a full core per. *Bruce* ID: 1779433 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779448 - Posted: 15 Apr 2016, 23:56:14 UTC - in response to Message 1779433. Hey Bruce, Sorry I should have specified that I was talking about SoG for Nvidia. I don't know how they do for ATI.. I'm restarting my test with the -use_sleep and nothing else and will progress over the evening to see how it does. ID: 1779448 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1779543 - Posted: 16 Apr 2016, 6:04:55 UTC - in response to Message 1779448. I'm restarting my test with the -use_sleep and nothing else and will progress over the evening to see how it does. -use_sleep can be used along with full tuning line. ID: 1779543 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779545 - Posted: 16 Apr 2016, 6:22:47 UTC - in response to Message 1779543. Last modified: 16 Apr 2016, 6:28:45 UTC Was just testing them to see how they all combine. I've found that -use_sleep with -sbs 512, along with the command line Mike gave me, works the best if I use the -use_sleep:
1 work unit per card: 16 minutes with 3-5% CPU usage
3 work units per card: 38 minutes average with 3% CPU (with -use_sleep, -sbs 512 and the command line)
I'm going to try again with 1, 2, and 3 at a time per card but without the -use_sleep. Tomorrow I will post the results in a better format. I've run into a problem this evening with over 12 errors. Not sure why it occurred; I was only using -use_sleep and nothing else. Maybe a bad batch of work, not sure. Will have to wait and see if wingmen also error out or if they complete the work units. Edit... Just checked those that errored: some of the wingmen also errored out, and only ARM wingmen seemed to have completed them, but in a very, very short time. So I have to doubt those results. ID: 1779545 ·
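
To compare the two -use_sleep runs above on equal terms, it helps to convert them to throughput. The sketch below uses only the timings quoted in the post and assumes that "38 minutes average" means each of the three concurrent work units takes about 38 minutes; the function name is illustrative.

```python
def tasks_per_hour(instances, minutes_per_task):
    """Work units finished per hour per card when `instances` run at once
    and each takes `minutes_per_task` wall-clock minutes."""
    return instances * 60.0 / minutes_per_task


if __name__ == "__main__":
    single = tasks_per_hour(1, 16)   # 1 work unit per card, 16 minutes
    triple = tasks_per_hour(3, 38)   # 3 work units per card, 38 minutes each
    print(f"1 per card: {single:.2f} tasks/hour")
    print(f"3 per card: {triple:.2f} tasks/hour")
```

Under that assumption, running three at a time comes out ahead per card, at the cost of three CPU threads being occupied (lightly, with -use_sleep).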

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779548 - Posted: 16 Apr 2016, 6:44:08 UTC Last modified: 16 Apr 2016, 6:44:59 UTC Looks like a bad batch of GBT on beta... They start off normal, then CPU usage quickly drops all the way down to below 1%... and I'm not using -use_sleep on these, so they should be using close to 97% of a core each... I've seen this happen with all the ones that have errored out tonight. At first I thought it was the -use_sleep, but I removed it and restarted the machine. But the errors continue to happen, plus I can see some of my wingmen are erroring out too. Anyone else seeing these? I'm not currently crunching any GBT CPU work units on Main, so I don't know if the same thing is happening here... Edit... Going to run out of work soon on Beta due to the high number of errors, which restricts the number of work units downloaded. ID: 1779548 ·

Jimbocous Volunteer tester Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 1779559 - Posted: 16 Apr 2016, 7:28:05 UTC - in response to Message 1779548. Looks like a bad batch of GBT on beta... Anyone else seeing these? I'm not currently crunching any GBT CPU work units on Main so don't know if the same thing is happening here... On Beta, 10 of ~150 GPU WUs (both SAH & SOG) errored out with "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance" the rest validated OK. Duplicated this on at least 2 of 4 machines crunching beta. Also saw some errors like that on GBT work done on main (CPU). Guess it's a GUPPI issue, not just GPU or CPU. Vanilla setup here, no special command line info going on. ID: 1779559 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1779564 - Posted: 16 Apr 2016, 8:24:53 UTC Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course? ID: 1779564 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1779565 - Posted: 16 Apr 2016, 8:31:12 UTC - in response to Message 1779433. I'm not seeing the same results that you are, but then I am using AMD cpus and not Intel. The r3430_SoG seems to be running just like the r3401_SoG. The shorties run pretty quick and use very little cpu resources. When you get past those and into the mid and lower AR it takes a full core per WU. This seems to be the same problem that I have always had running APs. I am still using the same command lines that I used for r3401, so may need to retune, but I don't expect any reduction in cpu usage. Just tried a quick test using the sleep switch and it does not seem to work for me, still used a full core per. First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. With each crime and every kindness we birth our future. ID: 1779565 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1779567 - Posted: 16 Apr 2016, 8:34:53 UTC - in response to Message 1779564. Last modified: 16 Apr 2016, 8:39:35 UTC Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course?
CPU affinity adjustment disabled
It's the same here. Running multiple instances on GPU requires CPU affinity to be enabled. A bug in the CPU affinity adjustment has been fixed since r_3391. So remove -no_cpu_lock. With each crime and every kindness we birth our future. ID: 1779567 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1779573 - Posted: 16 Apr 2016, 9:05:00 UTC - in response to Message 1779564. Last modified: 16 Apr 2016, 9:33:24 UTC Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course? I think it's an autocorr protection false positive. Joe and I debated the level of this protection some time ago, when there were a few false positives in the usual Arecibo data. Unlike all the other types of protection levels, there is no sensible theoretical limit for the autocorr value that we can expect from data. So the chosen one was only our guess at what it could be. I already increased it some time ago. Apparently, not high enough. Maybe this sanity check should be disabled completely. I'll try to consult with Eric on this topic again.

//R: sanity check for found result
if(swi.analysis_cfg.autocorr_fftlen==131072 && ai.a.peak_power > 135.0){
    //R: it's possible for good result to have >135 in autocorr though it's rare event
    //so check if too much signals found later
    was_big_autocorr++;
    //boinc_temporary_exit(560,"Suspicious autocorr results, host needs reboot or maintenance");
}

period_iterations_num=40
Spike: peak=26.84568, time=7.158, d_freq=1684106822.19, chirp=0, fft_len=32k
Spike: peak=24.5848, time=10.02, d_freq=1684097740.05, chirp=0, fft_len=32k
Spike: peak=24.25558, time=12.88, d_freq=1684095968.67, chirp=0, fft_len=32k
Spike: peak=29.58042, time=18.61, d_freq=1684105388.54, chirp=0, fft_len=32k
Spike: peak=25.99703, time=21.47, d_freq=1684098330.62, chirp=0, fft_len=32k
Spike: peak=30.16515, time=30.06, d_freq=1684106456.88, chirp=0, fft_len=32k
Spike: peak=24.45428, time=41.52, d_freq=1684105782.14, chirp=0, fft_len=32k
Spike: peak=32.34881, time=8.59, d_freq=1684096755.87, chirp=0, fft_len=64k
Spike: peak=43.98743, time=14.32, d_freq=1684096109.25, chirp=0, fft_len=64k
Spike: peak=46.22211, time=20.04, d_freq=1684105388.36, chirp=0, fft_len=64k
Spike: peak=38.42199, time=25.77, d_freq=1684103954.18, chirp=0, fft_len=64k
Spike: peak=38.20225, time=31.5, d_freq=1684106456.88, chirp=0, fft_len=64k
Spike: peak=34.22185, time=42.95, d_freq=1684104966.65, chirp=0, fft_len=64k
Autocorr: peak=154.4745, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=209.9964, time=17.18, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=240.9282, time=28.63, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=102.848, time=40.09, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=93.54303, time=51.54, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Autocorr: peak=58.89613, time=62.99, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k
Spike: peak=44.38844, time=5.727, d_freq=1684105781.87, chirp=0, fft_len=128k
Spike: peak=54.02288, time=17.18, d_freq=1684105782.05, chirp=0, fft_len=128k
Spike: peak=40.94735, time=28.63, d_freq=1684106456.79, chirp=0, fft_len=128k
Spike: peak=31.99356, time=40.09, d_freq=1684097430.88, chirp=0, fft_len=128k
Spike: peak=31.74863, time=51.54, d_freq=1684105388.36, chirp=0, fft_len=128k
Spike: peak=28.48955, time=62.99, d_freq=1684096109.33, chirp=0, fft_len=128k
Autocorr: peak=155.0494, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0.0012693, fft_len=128k
Autocorr: peak=207.8813, time=17.18, delay=0.28451, d_freq=1684101104.01, chirp=0.0012693, fft_len=128k
Autocorr: peak=240.4265*, time=28.63, delay=0.28451, d_freq=1684101104.02, chirp=0.0012693, fft_len=128k
Autocorr: peak=101.7792, time=40.09, delay=0.28451, d_freq=1684101104.04, chirp=0.0012693, fft_len=128k
Autocorr: peak=94.91109, time=51.54, delay=0.28451, d_freq=1684101104.05, chirp=0.0012693, fft_len=128k
ID: 1779573 ·
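
For anyone wanting to check their own stderr output against this threshold, here is a small sketch that scans log lines for Autocorr results over the 135.0 limit quoted in the post. It is not part of the application: the function name is made up, and it assumes that `fft_len=128k` in the log corresponds to `autocorr_fftlen==131072` in the check (128 × 1024 = 131072).

```python
import re

AUTOCORR_LIMIT = 135.0  # threshold taken from the check quoted in the post

# Matches the numeric part of "peak=..."; a trailing "*" in the log is ignored.
PEAK_RE = re.compile(r"peak=([0-9]+(?:\.[0-9]+)?)")


def big_autocorr_peaks(log_lines, limit=AUTOCORR_LIMIT):
    """Return the Autocorr peak values above `limit` at fft_len=128k."""
    peaks = []
    for line in log_lines:
        if not line.startswith("Autocorr:") or "fft_len=128k" not in line:
            continue  # only 128k autocorrelation results are checked
        match = PEAK_RE.search(line)
        if match and float(match.group(1)) > limit:
            peaks.append(float(match.group(1)))
    return peaks


SAMPLE = [
    "Spike: peak=44.38844, time=5.727, d_freq=1684105781.87, chirp=0, fft_len=128k",
    "Autocorr: peak=154.4745, time=5.727, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k",
    "Autocorr: peak=102.848, time=40.09, delay=0.28451, d_freq=1684101103.99, chirp=0, fft_len=128k",
    "Autocorr: peak=240.4265*, time=28.63, delay=0.28451, d_freq=1684101104.02, chirp=0.0012693, fft_len=128k",
]

if __name__ == "__main__":
    print(big_autocorr_peaks(SAMPLE))
```

Several hits in a single task, as in the log above, are the kind of pattern that trips the protection.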

Bruce Volunteer tester Send message Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11	Message 1779587 - Posted: 16 Apr 2016, 10:57:39 UTC - in response to Message 1779565. First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. Hi Mike. I do not use -no_cpu_lock in my command line, I run a Titan-Z and not a ATI video card. My off line testing has shown, that with my setup, period_iterations_num 20 seems to work better than the default of 50 or higher numbers. Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors -FX-74's - act differently than Intel CPUs, so I have the usual problems with open_cl. *Bruce* ID: 1779587 ·
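
When comparing tuning lines like the one above, it can help to split them into switches and values first. The following is an illustrative helper only (not something the app provides). It assumes every switch starts with "-" followed by a non-digit and that all values are plain non-negative numbers, which holds for the command lines posted in this thread.

```python
def parse_cmdline(cmdline):
    """Split a tuning command line into {switch: [values]} plus stray tokens."""
    switches = {}
    stray = []
    current = None
    for token in cmdline.split():
        if token.startswith("-") and not token[1:2].isdigit():
            current = token            # a new switch such as -sbs or -tune
            switches[current] = []
        elif token.replace(".", "", 1).isdigit() and current is not None:
            switches[current].append(token)   # numeric value for the last switch
        else:
            stray.append(token)        # neither a switch nor a value we can attach
            current = None
    return switches, stray


BRUCE = ("-sbs 384 -pref_wg_size 128 -period_iterations_num 20 "
         "-spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 "
         "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
         "-oclfft_tune_bn 64 -oclfft_tune_cw 64")

if __name__ == "__main__":
    parsed, leftovers = parse_cmdline(BRUCE)
    for name, values in parsed.items():
        print(name, " ".join(values))
    print("stray tokens:", leftovers)
```

Anything reported as a stray token is worth a second look, since a mistyped switch would end up there.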

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1779596 - Posted: 16 Apr 2016, 12:50:56 UTC - in response to Message 1779587. Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. What other numbers did you try for bolded values already? ID: 1779596 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1779601 - Posted: 16 Apr 2016, 13:50:53 UTC - in response to Message 1779587. Last modified: 16 Apr 2016, 13:51:30 UTC First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. Hi Mike. I do not use -no_cpu_lock in my command line, I run a Titan-Z and not a ATI video card. My off line testing has shown, that with my setup, period_iterations_num 20 seems to work better than the default of 50 or higher numbers. Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors -FX-74's - act differently than Intel CPUs, so I have the usual problems with open_cl. So check with Task Manager which cores 3430 is pinned to. You are missing this line:
CPU affinity adjustment enabled
. .
Info: CPU affinity mask used: 2; system mask is ff
CPU affinity should be enabled by default. It is important for FX CPUs. I have no clue if the NV app is different in this case. So I suggest adding -cpu_lock to your command line switches. With each crime and every kindness we birth our future. ID: 1779601 ·
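
The affinity line in that log can be decoded by hand. The sketch below assumes the two masks are hexadecimal bitmasks in which bit N stands for logical CPU N, as with Windows process affinity masks; the function name is made up for illustration.

```python
def cores_from_mask(mask_hex):
    """Return the logical CPU indices whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print("task pinned to:", cores_from_mask("2"))     # mask used by the app
    print("system has:    ", cores_from_mask("ff"))    # system mask
```

Read this way, "mask used: 2" pins the task to logical CPU 1, and "system mask is ff" describes a host with eight logical CPUs.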

W3Perl Volunteer tester Send message Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606	Message 1779656 - Posted: 16 Apr 2016, 16:30:30 UTC Hi, I have also some computation error using opencl_nvidia_sah using blc wu. https://setiweb.ssl.berkeley.edu/beta//results.php?userid=38948&offset=0&show_names=0&state=6&appid= I use : -use_sleep_ex 2 -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 with my GTX 950. Hope it could help. ID: 1779656 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1779660 - Posted: 16 Apr 2016, 16:48:38 UTC - in response to Message 1779565. First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. Thanks Mike, will make that change. Will also try with and without the -no_cpu_lock just to see how they do. Looks like another day of full testing to see how they go. Mike, here is the new command line I will use, look OK? -sbs 512 -period_iterations_num 80 -spike_fft_thresh 8192 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp Alright, back to testing... ID: 1779660 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.