OpenCL NV MultiBeam v8 SoG edition for Windows
Author | Message |
---|---|
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Kinda, but in the long run it is still faster than the CPU version. Regarding Guppi: 1 work unit takes 55 minutes on the CPU vs 14 minutes on the GPU. Where issues arise is when comparing how fast non-VLARs are crunched, especially with multiple instances. I would think a separate plan class would be needed in the app_config if one planned on running Guppi along with non-VLARs on a low-core system, and the configurations would have to be worked out. Rough idea would be along these lines: 1 GPU on a dual core, single instance; 1 GPU in a 4 core CPU, probably not a problem; 2 GPUs in a 4 core, OK as long as not running multiple instances; 2 GPUs in an 8 core, probably OK; 3 GPUs in an 8 core, might be manageable but limited instances; 3 GPUs on a 12 core, manageable; 4 GPUs on a 12 core, wouldn't recommend it other than single instance or limited; 4 GPUs on a 16 core, might be possible but limited instances. I didn't throw in 6 cores, but they would be between the 2 and 3 GPU setups. But here's the good news: with each year we get better equipment that allows us to build upon these things. So who is to say what we can do in 1-2 years' time? |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
I am running vintage equipment. And I cannot afford to upgrade. The kitty farm is what it is. 9 old rigs getting older by the day. Every day I wake up and don't find one crashed is a good day. If some think that it is OK to spend CPU cycles to support a weak GPU app, I am afraid I cannot agree. The GPU apps should be better able to stand on their own with minimal CPU support. That is why they are GPU apps. "Time is simply the mechanism that keeps everything from happening all at once." |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
It's the GBT data that is requiring such a large amount of CPU usage. On normal MB, the SoG uses very little CPU time. Actually, it's faster than CUDA on non-VLAR MB. But its main purpose is those VLARs. So to crunch Guppi on GPU or not to crunch Guppi on GPU, that is the question, lol... |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Try with -use_sleep and increased sizes of PulseFind kernel (-sbs 512 for example) |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
So 3 per card takes 36-37 minutes, about 1 minute faster than running a single Guppi by itself on the GPU. I need to correct this: I was using command lines supplied by Mike at that point. When I ran the same experiment without command lines, the times to complete were 46-47 minutes. Going to try Raistmer's recommendations now... |
Bruce Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11 |
I'm not seeing the same results that you are, but then I am using AMD CPUs and not Intel. The r3430_SoG seems to be running just like the r3401_SoG. The shorties run pretty quick and use very little CPU resources. When you get past those and into the mid and lower AR, it takes a full core per WU. This seems to be the same problem that I have always had running APs. I am still using the same command lines that I used for r3401, so I may need to retune, but I don't expect any reduction in CPU usage. Just tried a quick test using the sleep switch and it does not seem to work for me; it still used a full core per WU. Bruce |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hey Bruce, Sorry I should have specified that I was talking about SoG for Nvidia. I don't know how they do for ATI.. I'm restarting my test with the -use_sleep and nothing else and will progress over the evening to see how it does. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-use_sleep can be used along with full tuning line. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Was just testing them to see how they all combine. I've found that -use_sleep with -sbs 512, along with the command line Mike gave me, works the best. With -use_sleep: 16 minutes with 3-5% CPU usage running 1 work unit per card; at 3 work units per card, 38 minutes average with 3% CPU, along with -use_sleep, -sbs 512 and the command line. I'm going to try again with 1, 2, and 3 at a time per card but without -use_sleep. Tomorrow I will post results in a better format. I ran into a problem this evening with over 12 errors; not sure why it occurred, as I was only using -use_sleep and nothing else. Maybe a bad batch of work, not sure. Will have to wait and see if wingmen also error out or if they complete the work units. Edit... Just checked those that errored: some of the wingmen also errored out; only ARM wingmen seem to have completed them, but in a very, very short time. So I have to doubt those results. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Looks like a bad batch of GBT on Beta... They start off normal, then CPU usage quickly goes all the way down to below 1%... and I'm not using -use_sleep on these, so they should be using close to 97% of a core each... I've seen this happen with all the ones that have errored out tonight. At first I thought it was -use_sleep, but I removed it and restarted the machine. But the errors continue to happen, plus I can see some of my wingmen are erroring out too. Anyone else seeing these? I'm not currently crunching any GBT CPU work units on Main, so I don't know if the same thing is happening here... Edit... Going to run out of work soon on Beta due to the high number of errors, which restricts the number of work units downloaded. |
Jimbocous Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Looks like a bad batch of GBT on beta... On Beta, 10 of ~150 GPU WUs (both SAH & SOG) errored out with "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance"; the rest validated OK. Duplicated this on at least 2 of 4 machines crunching Beta. Also saw some errors like that on GBT work done on Main (CPU). Guess it's a GUPPI issue, not just GPU or CPU. Vanilla setup here, no special command line info going on. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course? |
Mike Joined: 17 Feb 01 Posts: 34353 Credit: 79,922,639 RAC: 80 |
I'm not seeing the same results that you are, but then I am using AMD cpus and not Intel. First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. With each crime and every kindness we birth our future. |
Mike Joined: 17 Feb 01 Posts: 34353 Credit: 79,922,639 RAC: 80 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. "CPU affinity adjustment disabled": it's the same here. Running multiple instances on GPU requires CPU affinity to be enabled. A bug in CPU affinity adjustment has been fixed since r_3391. So remove -no_cpu_lock. With each crime and every kindness we birth our future. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. I think it's an autocorr protection false positive. Joe and I debated the level of this protection some time ago, when there were a few false positives in the usual Arecibo data. Unlike all the other types of protection levels, there are no sensible theoretical limits for the autocorr value that we can expect from data. So the chosen one was only our guess at what it could be. I already increased it some time ago. Apparently not high enough. Maybe this sanity check should be disabled completely. I'll try to consult with Eric on this topic again. `/* R: sanity check for found result */ if(swi.analysis_cfg.autocorr_fftlen==131072 && ai.a.peak_power > 135.0){ /* R: it's possible for good result to have >135 in autocorr though it's rare event, so check if too much signals found later */ was_big_autocorr++; /* boinc_temporary_exit(5*60,"Suspicious autocorr results, host needs reboot or maintenance"); */ }` |
Bruce Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11 |
First of all you need to remove -no_cpu_lock. Hi Mike. I do not use -no_cpu_lock in my command line; I run a Titan-Z and not an ATI video card. My offline testing has shown that, with my setup, period_iterations_num 20 seems to work better than the default of 50 or higher numbers. Here is my command line: -sbs 384 -pref_wg_size 128 -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors (FX-74s) act differently than Intel CPUs, so I have the usual problems with OpenCL. Bruce |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
What other numbers did you try for bolded values already? |
Mike Joined: 17 Feb 01 Posts: 34353 Credit: 79,922,639 RAC: 80 |
First of all you need to remove -no_cpu_lock. So check with Task Manager which cores 3430 is pinned to. You are missing these lines: "CPU affinity adjustment enabled . . Info: CPU affinity mask used: 2; system mask is ff". CPU affinity should be enabled by default. It is important for FX CPUs. I have no clue if the NV app is different in this case. So I suggest adding -cpu_lock to your command line switches. With each crime and every kindness we birth our future. |
W3Perl Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606 |
Hi, I also have some computation errors using opencl_nvidia_sah on blc WUs. https://setiweb.ssl.berkeley.edu/beta//results.php?userid=38948&offset=0&show_names=0&state=6&appid= I use: -use_sleep_ex 2 -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 with my GTX 950. Hope it helps. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Thanks Mike, will make that change. Will also try with and without -no_cpu_lock just to see how they do. Looks like another day of full testing to see how they go. Mike, here is the new command line I will use, look OK? -sbs 512 -period_iterations_num 80 -spike_fft_thresh 8192 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp Alright, back to testing... |
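
The timings Zalster reports in this thread mix single-task and multi-task runs, so they are easier to compare as tasks finished per hour per card. Below is a minimal Python sketch of that arithmetic. It assumes all tasks in a batch start together on one card and each finishes after roughly the reported elapsed time, which is a simplification. The function name `tasks_per_hour` and the run labels are made up for illustration; the minutes are the figures quoted in the posts, with midpoints used where a range was given.

```python
def tasks_per_hour(elapsed_minutes: float, concurrent_tasks: int = 1) -> float:
    """Tasks completed per hour on one device.

    Assumes all `concurrent_tasks` start together and each finishes after
    about `elapsed_minutes` of wall-clock time (a simplification).
    """
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return concurrent_tasks * 60.0 / elapsed_minutes


# (label, elapsed minutes, tasks running at once); figures as reported in the thread.
# The labels are illustrative descriptions, not application output.
RUNS = [
    ("CPU, 1 Guppi task", 55.0, 1),
    ("GPU, 1 Guppi task", 14.0, 1),
    ("GPU, 1 per card, -use_sleep -sbs 512", 16.0, 1),
    ("GPU, 3 per card, -use_sleep -sbs 512", 38.0, 3),
    ("GPU, 3 per card, Mike's command line", 36.5, 3),
    ("GPU, 3 per card, no command line", 46.5, 3),
]

if __name__ == "__main__":
    for label, minutes, count in RUNS:
        rate = tasks_per_hour(minutes, count)
        per_task = minutes / count  # effective wall-clock minutes per task
        print(f"{label:<40} {rate:5.2f} tasks/hour  ({per_task:5.2f} min per task)")
```

On these figures, 3 per card with -use_sleep works out to about 12.7 minutes per task against 16 minutes for a single task. The sketch looks only at elapsed time, so it says nothing about CPU usage or error rates, which the posts treat as the other half of the trade-off.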