OpenCL NV MultiBeam v8 SoG edition for Windows
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14474 Credit: 200,643,578 RAC: 874 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. Zalster's are Error tasks for computer 75417: the error message is "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". You'll be letting both Eric and Raistmer know, of course? |
Joined: 17 Feb 01 Posts: 33273 Credit: 79,922,639 RAC: 80 |
I'm not seeing the same results that you are, but then I am using AMD cpus and not Intel. First of all you need to remove -no_cpu_lock. Also period_iterations_num 20 is a little low. Increase it to 50 or better 80 for SoG. With each crime and every kindness we birth our future. |
Joined: 17 Feb 01 Posts: 33273 Credit: 79,922,639 RAC: 80 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. "CPU affinity adjustment disabled" It's the same here. Running multiple instances on a GPU requires CPU affinity to be enabled. A bug in the CPU affinity adjustment has been fixed since r_3391. So remove -no_cpu_lock. With each crime and every kindness we birth our future. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Well, this is what Beta testing is for - to see if the applications can cope with all the data that is thrown at them, or at least expire gracefully with a meaningful error message. I think it's an autocorr protection false positive. We debated the level of this protection with Joe some time ago, when there were a few false positives in ordinary Arecibo data. Unlike all the other types of protection levels, there are no sensible theoretical limits for the autocorr value that we can expect from data, so the chosen one was only our guess at what it could be. I already increased it some time ago. Apparently not high enough. Maybe this sanity check should be disabled completely. I'll try to consult with Eric on this topic again.
//R: sanity check for found result
if(swi.analysis_cfg.autocorr_fftlen==131072 && ai.a.peak_power > 135.0){
    //R: it's possible for good result to have >135 in autocorr though it's rare event
    //so check if too much signals found later
    was_big_autocorr++;
    //boinc_temporary_exit(5*60,"Suspicious autocorr results, host needs reboot or maintenance");
}
|
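The fragment above counts suspicious autocorr peaks and leaves the decision for later, rather than exiting on the first one. A minimal stand-alone sketch of that pattern follows. Only the FFT length 131072 and the 135.0 threshold come from the post; the `AutocorrGuard` type, the `max_suspicious` tolerance and the `too_many()` check are hypothetical and are not taken from the application's source.

```cpp
// Hypothetical sketch of a deferred sanity check; not the app's actual code.
struct AutocorrGuard {
    static constexpr int kFftLen = 131072;       // FFT length named in the post
    static constexpr double kPeakLimit = 135.0;  // threshold named in the post

    int max_suspicious;        // assumed tolerance, chosen by the caller
    int was_big_autocorr = 0;  // number of peaks above the limit seen so far

    explicit AutocorrGuard(int max_suspicious_) : max_suspicious(max_suspicious_) {}

    // Count a result only if it matches the FFT length and exceeds the limit.
    void record(int fft_len, double peak_power) {
        if (fft_len == kFftLen && peak_power > kPeakLimit) {
            ++was_big_autocorr;
        }
    }

    // A single large peak can be genuine, so only a surplus is treated as suspicious.
    bool too_many() const { return was_big_autocorr > max_suspicious; }
};
```

With a tolerance of 2, for example, the first two peaks above 135.0 are accepted and `too_many()` becomes true only on the third.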
Bruce Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11 |
First of all you need to remove -no_cpu_lock. Hi Mike. I do not use -no_cpu_lock in my command line, and I run a Titan-Z, not an ATI video card. My offline testing has shown that, with my setup, period_iterations_num 20 seems to work better than the default of 50 or higher numbers. Here is my command line: **-sbs 384 -pref_wg_size 128** -period_iterations_num 20 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. Of course, I may need to adjust my tuning for the new r3430_SoG app. I haven't done much of that yet. My old AMD processors (FX-74s) act differently than Intel CPUs, so I have the usual problems with OpenCL. Bruce |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Which other numbers have you already tried for the bolded values? |
Joined: 17 Feb 01 Posts: 33273 Credit: 79,922,639 RAC: 80 |
First of all you need to remove -no_cpu_lock. So check with Task Manager which cores 3430 is pinned to. You are missing these lines: "CPU affinity adjustment enabled" . . "Info: CPU affinity mask used: 2; system mask is ff". CPU affinity should be enabled by default. It is important for FX CPUs. I have no clue whether the NV app is different in this case, so I suggest adding -cpu_lock to your command line switches. With each crime and every kindness we birth our future. |
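For reading the "CPU affinity mask used: 2; system mask is ff" line, here is a small sketch that lists which logical CPUs a mask selects. It assumes both values are printed in hexadecimal without a prefix; `parse_hex_mask` and `cpus_in_mask` are illustrative names, not functions from the app.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Parse a mask printed without a 0x prefix, assuming it is hexadecimal.
std::uint64_t parse_hex_mask(const std::string& text) {
    return std::stoull(text, nullptr, 16);
}

// Return the indices of the logical CPUs whose bits are set in the mask.
std::vector<int> cpus_in_mask(std::uint64_t mask) {
    std::vector<int> cpus;
    for (int bit = 0; bit < 64; ++bit) {
        if (mask & (std::uint64_t{1} << bit)) {
            cpus.push_back(bit);
        }
    }
    return cpus;
}
```

Under that assumption, mask `2` selects logical CPU 1 only, and system mask `ff` covers logical CPUs 0 through 7.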
W3Perl Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606 |
Hi, I also have some computation errors using opencl_nvidia_sah with blc WUs. https://setiweb.ssl.berkeley.edu/beta//results.php?userid=38948&offset=0&show_names=0&state=6&appid= I use: -use_sleep_ex 2 -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 with my GTX 950. Hope this helps. |
Joined: 27 May 99 Posts: 5516 Credit: 528,817,460 RAC: 242 |
Thanks Mike, will make that change. Will also try with and without the -no_cpu_lock just to see how they do. Looks like another day of full testing to see how they go. Mike, here is the new command line I will use, look OK? -sbs 512 -period_iterations_num 80 _spike_fft_thresh 8192 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp Alright, back to testing... |
Joined: 27 May 99 Posts: 5516 Credit: 528,817,460 RAC: 242 |
You'll be letting both Eric and Raistmer know, of course? Raistmer knows by now, lol... That's why I posted over here: the Beta site's message boards had not been getting much traffic. Of course that has now changed ;) |
Joined: 17 Feb 01 Posts: 33273 Credit: 79,922,639 RAC: 80 |
-spike_fft_thresh 8192 looks a bit high to me. Check the first character: it is _ instead of -. With each crime and every kindness we birth our future. |
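A typo like a leading `_` in place of `-` is easy to miss by eye. The sketch below flags it before the line is pasted into a config. It assumes, based only on the command lines quoted in this thread, that every token is either a switch starting with `-` or a plain non-negative integer; `suspicious_tokens` is a hypothetical helper, not part of the app or BOINC.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Return every whitespace-separated token that is neither a "-switch"
// nor a plain non-negative integer value.
std::vector<std::string> suspicious_tokens(const std::string& cmdline) {
    std::vector<std::string> bad;
    std::istringstream in(cmdline);
    std::string tok;
    while (in >> tok) {  // operator>> never yields an empty token
        bool is_switch = tok.size() > 1 && tok[0] == '-';
        bool is_number = true;
        for (unsigned char c : tok) {
            if (!std::isdigit(c)) {
                is_number = false;
                break;
            }
        }
        if (!is_switch && !is_number) {
            bad.push_back(tok);
        }
    }
    return bad;
}
```

Run on the command line quoted above, the only token it reports is `_spike_fft_thresh`.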
Bruce Joined: 15 Mar 02 Posts: 123 Credit: 124,955,234 RAC: 11 |
Hi Raistmer. Please keep in mind that this command line is the tune that I used for r3401_SoG, and that I have not done any retesting to speak of for r3430_SoG yet. I don't think you made any drastic changes in the update, so I do not expect any major changes in the tune, if any. For sbs I tried -sbs 96 thru -sbs 1664 in increments of 32. The ones that worked best are -sbs 256 and/or -sbs 384. For wg_size I tried -pref_wg_size 32 (default?) thru -pref_wg_size 1024 in increments of 32. The one that worked best is -pref_wg_size 128. Hopefully this next week I can sit down and retest for the r3430_SoG app. These settings may be specific to my particular hardware and software, and might not work the same on something else. @Mike According to Task Manager, each of the 2 instances of r3430 is using a full core on mid-AR work units, that is, 25% each of my total available cores (4 cores). The work load seems to be fairly evenly distributed across all four cores. One core is just slightly higher than the other three, but not by much. This seems like a good thing to me. I will try the cpu_lock in my next round of testing. Many thanks to both Raistmer and Mike. Bruce |
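The two sweeps described above (96 to 1664 and 32 to 1024, both in steps of 32) can be generated rather than typed by hand. This is a sketch only: `sweep` is an illustrative helper, and the post does not say how each candidate was actually benchmarked.

```cpp
#include <vector>

// Inclusive sweep: first, first + step, ... up to and including last.
// Returns an empty list if step is not positive.
std::vector<int> sweep(int first, int last, int step) {
    std::vector<int> values;
    if (step <= 0) {
        return values;
    }
    for (int v = first; v <= last; v += step) {
        values.push_back(v);
    }
    return values;
}
```

`sweep(96, 1664, 32)` yields 50 candidate values for -sbs and `sweep(32, 1024, 32)` yields 32 for -pref_wg_size, so testing every pairing would mean 1600 runs, which is a good reason to tune one switch at a time.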
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Both these values can be sensitive to GBT data/VLAR, so pay attention to the type of task you use for re-tuning. The best tuning for GBT/VLAR could be slightly different from the ordinary one for a mix of all ranges of AR. If we get a continuous stream of GBT/VLAR data, tuning specifically for GBT/VLAR could make sense. |
Joined: 27 May 99 Posts: 5516 Credit: 528,817,460 RAC: 242 |
Sorry about that Mike, that was a misprint while typing it in. It's correct on my computer, just my little finger pushing down while I typed, lol... In other news, -cpu_lock is still having issues once the number of work units goes past the actual number of cores. Not good for a multi-GPU machine with a small CPU core count. So I've removed it from my system for now. Single-GPU systems may find it useful, but not my Mega Crunchers. Trying to test the different configs, but rain brings in the crowds so not a lot of free time right now. Will post results when I get the chance, probably late tonight. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Please make more detailed reports. What exactly was wrong? |
Joined: 27 May 99 Posts: 5516 Credit: 528,817,460 RAC: 242 |
cpu lock is good as long as the number of work units is less than or equal to the number of actual physical cores (i.e. HT has no effect here; it's the actual physical cores we are dealing with). If the number of work units exceeds the number of physical cores, then the extra work units will run to completion without cpu lock, but when a new work unit starts, it will start with cpu lock and "kick" one of the older cpu_lock work units off the CPU, and that one will then reset to zero and start from scratch (prolonging the time to complete). It's hard to explain but easy to see when you watch work progress in BoincTasks. You can actually see the work units progress by time elapsed, and when a non-cpu_lock work unit completes and a new one starts at the bottom of the chain, it pushes a cpu_lock work unit off the core, which starts again from zero while the elapsed time keeps running. Example: I have an Intel 8-core CPU, hyperthreaded to 16, and 4 GPUs in the computer. If I run 2 work units per card, I have 8 work units in total and cpu_lock works as predicted. When I run 3 work units per card, I have 12 work units in total, which is 4 more than the "actual" cores. On each of the 4 GPUs, 2 of the 3 work units are cpu_locked and the 3rd is unlocked. The unlocked work unit progresses much faster and completes sooner than the cpu_locked ones. When a new work unit is started on a GPU, one of the formerly cpu_locked work units gets bumped off its lock in favor of the new one. That old work unit is now unlocked and must start from scratch. This gets worse if you go to 4 work units per GPU, i.e. 2 are cpu_locked and 2 are unlocked. |
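The arithmetic in the example above can be written down directly. This sketch only counts how many instances exceed the physical core count; it does not model the app's affinity code or the "locked"/"unlocked" behaviour described in the post, and `excess_instances` is an illustrative name.

```cpp
// Number of GPU app instances beyond the physical core count (0 if they all fit).
int excess_instances(int gpus, int per_gpu, int physical_cores) {
    int total = gpus * per_gpu;
    return total > physical_cores ? total - physical_cores : 0;
}
```

For the 8-core, 4-GPU host described, 2 instances per card gives no excess, 3 per card gives 4, and 4 per card gives 8.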
Grumpy Swede (I stand with Ukraine) Joined: 1 Nov 08 Posts: 8928 Credit: 49,849,242 RAC: 65 |
Maybe: -total_GPU_instances_num N : To use together with -cpu_lock on multi-vendor GPU hosts. Set N to total number of simultaneously running GPU OpenCL SETI apps for host (total among all used GPU of all vendors). App needs to know this number to properly select logical CPU for execution in affinity-management (-cpu_lock) mode. Should not exceed 64. And of course the important: -instances_per_device N : Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances). N - integer number of allowed instances. Should not exceed 64. |
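As a sketch of how the two quoted switches relate, the helper below derives the host-wide total from a GPU count and a per-device instance count and enforces the documented limit of 64. The switch names come from the quoted documentation; `affinity_switches` is hypothetical, it assumes every GPU runs the same number of instances, and the thread does not establish whether -total_GPU_instances_num is needed on a single-vendor host.

```cpp
#include <stdexcept>
#include <string>

// Build the affinity-related switches for a host where every GPU runs the
// same number of instances. Throws if a value is outside 1..64.
std::string affinity_switches(int gpu_count, int instances_per_device) {
    if (gpu_count < 1 || instances_per_device < 1) {
        throw std::invalid_argument("counts must be positive");
    }
    int total = gpu_count * instances_per_device;
    if (instances_per_device > 64 || total > 64) {
        throw std::invalid_argument("instance counts should not exceed 64");
    }
    return "-cpu_lock -instances_per_device " + std::to_string(instances_per_device) +
           " -total_GPU_instances_num " + std::to_string(total);
}
```

For 4 GPUs at 3 instances each, this produces `-cpu_lock -instances_per_device 3 -total_GPU_instances_num 12`.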
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
cpu lock is good as long as # of work units is less than or equal to the number of actual physical cores. (ie HT has no effect here, it's the actual physical cores we are dealing with) Sorry, but your explanation in terms of "locked" and "unlocked" doesn't correspond at all to the pattern one would expect from the CPU affinity code. Could you please provide screenshots of Task Manager with the process affinity dialog showing the affinity of a task you call "unlocked"? And please provide links to the particular tasks you observed when describing the situation. I'd like to look at the stderrs. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Maybe: yep. CPU lock will hardly work correctly without knowing the number of instances per GPU. |
Joined: 27 May 99 Posts: 5516 Credit: 528,817,460 RAC: 242 |
I understand that. Expected vs. actual: that's why we test these things. I will try to get you those, but that's about 3 hours' worth of work that I can't spare just yet. Probably later tonight. |
©2022 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.