Lunatics Windows Installer v0.43 Release Notes
Author | Message |
---|---|
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I have lowered my ffa_block and -ffa_block_fetch to what I had before on AP v6. Does that fix the problem on your host too? Just curious. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Nice, it's about the same here; I could live with a small memory hog. So we have a way to partially fix the problem. So now the question to answer is: why do the WUs run without any problem on Mike's hosts, while on ours the memory hogging appears? But that is for the dev team; maybe someday they will find the answer. Thanks for your help. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I have lowered my ffa_block and -ffa_block_fetch to what I had before on AP v6. I seem to have one taking more memory than normal. Normally they run under 100 MB; in fact, another one is running at 33v & 54real at the same time this other one is running at 339v & 359real. The one hogging is ap_21jn14aa_B4_P1_00143... It's running faster than normal, so it must have a high count. My settings on the Mac are: -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 3584 -ffa_block_fetch 1792 -sbs 256. And here it is: http://setiathome.berkeley.edu/result.php?resultid=3797431768 (single pulses: 23, repetitive pulses: 30, percent blanked: 7.45). This task consumed ~10 times as much memory as a low-count task... in Yosemite. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
-oclFFT_plan is case sensitive. Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 and the other being -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.) |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
I think this is host dependent. You don't follow your own advice ;) You often make the typo "ffa_fetch"/"ffa_fetch_block" instead of ffa_block/ffa_block_fetch. (I can't believe you type such a long line of switches; why not copy from your working ap_cmdline*.txt file and just change the numbers if needed?) But if the app ignores -oclfft_plan, it will probably also safely ignore ffa_fetch/ffa_fetch_block? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I think this is host dependent. Yes, it seems I was a bit too tired for those long conversations. I'm very sorry. With each crime and every kindness we birth our future. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ I understand what you say. Now if you could explain why it doesn't happen on either my host or my son's host, that would be great. With each crime and every kindness we birth our future. |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
-oclFFT_plan is case sensitive. The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
-oclFFT_plan is case sensitive. Well, since the GPUs are the real workhorses, I'm afraid the modest gains in CPU availability won't offset the large increase in run times on the GPUs. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
-oclFFT_plan is case sensitive. The oclFFT_plan will more than compensate for it. It speeds things up by at least 10% if set correctly. Try this for your multi-GPU host: -use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 With each crime and every kindness we birth our future. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ Well, it's a harder question :) But my own host sees this issue, so I can proceed with a more detailed exploration. The corresponding picture is posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227 |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed. With each crime and every kindness we birth our future. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
The oclFFT_plan will more than compensate it. Okay, thanks. -oclFFT_plan it will be on both. ;^) Now I just need to start getting AP tasks again. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ Signal storage is implemented via STL's vector storage template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We will see. I am currently running a test-case task with exactly the same cmd line (but with the ATi app on an HD6950 card), logging the working set size to a file each second. Will post the resulting picture later. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... The 30/30 end is not necessary; any task with 30 repetitive pulses reported might use more memory. Roughly, the memory usage is indicating how many pulses could be reported if there were no limit, since parallel processing is doing many parts of the processing at the same time. However, if the data pattern has enough variability to produce a lot of repetitive pulses, it probably does tend to produce many single pulses too. It is not a new technique; the AP v6 apps do the same kind of FFA processing. There have been some additional options added, and because Mike's testing did not see the issue, his recommended option settings may be too optimistic for some systems. And that observed difference may possibly lead to code improvements to avoid the issue on all systems, once it's understood. Raistmer's "Hence I need to keep all found pulses while GPU finds it and later to reorder them to match same serial sequence as CPU processing does." might possibly be modified so "later" comes sooner. IOW, once the logic has detected that the limit has been reached, that reordering could be done immediately and any extra memory used could be freed. For the common case where 30 rep pulses are found during the first FFA, there could be a very temporary large memory usage spike followed by low memory usage for the rest of the run time. That would not completely eliminate concern about two tasks simultaneously needing more than half of memory, but would at least reduce the likelihood of that happening. I don't know for sure this is practical, though. Joe |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ You forgot my son's PC; it is not as optimized as mine. I built it; that's the only similarity. No memory leak there either, though. With each crime and every kindness we birth our future. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
My logical way of thinking doesn't allow me to agree with this. Consider that this technique is not new, that many volunteers have been running APs for more than a year with high ffa_fetch values, and that we have already had quite a few periods with lots of overflow tasks, and nobody had an issue until now? Some are running 3 or more instances at a time and never complained? Coincidence? I honestly don't think so. With each crime and every kindness we birth our future. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? A long, illogical and totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? Yes, AMD CPU. A very slow 5000+. With each crime and every kindness we birth our future. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? So it's another AMD without the issue. Could you try to run the WU on a slow (old) Intel CPU like ours? That could explain why you don't have the issue and we all do. Or maybe Raistmer, who has an AMD CPU, could try the opposite? I know what I suggest makes almost no sense. |
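
Since the thread turns on a mistyped switch (-oclfft_plan instead of -oclFFT_plan, plus the "ffa_fetch_block" typo), here is a minimal sketch of a pre-flight check for a command line before pasting it into an ap_cmdline*.txt file. It assumes the switch names are exactly those quoted in the posts above; that set is not the app's full option list, and the helper name check_cmdline is hypothetical. It also does not tell you what the app does with an unrecognized switch, which the thread leaves unconfirmed.

```python
# Switch names copied from the command lines quoted in this thread only.
# Assumption: this is NOT the application's complete option list.
KNOWN_SWITCHES = {
    "-unroll", "-oclFFT_plan", "-ffa_block", "-ffa_block_fetch",
    "-sbs", "-use_sleep", "-tune",
}


def check_cmdline(cmdline):
    """Return (switch, suggestion) pairs for switches that do not match
    a known name exactly. The suggestion is the correctly cased name when
    only the letter case differs, otherwise None."""
    by_lower = {name.lower(): name for name in KNOWN_SWITCHES}
    problems = []
    for token in cmdline.split():
        # Treat only "-" followed by a letter as a switch; numbers are values.
        if not (len(token) > 1 and token[0] == "-" and token[1].isalpha()):
            continue
        if token in KNOWN_SWITCHES:
            continue  # exact, case-sensitive match
        problems.append((token, by_lower.get(token.lower())))
    return problems


if __name__ == "__main__":
    line = ("-use_sleep -unroll 12 -oclfft_plan 256 16 256 "
            "-ffa_block 8192 -ffa_fetch_block 4096")
    for switch, suggestion in check_cmdline(line):
        if suggestion:
            print(f"{switch}: wrong case, did you mean {suggestion}?")
        else:
            print(f"{switch}: not a switch seen in this thread")
```

The comparison is deliberately exact first and case-insensitive only for the suggestion, so a wrongly cased switch is reported instead of being quietly accepted.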
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.