Lunatics Windows Installer v0.43 Release Notes

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1591397 - Posted: 24 Oct 2014, 15:23:47 UTC - in response to Message 1591396. Last modified: 24 Oct 2014, 15:24:13 UTC I have lowered my ffa_block and -ffa_block_fetch to what I had before on AP v6. Does that fix the problem on your host too? Just curious. ID: 1591397 ·

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1591433 - Posted: 24 Oct 2014, 16:30:19 UTC - in response to Message 1591431. Last modified: 24 Oct 2014, 16:31:09 UTC Nice, it's about the same here; I could live with a small memory hog. So we have a way to partially fix the problem. Now the question to answer is: why do the WUs run without any problem on Mike's hosts, while on ours the memory hogging appears? But that is for the dev team; maybe someday they will find the answer. Thanks for your help. ID: 1591433 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1591438 - Posted: 24 Oct 2014, 16:36:53 UTC - in response to Message 1591431. Last modified: 24 Oct 2014, 16:48:28 UTC I have lowered my ffa_block and -ffa_block_fetch to what I had before on AP v6. Does that fix the problem on your host too? Just curious. Well, I haven't had any 30/30 tasks yet after that, and those are the only WUs that show this problem. (Like Raistmer says) http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653#1590653 With the settings below, as I run now, the WUs run a bit slower, but I'd rather take that than the memory hogging. -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 -instances_per_device 2 OK, with my new (old) ffa_block and -ffa_block_fetch settings, the next 30/30 task only went up to 405 MB (virtual), 379 MB (memory). That I can live with, even if it might happen that the GPU runs two of those 30/30 tasks at the same time. I seem to have one taking more memory than normal. Normally they run under 100 MB; in fact, another one is running at 33v & 54real at the same time this other one is running at 339v & 359real. The one hogging is ap_21jn14aa_B4_P1_00143... It's running faster than normal, so it must have a high count. My settings on the Mac are -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 3584 -ffa_block_fetch 1792 -sbs 256 And here it is, http://setiathome.berkeley.edu/result.php?resultid=3797431768 single pulses: 23 repetitive pulses: 30 percent blanked: 7.45 This task consumed ~10 times as much memory as a low count task...in Yosemite. ID: 1591438 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1591441 - Posted: 24 Oct 2014, 16:40:17 UTC - in response to Message 1591332. -oclFFT_plan is case sensitive. Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 and the other being -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.) ID: 1591441 ·

BilBg Volunteer tester Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1591443 - Posted: 24 Oct 2014, 16:41:26 UTC - in response to Message 1591332. Last modified: 24 Oct 2014, 17:02:59 UTC I think this is host dependent. I believe it is not. Why? It's happening on a totally different OS & GPU, and the same is happening on a 2x690 that runs Win Server. The only thing they have in common is that they use the same old i5 CPU. After a few tests I made here remotely, I could confirm the problem is apparently related to the size of the -ffa_block: a 4096 block makes some WUs use 150MB, and larger sizes hugely increase the memory usage; 8192 makes the WU use about 250MB and so on, and the 1GB appears when you use a 16k block size. For some reason that I can't understand so far, my hosts simply appear to ignore the -oclfft_plan 256 16 256 switch. -oclFFT_plan is case sensitive. Make sure FFT is upper case or just snip it from the read me. -use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_fetch 8192 -ffa_fetch_block 4096-tune 1 64 4 1 -tune 2 64 4 1 You don't follow your own advice ;) You often make the typo "ffa_fetch"/"ffa_fetch_block" instead of ffa_block/ffa_block_fetch (I can't believe you type such a long line of switches; why not copy from your working ap_cmdline.txt file and just change numbers if needed?) But if the app ignores -oclfft_plan, it will probably also safely ignore ffa_fetch/ffa_fetch_block? - ALF - "Find out what you don't do well ..... then don't do it!"* :) ID: 1591443 ·
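A rough way to catch the two mistakes discussed above (wrong case in -oclFFT_plan, and the ffa_fetch/ffa_fetch_block typo) before pasting a line into the command-line file. This is only a sketch in Python, not part of the app: the list of valid switches is an assumption built solely from the switches quoted in this thread, and the function name check_cmdline is made up for illustration.

```python
from typing import List

# Assumption: only the switches that appear in this thread are listed here;
# the real app may accept more.
KNOWN_SWITCHES = [
    "-use_sleep", "-unroll", "-oclFFT_plan", "-ffa_block",
    "-ffa_block_fetch", "-tune", "-instances_per_device", "-sbs",
]


def check_cmdline(cmdline: str) -> List[str]:
    """Return a list of human-readable problems found in a switch line."""
    by_lower = {s.lower(): s for s in KNOWN_SWITCHES}
    problems = []
    for token in cmdline.split():
        if not token.startswith("-"):
            # A value token. A dash inside it usually means a missing space,
            # as in "4096-tune".
            if "-" in token:
                problems.append(f"missing space in '{token}'")
            continue
        if token in KNOWN_SWITCHES:
            continue
        canonical = by_lower.get(token.lower())
        if canonical is not None:
            problems.append(f"{token}: wrong case, expected {canonical}")
        else:
            problems.append(f"{token}: unknown switch")
    return problems


if __name__ == "__main__":
    line = ("-use_sleep -unroll 12 -oclfft_plan 256 16 256 "
            "-ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1")
    for problem in check_cmdline(line):
        print(problem)
```

Comparing against a lower-cased lookup table is what lets the check tell "right switch, wrong case" apart from a genuinely unknown name.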

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591451 - Posted: 24 Oct 2014, 16:58:56 UTC - in response to Message 1591443. I think this is host dependent. I believe it is not. Why? It's happening on a totally different OS & GPU, and the same is happening on a 2x690 that runs Win Server. The only thing they have in common is that they use the same old i5 CPU. After a few tests I made here remotely, I could confirm the problem is apparently related to the size of the -ffa_block: a 4096 block makes some WUs use 150MB, and larger sizes hugely increase the memory usage; 8192 makes the WU use about 250MB and so on, and the 1GB appears when you use a 16k block size. For some reason that I can't understand so far, my hosts simply appear to ignore the -oclfft_plan 256 16 256 switch. -oclFFT_plan is case sensitive. Make sure FFT is upper case or just snip it from the read me. -use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_fetch 8192 -ffa_fetch_block 4096-tune 1 64 4 1 -tune 2 64 4 1 You don't follow your own advice ;) You often make the typo "ffa_fetch"/"ffa_fetch_block" instead of ffa_block/ffa_block_fetch But if the app ignores -oclfft_plan, it will probably also safely ignore ffa_fetch/ffa_fetch_block? Yes, it seems I was a bit too tired for those long conversations. I'm very sorry. With each crime and every kindness we birth our future. ID: 1591451 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591452 - Posted: 24 Oct 2014, 17:01:44 UTC - in response to Message 1591390. Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653 No need to invent new entities w/o need. I understand what you say. Now, if you could explain why it doesn't happen on either my host or my son's host, that would be great. With each crime and every kindness we birth our future. ID: 1591452 ·

arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1591460 - Posted: 24 Oct 2014, 17:33:17 UTC - in response to Message 1591441. -oclFFT_plan is case sensitive. Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 and the other being -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.) The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else. ID: 1591460 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1591476 - Posted: 24 Oct 2014, 18:20:49 UTC - in response to Message 1591460. -oclFFT_plan is case sensitive. Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 and the other being -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.) The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else. Well, since the GPUs are the real workhorses, I'm afraid the modest gains in CPU availability won't offset the large increase in run times on the GPUs. ID: 1591476 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591537 - Posted: 24 Oct 2014, 20:56:17 UTC - in response to Message 1591460. Last modified: 24 Oct 2014, 20:57:08 UTC -oclFFT_plan is case sensitive. Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 and the other being -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.) The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else. The oclFFT_plan will more than compensate it. It speeds up at least by 10% if set correctly. Try this for your multi GPU host. -use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 With each crime and every kindness we birth our future. ID: 1591537 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1591542 - Posted: 24 Oct 2014, 21:10:55 UTC - in response to Message 1591452. Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653 No need to invent new entities w/o need. I understand what you say. Now, if you could explain why it doesn't happen on either my host or my son's host, that would be great. Well, that's a harder question :) But my own host shows this issue, so I can proceed with a more detailed exploration. The corresponding picture is posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227 ID: 1591542 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591545 - Posted: 24 Oct 2014, 21:16:45 UTC - in response to Message 1591542. Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653 No need to invent new entities w/o need. I understand what you say. Now, if you could explain why it doesn't happen on either my host or my son's host, that would be great. Well, that's a harder question :) But my own host shows this issue, so I can proceed with a more detailed exploration. The corresponding picture is posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227 Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed. With each crime and every kindness we birth our future. ID: 1591545 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1591551 - Posted: 24 Oct 2014, 21:20:43 UTC - in response to Message 1591537. The oclFFT_plan will more than compensate it. It speeds up at least by 10% if set correctly. Try this for your multi GPU host. -use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Okay, thanks. -oclFFT_plan it will be on both. ;^) Now I just need to start getting AP tasks again. ID: 1591551 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1591559 - Posted: 24 Oct 2014, 21:34:25 UTC - in response to Message 1591545. Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653 No need to invent new entities w/o need. I understand what you say. Now, if you could explain why it doesn't happen on either my host or my son's host, that would be great. Well, that's a harder question :) But my own host shows this issue, so I can proceed with a more detailed exploration. The corresponding picture is posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227 Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed. Signal storage is implemented via STL's vector storage template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We'll see. I'm currently running a test case task with exactly the same cmd line (but with the ATi app on an HD6950 card), with the working set size logged to a file each second. Will post the resulting picture later. ID: 1591559 ·
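For anyone who wants to collect the same kind of trace on their own host, the once-per-second logging described above could look roughly like the sketch below. It is not Raistmer's actual logging code. The read_bytes callable is a hypothetical stand-in for whatever OS-specific call reports the task's working set size, and the "elapsed_seconds,bytes" line format is likewise an assumption.

```python
import io
import time
from typing import Callable, TextIO


def log_working_set(read_bytes: Callable[[], int], out: TextIO,
                    samples: int, interval_s: float = 1.0) -> int:
    """Write one 'elapsed_seconds,bytes' line per sample; return the peak."""
    peak = 0
    start = time.monotonic()
    for i in range(samples):
        value = read_bytes()  # hypothetical: returns working set size in bytes
        peak = max(peak, value)
        out.write(f"{time.monotonic() - start:.1f},{value}\n")
        if i + 1 < samples:
            time.sleep(interval_s)
    return peak


if __name__ == "__main__":
    # Fake readings stand in for a real process here.
    fake = iter([100_000_000, 400_000_000, 250_000_000])
    buffer = io.StringIO()
    peak = log_working_set(lambda: next(fake), buffer, samples=3, interval_s=0.0)
    print(buffer.getvalue(), end="")
    print("peak bytes:", peak)
```

The resulting file can be plotted directly, which makes a short spike easy to tell apart from usage that stays high for the whole run.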

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1591560 - Posted: 24 Oct 2014, 21:37:13 UTC - in response to Message 1591444. ... Yup it seems as if all tasks ending with 30/30 will take more memory than a normal task. That is as I understand it the result of Raistmers new technique in AP v7 to store found pulses inside FFA: http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653#1590653 The 30/30 end is not necessary, any task with 30 repetitive pulses reported might use more memory. Roughly, the memory usage is indicating how many pulses could be reported if there were no limit, since parallel processing is doing many parts of the processing at the same time. However, if the data pattern has enough variability to produce a lot of repetitive pulses it probably does tend to produce many single pulses too. It is not a new technique, the AP v6 apps do the same kind of FFA processing. There have been some additional options added, and because Mike's testing did not see the issue his recommended option settings may be too optimistic for some systems. And that observed difference may possibly lead to code improvements to avoid the issue on all systems, once it's understood. Raistmer's "Hence I need to keep all found pulses while GPU finds it and later to reorder them to match same serial sequence as CPU processing does." might possibly be modified so "later" comes sooner. IOW, once the logic has detected the limit has been reached that reordering could be done immediately and any extra memory used could be freed. For the common case where 30 rep pulses are found during the first FFA, there could be a very temporary large memory usage spike followed by low memory usage for the rest of the run time. That would not completely eliminate concern about two tasks simultaneously needing more than half of memory, but would at least reduce the likelihood of that happening. 
I don't know for sure this is practical, though. Joe ID: 1591560 ·
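To make the idea of moving "later" sooner a bit more concrete, here is a small Python sketch of bounded retention. It is not the app's code and rests on two assumptions: that every found pulse carries a sortable serial position (the order in which CPU processing would have reported it), and that only the first `limit` pulses in that order need to be reported. Under those assumptions, anything beyond the `limit` earliest pulses seen so far can be dropped immediately, whatever order the parallel batches arrive in.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical pulse layout: (serial_position, power).
Pulse = Tuple[int, float]


def keep_first_in_serial_order(batches: Iterable[Iterable[Pulse]],
                               limit: int = 30) -> List[Pulse]:
    """Keep only the `limit` pulses with the smallest serial positions.

    Memory stays bounded by `limit` no matter how many pulses the
    out-of-order batches deliver.
    """
    kept: List[Tuple[int, float]] = []  # min-heap on negated position
    for batch in batches:
        for position, power in batch:
            if len(kept) < limit:
                heapq.heappush(kept, (-position, power))
            elif -position > kept[0][0]:
                # Earlier than the latest pulse currently kept: swap it in.
                heapq.heapreplace(kept, (-position, power))
    # Restore serial order for reporting.
    return sorted((-neg_position, power) for neg_position, power in kept)


if __name__ == "__main__":
    batches = [[(5, 1.0), (9, 2.0)], [(1, 3.0), (7, 0.5)], [(3, 4.0)]]
    print(keep_first_in_serial_order(batches, limit=3))
```

Whether the real reordering step can be expressed this simply is exactly the open question in the post above; the sketch only shows why early trimming need not change the reported result when the two assumptions hold.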

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591561 - Posted: 24 Oct 2014, 21:38:13 UTC - in response to Message 1591559. Last modified: 24 Oct 2014, 21:38:35 UTC Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/ http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653 No need to invent new entities w/o need. I understand what you say. Now, if you could explain why it doesn't happen on either my host or my son's host, that would be great. Well, that's a harder question :) But my own host shows this issue, so I can proceed with a more detailed exploration. The corresponding picture is posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227 Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed. Signal storage is implemented via STL's vector storage template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We'll see. I'm currently running a test case task with exactly the same cmd line (but with the ATi app on an HD6950 card), with the working set size logged to a file each second. Will post the resulting picture later. You forgot my son's PC; it is not as optimized as mine. I built it; that's the only similarity. No memory leak there either, though. With each crime and every kindness we birth our future. ID: 1591561 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591578 - Posted: 24 Oct 2014, 22:12:41 UTC Last modified: 24 Oct 2014, 22:13:39 UTC My logical way of thinking doesn't allow me to agree with this. Considering this technique is not new, and many volunteers have been running APs for more than a year with high ffa_fetch values, and we have already had quite a few periods with lots of overflow tasks, nobody had an issue until now? Some are running 3 or more instances at a time and never complained? Coincidence? I honestly don't think so. With each crime and every kindness we birth our future. ID: 1591578 ·

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1591586 - Posted: 24 Oct 2014, 22:23:44 UTC Last modified: 24 Oct 2014, 22:25:07 UTC Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? A long, illogical & totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches. ID: 1591586 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1591588 - Posted: 24 Oct 2014, 22:28:03 UTC - in response to Message 1591586. Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? A long, illogical & totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches. Yes, AMD CPU. A very slow 5000+ With each crime and every kindness we birth our future. ID: 1591588 ·

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1591590 - Posted: 24 Oct 2014, 22:35:36 UTC - in response to Message 1591588. Just out of curiosity, Mike: does your son's computer use an AMD or Intel CPU? A long, illogical & totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches. Yes, AMD CPU. A very slow 5000+ So it's another AMD without the issue. Could you try to run the WU on a slow (old) Intel CPU like ours? That could explain why you don't have the issue and we all do. Or maybe Raistmer, who has an AMD CPU, could try the opposite? I know what I suggest makes almost no sense. ID: 1591590 ·
