Lunatics Windows Installer v0.43 Release Notes

Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes

juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591397 - Posted: 24 Oct 2014, 15:23:47 UTC - in response to Message 1591396.  
Last modified: 24 Oct 2014, 15:24:13 UTC

I have lowered my -ffa_block and -ffa_block_fetch to what I had before on AP v6.

Does that fix the problem on your host too? Just curious.
ID: 1591397
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591433 - Posted: 24 Oct 2014, 16:30:19 UTC - in response to Message 1591431.  
Last modified: 24 Oct 2014, 16:31:09 UTC

Nice, it's about the same here; I could live with a small memory hog. So we have a way to partially fix the problem.

So now the question to answer is: why does the WU run without any problem on Mike's hosts, while on ours the memory hogging appears?

But that's for the dev team; maybe someday they will find the answer.

Thanks for your help.
ID: 1591433
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1591438 - Posted: 24 Oct 2014, 16:36:53 UTC - in response to Message 1591431.  
Last modified: 24 Oct 2014, 16:48:28 UTC

I have lowered my -ffa_block and -ffa_block_fetch to what I had before on AP v6.

Does that fix the problem on your host too? Just curious.


Well, I haven't had any 30/30 tasks since then, and those are the only WUs that show this problem (as Raistmer says).
http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653#1590653

With the settings below, which I run now, the WUs run a bit slower, but I would rather take that than the memory hogging.

-unroll 16 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 -instances_per_device 2


OK, with my new (old) -ffa_block and -ffa_block_fetch settings, the next 30/30 task only went up to 405 MB (virtual), 379 MB (memory). I can live with that, even if the GPU happens to run two of those 30/30 tasks at the same time.

I seem to have one taking more memory than normal. Normally they run under 100 MB; in fact, another one is running at 33 MB virtual and 54 MB real at the same time as this one is running at 339 MB virtual and 359 MB real. The one hogging is ap_21jn14aa_B4_P1_00143... It's running faster than normal, so it must have a high count.
My settings on the Mac are -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 3584 -ffa_block_fetch 1792 -sbs 256

And here it is, http://setiathome.berkeley.edu/result.php?resultid=3797431768
single pulses: 23
repetitive pulses: 30
percent blanked: 7.45
This task consumed ~10 times as much Memory as a low count task...in Yosemite.
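The thread keeps coming back to "30/30" tasks. As a rough way to spot such candidates in a result summary like the one above, here is a minimal Python sketch. It assumes the summary uses exactly the three "name: value" lines shown and that 30 is the per-type reporting cap the thread refers to; `parse_summary` and `likely_high_memory` are hypothetical names, not part of any SETI@home tool.

```python
import re

# Assumption: 30 is the per-type pulse reporting cap discussed in this
# thread ("30/30" tasks); it is not read from the application.
PULSE_LIMIT = 30


def parse_summary(text):
    """Extract pulse counts and the blanking percentage from a result summary."""
    fields = {}
    for key, pattern, cast in (
        ("single", r"single pulses:\s*(\d+)", int),
        ("repetitive", r"repetitive pulses:\s*(\d+)", int),
        ("blanked", r"percent blanked:\s*([\d.]+)", float),
    ):
        match = re.search(pattern, text)
        if match:
            fields[key] = cast(match.group(1))
    return fields


def likely_high_memory(fields):
    """Flag tasks whose repetitive-pulse count reached the cap.

    Reaching 30 repetitive pulses (not necessarily 30/30) is what the
    thread associates with the larger memory footprint.
    """
    return fields.get("repetitive", 0) >= PULSE_LIMIT


summary = """single pulses: 23
repetitive pulses: 30
percent blanked: 7.45"""

fields = parse_summary(summary)
print(fields, likely_high_memory(fields))
# {'single': 23, 'repetitive': 30, 'blanked': 7.45} True
```

This is only a heuristic over the reported counts; it says nothing about how much memory a flagged task will actually use on a given host.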
ID: 1591438
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1591441 - Posted: 24 Oct 2014, 16:40:17 UTC - in response to Message 1591332.  

-oclFFT_plan is case sensitive.

Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being
-use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

and the other being
-use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1


Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.)
ID: 1591441
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1591443 - Posted: 24 Oct 2014, 16:41:26 UTC - in response to Message 1591332.  
Last modified: 24 Oct 2014, 17:02:59 UTC

I think this is host dependent.

I believe it is not. Why? It's happening on a totally different OS and GPU, and the same thing happens on a 2x690 host running Windows Server. The only thing they have in common is that they use the same old i5 CPU.

After a few tests I made here remotely, I can confirm the problem is apparently related to the -ffa_block size: a 4096 block makes some WUs use 150 MB, and larger sizes increase memory usage hugely. An 8192 block makes the WU use about 250 MB, and so on; the 1 GB appears when you use a 16k block size.

For some reason I can't understand yet, my hosts simply appear to ignore the -oclfft_plan 256 16 256 switch.


-oclFFT_plan is case sensitive.
Make sure FFT is upper case or just snip it from the read me.
-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_fetch 8192 -ffa_fetch_block 4096-tune 1 64 4 1 -tune 2 64 4 1


You don't follow your own advice ;)
You often make the typo "ffa_fetch"/"ffa_fetch_block" instead of ffa_block/ffa_block_fetch
(I can't believe you type such a long line of switches, why not copy from your working ap_cmdline*.txt file? (and just change numbers if needed))

But if app ignores -oclfft_plan it will probably safely ignore also ffa_fetch/ffa_fetch_block
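Since a misspelled or wrongly cased switch apparently fails silently, a pre-flight check on the command line can catch it. The sketch below is a hypothetical helper, not part of the AP app or the Lunatics installer. It assumes, as Mike states, that switch names are case-sensitive, and its `KNOWN_SWITCHES` set contains only the switches quoted in this thread, so a valid switch missing from it would be reported as unknown.

```python
# Hypothetical checker, not part of the AP app or the Lunatics installer.
# KNOWN_SWITCHES holds only the switches quoted in this thread; it is not
# a complete list of what the application accepts.
KNOWN_SWITCHES = {
    "-unroll", "-oclFFT_plan", "-ffa_block", "-ffa_block_fetch",
    "-tune", "-instances_per_device", "-use_sleep", "-sbs",
}
# Map lower-case spellings back to the expected (case-sensitive) form.
_BY_LOWER = {s.lower(): s for s in KNOWN_SWITCHES}


def check_cmdline(line):
    """Return a warning for each suspicious token in an ap_cmdline string."""
    warnings = []
    for token in line.split():
        if "-" in token[1:]:
            # e.g. "4096-tune": a value glued onto the next switch
            warnings.append(f"{token}: missing space before a switch?")
        elif not token.startswith("-") or token in KNOWN_SWITCHES:
            continue  # a numeric argument or a correctly spelled switch
        elif token.lower() in _BY_LOWER:
            warnings.append(f"{token}: wrong case, expected {_BY_LOWER[token.lower()]}")
        else:
            warnings.append(f"{token}: unknown switch, the app may ignore it")
    return warnings


line = ("-use_sleep -unroll 16 -oclfft_plan 256 16 256 "
        "-ffa_fetch 8192 -ffa_fetch_block 4096-tune 1 64 4 1")
for warning in check_cmdline(line):
    print(warning)
```

Run on the quoted line, it reports the lowercase -oclfft_plan, both ffa_fetch typos, and the missing space in "4096-tune".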

?
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1591443
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591451 - Posted: 24 Oct 2014, 16:58:56 UTC - in response to Message 1591443.  

I think this is host dependent.

I believe it is not. Why? It's happening on a totally different OS and GPU, and the same thing happens on a 2x690 host running Windows Server. The only thing they have in common is that they use the same old i5 CPU.

After a few tests I made here remotely, I can confirm the problem is apparently related to the -ffa_block size: a 4096 block makes some WUs use 150 MB, and larger sizes increase memory usage hugely. An 8192 block makes the WU use about 250 MB, and so on; the 1 GB appears when you use a 16k block size.

For some reason I can't understand yet, my hosts simply appear to ignore the -oclfft_plan 256 16 256 switch.


-oclFFT_plan is case sensitive.
Make sure FFT is upper case or just snip it from the read me.
-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_fetch 8192 -ffa_fetch_block 4096-tune 1 64 4 1 -tune 2 64 4 1


You don't follow your own advice ;)
You often make the typo "ffa_fetch"/"ffa_fetch_block" instead of ffa_block/ffa_block_fetch

But if app ignores -oclfft_plan it will probably safely ignore also ffa_fetch/ffa_fetch_block

?


Yes, it seems I was a bit too tired for those long conversations.
I'm very sorry.


With each crime and every kindness we birth our future.
ID: 1591451
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591452 - Posted: 24 Oct 2014, 17:01:44 UTC - in response to Message 1591390.  

Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities w/o need.


I understand what you say.

Now, if you could explain why it happens neither on my host nor on my son's host, that would be great.


With each crime and every kindness we birth our future.
ID: 1591452
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1591460 - Posted: 24 Oct 2014, 17:33:17 UTC - in response to Message 1591441.  

-oclFFT_plan is case sensitive.

Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being
-use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

and the other being
-use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1


Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.)


The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else.

ID: 1591460
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1591476 - Posted: 24 Oct 2014, 18:20:49 UTC - in response to Message 1591460.  

-oclFFT_plan is case sensitive.

Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being
-use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

and the other being
-use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1


Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.)


The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else.

Well, since the GPUs are the real workhorses, I'm afraid the modest gains in CPU availability won't offset the large increase in run times on the GPUs.
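To put Jeff's 15%-40% figure in throughput terms, here is a bit of back-of-the-envelope arithmetic. It assumes the whole run-time increase comes from -use_sleep (Jeff's original question was whether the lowercase -oclfft_plan was the cause) and that the GPU is the bottleneck; it does not model what the freed CPU time is worth.

```python
def throughput_loss(runtime_increase):
    """Fraction of a GPU's task throughput lost when each task runs longer.

    If a task takes (1 + r) times as long, the GPU completes 1 / (1 + r)
    as many tasks per hour, so the fraction lost is r / (1 + r).
    """
    return runtime_increase / (1.0 + runtime_increase)


# Jeff's reported range of run-time increases.
for r in (0.15, 0.40):
    print(f"{r:.0%} longer run time -> {throughput_loss(r):.1%} fewer GPU tasks per hour")
```

The two printed lines put the loss at roughly 13% to 29% of GPU output, which is what the freed CPU time would have to make up for.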
ID: 1591476
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591537 - Posted: 24 Oct 2014, 20:56:17 UTC - in response to Message 1591460.  
Last modified: 24 Oct 2014, 20:57:08 UTC

-oclFFT_plan is case sensitive.

Uh, oh. I actually just cut and pasted the recommendations exactly the way you had provided them over in The GTX750(Ti) Thread for my two boxes, one being
-use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

and the other being
-use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1


Could that be why I've been seeing about a 15%-40% increase in my AP run times on those boxes? (Huge decrease in CPU times, though.)


The -use_sleep command will cause the time to increase, but will allow you to use the CPU that is normally dedicated to the app for something else.


The -oclFFT_plan switch will more than compensate for it.
It speeds things up by at least 10% if set correctly.
Try this for your multi GPU host.

-use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1


With each crime and every kindness we birth our future.
ID: 1591537
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1591542 - Posted: 24 Oct 2014, 21:10:55 UTC - in response to Message 1591452.  

Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities w/o need.


I understand what you say.

Now, if you could explain why it happens neither on my host nor on my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can proceed with a more detailed exploration.
Corresponding picture posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227
ID: 1591542
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591545 - Posted: 24 Oct 2014, 21:16:45 UTC - in response to Message 1591542.  

Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities w/o need.


I understand what you say.

Now, if you could explain why it happens neither on my host nor on my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can proceed with a more detailed exploration.
Corresponding picture posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227


Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed.


With each crime and every kindness we birth our future.
ID: 1591545
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1591551 - Posted: 24 Oct 2014, 21:20:43 UTC - in response to Message 1591537.  

The -oclFFT_plan switch will more than compensate for it.
It speeds things up by at least 10% if set correctly.
Try this for your multi GPU host.

-use_sleep -unroll 10 -oclFFT_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1

Okay, thanks. -oclFFT_plan it will be on both. ;^)

Now I just need to start getting AP tasks again.
ID: 1591551
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1591559 - Posted: 24 Oct 2014, 21:34:25 UTC - in response to Message 1591545.  

Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities w/o need.


I understand what you say.

Now, if you could explain why it happens neither on my host nor on my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can proceed with a more detailed exploration.
Corresponding picture posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227


Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed.


Signal storage is implemented via the STL vector template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We'll see. I'm currently running a test-case task with exactly the same command line (but with the ATi app on an HD6950 card), logging the working-set size to a file every second.
Will post resulting picture later.
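For anyone who wants to collect the same kind of once-per-second memory trace on their own host, a minimal standard-library logger could look like the sketch below. It is not Raistmer's tool: `log_memory` is a hypothetical name, and the memory reading is left to a caller-supplied function. The commented psutil reader is an assumption about one possible source (on Windows, psutil's `rss` is the working set); psutil is a third-party package.

```python
import csv
import time


def log_memory(read_bytes, path, interval=1.0, samples=None, sleep=time.sleep):
    """Append one row (elapsed seconds, MB) per interval to a CSV file.

    read_bytes: zero-argument callable returning the current memory figure
    in bytes, or None once the process being watched has exited.
    samples: stop after this many rows; None means run until read_bytes()
    returns None. Returns the number of rows written.
    """
    start = time.monotonic()
    written = 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while samples is None or written < samples:
            value = read_bytes()
            if value is None:
                break
            elapsed = round(time.monotonic() - start, 1)
            writer.writerow([elapsed, round(value / 2**20, 1)])
            f.flush()  # keep the file usable even if logging is interrupted
            written += 1
            sleep(interval)
    return written


# Possible reader using the third-party psutil package (an assumption,
# not what Raistmer used):
#
#   import psutil
#   proc = psutil.Process(pid)  # pid of the AP task
#   def read_rss():
#       try:
#           return proc.memory_info().rss
#       except psutil.NoSuchProcess:
#           return None
#
#   log_memory(read_rss, "ap_memory.csv")
```

The resulting CSV can be plotted to get a picture like the one Raistmer posted, so hosts with and without the issue can be compared under the same command line.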
ID: 1591559
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1591560 - Posted: 24 Oct 2014, 21:37:13 UTC - in response to Message 1591444.  

...
Yup, it seems as if all tasks ending with 30/30 will take more memory than a normal task. As I understand it, that is the result of Raistmer's new technique in AP v7 for storing found pulses inside the FFA:
http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653#1590653

A 30/30 ending is not necessary; any task with 30 repetitive pulses reported might use more memory. Roughly, the memory usage indicates how many pulses could be reported if there were no limit, since parallel processing works on many parts of the data at the same time. However, if the data pattern has enough variability to produce a lot of repetitive pulses, it probably tends to produce many single pulses too.

It is not a new technique; the AP v6 apps do the same kind of FFA processing. Some additional options have been added, and because Mike's testing did not see the issue, his recommended option settings may be too optimistic for some systems. That observed difference may also lead to code improvements that avoid the issue on all systems, once it's understood.

Raistmer's "Hence I need to keep all found pulses while GPU finds it and later to reorder them to match same serial sequence as CPU processing does." might possibly be modified so "later" comes sooner. IOW, once the logic has detected the limit has been reached that reordering could be done immediately and any extra memory used could be freed. For the common case where 30 rep pulses are found during the first FFA, there could be a very temporary large memory usage spike followed by low memory usage for the rest of the run time. That would not completely eliminate concern about two tasks simultaneously needing more than half of memory, but would at least reduce the likelihood of that happening. I don't know for sure this is practical, though.
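Joe's "reorder sooner" idea can be illustrated with a toy model. This is not the AP v7 code. It assumes each batch stands for one parallel GPU pass over a contiguous range of the data, that batches complete in serial order, and that a pulse is represented only by its position in the serial (CPU) reporting order. Under those assumptions, once the cap has been reached, later pulses can never be reported, so the extra storage can be released early.

```python
# Toy model, not the AP v7 source. Assumptions: each batch is one parallel
# GPU pass over a contiguous range of the data, batches arrive in serial
# order, and a pulse is represented only by its position in the serial
# (CPU) reporting order.
LIMIT = 30  # repetitive-pulse reporting cap discussed in this thread


def collect_keep_all(batches, limit=LIMIT):
    """Store every pulse until the end, then reorder and truncate."""
    found = [pulse for batch in batches for pulse in batch]
    peak = len(found)  # all pulses are held at once
    return sorted(found)[:limit], peak


def collect_trim_early(batches, limit=LIMIT):
    """Reorder and truncate as soon as the cap is known to be reached."""
    kept, peak, full = [], 0, False
    for batch in batches:
        if full:
            # Later batches only hold later serial positions, so their
            # pulses could never be reported and need not be stored.
            continue
        kept.extend(batch)
        peak = max(peak, len(kept))
        if len(kept) >= limit:
            kept = sorted(kept)[:limit]  # the first `limit` are now final
            full = True
    return sorted(kept), peak


# Five batches of 40 pulses each, emitted out of order within a batch.
batches = [list(range(i * 100 + 39, i * 100 - 1, -1)) for i in range(5)]
print(collect_keep_all(batches)[1], collect_trim_early(batches)[1])  # 200 40
```

Both functions return the same 30 pulses; only the peak number held in memory differs. If batches can finish out of serial order, the early cut is not safe as written.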
                                                                  Joe
ID: 1591560
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591561 - Posted: 24 Oct 2014, 21:38:13 UTC - in response to Message 1591559.  
Last modified: 24 Oct 2014, 21:38:35 UTC

Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities w/o need.


I understand what you say.

Now, if you could explain why it happens neither on my host nor on my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can proceed with a more detailed exploration.
Corresponding picture posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227


Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed.


Signal storage is implemented via the STL vector template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We'll see. I'm currently running a test-case task with exactly the same command line (but with the ATi app on an HD6950 card), logging the working-set size to a file every second.
Will post resulting picture later.


You forgot my son's PC; it is not as optimized as mine.
I built it; that's the only similarity.
No memory leak there either, though.


With each crime and every kindness we birth our future.
ID: 1591561
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591578 - Posted: 24 Oct 2014, 22:12:41 UTC
Last modified: 24 Oct 2014, 22:13:39 UTC

My logical way of thinking doesn't allow me to agree with this.
Considering that this technique is not new, that many volunteers have been running APs for more than a year with high ffa_fetch values, and that we have already had quite a few periods with lots of overflow tasks, nobody had an issue until now?
Some are running 3 or more instances at a time and have never complained?
Coincidence?

I honestly don't think so.


With each crime and every kindness we birth our future.
ID: 1591578
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591586 - Posted: 24 Oct 2014, 22:23:44 UTC
Last modified: 24 Oct 2014, 22:25:07 UTC

Just out of curiosity, Mike: does your son's computer use an AMD or an Intel CPU?

A long, illogical and totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches.
ID: 1591586
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1591588 - Posted: 24 Oct 2014, 22:28:03 UTC - in response to Message 1591586.  

Just out of curiosity, Mike: does your son's computer use an AMD or an Intel CPU?

A long, illogical and totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches.


Yes, AMD CPU.
A very slow 5000+


With each crime and every kindness we birth our future.
ID: 1591588
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1591590 - Posted: 24 Oct 2014, 22:35:36 UTC - in response to Message 1591588.  

Just out of curiosity, Mike: does your son's computer use an AMD or an Intel CPU?

A long, illogical and totally insane shot: all the hosts I see with the issue are powered by low-end Intel CPUs, while Mike's uses an AMD CPU (at least the one listed by BOINC). Could a difference in the way the host deals with memory be the source of the problem? OK, I know it's weird, but I believe in witches.


Yes, AMD CPU.
A very slow 5000+

So it's another AMD without the issue.

Could you try running the WU on a slow (old) Intel CPU like ours? That could explain why you don't have the issue while all of us do.

Or maybe Raistmer, who has an AMD CPU, could try the opposite?

I know what I'm suggesting makes almost no sense.
ID: 1591590


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.