Lunatics Windows Installer v0.43 Release Notes

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1590327 - Posted: 22 Oct 2014, 20:05:47 UTC - in response to Message 1590312.  

@ Raistmer

I ran some more tests and the problem is apparently related to the -oclfft_plan switch; when you run without the switch, the problem disappears.

The same problem appears on other hosts with the same GPU (780FTW), all running Win7, and I see at least one example on a 2x690 host that runs Win Server and on a 670 host with Win7.


Try running -oclfft_plan without -ffa_block and -ffa_block_fetch and see how this works, Juan.
I guess the timings are too high on multi-GPU hosts.


With each crime and every kindness we birth our future.
ID: 1590327
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590333 - Posted: 22 Oct 2014, 20:19:54 UTC - in response to Message 1590312.  

@ Raistmer

I ran some more tests and the problem is apparently related to the -oclfft_plan switch; when you run without the switch, the problem disappears.

The same problem appears on other hosts with the same GPU (780FTW), all running Win7, and I see at least one example on a 2x690 host that runs Win Server and on a 670 host with Win7.


No sign that the option was acknowledged by the app...

Running on device number: 1
Sleep() & wait for event loops will be used in some places
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 8
Single buffer allocation size: 256MB
Total device global memory: 2048MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
ID: 1590333
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590362 - Posted: 22 Oct 2014, 20:47:36 UTC - in response to Message 1590333.  
Last modified: 22 Oct 2014, 20:55:53 UTC

I just ran the app without the switch and watched the memory usage in taskmgr; it stays below 100MB when run without the switch.

But I was wrong: I have just seen one WU that uses >1GB without the switch enabled, so I will try Mike's suggestion. One thing I noticed: without the switch, the core usage drops almost to zero.

@Mike
Trying it now, back ASAP.
ID: 1590362
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590412 - Posted: 22 Oct 2014, 22:14:44 UTC - in response to Message 1590362.  
Last modified: 22 Oct 2014, 22:15:14 UTC

@ Mike

I did what you suggested and left the hosts working. I have not seen any >1GB memory usage so far, but I do see some tasks in the range of 200MB to 400MB (normal memory usage is <100MB at the high end). I will leave them running for a while to get more data. I left one of the hosts with only -use_sleep -unroll 16, taking out all the other switches.
ID: 1590412
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590652 - Posted: 23 Oct 2014, 10:53:46 UTC

I left the hosts running all night; the problem apparently only appears when both switches are enabled.

So I removed -ffa_block 16384 -ffa_block_fetch 8192 and set the command line on the 780 hosts to:

-use_sleep -unroll 16 -oclfft_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1

and the same with -unroll 12 on the 670/690 hosts.

I no longer see any WU with 1GB memory usage; I will keep an eye on that.
ID: 1590652
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590653 - Posted: 23 Oct 2014, 11:18:34 UTC
Last modified: 23 Oct 2014, 11:20:05 UTC

Well, all this supports the guess that the increased memory consumption is caused by the storage space for found signals inside FFA.
There was the same issue before, when the size of the signal array was unrestricted. Now the issue should be smaller because each thread can keep only 30 signals.
But with big -ffa_block numbers, which define the number of threads (hence the number of separate signal storages), it seems this issue shows itself again.

If so, one could use a smaller number in the -ffa_block param.
Also, this issue should correlate with the number of found repetitive signals. Please check that all tasks with increased memory consumption have 30 rep pulses in the log.

Unfortunately, this issue arises from the way the validator handles overflows for CPU and GPU results.
Currently a CPU result (even an overflow) must match the GPU result. But the GPU processes data in a different order. That doesn't matter when ALL data is processed, but with an early exit the data processing order does matter (this was an issue in APv6 builds: an increased number of inconclusives between CPU and GPU overflows).

Hence I need to keep all found pulses while the GPU finds them and later reorder them to match the same serial sequence as CPU processing. With many pulses per thread and many threads, that storage space grows. I'll check whether there is some additional memory leak, but I'm afraid this isn't fixable for now without changes in how the validator handles overflows.
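
To make the scaling argument concrete, here is a rough Python sketch, not the app's actual code. It treats the -ffa_block value as the thread count (a simplification of "define number of threads"), uses the 30-signals-per-thread cap from the post, and invents the Pulse fields (serial_index, thread_id, power) and the overflow_limit parameter purely for illustration.

```python
from dataclasses import dataclass

MAX_SIGNALS_PER_THREAD = 30  # per-thread cap mentioned in the post


@dataclass(frozen=True)
class Pulse:
    # All three fields are hypothetical, chosen only for this sketch.
    serial_index: int  # position at which a serial CPU pass would find it
    thread_id: int     # GPU thread that reported it
    power: float       # placeholder payload


def worst_case_slots(num_threads, per_thread=MAX_SIGNALS_PER_THREAD):
    """Upper bound on pulse records the host may have to keep around."""
    return num_threads * per_thread


def reorder_like_cpu(found, overflow_limit=30):
    """Sort GPU finds into serial order, then cut at the overflow limit.

    Everything has to be kept until this point, because a pulse found
    late by the GPU may still belong early in the serial sequence.
    """
    return sorted(found, key=lambda p: p.serial_index)[:overflow_limit]
```

With these assumptions, worst_case_slots(16384) gives 491,520 pulse slots, which is why a smaller -ffa_block value would shrink the host-side storage. The real per-record size is not stated in the thread, so no byte figure is attempted here.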
ID: 1590653
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1590677 - Posted: 23 Oct 2014, 13:01:27 UTC - in response to Message 1590134.  

IMHO there is a small bug in the installer - APv7_r2559_SSE3_OSX64.zip

/Files_to_Install/apv7_install.command

in line 24:

change sed s/sahv7_install.command//
to sed s/apv7_install.command//

Many thanks to the team.

Jürgen

Thanks for noting. Will get corrected ...
_\|/_
U r s
ID: 1590677
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1590686 - Posted: 23 Oct 2014, 13:18:24 UTC - in response to Message 1590653.  

Well, all this supports the guess that the increased memory consumption is caused by the storage space for found signals inside FFA.
There was the same issue before, when the size of the signal array was unrestricted. Now the issue should be smaller because each thread can keep only 30 signals.
But with big -ffa_block numbers, which define the number of threads (hence the number of separate signal storages), it seems this issue shows itself again.

If so, one could use a smaller number in the -ffa_block param.
Also, this issue should correlate with the number of found repetitive signals. Please check that all tasks with increased memory consumption have 30 rep pulses in the log.

Unfortunately, this issue arises from the way the validator handles overflows for CPU and GPU results.
Currently a CPU result (even an overflow) must match the GPU result. But the GPU processes data in a different order. That doesn't matter when ALL data is processed, but with an early exit the data processing order does matter (this was an issue in APv6 builds: an increased number of inconclusives between CPU and GPU overflows).

Hence I need to keep all found pulses while the GPU finds them and later reorder them to match the same serial sequence as CPU processing. With many pulses per thread and many threads, that storage space grows. I'll check whether there is some additional memory leak, but I'm afraid this isn't fixable for now without changes in how the validator handles overflows.


I did run both units Juan posted.

ap_26jn14aa_B6_P0_00184_20141017_17161.wu is a 30/30 and memory consumption didn't go over 104K for one second.
Of course memory consumption on the GPU went up to 406K because of the high ffa_fetch values.
But that's no big deal with a 3GB GPU.


With each crime and every kindness we birth our future.
ID: 1590686
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590698 - Posted: 23 Oct 2014, 13:45:42 UTC - in response to Message 1590686.  

I did run both units Juan posted.

ap_26jn14aa_B6_P0_00184_20141017_17161.wu is a 30/30 and memory consumption didn't go over 104K for one second.
Of course memory consumption on the GPU went up to 406K because of the high ffa_fetch values.
But that's no big deal with a 3GB GPU.

I have no idea how the program itself works, but what is increasing is the computer's main memory (up to about 1.7 GB on the last WU I saw), which normally appears in the range of 50-100MB in the taskmgr list.
ID: 1590698
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1590701 - Posted: 23 Oct 2014, 13:50:05 UTC - in response to Message 1590698.  

I did run both units Juan posted.

ap_26jn14aa_B6_P0_00184_20141017_17161.wu is a 30/30 and memory consumption didn't go over 104K for one second.
Of course memory consumption on the GPU went up to 406K because of the high ffa_fetch values.
But that's no big deal with a 3GB GPU.

I have no idea how the program itself works, but what is increasing is the computer's main memory (up to about 1.7 GB on the last WU I saw), which normally appears in the range of 50-100MB in the taskmgr list.


In Task Manager you only see system memory, not GPU memory consumption.


With each crime and every kindness we birth our future.
ID: 1590701
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590715 - Posted: 23 Oct 2014, 14:14:13 UTC - in response to Message 1590701.  

In Task Manager you only see system memory, not GPU memory consumption.

That's exactly why this issue is weird: why does the main memory usage rise to that kind of level on one particular WU while all the others remain low?

When the system memory usage rises to that large a number, the system runs out of memory very fast.
ID: 1590715
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1590718 - Posted: 23 Oct 2014, 14:22:14 UTC - in response to Message 1590686.  

BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?
ID: 1590718
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1590727 - Posted: 23 Oct 2014, 14:33:58 UTC - in response to Message 1590718.  

BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?


What I have seen already told me that oclfft_plan doesn't work with the 8800GT.


With each crime and every kindness we birth our future.
ID: 1590727
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1590733 - Posted: 23 Oct 2014, 14:41:03 UTC - in response to Message 1590727.  

BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?


What I have seen already told me that oclfft_plan doesn't work with the 8800GT.

Yep. You just have to wonder what other cards/drivers don't work...
ID: 1590733
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1590738 - Posted: 23 Oct 2014, 14:43:52 UTC - in response to Message 1590733.  

BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?


What I have seen already told me that oclfft_plan doesn't work with the 8800GT.

Yep. You just have to wonder what other cards/drivers don't work...


I have tested 11 different NV GPUs.
Everything Fermi and up should work.
Not sure about 2xx cards.


With each crime and every kindness we birth our future.
ID: 1590738
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590890 - Posted: 23 Oct 2014, 18:51:03 UTC - in response to Message 1590715.  

In Task Manager you only see system memory, not GPU memory consumption.

That's exactly why this issue is weird: why does the main memory usage rise to that kind of level on one particular WU while all the others remain low?

When the system memory usage rises to that large a number, the system runs out of memory very fast.


Yep, exactly: an increase in system (OS/CPU, not GPU) memory is what is expected with the issue I described earlier.
ID: 1590890
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590895 - Posted: 23 Oct 2014, 18:56:19 UTC - in response to Message 1590718.  
Last modified: 23 Oct 2014, 18:57:33 UTC

BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?


The -oclFFT_plan settings were placed in the advanced options category not without reason.
Only a few of the possible combinations work correctly, and we are still finding out why and which.

So either do thorough offline tests before using it live, or stay with Mike's recommended combo, which is known to be faster and safe enough.

If you would collect your results in a table that clearly states, for each GPU, which combos worked correctly, which produced invalid results and which failed, it would be appreciated.

So far I have collected such a table for the Loveland APU (C-60) and will post it on Lunatics.
ID: 1590895
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590900 - Posted: 23 Oct 2014, 19:03:30 UTC - in response to Message 1590890.  
Last modified: 23 Oct 2014, 19:04:06 UTC

In Task Manager you only see system memory, not GPU memory consumption.

That's exactly why this issue is weird: why does the main memory usage rise to that kind of level on one particular WU while all the others remain low?

When the system memory usage rises to that large a number, the system runs out of memory very fast.


Yep, exactly: an increase in system (OS/CPU, not GPU) memory is what is expected with the issue I described earlier.


BTW, what tool can be used to log an app's system memory consumption during its run (not the final value but how it changes over time)? I don't have time right now to sit in front of the screen and watch where and how it rises, so suggestions for logging tools would be appreciated.
ID: 1590900
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1590924 - Posted: 23 Oct 2014, 19:49:26 UTC - in response to Message 1590900.  

Maybe this is what you are looking for:

http://setiathome.berkeley.edu/forum_thread.php?id=75928&postid=1590689

ID: 1590924
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1590935 - Posted: 23 Oct 2014, 20:09:39 UTC - in response to Message 1590924.  
Last modified: 23 Oct 2014, 20:27:14 UTC

Maybe this is what you are looking for:

http://setiathome.berkeley.edu/forum_thread.php?id=75928&postid=1590689


Process Explorer is a great tool indeed, but I did not find logging there (I meant logging to a file, to look at after the process has already finished).

EDIT: perhaps this: http://stackoverflow.com/questions/69332/tracking-cpu-and-memory-usage-per-process
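
For logging to a file, one possible approach is a small Python script that polls the Windows tasklist command and appends the "Mem Usage" column to a CSV. This is only a sketch under assumptions: Python is installed on the host, the PID of the running app is known, and tasklist prints the memory figure in K as the fifth CSV field (the number formatting varies with locale, so only the digits are kept). The function names are made up for this example.

```python
import csv
import io
import subprocess
import time


def parse_mem_kb(tasklist_csv):
    """Return the 'Mem Usage' value in KB from tasklist CSV output, or None."""
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) >= 5:
            digits = "".join(ch for ch in row[4] if ch.isdigit())
            if digits:
                return int(digits)
    return None  # no matching process row (e.g. the task has finished)


def sample_mem_kb(pid):
    """Ask tasklist for one process; Windows only."""
    result = subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_mem_kb(result.stdout)


def log_memory(pid, out, interval_s=5.0, max_samples=None, sampler=sample_mem_kb):
    """Write 'unix_time,mem_kb' rows to `out` until the process disappears."""
    writer = csv.writer(out)
    writer.writerow(["unix_time", "mem_kb"])
    taken = 0
    while max_samples is None or taken < max_samples:
        mem_kb = sampler(pid)
        if mem_kb is None:
            break
        writer.writerow([f"{time.time():.0f}", mem_kb])
        out.flush()  # keep the file usable even if the logger is killed
        taken += 1
        time.sleep(interval_s)
    return taken
```

Usage would be something like opening a file with open("mem_log.csv", "w", newline="") and passing it to log_memory(pid, f) while the task runs; the resulting file can be plotted afterwards to see where the memory rises.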
ID: 1590935