Posts by Raistmer


1) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1591559)
Posted 1 day ago by Raistmer
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities without necessity.


I understand what you are saying.

Now if you could explain why it doesn't happen on either my host or my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can explore it in more detail.
The corresponding picture is posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227


Don't get me wrong, but what I learned at university 30 years ago is that if you can't reproduce the same behaviour in different places under the same conditions, nothing is confirmed.


Signal storage is implemented via STL's vector template, so maybe some differences in the STL implementation DLLs installed on your host avoid the memory leak... or maybe there is another reason. We will see. I'm currently running the test case task with exactly the same command line (but with the ATi app on an HD6950 card), logging the working set size to a file each second.
I will post the resulting picture later.
2) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1591542)
Posted 1 day ago by Raistmer
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities without necessity.


I understand what you are saying.

Now if you could explain why it doesn't happen on either my host or my son's host, that would be great.

Well, that's a harder question :)

But my own host shows this issue, so I can explore it in more detail.
The corresponding picture is posted here:
http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57227.html;topicseen#msg57227
3) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1591390)
Posted 1 day ago by Raistmer
Hm... Looks like no matter if I write in English or in Russian... nobody cares to read anyway :/

http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1590653

No need to invent new entities without necessity.
4) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1591330)
Posted 1 day ago by Raistmer
-oclfft_plan 256 16 256 switch.

Yep, as I said earlier, there is no confirmation from the app in the log (stderr) that the switch was recognized.
5) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590968)
Posted 2 days ago by Raistmer

Like I said earlier, I watched closely while benching Juan's task: I had 104K constantly during this bench, and it is an overflow task.
I think this is host dependent.


With exactly the same command line as he provided?
I configured a working-set logger via perfmon (and learnt something new, LoL), so now I log the current working set to a file every 10 seconds (configurable).

I will run his workunit later with the same command line (but with the ATi app, because the NV card can't accept the same command line).
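
Once such a log exists, it can be summarized quickly to see whether the working set stayed flat or grew. The following is only a minimal sketch: it assumes the log was exported as a two-column CSV (timestamp, working set in bytes), and the sample values and the function name `summarize_working_set` are made up for illustration, not taken from any real run.

```python
import csv
import io

def summarize_working_set(csv_text):
    """Return (first, peak, last) working-set sizes in bytes from a two-column CSV log."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    samples = []
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue  # skip rows with an empty value cell
        samples.append(float(row[1]))
    if not samples:
        raise ValueError("no samples in log")
    return samples[0], max(samples), samples[-1]

# Hypothetical log excerpt, one sample every 10 seconds.
example_log = '''"Time","Working Set"
"11/12/2014 07:00:00","109051904"
"11/12/2014 07:00:10","524288000"
"11/12/2014 07:00:20","1503238553"
'''

first, peak, last = summarize_working_set(example_log)
print(f"start {first / 2**20:.0f} MB, peak {peak / 2**20:.0f} MB, end {last / 2**20:.0f} MB")
```

A flat log would show nearly equal start and peak values; a large gap between them points to growth during the run.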
6) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590935)
Posted 2 days ago by Raistmer
Maybe this is what you look for:

http://setiathome.berkeley.edu/forum_thread.php?id=75928&postid=1590689


Process Explorer is a great tool indeed, but I did not find logging there (I meant logging to a file, to review after the process has already finished).

EDIT: perhaps this: http://stackoverflow.com/questions/69332/tracking-cpu-and-memory-usage-per-process
7) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590900)
Posted 2 days ago by Raistmer
In task manager you only see system memory not GPU memory consumption.

That's exactly why this issue is weird: why does the main memory usage rise to that kind of level on a particular WU while all others remain low?

When the system memory usage rises to that large number the system runs out of memory very fast.


Yep, exactly: an increase in system (OS, CPU, not GPU) memory is expected with the issue I described earlier.


BTW, what tool can be used to log an app's system memory consumption during its run (not the final value but how it changes over time)? I don't have time right now to sit in front of the screen and watch where and how it rises, so some logging tools would be appreciated.
8) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590895)
Posted 2 days ago by Raistmer
BTW, I completed the oclFFT_plan tests on the nVidia 8800 GT. All the tests using the oclFFT_plan settings produced No Found Pulses even though the App Found Pulses without using the oclFFT_plan setting. I have all the Completed files.

Anyone want the Files? Or the results posted?


The -oclFFT_plan settings were placed in the advanced options category for a reason.
Only a few of the possible combinations work correctly, and we are still finding out why and which.

So either do thorough offline tests before using them live, or stay with Mike's recommended combo, which is known to be faster and safe enough.

If you could collect your results in a table that clearly states which GPU was used, which combos worked correctly, which produced invalid results, and which failed, it would be appreciated.

So far I have collected such a table for the Loveland APU (C-60) and will post it on Lunatics.
9) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590890)
Posted 2 days ago by Raistmer
In task manager you only see system memory not GPU memory consumption.

That's exactly why this issue is weird: why does the main memory usage rise to that kind of level on a particular WU while all others remain low?

When the system memory usage rises to that large number the system runs out of memory very fast.


Yep, exactly: an increase in system (OS, CPU, not GPU) memory is expected with the issue I described earlier.
10) Message boards : Number crunching : Perhaps my 7th wingman will be the charm! (or maybe the 8th) (Message 1590887)
Posted 2 days ago by Raistmer
And you are lucky, because the last valid result could easily have been invalid:

OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)

It's an unsupported SDK version.

Unsupported for "Windows x86 rev 1832, V6 match, by Raistmer" build? I've been assuming your switch to the newer SDK took place after that build.

The host is of course erroring on all AP v7 tasks, and the user will soon need to update it with a later Catalyst version to have it remain productive. I did send a PM.
Joe


Hm, good question. There were some v6 builds with SDK 2.6 but without the code that blocks execution in case of SDK 2.4 (as in v7). I can't recall which range 1832 falls in, though.
11) Message boards : Number crunching : Perhaps my 7th wingman will be the charm! (or maybe the 8th) (Message 1590658)
Posted 2 days ago by Raistmer
And you are lucky, because the last valid result could easily have been invalid:

OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)

It's an unsupported SDK version.
12) Message boards : Number crunching : Random Restarts of Astropulses (Message 1590656)
Posted 2 days ago by Raistmer
FFA thread block override value:16384
FFA thread fetchblock override value:8192


I would not recommend using such big values for -ffa_block until it's proven for the particular host that 16k threads perform markedly faster than, let's say, 8k threads.

Though even 16k threads can underload a modern high-end GPU, there could be another issue, discussed here: http://setiathome.berkeley.edu/forum_thread.php?id=75863&postid=1589263 (and a few posts below that).

Excessive memory usage can lead to a crash with an out-of-memory condition, especially if many GPU tasks are in flight and a few of them have an overflow result.
13) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590653)
Posted 2 days ago by Raistmer
Well, all this supports the guess that the increased memory consumption is caused by the storage space for found signals inside FFA.
The same issue existed before, when the size of the signal array was unrestricted. Now the issue should be smaller, because each thread can keep only 30 signals.
But with big -ffa_block numbers, which define the number of threads (hence the number of separate signal storages), it seems this issue shows itself again.

If so, one could use a smaller number in the -ffa_block param.
Also, this issue should correlate with the number of found repetitive signals. Please check that all tasks with increased memory consumption have 30 rep pulses in the log.
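
To get a feel for how this storage scales with -ffa_block, here is a rough back-of-the-envelope sketch. The 30 signals per thread comes from the description above; the 64 bytes per signal is purely a placeholder assumption (the real record size and any allocation overhead of the vector are not given here), and `signal_storage_bytes` is a hypothetical helper name.

```python
def signal_storage_bytes(ffa_block, signals_per_thread=30, bytes_per_signal=64):
    """Upper bound on signal storage if every thread fills its quota.

    bytes_per_signal is a placeholder assumption, not the app's real record size.
    """
    return ffa_block * signals_per_thread * bytes_per_signal

for block in (16384, 10240, 8192):
    mb = signal_storage_bytes(block) / 2**20
    print(f"-ffa_block {block}: up to {mb:.1f} MB")
```

Whatever the real record size is, the bound grows linearly with -ffa_block, so halving the thread count halves the worst-case storage.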

Unfortunately, this issue arises from the way the validator handles overflows for CPU and GPU results.
Currently a CPU result (even an overflow) must match the GPU result. But the GPU processes data in a different order. It doesn't matter when ALL data is processed, but with an early exit the data processing order does matter (and this was an issue in APv6 builds: an increased number of inconclusives between CPU and GPU overflows).

Hence I need to keep all found pulses while the GPU finds them and later reorder them to match the same serial sequence as CPU processing. With many pulses per thread and many threads, that storage space grows. I'll check whether there is some additional memory leak, but I'm afraid this isn't fixable for now without changes in how the validator handles overflows.
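
The reordering idea can be illustrated with a small sketch. This is not the app's actual code: it assumes each pulse can be tagged with its position in the serial (CPU) processing order, and the names `Pulse`, `serial_index` and `reorder_and_truncate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    serial_index: int  # hypothetical: position in the CPU (serial) processing order
    power: float

def reorder_and_truncate(per_thread_pulses, limit=30):
    """Merge per-thread pulse lists, restore serial order, keep the first `limit`."""
    merged = [p for pulses in per_thread_pulses for p in pulses]
    merged.sort(key=lambda p: p.serial_index)
    return merged[:limit]

# Pulses as the GPU threads might find them: grouped per thread, out of serial order.
gpu_results = [
    [Pulse(7, 1.2), Pulse(3, 0.9)],
    [Pulse(1, 2.0)],
    [Pulse(5, 1.1), Pulse(2, 0.7)],
]
report = reorder_and_truncate(gpu_results, limit=3)
print([p.serial_index for p in report])
```

The sketch also shows why everything has to be kept until the end: a pulse that belongs early in the serial order may be found late, so nothing can be discarded before the merge.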
14) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590333)
Posted 3 days ago by Raistmer
@ Raistmer

I made some more tests and the problem is apparently related to the -oclfft_plan switch; when you run without the switch, the problem disappears.

The same problem appears on other hosts with the same GPU (780FTW), all running Win7, and I see at least one example on a 2x690 host that runs Win Server and on a 670 host with Win 7.


No sign that the option was acknowledged by the app...

Running on device number: 1
Sleep() & wait for event loops will be used in some places
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 1
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 8
Single buffer allocation size: 256MB
Total device global memory: 2048MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
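
Checking which settings the app acknowledged can be automated with a small sketch like the one below. The patterns are copied from the stderr lines above; the dictionary key names are hypothetical labels, and the idea that each line confirms the like-named switch is an assumption. No pattern for -oclfft_plan is included, because the log above does not show what such a confirmation line would look like.

```python
import re

# Patterns copied from the stderr excerpt above; the key names are hypothetical labels.
PATTERNS = {
    "unroll": re.compile(r"DATA_CHUNK_UNROLL set to:(\d+)"),
    "ffa_block": re.compile(r"FFA thread block override value:(\d+)"),
    "ffa_block_fetch": re.compile(r"FFA thread fetchblock override value:(\d+)"),
}

def acknowledged_settings(stderr_text):
    """Return the settings that the log explicitly confirms, with their values."""
    found = {}
    for line in stderr_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                found[name] = int(match.group(1))
    return found

stderr_excerpt = """Running on device number: 1
DATA_CHUNK_UNROLL set to:12
FFA thread block override value:16384
FFA thread fetchblock override value:8192
"""
settings = acknowledged_settings(stderr_excerpt)
print(settings)
```

Any switch given on the command line that has no matching entry in the result deserves a closer look in the full log.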
15) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590162)
Posted 3 days ago by Raistmer
thanks, got them.

@juan BFB - Thanks, no need. I got them from Richard's link already.
16) Message boards : Number crunching : Oddly long AP7 wu's? (Message 1590159)
Posted 3 days ago by Raistmer
Either free a CPU core or use -cpu_lock in the command line.
17) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590154)
Posted 3 days ago by Raistmer
I have one that uses more than 1 GB.

SETI@home 7.05 astropulse_v7 (opencl_nvidia_100) ap_26jn14aa_B6_P0_00184_20141017_17161.wu_0 00:21:28 (00:00:10) 0,83 61,261 00:43:27 12/11/2014 07:01:51 0,1C + 1NV Suspended by user Juan 2

Command Line:

-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1

It's a 2x780FTW host.

What else do you need?

One observation: the core usage rises to almost 100% when normally it is very low.

<edit> Another one from the same host, suspended for now (this one uses 1. GB)

SETI@home 7.05 astropulse_v7 (opencl_nvidia_100) ap_28jn14aa_B1_P1_00131_20141017_13023.wu_1 00:00:02 (-) 24,00 0,044 01:22:51 12/11/2014 06:46:22 0,1C + 1NV Suspended by user Juan 2


I need the task itself (the binary or a link to download it).


One observation: the core usage rises to almost 100% when normally it is very low.

Yep, it correlates with my current suspicion about where the problem could be. I expected this issue to be diminished in APv7, though, so I need a test case to explore.
18) Message boards : Number crunching : Lunatics Windows Installer v0.43 Release Notes (Message 1590146)
Posted 3 days ago by Raistmer

This is a capture I did yesterday on the PC with only 4GB, normally there's no problem, but with the app taking 1.4GB I have a problem ;)




I think it might have something to do with the command line.

On the 4GB PC http://setiathome.berkeley.edu/show_host_detail.php?hostid=7283519 I had the problem with this command line

-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1

After changing it to

-unroll 16 -ffa_block 10240 -ffa_block_fetch 5120 -use_sleep -tune 1 64 8 1 -tune 2 64 8 1

The problem seems to have gone away.
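
When comparing two command lines like these, it helps to list exactly which switches differ, since more than one thing changed at once. The following is a minimal sketch; `parse_switches` and `diff_switches` are hypothetical helpers, and switch names are compared exactly as written (case included).

```python
def parse_switches(cmdline):
    """Split a command line into {switch: [list of values per occurrence]}."""
    switches = {}
    current = None
    for token in cmdline.split():
        if token.startswith("-"):
            current = token
            switches.setdefault(current, []).append([])  # repeated switches are kept
        elif current is not None:
            switches[current][-1].append(token)
    return switches

def diff_switches(before, after):
    """Return {switch: (old, new)} for every switch whose values differ."""
    a, b = parse_switches(before), parse_switches(after)
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b)) if a.get(name) != b.get(name)}

before = "-use_sleep -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1"
after = "-unroll 16 -ffa_block 10240 -ffa_block_fetch 5120 -use_sleep -tune 1 64 8 1 -tune 2 64 8 1"

for name, (old, new) in diff_switches(before, after).items():
    print(name, old, "->", new)
```

Here three switches differ (-ffa_block, -ffa_block_fetch and -oclFFT_plan), so this comparison alone cannot say which of them made the problem go away.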


The next time someone sees such excessive system memory consumption, please catch that particular task and send it to me along with the command line options used to process it.
19) Message boards : Number crunching : Android errors (Message 1587797)
Posted 9 days ago by Raistmer
1) What Android version do you use?
2) Try these optimized apps instead:
http://nativeboinc.org/site/uncat/downloads
http://files.nativeboinc.org/boinc_apps/setiathome-distrib-0.2.zip

Also, I would recommend trying NativeBOINC from that site. You may find it more convenient to use than the stock one.
20) Message boards : Number crunching : AP V7 (Message 1587784)
Posted 9 days ago by Raistmer
I can confirm this behaviour on both AMD and NV GPUs - when running a mix of AP and MB on the same GPU, the AP task will run faster than when running AP tasks only, and the MB task will run more slowly than when running MB tasks only.


Oh, I wasn't complaining, just making a statement.



And the WU running with it did NOT take much longer than normal, probably due to the short run time of the AP


Thanks for the observations. So far I have collected info mostly for AP+AP configs.
The info collected for APv7 tasks up to now is available here:

http://lunatics.kwsn.net/1-discussion-forum/astropulse-v7-performance-illustration.0.html

AP+MB or interactions with other GPU apps from other projects have been out of scope so far. Maybe more info will be collected in time.



Copyright © 2014 University of California