SETI applications for NVIDIA GPU improvement - how you can help

Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1797342 - Posted: 19 Jun 2016, 9:58:51 UTC - in response to Message 1796590.  

It's also possible that you're breaking the disk resource limit set in each workunit:

<rsc_disk_bound>33554432.000000</rsc_disk_bound>

but that's 32 MB, so one heck of a long log.

The question is can that limit be increased?

Yes, it can, but you're venturing further into advanced territory. Be careful, and follow the instructions exactly. If anything doesn't immediately feel comfortable, back off and revert any changes.

If the tasks run for more than a day, they'll probably still fail. Rinse and repeat, this time multiplying the bound by 100. And so on.
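
To put those numbers in perspective, here is a small Python sketch of the arithmetic only. It converts the quoted bound to MiB and scales it by the factor of 100 mentioned above. The helper name scale_bound is made up for illustration, and the sketch does not show where the value is actually edited.

```python
# Illustrative arithmetic only; scale_bound is a hypothetical helper name.
BYTES_PER_MIB = 1024 * 1024

def scale_bound(bound_bytes: float, factor: int) -> float:
    """Return the disk bound multiplied by the given factor."""
    return bound_bytes * factor

current = 33554432.0                       # value quoted from <rsc_disk_bound>
print(current / BYTES_PER_MIB)             # size of the current bound in MiB

scaled = scale_bound(current, 100)         # the "multiply by 100" step
print(f"<rsc_disk_bound>{scaled:.6f}</rsc_disk_bound>")
```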


. . Thanks Richard, I will give that a try.
ID: 1797342 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1797343 - Posted: 19 Jun 2016, 10:00:12 UTC - in response to Message 1796839.  

I need to set up dropbox again, something went wrong with the first try.

You may also use:
http://www.zippyshare.com/

Note:
Since zippyshare uses some misleading [Download] buttons (ads) on the resulting pages you (all) may want to first add uBlock Origin to your browser:
https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm

https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/



. . Thanks for that, I will investigate.
ID: 1797343 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1797611 - Posted: 20 Jun 2016, 20:02:51 UTC

For those who experienced GUI lag with the stock build:
please try this one: https://cloud.mail.ru/public/GNQz/oRyyF1VQp Hopefully it has usability improvements for those GPUs that were left on the edge in the previous release.
Also, it could be faster than the current stock build.

For example, this is a test of the ATi siblings on my C-60:
WU : PG0009_v7.wu 
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3472.exe  :
  Elapsed 1173.279 secs 
      CPU 195.828 secs 
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe  :
  Elapsed 1740.647 secs 
      CPU 304.810 secs 
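
As a quick way to read those figures, the ratios can be worked out with a few lines of Python. This is only a sketch over the four numbers listed above; the dictionary names are labels chosen here, not anything the apps produce.

```python
# Timings copied from the C-60 test above, in seconds.
r3472 = {"elapsed": 1173.279, "cpu": 195.828}   # MB8_..._ATi_HD5_r3472.exe
stock = {"elapsed": 1740.647, "cpu": 304.810}   # setiathome_8.12 stock app

def ratio(slow: float, fast: float) -> float:
    """How many times longer the slower run took."""
    return slow / fast

for key in ("elapsed", "cpu"):
    print(f"{key}: stock / r3472 = {ratio(stock[key], r3472[key]):.2f}x")
```

On this single work unit both ratios come out at roughly one and a half; one task on one host is of course not a benchmark.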

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1797611 · Report as offensive     Reply Quote
Profile ZalsterProject Donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 3992
Credit: 208,944,557
RAC: 49,145
United States
Message 1797639 - Posted: 20 Jun 2016, 22:38:02 UTC - in response to Message 1797611.  
Last modified: 20 Jun 2016, 22:38:45 UTC

Raistmer, you labelled the exe wrong.

With all previous versions it was OpenCl_NV_r34XX_SoG.exe

You changed this, so it now reads OpenCl_NV_SoG_r34XX.exe

I didn't notice this when I downloaded and installed it, as I thought you had followed the usual nomenclature.

However, this caused all my current SoG work to be dumped, with errors listed in the event log.

I was able to trace the error and noticed the discrepancy.

I don't know if you might want to correct this before others try it and get all their work units dumped.
ID: 1797639 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1797647 - Posted: 20 Jun 2016, 23:04:34 UTC - in response to Message 1797639.  
Last modified: 20 Jun 2016, 23:05:53 UTC

Are the executable name and its link inside the aistub the same or different?
[If I move to automatic creation of the archive, the revision number will come last, just as in this build.]
ID: 1797647 · Report as offensive     Reply Quote
Profile ZalsterProject Donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 3992
Credit: 208,944,557
RAC: 49,145
United States
Message 1797648 - Posted: 20 Jun 2016, 23:14:05 UTC - in response to Message 1797647.  

executable name and its link inside aistub are the same or different?
[If I will go to auto-creation of archive the rev number will be the last, just as in this build]


I correct the app_info with the changed revision number and then import everything from the zip folder.

I've never used the aistub before so can't comment on that.
ID: 1797648 · Report as offensive     Reply Quote
Richard HaselgroveProject Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 11516
Credit: 106,226,084
RAC: 70,131
United Kingdom
Message 1797650 - Posted: 20 Jun 2016, 23:20:35 UTC

I noticed that as well, when upgrading from r3430 to r3472 for Beta3.

Like Zalster, I base the installer aistubs on my own previous work, rather than starting from scratch with the packaged aistub - but I've learned to do the search/replace on the complete file name, not just the revision number. Even so, it took me a while to work out why the display flickered more than usual when I did it...

MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe
MB8_win_x86_SSE3_OpenCL_NV_SoG_r3472.exe

But I've just checked the release package, and it is internally self-consistent.
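
The pitfall is easy to see with the two file names above. A minimal Python sketch (plain string handling, nothing BOINC-specific) showing why replacing only the revision number is not enough:

```python
old_name = "MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe"
new_name = "MB8_win_x86_SSE3_OpenCL_NV_SoG_r3472.exe"

# Replacing only the revision number keeps the old word order ...
by_revision = old_name.replace("r3430", "r3472")

# ... while replacing the complete file name gives the name that actually ships.
by_full_name = old_name.replace(old_name, new_name)

print(by_revision)                 # still ends in _r3472_SoG.exe
print(by_revision == new_name)     # False: this name does not exist on disk
print(by_full_name == new_name)    # True
```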
ID: 1797650 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1797956 - Posted: 22 Jun 2016, 15:36:07 UTC
Last modified: 22 Jun 2016, 16:03:23 UTC

Let's explore the -use_sleep_ex N option a little more.

Here is an ATI HD5 build for that: https://cloud.mail.ru/public/Lf64/zuZPHgv5y

It's a "verbose" build, so don't expect amazing performance from it, but it can tell you something new about your system.

Proposed usage: add -use_sleep_ex 1 (or any other sleep time you want to explore) and -v 6 to the command line. The 6 is mandatory here, because verbosity level 6 is exactly the one reserved for timing output.

What the output looks like and what to observe:

starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=32768, deltaP3=547, NDRange={8,1,64}, WG={8,1,8},single_period_size=1.3MB, WG num=8, CU num=2
Partial PulseFind_3 (before buffer read): Awaited 36 iterations for completion
Kernel PULSE_PARTIAL execution time: 537.014(ms); min=537.014(ms); max=537.014(ms); mean=537.014(ms); sleep=1(ms); delta=547; Niterations=1

starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=16384, deltaP3=273, NDRange={16,1,32}, WG={16,1,4},single_period_size=1.3MB, WG num=8, CU num=2
Partial PulseFind_3 (before buffer read): Awaited 19 iterations for completion
Kernel PULSE_PARTIAL execution time: 279.959(ms); min=279.959(ms); max=279.959(ms); mean=279.959(ms); sleep=1(ms); delta=273; Niterations=1

starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=16384, deltaP3=272, NDRange={16,1,32}, WG={16,1,4},single_period_size=1.3MB, WG num=8, CU num=2
Partial PulseFind_3 (before buffer read): Awaited 18 iterations for completion
Kernel PULSE_PARTIAL execution time: 254.941(ms); min=254.941(ms); max=279.959(ms); mean=267.45(ms); sleep=1(ms); delta=272; Niterations=2

And so on.
For the experiment, I suggest that only the first 10 or so such output items are needed.
I added basic profiling abilities, so the app can now print the time spent in a particular kernel (in this case, the partial PulseFind one).
Take the "execution time" and divide it by the number of "Awaited ... iterations for completion".
You will get an estimate of how long a single iteration takes.

What I have found so far on my C-60:
With -use_sleep_ex 0, the iteration time is ~0.75us.
So Sleep(0) plus the overhead from event handling is low enough.

With -use_sleep_ex 1, the iteration time is ~15ms (!)
So Sleep(1) takes ~15ms instead of the promised 1ms (!!!!)

You can imagine why -use_sleep, which uses Sleep(1), has such a big impact on performance on hosts with high-performance GPUs.

To get an adequate estimate, make a single kernel call last long enough to give an iteration count of ~10 or more. If a kernel runs for, say, 7us and a single iteration is still counted, the Sleep(1) time estimate comes out as ~7us instead of anything real. On the other hand, if a kernel takes 537ms (as in my example above) and 36 Sleep(1) iterations are done for it, then 537/36 ~ 15 (ms), which is the estimate I got.
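
The division described above, applied to the three log samples in this post, can be sketched in Python as follows. The cutoff constant simply encodes the "~10 or more" advice; its name and the function name are chosen here for illustration.

```python
# (execution time in ms, awaited iterations) taken from the three samples above
samples = [(537.014, 36), (279.959, 19), (254.941, 18)]

MIN_ITERATIONS = 10  # assumed cutoff, following the "~10 or more" advice

def per_iteration_ms(exec_ms: float, iterations: int) -> float:
    """Estimate the real duration of one sleep iteration in milliseconds."""
    if iterations <= 0:
        raise ValueError("need at least one awaited iteration")
    return exec_ms / iterations

for exec_ms, iterations in samples:
    note = "" if iterations >= MIN_ITERATIONS else " (too few iterations, unreliable)"
    estimate = per_iteration_ms(exec_ms, iterations)
    print(f"{exec_ms} ms / {iterations} = {estimate:.2f} ms{note}")
```

All three samples land between 14 and 15 ms per iteration, which matches the ~15 ms figure quoted above.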

To increase the execution time of a single kernel call, decrease the value of the -period_iterations_num N option as usual.

For this example I used 10 on my C-60. Faster cards will require a value of 1 and an artificial slowdown such as -sbs 48 or something similar.


What to test:
How does the real sleep time scale as the N in Sleep(N) increases?
How does it react to increased priority (with the -hp switch added to the command line)?

15 ms is a very characteristic time: it's roughly the size of the OS time quantum. With increased priority it may change, because the process, being a high-priority one, can preempt others. This needs to be checked.
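
For a rough feel of sleep granularity outside the app, something like the Python sketch below can be run on the same host. Note the assumptions: Python's time.sleep is not the same call as the Sleep(N) used inside the app, its behavior differs between operating systems and Python versions, and the function name and repeat count are arbitrary choices made here. It only illustrates the idea of comparing requested and measured sleep time.

```python
import time

def measure_sleep(requested_ms: float, repeats: int = 20) -> float:
    """Return the mean measured duration (ms) of sleeping for requested_ms."""
    start = time.perf_counter()
    for _ in range(repeats):
        time.sleep(requested_ms / 1000.0)
    return (time.perf_counter() - start) * 1000.0 / repeats

# Same range of N as discussed for -use_sleep_ex; results depend on the host.
for n in (0, 1, 2, 4):
    print(f"requested {n} ms -> measured {measure_sleep(n):.3f} ms")
```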

Also, my host is an old-generation AMD APU. Maybe other families handle Sleep(1) better?...
I did similar testing some time ago, but back then there was no direct profiling info from the kernel. Now we have that info right in stderr.

P.S. And, of course, use VLAR tasks, because VLAR has the biggest PulseFind kernels.
ID: 1797956 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1797960 - Posted: 22 Jun 2016, 15:44:26 UTC - in response to Message 1797611.  

. . Hi Raistmer,

. . I think I have dropbox sorted out, here are links to the result text files.

https://www.dropbox.com/home?preview=stderr_local_slot+0.zip

https://www.dropbox.com/home?preview=stderr_trial2_last+WU+error.zip

https://www.dropbox.com/home?preview=stderr_trial2_WU2_48per.zip

. . I am slow but I think I might get everything working.
ID: 1797960 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1797962 - Posted: 22 Jun 2016, 15:53:12 UTC - in response to Message 1797960.  
Last modified: 22 Jun 2016, 15:54:36 UTC

Thanks. A more modern NV build (like the AMD HD5 one I posted recently) will be available to explore soon. See my previous post for how to run that testing.

EDIT: unfortunately, your links require Dropbox login. Make them public ones.
ID: 1797962 · Report as offensive     Reply Quote
Rasputin42
Volunteer tester

Send message
Joined: 25 Jul 08
Posts: 412
Credit: 5,834,453
RAC: 0
United States
Message 1797963 - Posted: 22 Jun 2016, 16:01:41 UTC - in response to Message 1797956.  

It's "verbose" one so don't expect amazing performance from it but it can tell you smth new about your system.


If that makes it slow, would it not also affect the figures it is supposed to report?
ID: 1797963 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1797965 - Posted: 22 Jun 2016, 16:04:36 UTC - in response to Message 1797963.  

It's "verbose" one so don't expect amazing performance from it but it can tell you smth new about your system.


If that makes it slow, would it not also affect the figures, it is supposed to report?

No, it doesn't. It makes the app slower overall because of the added output overhead, but each kernel call executes at the same speed as before, so you still get correct info about kernel execution times.
ID: 1797965 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1798051 - Posted: 22 Jun 2016, 23:46:15 UTC - in response to Message 1797962.  

thanks. Soon will be more modern NV build (like AMD Hd5 one I posted recently) to explore. Look prev post how to manage that testing.

EDIT: unfortunately, your links require Dropbox login. Make them public ones.



. . This should work.

https://www.dropbox.com/s/63pvtzut2dh1hnt/stderr_trial2_last%20WU%20error.zip?dl=0

https://www.dropbox.com/s/nmwx8re4xpm4bt6/stderr_local_slot%200.zip?dl=0

https://www.dropbox.com/s/s7h3p99w68w0juu/stderr_trial2_WU2_48per.zip?dl=0

. . Hope this helps.
ID: 1798051 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1798439 - Posted: 24 Jun 2016, 16:03:09 UTC - in response to Message 1798051.  

They are downloadable, thanks.
ID: 1798439 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1798440 - Posted: 24 Jun 2016, 16:06:26 UTC
Last modified: 24 Jun 2016, 16:09:49 UTC

Here is the NV sibling of the HD5 ATi build posted earlier.
https://cloud.mail.ru/public/HUAE/soM11FDVh

Please see this post for info on what to do with it.

So far I have found that on my C-60, changing -use_sleep_ex N from 1 up to and including 4 makes almost no difference to the real sleep time. It remains ~15ms.

How it reacts to CPU load and priority changes remains to be explored.

P.S. Here is a small Perl script for extracting the relevant data from stderr.txt:

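use strict;
use warnings;

# Input log and tab-separated output file.
my $path    = "stderr.txt";
my $results = "times_iterations.txt";

open(my $in,  '<', $path)    or die "Cannot open $path: $!";
open(my $res, '>', $results) or die "Cannot open $results: $!";

my (@iterations, @exec_time);
while (<$in>) {
	# Number of sleep iterations awaited for the partial PulseFind kernel.
	push @iterations, $1 if /Partial PulseFind_3.*Awaited (\d+) iterations/;
	# Kernel execution time in ms (integer or decimal).
	push @exec_time, $1 if /Kernel PULSE_PARTIAL execution time: (\d+(?:\.\d+)?)/;
}
close($in);

print $res "exec_time\titerations\n";
# Pair the values by position; stop at the shorter of the two lists.
my $n = @iterations < @exec_time ? @iterations : @exec_time;
print $res "$exec_time[$_]\t$iterations[$_]\n" for 0 .. $n - 1;
close($res);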
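
For anyone without Perl installed, the same extraction can be sketched in Python. This is an illustrative equivalent rather than part of the build; the function name is chosen here, and the sample text is just two of the log entries from the earlier post. Pairing is by position, as in the Perl script.

```python
import re

# Same two patterns the Perl script looks for in stderr.txt.
ITER_RE = re.compile(r"Partial PulseFind_3.*Awaited (\d+) iterations")
TIME_RE = re.compile(r"Kernel PULSE_PARTIAL execution time: (\d+(?:\.\d+)?)")

def extract(log_text: str) -> list:
    """Return (exec_time_ms, iterations) pairs, matched up by position."""
    iterations = [int(m) for m in ITER_RE.findall(log_text)]
    exec_times = [float(m) for m in TIME_RE.findall(log_text)]
    return list(zip(exec_times, iterations))

sample = """\
Partial PulseFind_3 (before buffer read): Awaited 36 iterations for completion
Kernel PULSE_PARTIAL execution time: 537.014(ms); min=537.014(ms); max=537.014(ms); mean=537.014(ms); sleep=1(ms); delta=547; Niterations=1
Partial PulseFind_3 (before buffer read): Awaited 19 iterations for completion
Kernel PULSE_PARTIAL execution time: 279.959(ms); min=279.959(ms); max=279.959(ms); mean=279.959(ms); sleep=1(ms); delta=273; Niterations=1
"""

print("exec_time\titerations")
for exec_ms, iters in extract(sample):
    print(f"{exec_ms}\t{iters}")
```

To run it on a real log, read stderr.txt into a string and pass that to extract instead of the sample.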

ID: 1798440 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1798772 - Posted: 26 Jun 2016, 4:04:37 UTC - in response to Message 1798440.  

Here is NV sibling of posted earlier HD5 ATi build.
https://cloud.mail.ru/public/HUAE/soM11FDVh

Please look this post for info what to do with it.

So far I found that on my C-60 changing -use_sleep_ex N from 1 to 4 including almost doesn't change real sleep time. It remains ~15ms.

How it will react on CPU load and priority change - to be explored.


. . OK, to which file do I add that script?
ID: 1798772 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 2634
Credit: 48,263,453
RAC: 132,167
Australia
Message 1798788 - Posted: 26 Jun 2016, 6:31:59 UTC - in response to Message 1798440.  

Here is NV sibling of posted earlier HD5 ATi build.
https://cloud.mail.ru/public/HUAE/soM11FDVh

Please look this post for info what to do with it.

So far I found that on my C-60 changing -use_sleep_ex N from 1 to 4 including almost doesn't change real sleep time. It remains ~15ms.

How it will react on CPU load and priority change - to be explored.


. . OK

. . I have downloaded the new version r3475 and extracted it to a sub-folder on the seti drive. I have edited the file:
. . mb_cmdline_win_x86_SSE3_OpenCL_NV.txt with the command line
. . -use_sleep_ex 1 -sbs 256 -v 6 -period_iterations_num 100
. . I have saved the script you posted to this notepad file:
https://www.dropbox.com/s/yus8dzjuoyny0ik/Raistmer_Perl_script.txt?dl=0

. . I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472? And where do I add the script file?
ID: 1798788 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1798887 - Posted: 26 Jun 2016, 20:02:16 UTC - in response to Message 1798788.  

. . I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472? And where do I add the script file?

If you have Perl, the script can be used to speed up data extraction from stderr.txt.
If not, do it by hand.
I put a small Perl interpreter in the pack here: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=497
ID: 1798887 · Report as offensive     Reply Quote
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 5809
Credit: 76,067,204
RAC: 51,211
Russia
Message 1798982 - Posted: 27 Jun 2016, 13:51:25 UTC

Sleeping behavior has been greatly reworked.
I updated the corresponding post about this option ( http://lunatics.kwsn.info/index.php/topic,1808.msg60933.html#msg60933 ).
New builds to test the usability of the new approach to sleep will be available soon, stay tuned.
ID: 1798982 · Report as offensive     Reply Quote
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3717
Credit: 8,886,321
RAC: 636
Bulgaria
Message 1798985 - Posted: 27 Jun 2016, 14:19:57 UTC - in response to Message 1798788.  

     I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472?

Copy the files from the new package to the SETI@home directory (<BOINC_Data>\projects\setiathome.berkeley.edu\). You can probably skip the .dll files, as they should be the same.
Edit app_info.xml with Notepad

Global replace (Ctrl+Home Ctrl+H) the old .exe name by the new MB8_win_x86_SSE3_OpenCL_NV_r3475.exe
Global replace (Ctrl+Home Ctrl+H) the old MultiBeam_Kernels_rXXXX.cl name by the new MultiBeam_Kernels_r3475.cl

(Don't type anything, only use Copy/Paste from real filenames to avoid mistakes.)

Save the edited app_info.xml
Restart BOINC
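
The two global replaces can also be sketched as a short Python snippet, shown here on a string rather than on the real file. Everything marked as assumed is illustrative: the XML fragment is a made-up two-line stand-in for app_info.xml, and the old names (the r3472 SoG executable and an r3472 kernels file) are guesses at what a given installation contains, so check the real file names before doing anything similar.

```python
def replace_names(text: str, renames: dict) -> str:
    """Apply each old-name -> new-name replacement to the whole text."""
    for old, new in renames.items():
        text = text.replace(old, new)
    return text

# Assumed stand-in for the relevant lines of app_info.xml (not the real file).
app_info = """\
<file_info><name>MB8_win_x86_SSE3_OpenCL_NV_SoG_r3472.exe</name><executable/></file_info>
<file_info><name>MultiBeam_Kernels_r3472.cl</name></file_info>
"""

renames = {
    # old names are assumptions; new names are the ones given in the steps above
    "MB8_win_x86_SSE3_OpenCL_NV_SoG_r3472.exe": "MB8_win_x86_SSE3_OpenCL_NV_r3475.exe",
    "MultiBeam_Kernels_r3472.cl": "MultiBeam_Kernels_r3475.cl",
}

updated = replace_names(app_info, renames)
print(updated)
```

As with the manual edit, complete file names are replaced, not just the revision number, and BOINC still has to be restarted afterwards.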


P.S.
Don't use the included MB8_win_x86_SSE3_OpenCL_NV.aistub, since it has only:
<version_num>800</version_num>



- ALF - "Find out what you don't do well ..... then don't do it!" :)
ID: 1798985 · Report as offensive     Reply Quote

 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.