Public beta for nVidia AstroPulse, rev 521
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Alpha and limited beta testing on Lunatics were successful, so I'm starting a public beta test for this application. The binary can be downloaded here: http://files.mail.ru/W3B0CG (as usual, running your own antivirus checks is your responsibility; just a reminder).
What is known already and what requires further testing:
1) The app can fail silently, reporting overflow (30 signals found) in single pulses. Finding 30 single pulses without 30 repetitive ones is highly unlikely and can be a sign of problems. So far, reducing the unroll factor from the default solves this issue (all affected hosts can produce valid tasks with a correctly chosen unroll factor). Finding the correct value is a task for you, the beta tester.
2) Two hosts have already reported greatly increased CPU consumption when running with 27x.xx drivers. Rolling back to 26x.xx drivers solved the issue in both cases. So far, however, both of these hosts have dual NV/ATi GPUs. Your goal as a beta tester is to check whether this behavior is common or not.
To become familiar with the app's options and behavior, I strongly recommend looking through the ATi AP release notes and the corresponding threads. The NV and ATi apps have much more in common than not (in fact, the ATi version can be run on NV GPUs in many cases). Here is a short excerpt:
There are command line switches that can be used for app performance tuning:
-unroll N - sets the DATA_CHUNK_UNROLL variable to N. This makes each FindSinglePulse kernel call process N data chunks, which improves performance (in most cases) but increases GPU memory requirements. On low-end GPUs it may be worth using lower values. The default is 10.
-ffa_block 8192 (default value) - defines how many different periods the GPU will process per single kernel call
-ffa_block_fetch 2048 (default value) - defines how many threads will be used in the FFA initial fetch kernel
Rules for using these values:
- -ffa_block_fetch <number> can be used only if -ffa_block <number> is already listed on the command line;
- the numbers should be even, preferably powers of 2;
- ffa_block should be divisible by ffa_block_fetch.
If you experience lags during application execution, try decreasing these values.
-instances_per_device <N> - allows N app instances to run per single device
-instances_per_device 1 (default value)
Once again, keep in mind: to run 2 instances on a single GPU, use the -instances_per_device 2 parameter and set <count>0.5</count> inside the app_info.xml file.
-hp - instructs the application to raise its priority class to high. Useful when the host is under high non-BOINC load. Also useful if the BOINC client itself imposes a high load on the CPU.
-no_cpu_lock - disables affinity setting
Example of an app_info section (this one is known to work on my own GSO9600):
<app>
  <name>astropulse_v505</name>
</app>
<file_info>
  <name>ap_5.06_win_x86_SSE3_OpenCL_NV_r521.exe</name>
  <executable/>
</file_info>
<file_info>
  <name>AstroPulse_Kernels_r521.cl</name>
  <executable/>
</file_info>
<app_version>
  <app_name>astropulse_v505</app_name>
  <version_num>505</version_num>
  <platform>windows_intelx86</platform>
  <avg_ncpus>0.04</avg_ncpus>
  <max_ncpus>0.20</max_ncpus>
  <plan_class>cuda</plan_class>
  <cmdline>-instances_per_device 1 -hp -unroll 10</cmdline>
  <flops>30987654321</flops>
  <file_ref>
    <file_name>ap_5.06_win_x86_SSE3_OpenCL_NV_r521.exe</file_name>
    <main_program/>
  </file_ref>
  <file_ref>
    <file_name>AstroPulse_Kernels_r521.cl</file_name>
    <copy_file/>
  </file_ref>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
</app_version>
Please report any findings about this app in this thread. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Anyone with a dual-vendor host (i.e. with both Nvidia and ATI GPUs) who wants to run OpenCL apps on the Nvidia and ATI devices at the same time needs to set -instances_per_device to 2 for all OpenCL apps. Otherwise only one OpenCL app will run at a time: BOINC will start one app on each device, but progress will only occur on one of them. The count value should be left as it is. Claggy |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Will there be support for CPUs other than SSE3 ones in future releases? TRuEQ & TuVaLu |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Will there be support for CPUs other than SSE3 ones in future releases?" It's a GPU app, so I don't think it's very important, unless you have a CPU with lesser capabilities. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Will there be support for CPUs other than SSE3 ones in future releases?" I have an SSE-only XP3200 with an 8400 GS ;-) Claggy |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
An older AMD: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6031403 I am not at its location now, so I can't run CPU-Z at the moment... TRuEQ & TuVaLu |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Will there be support for CPUs other than SSE3 ones in future releases?" When you are able to run the OpenCL runtime there, we will talk again ;) |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Nice, Claggy :) Then I'll try to get it going on my AMD this week. //TQ TRuEQ & TuVaLu |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"An older AMD: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6031403" The AMD Athlon64X2 is SSE3 capable. |
JohnDK Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
Well, I'm trying it. I wonder if anyone can say beforehand whether it's a good idea to change something in the command line when using a GTX460? I've just copied the app_info as supplied by Raistmer as is. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
One word of warning: be prepared for very high time-to-completion estimates and for tasks running at high priority until the DCF adjusts. Another little thing: if you try to run more than one per GPU but forget to change the instances per device to two (or more), a second one will try to start but will only count up the elapsed time without actually doing any work. PROUD MEMBER OF Team Starfire World BOINC |
Jamie Joined: 5 Apr 06 Posts: 162 Credit: 9,867,955 RAC: 0 |
Here are my cmdline options for my 465: <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -hp</cmdline> It's been working fine like this for a few weeks now. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Well, I'm trying it. I wonder if anyone can say beforehand whether it's a good idea to change something in the command line when using a GTX460? I've just copied the app_info as supplied by Raistmer as is." I've been running basically the same as Raistmer's app_info, but without the -hp switch (on my GTX460) and with -ffa_block 6144 -ffa_block_fetch 1536 set on both the ATI and Nvidia GPUs. Claggy |
JohnDK Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
I'm going to give 1 AP the whole GPU to see how it works. If it goes well, I plan to try 1 AP together with 1 or 2 MBs. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
"Well, I'm trying it. I wonder if anyone can say beforehand whether it's a good idea to change something in the command line when using a GTX460? I've just copied the app_info as supplied by Raistmer as is." I'm pretty much running just what Raistmer gave on my GTS 450, but I have changed the count to .51 and the number per device to 1 for the AP, and the count to .49 for the MB. That way I will run two MBs or one MB and one AP, but it won't try to start a second AP. I also changed the unroll to four because of some trouble I was having, but I think it was more my overclock than an app problem. I just haven't changed it back to the default ten yet. Edit: I was just informed I'm an idiot! :-) No, just that the default is actually ten on the unroll. The app I was given earlier was set at six and I didn't look to see what Raistmer gave out. Edit 2: OK, I'm back to the default ten. Now to wait for an AP to start to see how it does. That could have been slowing me down some. PROUD MEMBER OF Team Starfire World BOINC |
halfempty Joined: 2 Jun 99 Posts: 97 Credit: 35,236,901 RAC: 114 |
Just installed it on my GTS 450. I changed the unroll to 8 because I think the 450 has 4 processor clusters, so I wanted to make unroll a multiple of that. When I ran an ATi I had to change to -ffa_block 6144 -ffa_block_fetch 1536 for screen responsiveness, but I'll wait on that one till it starts crunching. That may be several days, the way AP units are being rationed these days. |
JohnDK Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
"That may be several days, the way AP units are being rationed these days." I've disabled CPU & MB work so I'm only requesting AP for the GPU, but so far no APs. They're in demand since it seems there's no MB work being split at the moment. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
"Just installed it on my GTS 450. I changed the unroll to 8 because I think the 450 has 4 processor clusters, so I wanted to make unroll a multiple of that. When I ran an ATi I had to change to -ffa_block 6144 -ffa_block_fetch 1536 for screen responsiveness, but I'll wait on that one till it starts crunching. That may be several days, the way AP units are being rationed these days." I thought about what you said about unroll set to 8, so I decided to try it. The AP task still ran about the same, but I noticed the MB task running with it on my GTS 450 suddenly slowed way down. When I checked, its time-to-completion guesstimate was double what it had been. I stopped and went back up to 10, and now the MB task is running much faster again. PROUD MEMBER OF Team Starfire World BOINC |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Please educate me. Do you just drop the files into the Seti project folder and modify the app_info? Do you have to do anything with the stock AK_V8 AP app? Like delete the files or remove the relevant section from the app_info? How does the Manager know which application to schedule for using the CUDA Open_CL app? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Sorry it took so long to answer. Yes, drop the files in the SETI projects folder, then copy and paste the app_info section into your app_info file. If you already have regular CPU APs, or if you still want to run them, then leave the existing app section as it is. If you don't have any and/or do not want to run any more CPU APs, then you can remove that section. Remember to stop SETI and BOINC before you do anything to your app_info file, and to open it in Notepad. Also remember to save your changes as .xml. Once you add the new section to your app_info, it will automatically start asking for GPU AP work the same way it asks for the MB CUDA work. PROUD MEMBER OF Team Starfire World BOINC |
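The tuning rules quoted in the opening post, plus the warning that an extra AP task only counts up elapsed time when the instances per device is too low, can be sketched as a small sanity checker for a `<cmdline>` string. This is only an illustration in Python: `parse_switches` and `check_cmdline` are hypothetical helper names, not part of the app or of BOINC, and the sketch assumes a simplified model in which BOINC starts as many tasks on one GPU as fit into a total `<count>` of 1.

```python
from fractions import Fraction


def parse_switches(cmdline):
    """Turn '-ffa_block 6144 -hp' into {'ffa_block': 6144, 'hp': None}."""
    switches = {}
    tokens = cmdline.split()
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")
        # A switch takes a value only if the next token is a plain number.
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            switches[name] = int(tokens[i + 1])
            i += 2
        else:
            switches[name] = None
            i += 1
    return switches


def check_cmdline(cmdline, count="1"):
    """Return a list of rule violations; an empty list means none were found.

    `count` is the <count> value from app_info.xml, passed as a string.
    """
    sw = parse_switches(cmdline)
    problems = []
    block = sw.get("ffa_block")
    fetch = sw.get("ffa_block_fetch")

    # Rule: -ffa_block_fetch may be used only together with -ffa_block.
    if fetch is not None and block is None:
        problems.append("-ffa_block_fetch given without -ffa_block")
    # Rule: the numbers should be even (powers of 2 are preferred, not required).
    for name, value in (("ffa_block", block), ("ffa_block_fetch", fetch)):
        if value is not None and value % 2 != 0:
            problems.append(f"-{name} should be even")
    # Rule: ffa_block should be divisible by ffa_block_fetch.
    if block is not None and fetch is not None and block % fetch != 0:
        problems.append("-ffa_block should be divisible by -ffa_block_fetch")

    # Assumption: BOINC may start floor(1 / count) tasks of this app per GPU.
    # If that exceeds -instances_per_device, the extra tasks would sit idle.
    instances = sw.get("instances_per_device") or 1
    tasks_per_gpu = int(1 / Fraction(count))
    if tasks_per_gpu > instances:
        problems.append("count allows more tasks per GPU than -instances_per_device")
    return problems
```

For instance, `check_cmdline("-ffa_block 6144 -ffa_block_fetch 1536 -hp")` returns an empty list, since both values are even and 6144 is divisible by 1536, while `check_cmdline("-instances_per_device 1", "0.5")` flags the mismatch between count and instances.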
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.