New AstroPulse for GPU (ATi & NV) released (r1316)

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257718 - Posted: 8 Jul 2012, 22:08:33 UTC - in response to Message 1257705.
> So I added libfftw3f-3.dll 3 times. Is this OK? Also I added the default <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline>. What else do I need to add?
You used the wrong syntax. A space is required between a param name and its value; look at the first post. Besides, why do you want to specify the default params? They are already used, this is what "default" means...

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257723 - Posted: 8 Jul 2012, 22:13:44 UTC - in response to Message 1257714. Last modified: 8 Jul 2012, 22:15:15 UTC
> Oops, sorry, I missed that since I was coming from the old 560. Well, I finished the first one. It validated but got a lot of warnings in the stderr. Not sure if I did something wrong or not. http://setiathome.berkeley.edu/result.php?resultid=2515639344
Guys, it's not possible to answer everyone personally each time. These warnings/infos were already discussed just a few posts earlier in this very thread. Let's not make the thread unmanageable, please. Sometimes I need to use a mobile network to connect, so please keep the thread as clean as possible.
EDIT: reference: http://setiathome.berkeley.edu/forum_thread.php?id=68675&nowrap=true#1256930

Paul D Harris Volunteer tester Joined: 1 Dec 99 Posts: 1122 Credit: 33,600,005 RAC: 0	Message 1257725 - Posted: 8 Jul 2012, 22:25:52 UTC
So leave <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline> as <cmdline></cmdline>?

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257727 - Posted: 8 Jul 2012, 22:28:16 UTC - in response to Message 1257725.
> So leave <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline> as <cmdline></cmdline>
Yes.

X-Files 27 Joined: 17 May 99 Posts: 104 Credit: 111,191,433 RAC: 0	Message 1257920 - Posted: 9 Jul 2012, 5:51:36 UTC
With or without <cmdline></cmdline> in app_info, it doesn't pick up the params from ap_cmdline.txt. Did I miss something?

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257948 - Posted: 9 Jul 2012, 7:10:48 UTC - in response to Message 1257920. Last modified: 9 Jul 2012, 7:19:43 UTC
> With or without <cmdline></cmdline> in app_info doesnt pickup the params from ap_cmdline.txt Did I missed something?
The app will always pick up params from both places. If a param is duplicated, the one encountered last is used. ap_cmdline.txt is processed last, so params there override params with the same name from the <cmdline> tag. Params that differ are simply added together from both sources.
Example of usage: let's say one has both NV and ATi GPUs in a single host. Then one can put the common part of the params in ap_cmdline.txt and tune the params that should differ between those GPUs via the <cmdline> tag.
Another example was originally given by Mike: if you want to change params without a BOINC restart, just edit the ap_cmdline.txt file. On the next launch the new params will be used. Editing app_info requires a BOINC restart.
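The override order described in this post can be sketched in Python. This is only an illustration of "last one wins" merging, assuming the "-name value" syntax that appears elsewhere in this thread; parse_params and merge_params are hypothetical helpers, not code from the AstroPulse application, and the parameter values are made up for the example.

```python
def parse_params(text):
    """Parse '-name value' pairs; a switch with no value (e.g. -hp) maps to None."""
    params = {}
    tokens = text.split()
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")
        # A following token that does not start with '-' is this param's value.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            params[name] = tokens[i + 1]
            i += 2
        else:
            params[name] = None
            i += 1
    return params


def merge_params(cmdline_tag, ap_cmdline_txt):
    """Merge both sources; ap_cmdline.txt is applied last, so it wins on duplicates."""
    merged = parse_params(cmdline_tag)
    merged.update(parse_params(ap_cmdline_txt))
    return merged


# Hypothetical values, chosen only to show the override.
from_app_info = "-unroll 4 -ffa_block 2048"
from_txt_file = "-unroll 10 -ffa_block_fetch 1024"
print(merge_params(from_app_info, from_txt_file))
```

Here unroll ends up as 10 (the ap_cmdline.txt value), while ffa_block and ffa_block_fetch are each taken from the only source that sets them. The sketch treats any token starting with "-" as a param name, so it would not handle negative values.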

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257949 - Posted: 9 Jul 2012, 7:11:40 UTC
I recently made 2 posts about the current situation with driver support for OpenCL on both vendors' forums:
http://devgurus.amd.com/thread/159432
http://developer.nvidia.com/devforum/discussion/10636/feature-request-to-add-synchronization-mode-tuning-via-nv-specific-opencl-extension
If you have something to say on the topic, or can explain why this is important for users, please do post in the corresponding threads.

Kamu Joined: 19 Jan 02 Posts: 56 Credit: 11,009,499 RAC: 0	Message 1257970 - Posted: 9 Jul 2012, 8:49:36 UTC
ETA for a Linux version, or is this Windows only?
-Kimmo- Computers: obelix

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1257992 - Posted: 9 Jul 2012, 10:14:57 UTC - in response to Message 1257970. Last modified: 9 Jul 2012, 10:17:13 UTC
> ETA for linux version or is this windows only? -Kimmo-
Our Linux guy is busy with a Windows issue now, and he is working on multiple CPU versions for different *nix-based OSes at once... I'm afraid there is no ETA at all. The solution is either to port it to Linux yourself or to attract someone who can do this for the benefit of the whole community [sources are already freely available in Berkeley's SETI repository: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt]

Wedge009 Volunteer tester Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553	Message 1258013 - Posted: 9 Jul 2012, 12:08:11 UTC Last modified: 9 Jul 2012, 12:35:58 UTC
I'd like to see more Linux support too. Windows being the only option has made it necessary for me to move some hosts back to it for GPGPU (Linux MB is only supported for x64 CPUs). But I understand that Windows has the biggest user base and vendor driver support, so it makes sense to focus efforts there.
Anyway, thank you again for your awesome work, Raistmer. Just in case it helps with your testing/profiling, I have your new applications running on the following GPUs (seemingly without problem so far, but will have to wait for validation for full confirmation):
HD 6950
HD 5760
HD 4670 (AGP!)
AMD Fusion C-50 (r1305 doesn't seem to make much difference, so using r1316 like the others)
GTX 670
GTX 570
GTX 260 Core 216
Edit: Forgot to mention - there does appear to be a performance increase for the ATI application - GPU usage is closer to 100% and work-units appear to be completed in a shorter time. I haven't used any NV version before, so there's no point of reference for comparison.
Soli Deo Gloria

Sutaru Tsureku Volunteer tester Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 1258036 - Posted: 9 Jul 2012, 13:51:18 UTC Last modified: 9 Jul 2012, 13:56:20 UTC
That's the entry in my app_info.xml file:
<cmdline>-instances_per_device 2 -hp -unroll 10 -ffa_block 8192 -ffa_block_fetch 4096</cmdline>
- - - - - - - - - -
Just for others' info: the -instances_per_device value isn't taken from there, only from:
<coproc> <type>CUDA</type> <count>0.5</count> </coproc>
...but only this way is the related usage shown in the <stderr_txt>:
(...) Number of app instances per device set to:2 (...)
...so I like it. ;-D
- - - - - - - - - -
On my machine...
Each application/work unit uses ~ 23 MB system RAM on average (~ 36 MB peak), so ~ 46 MB for both.
Each application/work unit uses ~ 250 MB VRAM, so ~ 500 MB for both.
...if the work units are fine.
In my experience, if the work unit is/will be a 30/30 result, the system RAM usage keeps increasing by ~ 1 MB every second; I saw an average (peak) usage of ~ 250 MB.
I saw something strange. Normally a checkpoint is saved every 0.9 % of progress. This resultid=2515482815 did not do that. It had been calculating for 9 minutes 11 seconds, at 6.624 % progress, and BoincTasks showed no checkpoint yet; then BOINC suspended this and another AstroPulse work unit to let SETI@home Enhanced work units with earlier deadlines run. After the bunch of shorties was calculated, the above-mentioned AstroPulse work units were restarted. The one without a checkpoint started at 0 calculated time and 0 % progress (from scratch).
Why did this happen? Are my <cmdline> settings too high? Or does this happen from time to time?
Thanks
* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
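The per-GPU arithmetic in this post can be checked with a few lines of Python. This is a rough sketch: tasks_per_gpu is a hypothetical helper that assumes a <count> of 0.5 means each task reserves half a GPU, and the memory figures are the approximate per-task values quoted in the post.

```python
import math


def tasks_per_gpu(count):
    """Tasks that fit on one GPU when each task reserves `count` of it."""
    # The small epsilon guards against float results such as 9.999... for 1/0.1.
    return math.floor(1 / count + 1e-9)


instances = tasks_per_gpu(0.5)   # <count>0.5</count> from the post
ram_mb = instances * 23          # ~23 MB average system RAM per task
vram_mb = instances * 250        # ~250 MB VRAM per task
print(instances, ram_mb, vram_mb)
```

With the values from the post this gives 2 instances, about 46 MB of system RAM and about 500 MB of VRAM, matching the totals reported above.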

Mike Volunteer tester Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1258039 - Posted: 9 Jul 2012, 14:00:57 UTC Last modified: 9 Jul 2012, 14:01:36 UTC
Your GPU did work with these settings with r1305. So please wait a few days; I will check whether it gets validated or not. We should wait until a few units are completed and not get confused after one result, IMHO.
With each crime and every kindness we birth our future.

Wembley Volunteer tester Joined: 16 Sep 09 Posts: 429 Credit: 1,844,293 RAC: 0	Message 1258065 - Posted: 9 Jul 2012, 15:56:04 UTC
This version doesn't seem to checkpoint very often. I watched it for a while and it checkpointed about 15 minutes apart. Seems rather long compared to the 2-3 minutes for other apps.

Sutaru Tsureku Volunteer tester Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 1258171 - Posted: 9 Jul 2012, 20:20:30 UTC - in response to Message 1258065. Last modified: 9 Jul 2012, 20:23:05 UTC
This application makes a checkpoint every 0.9 % of progress - at least it should. I could see on my system that this isn't always the case: the application had not made a checkpoint after 6.624 % progress (normally 7 checkpoints would be done by then)... So I'm curious why this could happen. What was the reason: my settings, the work unit, my machine, ...?
* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
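The "normally 7 checkpoints" figure follows from dividing the progress by the checkpoint interval and rounding down. A small Python sketch, assuming the 0.9 % interval stated in the post; expected_checkpoints is a hypothetical helper used only for this arithmetic.

```python
import math


def expected_checkpoints(progress_pct, interval_pct=0.9):
    """Number of checkpoints expected by the time progress_pct is reached."""
    # The epsilon keeps exact multiples (e.g. 6.3 / 0.9) from rounding down by one.
    return math.floor(progress_pct / interval_pct + 1e-9)


print(expected_checkpoints(6.624))
```

6.624 / 0.9 is about 7.36, so 7 checkpoints would be expected at that point in the run.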

Fred J. Verster Volunteer tester Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1258568 - Posted: 10 Jul 2012, 15:46:52 UTC - in response to Message 1258171. Last modified: 10 Jul 2012, 15:50:48 UTC
> This application make every 0.9 % progress a checkpoint - it should. I could see on my system that this isn't always the case, the application don't made a checkpoint after 6.624 % progress (normally 7 checkpoints done)... - so I'm curious about why this could happen... What was the reason, my settings, the work unit, my machine, ...?
Downloaded AstroPulse rev. 1316, installed it and replaced the old version, taking the risk of doing so with an AP task running. That worked out OK though. The host has ATI GPUs. Using the same command-line parameters (unroll 15, ffa_block 10240 and ffa_block_fetch 5120), 1 WU per GPU.
GPU load is higher. It looks faster: the % updates are similar but come sooner. Will watch the result.
Is the FLOPS entry necessary? I used none, and estimates have become quite precise after > 100 valid results in a row.

Mike Volunteer tester Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1258579 - Posted: 10 Jul 2012, 22:28:56 UTC
No Fred, you don't need the flops entry if your estimates are in order. And yes, r1316 is faster on high-end GPUs.
With each crime and every kindness we birth our future.

BilBg Volunteer tester Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1258692 - Posted: 11 Jul 2012, 3:05:05 UTC
On ATI AMD Radeon HD 6570 / Catalyst 11.12 / Win XP, r1316 is 1.3x faster than r555 and uses less CPU and more GPU %:
Quick timetable
WU : ap_Zblank_9LC67.wu
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -verbose -unroll 3 :
  Elapsed 989.422 secs
  CPU 250.969 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -verbose -unroll 3 :
  Elapsed 756.250 secs, speedup: 23.57% ratio: 1.31x
  CPU 63.156 secs, speedup: 74.84% ratio: 3.97x
- ALF - "Find out what you don't do well ..... then don't do it!" :)
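The speedup and ratio columns in the timetable can be reproduced from the raw times. The Python sketch below assumes the usual definitions (speedup as the fraction of time saved relative to the old run, ratio as old time divided by new time); compare is a hypothetical helper, not part of any benchmark tool mentioned in the thread.

```python
def compare(old_secs, new_secs):
    """Return (speedup in percent, old/new ratio), both rounded to 2 decimals."""
    speedup_pct = (old_secs - new_secs) / old_secs * 100
    ratio = old_secs / new_secs
    return round(speedup_pct, 2), round(ratio, 2)


# Times taken from the timetable above: r555 first, r1316 second.
elapsed = compare(989.422, 756.250)
cpu = compare(250.969, 63.156)
print("Elapsed:", elapsed)
print("CPU:", cpu)
```

Under those definitions the elapsed time gives 23.57 % and 1.31x, and the CPU time gives 74.84 % and 3.97x, which agrees with the figures in the post.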

Wedge009 Volunteer tester Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553	Message 1258896 - Posted: 11 Jul 2012, 13:05:55 UTC - in response to Message 1258013. Last modified: 11 Jul 2012, 13:07:17 UTC
> HD 6950
> HD 5760
> HD 4670 (AGP!)
> AMD Fusion C-50 (r1305 doesn't seem to make much difference, so using r1316 like the others)
> GTX 670
> GTX 570
> GTX 260 Core 216
Just some feedback on performance observed so far - most are still waiting on validation, but no invalid results or results in error so far.
AMD Radeon
Comparing between WUs with similar blanking percentages, the HD 6950 uses about 35% less CPU time and overall run-time is reduced by 30%. That's impressive.
On the HD 5760, CPU time went down by about 75% (wow) and overall run-time decreased by about 25%. Another big win, I think.
Haven't gone through enough WUs on the HD 4670 AGP to draw any conclusions yet - there were a lot of high-blanking-percentage WUs dealt to my hosts over the last few days, slowing down processing. But at this point, r1316 seems to take nearly twice as long in terms of overall run-time. I'm guessing the 'patching required max_kernel_wg_size=32 for binary cached kernels' error message is a bad thing - with the smallest work-group size of all my GPUs, 128, the HD 4670 is probably the most sensitive to memory constraints. I'll try playing around with some parameters and, if nothing helps, fall back to r1305 and see if that results in any better performance.
Again, not enough WUs processed on the slower GPUs to draw any conclusions yet, but performance on the C-50 still seems to be showing an improvement: CPU and overall run-time are down by about 15%.
NVIDIA GeForce
Not having participated in the beta trials, I have no point of comparison for my CUDA cards. But here's what I have observed so far:
Overall run-times for Fermi and Kepler, GTX 570 and 670, look good - perhaps on par with their Radeon counterparts - however, the CPU time seems to be a significant proportion of that. I understand that this is a known driver issue - I'm currently running 301.42. I remember a similar problem occurring with the Catalyst drivers - here's hoping for a prompt fix from NVIDIA.
Interestingly enough, the pre-Fermi code path doesn't seem to be affected (as much) by the high CPU use issue - unless it happens to substantially favour AMD K10 over Intel Core and AMD K8 (I doubt it). On my GTX 260 Core 216, CPU usage is only of the order of a few minutes as opposed to several dozen minutes - a tiny fraction of the overall run-time. Will have to check that the WUs validate properly in case there's something funny going on there.
That's all for now - hope this feedback offers a few helpful pieces in the overall performance picture. Now, just have to wait for a chance to download some more WUs...
Soli Deo Gloria

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1259000 - Posted: 11 Jul 2012, 17:32:43 UTC - in response to Message 1258896. Last modified: 11 Jul 2012, 17:33:01 UTC
Thanks for the summary. It's very interesting to know that even with the newest drivers your NV 260 has low CPU usage. It would be very interesting to find out what is so specific about that host...
Regarding the AGP HD 4670: if you would agree to participate in some offline testing, I would send you a test case to check.

Wedge009 Volunteer tester Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553	Message 1259081 - Posted: 11 Jul 2012, 19:38:33 UTC
Happy to help, glad to know it's useful in some small way.
There's nothing special about it compared with my other hosts, as far as I know; it's just a box I set up for my brother. It's still WinXP32, it's mixed ATI/NV as with the others, it's... oh, it's not running any ATI WUs at the same time... I wouldn't have thought that would make any difference though. Well, as I said, will have to wait till results validate. Have there been any testers with pre-Fermi cards experiencing the same or different performance?
I haven't tried off-line testing before, but I suppose I could give it a try, especially given the current work-unit shortage - my GPU machines are running out of work... but it'll have to wait till at least 06:30 UTC when I get back from work.
Soli Deo Gloria


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.