New AstroPulse for GPU (ATi & NV) released (r1316)
Message boards : Number crunching : New AstroPulse for GPU ( ATi & NV) released (r1316)
| Author | Message |
|---|---|
So I added libfftw3f-3.dll 3 times. Is this OK? I also added the default <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline>. What else do I need to add? You used the wrong syntax: a space is required between the param name and the param value. Look at the first post. Besides, why do you want to specify default params? They are already used; that is what "default" means... | |
| ID: 1257718 · | |
Oops, sorry, I missed that since I was coming from the old 560. Guys, it's not possible to answer everyone personally each time. These warnings/infos were already discussed just a few posts earlier in this very thread. Let's not make the thread unmanageable, please; sometimes I need to use a mobile network to connect, so please keep the thread as clean as possible. EDIT: reference: http://setiathome.berkeley.edu/forum_thread.php?id=68675&nowrap=true#1256930 | |
| ID: 1257723 · | |
|
So leave <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline> | |
| ID: 1257725 · | |
So leave <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline> yes. | |
| ID: 1257727 · | |
|
With or without <cmdline></cmdline> in app_info, it doesn't pick up the params from ap_cmdline.txt. | |
| ID: 1257920 · | |
With or without <cmdline></cmdline> in app_info, it doesn't pick up the params from ap_cmdline.txt. The app will always pick up params from both places. If duplicate params occur, the one met last is used. ap_cmdline.txt is processed last, so params there should override params with the same name from the <cmdline> tag. Params with different names are simply added from both sources. Example of usage: say you have both NV and ATi GPUs in a single host. Then you can set the common part of the params in ap_cmdline.txt and tune the params that should differ between those GPUs via the <cmdline> tag. Another example was originally given by Mike: if you want to change params without a BOINC restart, just edit the ap_cmdline.txt file. The new params will be used on the next launch. Editing app_info requires a BOINC restart. | |
| ID: 1257948 · | |
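The "last one met wins" rule described in the post above can be sketched in a few lines of Python. This is only an illustration of the merge order, not the app's actual parser: the function names `parse_params` and `merge_params` are hypothetical, the `-name value` form is assumed from the `-verbose -unroll 3` invocation quoted elsewhere in this thread, and the sample values are made up.

```python
from typing import Optional


def parse_params(text: str) -> dict[str, Optional[str]]:
    """Parse '-name value' pairs and bare '-flag' switches from one source."""
    params: dict[str, Optional[str]] = {}
    tokens = text.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            name = token.lstrip("-")
            # A following token that is not itself a switch is this param's value.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                params[name] = tokens[i + 1]
                i += 2
                continue
            params[name] = None  # bare switch such as -verbose
        i += 1
    return params


def merge_params(cmdline_tag: str, ap_cmdline_txt: str) -> dict[str, Optional[str]]:
    """Combine both sources; ap_cmdline.txt is applied last, so it wins on duplicates."""
    merged = parse_params(cmdline_tag)
    merged.update(parse_params(ap_cmdline_txt))
    return merged


# Hypothetical values, for illustration only.
from_app_info = "-unroll 4 -ffa_block 2048"
from_txt_file = "-unroll 10 -ffa_block_fetch 1024"
print(merge_params(from_app_info, from_txt_file))
```

Here `unroll` appears in both sources, so the value from the text file replaces the one from the tag, while `ffa_block` and `ffa_block_fetch` are each taken from the only source that sets them.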
|
I recently made 2 posts about the current situation with driver support for OpenCL on both vendors' forums: | |
| ID: 1257949 · | |
|
ETA for a Linux version, or is this Windows only? | |
| ID: 1257970 · | |
ETA for a Linux version, or is this Windows only? Our Linux guy is busy with a Windows issue right now, and he is also working on multiple CPU versions for different *nix-based OSes at once... I'm afraid there is no ETA at all. The solution is either to port it to Linux yourself or to attract someone who can do this for the benefit of the whole community [sources are already freely available in Berkeley's SETI repository: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt] | |
| ID: 1257992 · | |
|
I'd like to see more Linux support too, Windows being the only option has made it necessary for me to move some hosts back to it for GPGPU (Linux MB is only supported for x64 CPUs). But I understand that Windows has the biggest user base and vendor driver support so it makes sense to focus efforts there.
| |
| ID: 1258013 · | |
|
This is the entry in my app_info.xml file:
<coproc>
<type>CUDA</type>
<count>0.5</count>
</coproc>
...but only this way is the related usage shown in the <stderr_txt>: (...) ...so I like it. ;-D

On my machine, each application/work unit uses ~23 MB of system RAM on average (~36 MB peak), so ~46 MB for both. Each application/work unit uses ~250 MB of VRAM, so ~500 MB for both... if the work units are *fine*. In my experience, if the work unit is or will be a 30/30 result, system RAM usage keeps growing by ~1 MB every second; I saw an average (peak) usage of ~250 MB.

I saw something strange. Normally a checkpoint is saved every 0.9 % of progress. The task with resultid=2515482815 did not do this. It had been calculating for 9 minutes 11 seconds and had reached 6.624 % progress, and BoincTasks showed no checkpoint yet. Then BOINC suspended this and another AstroPulse work unit to let SETI@home Enhanced work units with earlier deadlines run. After the bunch of shorties had been calculated, the above-mentioned AstroPulse work units were restarted. The one without a checkpoint started at 0 calculated time and 0 % progress (from scratch). Why did this happen? Are my <cmdline> settings too *high*? Or does this happen from time to time? Thanks

* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * ____________ >Das Deutsche Cafe. The German Cafe.< | |
| ID: 1258036 · | |
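The arithmetic in the post above can be written out as a small Python sketch. It assumes that a `<count>` of 0.5 lets two tasks share one GPU, which matches the "both" in the post, and that checkpoints fall at exact multiples of 0.9 % progress. The helper names are hypothetical and the input figures are the ones reported in the post.

```python
import math

GPU_SHARE_PER_TASK = 0.5       # <count> from the app_info.xml entry in the post
RAM_PER_TASK_MB = 23           # reported average system RAM per task
VRAM_PER_TASK_MB = 250         # reported VRAM per task
CHECKPOINT_STEP_PERCENT = 0.9  # reported checkpoint interval


def tasks_per_gpu(gpu_share: float) -> int:
    """Number of tasks that fit on one GPU if each claims gpu_share of it (assumption)."""
    return int(1 / gpu_share)


def expected_checkpoints(progress_percent: float, step_percent: float) -> int:
    """Checkpoints that should already exist at the given progress (assumed even spacing)."""
    return math.floor(progress_percent / step_percent)


tasks = tasks_per_gpu(GPU_SHARE_PER_TASK)
print(f"tasks per GPU: {tasks}")
print(f"system RAM for all tasks: ~{tasks * RAM_PER_TASK_MB} MB")
print(f"VRAM for all tasks: ~{tasks * VRAM_PER_TASK_MB} MB")
print(f"checkpoints expected at 6.624 %: {expected_checkpoints(6.624, CHECKPOINT_STEP_PERCENT)}")
```

Under these assumptions the task should have written several checkpoints before it was suspended, which is why the restart from 0 % looked wrong to the poster.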
|
Your GPU did work with these settings with r1305. | |
| ID: 1258039 · | |
|
This version doesn't seem to checkpoint very often. I watched it for a while and its checkpoints were about 15 minutes apart. That seems rather long compared to the 2-3 minutes for other apps. | |
| ID: 1258065 · | |
|
This application makes a checkpoint every 0.9 % of progress - at least it *should*. | |
| ID: 1258171 · | |
This application makes a checkpoint every 0.9 % of progress - at least it *should*. I downloaded AstroPulse rev. 1316, installed it and replaced the old version, taking the risk with an AP task still running. That worked out OK, though. The host has ATI GPUs. I am using the same command-line parameters: UNROLL=15, ffa_block 10240 and ffa_block_fetch 5120, 1 WU per GPU. GPU load is higher. It looks faster: the % updates are similar but come quicker. I will watch the results. Is the flops entry necessary? I used none, and the estimates have become quite precise after > 100 valid results in a row. ____________ Knight Who Says Ni N!, OUT numbered................. | |
| ID: 1258568 · | |
|
No Fred, you don't need the flops entry if your estimates are in order. | |
| ID: 1258579 · | |
|
Quick timetable
WU : ap_Zblank_9LC67.wu
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -verbose -unroll 3 :
Elapsed 989.422 secs
CPU 250.969 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -verbose -unroll 3 :
Elapsed 756.250 secs, speedup: 23.57% ratio: 1.31x
CPU 63.156 secs, speedup: 74.84% ratio: 3.97x
____________ - ALF - "Find out what you don't do well ..... then don't do it!" :) | |
| ID: 1258692 · | |
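The percentages and ratios in the timetable above can be reproduced with a short Python sketch. It assumes that the figure labelled "speedup" is the relative reduction in time and that "ratio" is old time divided by new time; the function names are hypothetical and the timings are the ones posted.

```python
def time_reduction_percent(old_secs: float, new_secs: float) -> float:
    """Relative reduction in run time, in percent of the old time."""
    return (old_secs - new_secs) / old_secs * 100.0


def speed_ratio(old_secs: float, new_secs: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_secs / new_secs


# Timings for r555 (old) and r1316 (new), as posted in the timetable.
timings = {
    "Elapsed": (989.422, 756.250),
    "CPU": (250.969, 63.156),
}

for label, (old, new) in timings.items():
    print(
        f"{label}: reduction {time_reduction_percent(old, new):.2f}%, "
        f"ratio {speed_ratio(old, new):.2f}x"
    )
```

Keeping the two measures apart helps when comparing posts: a time reduction of about 75 % corresponds to a ratio of roughly 4x, not 1.75x.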
Just some feedback on performance observed so far - most results are still waiting on validation, but there are no invalid results or results in error so far.

AMD Radeon

Comparing WUs with similar blanking percentages, the HD 6950 uses about 35% less CPU time and overall run-time is reduced by 30%. That's impressive. On the HD 5760, CPU time went down by about 75% (wow) and overall run-time decreased by about 25%. Another big win, I think.

I haven't gone through enough WUs on the HD 4670 AGP to draw any conclusions yet - a lot of WUs with high blanking percentages were dealt to my hosts over the last few days, slowing down processing. But at this point, r1316 seems to take nearly twice as long in terms of overall run-time. I'm guessing the 'patching required max_kernel_wg_size=32 for binary cached kernels' error message is a bad thing - with the smallest work-group size of all my GPUs (128), the HD 4670 is probably the most sensitive to memory constraints. I'll try playing around with some parameters and, if nothing helps, fall back to r1305 and see whether that gives any better performance.

Again, not enough WUs have been processed on the slower GPUs to draw any conclusions yet, but performance on the C-50 still seems to show an improvement: CPU time and overall run-time are down by about 15%.

NVIDIA GeForce

Not having participated in the beta trials, I have no point of comparison for my CUDA cards. But here's what I have observed so far: overall run-times for Fermi and Kepler (GTX 570 and 670) look good - perhaps on par with their Radeon counterparts - however, CPU time seems to be a significant proportion of that. I understand that this is a known driver issue - I'm currently running 301.42. I remember a similar problem occurring with the Catalyst drivers - here's hoping for a prompt fix from NVIDIA.

Interestingly enough, the pre-Fermi code path doesn't seem to be affected (as much) by the high CPU use issue - unless it happens to substantially favour AMD K10 over Intel Core and AMD K8 (I doubt it). On my GTX 260 Core 216, CPU usage is only of the order of a few minutes as opposed to several dozen minutes - a tiny fraction of the overall run-time. I will have to check that the WUs validate properly in case there's something funny going on there.

That's all for now - I hope this feedback offers a few helpful pieces of the overall performance picture. Now I just have to wait for a chance to download some more WUs... ____________ Soli Deo Gloria | |
| ID: 1258896 · | |
|
Thanks for the summary. It's very interesting to know that even with the newest drivers your NV 260 has low CPU usage. What is so specific about that host? It would be very interesting to find out... | |
| ID: 1259000 · | |
|
Happy to help, glad to know it's useful in some small way. | |
| ID: 1259081 · | |
| Copyright © 2013 University of California |