New AstroPulse for GPU ( ATi & NV) released (r1316)


Message boards : Number crunching : New AstroPulse for GPU ( ATi & NV) released (r1316)

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257718 - Posted: 8 Jul 2012, 22:08:33 UTC - in response to Message 1257705.

So I added libfftw3f-3.dll 3 times. Is this OK? I also added the default <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline>. What else do I need to add?

You used the wrong syntax: a space is required between the parameter name and its value. See the first post.
Besides that, why do you want to specify the default params? They are already used; that is what "default" means...
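To illustrate the syntax point, here is a minimal Python sketch of how a `-name value` command line splits into parameters. It is a hypothetical parser written for this example, not the app's actual code; it assumes whitespace-separated tokens, parameter names prefixed with `-` (as in `-unroll 10 -ffa_block 8192`, used later in this thread), and bare flags such as `-hp` that take no value.

```python
from typing import Dict, List, Tuple, Union

ParamValue = Union[str, bool]


def parse_cmdline(cmdline: str) -> Tuple[Dict[str, ParamValue], List[str]]:
    """Split a '-name value' command line into (params, unrecognized tokens).

    Hypothetical sketch: a token starting with '-' is a parameter name; if the
    next token does not start with '-', it is that parameter's value, otherwise
    the parameter is a bare flag. Any other token is reported as unrecognized.
    """
    params: Dict[str, ParamValue] = {}
    unrecognized: List[str] = []
    tokens = cmdline.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            name = token[1:]
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            if has_value:
                params[name] = tokens[i + 1]  # a later duplicate overwrites an earlier one
                i += 2
                continue
            params[name] = True  # bare flag such as -hp
        else:
            unrecognized.append(token)  # e.g. 'unroll=2,' is not a name/value pair
        i += 1
    return params, unrecognized


print(parse_cmdline("-unroll 2 -ffa_block 1024 -ffa_block_fetch 512"))
print(parse_cmdline("unroll=2, ffa_block=1024, ffa_block_fetch=512"))
```

With the `name=value,` form nothing is recognized as a parameter, which is the problem pointed out above. Note that this sketch would misread a negative numeric value as a flag name.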

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257723 - Posted: 8 Jul 2012, 22:13:44 UTC - in response to Message 1257714.
Last modified: 8 Jul 2012, 22:15:15 UTC

Oops, sorry, I missed that since I was coming from the old 560.

Well, I finished the first one. It validated but got a lot of warnings in the stderr. Not sure if I did something wrong or not. http://setiathome.berkeley.edu/result.php?resultid=2515639344

Guys, it's not possible to answer everyone personally each time. These warnings/info messages were already discussed just a few posts earlier in this very thread. Let's not make the thread unmanageable, please: sometimes I have to connect over a mobile network, so please keep the thread as clean as possible.
EDIT: reference: http://setiathome.berkeley.edu/forum_thread.php?id=68675&nowrap=true#1256930

Paul D Harris
Volunteer tester
Joined: 1 Dec 99
Posts: 1123
Credit: 33,598,472
RAC: 0
United States
Message 1257725 - Posted: 8 Jul 2012, 22:25:52 UTC

So I should change <cmdline>unroll=2, ffa_block=1024, ffa_block_fetch=512</cmdline>
to just
<cmdline></cmdline>?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257727 - Posted: 8 Jul 2012, 22:28:16 UTC - in response to Message 1257725.

Yes.

X-Files 27
Joined: 17 May 99
Posts: 100
Credit: 107,862,964
RAC: 0
Canada
Message 1257920 - Posted: 9 Jul 2012, 5:51:36 UTC

With or without <cmdline></cmdline> in app_info, the app doesn't pick up the params from ap_cmdline.txt.

Did I miss something?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257948 - Posted: 9 Jul 2012, 7:10:48 UTC - in response to Message 1257920.
Last modified: 9 Jul 2012, 7:19:43 UTC

The app always picks up params from both places. If a param appears more than once, the last occurrence is used. ap_cmdline.txt is processed last, so params there override params with the same name from the <cmdline> tag; params with different names from both sources are simply combined.
Example of usage: say you have both NV and ATi GPUs in a single host. Then you can put the common part of the params in ap_cmdline.txt and tune the params that should differ between those GPUs via the <cmdline> tag.
Another example was originally given by Mike: if you want to change params without restarting BOINC, just edit the ap_cmdline.txt file; the new params will be used on the next launch. Editing app_info requires a BOINC restart.
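The precedence rule above can be modelled as merging two name-to-value maps in processing order, with the later source winning. A minimal sketch, assuming both sources have already been parsed into dicts; the parameter values below are hypothetical.

```python
from typing import Dict


def merge_params(*sources: Dict[str, str]) -> Dict[str, str]:
    """Merge parameter dicts in the order they are processed.

    A name present in several sources keeps the value from the last one;
    names that differ are simply combined.
    """
    merged: Dict[str, str] = {}
    for source in sources:
        merged.update(source)  # later sources overwrite same-named params
    return merged


# Hypothetical per-GPU tuning in the <cmdline> tag of the NV app entry...
nv_cmdline_tag = {"unroll": "10"}
# ...and hypothetical common params shared via ap_cmdline.txt.
ap_cmdline_txt = {"ffa_block": "8192", "ffa_block_fetch": "4096"}

# <cmdline> is processed first, ap_cmdline.txt last.
print(merge_params(nv_cmdline_tag, ap_cmdline_txt))
```

If ap_cmdline.txt also contained `unroll`, its value would replace the one from `<cmdline>`, which is why the per-GPU params belong only in the tag.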

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257949 - Posted: 9 Jul 2012, 7:11:40 UTC

I recently made two posts about the current state of OpenCL driver support on both vendors' forums:
http://devgurus.amd.com/thread/159432
http://developer.nvidia.com/devforum/discussion/10636/feature-request-to-add-synchronization-mode-tuning-via-nv-specific-opencl-extension

If you have something to say on the topic, or can explain why this is important for users, please post in the corresponding threads.

Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 9,810,425
RAC: 0
Finland
Message 1257970 - Posted: 9 Jul 2012, 8:49:36 UTC

Is there an ETA for a Linux version, or is this Windows only?

-Kimmo-

____________
Computers: obelix

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1257992 - Posted: 9 Jul 2012, 10:14:57 UTC - in response to Message 1257970.
Last modified: 9 Jul 2012, 10:17:13 UTC

Our Linux guy is busy with a Windows issue right now, and he is also working on multiple CPU versions for different *nix-based OSes at once... I'm afraid there is no ETA at all.

The solution is either to port it to Linux yourself or to attract someone who can do this for the benefit of the whole community [the sources are already freely available in Berkeley's SETI repository: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt]

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 304
Credit: 128,616,723
RAC: 221,707
Australia
Message 1258013 - Posted: 9 Jul 2012, 12:08:11 UTC
Last modified: 9 Jul 2012, 12:35:58 UTC

I'd like to see more Linux support too; Windows being the only option has made it necessary for me to move some hosts back to it for GPGPU (Linux MB is only supported on x64 CPUs). But I understand that Windows has the biggest user base and the most vendor driver support, so it makes sense to focus efforts there.

Anyway, thank you again for your awesome work, Raistmer. Just in case it helps with your testing/profiling, I have your new applications running on the following GPUs (seemingly without problems so far, but I will have to wait for validation for full confirmation):


  • HD 6950
  • HD 5760
  • HD 4670 (AGP!)
  • AMD Fusion C-50 (r1305 doesn't seem to make much difference, so using r1316 like the others)



  • GTX 670
  • GTX 570
  • GTX 260 Core 216


Edit: Forgot to mention: there does appear to be a performance increase for the ATI application. GPU usage is closer to 100% and work units appear to complete in less time. I haven't used any NV version before, so there's no point of reference for comparison.
____________
Soli Deo Gloria

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7018
Credit: 59,131,626
RAC: 20,248
Germany
Message 1258036 - Posted: 9 Jul 2012, 13:51:18 UTC
Last modified: 9 Jul 2012, 13:56:20 UTC

This is the entry in my app_info.xml file:
<cmdline>-instances_per_device 2 -hp -unroll 10 -ffa_block 8192 -ffa_block_fetch 4096</cmdline>

- - - - - - - - - -
Just for others' info:
The *-instances_per_device* value isn't taken from there; the number of instances comes only from:

<coproc> <type>CUDA</type> <count>0.5</count> </coproc>

...but only with it in <cmdline> is the related setting shown in the <stderr_txt>:
(...)
Number of app instances per device set to:2
(...)

...so I like it. ;-D
- - - - - - - - - -
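Dirk's observation is that the number of tasks sharing one GPU follows the `<count>` value in `<coproc>`, while `-instances_per_device` in `<cmdline>` only makes the app report it in stderr. A minimal sketch of that relation, assuming BOINC starts GPU tasks on a device only while the sum of their `<count>` values stays at or below 1 (fractional counts only; counts above 1 are out of scope here):

```python
import math


def tasks_per_gpu(count: float) -> int:
    """Concurrent tasks on one GPU for a fractional <coproc> <count>.

    Assumes tasks are packed onto a GPU while their counts sum to <= 1,
    so floor(1 / count) of them fit. The tiny epsilon guards against float
    rounding leaving the quotient just below a whole number.
    """
    if not 0 < count <= 1:
        raise ValueError("count must be in the range (0, 1]")
    return math.floor(1 / count + 1e-9)


print(tasks_per_gpu(0.5))  # 2, the same value Dirk passed via -instances_per_device
```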

On my machine...
Each application/work unit uses ~23 MB of system RAM on average (~36 MB peak), so ~46 MB for both.
Each application/work unit uses ~250 MB of VRAM, so ~500 MB for both.
...if the work units are *fine*.

In my experience, if the work unit is going to be a 30/30 result, system RAM usage keeps growing by ~1 MB every second; I saw an average (peak) usage of ~250 MB.

I saw something strange.
Normally a checkpoint is saved every 0.9 % of progress.
This result (resultid=2515482815) did not do this.
It had been calculated for 9 minutes 11 seconds and reached 6.624 % progress, with BoincTasks showing no checkpoint yet, when BOINC suspended it and another AstroPulse work unit to let SETI@home Enhanced work units with an earlier deadline run.
After the bunch of shorties was calculated, the two AstroPulse work units mentioned above were restarted.
The one without a checkpoint started at 0 elapsed time and 0 % progress (from scratch).
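The arithmetic behind this expectation (one checkpoint per 0.9 % of progress, so about 7 by 6.624 %) and behind the restart from scratch can be sketched as below. This assumes checkpoints are written strictly at fixed progress intervals; the real app may apply other conditions as well, so treat it only as an illustration of the expected numbers.

```python
import math

CHECKPOINT_INTERVAL_PCT = 0.9  # one checkpoint per 0.9 % progress, as stated in the post


def expected_checkpoints(progress_pct: float,
                         interval_pct: float = CHECKPOINT_INTERVAL_PCT) -> int:
    """Checkpoints that should exist by progress_pct if every interval was saved."""
    return math.floor(progress_pct / interval_pct)


def restart_progress(checkpoints_written: int,
                     interval_pct: float = CHECKPOINT_INTERVAL_PCT) -> float:
    """Progress a resumed task starts from: its last checkpoint, or 0 % if none exists."""
    return checkpoints_written * interval_pct


print(expected_checkpoints(6.624))  # 7
print(restart_progress(7))          # about 6.3 % if all 7 had been written
print(restart_progress(0))          # 0.0, i.e. from scratch, as observed
```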

Why did this happen?

Are my <cmdline> settings too *high*?

Or does this happen from time to time?


Thanks


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR



>Das Deutsche Cafe. The German Cafe.<

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 23310
Credit: 31,660,198
RAC: 23,882
Germany
Message 1258039 - Posted: 9 Jul 2012, 14:00:57 UTC
Last modified: 9 Jul 2012, 14:01:36 UTC

Your GPU worked with these settings with r1305.

So please wait a few days; I will check whether it gets validated or not.
IMHO we should wait until a few units are completed rather than get confused by a single result.

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 888,257
RAC: 0
United States
Message 1258065 - Posted: 9 Jul 2012, 15:56:04 UTC

This version doesn't seem to checkpoint very often. I watched it for a while and it checkpointed about every 15 minutes. That seems rather long compared to the 2-3 minutes for other apps.
____________


Donate with your searches and online buys:
http://www.goodsearch.com/toolbar/university-of-california-setihome

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7018
Credit: 59,131,626
RAC: 20,248
Germany
Message 1258171 - Posted: 9 Jul 2012, 20:20:30 UTC - in response to Message 1258065.
Last modified: 9 Jul 2012, 20:23:05 UTC

This application makes a checkpoint every 0.9 % of progress; at least, it *should*.

I saw on my system that this isn't always the case: the application had not made a checkpoint by 6.624 % progress (normally 7 checkpoints would have been done)... so I'm curious why this could happen...
What was the reason: my settings, the work unit, my machine, ...?



Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3232
Credit: 31,585,541
RAC: 0
Netherlands
Message 1258568 - Posted: 10 Jul 2012, 15:46:52 UTC - in response to Message 1258171.
Last modified: 10 Jul 2012, 15:50:48 UTC

Downloaded AstroPulse rev.1316 and installed it, replacing the old app despite the risk to a running AP task, which worked out OK though.
Host: ATI GPUs.

Using the same command-line parameters (unroll 15, ffa_block 10240 and ffa_block_fetch 5120, 1 WU per GPU), GPU load is higher.
It looks faster: the % updates look similar but come faster. I will watch the results.

Is the FLOPS entry necessary? I used none, and the estimates have become
quite precise after >100 valid results in a row.
____________


Knight Who Says Ni N!, OUT numbered.................

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 23310
Credit: 31,660,198
RAC: 23,882
Germany
Message 1258579 - Posted: 10 Jul 2012, 22:28:56 UTC

No Fred, you don't need the flops entry if your estimates are in order.
And yes, r1316 is faster on high-end GPUs.



BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2564
Credit: 5,856,576
RAC: 2,320
Bulgaria
Message 1258692 - Posted: 11 Jul 2012, 3:05:05 UTC


On an ATI/AMD Radeon HD 6570 / Catalyst 11.12 / Win XP,
r1316 is 1.3x faster than r555 and uses less CPU and a higher GPU %:

Quick timetable

WU : ap_Zblank_9LC67.wu
AP6_win_x86_SSE2_OpenCL_ATI_r555.exe -verbose -unroll 3 :
  Elapsed 989.422 secs
      CPU 250.969 secs
AP6_win_x86_SSE2_OpenCL_ATI_r1316.exe -verbose -unroll 3 :
  Elapsed 756.250 secs, speedup: 23.57% ratio: 1.31x
      CPU  63.156 secs, speedup: 74.84% ratio: 3.97x
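The speedup and ratio figures in this timetable are consistent with speedup = (1 - new/old) x 100 % and ratio = old/new. A short Python check that reproduces them from the elapsed and CPU times (the function name is ours for illustration, not part of the benchmark tool):

```python
from typing import Tuple


def speedup_and_ratio(old_secs: float, new_secs: float) -> Tuple[float, float]:
    """Return (speedup in %, ratio) for an old vs. new run time.

    speedup: share of the old time saved by the new run, in percent.
    ratio:   how many times faster the new run is (old / new).
    """
    speedup_pct = (1 - new_secs / old_secs) * 100
    ratio = old_secs / new_secs
    return speedup_pct, ratio


# Times from the r555 vs. r1316 timetable above.
elapsed = speedup_and_ratio(989.422, 756.250)
cpu = speedup_and_ratio(250.969, 63.156)
print(f"Elapsed speedup: {elapsed[0]:.2f}% ratio: {elapsed[1]:.2f}x")  # 23.57% / 1.31x
print(f"CPU     speedup: {cpu[0]:.2f}% ratio: {cpu[1]:.2f}x")          # 74.84% / 3.97x
```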





____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 304
Credit: 128,616,723
RAC: 221,707
Australia
Message 1258896 - Posted: 11 Jul 2012, 13:05:55 UTC - in response to Message 1258013.
Last modified: 11 Jul 2012, 13:07:17 UTC




Just some feedback on the performance observed so far: most results are still waiting on validation, but there are no invalid or errored results so far.

AMD Radeon
Comparing WUs with similar blanking percentages, the HD 6950 uses about 35% less CPU time and its overall run time is reduced by 30%. That's impressive.

On the HD 5760, CPU time went down by about 75% (wow) and overall run-time decreased by about 25%. Another big win, I think.

I haven't done enough WUs on the HD 4670 AGP to draw any conclusions yet: a lot of high-blanking-percentage WUs were dealt to my hosts over the last few days, slowing down processing. But at this point, r1316 seems to take nearly twice as long in terms of overall run time. I'm guessing the 'patching required max_kernel_wg_size=32 for binary cached kernels' error message is a bad thing: with the smallest work-group size of all my GPUs (128), the HD 4670 is probably the most sensitive to memory constraints. I'll try playing around with some parameters, and if nothing helps, fall back to r1305 and see if that gives any better performance.

Again, not enough WUs have been processed on the slower GPUs to draw conclusions yet, but the C-50 still seems to show an improvement: CPU and overall run time are both down by about 15%.

NVIDIA GeForce
Not having participated in the beta trials, I have no point of comparison for my CUDA cards. But here's what I've observed so far:

Overall run times for Fermi and Kepler (GTX 570 and 670) look good, perhaps on par with their Radeon counterparts; however, CPU time seems to make up a significant proportion of that. I understand that this is a known driver issue (I'm currently running 301.42). I remember a similar problem occurring with the Catalyst drivers; here's hoping for a prompt fix from NVIDIA.

Interestingly, the pre-Fermi code path doesn't seem to be affected (as much) by the high CPU use issue, unless it happens to substantially favour AMD K10 over Intel Core and AMD K8 (I doubt it). On my GTX 260 Core 216, CPU time is only on the order of a few minutes, as opposed to several dozen minutes: a tiny fraction of the overall run time. I'll have to check that the WUs validate properly in case something funny is going on there.

That's all for now. I hope this feedback adds a few helpful pieces to the overall performance picture. Now I just have to wait for a chance to download some more WUs...

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,041,393
RAC: 34,343
Russia
Message 1259000 - Posted: 11 Jul 2012, 17:32:43 UTC - in response to Message 1258896.
Last modified: 11 Jul 2012, 17:33:01 UTC

Thanks for the summary. It's very interesting that even with the newest drivers your NV 260 has low CPU usage. What is so specific about that host? Very interesting to find out...

Regarding the AGP HD4670: if you agree to participate in some offline testing, I will send you a test case to check.

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 304
Credit: 128,616,723
RAC: 221,707
Australia
Message 1259081 - Posted: 11 Jul 2012, 19:38:33 UTC

Happy to help, glad to know it's useful in some small way.

There's nothing special about it compared with my other hosts as far as I know; it's just a box I set up for my brother. It's still WinXP32, it's mixed ATI/NV like the others, it's... oh, it's not running any ATI WUs at the same time... I wouldn't have thought that would make any difference, though. Well, as I said, I'll have to wait until the results validate. Have any testers with pre-Fermi cards seen the same or different performance?

I haven't tried offline testing before, but I suppose I could give it a try, especially given the current work-unit shortage (my GPU machines are running out of work)... but it'll have to wait until at least 06:30 UTC, when I get back from work.

Copyright © 2014 University of California