Public beta for nVidia AstroPulse, rev 521

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1129051 - Posted: 18 Jul 2011, 2:30:29 UTC
Last modified: 18 Jul 2011, 2:33:26 UTC

Thanks. Is it the platform and plan class that make the scheduler ask specifically for AP GPU WUs, while the regular AK_V8 AP section asks for AP CPU work? Just trying to get a handle on how the BOINC Manager decides who gets to crunch the data. I have 4 CPU AP WUs already on board and don't want to upset the apple cart just yet. It may take a while to get some work, as the project seems to be unreachable currently and no work has built up from the splitters. I hope that once the floodgates open on Wednesday I'll get a chance to try out the new application.


Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1129051 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1129066 - Posted: 18 Jul 2011, 3:39:51 UTC

When someone finds the correct setup for a GTX 470, can you please post here or PM me? I hate the idea of trashing good work.

Thanks
ID: 1129066 · Report as offensive
halfempty
Joined: 2 Jun 99
Posts: 97
Credit: 35,236,901
RAC: 114
United States
Message 1129067 - Posted: 18 Jul 2011, 3:43:53 UTC - in response to Message 1129014.  

I thought about what you said about unroll set to 8, so I decided to try it. The AP task still ran about the same, but I noticed the MB task running with it on my GTS 450 suddenly slowed way down. When I checked, its time-to-completion guesstimate was double what it had been. I stopped and went back up to 10, and now the MB task is running much faster again.

Thanks for that info. From a previous release of the OpenCL AP app, I thought I remembered that an unroll that is a multiple of something on the card was the most efficient. I searched through and tracked down a post mentioning it: http://setiathome.berkeley.edu/forum_thread.php?id=62738&nowrap=true#1068112
I only run 1 WU at a time, so I won't see the degradation you saw, but I'll have to play around to check if it makes much difference on this card.

On my winter card, HD5850, I read the number of compute units from stdout and halved it for my unroll. Was doing unblanked AP in about 68 minutes.
ID: 1129067 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129111 - Posted: 18 Jul 2011, 8:59:32 UTC
Last modified: 18 Jul 2011, 9:02:48 UTC

To maximise the load on the GPU's compute units (CUs), the unroll factor should be a multiple of the GPU's CU count. But sometimes this can lead to a weird memory layout that causes more slowdown (not all memory channels are in use at specific strides). Then the performance hit will be bigger than the benefit. So some experimentation (or a strong understanding of the architecture of the particular GPU in use) is required with the unroll factor.

For now I have statistics for the HD6950: an unroll of 16 is only slightly better (on real-world tasks) than an unroll of 6. For the GSO9600 I have no statistics. I only know that an unroll of 12 caused invalid overflows while an unroll of 10 works OK.
It takes a lot of time to collect a reliable data set...
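
For anyone who wants a starting list of values to try, here is a rough sketch of the rule of thumb above. It is not part of the AP app; the function name and the CU count in the example are made up for illustration. It lists the multiples of the CU count up to a chosen limit and, because half the CU count was also reported as working well earlier in this thread, adds that value when the CU count is even. Each candidate still has to be checked on real tasks, since some values may be slower or even produce invalid results.

```python
def unroll_candidates(compute_units, max_unroll):
    """Return -unroll values worth testing for a GPU with the given CU count.

    Hypothetical helper: it only applies the "multiple of the CU count"
    rule of thumb and does not know anything about memory layout effects.
    """
    candidates = set()
    # Multiples of the CU count, so every compute unit gets the same load.
    for k in range(1, max_unroll // compute_units + 1):
        candidates.add(k * compute_units)
    # Half the CU count (only meaningful when the CU count is even).
    if compute_units % 2 == 0:
        candidates.add(compute_units // 2)
    return sorted(c for c in candidates if 1 <= c <= max_unroll)


if __name__ == "__main__":
    # Example with an assumed 4 CUs; read the real number from the app's stdout.
    print(unroll_candidates(4, 16))
```

Read the actual number of compute units from the app's stdout for your card rather than guessing it.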
ID: 1129111 · Report as offensive
CryptokiD
Joined: 2 Dec 00
Posts: 150
Credit: 3,216,632
RAC: 0
United States
Message 1129112 - Posted: 18 Jul 2011, 9:02:12 UTC

I have heard somewhere that running Astropulse does not give as high a RAC as Multibeam does on CPU. Does anyone know if this is true for nVidia GPUs as well?

ID: 1129112 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1129115 - Posted: 18 Jul 2011, 9:14:19 UTC - in response to Message 1129066.  

When someone finds the correct setup for a GTX 470, can you please post here or PM me? I hate the idea of trashing good work.

Thanks


A post here would be good.


Kevin


ID: 1129115 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1129116 - Posted: 18 Jul 2011, 9:16:57 UTC - in response to Message 1129112.  

I have heard somewhere that running Astropulse does not give as high a RAC as Multibeam does on CPU. Does anyone know if this is true for nVidia GPUs as well?



I can almost double my RAC running Astropulse only on my HD 5850.
But it's hard to get enough work atm.



With each crime and every kindness we birth our future.
ID: 1129116 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1129121 - Posted: 18 Jul 2011, 9:55:42 UTC - in response to Message 1129112.  
Last modified: 18 Jul 2011, 9:57:48 UTC

I have heard somewhere that running Astropulse does not give as high a RAC as Multibeam does on CPU. Does anyone know if this is true for nVidia GPUs as well?


That is probably true on a lot of AMD CPUs, since the CPU-optimised AP app is quite cache intensive. Intel CPUs should still give a similar amount of credit/sec on MB and AP WUs (subject to what NewCredit gives out).

On ATI GPUs the ATI OpenCL AP app is a lot faster than the MB app; my HD5770 can only do 3 normal MB WUs in the time a 0% blanked AP WU takes.

On nVidia GPUs it's probably quite close: my GTX460 takes 50 minutes per 0% blanked AP task and a normal AR MB WU takes ~10 minutes, so 5 MB WUs at 120 credits make ~600 credits,
and that AP WU should make 600 to 800 credits (after doing 10 validations first, and subject to NewCredit).
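
The GTX460 comparison above can be put on a common credits-per-hour scale. This is only a back-of-the-envelope sketch using the figures quoted in this post (120 credits per ~10 minute MB WU, 600 to 800 credits per 50 minute AP WU); the function name is made up for illustration, and what NewCredit actually grants will vary.

```python
def credits_per_hour(credits_per_task, minutes_per_task):
    """Convert credit per task and run time per task into credits per hour."""
    return credits_per_task * 60.0 / minutes_per_task


# Figures quoted in the post above for a GTX460.
mb_rate = credits_per_hour(120, 10)       # normal AR MB WU
ap_rate_low = credits_per_hour(600, 50)   # AP WU, low end of the estimate
ap_rate_high = credits_per_hour(800, 50)  # AP WU, high end of the estimate

if __name__ == "__main__":
    print("MB:", mb_rate, "AP:", ap_rate_low, "to", ap_rate_high)
```

On these numbers MB and the low AP estimate come out the same, and AP only pulls ahead if the credit lands near the top of the 600 to 800 range.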

Claggy
ID: 1129121 · Report as offensive
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1129125 - Posted: 18 Jul 2011, 10:23:22 UTC

On the other end of the spectrum (i.e. very low end), I've been running

<cmdline>-ffa_block 2048 -ffa_block_fetch 1024 -unroll 4</cmdline>

I can probably go somewhat higher, but if tasks take some 16 hours on a host running about 8 hours a day for 5 days a week, any testing takes a lot of patience ;)
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1129125 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1129138 - Posted: 18 Jul 2011, 11:40:58 UTC

I have installed it with your given app-info entries ... only change: I deleted the -hp switch. I replaced the whole AP CPU section with the GPU version.

Till now, these are my tasks:
http://setiathome.berkeley.edu/result.php?resultid=2000852972
http://setiathome.berkeley.edu/result.php?resultid=2000647981
http://setiathome.berkeley.edu/result.php?resultid=2000647979
http://setiathome.berkeley.edu/result.php?resultid=2000647974

2) 2 hosts already reported greatly increased CPU consumption when running with 27x.xx drivers.

Yes, indeed, I have high CPU usage: with 0 percent blanked, about 98% of an HT core. It's less if more is blanked (it was about 85% CPU usage at the 4.x blanked).

But until now I have had too little time (or too few WUs) to do more testing with the parameters.
I personally would like to know what happens on my system when no CPU lock is active.

Greetings and thanks again for the great work.

Chris
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1129138 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129140 - Posted: 18 Jul 2011, 11:49:56 UTC - in response to Message 1129138.  

I have installed it with your given app-info entries ... only change: I deleted the -hp switch. I replaced the whole AP CPU section with the GPU version.

Till now, these are my tasks:
http://setiathome.berkeley.edu/result.php?resultid=2000852972
http://setiathome.berkeley.edu/result.php?resultid=2000647981
http://setiathome.berkeley.edu/result.php?resultid=2000647979
http://setiathome.berkeley.edu/result.php?resultid=2000647974

2) 2 hosts already reported greatly increased CPU consumption when running with 27x.xx drivers.

Yes, indeed, I have high CPU usage: with 0 percent blanked, about 98% of an HT core. It's less if more is blanked (it was about 85% CPU usage at the 4.x blanked).

But until now I have had too little time (or too few WUs) to do more testing with the parameters.
I personally would like to know what happens on my system when no CPU lock is active.

Greetings and thanks again for the great work.

Chris

Yeah, that is too high CPU consumption with the 275.xx driver. Could you downgrade to 267.xx and check CPU usage there?

Without CPUlock no affinity will be set, and different GPU tasks would compete for the same CPU.
ID: 1129140 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1129257 - Posted: 18 Jul 2011, 15:38:52 UTC

Finished first AP http://setiathome.berkeley.edu/result.php?resultid=2000735263 and already validated with 656.27 CR.

It took 56m16s, not bad I think. It used 13% CPU, which seems like a lot. Is that due to using the 275.33 drivers?
ID: 1129257 · Report as offensive
Jamie
Volunteer tester

Joined: 5 Apr 06
Posts: 162
Credit: 9,867,955
RAC: 0
United Kingdom
Message 1129260 - Posted: 18 Jul 2011, 15:43:29 UTC - in response to Message 1129257.  

Possibly. There do seem to be a few more people seeing the high CPU usage; try the 267.xx drivers if you get the chance.
I'm running them and see 3-6% CPU usage.
ID: 1129260 · Report as offensive
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1129276 - Posted: 18 Jul 2011, 16:16:59 UTC - in response to Message 1129260.  

Just slightly off topic.
Is there anyone running both nVidia and ATI Astropulse, or is that impossible?

TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
ID: 1129276 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129277 - Posted: 18 Jul 2011, 16:17:18 UTC - in response to Message 1129257.  

Finished first AP http://setiathome.berkeley.edu/result.php?resultid=2000735263 and already validated with 656.27 CR.

It took 56m16s, not bad I think. It used 13% CPU, which seems like a lot. Is that due to using the 275.33 drivers?


I would say it uses a whole CPU (one logical CPU, not all the CPUs in your system). Elapsed time is almost equal to CPU time...
Run time 3,376.00
CPU time 3,372.34

And yes, most probably it's because of the 275.xx drivers... a downgrade is needed.
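
To make the point concrete, here is a small sketch of the arithmetic, using only the two times quoted above from the result page. The function name is made up for illustration. The last step is a guess: if the host has 8 logical CPUs (an assumption, the post does not say), one fully loaded logical CPU would show up as 12.5% of the whole system, which is close to the reported 13%.

```python
def cpu_fraction(cpu_seconds, elapsed_seconds):
    """Fraction of one logical CPU the task kept busy while it ran."""
    return cpu_seconds / elapsed_seconds


# 56m16s from the post, which matches the elapsed time on the result page.
elapsed = 56 * 60 + 16

# CPU time and elapsed time quoted from the result page.
fraction = cpu_fraction(3372.34, 3376.00)

# Assumption: 8 logical CPUs. One busy logical CPU as a share of the system.
assumed_logical_cpus = 8
system_share = fraction / assumed_logical_cpus

if __name__ == "__main__":
    print(elapsed, round(fraction, 4), round(system_share * 100, 1))
```

So a small-looking system-wide percentage can still mean one logical CPU is fully occupied feeding the GPU.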
ID: 1129277 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129279 - Posted: 18 Jul 2011, 16:18:07 UTC - in response to Message 1129276.  

Just slightly off topic.
Is there anyone running both nVidia and ATI Astropulse, or is that impossible?

Claggy and Ghost have been doing this since early alpha. Yes, it's possible. See the second post in this thread (Claggy's) for how to properly configure both apps.
ID: 1129279 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1129324 - Posted: 18 Jul 2011, 17:26:26 UTC
Last modified: 18 Jul 2011, 17:27:12 UTC

So here is one with the 266.58 driver version (I have some problems/issues with 267.x):

http://setiathome.berkeley.edu/result.php?resultid=2001207330

CPU usage is about a third of one HT core, so on this front it is much better.
But I'm switching back to 275.33 because the x38g MB runs _much_ better with that version.

FYI (if it is of use at all?):

- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1129324 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129358 - Posted: 18 Jul 2011, 18:32:46 UTC - in response to Message 1129324.  


But I'm switching back to 275.33 because the x38g MB runs _much_ better with that version.

What do you mean by this? Faster, smoother? No lags? Low CPU consumption? What is better?
ID: 1129358 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1129380 - Posted: 18 Jul 2011, 19:15:04 UTC

2nd AP finished http://setiathome.berkeley.edu/result.php?resultid=2000758397 and was validated against an ATI wingman which was faster; only 284.76 CR though :)

Used 267.24 drivers and it took 45m22s. It used 0-7% CPU.
ID: 1129380 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1129386 - Posted: 18 Jul 2011, 19:23:52 UTC - in response to Message 1129358.  

What do you mean by this? Faster, smoother? No lags? Low CPU consumption? What is better?


Until now, I have had no issues with lags or similar effects on my computers with the drivers and app versions I had chosen (knock, knock, knock).

It's only the speed, which is IMO much better. As a reference, I take my Phenom computer with XP and a stock-clocked GTX 260 216: average execution times of 12-13 minutes per MB WU. With the new x38g, the times improved only slightly (only seconds).

On my i7 with W7 64-bit and a GTX 460 OC, the times changed from about 12-15 minutes per MB WU (with x32f and 266.58) to 9-11 minutes with x38g and 275.33.

I think that's a huge step forward, and the first time that the GTX 460 is faster than my GTX 260. I know that the W7 driver model is a burden, and I'm now happy that the new driver/app/GPU/OS combination can show its potential.

And if the nV AP rev 521 uses more CPU time at the moment, I can live with that.
IMHO the AP WUs are so rare that the additional CPU time doesn't add up to that much. And until now I haven't played with the various settings of the AP app. But one step after another, I have time :-)


- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1129386 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.