Public beta for nVidia AstroPulse, rev 521

Message boards : Number crunching : Public beta for nVidia AstroPulse, rev 521

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1129433 - Posted: 18 Jul 2011, 20:42:09 UTC - in response to Message 1129386.  
Last modified: 18 Jul 2011, 20:42:33 UTC

You didn't list the time for x38g + 26x.xx drivers. There were improvements since x32f, so the listed comparison is incorrect.
ID: 1129433 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1129514 - Posted: 18 Jul 2011, 23:30:57 UTC

3rd AP only lasted a few secs but was validated http://setiathome.berkeley.edu/result.php?resultid=2001592219

4th AP http://setiathome.berkeley.edu/result.php?resultid=2001584832

Took quite a bit longer than the first two. This time I also ran an MB on the GPU and 4 CPU MBs; not sure if that was why it took longer.
ID: 1129514 · Report as offensive
Mass
Volunteer tester

Joined: 12 Nov 03
Posts: 4
Credit: 958,904
RAC: 0
United States
Message 1129561 - Posted: 19 Jul 2011, 2:56:07 UTC

Using a PhenomIIx4 9550 @ 2.35 GHz and a GTX 560 Ti @ 935-1870-2100, driver 266.66.

The WU was doing .901% a tick at 4.79% blanking, with GPU usage jumping between 8% and 90%.
I tried different command line values to get a better %/tick, but there was no increase or decrease except in memory usage.

Using <cmdline>-instances_per_device 1 -hp -unroll 8 -ffa_block 6148 -ffa_block_fetch 2048</cmdline> but have tried up to
<cmdline>-instances_per_device 1 -hp -unroll 16 -ffa_block 24576 -ffa_block_fetch 12288</cmdline> and a few other values in between.

I can run 2 at a time but have to keep unroll at 8 so memory doesn't go too high. It is using 853 with 2 WUs, about 490 with 1. Running the 2 WUs uses a little more of the GPU but still jumps around between 15% and 95%. CPU use is 10% to 40%.

ID: 1129561 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1129597 - Posted: 19 Jul 2011, 7:06:16 UTC - in response to Message 1129433.  

You didn't list the time for x38g + 26x.xx drivers. There were improvements since x32f, so the listed comparison is incorrect.


Can't give you much data about it because of the shorty storm, or no WUs at all, lately. But this is the current config of my Phenom (... which I'm currently running dry because the main reason this computer exists isn't there at the moment; I'm going to switch it off).

On the i7 this combination was also used too briefly to give good information (shorties/empty cache).

Sorry

greetings

Chris

- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1129597 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1129607 - Posted: 19 Jul 2011, 7:42:15 UTC

The 560s have 8 compute units, so I'd say unroll 4/8/12 are the values to try.

Thread_Block 6148 is wrong; it should be 6144, to be honest.

I would go with Thread_Block 8192, Thread_Block_Fetch 2048 and unroll 12.





With each crime and every kindness we birth our future.
ID: 1129607 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1129653 - Posted: 19 Jul 2011, 11:48:57 UTC - in response to Message 1129561.  

...
The WU was doing .901% a tick at 4.79% blanking, with GPU usage jumping between 8% and 90%.
I tried different command line values to get a better %/tick, but there was no increase or decrease except in memory usage.
...

The top level loop for AP processing runs 111 iterations, so you won't change that 1/111 = 0.9009009%/tick. Making it tick faster or slower is of course possible...
                                                                   Joe
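
The arithmetic behind that fixed step can be sketched in a few lines of Python. The only input taken from the post is the 111 top-level iterations; the function names and the per-tick timing are hypothetical, for illustration only.

```python
# Assumption taken from the post above: the top-level AP loop runs 111 iterations,
# so each completed iteration ("tick") advances progress by the same fixed amount.
TOP_LEVEL_ITERATIONS = 111


def percent_per_tick(iterations: int = TOP_LEVEL_ITERATIONS) -> float:
    """Progress added by one tick, in percent."""
    return 100.0 / iterations


def estimated_run_time(seconds_per_tick: float,
                       iterations: int = TOP_LEVEL_ITERATIONS) -> float:
    """Total run time if every tick took the same time (a simplification)."""
    return seconds_per_tick * iterations


if __name__ == "__main__":
    print(f"{percent_per_tick():.4f}% per tick")
```

Tuning the command line can change how long one tick takes, which is what `estimated_run_time` scales with, but not the size of the step itself.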
ID: 1129653 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1129901 - Posted: 20 Jul 2011, 1:30:09 UTC
Last modified: 20 Jul 2011, 1:31:50 UTC

Mike, if the 460s have 7 SMs, does an unroll of 7/14 make sense for these cards?
I've processed one WU with the 460 at the default of 10, but my wingman hasn't reported in yet. The output looks good as far as I can tell. The odd number of SM units doesn't make for easy integer values for unroll.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1129901 · Report as offensive
halfempty
Joined: 2 Jun 99
Posts: 97
Credit: 35,236,901
RAC: 114
United States
Message 1129954 - Posted: 20 Jul 2011, 6:21:02 UTC

Finally completed my first couple, and one already validated. Good run times out of this little GTS450. Using 266.58, unroll 12, low priority, everything else default.

Task        Run Time (s)  CPU Time (s)  Blanking (%)
2000647746  5,959.56      1,272.56      2.39
2000647748  5,735.61      1,059.23      0

The only problem is a little screen lag. Next I'm going to try Claggy's -ffa_block 6144 -ffa_block_fetch 1536. If that doesn't do it I may try -unroll 8 to see if it helps.
ID: 1129954 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1129965 - Posted: 20 Jul 2011, 7:11:16 UTC
Last modified: 20 Jul 2011, 7:11:52 UTC

First correct validation against one stock client:

http://setiathome.berkeley.edu/workunit.php?wuid=785187633
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1129965 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1129974 - Posted: 20 Jul 2011, 8:03:04 UTC - in response to Message 1129901.  

Mike, if the 460s have 7 SMs, does an unroll of 7/14 make sense for these cards?
I've processed one WU with the 460 at the default of 10, but my wingman hasn't reported in yet. The output looks good as far as I can tell. The odd number of SM units doesn't make for easy integer values for unroll.

Cheers, Keith


An unroll of 7 is definitely too low for the 460.
The 450 can handle unroll 10 perfectly, so I'd try between 12 and 14.
It doesn't have to be an integer multiple.

My HD 5850 has 18 CUs and unroll 12 is just perfect.

Note that Thread_Block and Thread_Block_Fetch are not shown in your stderr.
Maybe you have a typo in app_info.xml.

For the 450 I suggest 6144/1536.
460/470/480/560/580 should use 8192/4096.

If you get screen lags, reduce Block_Fetch to 2048.
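
As a rough sketch, the suggestions above can be turned into a command line with a small Python helper. It assumes that Thread_Block and Thread_Block_Fetch correspond to the `-ffa_block` and `-ffa_block_fetch` switches quoted elsewhere in this thread; the helper name and table are hypothetical, not part of the app.

```python
# Suggested (block, block_fetch) pairs per card, copied from the post above.
SUGGESTED_BLOCKS = {
    "450": (6144, 1536),
    "460": (8192, 4096),
    "470": (8192, 4096),
    "480": (8192, 4096),
    "560": (8192, 4096),
    "580": (8192, 4096),
}

# Fetch value suggested above when the screen lags.
LAG_BLOCK_FETCH = 2048


def build_cmdline(card: str, unroll: int, screen_lag: bool = False) -> str:
    """Build the switch string; assumes Thread_Block maps to -ffa_block."""
    block, fetch = SUGGESTED_BLOCKS[card]
    if screen_lag:
        # Only ever lower the fetch size; the 450 value is already below 2048.
        fetch = min(fetch, LAG_BLOCK_FETCH)
    return f"-unroll {unroll} -ffa_block {block} -ffa_block_fetch {fetch}"


if __name__ == "__main__":
    print(build_cmdline("460", unroll=12))
    print(build_cmdline("460", unroll=12, screen_lag=True))
```

The resulting string would go between the `<cmdline>` tags of the app entry; the unroll value is left as a parameter because the posts here only give ranges to try.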



With each crime and every kindness we birth our future.
ID: 1129974 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1129984 - Posted: 20 Jul 2011, 8:42:03 UTC

I ran 2 tasks on a 250 card....
http://setiathome.berkeley.edu/results.php?hostid=6031403&offset=0&show_names=0&state=0&appid=5

with default settings

//TQ
TRuEQ & TuVaLu
ID: 1129984 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1130010 - Posted: 20 Jul 2011, 12:36:48 UTC

How do I run 1 AP + 2 MBs and 3 MBs when there's no AP available?

Tried .33 on all CUDA counts, thinking that the switch -instances_per_device 1 would ensure only 1 AP. But it started 2 APs, so what does that switch do?
ID: 1130010 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1130016 - Posted: 20 Jul 2011, 12:47:46 UTC - in response to Message 1130010.  
Last modified: 20 Jul 2011, 12:49:51 UTC

0.33 on all CUDA counts, and -instances_per_device 3.
Boinc....Boinc....Boinc....Boinc....
ID: 1130016 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1130019 - Posted: 20 Jul 2011, 13:01:32 UTC - in response to Message 1130010.  
Last modified: 20 Jul 2011, 13:02:51 UTC

How do I run 1 AP + 2 MBs and 3 MBs when there's no AP available?

Tried .33 on all CUDA counts, thinking that the switch -instances_per_device 1 would ensure only 1 AP. But it started 2 APs, so what does that switch do?

It started 3, but only 1 is running. The other 2 will wait for a free "slot".
But this will negatively affect the elapsed time of the waiting tasks (BOINC thinks they are running), and it can lead to -177 errors (timeout).

When MultiBeam NV becomes available, this option will allow fine-tuning the AP/MB proportion for every GPU. BOINC has no such ability at all. It's part of the long-ignored ProjectPairing proposal.
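
The effect on elapsed time can be illustrated with a toy model in Python. This is only a sketch of the behaviour described above, not the app's actual scheduling code; the function name, the equal run time per task, and the example numbers are all assumptions.

```python
import heapq


def elapsed_as_seen_by_boinc(run_seconds: float,
                             started_by_boinc: int,
                             instances_per_device: int) -> list:
    """Elapsed time per task when BOINC starts them all at t=0 but the app
    lets only `instances_per_device` of them compute at once (toy model)."""
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be at least 1")
    # Each entry is the time at which one compute slot becomes free.
    free_at = [0.0] * instances_per_device
    heapq.heapify(free_at)
    elapsed = []
    for _ in range(started_by_boinc):
        start = heapq.heappop(free_at)   # wait for the earliest free slot
        finish = start + run_seconds
        heapq.heappush(free_at, finish)
        # BOINC's clock for the task has been running since t=0.
        elapsed.append(finish)
    return elapsed


if __name__ == "__main__":
    # 3 tasks started (count 0.33), 1 slot, hypothetical 1000 s per task.
    print(elapsed_as_seen_by_boinc(1000.0, 3, 1))
```

In this model the waiting tasks accumulate elapsed time without computing, which is the situation the timeout warning above refers to.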
ID: 1130019 · Report as offensive
Darren Wright

Joined: 15 Jan 00
Posts: 92
Credit: 17,556,032
RAC: 0
United States
Message 1130054 - Posted: 20 Jul 2011, 15:31:53 UTC

Just finished the first unit.

2600 Seconds on a GTX480 vs 60,000 on a Xeon 5130.
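
For scale, the ratio of those two run times can be worked out with a short Python sketch. The numbers are the ones quoted above; the function name is hypothetical, and a single task pair is of course not a benchmark.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # 60,000 s on the Xeon 5130 vs 2,600 s on the GTX480, as quoted above.
    print(f"about {speedup(60_000, 2_600):.1f}x faster")
```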

http://setiathome.berkeley.edu/results.php?hostid=5955102&offset=0&show_names=0&state=0&appid=5

Hope it validates.

-Darren


ID: 1130054 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1130106 - Posted: 20 Jul 2011, 17:02:47 UTC

Today I had a new AP -> http://setiathome.berkeley.edu/result.php?resultid=2003222228
The only change from the standard settings was unroll 7, which gave too little GPU load (around 55%). One nV driver crash happened when the AP restarted, but I had no obvious problems before that (no lags, GPU-Z data was all OK). In the event log, the only message around that time was "Event ID 4101 - Display Driver Timeout Detection and Recovery" (nvlddmkm).
After that, the calculation continued without further problems (no downclocking, for example).

greetings

Chris
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1130106 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1130490 - Posted: 21 Jul 2011, 19:08:14 UTC - in response to Message 1130388.  
Last modified: 21 Jul 2011, 19:09:45 UTC

One day when the sun is shining, and I have tail wind, I will try this app on my tiny ION.



Well, you're Partly in Luck! There's Plenty of Sunshine to be had on this side of the Pond. Starting anywhere on the East Coast to ummm...just about Osaka, or thereabouts! :)

Lt

we'll send you some!
ID: 1130490 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1130495 - Posted: 21 Jul 2011, 19:18:47 UTC

Yes please - it's been quite murky over here for the last couple of weeks...

But we don't want it so hot that it fries our low temperature acclimatised anatomical components...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1130495 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1130506 - Posted: 21 Jul 2011, 19:26:33 UTC

Hmmmm, wonder why this one decided to go inconclusive? http://setiathome.berkeley.edu/workunit.php?wuid=781560387


PROUD MEMBER OF Team Starfire World BOINC
ID: 1130506 · Report as offensive
Darren Wright

Joined: 15 Jan 00
Posts: 92
Credit: 17,556,032
RAC: 0
United States
Message 1130538 - Posted: 21 Jul 2011, 20:33:53 UTC - in response to Message 1130054.  

Just finished the first unit.

2600 Seconds on a GTX480 vs 60,000 on a Xeon 5130.

http://setiathome.berkeley.edu/results.php?hostid=5955102&offset=0&show_names=0&state=0&appid=5

Hope it validates.

-Darren




Validated. Just used the rescheduler to move AP units from CPU to GPU. Cross your fingers.
ID: 1130538 · Report as offensive


 