Collected data of various Cuda s@h executables public/nonpublic (Alpha/Beta)

Message boards : Number crunching : Collected data of various Cuda s@h executables public/nonpublic (Alpha/Beta)


Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1054598 - Posted: 10 Dec 2010, 11:09:48 UTC
Last modified: 10 Dec 2010, 11:14:13 UTC

Hi

When the servers were down I was bored, so I ran a benchmark with various full-length WUs and put together some statistics from it.
The last two pages contain a summary for one executable whose results shouldn't be considered accurate, but other than that there were no problems at all.
You can even see, in numbers, the benefit of running more than one WU on that particular card.
Consider this a baseline I can compare future data against when running newly compiled CUDA 3.2 beta exes and other performance enhancements to the CUDA executables.

Head over to my blog to download the file!

http://vyper.kafit.se

Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1054598
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20334
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1054604 - Posted: 10 Dec 2010, 11:31:14 UTC - in response to Message 1054598.  
Last modified: 10 Dec 2010, 11:32:31 UTC

... benchmark with various full-length WUs and put together some statistics from it.

[...]

Head over to my blog to download the file!

http://vyper.kafit.se

Very interesting, but some explanation needed...

How do you calculate the efficiency for 2 WUs vs 1 WU?

Is that using hyperthreading with one WU per virtual core? Or 2 WUs on one core?

There's one very obvious 'interesting' result for your 'fermipfForceSerial'... Can you explain further?

I've seen comments about running multiple WUs in parallel on nVidia GPUs... Good idea or bad?


BTW: Looks like I'll be jumping in to get a:

Palit 1GB GeForce GTS 450 Sonic DDR5 NVIDIA Graphics Card

1GB Palit GTS 450 Sonic, 40nm, 3900MHz GDDR5, GPU 880MHz, Shader 1760MHz, 192 Cores, 2x DVI-I/ HDMI


to add to my crunching count... Thanks all for the comments in another thread.


Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1054604
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1054632 - Posted: 10 Dec 2010, 12:48:39 UTC - in response to Message 1054604.  

Very interesting, but some explanation needed...

How do you calculate the efficiency for 2 WUs vs 1 WU?

Is that using hyperthreading with one WU per virtual core? Or 2 WUs on one core?

There's one very obvious 'interesting' result for your 'fermipfForceSerial'... Can you explain further?

I've seen comments about running multiple WUs in parallel on nVidia GPUs... Good idea or bad?


Hi

Well, the efficiency is measured by running two identical benchmarks on one GPU at the same time!
You take the time per WU in the dual run, which of course is higher than when only a single WU is running, and compare that with the single-WU time.

So the formula is as follows:

86400 s/day / 600 s per WU = 144 results/day on a 1-WU-per-GPU config.
86400 s/day / 900 s per WU = 96 results/day per app instance when two run on a single GPU.

In this testbed I had two parallel WUs, so 2 x 96 = 192 results/day.

The numbers are only an example of the measurement.
And you can see that this particular config gets a 33.3% increase in throughput, RAC-wise.
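
As a quick sanity check of that arithmetic, here is a minimal Python sketch. The 600 s and 900 s figures are just the example numbers above, not measured runtimes, and the variable names are mine:

# Throughput of 1 WU at a time vs. 2 WUs in parallel on one GPU.
# Example runtimes only; real values depend on the card and executable.
SECONDS_PER_DAY = 86400

single_wu_time = 600.0   # seconds per WU when one runs alone
dual_wu_time = 900.0     # seconds per WU when two run in parallel
parallel_wus = 2

single_per_day = SECONDS_PER_DAY / single_wu_time              # 144 results/day
dual_per_day = parallel_wus * SECONDS_PER_DAY / dual_wu_time   # 192 results/day
gain_pct = (dual_per_day / single_per_day - 1) * 100           # ~33.3 %

print(f"1 WU at a time : {single_per_day:.0f} results/day")
print(f"2 WUs parallel : {dual_per_day:.0f} results/day (+{gain_pct:.1f}%)")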
The interesting phenomenon with fermipfForceSerial is that it suddenly exited before the test had finished, and thus that test completed faster than the others; the fermipfForceSerial results in the 2-WU configuration should not be considered valid at all.
So the point is that with the newer Fermi architecture there is some headroom in each GPU that makes this possible and increases the output per day.
In those terms I would consider it a good thing.

Kind regards Vyper



_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1054632



 