?'s About Crossfire X: Twice As Good Or Wasted Efforts?

Message boards : Number crunching : ?'s About Crossfire X: Twice As Good Or Wasted Efforts?

SilentObserver64
Volunteer tester
Joined: 21 Sep 05
Posts: 139
Credit: 680,037
RAC: 0
United States
Message 1173236 - Posted: 23 Nov 2011, 15:05:37 UTC
Last modified: 23 Nov 2011, 15:38:31 UTC

I know this has probably been covered, blogged, and asked a million times, but with all the hardware and software changes made to GPUs, BOINC clients, and modified clients in recent months, I just wanted to get a fresh take on the subject. I'm hoping to get some insight, possibly something I don't already know; I'm going to pretend I know nothing, in hopes of learning something new.

Granted, I haven't run the scream machine I built a couple of months ago (with BOINC) in about a month, because of some projects I've been working on, but I recently installed a second video card, so I am now running two ATI Radeon HD 6770s in a CrossFire X setup. My first guess is that SETI@home and Beta will see these as two separate cards and theoretically run two Astro WUs (also assuming I start crunching WUs again on the scream machine). Is there any way to set them to run the same WU, so that it crunches twice as fast? Either way, should I set a percentage of a core, all of a single core, or two cores to run this dual video card setup? (I'm currently running an AMD Phenom II X6 @ 3.7+ GHz with 14 GB of DDR3 (unganged) RAM @ 1537 MHz, I believe; doing this from memory since I'm at work.)

Does anyone have a CrossFire X profile mod, or know the coding for the SETI app_info that I would need to make them work as one unit, assuming that's even possible? What are the pros and cons of running a dual video card setup with SETI@home and Beta? Which project might benefit most from a dual-card setup, or utilize CrossFire X? I may have more questions, but this is enough to get started with. Thanks in advance to anyone who helps with this.

http://www.goodsearch.com/nonprofit/university-of-california-setihome.aspx
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1173273 - Posted: 23 Nov 2011, 17:24:57 UTC - in response to Message 1173236.  

As a gamer, I don't see the point of CrossFire or SLI. No current game needs anything close to CrossFire-level performance; my 6970 plays any game at perfectly adequate fps. So the multi-GPU idea only makes sense for crunching, since doubling up on GPUs is pointless when you can just buy a newer, better GPU.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1173276 - Posted: 23 Nov 2011, 17:36:42 UTC - in response to Message 1173236.  

IIRC, to make full use of your 6770s you should be running two, or maybe more, tasks at a time per card. However, that may only apply to MB tasks.

So getting both cards to run one task would leave each card even less utilised than running one task per card.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173277 - Posted: 23 Nov 2011, 17:39:53 UTC - in response to Message 1173236.  

My first guess is that SETI@Home and Beta will see these as two separate cards, and theoretically run two Astro WU's (Also assuming I start crunching WU's again on my scream machine).

Yes, BOINC will see two GPUs and will try to run one WU on each GPU.
(Warning: I don't know about ATI CrossFire, but with Nvidia, when you enable SLI in the drivers it tends to fail at crunching, or the system gets wonky... It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled to let CUDA/OpenCL apps run smoothly.)


Is there any way to set them to run the same WU, so that it crunches twice as fast?
Does anyone have a Crossfire X profile mod, or know the coding for the SETI appinfo that I would need to maybe make it work as one unit, assuming that this is even possible?

AFAIK, no. Anyway, running 2 WUs is twice as fast as running one, and if it were possible to split one WU across both cards it would not be exactly twice as fast, due to the overhead of joining/syncing the two halves...


If yes or even no, should I set a percentage of a core, all of a single core, or 2 cores to run this dual vid card setup? (I'm running an AMD Phenom II X6 @ 3.7+ GHz currently, with 14 GB DDR3 (unganged) RAM @ 1537 MHz, I believe.)

Having at least one free core to feed the GPUs is desirable for the best performance, but since the OSes are multitasking/multithreaded, if you don't leave any core free, the cores will simply be shared among all running apps (which in some cases can make everything slower).
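For reference, the usual way people run a chosen number of tasks per GPU with optimised apps is the <count> element in app_info.xml. Below is a minimal sketch, assuming a standard BOINC anonymous-platform layout; the app name, version number, plan class, and file name are placeholders for whatever your installer actually created:

```xml
<!-- Fragment of app_info.xml (BOINC anonymous platform).
     count 0.5 tells BOINC each task claims half a GPU, so it will
     schedule two tasks per card; avg_ncpus reserves a slice of a
     CPU core to feed the GPU. All names here are illustrative. -->
<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>603</version_num>
  <avg_ncpus>0.25</avg_ncpus>
  <max_ncpus>0.25</max_ncpus>
  <plan_class>ati_opencl</plan_class>
  <coproc>
    <type>ATI</type>
    <count>0.5</count>
  </coproc>
  <file_ref>
    <file_name>MB_ATI_app.exe</file_name>
    <main_program/>
  </file_ref>
</app_version>
```

Whether two-per-card actually raises throughput depends on the card and the app, so it is worth timing both configurations.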


Treasurer

Joined: 13 Dec 05
Posts: 109
Credit: 1,569,762
RAC: 0
Germany
Message 1173311 - Posted: 23 Nov 2011, 20:19:58 UTC - in response to Message 1173277.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1173313 - Posted: 23 Nov 2011, 20:22:17 UTC - in response to Message 1173311.  

IIRC that was changed in an Nvidia driver update, to allow access to the card.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1173348 - Posted: 23 Nov 2011, 22:26:10 UTC - in response to Message 1173273.  

As a gamer I don't see the point of cross fire or SLI. There are no games that require anything close to crossfire capabilities. My 6970 plays any game with a very adequate fps...


Indeed; other than for futile "bragging rights", it is pointless to go for an fps faster than your screen's refresh rate! You just cannot see that fast...

I have all my graphics cards locked to refresh no faster than the vsync rate, to avoid wasted effort and to ensure there are no refresh aliasing effects.


Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173355 - Posted: 23 Nov 2011, 22:38:21 UTC - in response to Message 1173311.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?


I don't know (I didn't even know that some GPUs need to be connected to something to work...).

I have one box with two GPUs linked by the SLI bridge and only one monitor, and both GPUs are used by BOINC.
On another box I have a GPU not plugged into anything, with the monitor on the onboard (non-CUDA-capable) video card, and it also works...
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1173381 - Posted: 24 Nov 2011, 0:50:34 UTC - in response to Message 1173355.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?


I don't know (I didn't even know that some GPUs need to be connected to something to work...).

I have one box with two GPUs linked by the SLI bridge and only one monitor, and both GPUs are used by BOINC.
On another box I have a GPU not plugged into anything, with the monitor on the onboard (non-CUDA-capable) video card, and it also works...


I think one of the other solutions was to extend the desktop to the other monitor. When I do that, everything looks the same on my primary monitor, and the 2nd one is blank until I drag something over to that portion of the desktop.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1173640 - Posted: 25 Nov 2011, 10:00:07 UTC - in response to Message 1173277.  

AFAIK, no. Anyway, running 2 WUs is twice as fast as running one, and if it were possible to split one WU across both cards it would not be exactly twice as fast, due to the overhead of joining/syncing the two halves...

Running 2 WUs is not twice as fast as running one, as each WU takes twice as long to complete. E.g. if a unit takes 5 minutes to complete running by itself, then when running 2 units each takes 10 minutes.

You still complete the same number of units in an hour (12 units in the above example), but there is no speed gain. Some may claim otherwise, but I have made some pretty close observations on my GTX 470 and GTX 580 machines, and if there is any difference, the gain is only on the order of a few seconds per hour.

The main advantage of running multiple units is to make better use of the video card's memory. After all, if your card has a gig-plus of memory, it's a shame to only use 250 MB of it.
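TA's arithmetic can be checked with a quick sketch (a toy model that assumes, as he reports, each task slows down by exactly the concurrency factor; the numbers are his example, not measurements):

```python
def tasks_per_hour(minutes_per_task: float, concurrency: int) -> float:
    """Completed tasks per hour with `concurrency` tasks running at once."""
    return 60.0 / minutes_per_task * concurrency

# One at a time: a unit takes 5 minutes on its own.
alone = tasks_per_hour(5.0, 1)    # 12 units/hour
# Two at a time: each unit now takes 10 minutes.
paired = tasks_per_hour(10.0, 2)  # still 12 units/hour
print(alone, paired)  # 12.0 12.0 -> no throughput gain
```

If the per-task slowdown is anything less than the concurrency factor, throughput does rise, which is the scenario others describe below.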

T.A.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1173641 - Posted: 25 Nov 2011, 10:30:48 UTC - in response to Message 1173640.  
Last modified: 25 Nov 2011, 10:31:32 UTC

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

You still complete the same number of units in a hour, in the above example 12 units but there is no speed gain.

T.A.


No, that's wrong.

On some cards you definitely increase output by running 2 or 3 units at a time.

On my HD 5850 it's around 30%.


With each crime and every kindness we birth our future.
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1173643 - Posted: 25 Nov 2011, 10:59:48 UTC - in response to Message 1173640.  

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

You still complete the same number of units in a hour, in the above example 12 units but there is no speed gain.

T.A.


Jason posted this. He should know, shouldn't he? I admit it's a tad old, and not the current app, but I don't think the fundamentals have changed.
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173648 - Posted: 25 Nov 2011, 11:55:21 UTC - in response to Message 1173640.  

AFAIK, not. Anyway running 2 WUs is twice as fast as running one...

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

T.A.


That's not what I said... I was trying to say something more obvious: running 2 WUs on 2 GPUs is twice as fast as running one WU on 1 GPU. The point was that he wanted to crunch just 1 WU using both GPUs concurrently to finish it twice as fast, which is not possible, and also pointless because he is already going twice as fast... (OK, I know this whole paragraph is kinda confusing :D )

Besides that, and as others have said, on some cards (mainly all Fermis, and some ATIs which I don't know), if they have enough memory, you get more throughput running 2 (or even more) WUs than running just one. Of course it's not twice as fast, but on those cards it's definitely faster.

I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).
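Plugging Horacio's 560 Ti numbers into the same kind of throughput arithmetic (an illustration only; 42.5 min is just the midpoint of his 40-45 min range):

```python
single_minutes = 42.5  # one Einstein WU running alone (midpoint of 40-45 min)
paired_minutes = 60.0  # two WUs together: both finish in about an hour

tp_single = 60.0 / single_minutes      # ~1.41 WU/hour running one at a time
tp_paired = 2 * 60.0 / paired_minutes  # 2.0 WU/hour running two at a time
gain = tp_paired / tp_single - 1.0
print(f"{gain:.0%}")  # -> 42%
```

So on that card, pairing tasks is worth roughly 40% extra throughput, the opposite of TA's Fermi observation above.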

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173651 - Posted: 25 Nov 2011, 13:17:50 UTC - in response to Message 1173648.  

...
I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).

The Einstein CUDA science app does, consciously and deliberately, still do a higher proportion of its work on the host CPU - up to 20% in the worst case.

So, there will be times during the run when the GPU stream processors are idle. With fast-switching hardware (Fermi-class GPUs), the second task can efficiently sneak in and use those spare cycles - that will be the major part of the benefit at Einstein.

Here at SETI, the CUDA shaders are pretty fully occupied during the bulk of the run. The apps we were using a year ago had a long CPU-only startup phase, during which the GPU was effectively idle - so the same 'sneaking in a second task while the first isn't looking' trick worked here, almost as well as at Einstein.

The newer optimised CUDA applications have substantially reduced the length of that CPU-only startup phase. I wonder if that difference explains TA's observation of 'no benefit', which I have to admit I also found surprising - particularly for 470s and 580s - before I thought of that possible explanation.

There'll be a new installer release, with a further increment to the CUDA app, probably sometime next week (we're just doing some final testing). It might be a good idea to re-test the "multiple tasks per GPU" benefit curve once the new app is released.
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173654 - Posted: 25 Nov 2011, 13:47:09 UTC - in response to Message 1173651.  
Last modified: 25 Nov 2011, 14:03:44 UTC

...
I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).

The Einstein CUDA science app does, consciously and deliberately, still do a higher proportion of its work on the host CPU - up to 20% in the worst case.

So, there will be times during the run when the GPU stream processors are idle. With fast-switching hardware (Fermi-class GPUs), the second task can efficiently sneak in and use those spare cycles - that will be the major part of the benefit at Einstein.

Here at SETI, the CUDA shaders are pretty fully occupied during the bulk of the run. The apps we were using a year ago had a long CPU-only startup phase, during which the GPU was effectively idle - so the same 'sneaking in a second task while the first isn't looking' trick worked here, almost as well as at Einstein.

The newer optimised CUDA applications have substantially reduced the length of that CPU-only startup phase. I wonder if that difference explains TA's observation of 'no benefit', which I have to admit I also found surprising - particularly for 470s and 580s - before I thought of that possible explanation.

There'll be a new installer release, with a further increment to the CUDA app, probably sometime next week (we're just doing some final testing). It might be a good idea to re-test the "multiple tasks per GPU" benefit curve once the new app is released.


I don't know if that explains what TA said, but on my system, when Einstein WUs are paired on the same card with a SETI MB WU, they are crunched faster than when paired with another Einstein WU... which is not consistent with heavy use of the stream processors </confused> ...

OTOH, if I run only one SETI MB, GPU core usage is between 50 and 60%, which means that on my cards there is enough free processing power available for another task...

When I run 2 WUs the GPU goes above 90%, and I guess that's why running 3 doesn't give any noticeable advantage...

Maybe, for TA, the CPU/PCIe lanes/etc. are more efficient than mine and the GPUs are fed more efficiently, so there is no spare processing power on the GPUs to crunch additional work?...

EDIT: I've configured BOINC to use only 6 of the 8 CPU cores, and I've changed the CPU usage of all the CUDA apps to 0.25, so there are always at least 3 free CPU cores on my system.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173662 - Posted: 25 Nov 2011, 14:41:40 UTC - in response to Message 1173654.  

I don't know if that explains what TA said, but on my system, when Einstein WUs are paired on the same card with a SETI MB WU, they are crunched faster than when paired with another Einstein WU... which is not consistent with heavy use of the stream processors </confused> ...

That's interesting, and counter-intuitive. I'm confused too.

As far as I can remember, that's the first report I've read of heterogeneous pairing, on either site or between any two projects. I think we've had people mixing AP and MB here, but that would probably have been on ATI. And do I remember someone with count=0.51 for AP, count=0.49 for MB, to run two MB, one of each, but not two AP together? Something like that.
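The fractional counts Richard is recalling work because BOINC only starts another GPU task while the running tasks' count values sum to at most 1.0 per GPU. A toy sketch of that arithmetic, using the shares from his recollection:

```python
def fits_on_one_gpu(counts):
    """BOINC runs GPU tasks concurrently only while their
    app_info 'count' shares sum to at most 1.0 per GPU."""
    return sum(counts) <= 1.0

AP, MB = 0.51, 0.49  # shares from Richard's recollection

print(fits_on_one_gpu([MB, MB]))  # True:  two MB tasks fit (0.98)
print(fits_on_one_gpu([AP, MB]))  # True:  one of each fits (1.0)
print(fits_on_one_gpu([AP, AP]))  # False: two AP tasks never fit (1.02)
```

Picking the shares slightly asymmetric is exactly what rules out the two-AP combination while still allowing the other two pairings.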

Possible thought: Einstein uses much bigger data files - 4MB at a time. Each Einstein task is actually a 'package' of eight separate sub-tasks - so every four or five minutes, you'll be unloading all that data and loading a new set. Maybe the different pattern of memory access makes the two different apps interleave better. I remember in the days of the old hyperthreaded NetBurst P4s, it was well known that running SETI and Einstein in parallel on the two virtual cores was most efficient, possibly for a similar reason.

Much scope for further research here...
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1173687 - Posted: 25 Nov 2011, 18:39:51 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR and VLAR units and everything in between. When running 2 units, they take, to within a few seconds, twice as long as when running one. Either way the GPU is running at around 95% load.

T.A.

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173693 - Posted: 25 Nov 2011, 18:57:09 UTC - in response to Message 1173687.  
Last modified: 25 Nov 2011, 19:03:42 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR, VLAR units and everything in between. When running 2 units, they take twice as long to within a few seconds as when running one. Either way the GPU is running at around 95% load.

T.A.

Which version of the Lunatics app were you running during the timing runs?

Edit - your GTX 470 host is showing some odd results at the moment - Pending tasks for computer 5467867. Many inconclusive validations for CPU apps, and NVidia-allocated tasks run on CPU.
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1173695 - Posted: 25 Nov 2011, 18:58:21 UTC - in response to Message 1173687.  
Last modified: 25 Nov 2011, 18:59:45 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR, VLAR units and everything in between. When running 2 units, they take twice as long to within a few seconds as when running one. Either way the GPU is running at around 95% load.

T.A.



It's not that I don't believe that this holds true on your system. :)

Which Lunatics app? x38g?
I'm not sure throughput was re-evaluated after x32f, so it would be interesting to hear what other people running optimised apps on Fermis see.
Maybe we can discern a pattern of when/if increasing the count increases throughput.

edit: we'll have x41g out next week; that would be a good time for everybody to re-evaluate.

NB: When it comes to counts, the advice has always been to check what gives the best throughput.
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1173696 - Posted: 25 Nov 2011, 18:59:57 UTC - in response to Message 1173662.  
Last modified: 25 Nov 2011, 19:15:33 UTC

As far as I can remember, that's the first report I've read of heterogeneous pairing, on either site or between any two projects. I think we've had people mixing AP and MB here, but that would probably have been on ATI. And do I remember someone with count=0.51 for AP, count=0.49 for MB, to run two MB, one of each, but not two AP together? Something like that.


I was running two instances of Hybrid Astropulse and an instance of Collatz on my HD 5770 at least a year and a half ago, and Raistmer was running something similar. I was running with a count of 0.4 for Hybrid AP and a count of 0.2 for Collatz; I just had to make sure I didn't run out of AP.

I tried running an Einstein BRP3 GPU CUDA task alongside a CUDA MB task before (on an older x-series CUDA app); the Einstein task completed OK, but the CUDA MB task crawled along and got 'Max time exceeded'. I haven't tried it recently with x38g.

Claggy
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.