?'s About Crossfire X: Twice As Good Or Wasted Efforts?

Message boards : Number crunching : ?'s About Crossfire X: Twice As Good Or Wasted Efforts?

SilentObserver64
Volunteer tester
Joined: 21 Sep 05
Posts: 139
Credit: 680,037
RAC: 0
United States
Message 1173236 - Posted: 23 Nov 2011, 15:05:37 UTC
Last modified: 23 Nov 2011, 15:38:31 UTC

I know this has probably been covered, blogged, and asked a million times, but with all the hardware and software changes made to GPUs, BOINC clients, and modified clients in recent months, I just wanted to get a fresh take on the subject. I'm hoping to get some insight, possibly something I don't already know; I'm going to pretend I know nothing, in hopes of learning something new.

Granted, I haven't run the scream machine I built a couple of months ago (with BOINC) in about a month, because of some projects I've been working on, but I recently installed a second video card, so I am now running two ATI Radeon HD 6770s in a CrossFire X setup. My first guess is that SETI@home and Beta will see these as two separate cards and theoretically run two Astro WUs (also assuming I start crunching WUs again on the scream machine). Is there any way to set them to run the same WU, so that it crunches twice as fast? Either way, should I set a percentage of a core, all of a single core, or two cores to run this dual video card setup? (I'm currently running an AMD Phenom II X6 @ 3.7+ GHz with 14 GB of DDR3 (unganged) RAM @ 1537 MHz, I believe; doing this from memory since I'm at work.)

Does anyone have a CrossFire X profile mod, or know the coding for the SETI app_info that I would need to make them work as one unit, assuming that's even possible? What are the pros and cons of running a dual video card setup with SETI@home and Beta? Which project might benefit most from a dual-card setup, or utilize CrossFire X? I may have more questions, but this is enough to get started with. Thanks in advance to anyone who helps with this.

http://www.goodsearch.com/nonprofit/university-of-california-setihome.aspx
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1173273 - Posted: 23 Nov 2011, 17:24:57 UTC - in response to Message 1173236.  

As a gamer, I don't see the point of CrossFire or SLI. No current game needs anything close to CrossFire-level performance; my 6970 plays any game at perfectly adequate fps. So the multi-GPU idea only makes sense for crunching, since doubling up on GPUs is pointless when you can just buy a newer, better GPU.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1173276 - Posted: 23 Nov 2011, 17:36:42 UTC - in response to Message 1173236.  

IIRC, to make full use of your 6770s you should be running two, or maybe more, tasks at a time per card. However, that may only apply to MB tasks.

So getting both cards to run one task would leave each card even less utilised than running one task per card.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173277 - Posted: 23 Nov 2011, 17:39:53 UTC - in response to Message 1173236.  

My first guess is that SETI@Home and Beta will see these as two separate cards, and theoretically run two Astro WU's (Also assuming I start crunching WU's again on my scream machine).

Yes, BOINC will see two GPUs and will try to run one WU on each GPU.
(Warning: I don't know about ATI CrossFire, but with Nvidia, when you enable SLI in the drivers it tends to fail at crunching, or the system gets wonky... It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled to let CUDA/OpenCL apps run smoothly.)


Is there any way to set them to run the same WU, so that it crunches twice as fast?
Does anyone have a Crossfire X profile mod, or know the coding for the SETI appinfo that I would need to maybe make it work as one unit, assuming that this is even possible?

AFAIK, no. Anyway, running 2 WUs is twice as fast as running one, and if it were possible to split one WU across both cards it would not be exactly twice as fast, due to the overhead of joining/syncing the two halves...


If yes or even no, should I set a percentage of a core, all of a single core, or 2 cores to run this dual vid card setup? (I'm running an AMD Phenom II X6 @ 3.7+ GHz currently, with 14 GB DDR3 (unganged) RAM @ 1537 MHz, I believe.)

Having at least one free core to feed the GPUs is desirable for the best performance, but since the OSes are multitasking/multithreaded, if you don't leave any core free, the cores will simply be shared among all running apps (which in some cases can make everything slower).
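For reference, the usual way people run a chosen number of tasks per GPU with optimised apps is the <count> element in app_info.xml. Below is a minimal sketch, assuming a standard BOINC anonymous-platform layout; the app name, version number, plan class, and file name are placeholders for whatever your installer actually created:

```xml
<!-- Fragment of app_info.xml (BOINC anonymous platform).
     count 0.5 tells BOINC each task claims half a GPU, so it will
     schedule two tasks per card; avg_ncpus reserves a slice of a
     CPU core to feed the GPU. All names here are illustrative. -->
<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>603</version_num>
  <avg_ncpus>0.25</avg_ncpus>
  <max_ncpus>0.25</max_ncpus>
  <plan_class>ati_opencl</plan_class>
  <coproc>
    <type>ATI</type>
    <count>0.5</count>
  </coproc>
  <file_ref>
    <file_name>MB_ATI_app.exe</file_name>
    <main_program/>
  </file_ref>
</app_version>
```

Whether two-per-card actually raises throughput depends on the card and the app, so it is worth timing both configurations.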


Treasurer

Joined: 13 Dec 05
Posts: 109
Credit: 1,569,762
RAC: 0
Germany
Message 1173311 - Posted: 23 Nov 2011, 20:19:58 UTC - in response to Message 1173277.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1173313 - Posted: 23 Nov 2011, 20:22:17 UTC - in response to Message 1173311.  

IIRC that was changed in an Nvidia driver update, to allow access to the card.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1173348 - Posted: 23 Nov 2011, 22:26:10 UTC - in response to Message 1173273.  

As a gamer I don't see the point of cross fire or SLI. There are no games that require anything close to crossfire capabilities. My 6970 plays any game with a very adequate fps...


Indeed; other than for futile "bragging rights", it is pointless to go for an fps faster than your screen's refresh rate! You just cannot see that fast...

I have all my graphics cards locked to refresh no faster than the vsync rate, to avoid wasted effort and to ensure there are no refresh aliasing effects.


Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173355 - Posted: 23 Nov 2011, 22:38:21 UTC - in response to Message 1173311.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?


I don't know (I didn't even know that some GPUs need to be connected to something to work...).

I have one box with two GPUs linked by the SLI bridge and only one monitor, and both GPUs are used by BOINC.
On another box I have a GPU not plugged into anything, with the monitor on the onboard (non-CUDA-capable) video card, and it also works...
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1173381 - Posted: 24 Nov 2011, 0:50:34 UTC - in response to Message 1173355.  

It's OK to have the boards linked by the SLI bridge (if any), but SLI should be disabled


So it's possible to activate the second card this way, without using a 2nd monitor or dummy plug?


I don't know (I didn't even know that some GPUs need to be connected to something to work...).

I have one box with two GPUs linked by the SLI bridge and only one monitor, and both GPUs are used by BOINC.
On another box I have a GPU not plugged into anything, with the monitor on the onboard (non-CUDA-capable) video card, and it also works...


I think one of the other solutions was to extend the desktop to the other monitor. When I do that, everything looks the same on my primary monitor, and the 2nd one is blank until I drag something over to that portion of the desktop.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1173640 - Posted: 25 Nov 2011, 10:00:07 UTC - in response to Message 1173277.  

AFAIK, no. Anyway, running 2 WUs is twice as fast as running one, and if it were possible to split one WU across both cards it would not be exactly twice as fast, due to the overhead of joining/syncing the two halves...

Running 2 WUs is not twice as fast as running one, as each WU takes twice as long to complete. E.g. if a unit takes 5 minutes to complete running by itself, then when running 2 units each takes 10 minutes.

You still complete the same number of units in an hour (12 units in the above example), but there is no speed gain. Some may claim otherwise, but I have made some pretty close observations on my GTX 470 and GTX 580 machines, and if there is any difference, the gain is only on the order of a few seconds per hour.

The main advantage of running multiple units is to make better use of the video card's memory. After all, if your card has a gig-plus of memory, it's a shame to only use 250 MB of it.
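TA's arithmetic can be checked with a quick sketch (a toy model that assumes, as he reports, each task slows down by exactly the concurrency factor; the numbers are his example, not measurements):

```python
def tasks_per_hour(minutes_per_task: float, concurrency: int) -> float:
    """Completed tasks per hour with `concurrency` tasks running at once."""
    return 60.0 / minutes_per_task * concurrency

# One at a time: a unit takes 5 minutes on its own.
alone = tasks_per_hour(5.0, 1)    # 12 units/hour
# Two at a time: each unit now takes 10 minutes.
paired = tasks_per_hour(10.0, 2)  # still 12 units/hour
print(alone, paired)  # 12.0 12.0 -> no throughput gain
```

If the per-task slowdown is anything less than the concurrency factor, throughput does rise, which is the scenario others describe below.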

T.A.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1173641 - Posted: 25 Nov 2011, 10:30:48 UTC - in response to Message 1173640.  
Last modified: 25 Nov 2011, 10:31:32 UTC

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

You still complete the same number of units in a hour, in the above example 12 units but there is no speed gain.

T.A.


No, that's wrong.

On some cards you definitely increase output by running 2 or 3 units at a time.

On my HD 5850 it's around 30%.


With each crime and every kindness we birth our future.
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1173643 - Posted: 25 Nov 2011, 10:59:48 UTC - in response to Message 1173640.  

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

You still complete the same number of units in a hour, in the above example 12 units but there is no speed gain.

T.A.


Jason posted this. He should know, shouldn't he? I admit it's a tad old, and not the current app, but I don't think the fundamentals have changed.
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173648 - Posted: 25 Nov 2011, 11:55:21 UTC - in response to Message 1173640.  

AFAIK, not. Anyway running 2 WUs is twice as fast as running one...

Running 2 WU's is not twice as fast as running one, as each WU takes twice as long to complete. e.g. if a unit takes 5 minutes to complete running by itself, when running 2 units each takes 10 minutes.

T.A.


That's not what I said... I was trying to say something more obvious: running 2 WUs on 2 GPUs is twice as fast as running one WU on 1 GPU. The point was that he wanted to crunch just 1 WU using both GPUs concurrently to finish it twice as fast, which is not possible, and also pointless because he is already going twice as fast... (OK, I know this whole paragraph is kinda confusing :D )

Besides that, and as others have said, on some cards (mainly all Fermis, and some ATIs which I don't know), if they have enough memory, you get more throughput running 2 (or even more) WUs than running just one. Of course it's not twice as fast, but on those cards it's definitely faster.

I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).
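Plugging Horacio's 560 Ti numbers into the same kind of throughput arithmetic (an illustration only; 42.5 min is just the midpoint of his 40-45 min range):

```python
single_minutes = 42.5  # one Einstein WU running alone (midpoint of 40-45 min)
paired_minutes = 60.0  # two WUs together: both finish in about an hour

tp_single = 60.0 / single_minutes      # ~1.41 WU/hour running one at a time
tp_paired = 2 * 60.0 / paired_minutes  # 2.0 WU/hour running two at a time
gain = tp_paired / tp_single - 1.0
print(f"{gain:.0%}")  # -> 42%
```

So on that card, pairing tasks is worth roughly 40% extra throughput, the opposite of TA's Fermi observation above.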

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173651 - Posted: 25 Nov 2011, 13:17:50 UTC - in response to Message 1173648.  

...
I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).

The Einstein CUDA science app does, consciously and deliberately, still do a higher proportion of its work on the host CPU - up to 20% in the worst case.

So, there will be times during the run when the GPU stream processors are idle. With fast-switching hardware (Fermi-class GPUs), the second task can efficiently sneak in and use those spare cycles - that will be the major part of the benefit at Einstein.

Here at SETI, the CUDA shaders are pretty fully occupied during the bulk of the run. The apps we were using a year ago had a long CPU-only startup phase, during which the GPU was effectively idle - so the same 'sneaking in a second task while the first isn't looking' trick worked here, almost as well as at Einstein.

The newer optimised CUDA applications have substantially reduced the length of that CPU-only startup phase. I wonder if that difference explains TA's observation of 'no benefit', which I have to admit I also found surprising - particularly for 470s and 580s - before I thought of that possible explanation.

There'll be a new installer release, with a further increment to the CUDA app, probably sometime next week (we're just doing some final testing). It might be a good idea to re-test the "multiple tasks per GPU" benefit curve once the new app is released.
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1173654 - Posted: 25 Nov 2011, 13:47:09 UTC - in response to Message 1173651.  
Last modified: 25 Nov 2011, 14:03:44 UTC

...
I've also done some fairly extensive testing (using Einstein WUs, which have much more predictable crunching times than SETI MBs): on my 560 Ti, running just one takes around 40-45 mins, while running 2 takes around 1 hour (so 30 mins each).

The Einstein CUDA science app does, consciously and deliberately, still do a higher proportion of its work on the host CPU - up to 20% in the worst case.

So, there will be times during the run when the GPU stream processors are idle. With fast-switching hardware (Fermi-class GPUs), the second task can efficiently sneak in and use those spare cycles - that will be the major part of the benefit at Einstein.

Here at SETI, the CUDA shaders are pretty fully occupied during the bulk of the run. The apps we were using a year ago had a long CPU-only startup phase, during which the GPU was effectively idle - so the same 'sneaking in a second task while the first isn't looking' trick worked here, almost as well as at Einstein.

The newer optimised CUDA applications have substantially reduced the length of that CPU-only startup phase. I wonder if that difference explains TA's observation of 'no benefit', which I have to admit I also found surprising - particularly for 470s and 580s - before I thought of that possible explanation.

There'll be a new installer release, with a further increment to the CUDA app, probably sometime next week (we're just doing some final testing). It might be a good idea to re-test the "multiple tasks per GPU" benefit curve once the new app is released.


I don't know if that explains what TA said, but on my system, when Einstein WUs are paired on the same card with a SETI MB WU, they are crunched faster than when paired with another Einstein WU... which is not consistent with heavy use of the stream processors </confused> ...

OTOH, if I run only one SETI MB, GPU core usage is between 50 and 60%, which means that on my cards there is enough free processing power available for another task...

When I run 2 WUs the GPU goes above 90%, and I guess that's why running 3 doesn't give any noticeable advantage...

Maybe, for TA, the CPU/PCIe lanes/etc. are more efficient than mine and the GPUs are fed more efficiently, so there is no spare processing power on the GPUs to crunch additional work?...

EDIT: I've configured BOINC to use only 6 of the 8 CPU cores, and I've changed the CPU usage of all the CUDA apps to 0.25, so there are always at least 3 free CPU cores on my system.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173662 - Posted: 25 Nov 2011, 14:41:40 UTC - in response to Message 1173654.  

I don't know if that explains what TA said, but on my system, when Einstein WUs are paired on the same card with a SETI MB WU, they are crunched faster than when paired with another Einstein WU... which is not consistent with heavy use of the stream processors </confused> ...

That's interesting, and counter-intuitive. I'm confused too.

As far as I can remember, that's the first report I've read of heterogeneous pairing, on either site or between any two projects. I think we've had people mixing AP and MB here, but that would probably have been on ATI. And do I remember someone with count=0.51 for AP, count=0.49 for MB, to run two MB, one of each, but not two AP together? Something like that.
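The fractional counts Richard is recalling work because BOINC only starts another GPU task while the running tasks' count values sum to at most 1.0 per GPU. A toy sketch of that arithmetic, using the shares from his recollection:

```python
def fits_on_one_gpu(counts):
    """BOINC runs GPU tasks concurrently only while their
    app_info 'count' shares sum to at most 1.0 per GPU."""
    return sum(counts) <= 1.0

AP, MB = 0.51, 0.49  # shares from Richard's recollection

print(fits_on_one_gpu([MB, MB]))  # True:  two MB tasks fit (0.98)
print(fits_on_one_gpu([AP, MB]))  # True:  one of each fits (1.0)
print(fits_on_one_gpu([AP, AP]))  # False: two AP tasks never fit (1.02)
```

Picking the shares slightly asymmetric is exactly what rules out the two-AP combination while still allowing the other two pairings.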

Possible thought: Einstein uses much bigger data files - 4MB at a time. Each Einstein task is actually a 'package' of eight separate sub-tasks - so every four or five minutes, you'll be unloading all that data and loading a new set. Maybe the different pattern of memory access makes the two different apps interleave better. I remember in the days of the old hyperthreaded NetBurst P4s, it was well known that running SETI and Einstein in parallel on the two virtual cores was most efficient, possibly for a similar reason.

Much scope for further research here...
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1173687 - Posted: 25 Nov 2011, 18:39:51 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR and VLAR units and everything in between. When running 2 units, they take, to within a few seconds, twice as long as when running one. Either way the GPU is running at around 95% load.

T.A.

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1173693 - Posted: 25 Nov 2011, 18:57:09 UTC - in response to Message 1173687.  
Last modified: 25 Nov 2011, 19:03:42 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR, VLAR units and everything in between. When running 2 units, they take twice as long to within a few seconds as when running one. Either way the GPU is running at around 95% load.

T.A.

Which version of the Lunatics app were you running during the timing runs?

Edit - your GTX 470 host is showing some odd results at the moment - Pending tasks for computer 5467867. Many inconclusive validations for CPU apps, and NVidia-allocated tasks run on CPU.
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1173695 - Posted: 25 Nov 2011, 18:58:21 UTC - in response to Message 1173687.  
Last modified: 25 Nov 2011, 18:59:45 UTC

To clarify, I'm talking about MB on GTX470 and GTX580 Nvidia Fermi cards using Lunatics apps.

I have checked this thoroughly with VHAR, VLAR units and everything in between. When running 2 units, they take twice as long to within a few seconds as when running one. Either way the GPU is running at around 95% load.

T.A.



It's not that I don't believe that this holds true on your system. :)

Which Lunatics app? x38g?
I'm not sure throughput was re-evaluated after x32f, so it would be interesting to hear what other people running optimised apps on Fermis see.
Maybe we can discern a pattern of when/if increasing the count increases throughput.

edit: we'll have x41g out next week; that would be a good time for everybody to re-evaluate.

NB: When it comes to counts, the advice has always been to check what gives the best throughput.
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1173696 - Posted: 25 Nov 2011, 18:59:57 UTC - in response to Message 1173662.  
Last modified: 25 Nov 2011, 19:15:33 UTC

As far as I can remember, that's the first report I've read of heterogeneous pairing, on either site or between any two projects. I think we've had people mixing AP and MB here, but that would probably have been on ATI. And do I remember someone with count=0.51 for AP, count=0.49 for MB, to run two MB, one of each, but not two AP together? Something like that.


I was running two instances of Hybrid Astropulse and an instance of Collatz on my HD 5770 at least a year and a half ago, and Raistmer was running something similar. I was running with a count of 0.4 for Hybrid AP and a count of 0.2 for Collatz; I just had to make sure I didn't run out of AP.

I tried running an Einstein BRP3 GPU CUDA task alongside a CUDA MB task before (on an older x-series CUDA app); the Einstein task completed OK, but the CUDA MB task crawled along and got 'Max time exceeded'. I haven't tried it recently with x38g.

Claggy
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.