Comparing WUs to wingman

FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 1135010 - Posted: 2 Aug 2011, 0:52:17 UTC

Occasionally I like to compare valid blocks I've been crunching against my wingman's to see if my times are better or worse, and try to pinpoint why in either case. Unfortunately, this has become nearly impossible, since there is no log anywhere of how many WUs a given GPU is processing simultaneously. I process two blocks at the same time, since I cannot really manage to do 3 on my GTX 260, even though the 460 would handle it. When looking at a wingman's WU, unless I'm missing something, there's no way to tell how many blocks that GPU was processing concurrently. It's frustrating. Anyone have any ideas?
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1135043 - Posted: 2 Aug 2011, 3:38:23 UTC - in response to Message 1135010.  

Nope, other than assuming that the 4XX and 5XX nVidia cards and the 5XXX and 6XXX ATI cards are running 2-3 WUs at a time.

One way to be sure is knowing your crunching potential. My GPUs both run 2 at a time and compare with a 280 or 460 nVidia. You also have to consider that your wingman may overclock (OC) his card, so the numbers can be very misleading.

Raistmer has charts of what some cards are capable of using a standardized WU. Again, that's not every card, and it doesn't account for overclocking or varying driver setups.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1135200 - Posted: 2 Aug 2011, 13:56:37 UTC - in response to Message 1135043.  

The ATI apps record "instances per device" in the result file, which shows how that system was configured at the time the result was returned. Maybe something like that can be done in the Nvidia app for v7.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1135206 - Posted: 2 Aug 2011, 14:14:36 UTC - in response to Message 1135200.  

The Nvidia (native CUDA) app for MB - which includes the forthcoming v7 - doesn't require any special command-line setting when run two-up or three-up - BOINC's <count> directive is sufficient. Without a command-line parameter passed to the app itself, it's hard to see what could be reported in stderr_txt (not the result file, that's the science that gets uploaded for validation). But I'm sure the question will find its way back to the relevant developer.
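
For anyone unsure how <count> relates to "two-up" or "three-up", here is a minimal Python sketch. It assumes the usual convention that <count> is the fraction of a GPU each task claims, so 0.5 means two tasks per card. The helper names are made up for illustration, and the <coproc>/<type>/<count> layout reflects how anonymous-platform app_info.xml files are commonly written, so check it against your own file rather than taking it as definitive.

```python
def gpu_count_fraction(tasks_per_gpu: int) -> float:
    """Fraction of one GPU that each task claims (assumed meaning of <count>)."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    return 1.0 / tasks_per_gpu


def coproc_fragment(tasks_per_gpu: int, coproc_type: str = "CUDA") -> str:
    """Render an illustrative <coproc> fragment (layout is an assumption)."""
    count = gpu_count_fraction(tasks_per_gpu)
    return (
        "<coproc>\n"
        f"  <type>{coproc_type}</type>\n"
        f"  <count>{count:g}</count>\n"
        "</coproc>"
    )


if __name__ == "__main__":
    # Two-up on one card: each task claims half a GPU.
    print(coproc_fragment(2))
```

Because this setting lives on the BOINC side and is never passed to the science app, the app has nothing to echo into stderr_txt, which is the point made above.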
BMH
Joined: 27 May 99
Posts: 419
Credit: 166,294,083
RAC: 125
United Kingdom
Message 1135474 - Posted: 3 Aug 2011, 14:42:42 UTC

I thought the 260s could only practically crunch 1 WU at a time. That's what I have mine doing, and the 460s and 560 Tis do 2 WUs at a time (some people have them doing 3, but I'm not certain there is any benefit).
Brian.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1135477 - Posted: 3 Aug 2011, 14:58:38 UTC - in response to Message 1135206.  

I always want to call the data in stderr the results for some reason. It might prove beneficial to testing/troubleshooting if there was some way for them to stick that in there, but it might be more effort than it is worth. That would be for the devs to decide.
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1135479 - Posted: 3 Aug 2011, 15:04:33 UTC - in response to Message 1135474.  

That was certainly true with my 260; times were much longer running 2 WUs.

3 WUs on my 460 is slightly better than 2 WUs, not much but still better.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1135485 - Posted: 3 Aug 2011, 15:18:23 UTC - in response to Message 1135479.  

You're never actually doing two tasks literally at the same time: you just have them both available in memory as close as possible to the processing unit, and switch between them thousands of times a second - like any multitasking computer from the 1960s onwards. That's quite different from what you do when you're running multiple tasks on distinct CPU cores or separate graphics cards.

The older 2xx GPUs (and earlier) can do that task switching, but they are horrendously inefficient at it. So they can do two tasks per GPU, but it's pointless to try - you waste far more time than you save.

The newer 'Fermi' class - 4xx and 5xx - GPUs have new hardware to handle task switching far more efficiently. That's why it becomes viable to double-up - having a second task ready to steal a few cycles, and to be switched into 'run' mode in a few microseconds, can fill in the gaps when the first task is waiting for instructions (or data) about what to do next. And vice versa...
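
To make the trade-off concrete, here is a toy Python model of that interleaving. Everything in it is an assumption for illustration: the function names, the idea of lumping each task into "compute" and "stall" seconds, the flat per-task switching cost, and the numbers themselves. It is not a measurement of any real card.

```python
def batch_time(n_tasks: int, compute: float, stall: float, switch_cost: float) -> float:
    """Wall time to finish n_tasks interleaved on one GPU (toy model).

    compute     : seconds of real GPU work per task
    stall       : seconds a lone task spends waiting for instructions or data
    switch_cost : extra seconds per task lost to task switching when sharing
    """
    overhead = switch_cost if n_tasks > 1 else 0.0
    busy = n_tasks * (compute + overhead)
    # A task can never finish faster than its own compute plus stall time.
    return max(busy, compute + stall)


def per_task_time(n_tasks: int, compute: float, stall: float, switch_cost: float) -> float:
    """Effective seconds per task, i.e. batch wall time divided by task count."""
    return batch_time(n_tasks, compute, stall, switch_cost) / n_tasks


if __name__ == "__main__":
    # Illustrative numbers only: 90 s compute and 30 s stall per task.
    print("alone:            ", per_task_time(1, 90, 30, 0))
    print("2-up, cheap switch:", per_task_time(2, 90, 30, 5))
    print("2-up, costly switch:", per_task_time(2, 90, 30, 40))
```

In this sketch, doubling up pays off only when the switching cost is smaller than the stall time it hides, which is the qualitative difference described above between the 2xx cards and the Fermi cards.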
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1135502 - Posted: 3 Aug 2011, 15:39:04 UTC - in response to Message 1135485.  

I had not realized that is what they were doing. For some reason I was thinking the GPU was splitting its processing cores between the tasks.
It now reminds me of the old days of people running multiple tasks on the non-HT P4s and getting a bit of gain out of it.
FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 1135786 - Posted: 4 Aug 2011, 5:00:55 UTC - in response to Message 1135474.  
Last modified: 4 Aug 2011, 5:01:21 UTC

I recently added a 460 to my PC that already had a 260, and in order to extract more performance from the 460, I switched to doing 2 WUs at a time on both cards. I did not see any performance degradation (i.e. in throughput) on the 260 - it was still doing a theoretical ~15K per day. The 460 was doing ~20K theoretical per day. These figures were taken by averaging the numbers from about a dozen varied completed WUs.

I do remember that on older versions of the optimized clients, the 260's utilization would actually drop when doing 2 at a time, but that is certainly no longer the case. I think it may slightly improve throughput, since you are not wasting time between WUs (unless they both complete at the same time). I'd like to hear from others running 2 at a time on 260s though; it's an interesting topic.
FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 1135802 - Posted: 4 Aug 2011, 5:45:20 UTC - in response to Message 1135786.  
Last modified: 4 Aug 2011, 5:45:44 UTC

As an addendum, revisiting the numbers: my 460 @ 770 MHz does two shorties in about 3:20; my 260 @ 690 MHz does two shorties in about 5:18. They do shorties one at a time in 1:53 and 2:14, respectively.

So basically, it seems the 460 does about 3K less per day doing one at a time, while the 260 does about 3K less doing two at a time. So by putting them both in one machine, it's basically a wash (until such time as I can specify how many concurrent WUs to do on each GPU).

So while it's true the 260 has lower throughput doing 2 at a time, each WU takes only about 19% longer (roughly 16% less throughput), which is not too drastic. The 460 picks up about 13%.
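
For anyone who wants to check the arithmetic, here is a small Python sketch that works it out from the four timings quoted in this post. The function names are just for illustration; the only inputs are the m:ss times above.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string such as '3:20' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def two_up_change(single: str, pair: str) -> tuple[float, float]:
    """Compare running one shorty at a time with running two together.

    single : time for one shorty run alone
    pair   : time for two shorties run concurrently
    Returns (change in time per WU, change in throughput) as fractions.
    """
    one_at_a_time = to_seconds(single)
    per_wu_two_up = to_seconds(pair) / 2
    time_change = per_wu_two_up / one_at_a_time - 1
    throughput_change = one_at_a_time / per_wu_two_up - 1
    return time_change, throughput_change


if __name__ == "__main__":
    for card, single, pair in [("460", "1:53", "3:20"), ("260", "2:14", "5:18")]:
        t, thr = two_up_change(single, pair)
        print(f"GTX {card}: time per WU {t:+.1%}, throughput {thr:+.1%}")
```

Note that time per WU and throughput are reciprocals of each other, so the two percentages for the same card are not equal.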

I guess the solution is to get another 460 and pass the 260 down to the kids ;-)
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1135854 - Posted: 4 Aug 2011, 10:30:18 UTC - in response to Message 1135802.  

Another solution is to run two clients (each with its own BOINC Data directory). You then exclude one GPU from each client, and set the GTX 460's app_info to run two tasks at once and the 260's app_info to run only one task at a time.

Claggy
FloridaBear
Joined: 28 Mar 02
Posts: 117
Credit: 6,480,773
RAC: 0
United States
Message 1135873 - Posted: 4 Aug 2011, 12:51:59 UTC - in response to Message 1135854.  
Last modified: 4 Aug 2011, 12:52:06 UTC

All a moot point until I can actually download work.