Comparing WUs to wingman



Message boards : Number crunching : Comparing WUs to wingman

FloridaBear
Joined: 28 Mar 02
Posts: 116
Credit: 5,900,191
RAC: 0
United States
Message 1135010 - Posted: 2 Aug 2011, 0:52:17 UTC

Occasionally I like to compare valid blocks I've crunched against my wingman's, to see whether my times are better or worse and to try to pinpoint why in either case. Unfortunately, this has become nearly impossible, since nothing anywhere records how many WUs a given GPU is processing simultaneously. I process two blocks at a time, since I can't really manage three on my GTX 260, even though the 460 would handle it. When looking at a wingman's WU, unless I'm missing something, there's no way to tell how many blocks that GPU was processing concurrently. It's frustrating. Anyone have any ideas?

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1135043 - Posted: 2 Aug 2011, 3:38:23 UTC - in response to Message 1135010.

Nope, other than assuming that nVidia 4xx and 5xx cards and ATI 5xxx and 6xxx cards are running 2-3 WUs at a time.

One way to be sure is knowing your own crunching potential. My GPUs both run 2 at a time and compare with a 280 or 460 nVidia. You also have to consider that your wingman may OC his card, so the numbers can be very misleading.

Raistmer has charts of what some cards are capable of using a standardized WU. Again, that's not every card, and it doesn't account for OC or varying driver setups.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4063
Credit: 111,220,604
RAC: 148,179
United States
Message 1135200 - Posted: 2 Aug 2011, 13:56:37 UTC - in response to Message 1135043.

With the "instances per device" setting of the ATI apps recorded in the result file, it can show how that system was configured at the time the result was returned. Maybe something like that can be done in the Nvidia app for v7.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8460
Credit: 48,756,541
RAC: 83,461
United Kingdom
Message 1135206 - Posted: 2 Aug 2011, 14:14:36 UTC - in response to Message 1135200.

The Nvidia (native CUDA) app for MB, which includes the forthcoming v7, doesn't require any special command-line setting when run two-up or three-up: BOINC's <count> directive is sufficient. Without a command-line parameter passed to the app itself, it's hard to see what could be reported in stderr_txt (not the result file; that's the science that gets uploaded for validation). But I'm sure the question will find its way back to the relevant developer.
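For readers unfamiliar with the <count> directive: in an anonymous-platform app_info.xml, the <count> inside an <app_version>'s <coproc> section is the fraction of a GPU each task claims, so 0.5 lets BOINC schedule two tasks per card. The Python sketch below only does that arithmetic and prints the fragment. The helper names are hypothetical, it assumes the `CUDA` type name used in app_info files of that era, and the rest of a real <app_version> entry is omitted.

```python
import math


def gpu_count_for(instances: int) -> float:
    """Fraction of one GPU each task should claim so `instances` tasks fit on one card."""
    if instances < 1:
        raise ValueError("need at least one instance per GPU")
    # Round DOWN to two decimals: 3 x 0.33 = 0.99 fits on one GPU, 3 x 0.34 would not.
    return math.floor(100 / instances) / 100


def coproc_fragment(instances: int) -> str:
    """Hypothetical helper: the <coproc> block of one app_info.xml <app_version>."""
    return (
        "<coproc>\n"
        "  <type>CUDA</type>\n"
        f"  <count>{gpu_count_for(instances):g}</count>\n"
        "</coproc>"
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "per GPU ->", gpu_count_for(n))
    print(coproc_fragment(2))
```

Rounding down rather than to the nearest value is the design choice that matters here: if n copies of the count add up to more than 1.0, BOINC will not start all n on the same card.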

BMH
Joined: 27 May 99
Posts: 321
Credit: 87,869,068
RAC: 71,234
United Kingdom
Message 1135474 - Posted: 3 Aug 2011, 14:42:42 UTC

I thought the 260s could only practically crunch 1 WU at a time. That's what I have mine doing, and the 460s and 560 Tis do 2 WUs at a time (some people have them doing 3, but I'm not certain there is any benefit).
____________
Brian.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4063
Credit: 111,220,604
RAC: 148,179
United States
Message 1135477 - Posted: 3 Aug 2011, 14:58:38 UTC - in response to Message 1135206.

I always want to call the data in stderr the results for some reason. It might prove beneficial for testing/troubleshooting if there were some way for them to stick that in there, but it might be more effort than it is worth. That would be for the devs to decide.

JohnDK (Project donor)
Volunteer tester
Joined: 28 May 00
Posts: 840
Credit: 42,740,132
RAC: 69,950
Denmark
Message 1135479 - Posted: 3 Aug 2011, 15:04:33 UTC - in response to Message 1135474.

That was certainly true with my 260: times were much longer running 2 WUs.

Running 3 WUs on my 460 is slightly better than 2 WUs: not much, but still better.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8460
Credit: 48,756,541
RAC: 83,461
United Kingdom
Message 1135485 - Posted: 3 Aug 2011, 15:18:23 UTC - in response to Message 1135479.

You're never actually doing two tasks literally at the same time: you just have them both available in memory as close as possible to the processing unit, and switch between them thousands of times a second, like any multitasking computer from the 1960s onwards. That's quite different from running multiple tasks on distinct CPU cores or separate graphics cards.

The older 2xx GPUs (and earlier) can do that task switching, but they are horrendously inefficient at it. So they can do two tasks per GPU, but it's pointless to try - you waste far more time than you save.

The newer 'Fermi' class - 4xx and 5xx - GPUs have new hardware to handle task switching far more efficiently. That's why it becomes viable to double-up - having a second task ready to steal a few cycles, and to be switched into 'run' mode in a few microseconds, can fill in the gaps when the first task is waiting for instructions (or data) about what to do next. And vice versa...
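The gap-filling idea above can be illustrated with a deliberately crude toy model. This is an assumption-laden sketch, not how any NVIDIA scheduler actually works: each task alternates a compute burst with a stall, and whenever a burst ends the GPU runs whichever resident task is ready first, paying a fixed `switch_cost` to change tasks. A free switch stands in for the Fermi cards and an expensive one for the 2xx cards; `makespan`, `Task` and all parameter values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    remaining: int         # compute bursts this task still has to run
    ready_at: float = 0.0  # time at which its current stall (memory/instruction wait) ends


def makespan(num_tasks: int, bursts: int, compute: float, stall: float,
             switch_cost: float) -> float:
    """Toy model: time for `num_tasks` resident tasks to finish on one GPU.

    Each task alternates `compute` time units of work with `stall` units of waiting.
    When a burst ends, the GPU runs whichever unfinished task is ready first
    (preferring the current one on ties), paying `switch_cost` to change tasks.
    """
    tasks = [Task(bursts) for _ in range(num_tasks)]
    t = 0.0
    current: Optional[int] = None
    while any(task.remaining for task in tasks):
        pending = [i for i, task in enumerate(tasks) if task.remaining]
        nxt = min(pending, key=lambda i: (tasks[i].ready_at, i != current))
        cost = 0.0 if current is None or nxt == current else switch_cost
        start = max(t + cost, tasks[nxt].ready_at)  # wait for the switch and the stall
        t = start + compute
        tasks[nxt].remaining -= 1
        tasks[nxt].ready_at = t + stall
        current = nxt
    return t


if __name__ == "__main__":
    one = makespan(1, bursts=4, compute=1, stall=1, switch_cost=0)
    print("two tasks, one after the other:", 2 * one)
    print("two-up, free switch ('Fermi'):", makespan(2, 4, 1, 1, switch_cost=0))
    print("two-up, costly switch ('2xx'):", makespan(2, 4, 1, 1, switch_cost=3))
```

With these made-up numbers, running two tasks back to back takes 14 time units, two-up with a free switch takes 8 (each task's stalls are filled by the other), and two-up with a 3-unit switch takes 29, which is the qualitative pattern described above. Real magnitudes depend entirely on the application and the hardware.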

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4063
Credit: 111,220,604
RAC: 148,179
United States
Message 1135502 - Posted: 3 Aug 2011, 15:39:04 UTC - in response to Message 1135485.

I had not realized that is what they were doing. For some reason I was thinking the GPU was splitting its processing cores between the tasks.
It now reminds me of the old days of people running multiple tasks on non-HT P4s and getting a bit of gain out of it.

FloridaBear
Joined: 28 Mar 02
Posts: 116
Credit: 5,900,191
RAC: 0
United States
Message 1135786 - Posted: 4 Aug 2011, 5:00:55 UTC - in response to Message 1135474.
Last modified: 4 Aug 2011, 5:01:21 UTC


I recently added a 460 to my PC, which already had a 260, and to extract more performance from the 460 I switched to doing 2 WUs at a time on both cards. I did not see any performance (i.e., throughput) degradation on the 260: it was still doing a theoretical ~15K per day. The 460 was doing ~20K theoretical per day. These figures came from averaging the numbers from about a dozen varied completed WUs.

I do remember that on older versions of the optimized clients, the 260's utilization would actually drop when doing 2 at a time, but that is certainly no longer the case. I think it may slightly improve throughput, since you are not wasting time between WUs (unless they both complete at the same time). I'd like to hear from others running 2 at a time on 260s, though; it's an interesting topic.

FloridaBear
Joined: 28 Mar 02
Posts: 116
Credit: 5,900,191
RAC: 0
United States
Message 1135802 - Posted: 4 Aug 2011, 5:45:20 UTC - in response to Message 1135786.
Last modified: 4 Aug 2011, 5:45:44 UTC

As an addendum, revisiting the numbers, my 460@770MHz does two shorties in about 3:20; my 260@690MHz does two shorties in about 5:18. They do shorties one at a time in 1:53 and 2:14, respectively.

So basically, it seems the 460 does about 3K less per day doing one at a time, while the 260 does 3K less doing two at a time. So with both in one machine, it's basically a wash (until I can specify how many concurrent WUs to run on each GPU).

So while it's true the 260 has lower throughput doing 2 at a time, each WU takes about 19% longer (roughly 16% fewer WUs per hour), which is not too drastic. The 460 picks up about 13%.

I guess the solution is to get another 460 and pass the 260 down to the kids ;-)
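The percentages in this post can be checked from the quoted shorty times alone. The Python sketch below (helper names are illustrative only) converts the m:ss figures into effective seconds per WU, assuming both WUs in a two-up pair finish at about the same time, and reports the change in time per WU separately from the change in WUs per hour, since the two are not the same percentage.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' wall-clock figure to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_wu(wall: str, concurrent: int) -> float:
    """Effective seconds per WU when `concurrent` WUs finish together in `wall`."""
    return to_seconds(wall) / concurrent


def compare(card: str, single: str, double: str) -> tuple[float, float]:
    """Return (change in time per WU, change in WUs per hour) going from one-up to two-up."""
    one_up = seconds_per_wu(single, 1)
    two_up = seconds_per_wu(double, 2)
    time_change = two_up / one_up - 1        # positive: each WU takes longer
    throughput_change = one_up / two_up - 1  # positive: more WUs per hour
    print(f"{card}: {one_up:.0f}s -> {two_up:.0f}s per WU, "
          f"time {time_change:+.1%}, throughput {throughput_change:+.1%}")
    return time_change, throughput_change


if __name__ == "__main__":
    compare("GTX 460 @ 770 MHz", "1:53", "3:20")
    compare("GTX 260 @ 690 MHz", "2:14", "5:18")
```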

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4066
Credit: 32,868,893
RAC: 6,820
United Kingdom
Message 1135854 - Posted: 4 Aug 2011, 10:30:18 UTC - in response to Message 1135802.


Another solution is to run two clients (each with its own BOINC data directory): you then exclude one GPU from each client, and set the GTX 460's app_info to run two tasks at once and the 260's app_info to run only one task at a time.

Claggy
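The two-client arrangement could be scripted roughly as below. This is a sketch under several assumptions: that the client supports the `<allow_multiple_clients>` and `<exclude_gpu>` options in cc_config.xml (older clients may need a different option, so check the client configuration documentation for your BOINC version), that the 460 is device 0 and the 260 device 1, and that the SETI@home URL shown is the one your client is attached with. It only builds each client's cc_config.xml text and the matching <count> value; creating the second data directory and editing each app_info.xml is still manual.

```python
import xml.etree.ElementTree as ET

# Assumption: the project URL your client shows for SETI@home.
PROJECT_URL = "http://setiathome.berkeley.edu/"

# Hypothetical layout: which GPU each client keeps, and how many tasks it runs on it.
CLIENTS = {
    "client_460": {"keep_device": 0, "tasks_per_gpu": 2},
    "client_260": {"keep_device": 1, "tasks_per_gpu": 1},
}
ALL_DEVICES = (0, 1)


def cc_config_for(keep_device: int) -> str:
    """Build cc_config.xml text that hides every GPU except `keep_device` from this client."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Lets this client run alongside the other one (each in its own data directory).
    ET.SubElement(options, "allow_multiple_clients").text = "1"
    for dev in ALL_DEVICES:
        if dev == keep_device:
            continue
        excl = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(excl, "url").text = PROJECT_URL
        ET.SubElement(excl, "device_num").text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name, cfg in CLIENTS.items():
        print(f"--- {name}: app_info <count> {1 / cfg['tasks_per_gpu']:g}")
        print(cc_config_for(cfg["keep_device"]))
```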

FloridaBear
Joined: 28 Mar 02
Posts: 116
Credit: 5,900,191
RAC: 0
United States
Message 1135873 - Posted: 4 Aug 2011, 12:51:59 UTC - in response to Message 1135854.
Last modified: 4 Aug 2011, 12:52:06 UTC

All a moot point until I can actually download work.


Copyright © 2014 University of California