Optimize your GPU. Find the value the easy way.



Message boards : Number crunching : Optimize your GPU. Find the value the easy way.

Profile Arvid Almstrom
Joined: 23 Mar 00
Posts: 98
Credit: 136,165,524
RAC: 8,907
Australia
Message 1280069 - Posted: 4 Sep 2012, 23:23:01 UTC - in response to Message 1279991.
Last modified: 4 Sep 2012, 23:23:47 UTC

Here are my tests running on a 570 with different CUDA apps.

For some reason running 4 tasks didn't complete on either the 41z cuda 3.2 or 41z cuda 4.2.

Starting automatic test: (x41x_winx64_cuda41)
Device: 0, device count: 1, average time / count: 127, average time on device: 127 Seconds (2 Minutes, 7 Seconds)
Device: 0, device count: 2, average time / count: 203, average time on device: 101 Seconds (1 Minutes, 41 Seconds)
Device: 0, device count: 3, average time / count: 290, average time on device: 96 Seconds (1 Minutes, 36 Seconds)

Starting automatic test: (x41x_winx64_cuda42)
Device: 0, device count: 1, average time / count: 124, average time on device: 124 Seconds (2 Minutes, 4 Seconds)
Device: 0, device count: 2, average time / count: 198, average time on device: 99 Seconds (1 Minutes, 39 Seconds)
Device: 0, device count: 3, average time / count: 285, average time on device: 95 Seconds (1 Minutes, 35 Seconds)
Device: 0, device count: 4, average time / count: 469, average time on device: 117 Seconds (1 Minutes, 57 Seconds)

Starting automatic test: (x41z_winx64_cuda42)
Device: 0, device count: 1, average time / count: 128, average time on device: 128 Seconds (2 Minutes, 8 Seconds)
Device: 0, device count: 2, average time / count: 198, average time on device: 99 Seconds (1 Minutes, 39 Seconds)
Device: 0, device count: 3, average time / count: 284, average time on device: 94 Seconds (1 Minutes, 34 Seconds)

Arvid
____________
Arvid Almstrom

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4810
Credit: 71,583,880
RAC: 8,785
Australia
Message 1280099 - Posted: 5 Sep 2012, 1:12:23 UTC - in response to Message 1280069.
Last modified: 5 Sep 2012, 1:19:53 UTC

...
Starting automatic test: (x41x_winx64_cuda42)
Device: 0, device count: 1, average time / count: 124, average time on device: 124 Seconds (2 Minutes, 4 Seconds)
Device: 0, device count: 2, average time / count: 198, average time on device: 99 Seconds (1 Minutes, 39 Seconds)
Device: 0, device count: 3, average time / count: 285, average time on device: 95 Seconds (1 Minutes, 35 Seconds)
Device: 0, device count: 4, average time / count: 469, average time on device: 117 Seconds (1 Minutes, 57 Seconds)
...


Textbook parallelism 'bathtub curve', that's what you're looking for. [For completeness, be sure to check that 4 raises the time with z as well, due to contention cost :). My suspicion is that Fermi's dual DMA engines need 2 streams plus one for latency hiding for maximal use (on Vista/Win7 anyway), though there are other bottlenecks in play.]
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

spitfire_mk_2
Joined: 14 Apr 00
Posts: 424
Credit: 11,265,287
RAC: 8,622
United States
Message 1280114 - Posted: 5 Sep 2012, 2:36:01 UTC

Tool version 1.2

Card: GTX 460


Results:
Starting automatic test: (x41g)
04 September 2012 - 21:42:57 Start, devices: 1, device count: 1 (1.00)
04 September 2012 - 21:47:27 Runtime: Device: 0, count: 0, 265 seconds
04 September 2012 - 21:47:27 Device: 0, Count: 0, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 1, average time / count: 265, average time on device: 265 Seconds (4 Minutes, 25 Seconds)
Next ---------------------------------------------------------------------
04 September 2012 - 21:47:29 Start, devices: 1, device count: 2 (0.50)
04 September 2012 - 21:56:02 Runtime: Device: 0, count: 0, 510 seconds
04 September 2012 - 21:56:02 Device: 0, Count: 0, finished.
04 September 2012 - 21:56:12 Runtime: Device: 0, count: 1, 520 seconds
04 September 2012 - 21:56:12 Device: 0, Count: 1, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 2, average time / count: 515, average time on device: 257 Seconds (4 Minutes, 17 Seconds)
Next ---------------------------------------------------------------------
04 September 2012 - 21:56:13 Start, devices: 1, device count: 3 (0.33)
04 September 2012 - 22:08:10 Runtime: Device: 0, count: 0, 711 seconds
04 September 2012 - 22:08:10 Device: 0, Count: 0, finished.
04 September 2012 - 22:08:14 Runtime: Device: 0, count: 2, 715 seconds
04 September 2012 - 22:08:14 Device: 0, Count: 2, finished.
04 September 2012 - 22:08:17 Runtime: Device: 0, count: 1, 718 seconds
04 September 2012 - 22:08:17 Device: 0, Count: 1, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 3, average time / count: 714, average time on device: 238 Seconds (3 Minutes, 58 Seconds)
Next ---------------------------------------------------------------------
04 September 2012 - 22:08:19 Start, devices: 1, device count: 4 (0.25)
04 September 2012 - 22:24:12 Runtime: Device: 0, count: 0, 945 seconds
04 September 2012 - 22:24:12 Device: 0, Count: 0, finished.
04 September 2012 - 22:24:20 Runtime: Device: 0, count: 1, 953 seconds
04 September 2012 - 22:24:20 Device: 0, Count: 1, finished.
04 September 2012 - 22:24:22 Runtime: Device: 0, count: 2, 955 seconds
04 September 2012 - 22:24:22 Device: 0, Count: 2, finished.
04 September 2012 - 22:24:22 Runtime: Device: 0, count: 3, 955 seconds
04 September 2012 - 22:24:22 Device: 0, Count: 3, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 4, average time / count: 952, average time on device: 238 Seconds (3 Minutes, 58 Seconds)
The best average time found: 238 Seconds (3 Minutes, 58 Seconds), with count: 0.33 (3)
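
For anyone who wants to re-check the "Results" lines from a saved log, here is a minimal sketch. It assumes the log format shown above, and it assumes the tool truncates to whole seconds (inferred from the printed figures, not from the tool's source). The names `RUNTIME_RE`, `runtimes` and `summarize` are hypothetical.

```python
import re

# Matches lines such as:
# "04 September 2012 - 22:08:10 Runtime: Device: 0, count: 0, 711 seconds"
RUNTIME_RE = re.compile(r"Runtime: Device: (\d+), count: (\d+), (\d+) seconds")


def runtimes(log_text):
    """Extract the per-task runtimes (in seconds) from one test step."""
    return [int(m.group(3)) for m in RUNTIME_RE.finditer(log_text)]


def summarize(log_text):
    """Return (average time / count, average time on device) in seconds."""
    secs = runtimes(log_text)
    avg_batch = sum(secs) // len(secs)   # mean runtime of the concurrent tasks
    return avg_batch, avg_batch // len(secs)


sample = """\
04 September 2012 - 22:08:10 Runtime: Device: 0, count: 0, 711 seconds
04 September 2012 - 22:08:14 Runtime: Device: 0, count: 2, 715 seconds
04 September 2012 - 22:08:17 Runtime: Device: 0, count: 1, 718 seconds
"""
print(summarize(sample))  # (714, 238)
```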
____________

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4810
Credit: 71,583,880
RAC: 8,785
Australia
Message 1280150 - Posted: 5 Sep 2012, 5:52:27 UTC - in response to Message 1280114.

Tool version 1.2


I was called that once in high school.

____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Wiggo
Joined: 24 Jan 00
Posts: 5165
Credit: 82,942,874
RAC: 71,538
Australia
Message 1280158 - Posted: 5 Sep 2012, 6:20:02 UTC - in response to Message 1280150.

Tool version 1.2


I was called that once in high school.

Sorry, I know that I shouldn't post while I'm trying to get pi...., ah..., blind or getting off topic but this just leaves me with questions to ask so does that mean that you have,

A/ Already been categorised?

B/ At such an early age?

C/ Also been identified as the 2nd revision of your model?

As far as I know they're still trying to figure me out on A. :D

Cheers?


____________

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,602,174
RAC: 102
United States
Message 1280159 - Posted: 5 Sep 2012, 6:21:02 UTC - in response to Message 1280114.

Tool version 1.2

Card: GTX 460
Results:
Device: 0, device count: 4, average time / count: 952, average time on device: 238 Seconds (3 Minutes, 58 Seconds)
The best average time found: 238 Seconds (3 Minutes, 58 Seconds), with count: 0.33 (3)

This value is way off the other readings of 182 and 157oC; the other cards (460/560/660) are in line with expectations.
____________
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,602,174
RAC: 102
United States
Message 1280212 - Posted: 5 Sep 2012, 10:45:55 UTC - in response to Message 1280159.

V 1.3 Some bug fixes and a new workunits folder.
The check "Use all xx workunits" will use all WU in the "workunits" folder.
WARNING: A test with all the workunits may take a while.....

____________
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.

John
Joined: 21 May 99
Posts: 49
Credit: 4,577,933
RAC: 737
United States
Message 1280218 - Posted: 5 Sep 2012, 11:05:50 UTC - in response to Message 1279991.

Thanks for the info.

After a full day of running, about 600 or more tasks completed and only those 2 errors so far.
RAC is over 4000.
The CPU started crunching late last night.
So far so good at 3 WUs each.
GTX 670 and GTX 470 (Win 7 64-bit), Nvidia driver 306.02.
____________

Profile Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 533
Credit: 1,574,837
RAC: 473
Greece
Message 1280323 - Posted: 5 Sep 2012, 17:02:38 UTC
Last modified: 5 Sep 2012, 17:24:10 UTC

Ok, not only was this tool an awesome idea, but the fact that I was able to run it without nagging anybody for help proves you made it idiot-proof too :)

Double mahalo!

And now a quote from Sten, which I've been dying to use:
"Let's not forget The Mighty Ion!"

Starting automatic test: (x41g)
05 September 2012 - 13:41:27 Start, devices: 1, device count: 1 (1.00)
05 September 2012 - 14:20:10 Runtime: Device: 0, count: 0, 2319 seconds
05 September 2012 - 14:20:10 Device: 0, Count: 0, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 1, average time / count: 2319, average time on device: 2319 Seconds (38 Minutes, 39 Seconds)
Next ---------------------------------------------------------------------
05 September 2012 - 14:20:11 Start, devices: 1, device count: 2 (0.50)
05 September 2012 - 15:29:07 Runtime: Device: 0, count: 0, 4132 seconds
05 September 2012 - 15:29:07 Device: 0, Count: 0, finished.
05 September 2012 - 15:29:13 Runtime: Device: 0, count: 1, 4138 seconds
05 September 2012 - 15:29:13 Device: 0, Count: 1, finished.
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 2, average time / count: 4135, average time on device: 2067 Seconds (34 Minutes, 27 Seconds)
Next ---------------------------------------------------------------------
05 September 2012 - 15:29:15 Start, devices: 1, device count: 3 (0.33)


Local time is 19:59 so after 4hrs+ I think I'm going to give up on count 3:) I also think I'm the only person to need the 6hr option in the graph! I know it's not showing but don't worry... GPU-Z crashed too when running 3 tasks. Yeah this is the second run. I've been at this all day!



Edit:
Alt + Prt Sc wouldn't work either, it just saw "behind" the Graph window and took a picture of a small part of my desktop:) Anyway, thanx for thinking to run this on your laptop. I NEVER would have thought to run it otherwise.

Windows 7 32-bit, Second Generation nVidia ION 512MB driver 270.61 35GFLOPS peak.

This is the GT218 chip also found in the G210M, 305M, 310M and 315M

w1hue
Volunteer tester
Joined: 4 Aug 00
Posts: 48
Credit: 1,536,720
RAC: 1,636
United States
Message 1280451 - Posted: 5 Sep 2012, 23:54:34 UTC - in response to Message 1280323.

Ok, not only was this tool an awesome idea but since I was able to run it without nagging anybody for help, proves you made it idiot-proof too:)

Not entirely ... I have not been able to get it to run. Apparently I am the only one in the known universe that hasn't had success, so I must be a super idiot!

John
Joined: 21 May 99
Posts: 49
Credit: 4,577,933
RAC: 737
United States
Message 1280475 - Posted: 6 Sep 2012, 2:28:29 UTC

Made it through a day of server downtime; 270+ tasks reported when the server came up, with plenty in cache. First time ever. I also got a few Astropulse WUs (155 min runtimes once the CPU freed up). Let the good times roll.


____________

Profile Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 533
Credit: 1,574,837
RAC: 473
Greece
Message 1280517 - Posted: 6 Sep 2012, 8:57:43 UTC - in response to Message 1280451.
Last modified: 6 Sep 2012, 9:18:56 UTC

...Apparently I am the only one in the known universe that hasn't had success, so I must be a super idiot!


Ok, I had a quick look at everything you posted, your PC with the 520 and even the front page of your website and:

a) I'm not buying the whole "caveman" thing:)
b) I'm no Lunatics expert but maybe it's because you haven't installed the apps for CPU? It could be you have, and just haven't returned any results yet but CPU lunatics isn't showing up on the application details page of your 520 PC.

Edit:
...machine ID 'lepc'...
Not important, but just so you know, the names of your PCs (and a bunch of other more personal info like IP addresses and other stuff) are only shown to you. In other words, no-one else can see the names of your PCs even if you wanted them to :)

Profile The Chosen
Joined: 24 Jul 00
Posts: 54
Credit: 4,451,004
RAC: 889
Germany
Message 1280558 - Posted: 6 Sep 2012, 12:04:36 UTC - in response to Message 1280517.

@Snowmain

that is a very interesting list ;) thanx

In the next few weeks my Nvidia GT630 gets a new friend that is not in the list.
I'm curious about the WUs/$ :D

greetings
ralf
____________
Boinc runs here on:
Intel i7-3770K (only 4 Cores run BOINC)
Nvidia GT-630 (Fermi)

w1hue
Volunteer tester
Joined: 4 Aug 00
Posts: 48
Credit: 1,536,720
RAC: 1,636
United States
Message 1280804 - Posted: 6 Sep 2012, 23:49:32 UTC - in response to Message 1280517.
Last modified: 6 Sep 2012, 23:57:37 UTC

b) I'm no Lunatics expert but maybe it's because you haven't installed the apps for CPU? It could be you have, and just haven't returned any results yet but CPU lunatics isn't showing up on the application details page of your 520 PC.

Well, no, I haven't installed the Lunatics apps for the CPU -- does that matter? (I guess I could install them and see what happens...) I'm running WU's from projects that don't support GPU's in the CPU, but not any from projects that support the GT 520. I'm currently running SETI, Einstein and Milkyway WU's in the 520 (and NOT in the CPU).

This brings up another question: Since there is no Lunatics NVIDIA GPU app for Astropulse (as of yet...), will my machine run GPU Astropulse WU's using the standard SETI app, or do I need to add something to the app_info file? And if so, what? The answer may be out there somewhere, but a search here and on the Lunatics site hasn't turned it up...
____________

Profile Snowmain
Joined: 17 Nov 05
Posts: 74
Credit: 8,544,235
RAC: 15,089
United States
Message 1280834 - Posted: 7 Sep 2012, 2:04:42 UTC - in response to Message 1280558.
Last modified: 7 Sep 2012, 2:53:27 UTC

@ The Chosen.....and everybody else.


For my 2 cents, the $229 GTX 570 is the price/performance delta.
My hope is that on Sept 16th, when the newest GTX 650 and 660 come out, they will push down the price of the GTX 570. Here's hoping.

Finding power consumption numbers on the mobile processors was very difficult. Since the likelihood of them being used is so low, when I found a number I didn't look any further... so they very well could be wrong (as any of these numbers could be wrong).
____________

Profile Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1280858 - Posted: 7 Sep 2012, 3:50:59 UTC - in response to Message 1280804.

This brings up another question: Since there is no Lunatics NVIDIA GPU app for Astropulse (as of yet...), will my machine run GPU Astropulse WU's using the standard SETI app, or do I need to add something to the app_info file? And if so, what? The answer may be out there somewhere, but a search here and on the Lunatics site hasn't turned it up...

yes, once you enable AP tasks via your web preferences, your host should eventually download the stock nVidia OpenCL Astropulse binaries, and tasks will of course follow when they become available. you really only need an entry for it in the app_info.xml if you want to run multiple tasks in parallel/increase GPU utilization/decrease CPU utilization/mitigate GUI lag/etc.
____________

w1hue
Volunteer tester
Joined: 4 Aug 00
Posts: 48
Credit: 1,536,720
RAC: 1,636
United States
Message 1280862 - Posted: 7 Sep 2012, 4:05:02 UTC - in response to Message 1280858.

yes, once you enable AP tasks via your web preferences, your host should eventually download the stock nVidia OpenCL Astropulse binaries, and tasks will of course follow when they become available.

Thanks for the reply. The stock binaries haven't appeared yet, but maybe I need to de-select AP, update, and then re-select AP.

you really only need an entry for it in the app_info.xml if you want to run multiple tasks in parallel/increase GPU utilization/decrease CPU utilization/mitigate GUI lag/etc.

This brings up another question -- where can I find info on what all can be done via settings in the app_info.xml file??? Can multiple stock AP tasks be executed in the GPU? I didn't think that was possible...
____________

jthon
Volunteer tester
Joined: 12 May 08
Posts: 18
Credit: 10,572,495
RAC: 3,088
United States
Message 1280865 - Posted: 7 Sep 2012, 4:16:39 UTC

not getting any WU's at all, are we on stand-by or something?

and!

how does this work:
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>

Profile Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1280870 - Posted: 7 Sep 2012, 4:31:35 UTC - in response to Message 1280862.

yes, once you enable AP tasks via your web preferences, your host should eventually download the stock nVidia OpenCL Astropulse binaries, and tasks will of course follow when they become available.

Thanks for the reply. The stock binaries haven't appeared yet, but maybe I need to de-select AP, update, and then re-select AP.

perhaps it isn't supposed to download the executable and the associated files until new AP tasks are actually ready to be sent to your host...i really don't know. i would try manually updating the project from within BOINC before i try deselecting and re-selecting AP tasks in the web preferences. worst case it tells you that AP tasks aren't available at this time, and you'll get the binaries when tasks become available.


you really only need an entry for it in the app_info.xml if you want to run multiple tasks in parallel/increase GPU utilization/decrease CPU utilization/mitigate GUI lag/etc.

This brings up another question -- where can I find info on what all can be done via settings in the app_info.xml file??? Can multiple stock AP tasks be executed in the GPU? I didn't think that was possible...

come to think of it, i'm not entirely sure if it would even be worth it to try to run more than a single task at a time on a GT 520. your card may have enough VRAM to run more than one task at a time, but a single task just might come close to maxing out your GPU utilization. really there's only one way to find out - run a single AP task, and then try two at a time. if they finish in less than twice the run time of the task that ran by itself, then your card can benefit from multiple tasks at once. rinse and repeat...although i can tell you right away that the 1GB of VRAM on GPUs like yours (and even my otherwise much more powerful GTX 560 Ti's) will not be enough to run 3 tasks in parallel...not without over-utilizing VRAM and increasing run times. at any rate, here's how the AP nVidia section of my app_info.xml reads:

<app_info>
    <app>
        <name>astropulse_v6</name>
    </app>
    <file_info>
        <name>AP6_win_x86_SSE2_OpenCL_NV_r1316.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v6</app_name>
        <version_num>604</version_num>
        <avg_ncpus>0.04</avg_ncpus>
        <max_ncpus>0.2</max_ncpus>
        <plan_class>cuda_fermi</plan_class>
        <cmdline></cmdline>
        <coproc>
            <type>CUDA</type>
            <count>0.5</count>
        </coproc>
        <file_ref>
            <file_name>AP6_win_x86_SSE2_OpenCL_NV_r1316.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

the <count>n</count> statement is the one that controls the number of tasks running in parallel, where n=1 corresponds to 1 task, n=0.5 corresponds to 2 tasks, n=0.33 corresponds to 3 tasks, and so on and so forth...
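
To make the n-to-tasks mapping concrete, here is a small Python sketch that turns a desired number of parallel tasks into a two-decimal <count> value. The helper name `coproc_count` is hypothetical. Rounding down rather than to nearest is an assumption, made so that the values for all tasks never add up to more than one GPU (for 6 tasks, 0.17 x 6 = 1.02, while 0.16 x 6 = 0.96).

```python
def coproc_count(tasks):
    """Return a two-decimal <count> string for `tasks` parallel GPU tasks.

    The value is 1/tasks rounded DOWN to two decimals, so that
    tasks * count stays at or below 1.00 (an assumption about what
    the client needs in order to start that many tasks at once).
    """
    if not 1 <= tasks <= 100:
        raise ValueError("tasks must be between 1 and 100")
    hundredths = 100 // tasks          # whole hundredths, rounded down
    return f"{hundredths / 100:.2f}"


for n in (1, 2, 3, 4):
    print(n, coproc_count(n))
# 1 1.00
# 2 0.50
# 3 0.33
# 4 0.25
```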
____________

w1hue
Volunteer tester
Joined: 4 Aug 00
Posts: 48
Credit: 1,536,720
RAC: 1,636
United States
Message 1280878 - Posted: 7 Sep 2012, 5:10:32 UTC - in response to Message 1280870.
Last modified: 7 Sep 2012, 5:18:09 UTC

the <count>n</count> statement is the one that controls the number of tasks running in parallel, where n=1 corresponds to 1 task, n=0.5 corresponds to 2 tasks, n=0.33 corresponds to 3 tasks, and so on and so forth...

Well, I know about <count> and I finally found some info on <flops>, but there is stuff in there that I don't entirely understand (for example, what's <avg_ncpus> mean?). It would be nice if the parameters in the app_info file were documented someplace...

I am currently running two SETI tasks in the 520 -- they appear to complete in somewhat less than twice the time for a single task, so it looks like I am coming out ahead. GPU-Z shows 99% GPU Load, 74% Memory Controller Load, 466 MB Memory Used and GPU Temp of 79 deg C when running two SETI enhanced tasks in parallel. GPU Load was 92 - 93% for a single task. Interesting that even under 99% GPU Load, I don't see any effect on the display with GPU tasks running.
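
The "less than twice the time" check can be written as a throughput ratio. The sketch below is only an illustration: the function name is hypothetical, and since no timings were posted for the GT 520, the example numbers are the ION figures from earlier in this thread (2319 seconds for one task, 4135 seconds for two at once).

```python
def throughput_gain(single_seconds, batch_seconds, tasks):
    """Throughput of running `tasks` at once relative to one at a time.

    A result above 1.0 means the batch finished in less than
    `tasks` times the single-task run time, i.e. a net gain.
    """
    return (tasks * single_seconds) / batch_seconds


gain = throughput_gain(single_seconds=2319, batch_seconds=4135, tasks=2)
print(f"{gain:.2f}")  # 1.12
```

Anything above 1.0 means running in parallel is coming out ahead; a value at or below 1.0 means the extra task only adds contention.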

But it would be nice if I could get Fred's test program to run on my machine...
____________


Copyright © 2014 University of California