Multiple Cuda tasks on one GPU? Broke it.

Message boards : Number crunching : Multiple Cuda tasks on one GPU? Broke it.

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011238 - Posted: 3 Jul 2010, 14:05:06 UTC
Last modified: 3 Jul 2010, 14:10:13 UTC

OK....
What did I hose up now???

I tried to change the app_info on one of my rigs that still has some Cuda tasks left to see what it would do running 2 tasks on the GPU (GTX260).

I set <count>1</count> to .50
2 instances.
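
For anyone trying the same tweak, here is a rough sketch of what the edit means in practice. It assumes the BOINC client treats <count> as the fraction of one GPU that each task claims, so roughly floor(1 / count) tasks can share a card. The app_info.xml fragment, the app name and version number, and the helper tasks_per_gpu are illustrative placeholders, not copied from the rig above.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical <app_version> fragment; app name and version are placeholders.
APP_INFO_FRAGMENT = """
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <coproc>
        <type>CUDA</type>
        <count>0.5</count>
    </coproc>
</app_version>
"""


def tasks_per_gpu(app_version_xml: str) -> int:
    """Return how many tasks can share one GPU for the given <count> value.

    Assumption: each task claims `count` of a GPU, so floor(1 / count)
    tasks fit on a single card.
    """
    root = ET.fromstring(app_version_xml)
    count = float(root.findtext("coproc/count"))
    if not 0 < count <= 1:
        raise ValueError("this sketch only handles 0 < count <= 1")
    # The small epsilon guards against float rounding, e.g. 1 / 0.2.
    return math.floor(1.0 / count + 1e-9)


print(tasks_per_gpu(APP_INFO_FRAGMENT))  # 2
```

This only describes what the client is being asked to schedule. Whether the tasks actually start on the GPU depends on the OS, driver, and CUDA build, as the rest of this thread shows.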

Now it won't start any Cuda tasks at all. The one that was running went to waiting to run status. Thinking it might have gotten stuck, I aborted that one task. Restarted Boinc, and now it is running 4 tasks on the CPU, but won't start anything on the GPU.

Is this supposed to work with any combination of cards, dll's, etc.?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1011238
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1011252 - Posted: 3 Jul 2010, 14:55:21 UTC - in response to Message 1011238.  
Last modified: 3 Jul 2010, 15:00:00 UTC

A few days ago I set <count>0.5</count>. After a restart, BOINC started two tasks on both GTX 280s. I let it run overnight, but switched back to 1 in the morning because the computing time was 40 min instead of 11 min.



Helli
ID: 1011252
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011259 - Posted: 3 Jul 2010, 15:07:04 UTC

Well.....
It's REALLY borked now.

When the MB task it was running finished, it launched 25 Cuda tasks....
None of which is running on the GPU.

I'm gonna change it back to 1 and hope Boinc can sort it later....much later.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1011259
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1011264 - Posted: 3 Jul 2010, 15:13:50 UTC - in response to Message 1011259.  
Last modified: 3 Jul 2010, 15:14:46 UTC

I could be wrong, but I thought multiple Cuda contexts on one GPU is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1011264
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1011265 - Posted: 3 Jul 2010, 15:15:13 UTC - in response to Message 1011259.  

Well.....
It's REALLY borked now.

When the MB task it was running finished, it launched 25 Cuda tasks....
None of which is running on the GPU.

I'm gonna change it back to 1 and hope Boinc can sort it later....much later.


I feel your pain. I really do. I also must say that this message and the one where you got a gazillion credits made me laugh, as the way you described the incidents was just excellent! It's nice to smile a little when there is nothing else you can do. :)

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1011265
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011267 - Posted: 3 Jul 2010, 15:17:00 UTC - in response to Message 1011264.  

I could be wrong, but I thought multiple Cuda contexts on one card is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)

Jason

LOL.....
It would appear you may be quite right, my friend.
This is on Win2K Advanced Server with your installer and 2.3 dll's.

I guess I just started a new Boinc project...
Don'ttrythis@home

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1011267
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011275 - Posted: 3 Jul 2010, 15:51:24 UTC
Last modified: 3 Jul 2010, 15:51:48 UTC

Well....finally got it going again.
Took half an hour to get the app_info edited because the rig was so tied up.
Did the edit back to 1 and rebooted, now there are 24 Cuda tasks waiting to run and 1 running on the GPU again.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1011275
Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1011282 - Posted: 3 Jul 2010, 16:16:32 UTC - in response to Message 1011264.  

I could be wrong, but I thought multiple Cuda contexts on one GPU is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)


I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY!
ID: 1011282
TheFreshPrince a.k.a. BlueTooth76
Joined: 4 Jun 99
Posts: 210
Credit: 10,315,944
RAC: 0
Netherlands
Message 1011298 - Posted: 3 Jul 2010, 17:28:56 UTC - in response to Message 1011282.  

I could be wrong, but I thought multiple Cuda contexts on one GPU is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)


I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY!


I run 3 tasks on my GTX470 and that gives about 50% higher output.

But I think that's because 1 task doesn't use full resources on the Fermi cards yet.
Rig name: "x6Crunchy"
OS: Win 7 x64
MB: Asus M4N98TD EVO
CPU: AMD X6 1055T 2.8(1,2v)
GPU: 2x Asus GTX560ti
Member of: Dutch Power Cows
ID: 1011298
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011304 - Posted: 3 Jul 2010, 17:53:08 UTC - in response to Message 1011298.  

I could be wrong, but I thought multiple Cuda contexts on one GPU is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)


I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY!


I run 3 tasks on my GTX470 and that gives about 50% higher output.

But I think that's because 1 task doesn't use full resources on the Fermi cards yet.

Yes, apparently this tweak is only helpful for the Fermi class GPU's.
My GTX295's and 260's usually show 93-97% GPU utilization in the normal setup, so I knew if it worked at all, it would be a marginal difference at best.

Now we know.....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1011304
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1019415 - Posted: 25 Jul 2010, 10:56:41 UTC - in response to Message 1011264.  
Last modified: 25 Jul 2010, 11:01:40 UTC

I could be wrong, but I thought multiple Cuda contexts on one GPU is only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)

Jason



Tried this out this weekend. I am currently running 3 tasks on a GTX470.

The machine is using the downloaded stock apps (CUDA3.0) and is running on Windows XP 64 bit.

Haven't done any benchmarking yet - just testing that it runs OK.

This is on an i7 920 with 2 GPUs, and I have set avg/max ncpus to 0.4, which leaves 2 of the 8 threads free to help feed the GPUs (probably overkill, as no threads are maxed out and load times are about 20 sec per task).

This is with drivers 197.75 and 257.21.

John.
GPU Users Group



ID: 1019415
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1019420 - Posted: 25 Jul 2010, 11:18:38 UTC - in response to Message 1019415.  
Last modified: 25 Jul 2010, 11:22:21 UTC

Tried this out this weekend. I am currently running 3 tasks on a GTX470...and is running on Windows XP 64 bit.


Great. I did some more reading on the driver model differences themselves to clarify. The major difference is the memory management model used, followed by the Fermi-specific context switching hardware. On XP with Fermi, the limit you'll hit first with the number of instances is the onboard memory, since each instance takes a portion of the physical RAM. After that comes the overhead of switching between contexts, which trades off against the utilisation benefit. Three instances sounds 'about right' for the stock Fermi application, before overhead eats into the increased utilisation benefit.

For completeness, on Vista/Win7 with the WDDM driver model it is possible to run more instances, since each instance sees its own memory space. Overheads are higher once the total video RAM is used up though, as video memory is paged across the PCIe bus on a context switch if needed.

By rights, the end result in terms of utilisation and overheads is similar, provided the total memory summed over all apps using the video memory is less than the total available, though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well.
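
A back-of-the-envelope way to picture the XP memory limit described above: divide the card's usable memory by what each instance needs. The 1280 MB figure is the standard GTX 470 memory size. The per-instance footprint and the display reservation are made-up numbers for illustration only, not measurements of the SETI@home app, and the function name is hypothetical.

```python
def max_instances_by_memory(total_vram_mb: int,
                            per_instance_mb: int,
                            reserved_mb: int = 0) -> int:
    """Upper bound on concurrent instances when each one needs its own
    slice of physical video RAM (the XP driver model case).

    This ignores context-switch overhead, so the most productive count
    may well be lower than the number returned here.
    """
    if per_instance_mb <= 0:
        raise ValueError("per_instance_mb must be positive")
    usable_mb = total_vram_mb - reserved_mb
    return max(0, usable_mb // per_instance_mb)


# GTX 470: 1280 MB on board. The 300 MB per instance and the 100 MB held
# back for the display are assumed values, purely for illustration.
print(max_instances_by_memory(1280, per_instance_mb=300, reserved_mb=100))  # 3
```

Under WDDM the same division is not a hard ceiling, since memory can be paged across the PCIe bus, but going past it is where the extra overhead mentioned above would start.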

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1019420
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1019461 - Posted: 25 Jul 2010, 14:12:30 UTC - in response to Message 1019420.  
Last modified: 25 Jul 2010, 14:15:51 UTC

though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well.

Jason


Hi Jason.
What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit.

Is this the most efficient combination for XP? If not, what combination do you recommend?

T.A.
ID: 1019461
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1019515 - Posted: 25 Jul 2010, 18:09:08 UTC - in response to Message 1019461.  
Last modified: 25 Jul 2010, 18:09:53 UTC

Hi Jason.
What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit.

Is this the most efficient combination for XP? If not, what combination do you recommend?

T.A.


Basically any Vista/Win7 drivers should be WDDM type, which is Microsoft's new model. Those are less mature than the XP ones. Since multiple application contexts are working under XP32, whatever works for you should be fine up to the physical video RAM limit. I guess you've probably already compared times running a single instance versus 2 and 3, and decided 3 x stock is the most productive (?). These comments shouldn't change that; they are only 'sussing out' what actually changed to make this possible and beneficial on the new hardware (which is important for development; the 200 series was at best a wash this way).

For reference, with a newer experimental build that has higher GPU utilisation, under Win7 x64 on the same driver version (but WDDM) my 480 runs midrange tasks in about 7 minutes one at a time, or 2 in 13 minutes, so each task is ~30 seconds better off running 2. Haven't tried 3 yet, but I expect it to be slower total throughput, as the utilisation is already quite high.
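
The arithmetic behind those numbers, as a small sketch. It assumes that tasks started together also finish together, so a batch of N tasks taking T minutes costs T / N minutes each. The function names are only for illustration. The GTX 280 lines reuse Helli's figures from earlier in the thread and assume the 40 minutes was the time for each of two concurrent tasks.

```python
def minutes_per_task(batch_minutes: float, concurrent_tasks: int) -> float:
    """Effective wall-clock minutes per task for a batch run side by side."""
    return batch_minutes / concurrent_tasks


def tasks_per_hour(batch_minutes: float, concurrent_tasks: int) -> float:
    """Throughput in tasks per hour for the same batch."""
    return 60.0 * concurrent_tasks / batch_minutes


# GTX 480 under Win7 x64: 1 task in 7 min versus 2 tasks in 13 min.
single = minutes_per_task(7, 1)    # 7.0 minutes per task
double = minutes_per_task(13, 2)   # 6.5 minutes per task
saving_seconds = (single - double) * 60
print(f"saving per task when running 2: {saving_seconds:.0f} s")

# GTX 280 (assumed reading of Helli's post): 11 min alone, 40 min for 2.
gtx280_single = tasks_per_hour(11, 1)
gtx280_double = tasks_per_hour(40, 2)
print(f"GTX 280: {gtx280_single:.2f} vs {gtx280_double:.2f} tasks/hour")
```

On these figures the Fermi card gains a little from doubling up, while the 200 series card loses throughput, which matches the "at best a wash" remark above.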

As we (gradually) approach a more highly optimised Fermi-compatible build, and hardware utilisation improves, I'd expect a single instance to provide the highest throughput, since that mode creates less switching overhead (even if that overhead is very small in Fermi drivers/hardware).

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1019515
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1019532 - Posted: 25 Jul 2010, 19:12:47 UTC - in response to Message 1019461.  
Last modified: 25 Jul 2010, 19:13:18 UTC

though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well.

Jason


Hi Jason.
What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit.

Is this the most efficient combination for XP? If not, what combination do you recommend?

T.A.


Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks while the completion times are for a single task. That should nicely confuse things?


John
GPU Users Group



ID: 1019532
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1019545 - Posted: 25 Jul 2010, 21:28:17 UTC - in response to Message 1019532.  

...I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit.

Is this the most efficient combination for XP? If not, what combination do you recommend?

T.A.

Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks while the completion times are for a single task. That should nicely confuse things?

John

Doing 3 at once doesn't get through 3 times as many, just somewhat more than doing them 1 at a time serially. It just looks like a faster GTX 470 to the server code.

T.A.'s GTX 470 system has a download quota of 2312 (289 * 8) for GPU and has only gotten 124 so far today, so that doesn't seem to be an issue. The in-progress limit is probably in effect; whether it gets bumped from 320 per GPU to 2560 or limits are removed totally on Monday, it should be able to build cache further. Even if a -12 error or similar resets the quota to less than 800, it is turning in enough good work to rebuild the quota before the Tuesday shutdown. Lasting through the outage may depend more on how difficult it is to actually download assigned work than on how much work was assigned.
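
The quota figures above, worked through as a quick sketch. The base value of 289 and the factor of 8 come straight from the post; the variable names are hypothetical, and reading 2560 as 320 * 8 is only an observation about the numbers, not something the server code is confirmed to do.

```python
BASE_QUOTA = 289      # per-unit daily quota quoted in the post
MULTIPLIER = 8        # factor quoted in the post for the GPU quota
RECEIVED_TODAY = 124  # tasks downloaded so far today

daily_quota = BASE_QUOTA * MULTIPLIER            # 2312
remaining_today = daily_quota - RECEIVED_TODAY   # 2188

CURRENT_IN_PROGRESS_LIMIT = 320                  # per GPU, as quoted
# Assumption: the proposed 2560 is the same limit scaled by 8.
raised_limit = CURRENT_IN_PROGRESS_LIMIT * MULTIPLIER  # 2560

print(daily_quota, remaining_today, raised_limit)
```

With more than two thousand downloads still available for the day, the daily quota is clearly not the constraint here, which leaves the in-progress limit as the likely one.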
Joe
ID: 1019545
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1019657 - Posted: 26 Jul 2010, 8:57:47 UTC - in response to Message 1019532.  

Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks while the completion times are for a single task. That should nicely confuse things?

I only got it up and running early Sunday morning my time. The computer had approx 320 GPU units in its cache that I managed to stop from self-destructing during the card/driver/app changeover. It's crunching these quite happily and getting a 1-for-1 replacement every time it reports in. As I write, it's 0200 Monday, Berkeley time. We'll see what happens when the brakes come off in 6 or 7 hours. :-)

T.A.
ID: 1019657