Multiple Cuda tasks on one GPU? Broke it.

Author	Message
kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004	Message 1011238 - Posted: 3 Jul 2010, 14:05:06 UTC Last modified: 3 Jul 2010, 14:10:13 UTC
OK.... What did I hose up now??? I tried to change the app_info on one of my rigs that still has some Cuda tasks left, to see what it would do running 2 tasks on the GPU (GTX 260). I changed <count>1</count> to <count>.50</count> for 2 instances. Now it won't start any Cuda tasks at all. The one that was running went to "waiting to run" status. Thinking it might have gotten stuck, I aborted that one task. I restarted Boinc, and now it is running 4 tasks on the CPU, but won't start anything on the GPU. Is this supposed to work with any combination of cards, dll's, etc.?
"Time is simply the mechanism that keeps everything from happening all at once."

Helli_retiered Volunteer tester Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0	Message 1011252 - Posted: 3 Jul 2010, 14:55:21 UTC - in response to Message 1011238. Last modified: 3 Jul 2010, 15:00:00 UTC
A few days ago I set <count>0.5</count>. After a restart, BOINC started two tasks on both GTX 280s. I let it run overnight, but switched back to <count>1</count> in the morning because the computing time was 40 min instead of 11 min.
Helli
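The edit both posters describe comes down to the <count> value inside the <coproc> element of an app_version: 1 claims a whole GPU per task, 0.5 half of one, so the client may schedule two tasks per GPU. A minimal Python sketch of that edit, assuming an app_info.xml that nests <coproc><count> under each <app_version>; the SAMPLE file and the function name set_gpu_instances are hypothetical, not part of any released tool. It only rewrites the value; whether the driver and app can actually run the extra contexts is a separate question, as this thread shows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down app_info.xml; a real file also lists
# <app>, <file_info> and <file_ref> entries, which this sketch leaves alone.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
  </app_version>
</app_info>"""


def set_gpu_instances(app_info_xml: str, instances: int) -> str:
    """Return the XML with every <coproc><count> set to 1/instances.

    <count> is the fraction of one GPU a task claims, so 0.5 lets the
    client schedule two tasks per GPU, 0.333333 three, and so on.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    root = ET.fromstring(app_info_xml)
    for coproc in root.iter("coproc"):
        count = coproc.find("count")
        if count is not None:
            # :g drops trailing zeros: 1 -> "1", 0.5 -> "0.5"
            count.text = f"{1 / instances:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_gpu_instances(SAMPLE, 2))
```

Both posters restarted BOINC after editing, since the client picks up app_info.xml changes when it starts.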

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004	Message 1011259 - Posted: 3 Jul 2010, 15:07:04 UTC
Well..... It's REALLY borked now. When the MB task it was running finished, it launched 25 Cuda tasks.... None of which is running on the GPU. I'm gonna change it back to 1 and hope Boinc can sort it later....much later.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1011264 - Posted: 3 Jul 2010, 15:13:50 UTC - in response to Message 1011259. Last modified: 3 Jul 2010, 15:14:46 UTC
I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista/Win7), and maybe only with CUDA 3.0 & 3.1 builds (even less certain of that).
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

SciManStev Volunteer tester Joined: 20 Jun 99 Posts: 6662 Credit: 121,090,076 RAC: 0	Message 1011265 - Posted: 3 Jul 2010, 15:15:13 UTC - in response to Message 1011259.
> Well..... It's REALLY borked now. When the MB task it was running finished, it launched 25 Cuda tasks.... None of which is running on the GPU. I'm gonna change it back to 1 and hope Boinc can sort it later....much later.
I feel your pain. I really do. I must also say that this message, and the one where you got a gazillion credits, made me laugh, as the way you described the incidents was just excellent! It's nice to smile a little when there is nothing else you can do. :)
Steve
Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004	Message 1011267 - Posted: 3 Jul 2010, 15:17:00 UTC - in response to Message 1011264.
> I could be wrong, but I thought multiple Cuda contexts on one card are only possible under the WDDM driver model (Vista/Win7), and maybe only with CUDA 3.0 & 3.1 builds (even less certain of that). Jason
LOL..... It would appear you may be quite right, my friend. This is on Win2K Advanced Server with your installer and 2.3 dll's. I guess I just started a new Boinc project... Don'ttrythis@home

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004	Message 1011275 - Posted: 3 Jul 2010, 15:51:24 UTC Last modified: 3 Jul 2010, 15:51:48 UTC
Well....finally got it going again. Took half an hour to get the app_info edited because the rig was so tied up. Did the edit back to 1 and rebooted, now there are 24 Cuda tasks waiting to run and 1 running on the GPU again. Meow.

Area 51 Joined: 31 Jan 04 Posts: 965 Credit: 42,193,520 RAC: 0	Message 1011282 - Posted: 3 Jul 2010, 16:16:32 UTC - in response to Message 1011264.
> I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista/Win7), and maybe only with CUDA 3.0 & 3.1 builds (even less certain of that).
I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY!

TheFreshPrince a.k.a. BlueTooth76 Joined: 4 Jun 99 Posts: 210 Credit: 10,315,944 RAC: 0	Message 1011298 - Posted: 3 Jul 2010, 17:28:56 UTC - in response to Message 1011282.
> I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY!
I run 3 tasks on my GTX470 and that gives about 50% higher output. But I think that's because 1 task doesn't use full resources on the Fermi cards yet.
Rig name: "x6Crunchy" OS: Win 7 x64 MB: Asus M4N98TD EVO CPU: AMD X6 1055T 2.8(1,2v) GPU: 2x Asus GTX560ti Member of: Dutch Power Cows

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004	Message 1011304 - Posted: 3 Jul 2010, 17:53:08 UTC - in response to Message 1011298.
> I run 3 tasks on my GTX470 and that gives about 50% higher output. But I think that's because 1 task doesn't use full resources on the Fermi cards yet.
Yes, apparently this tweak is only helpful for the Fermi-class GPUs. My GTX 295s and 260s usually show 93-97% GPU utilization in the normal setup, so I knew that if it worked at all, it would be a marginal difference at best. Now we know.....

Questor Volunteer tester Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157	Message 1019415 - Posted: 25 Jul 2010, 10:56:41 UTC - in response to Message 1011264. Last modified: 25 Jul 2010, 11:01:40 UTC
> I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista/Win7), and maybe only with CUDA 3.0 & 3.1 builds (even less certain of that). Jason
Tried this out this weekend. I am currently running 3 tasks on a GTX470. The machine is using the downloaded stock apps (CUDA 3.0) and is running Windows XP 64-bit. Haven't done any benchmarking yet, just testing that it runs OK. This is on an i7 920 with 2 GPUs, and I have set avg/max ncpus to 0.4, which leaves 2 of the 8 threads free to help feed the GPUs (probably overkill, as no threads are maxed out and load times are about 20 sec per task). This is with drivers 197.75 and 257.21.
John.
GPU Users Group

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1019420 - Posted: 25 Jul 2010, 11:18:38 UTC - in response to Message 1019415. Last modified: 25 Jul 2010, 11:22:21 UTC
> Tried this out this weekend. I am currently running 3 tasks on a GTX470...and is running on Windows XP 64 bit.
Great. I did some more reading on the driver model differences themselves to clarify. The major difference is the memory management model used, followed by the Fermi-specific context switching hardware. On XP with Fermi, the first limit you'll hit on the number of instances is the on-board memory, since each instance takes a portion of the physical video RAM. Next comes the overhead of switching between contexts, traded off against the utilisation benefits. Three instances sounds 'about right' for the stock Fermi application, before overhead eats into the increased utilisation benefit.
For completeness, on Vista/Win7 with the WDDM driver model it is possible to run more instances, since each instance will see its own memory space. Overheads are higher once the total video RAM is utilised, though, as video memory is paged across the PCIe bus on context switch if needed. By rights, the end result in terms of utilisation & overheads is similar provided the total memory summed from all apps using the video memory is less than the total available, though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well.
Jason
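Jason's memory argument can be sketched as two rules of thumb. This is a deliberately simplified model, not how the drivers actually allocate memory: it assumes every instance needs a fixed amount of video memory (per_instance_mb, a hypothetical figure you would have to measure), treats anything else on the card as a fixed other_mb, and ignores the context-switch overhead he also mentions. Under the XP driver model the instance count is capped by how many footprints fit in physical VRAM; under WDDM more can start, but once the summed footprint exceeds VRAM, memory is paged across PCIe on context switch. The function names are hypothetical.

```python
def max_instances_xp(vram_mb: int, per_instance_mb: int, other_mb: int = 0) -> int:
    """XP driver model: each instance needs its own slice of physical VRAM."""
    return max(0, (vram_mb - other_mb) // per_instance_mb)


def wddm_will_page(vram_mb: int, per_instance_mb: int, instances: int,
                   other_mb: int = 0) -> bool:
    """WDDM: instances may exceed VRAM, but then memory is paged over PCIe."""
    return instances * per_instance_mb + other_mb > vram_mb


if __name__ == "__main__":
    # Illustrative only: 1280 MB is the GTX 470's memory size; the 300 MB
    # per-instance footprint and 100 MB for the desktop are made-up values.
    vram, per_task, desktop = 1280, 300, 100
    print("XP ceiling:", max_instances_xp(vram, per_task, desktop))
    for n in (3, 4, 5):
        print(n, "instances page under WDDM:", wddm_will_page(vram, per_task, n, desktop))
```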

Terror Australis Volunteer tester Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44	Message 1019461 - Posted: 25 Jul 2010, 14:12:30 UTC - in response to Message 1019420. Last modified: 25 Jul 2010, 14:15:51 UTC
> though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well. Jason
Hi Jason. What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR (angle range) of the unit. Is this the most efficient combination for XP? If not, what combination do you recommend?
T.A.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1019515 - Posted: 25 Jul 2010, 18:09:08 UTC - in response to Message 1019461. Last modified: 25 Jul 2010, 18:09:53 UTC
> Hi Jason. What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit. Is this the most efficient combination for XP? If not, what combination do you recommend? T.A.
Basically any Vista/Win7 drivers should be WDDM type, which is Microsoft's new model. Those are less mature than the XP ones. Since the multiple application contexts are working under XP32, whatever works for you should be fine up to physical video RAM limits. I guess you've probably already compared times running a single instance versus 2 & 3, and decided 3 x stock is the most productive (?). These comments shouldn't change that, and are only 'sussing out' what actually changed to make this possible & beneficial in the new hardware (which is important for development; the 200 series was at best a wash this way).
For reference, with a newer experimental build that has higher GPU utilisation, under Win7 x64 on the same driver version (but WDDM), my 480 runs midrange tasks in about 7 minutes one at a time, or 2 in 13 minutes, so each task is ~30 seconds better off running 2. Haven't tried 3 yet, but I expect it to give slower total throughput, as the utilisation is already quite high. As we (gradually) approach a more highly optimised Fermi-compatible build, and hardware utilisation improves, I'd expect a single instance to provide the highest throughput, since that mode creates less switching overhead (even if that overhead is very small in Fermi drivers/hardware).
Jason
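The arithmetic behind Jason's "~30 seconds" is the test for whether running several tasks at once pays off: divide the wall time of a concurrent batch by the number of tasks and compare with the single-task time. A small Python sketch, assuming the concurrent tasks start and finish together; the function names are hypothetical.

```python
def per_task_minutes(batch_minutes: float, concurrent: int) -> float:
    """Effective wall time per task when `concurrent` tasks share one GPU."""
    return batch_minutes / concurrent


def seconds_saved_per_task(single_minutes: float, batch_minutes: float,
                           concurrent: int) -> float:
    """Positive if running concurrently beats one at a time; negative if worse."""
    return (single_minutes - per_task_minutes(batch_minutes, concurrent)) * 60


if __name__ == "__main__":
    # Jason's GTX 480: 7 min alone, 2 tasks together in 13 min.
    print(seconds_saved_per_task(7, 13, 2))   # 30.0
    # Helli's GTX 280s, read as 2 tasks finishing together in 40 min vs 11 min alone.
    print(seconds_saved_per_task(11, 40, 2))  # -540.0
```

The same comparison is what Josef means below: doing 3 at once "doesn't get through 3 times as many, just somewhat more", and only a positive result makes the extra instances worthwhile.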

Questor Volunteer tester Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157	Message 1019532 - Posted: 25 Jul 2010, 19:12:47 UTC - in response to Message 1019461. Last modified: 25 Jul 2010, 19:13:18 UTC
> Hi Jason. What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit. Is this the most efficient combination for XP? If not, what combination do you recommend? T.A.
Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download, and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks, while the completion times are for a single task. That should nicely confuse things?
John

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1019545 - Posted: 25 Jul 2010, 21:28:17 UTC - in response to Message 1019532.
> Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download, and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks, while the completion times are for a single task. That should nicely confuse things? John
Doing 3 at once doesn't get through 3 times as many, just somewhat more than doing them 1 at a time serially. It just looks like a faster GTX 470 to the server code.
T.A.'s GTX 470 system has a download quota of 2312 (289 * 8) for GPU and has only gotten 124 so far today, so that doesn't seem to be an issue. The in-progress limit is probably in effect; whether it gets bumped from 320 per GPU to 2560 or limits are removed totally Monday, it should be able to build cache further. Even if a -12 error or similar resets the quota to less than 800, it is turning in enough good work to rebuild the quota before the Tuesday shutdown. Lasting through the outage may depend more on how difficult it is to actually download assigned work than on how much work was assigned.
Joe

Terror Australis Volunteer tester Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44	Message 1019657 - Posted: 26 Jul 2010, 8:57:47 UTC - in response to Message 1019532.
> Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download, and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks, while the completion times are for a single task. That should nicely confuse things?
I only got it up and running early Sunday morning my time. The computer had approx. 320 GPU units in its cache that I managed to stop from self-destructing during the card/driver/app changeover. It's crunching these quite happily and getting a 1-for-1 replacement every time it reports in. As I write, it's 0200 Monday, Berkeley time. We'll see what happens when the brakes come off in 6 or 7 hours. :-)
T.A.

©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.