Multiple Cuda tasks on one GPU? Broke it.
Author | Message |
---|---|
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
OK.... What did I hose up now??? I tried to change the app_info on one of my rigs that still has some Cuda tasks left to see what it would do running 2 tasks on the GPU (GTX260). I changed <count>1</count> to .50 to get 2 instances. Now it won't start any Cuda tasks at all. The one that was running went to "waiting to run" status. Thinking it might have gotten stuck, I aborted that one task. Restarted Boinc, and now it is running 4 tasks on the CPU, but won't start anything on the GPU. Is this supposed to work with any combination of cards, dll's, etc.? "Freedom is just Chaos, with better lighting." Alan Dean Foster |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Well..... It's REALLY borked now. When the MB task it was running finished, it launched 25 Cuda tasks.... None of which is running on the GPU. I'm gonna change it back to 1 and hope Boinc can sort it later....much later. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista / Win7), and maybe only with Cuda 3.0 & 3.1 builds (even less certain of that). Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
SciManStev Send message Joined: 20 Jun 99 Posts: 6652 Credit: 121,090,076 RAC: 0 |
Well..... I feel your pain. I really do. I also must say that this message, and the one where you got a gazillion credits, made me laugh, as the way you described the incidents was just excellent! It's nice to smile a little when there is nothing else you can do. :) Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"I could be wrong, but I thought multiple Cuda contexts on one card are only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)" LOL..... It would appear you may be quite right, my friend. This is on Win2K Advanced Server with your installer and 2.3 dll's. I guess I just started a new Boinc project... Don'ttrythis@home "Freedom is just Chaos, with better lighting." Alan Dean Foster |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Well....finally got it going again. Took half an hour to get the app_info edited because the rig was so tied up. Did the edit back to 1 and rebooted, now there are 24 Cuda tasks waiting to run and 1 running on the GPU again. Meow. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Area 51 Send message Joined: 31 Jan 04 Posts: 965 Credit: 42,193,520 RAC: 0 |
"I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)" I tried it recently under Windows 7 on my 285. Performance was really bad, so I backed out that change. It didn't error, it just ran VVVERRRRRYYYY SLLLOWWWLLLYYYY! |
TheFreshPrince a.k.a. BlueTooth76 Send message Joined: 4 Jun 99 Posts: 210 Credit: 10,315,944 RAC: 0 |
"I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)" I run 3 tasks on my GTX470 and that gives about 50% higher output. But I think that's because 1 task doesn't use full resources on the Fermi cards yet. Rig name: "x6Crunchy" OS: Win 7 x64 MB: Asus M4N98TD EVO CPU: AMD X6 1055T 2.8(1,2v) GPU: 2x Asus GTX560ti Member of: Dutch Power Cows |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)" Yes, apparently this tweak is only helpful for the Fermi class GPU's. My GTX295's and 260's usually show 93-97% GPU utilization in the normal setup, so I knew if it worked at all, it would be a marginal difference at best. Now we know..... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Questor Send message Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
"I could be wrong, but I thought multiple Cuda contexts on one GPU are only possible under the WDDM driver model (Vista / Win7) and maybe only with cuda 3.0 & 3.1 builds (even less certain of that)" Tried this out this weekend. I am currently running 3 tasks on a GTX470. The machine is using the downloaded stock apps (CUDA3.0) and is running on Windows XP 64 bit. Haven't done any benchmarking yet - just testing that it runs OK. This is on an i7 920 with 2 GPUs, and I have set avg/max ncpus to 0.4, which leaves 2 out of 8 threads free to help feed the GPUs (probably overkill, as there are no maxed out threads and load times are about 20 sec per task). This is with drivers 19775 and 25721. John. GPU Users Group |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Tried this out this weekend. I am currently running 3 tasks on a GTX470...and is running on Windows XP 64 bit." Great. I did some more reading on the driver model differences themselves to clarify. The major difference is the memory management model used, followed by the Fermi specific context switching hardware. On XP with Fermi, the limit you'll hit first with number of instances is the memory on board, since each instance takes a portion of the physical RAM. After that comes the overhead of switching between contexts, as a tradeoff against utilisation benefits. Three instances sounds 'about right' for the stock Fermi application, before overhead eats into the increased utilisation benefit. For completeness, on Vista/Win7 with the WDDM driver model it is possible to run more instances, since each instance will see its own memory space. Overheads are higher once the total video RAM is utilised though, as video memory is paged across the PCIe bus on context switch if needed. By rights, the end result in terms of utilisation & overheads is similar, provided the total memory summed from all apps using the video memory is less than the total available, though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
"though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well." Hi Jason. What version drivers are you referring to? I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit. Is this the most efficient combination for XP? If not, what combination do you recommend? T.A. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Hi Jason." Basically any Vista/Win7 drivers should be WDDM type, which is Microsoft's new model. Those are less mature than the XP ones. Since the multiple application contexts are working under XP32, whatever works for you should be fine up to physical video RAM limits. I guess you've probably already compared times running a single instance versus 2 & 3, and decided 3 x stock is the most productive (?). These comments shouldn't change that, and are only 'sussing out' what actually changed to make this possible & beneficial in the new hardware (which is important for development; the 200 series was at best a wash this way). For reference, with a newer experimental build that has higher GPU utilisation, under Win7 x64 on the same driver version (but WDDM), my 480 runs midrange tasks in about 7 minutes one at a time, or 2 in 13 minutes, so each task is ~30 seconds better off running 2. Haven't tried 3 yet, but I expect it to give slower total throughput, as the utilisation is already quite high. As we (gradually) approach a more highly optimised Fermi-compatible build, and hardware utilisation improves, I'd expect single instance to provide the highest throughput, since that mode creates less switching overhead (even if that overhead is very small in Fermi drivers/hardware). Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Questor Send message Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
"though no doubt the newer WDDM drivers still have some room to mature until they reach similar efficiency to the XP ones, and under heavy loading cache effects will come into play as well." Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device, but you are effectively getting through up to 3 times as many tasks while the completion times are for a single task. That should nicely confuse things? John GPU Users Group |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"...I'm currently running 3 tasks on a GTX470 under XP-32 using 258.96 drivers and V32_30_14 dll's with the V6.10 CUDA app. They are taking between 20 and 40 mins per task depending on the AR of the unit." Doing 3 at once doesn't get through 3 times as many, just somewhat more than doing them 1 at a time serially. It just looks like a faster GTX 470 to the server code. T.A.'s GTX 470 system has a download quota of 2312 (289 * 8) for GPU and has only gotten 124 so far today, so that doesn't seem to be an issue. The in-progress limit is probably in effect; whether it gets bumped from 320 per GPU to 2560 or limits are removed totally on Monday, it should be able to build cache further. Even if a -12 error or similar resets the quota to less than 800, it is turning in enough good work to rebuild the quota before the Tuesday shutdown. Lasting through the outage may depend more on how difficult it is to actually download assigned work than on how much work was assigned. Joe |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
"Have you been running 3 tasks with the download quotas in place? Just wondered what that does to the number of tasks you can download and whether you run dry during the outage. As far as SETI/Boinc is concerned you only have one device but are effectively getting through up to 3 times as many tasks but the completion times are for a single task. That should nicely confuse things?" I only got it up and running early Sunday morning my time. The computer had approx 320 GPU units in its cache that I managed to stop from self destructing during the card/driver/app changeover. It's crunching these quite happily and getting a 1 for 1 replacement every time it reports in. As I write it's 0200 Monday, Berkeley time. We'll see what happens when the brakes come off in 6 or 7 hours. :-) T.A. |
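
For anyone wanting to check the arithmetic in the thread, here is a rough Python sketch. It assumes that the `<count>` value in app_info is the fraction of one GPU each task reserves, so the client can run as many tasks per GPU as fit into 1.0; that reading matches the ".50 for 2 instances" edit above but is an assumption, not a statement of BOINC's scheduler logic. The function names are made up for illustration. The timings are the ones jason_gee quoted for his 480 (7 minutes for one task, 13 minutes for two run together).

```python
import math


def instances_per_gpu(count: float) -> int:
    """Tasks that fit on one GPU if each reserves `count` of it (assumed model)."""
    if count <= 0:
        raise ValueError("count must be positive")
    # Small epsilon guards against float rounding when 1/count is a whole number.
    return math.floor(1.0 / count + 1e-9)


def per_task_minutes(batch_minutes: float, concurrent: int) -> float:
    """Effective wall-clock minutes per task when `concurrent` tasks finish together."""
    return batch_minutes / concurrent


def throughput_gain(single_minutes: float, batch_minutes: float, concurrent: int) -> float:
    """Fractional throughput change of concurrent running versus one task at a time."""
    return single_minutes / per_task_minutes(batch_minutes, concurrent) - 1.0


if __name__ == "__main__":
    # <count>.50</count> -> 2 tasks per GPU, <count>1</count> -> 1 task per GPU.
    print("instances at 0.5:", instances_per_gpu(0.5))
    print("instances at 1.0:", instances_per_gpu(1.0))

    # jason_gee's figures: 7 min alone, 2 tasks in 13 min.
    effective = per_task_minutes(13.0, 2)
    saved_seconds = (7.0 - effective) * 60.0
    gain = throughput_gain(7.0, 13.0, 2)
    print(f"effective minutes per task: {effective}")
    print(f"seconds saved per task: {saved_seconds}")
    print(f"throughput gain: {gain:.1%}")
```

On those numbers the gain from running two at once is under 8%, which fits the later comment that 3 at once gets through "somewhat more" work rather than 3 times as much.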
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.