Message boards :
Number crunching :
Just added 3rd GPU and CPU is 'Waiting for Memory'
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Quoting the earlier post: "So Win7 Ultimate x64 with Woodgie's 6GB GTX Titan plus two 2GB 750tis would like to have more than 9GB of shared kernel memory for that VRAM backup. With 16GB installed RAM implying 8GB kernel memory, there must be some workaround in the driver model."

Yes: with a 32-bit application, the CUDA runtime (and the underlying DirectX-based driver it uses) can only 'use' 4 GiB per instance on a given device, minus some overheads. How much is really there and paged in is supposed to be transparently managed underneath. OpenCL is closer to the driver runtime, so its reporting a different (unusably higher, physical) number may or may not make sense. Each instance sees its own address space, whatever the OS pages in, so filling more than 4 GiB in total is possible with multiple 32-bit instances. For 64-bit instances there will be some trade-offs: less computation on the virtualisation host side, but bigger addresses on the GPU, which chew up more registers.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
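The 4 GiB ceiling described above can be sketched numerically. This is an illustrative model only: the function name and the 512 MiB overhead figure are assumptions for the example, not values from any real driver.

```python
GIB = 1024 ** 3

def usable_vram_32bit(device_vram_bytes, overhead_bytes=512 * 1024 ** 2):
    """A 32-bit CUDA instance can map at most 4 GiB of address space,
    minus runtime/driver overheads, regardless of physical VRAM."""
    address_space_limit = 4 * GIB - overhead_bytes
    return min(device_vram_bytes, address_space_limit)

# A 6 GiB Titan is still capped near 4 GiB for a single 32-bit instance...
print(usable_vram_32bit(6 * GIB) / GIB)   # 3.5
# ...while a 2 GiB 750 Ti is limited by its physical VRAM instead.
print(usable_vram_32bit(2 * GIB) / GIB)   # 2.0
```

This is also why running several 32-bit instances can still fill more than 4 GiB of total VRAM: each instance gets its own 4 GiB address space.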
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Yes. The WDDM (Windows Display Driver Model, Vista onwards) replaced the XPDM (XP Driver Model, used on XP and the Xbox): http://en.wikipedia.org/wiki/Windows_Display_Driver_Model These include DirectX/Direct3D and other specifications for hardware, firmware, and software driver interfaces.

"(2) Can you tell us how this differs from Linux & Mac OS X, and does this make a difference as to how efficient the platform is as a number-crunching entity; that is, does the latency introduced by the Windows 'double-buffering' affect how fast the same work would be crunched on Windows vs. Linux/OS X, all other things being equal? (Yes, I am aware I'm asking you to explain how long a piece of quantum superstring is :) )"

Given that the hardware and firmware parts are part of the specs, the OS-specific implementation would likely vary by which features it implements. Because card vendors wouldn't want to maintain completely separate driver forks, there is overlap with other specifications like VESA and OpenGL. Anecdotally (YMMV), I find the Linux drivers pretty much the same functionally, though with no DirectX, only OpenGL and CUDA. OpenGL does certain things faster/leaner, and others not as well. That's probably part of the motivation behind AMD's Mantle (a low-latency driver architecture) to bypass this, and for compute-only devices NVIDIA uses special 'Tesla Compute Cluster' drivers as well. On Mac I have no direct experience with the drivers, though other devs have reported much higher latencies to me, perhaps indicating more buffering going on underneath.

"(3) Would adding more RAM help the issue, i.e. reduce paging, or is it 'not as simple as that'? I've got 8GB RAM across the 3 cards (4+2+2), so I'm assuming it's trying to reserve 8GB of kernel space to call its own. (I don't know where to find the window you showed to check.)"

Yes and no. If you actually use more total VRAM (across all devices, instances, apps, overheads, etc.) than about half your kernel space (~4 GiB in your case, with 16 GiB host RAM installed), then I'd say yes. If you don't use near or more than 4 GiB, then paging should already be pretty minimal, depending on what else you do with the machine. That's a lot of virtualisation and large memory amounts, probably further justification for AMD's Mantle and the DirectX 12 changes, meaning the whole picture could well change. (Must see how far they are with Win10 this weekend; is DirectX 12 there yet?)

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
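The "yes and no" rule of thumb above can be sketched as a small calculation: kernel space is roughly half the installed host RAM, and paging pressure becomes likely once total VRAM in use approaches about half of that kernel space. The function name and the fixed ratios are assumptions for illustration, not measured driver behaviour.

```python
GIB = 1024 ** 3

def paging_likely(vram_in_use_bytes, host_ram_bytes):
    """Rough heuristic from the discussion: paging becomes likely once
    VRAM in use reaches about half the kernel address space, where the
    kernel space is taken as roughly half of installed host RAM."""
    kernel_space = host_ram_bytes // 2      # ~half of installed RAM
    threshold = kernel_space // 2           # ~half of kernel space
    return vram_in_use_bytes >= threshold

# Woodgie's case: 8 GiB of VRAM (4+2+2) against 16 GiB of host RAM,
# so the ~4 GiB threshold is easily reached if all VRAM were in use.
print(paging_likely(8 * GIB, 16 * GIB))   # True
print(paging_likely(3 * GIB, 16 * GIB))   # False
```

On this reading, adding host RAM raises the threshold and so would help, but only if total VRAM use actually exceeds it in the first place.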
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Extra info from the linked wiki entry (the WDDM 2.0 section), which probably clarifies things a bit: WDDM 2.0 does have implications for CUDA & OpenCL, and implies particular hardware features. I also interpret it as saying that what we have now doesn't manage virtualised video memory efficiently, and has CPU-utilisation issues with multithreaded parallelism (which we see in evidence).

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.