Just added 3rd GPU and CPU is 'Waiting for Memory'

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1656977 - Posted: 26 Mar 2015, 0:44:38 UTC - in response to Message 1656782.  
Last modified: 26 Mar 2015, 0:47:42 UTC

So Win7 Ultimate x64 with Woodgie's 6GB GTX Titan plus two 2GB 750ti's would like to have more than 9GB of shared kernel memory for that VRAM backup. With 16GB installed RAM implying 8GB kernel memory there must be some workaround in the driver model.

{edit} Standard memory for the GTX Titan is 6GB, and the OpenCL AP task details are showing that amount, so I assume the card actually does have it even though the CUDA task details only show 4GB.
Joe


Yes. With a 32-bit application, the CUDA runtime (and the underlying DirectX-based driver it uses) can only 'use' 4GiB per instance on any one device (minus some overheads). How much is really there and paged in is supposed to be managed transparently underneath. OpenCL is closer to the driver runtime, so reporting a different (higher, but unusable) physical number may or may not make sense. Each instance sees its own address space, whatever the OS pages in, so filling more than 4GiB in total is possible with several 32-bit instances.

For 64-bit instances there'll be some trade-offs: less computation on the virtualisation host side, but bigger addresses on the GPU, which chew up more registers.
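To put rough numbers on the 32-bit vs 64-bit point, here is a minimal back-of-envelope sketch. It is only arithmetic, not a query of any real driver. The function names are made up for illustration, the card sizes are treated as GiB, and the 32-bit register width is an assumption about the GPU's register file.

```python
GIB = 1024 ** 3

def addressable_bytes(pointer_bits):
    """Size of the flat address space reachable with pointers of this width."""
    return 2 ** pointer_bits

def visible_vram_gib(physical_vram_gib, pointer_bits):
    """Upper bound on the VRAM one app instance can address on one device.

    Ignores driver and runtime overheads, which reduce the usable amount further.
    """
    return min(physical_vram_gib * GIB, addressable_bytes(pointer_bits)) / GIB

def registers_per_pointer(pointer_bits, register_bits=32):
    """Number of registers one pointer occupies (ceiling division).

    register_bits=32 is an assumption for illustration.
    """
    return -(-pointer_bits // register_bits)

# A 6 GiB card as seen by a 32-bit and by a 64-bit instance.
print(visible_vram_gib(6, 32))
print(visible_vram_gib(6, 64))
# Register cost of holding one address in each case.
print(registers_per_pointer(32), registers_per_pointer(64))
```

Under those assumptions a 32-bit instance tops out at 4GiB on the 6GB Titan, which would match the CUDA task details, while a 64-bit instance could address all of it at the cost of two registers per address instead of one.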
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Profile jason_gee
Message 1656988 - Posted: 26 Mar 2015, 1:07:57 UTC - in response to Message 1656776.  
Last modified: 26 Mar 2015, 1:14:31 UTC


(1) When you say 'Windows display driver model' I take it you mean Microsoft have dictated 'This is how you need to write a driver to interface between your hardware and the OS because this is how we've designed the OS'.


Yes, the WDDM (Windows Display Driver Model, Vista onwards), which replaced the XPDM (XP Driver Model, XP and Xbox).
http://en.wikipedia.org/wiki/Windows_Display_Driver_Model
Those include DirectX/Direct3D and other specifications for hardware, firmware and software driver interfaces.


(2) Can you tell us how this differs from Linux & Mac OS X, and does this make a difference as to how efficient the platform is as a number crunching entity? That is, does the latency introduced by the Windows 'double-buffering' affect how fast the same work would be crunched on Windows vs. Linux/OS X, all other things being equal? (Yes, I am aware I'm asking you to explain how long a piece of quantum superstring is :) )

Given the hardware and firmware parts are part of the specs, the OS-specific implementation would likely vary by which features it implements. Because card vendors wouldn't want to maintain completely separate driver forks, there is overlap with other specifications such as VESA and OpenGL. Anecdotally (YMMV), I find the Linux drivers pretty much the same functionally: no DirectX, but OpenGL and CUDA. OpenGL does certain things faster/leaner, and others not as well. That's probably part of the motivation behind AMD's Mantle (a low-latency driver architecture) to bypass this, and for compute-only devices NV use special 'Tesla Compute Cluster' drivers as well. On Mac I have no direct experience with the drivers, though other devs have reported much higher latencies to me, perhaps indicating more buffering going on underneath.

(3) Would adding more RAM help the issue, i.e. reduce paging, or is it "not as simple as that"? I've got 8GB RAM across the 3 cards (4+2+2) so I'm assuming it's trying to reserve 8GB kernel space to call its own. (I don't know where to find the window you showed to check).

Yes and no. If you actually use more total VRAM (across all devices, instances, apps, overheads etc.) than about half your kernel space (~4GiB in your case, with 16GiB of host RAM installed), then I'd say yes. If you don't use close to or more than 4GiB, then paging should already be pretty minimal, depending on what you do with the machine.

That's a lot of virtualisation and large memory amounts, which is probably further justification for AMD's Mantle and the DirectX 12 changes. That means the whole picture could well change (must see how far they are with Win10 this weekend; is DirectX 12 there yet?)
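The rule of thumb in the answer to (3) can be written out as a small calculation. This is only a sketch of the arithmetic used in this thread: the two one-half fractions (kernel space as half of installed RAM, and the threshold at about half of kernel space) come from the posts above rather than from any documented Windows constant, and the function names are hypothetical.

```python
def paging_threshold_gib(host_ram_gib, kernel_fraction=0.5, threshold_fraction=0.5):
    """VRAM-in-use level (GiB) above which more host RAM would likely help.

    Both fractions are rule-of-thumb assumptions taken from this thread.
    """
    kernel_space_gib = host_ram_gib * kernel_fraction
    return kernel_space_gib * threshold_fraction

def more_ram_likely_helps(vram_in_use_gib, host_ram_gib):
    """True if total VRAM in use across all devices exceeds the threshold."""
    return sum(vram_in_use_gib) > paging_threshold_gib(host_ram_gib)

# 16 GiB host: 8 GiB kernel space, so the threshold is about 4 GiB.
print(paging_threshold_gib(16))
# All three cards full (4 + 2 + 2) versus lightly loaded cards.
print(more_ram_likely_helps([4, 2, 2], 16))
print(more_ram_likely_helps([1, 0.5, 0.5], 16))
```

The first argument is the VRAM actually in use per device, not the installed size of each card, so a lightly loaded three-card host can still sit well under the threshold.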
Profile jason_gee
Message 1657001 - Posted: 26 Mar 2015, 1:30:44 UTC
Last modified: 26 Mar 2015, 1:46:37 UTC

Extra info from the linked wiki entry, which probably clarifies things a bit:
WDDM 2.0
Direct3D 12 API, announced at Build 2014, will require WDDM 2.0. The new API will do away with automatic resource-management and pipeline-management tasks and allow developers to take full low-level control of adapter memory and rendering states. WDDM 2.0 dramatically reduces workload on the kernel-mode driver for GPUs that support virtual memory addressing,[36] which allows multithreading parallelism in the user-mode driver and results in lower CPU utilization.[37][38][39] WDDM 2.0 will ship with Windows 10. [40]


That does have implications for CUDA & OpenCL, and implies particular hardware features. I also interpret it as saying that what we have now doesn't manage virtualised video memory efficiently, and has CPU utilisation issues with multithreading parallelism (which we see in evidence).


 