Posts by jason_gee


1) Message boards : Number crunching : I need some help to find a software (Message 1658205)
Posted 6 hours ago by Profile jason_gee
erm, well, just some points of order. Firstly, if you have XP Service Pack 2 or newer then you already have the Windows firewall, which AFAIK won't uninstall; you would have to disable it manually (which I would never recommend except under tightly controlled laboratory conditions, in isolation, behind a NAT router). Assuming you uninstalled some third-party firewall mentioned in the thread (which I didn't read, freely admitted), I would recommend not dropping any more firewalls, but figuring out how to open the requisite port(s) instead.

[Edit:] sorry if that sounds tetchy, it's not meant to be. I've just seen enough chaos induced by imprecise firewall setups to last me a lifetime ;)
2) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657904)
Posted 21 hours ago by Profile jason_gee
Ah, I see, thanks! I was under the impression that the set of useful timings had been narrowed down and more runs done on some sets. Well, indeed, I couldn't get an idea of the skew under the current circumstances.
3) Message boards : Number crunching : SSD and Bionic??? (Message 1657683)
Posted 1 day ago by Profile jason_gee
Fully updated, plain vanilla Win10 TP x64 seems to boot to desktop in 9 seconds here (login screen disabled). CPU is an old Core2Duo 3 GHz, with 4 GiB DDR3 RAM, a GTX 680 and a Samsung 850 Pro 256 GiB. SiSoft Sandra Lite appears to confirm it's on a SATA 2 (3 Gb/s) link, so this machine won't be pushing the drive.
4) Message boards : Number crunching : I need some help to find a software (Message 1657663)
Posted 1 day ago by Profile jason_gee
In that case I just use efMer's BoincTasks, and add each of my hosts :)
5) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657655)
Posted 1 day ago by Profile jason_gee
Hi Jason,
now you've confused me. ;-)
Before I test -oclFFT_plan, do I need to go back to the first and following test runs and look at (take into account) the average/median times?


I'm not completely sure, which is why the question :) What happened is that with the CreditNew stuff (not directly related to here), Eric pointed out some time back that the times follow some special kind of curve. The average and median can be about the same, or there can be a skew.

If you check one of the settings with the most results, and the average and median were pretty close (give or take a few seconds), then there's no skew to worry about. If there was a lot of skew (say 10 or more seconds), then they'll be telling you different things.
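As a rough illustration (a minimal sketch with invented run times, not taken from the bench scripts), the comparison looks like this:

```cpp
// skew_check.cpp -- compare the average and median of a set of run times.
// The sample values are hypothetical; substitute the bench results for a setting.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> runtimes = {341.2, 344.8, 339.9, 343.1, 402.7}; // seconds (made up)

    const double average =
        std::accumulate(runtimes.begin(), runtimes.end(), 0.0) / runtimes.size();

    std::sort(runtimes.begin(), runtimes.end());
    const size_t n = runtimes.size();
    const double median = (n % 2) ? runtimes[n / 2]
                                  : 0.5 * (runtimes[n / 2 - 1] + runtimes[n / 2]);

    std::cout << "average: " << average << " s, median: " << median << " s\n";
    std::cout << "difference: " << (average - median) << " s\n";
    // A difference of a few seconds: average and median tell the same story.
    // A gap of ~10 s or more: a long tail is dragging the average, i.e. skew.
}
```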
6) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657644)
Posted 1 day ago by Profile jason_gee
Hi Dirk,
Just a question: of the settings with many run results, is the 'average' very different from the 'median'? In other work that difference is becoming pretty important.
7) Message boards : Number crunching : I need some help to find a software (Message 1657539)
Posted 1 day ago by Profile jason_gee
I use a tinc VPN and TightVNC for similar purposes. Pretty complex to get running, but it's the most stable and powerful lightweight solution I've found so far, given I don't mind tweaking obscure configuration files and router settings manually.

http://en.wikipedia.org/wiki/Tinc_%28protocol%29
http://en.wikipedia.org/wiki/TightVNC
8) Message boards : Number crunching : GTX 970 about as fast as a 670 for crunching (Message 1657455)
Posted 1 day ago by Profile jason_gee
Jason, do you want to divulge the predicted timeline for new apps that harness the power of the Kepler and Maxwell hardware? How far out are they.... 6 months, one year??

Cheers, Keith


Hard to predict with current family & personal issues, but with CUDA 7.0 being released a couple of days ago, that's one less technical roadblock. (Testers were experiencing unexplained reliability issues with CUDA 6.0 and 6.5, and no appreciable performance gains with the current application (x41zc) architecture, so those were out.)

Also, I've been migrating the build system to Gradle (see http://en.wikipedia.org/wiki/Gradle), which complicates the timeline a bit. That's an extra development burden up front, expected to ease cross-platform releases in the long run (so worthwhile).

Aside from the infrastructure changes, the re-engineering parts involved place alpha-test x42 builds within a 3-month timeframe. That's after the already confirmed architectural changes needed to reduce the chattiness, up the load with fewer instances, and scale better from the smallest CUDA device through to the Titan X.

So, short version: ~3 months to x42 alpha, which is more or less a completely re-engineered design based on everything we found. Aside from improved CUDA scaling, it's expected to gain support for OpenCL devices and, in a later revision, AP (though those come later, and aren't based on current code/techniques).

[Edit:] Note that the Windows 10 release, and adapting to accommodate WDDM 2.0/DirectX 12 techniques & best practices, may or may not extend the timeline. That'll probably be a bit clearer after I get to play with the tech preview this weekend (USB is made, machine & new SSD are waiting).
9) Message boards : Number crunching : SSD and Bionic??? (Message 1657401)
Posted 1 day ago by Profile jason_gee
I'll be finding out this weekend what vanilla Win10 tech preview boot times will be on an 850 Pro 256 GiB, on an older Core2Duo that doubles as my Linux box.

Back when I switched my main development machine to Intel chipset RAID 10, that was near-vanilla Win7 w/SP1. Clean like that, it was able to achieve boot-to-desktop times in the sub-20-second region, though naturally filling the drive, plus the accumulated entropy of installed services and drivers, has brought that close to the 40-second mark.

That's 4 x 1 TB Seagate Barracudas: 2 remaining from the original RAID install and 2 newer spares manufactured after the floods. The main dev machine only has SATA 2, while the Linux machine may have SATA 3, which I'll be checking, so it'll be an interesting comparison for me.
10) Message boards : Number crunching : GTX 970 about as fast as a 670 for crunching (Message 1657392)
Posted 1 day ago by Profile jason_gee
Yes, we're hitting a number of limits in the application, rather than the GPU. For development purposes, finding [& understanding] all those limits on my 980 is taking a significant amount of time, as they include how chatty the application is, scaling of the pretty small datasets, and some underlying system considerations.

For the time being, the best approach is to up the process priority & run multiple instances (usually 2-3 per GPU), and the combined throughput should be around 2x a 670. (The second task on a Maxwell GPU seems to scale very well, even with an old Core2Duo driving it.)
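For the priority part, a minimal hedged sketch (Windows-specific, and only illustrating the API call; most crunchers would use an external tool or script rather than the app doing this itself):

```cpp
// priority_sketch.cpp -- illustration only: raise the current process above normal priority.
#include <windows.h>
#include <iostream>

int main() {
    // ABOVE_NORMAL is usually plenty; HIGH/REALTIME can starve the rest of the system.
    if (!SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS)) {
        std::cerr << "SetPriorityClass failed, error " << GetLastError() << "\n";
        return 1;
    }
    std::cout << "Process priority raised to above normal.\n";
    return 0;
}
```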
11) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1657001)
Posted 2 days ago by Profile jason_gee
Extra info from the linked wiki entry, which probably clarifies things a bit:
WDDM 2.0
Direct3D 12 API, announced at Build 2014, will require WDDM 2.0. The new API will do away with automatic resource-management and pipeline-management tasks and allow developers to take full low-level control of adapter memory and rendering states. WDDM 2.0 dramatically reduces workload on the kernel-mode driver for GPUs that support virtual memory addressing, which allows multithreading parallelism in the user-mode driver and results in lower CPU utilization. WDDM 2.0 will ship with Windows 10.


That does have implications for CUDA & OpenCL, and implies particular hardware features. I also interpret it as saying that what we have now doesn't manage virtualised video memory efficiently, and has CPU utilisation issues with multithreading parallelism (which we see in evidence).
12) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656988)
Posted 2 days ago by Profile jason_gee

(1) When you say 'Windows display driver model' I take it you mean Microsoft have dictated "This is how you need to write a driver to interface between your hardware and the OS, because this is how we've designed the OS".


Yes, the WDDM (Windows Display Driver Model, Vista onwards), replacing the XPDM (XP Driver Model, for XP and the Xbox).
http://en.wikipedia.org/wiki/Windows_Display_Driver_Model
Those include DirectX/Direct3D and other specifications for hardware, firmware and software driver interfaces.


(2) Can you tell us how this differs from Linux & Mac OS X, and does this make a difference as to how efficient the platform is as a number-crunching entity; that is, does the latency introduced by the Windows 'double-buffering' affect how fast the same work would be crunched on Windows vs. Linux/OS X, all other things being equal? (Yes, I am aware I'm asking you to explain how long a piece of quantum superstring is :) )

Given the hardware and firmware parts are part of the specs, the OS-specific implementation would likely vary by which features it implements. Because card vendors wouldn't want to maintain completely separate driver forks, there is overlap with other specifications like VESA and OpenGL etc. Anecdotally (YMMV) I find the Linux drivers pretty much the same functionally, though with no DirectX, just OpenGL and CUDA. OpenGL does certain things faster/leaner, and others not as well. That's probably part of the motivation behind AMD's Mantle (a low-latency driver architecture) to bypass this, and for compute-only devices NV use special 'Tesla Compute Cluster' drivers as well. On Mac I have no direct experience with the drivers, though other devs have reported much higher latencies to me, perhaps indicating more buffering going on underneath.

(3) Would adding more RAM help the issue, i.e. reduce paging, or is it "not as simple as that"? I've got 8GB RAM across the 3 cards (4+2+2), so I'm assuming it's trying to reserve 8GB of kernel space to call its own. (I don't know where to find the window you showed to check.)

Yes and no. If you actually use more total VRAM, across all devices, instances, apps, overheads etc., than about half your kernel space (~4 GiB in your case, with 16 GiB host RAM installed), then I'd say yes. If you don't use near or more than 4 GiB, then paging should already be pretty minimal, depending on what you do with the machine.

That's a lot of virtualisation and large memory amounts, and probably further justification for AMD's Mantle and the DirectX 12 changes. Meaning the whole picture could well change (must see how far they are with Win10 this weekend; is DirectX 12 there yet?).
13) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656977)
Posted 2 days ago by Profile jason_gee
So Win7 Ultimate x64 with Woodgie's 6GB GTX Titan plus two 2GB 750ti's would like to have more than 9GB of shared kernel memory for that VRAM backup. With 16GB installed RAM implying 8GB kernel memory there must be some workaround in the driver model.

{edit} Standard memory for the GTX Titan is 6GB, and the OpenCL AP task details are showing that amount, so I assume the card actually does have it even though the CUDA task details only show 4GB.
Joe


Yes, with a 32-bit application, the CUDA runtime (and the underlying DirectX-based driver it uses) can only 'use' 4 GiB per instance on a given device (minus some overheads). How much is really there and paged in is supposed to be transparently managed underneath. OpenCL is closer to the driver runtime, so reporting a different (higher, physical, though unusable) number may or may not make sense. Each instance will see its own space, whatever the OS pages in, so filling >4 GiB is possible with 32-bit instances.

For 64-bit instances there'll be some tradeoffs: less computation on the virtualisation host side, but bigger addresses on the GPU, which chew up more registers.
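For a concrete view of what each process sees (a minimal hedged sketch using the CUDA runtime API, not anything from the actual x41zc sources):

```cpp
// meminfo_sketch.cu -- print what the CUDA runtime reports for each device.
// Illustrative only: a 32-bit process works within a ~4 GiB address space,
// whatever the physical VRAM on the card, so its view can differ from OpenCL's.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);   // per-process view, after overheads
        printf("device %d: free %.2f GiB / total %.2f GiB\n", dev,
               freeBytes  / (1024.0 * 1024.0 * 1024.0),
               totalBytes / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```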
14) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656711)
Posted 3 days ago by Profile jason_gee
- Windows display driver model mirrors VRAM for display driver recovery purposes

How often does this mirroring take place?
(I assume you mean the whole VRAM (?) is copied (by some DMA controller?) to main computer RAM every X seconds?)


Complex, though in this post-classic-XP mechanism most operations generally occur via a kernel-memory 'staging area', which then transmits the commands/data (sometimes combined for optimisation purposes). So in effect you have a virtual GPU in host memory that the applications talk through, via a user-mode driver helper (virtualisation of the GPU resources).

That's a more complex kind of 'double-buffering' than simple mirroring, which explains the increased latencies, and why extreme gamer benchmarks stuck with old XP for so long; it amounted to a 10% or so performance penalty at the time of introduction with Vista. (Since then newer GPUs have added DMA engines, then faster & more DMA engines, and more latency-hiding mechanisms.)

Later XP drivers added some (enough) of the virtualisation to keep applications compatible, though being hybrid drivers they then acquire all the scaling limits and latencies, without the benefit of new hardware & lots of RAM on top.

In terms of the amount of VRAM being mirrored, it's this number right here for my 4 GiB physical VRAM 980 (Win7 x64): [attached screenshot not included]
Fortunately or unfortunately, depending on usage, that virtualisation of the video memory is paged. If you actually start filling things up to the extent that system resources are low, then you'll see effects similar to, or worse than, host memory excessively paging to disk (i.e. usually unusable). Naturally, adding more host memory to modern standards is only an option on 64-bit systems etc., so extreme care is needed if selecting modern GPUs for a 32-bit desktop version of Windows.
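For anyone curious what the driver model reports on their own box, a minimal hedged sketch (plain Windows/DXGI code, nothing to do with the SETI apps) that lists the dedicated and shared memory figures per adapter:

```cpp
// adapter_mem_sketch.cpp -- list dedicated VRAM and shared system memory per adapter.
// Build with MSVC and link dxgi.lib. The 'shared system memory' figure is the host-RAM
// budget the display driver model can draw on to back/virtualise VRAM.
#include <dxgi.h>
#include <cstdio>

int main() {
    IDXGIFactory* factory = nullptr;
    if (FAILED(CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory)))
        return 1;

    IDXGIAdapter* adapter = nullptr;
    for (UINT i = 0; factory->EnumAdapters(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC desc = {};
        adapter->GetDesc(&desc);
        wprintf(L"%ls: dedicated %.2f GiB, shared %.2f GiB\n",
                desc.Description,
                desc.DedicatedVideoMemory / (1024.0 * 1024.0 * 1024.0),
                desc.SharedSystemMemory   / (1024.0 * 1024.0 * 1024.0));
        adapter->Release();
    }
    factory->Release();
    return 0;
}
```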

[Edit:] Note that Windows 10 and DirectX 12 are supposed to be changing this model. I've not seen details, though Tek Syndicate mentioned at least SLI configurations stacking VRAM, so that's different. The picture may change completely if they want to compete with Mantle on latency.
15) Message boards : Number crunching : Just added 3rd GPU and CPU is 'Waiting for Memory' (Message 1656451)
Posted 4 days ago by Profile jason_gee
There are also some complications, added for the sake of completeness, that make multiple GPUs with lots of VRAM much more complex than in the past.

Those include:
- the Windows display driver model mirrors VRAM for display driver recovery purposes (in this 3-card case to the tune of some ~6 GiB of kernel-space 'shared' memory), and
- PCI Express lanes are limited (16 lanes on the i7-4770K, to cover the 3 video cards and any other devices in the system)

The first item above, when you dig really deeply, covers the majority of why an extreme example (retired) host of Windows XP with 4 x old GTX 295s (8 GPUs in total, with some ~7 GiB physical VRAM), while viable under old-style XP drivers and small amounts of host RAM, will tend to choke early under more modern 'hybrid' drivers. With respect to the current setup (i7 + 3 larger GPUs), that's a lot of the 16 GiB physical RAM gobbled up, and it will come out of the 8 GiB half that is kernel space (leaving 2 GiB for the OS and drivers, though likely plenty for application user space).

The second item will have more of an impact on how many tasks can be 'fed' by the CPU in limited time, remembering that there will need to be some turn-taking on the PCI Express links, and a lot of activity there can be met with sitting and waiting. Hyperthreading would probably double that queue contention. I didn't know or think about this limitation much in the past, though it becomes pretty important in modern workstation operation, which is probably why the likes of Xeon processors with more PCIe lanes have been becoming popular even in high-end gaming rigs, just to feed the faster GPUs more promptly.
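As rough arithmetic (assuming the common x8/x4/x4 slot split for three cards on a 16-lane CPU, and ~985 MB/s per PCIe 3.0 lane), that's on the order of 7.9 GB/s for the first card and ~3.9 GB/s each for the other two, before any turn-taking or other devices eat into it.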

For x42 (the next major CUDA multibeam revision) I've been gradually engineering ways to make the application less 'chatty', which should reduce the issues there, though with GPUs getting faster all the time, it's taking a while to find the best ways to make things scale better and more automatically in the future.
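To give a feel for what 'less chatty' means (a minimal hedged sketch of the general technique only, not x42 code): batch the work and overlap transfers with compute on a stream, instead of making many small blocking round trips over the PCIe link:

```cpp
// batching_sketch.cu -- general idea only: one large async upload, one kernel launch,
// one async download queued on a stream, with a single wait at the end.
#include <cuda_runtime.h>
#include <vector>

__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // stand-in for real signal processing
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // The CPU queues the whole batch and only synchronises once,
    // instead of chattering back and forth per small buffer.
    cudaMemcpyAsync(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice, stream);
    process<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);
    cudaMemcpyAsync(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    return 0;
}
```

(For genuine copy/compute overlap the host buffers would need to be pinned, e.g. via cudaHostAlloc, but the shape of the idea is the same.)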
16) Message boards : Number crunching : Panic Mode On (96) Server Problems? (Message 1656002)
Posted 5 days ago by Profile jason_gee
Hope the servers can stay glued together until tomorrow when Matt may be back in the lab again.
Forum lag is usually a warning sign.
And the spiky nature of the Cricket graph is another one....that usually means something is getting tied up and work goes out in spurts rather than in a smooth flow.

Kitties cross their little toes for luck.

The spikes are data going up to the lab per the 8_34 graph. I won't try to guess what those ~20 GB chunks contain.
Joe

Offline archiving perhaps?


Thomas the Tank Engine on Blu-ray.
17) Message boards : Number crunching : GPU driver version lists (Message 1655802)
Posted 5 days ago by Profile jason_gee
Something that comes to mind and could be mentioned: the drivers that sometimes ship on CD in the box are generally specific to that GPU model, and sometimes a bit dicey. There's not much option if it's a brand new model released yesterday, but the mainstream downloads that quickly follow are usually much better.
18) Message boards : Number crunching : GPU Problem (Message 1655684)
Posted 6 days ago by Profile jason_gee
Yeah, 3-7 mV variation would be outstanding, even if suspicious enough to look for confirmation. It can happen ;)

The temp limit may have done it, by effectively limiting frequency. To an OCer that might suggest backing off a notch, though in these days of automated doohickeys we do place some trust in the doohickey creators.
19) Message boards : Number crunching : @Pre-FERMI nVidia GPU users: Important warning (Message 1655681)
Posted 6 days ago by Profile jason_gee
Also, is anyone keeping a list of NV drivers for people to reference, like the ATI Driver Version Cheat Sheet I make?


I'd imagine, sadly, probably not, since that's probably the first major OpenCL schism, and I recall no particularly problematic CUDA drivers that pushed things to the point of a recall. Claggy's probably your best bet to talk to, as de facto head of the BOINC Emergency Response Team (BERT).
20) Message boards : Number crunching : GPU Problem (Message 1655680)
Posted 6 days ago by Profile jason_gee
That would certainly make sense, and it wouldn't be the first time competitive 'bang for buck' cards fell into this trap (the initial round of 560 Ti's comes to mind). It's a really tough tradeoff between actual component quality, the acceptability of glitches in gaming pixels, and consumers liking numbers some small percentage better based on flashy logos and extra fins.

