Posts by jason_gee

1) Message boards : Number crunching : GTX 960 vs GTX 780 for Crunching (Message 1681881)
Posted 8 days ago by jason_gee
Actually the 960 would be just short of a 770 in performance while using around 55% of the power of a 770.


sounds about right to me (with healthy doses of your mileage may vary)
2) Message boards : Number crunching : GTX 960 vs GTX 780 for Crunching (Message 1681879)
Posted 8 days ago by jason_gee
Jason - on a one for one basis, are Maxwell cores more powerful computationally than Kepler? (Like Haswell > Ivy Bridge > Sandy Bridge) That's the core (pun intended) of my question.

Yes, provided 'One for One' means 1 Watt to 1 Watt (the only fixed reference I can really compare). There is both more computation per Watt 'possible', and better facilities to hide latencies/overheads such that higher efficiency is extracted.
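
To put a rough number on that 'Watt for Watt' comparison, here is a minimal sketch. The 55% power figure is the one quoted earlier in this thread; the 0.95 is only an assumed stand-in for 'just short of a 770', not a measurement, so treat the result as illustrative.

```python
def perf_per_watt_ratio(relative_performance: float, relative_power: float) -> float:
    """Return how many times more work per Watt card A does than card B.

    Both arguments are card A's figures expressed as a fraction of card B's.
    """
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_performance / relative_power


# Assumption: "just short of a 770" is taken as 95% of its throughput.
# The 55% power figure is the one quoted in the thread.
ratio = perf_per_watt_ratio(relative_performance=0.95, relative_power=0.55)
print(f"GTX 960 vs GTX 770, work per Watt: {ratio:.2f}x")
```

Under those assumptions the 960 comes out somewhere around 1.7 times the work per Watt, with the usual your-mileage-may-vary caveats.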
3) Message boards : Number crunching : GTX 960 vs GTX 780 for Crunching (Message 1681831)
Posted 8 days ago by jason_gee
Yes, relative performance for different software (either games or apps) will vary considerably depending on how those apps/games are written. Older games will tend to run at very high frames per second, and newer texture-heavy ones become more framebuffer dependent, depending on the settings and resolution driven.

For non-graphics compute applications, you will have similar considerations, but with a different balance and different bottlenecks. There is so much (theoretical) compute horsepower in either GPU that, for 'our' purposes, on these high performance GPUs, the limiting factors are the VRAM, driver latency, and system overheads dictated by the aging application designs.

I mention the aging application designs not to denigrate their effectiveness etc, but to point out that underlying Cuda is DirectX, with similar calls underlying NV's OpenCL implementation.

The DirectX implementation API level under the 780 is 11.2, while under the 960 it is DirectX 12, which is not released formally until Windows 10.

The difficulty that presents is that there are shifting goalposts in terms of the underlying infrastructure, right up through OS, drivers and the applications. Apples to apples comparisons therefore become difficult to find, and depend on your precise conditions & requirements.

More detail on your needs in the change would help the comparison, as there are just too many variables to make firm assertions about what you would likely achieve (other than reduced power and heat). Boinc RAC, for example, has something like a +/- 37% variance under stable running conditions, so would be a bad comparison to make. Counting workunits completed in a controlled situation, restricted to a specific angle range, might be workable, if power was measured and factored in.
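
A quick sketch of why RAC is a poor yardstick here. The +/- 37% is the figure above; the RAC value and the 25% 'real' improvement are made-up placeholders, chosen only to show the ranges overlapping.

```python
def plausible_range(mean: float, variance_fraction: float) -> tuple:
    """Range a noisy metric can wander over around its mean."""
    return mean * (1 - variance_fraction), mean * (1 + variance_fraction)


def ranges_overlap(a: tuple, b: tuple) -> bool:
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


RAC_VARIANCE = 0.37        # the +/- 37% mentioned above
old_rac = 10_000.0         # hypothetical RAC on the old card
new_rac = old_rac * 1.25   # hypothetical 25% genuine improvement

old_range = plausible_range(old_rac, RAC_VARIANCE)
new_range = plausible_range(new_rac, RAC_VARIANCE)
indistinguishable = ranges_overlap(old_range, new_range)
print("ranges overlap:", indistinguishable)
```

With noise that wide, a genuine gain of that size sits well inside the overlap, so a single RAC reading on each card says very little.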

If the comparison had been between 780 & 970 or 980, then it might be a bit easier to draw conclusions (but not much if not concerned about power/heat).

My usual tactic when things are that hard to call, is to sit tight as is, and wait for the next gen where the jump is clearer.
4) Message boards : Number crunching : GTX 960 vs GTX 780 for Crunching (Message 1681563)
Posted 9 days ago by jason_gee
Generally speaking (there are a lot of variables to look at, as Mike suggested), the benefit of the Maxwell architecture is performance per Watt. Looking at market segmentation of the models, with a 960, which is a midrange card, you could expect somewhere around the performance of the prior generation (+ say 20-30%) in the similar class, but using less power. That should place performance above a 760 overall, but quieter and more refined.

With the 780 comparison, the chip was a different breed (GK110), which is a monster. Once you get above a certain amount of raw compute horsepower, the limitations become largely related to memory bandwidth, and how smoothly you can 'feed' the GPU work (which includes all kinds of other system issues).

So in short, there are ways you can ensure you're getting the most out of your cards, though going from 780s to 960s might not be a straightforward comparison, depending on whether you have considerations like power and heat. Directly, with current applications (MB, AP and others), the limitations are more likely not in the compute cores themselves.
5) Message boards : Number crunching : Performance drop on new CPU (Message 1677037)
Posted 19 days ago by jason_gee
That's what I was looking at, and estimates for those were pretty darn accurate, like I was saying. Ah well, will keep tracking.

heh, I think with respect to boinc estimates, the quote "Even a stopped clock gives the right time twice a day" fits
6) Message boards : Number crunching : 9 dollar computer: CHIP (Message 1677035)
Posted 19 days ago by jason_gee
Could be very interesting for Robots/drones and CNC/CAM etc. In fact can see a number of uses around the house where the ice-cream container full of microcontrollers wouldn't quite have something right, and an arduino type arrangement a bit too much to justify.
7) Message boards : Number crunching : What causes a Blank stderr? (Message 1675365)
Posted 20 days ago by jason_gee
Yeah, was up to where it can skip path entries without printing that it saw them (still a level out from there). Hadn't spotted a problem in the outer logic going inward, so *should* be OK once he makes that fix. There's a bit of the usual needless spaghettification (namely low 'coupling', i.e. functions so small you end up jumping around like a frog in a sock, likely in the name of excessive reuse, making the call stack too deep for easy navigation).

The thread safety concerns applicable to the likes of the stderr (and probably result) files don't apply here internally, so are a separate issue (that can make the deletion fail). I don't see any handle leaks here either.

[Edit:] Given it's all in a Windows-specific preprocessor block, not sure why he didn't use a single IFileOperation or SHFileOperation call set for recursive deletion. Oh well.
8) Message boards : Number crunching : What causes a Blank stderr? (Message 1675333)
Posted 20 days ago by jason_gee
It's going to be interesting to look at the file enumeration part, the functions used and MSDN references.
9) Message boards : Number crunching : Crunching apears to stop (Message 1675126)
Posted 20 days ago by jason_gee
Underneath, most bench code uses that function (for the Windows code), which is basically just a CPU timestamp counter call and an appropriate serialising instruction. Some motherboards and Windows versions have issues, and there are some caveats with hyperthreading and such.

If the stock bench code is using the RDTSC instruction directly, or that Windows API function (on Windows builds, obviously), there would be some alternative ways to use a lower resolution counter not prone to the issues.

When I get to that point, I'll probably test the reliability of the timer used on the host, and use a less accurate means instead of that timestamp [where necessary]. Another possibility is that the timers in the stock variant are overflowing somehow. Since the hardware counters involved should not overflow for ~100 years or so (64 bit IIRC), it's possible only a portion of the value is used (e.g. if it only uses 32 bits, the counter might wrap after only a couple of seconds' worth of 'ticks' on some hosts, and some benches take longer than that).
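
The overflow arithmetic can be sketched as follows. This is only an illustration of the suspected failure mode, not the stock bench code; the 3 GHz tick rate is an assumed value and the function names are made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_until_wrap(counter_bits: int, ticks_per_second: float) -> float:
    """Time for a free-running counter of the given width to overflow."""
    return (2 ** counter_bits) / ticks_per_second


def elapsed_ticks(start: int, end: int, counter_bits: int) -> int:
    """Elapsed ticks as seen through a counter truncated to counter_bits."""
    mask = (1 << counter_bits) - 1
    return ((end & mask) - (start & mask)) & mask


TICK_RATE = 3.0e9  # assumed 3 GHz timestamp counter, purely illustrative

wrap_32 = seconds_until_wrap(32, TICK_RATE)
wrap_64_years = seconds_until_wrap(64, TICK_RATE) / SECONDS_PER_YEAR
print(f"32-bit wraps after {wrap_32:.2f} s, 64-bit after {wrap_64_years:.0f} years")

# A 5 second bench measured through only the low 32 bits loses whole wraps.
true_ticks = int(5 * TICK_RATE)
seen_ticks = elapsed_ticks(0, true_ticks, 32)
print(f"true {true_ticks / TICK_RATE:.2f} s, measured {seen_ticks / TICK_RATE:.2f} s")
```

At that assumed rate the full 64-bit counter lasts on the order of a century or two, while the truncated one wraps in under two seconds, so any bench running longer than that would report a nonsense duration.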

So plenty of possibilities to check out. I'd be interested to reproduce the issues in a dedicated standalone test piece down the line, allowing proving of alternatives/fixes, especially since what I'm working towards will be heavily dependent on timers.
10) Message boards : Number crunching : Crunching apears to stop (Message 1675111)
Posted 20 days ago by jason_gee
It's not just AMD CPUs, I see this on my C2D T5500 running Ubuntu 14.04, I've tried compiling my own apps, no change, I've reported about it at Lunatics with some suggestions, don't think anyone was interested.

Part of a long list of ToDos for my own builds (both CPU and GPU of various types), has for a long time been some C++ class based inheritance for key processing functions.

When I mentioned that I was going to be enabling builds to use various CPU FFTs, Cuda and OpenCL, with internal dispatch, Eric did express interest in having a more 'pluggable' implementation for the FFTs at least (which currently are not benched), and that he would appreciate if I could put the same facilities into main.

As selection of those depends on hardware, libraries, accuracy and performance, dispatch there has to be a bit more flexible and generic than the existing mechanism, so I started on the Class hierarchy to include the other processing functions as well.
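
As a rough picture of what a 'pluggable' hierarchy with internal dispatch could look like, here is a small sketch. It is written in Python as a stand-in for the C++ classes described, all class and function names are hypothetical, and the selection here is by speed only (the real choice would also weigh hardware, libraries and accuracy).

```python
import cmath
import time
from abc import ABC, abstractmethod


class FftBackend(ABC):
    """Hypothetical common interface; each implementation plugs in behind it."""

    name = "abstract"

    @abstractmethod
    def transform(self, samples):
        """Return the forward DFT of a sequence of complex samples."""


class NaiveDft(FftBackend):
    name = "naive-dft"

    def transform(self, samples):
        n = len(samples)
        return [
            sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)
        ]


class Radix2Fft(FftBackend):
    name = "radix2-fft"

    def transform(self, samples):
        n = len(samples)
        if n <= 1:
            return list(samples)
        if n % 2:
            raise ValueError("length must be a power of two")
        even = self.transform(samples[0::2])
        odd = self.transform(samples[1::2])
        half = n // 2
        twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
        return [even[k] + twiddled[k] for k in range(half)] + [
            even[k] - twiddled[k] for k in range(half)
        ]


def pick_fastest(backends, samples):
    """Bench each candidate once on representative data, return the quickest."""
    def cost(backend):
        start = time.perf_counter()
        backend.transform(samples)
        return time.perf_counter() - start
    return min(backends, key=cost)


signal = [complex(i % 4, 0) for i in range(64)]
chosen = pick_fastest([NaiveDft(), Radix2Fft()], signal)
print("dispatching to", chosen.name)
```

The point is only that callers hold an `FftBackend` and never care which implementation sits behind it, so adding a Cuda or OpenCL variant means adding one more subclass to the candidate list.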

Since shifting build system, Cuda7, and various Boinc issues have taken precedence, for stock that has sat at a bare/unpopulated file/class structure I committed quite a while back, in a folder under stock v7.

That will probably recommence my end, as soon as I've mastered the basics of the Gradle build system, and by nature requires redoing the benchmark code.

[Edit:] since, while testing Gradle backstage, we tested some of the precision timers involved a little while back in small test pieces, and they appeared to work for a range of devices/purposes without issue, probably the bench code will receive some of that work in the end. Yep, bits and pieces everywhere to tie together.
11) Message boards : Number crunching : Task Postponed? (Message 1675108)
Posted 20 days ago by jason_gee
Quick update.

That task that exit command just got validated by my wingman.

+10 points for failure recovery code :)
12) Message boards : Number crunching : Task Postponed? (Message 1675107)
Posted 20 days ago by jason_gee
Had a quick look. Hard to help without knowing the system & GPU in person, but the artefact scanner might yield some clues as to stability, providing there aren't other major issues with the system there.
13) Questions and Answers : GPU applications : GTX 970 problem (Message 1675104)
Posted 20 days ago by jason_gee
The consistency of the issue here seems suspicious, in that the usual power and temperature first suspects seem unlikely.

After clean driver install and reboot, I would check the temperatures, PSU and clocks again anyway, and run an artefact scanner. The factory clocks/boost on GPU core or VRAM (SC edition was mentioned) may be just a little high, and require a small voltage bump for reliability. The failing portion of code is indeed VRAM access intensive, so consistent failure there could indicate one or more memory chips running on the hairy edge, which a small clock backoff or voltage increase should address easily.

These things are sold for gaming, and competition in the mid-high range is fierce, so sometimes the manufacturers are erring on the side of performance over reliability when setting default clocks and voltages.
14) Message boards : Number crunching : What causes a Blank stderr? (Message 1674565)
Posted 22 days ago by jason_gee
I just asked why blank results happen, not the entire programming behind it, LOL

Haha, that's what happens if you open a can of worms. The summary answer is 'outdated design and programming methods'. Why it's never been properly addressed since I reported something related circa 2007 is another matter, involving pointless arguments about 'My OS is better than your OS' and 'not broken here syndrome'. [A special term I've just made up, I like it]
15) Message boards : Number crunching : What causes a Blank stderr? (Message 1674560)
Posted 22 days ago by jason_gee
I would expect the different OS on each machine is more likely a factor in the issue.
It could be related to issues I've had deleting files/folder in Windows. Since the release of Vista. Which only got better post Windows 7 SP1.

Highly likely. More specifically, multithreaded C-Runtime libraries (made the default and only option since Visual Studio 2005), extensively kernel-buffered I/O (which is a desktop OS performance optimisation), and the proliferation of multi-core.

Win7 (SP1) is a fair bit more aggressive about its own garbage collection, hiding some serious application level headaches. It's a bit of a case where some could say 'd@mn M$', but then seeing vaguely similar scaling problems starting to happen with Linux and Android, I'd suggest that most likely bad programming and development inertia are ubiquitous.
16) Message boards : Number crunching : What causes a Blank stderr? (Message 1674428)
Posted 22 days ago by jason_gee
... I'm interpreting that as saying that files may be present for hours, maybe days, after BOINC calls DeleteFile(). I don't think any OS's garbage collection is as lazy as that...

True, however when there are open handles on those files, the call may well succeed while only marking the file for later deletion. In principle, a corrupt DLL global space (including open handles/locks in the user mode driver) can persist until reboot.

IOW, you can walk through as much logic that seems reasonable, but as soon as you have race conditions and non-threadsafe behaviour, all bets are off, and you will see seemingly non-deterministic behaviour.

[Edit:] note that at a high level, the 'race condition' concerned typically begins the instant the application writes the finished file. The race is between the finished file being created by the system, and the boinc client picking up the application exit with zero status. The non threadsafe ('insecure') behaviour is primarily the hard process termination, but includes other app and library usage specific issues.
17) Message boards : Number crunching : What causes a Blank stderr? (Message 1674380)
Posted 22 days ago by jason_gee
To me it seems the addition of a "Check folder is actually empty before reuse" would solve the issue.

Should do, that'd effectively be a mutex. More than one way to skin a cat of course. Another is certainly to make slots unique as described (or a suitable variant), and garbage collect finished-with resources at leisure, in a background thread at low priority. That last option (or some variant) would let the client do what it needs to do with new tasks/slots in the most timely fashion, and reduce the logic involved in checking/locking. Might not be the best option for some situations, but probably the best for multicore PC type platforms at least.
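
A minimal sketch of the 'unique slots' half of that idea, assuming nothing about how the client actually names its slots. The function name and the `slot_` prefix are hypothetical, and the deferred cleanup is deliberately left out here.

```python
import os
import tempfile


def allocate_unique_slot(slots_root: str) -> str:
    """Create a fresh, never-before-used slot directory and return its path."""
    os.makedirs(slots_root, exist_ok=True)
    return tempfile.mkdtemp(prefix="slot_", dir=slots_root)


# Usage: two consecutive tasks never share a directory, so leftovers from
# the first cannot leak into the second even if its cleanup is still pending.
root = tempfile.mkdtemp()
first = allocate_unique_slot(root)
second = allocate_unique_slot(root)
print(first != second)
```

Because a new task never waits on an old directory being emptied, the deletion of finished slots can happen whenever the system gets around to it.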
18) Message boards : Number crunching : What causes a Blank stderr? (Message 1674376)
Posted 22 days ago by jason_gee
3) Don't reuse the slot if (2) fails

Well there's probably the 'original' issue. The logic should instead be something like:
'3) Don't reuse the slot unless everything's deleted and it's not allocated etc'

because #1 and #2 can succeed on one core/thread, and not be seen on another core/thread until later (race condition). #1 & #2 succeeding doesn't mean they're complete.
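
In code, the reworded step 3 might look something like this. It is a sketch with hypothetical names, not the client's logic, and it assumes the caller tracks allocated slots in a set.

```python
import os
import tempfile


def slot_is_reusable(slot_dir: str, allocated_slots: set) -> bool:
    """True only if the slot is unallocated AND observed to be empty right now."""
    if slot_dir in allocated_slots:
        return False
    if not os.path.isdir(slot_dir):
        return False
    with os.scandir(slot_dir) as entries:
        return next(entries, None) is None


# Demo: an empty slot passes, a slot with a leftover file does not,
# regardless of whether an earlier delete call reported success.
demo_slot = tempfile.mkdtemp()
empty_ok = slot_is_reusable(demo_slot, allocated_slots=set())
with open(os.path.join(demo_slot, "stderr.txt"), "w") as leftover:
    leftover.write("stale output")
leftover_ok = slot_is_reusable(demo_slot, allocated_slots=set())
print(empty_ok, leftover_ok)
```

The check looks at what is actually on disk rather than trusting return codes. On its own it is still only a snapshot, so it needs to run under the same lock (or on the same thread) that hands out the slot.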
19) Message boards : Number crunching : What causes a Blank stderr? (Message 1674374)
Posted 22 days ago by jason_gee
Makes sense, but David's 3 step process there implies the use of mutexes (locks, atomic transactions etc), which aren't in the code. #1 and #2 can succeed without the file being physically deleted yet. That's [asynchronous] buffered IO, and doesn't occur in sequence (unless you make it so [with additional logic]).
20) Message boards : Number crunching : What causes a Blank stderr? (Message 1674370)
Posted 22 days ago by jason_gee
Well, we haven't caught our 'exceeded disk limit' yet, but the search with <slot_debug> has turned this up:

failed to remove file slots/0/stderr.txt: unlink() failed

Heh. Interestingly, the underlying (for Windows) DeleteFile() Windows API call, under the unlink() logic, is a non-blocking call:
...The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened).
The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.

At least in the present example, it points to open handles, likely via the Windows memory mapped file implementation used by Boinc MFILE or MIOFILE (whichever) structures, i.e. the application is still shutting down, or was forced closed with TerminateProcess()... [Yeah that old chestnut].

With the code present in sandbox.cpp, the failure rate would be proportional to the deleting client to finishing app process priority ratio (app is usually idle-below normal), the total of the file sizes being deleted, system contention, to some limited extent filesystem performance itself, desktop optimisations by Windows version and C-Runtime used, and maybe some caching policies at various levels (hardware, OS and driver).

A best practices solution would likely involve doing something like this before allowing the slot to be used again:
- set a mutex (of any suitable type) indicating the start of a bulk deletion transaction,
- delete,
- check,
-- retry for failures; for slowness just accept the IO will get around to it and allow very generous timeouts (if any),
- only release the mutex once complete.
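
The steps above can be sketched roughly as below. This is not the code in sandbox.cpp; the lock, function name, timeout and poll interval are all assumptions for illustration, and it assumes a slot holds only plain files.

```python
import os
import tempfile
import threading
import time

slot_lock = threading.Lock()  # hypothetical mutex guarding one slot


def clear_slot(slot_dir: str, timeout_s: float = 300.0, poll_s: float = 0.5) -> bool:
    """Delete everything in slot_dir; return True only once it is seen empty."""
    deadline = time.monotonic() + timeout_s
    with slot_lock:                       # held for the whole transaction
        while True:
            for name in os.listdir(slot_dir):
                try:
                    os.unlink(os.path.join(slot_dir, name))   # delete
                except OSError:
                    pass                  # e.g. a handle is still open; retried below
            if not os.listdir(slot_dir):  # check what is actually on disk
                return True
            if time.monotonic() >= deadline:
                return False              # caller must not reuse the slot
            time.sleep(poll_s)            # give the IO time, then retry


# Demo with throwaway files (names are placeholders).
demo = tempfile.mkdtemp()
for name in ("stderr.txt", "result.out"):
    with open(os.path.join(demo, name), "w") as handle:
        handle.write("x")
cleared = clear_slot(demo, timeout_s=5.0)
print(cleared, os.listdir(demo))
```

Success is judged by the directory listing, not by whether the delete call returned without error, which is the distinction that matters when deletion is only 'marked' while handles remain open.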

The above, for large files etc, might take too long (many seconds to minutes) for the Boinc client in its current architecture, because the file deletions seem to be in the main processing loop (thread). As the contention will be highest at task completion for all sorts of reasons, what would be better is either some dedicated garbage collection thread that runs independently, allowing other normal client processing to continue (in other slots) while making sure nothing will try to use the slot with a deletion in progress, OR keeping transactions much smaller.
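
For the dedicated garbage collection thread option, a bare-bones sketch might look like this. All names are hypothetical, it makes no claim about how the Boinc client is structured, and a real version would requeue failed deletions with a delay and run the worker at low priority, which is not shown.

```python
import os
import queue
import shutil
import tempfile
import threading

pending = queue.Queue()       # slots waiting to be cleaned
busy_slots = set()            # slots that must not be handed out yet
busy_lock = threading.Lock()


def retire_slot(slot_dir: str) -> None:
    """Called from the main loop: returns at once, deletion happens later."""
    with busy_lock:
        busy_slots.add(slot_dir)
    pending.put(slot_dir)


def collector() -> None:
    """Background worker: removes retired slots, then marks them free."""
    while True:
        slot_dir = pending.get()
        if slot_dir is None:              # sentinel: shut down
            pending.task_done()
            return
        shutil.rmtree(slot_dir, ignore_errors=True)
        if not os.path.exists(slot_dir):  # only free it once it is really gone
            with busy_lock:
                busy_slots.discard(slot_dir)
        pending.task_done()


worker = threading.Thread(target=collector, daemon=True)
worker.start()

demo = tempfile.mkdtemp()
open(os.path.join(demo, "stderr.txt"), "w").close()
retire_slot(demo)
pending.join()                # demo only: wait so the outcome can be shown
print(os.path.exists(demo), demo in busy_slots)
pending.put(None)
worker.join()
```

The main loop only pays for a set insertion and a queue put, and a slot that could not be removed simply stays in `busy_slots` so nothing tries to use it mid-deletion.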


Copyright © 2015 University of California