Posts by jason_gee


1) Message boards : Number crunching : Correct delete of HDD before new OS install? (Message 1504515)
Posted 8 hours ago by Profile jason_gee
You are installing a brand new OS on the disk, so why do you care about the old content? :?
The old content will be progressively overwritten by the new OS as the disk is used, and will be safely erased through future use. I don't get it - what do you want, to erase compromising content?! :?

;)

[edit]
Forgot: You'll simply have to choose "format" to get a "blank" disk to use for the new OS.


It's true, for those reasons, that user-initiated low-level formats fell out of common practice from the early '90s onwards. However, partition and sector information written at the beginning of a drive's life does not adapt to the thermal calibration, head alignment, and mechanical wear that accumulate over time, and tools like cgsecurity's TestDisk have demonstrated that data is often retrievable across multiple soft formats (and overwrites), making simple high-level formats (both full and quick) mostly cosmetic.

'Nuking' a drive, optionally with special patterns that randomise the magnetic polarisations, is for those who really want their drive to be as close as possible to a fresh start (including refreshed sector information), short of buying a new one. Mechanical drives wear, and bits move around. A deep erase realigns the tracks/sectors, sometimes even refreshing a drive that habitually goes into thermal recalibration (click-click-click) due to age.

[Edit:] Also I happen to know Dirk is passionate about making things perfect, so the DBAN option with full military erase might be the go ;)
2) Message boards : Number crunching : Correct delete of HDD before new OS install? (Message 1504426)
Posted 12 hours ago by Profile jason_gee
Since it's for re-using an older drive, I'd recommend using Darik's Boot and Nuke (DBAN, at http://www.dban.org/), and at least wiping the drive low level with 0's (an advanced deep secure erase is also possible if you're worried about old viruses resurfacing etc., but it takes longer)

This kind of low-level wipe will delete all the partition information, and also effectively refresh every cluster on the drive, which can be a big help (a bit like SpinRite, but without needing to rescue any data)
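For illustration only, here's a minimal sketch of the kind of zero-fill pass DBAN does, assuming a Linux-style block device path like /dev/sdX as a hypothetical target - obviously destructive to everything on that disk:

    // zero_wipe.cpp - illustrative only: overwrite a whole block device with zeros.
    // WARNING: destroys all data on the target device.
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: zero_wipe /dev/sdX\n"); return 1; }
        std::FILE* dev = std::fopen(argv[1], "wb");
        if (!dev) { std::perror("open"); return 1; }
        std::vector<char> zeros(1 << 20, 0);               // 1 MiB buffer of 0x00
        unsigned long long written = 0;
        while (std::fwrite(zeros.data(), 1, zeros.size(), dev) == zeros.size()) {
            written += zeros.size();                        // keep going until the device is full
        }
        std::fclose(dev);
        std::fprintf(stderr, "wrote %llu bytes of zeros\n", written);
        return 0;
    }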

I've done it before on 'cranky' drives, and it sometimes works a treat to bring a drive back to like-new behaviour before a fresh OS install :).
3) Message boards : Number crunching : Restarted from 100% to fail?... (Message 1503984)
Posted 1 day ago by Profile jason_gee
Any opinions?


Only that, from what we see, we can only assume the stderr is stuck from some prior run. The elapsed and CPU times differ, as do the counts from time to time, so something is being launched, but it's either erroring before the initial state can be written, or fully processing and failing to update state. I'd guess either damaged permissions on the slot preventing deletion, and/or a prior state file in the slot. Perhaps the old 6.6.31 client shown there has particular bugs in clearing the slot, or in the sanity checks before launch.
4) Message boards : News : SETI@home now supports Intel GPUs (Message 1502414)
Posted 5 days ago by Profile jason_gee
...Other things that are design and database changes to track resources rather than code changes will take more time.


For the fine tuning/damping end (well after the coarse scaling issues), I can just pinch the sample arrays already being used for the pfc_scale and host_scale rubbish averages. A full PID implementation would only need 6 sample spaces per scale: 3 for the fixed gain knobs, and 3 for the internal variables. I make that a saving of about 2*94 database lookups per host result validation, and so 188*sizeof(double) bytes per pending host result in the working set/cache.
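A minimal sketch of what those 6 per-scale spaces might look like - just my reading of the above, with hypothetical names:

    // Hypothetical per-scale PID storage, replacing the rolling sample arrays.
    // 3 fixed gain knobs + 3 internal state variables = 6 doubles per scale,
    // versus ~94 stored samples per scale fetched from the database today.
    struct PidScaleState {
        double kp, ki, kd;    // one-time tuned gains
        double integral;      // accumulated error
        double prev_error;    // for the derivative term
        double output;        // the scale value actually applied
    };

    // Two scales per (host, app_version) pair: host scale and app-version scale.
    struct HostAppVersionScales {
        PidScaleState host_scale;
        PidScaleState app_ver_scale;
    };  // 12 doubles in total, instead of 2*94 sampled values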
5) Message boards : News : SETI@home now supports Intel GPUs (Message 1502397)
Posted 5 days ago by Profile jason_gee
Another question... do multi-threaded apps consistently report CPU time to be about n_compute_threads*elapsed_time on all platforms (so we could use CPU time/elapsed time to determine a multiplier)?


I've been hopeful the answer is yes, for other purposes, but haven't been able to check the boincapi end completely yet. For compound [asymmetric] apps to work with runtime change I need the total across resources, and to ride the <flops> rate as well, which I expect would run into all sorts of safeties.

[Edit:] The basic plan was to cut down projected GBT Astropulse from 6 months, by getting the entire GPU Users group working on one WU at a time, multithreaded and multiGPU'd.
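On the quoted question, a minimal sketch of the kind of check implied - assuming the reported CPU time really is summed across compute threads; the names are hypothetical:

    #include <cmath>

    // Infer how many compute threads an app used from the ratio of reported CPU
    // time to elapsed (wall) time, if that ratio held consistently across platforms.
    int infer_thread_multiplier(double cpu_time, double elapsed_time) {
        if (elapsed_time <= 0) return 1;
        double ratio = cpu_time / elapsed_time;    // ~n_compute_threads if CPU time is summed
        int n = static_cast<int>(std::lround(ratio));
        return n > 1 ? n : 1;                      // never scale below a single thread
    }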
6) Message boards : News : SETI@home now supports Intel GPUs (Message 1502389)
Posted 5 days ago by Profile jason_gee
That's what I never understood. There's enough information in the PFC values to determine credit scaling, but a pfc_scale factor is calculated instead. A scale less than 1 should never be possible (for a CPU app) if we're scaled to the least efficient. And a scale more than 1 should never be possible if we're scaled to the most efficient.

Yet a quick check shows that our pfc_scales range from 0.51 to 1.30. So I'd say because of that our credit grants are probably low by about 1/0.51=1.9X.

The way the current code seems to work is that the most common CPU app (windows) sets the scaling. That needs to be fixed.


That's right, so the samples used for the averages get weighted by the most commonly returned results (Windows SSE/AVX enabled by nature). It's scaling to the 'most efficient', but the method used to determine throughput is faulty too, using Boinc's FPU Whetstone for a vector unit --> impossibly low pfc_scale.

-> compare SiSoft Sandra FPU single-thread Whetstone to Boinc Whetstone [same]
-> compare SiSoft Sandra SSE Whetstone to FPU Whetstone [2-3x]

pfc_scale will oscillate from about 0.3 to 2, depending on the population of the last n samples [platform, CPU, app caps]. Likewise, without damping those scales, the incorrectly selected 'most efficient' app can swap around too.
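A rough illustration of that fault with made-up numbers - the point being that comparing SIMD throughput against an FPU Whetstone figure makes the host look more than 100% efficient, so the derived scale drops below 1:

    #include <cstdio>

    // Made-up numbers only: why using an FPU Whetstone figure for a vector unit
    // drives the derived scale below 1 for SIMD-enabled CPU apps.
    int main() {
        double whetstone_gflops = 3.0;     // Boinc FPU (x87) Whetstone for the host
        double simd_speedup     = 2.5;     // SSE/AVX runs ~2-3x the FPU rate (Sandra-style comparison)
        double cpu_time         = 10000.0; // seconds spent on a task

        // "Peak" FLOPs the server thinks the host could have done in that time:
        double assumed_peak_ops = whetstone_gflops * 1e9 * cpu_time;
        // FLOPs the SIMD app actually got through:
        double actual_ops       = assumed_peak_ops * simd_speedup;

        std::printf("apparent efficiency: %.2f -> scale ~ %.2f (impossibly low)\n",
                    actual_ops / assumed_peak_ops,
                    assumed_peak_ops / actual_ops);
        return 0;
    }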
7) Message boards : News : SETI@home now supports Intel GPUs (Message 1502315)
Posted 5 days ago by Profile jason_gee
Another possibility I've considered. I've never confirmed that the credits are really scaled to the least efficient CPU version for a platform. In theory, if I were to create a CPU version of SETI@home with no threading or SIMD, using the Ooura FFT, and release it under the plan class "calibration", then after 100 results come back from that version, the credits of everything else should go up. In theory, of course.

More work would be required to allow short running calibration versions.

Then for credit calibration, all a project would need to do is generate a calibration version of every application including GPU apps and some server code to greatly limit the number of calibration apps that go out.



From memory (needs another walk-through when awake), it's scaling to the dodgy average of the lowest effective claim (which will always be overweighted toward AVX populating the last n results in the sample set). Raw claims there (for AVX) are about one fifth of [reality or] the original estimate, mixed with a mid-to-dominant proportion of SSE-SSE3 by volume. Combined, that brings the claim to about one third of [reality or] the initial estimate (which I always interpreted as a minimum [based on fundamental compute complexity]). ---> Shorties should be above 100 credits, not ~40 +/-25%. We added autocorrelation since the time they used to be 90-100. [There was a drop to ~60 in between, before AVX and autocorrelations, attributable to CreditNew's introduction not accounting for the existing SIMD optimisations.]

That's reasonably close to the original old multiplier of 2.85, which more or less compensated for a lot of overhead and some flop count shortfalls (whether that was the intent or not).

A possible middle ground with fewer logistical issues, but slightly less precision, would be to send out an app with just a bench, to grab CPU caps and several forms of Whetstone (FPU double, FPU single, SSE-SSE3 single/double, and AVX... maybe even baseline GPU)

That should yield (at least for CPU) first a cross-check for Boinc's Whetstone (approximating clock rate for x87 builds), detailed host capabilities, and coarse corrective multipliers for the scaling (given the server already knows about app capabilities somewhere).
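A sketch of the kind of coarse corrective multiplier such a bench might yield - the Whetstone variants measured, and the clamp, are just assumptions:

    // Hypothetical: given several Whetstone measurements from a one-off benchmark app,
    // pick the variant matching the app's capabilities and form a coarse correction
    // against the plain FPU figure Boinc currently uses.
    struct BenchResult {
        double whetstone_fpu;   // x87 / plain FPU
        double whetstone_sse;   // SSE-SSE3
        double whetstone_avx;   // AVX
    };

    double coarse_correction(const BenchResult& b, bool app_uses_avx, bool app_uses_sse) {
        double effective = b.whetstone_fpu;
        if (app_uses_avx)      effective = b.whetstone_avx;
        else if (app_uses_sse) effective = b.whetstone_sse;
        double m = effective / b.whetstone_fpu;   // ~1 for x87 builds, ~2-3x for SSE, more for AVX
        return m < 1.0 ? 1.0 : m;                 // never correct downwards
    }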

Anyway, I'm still looking for the options with the least work involved first. Fingers crossed that with the noise rejection and stability improved, and the coarse scaling assumptions repaired, the thing would converge on its own [likely immediately around COBBLESTONE_SCALE when correct].
8) Message boards : News : SETI@home now supports Intel GPUs (Message 1502006)
Posted 6 days ago by Profile jason_gee
Yes, if we went back to flop counting we would need to standardize. It makes sense to standardize on the most common algorithms for things like FFT, trig and exp. FFT would be 5*N*log(N), trig functions would be about 11 if the result is used as single precision and I've forgotten the number (17?) for double precision.

Granting a standardized value rewards optimization that removes operations (i.e. sincosf() would get credit for 22 FLOPS rather than 16.) Of course, the project needs to be honest about whether it needs both values from the sincosf().

That said, I think the SETI@home FLOP counting grants 1 FLOP for sin() or cos().
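For illustration, a minimal counter along the standardised lines quoted above - the per-call costs are the figures mentioned (log base 2 assumed for the FFT), everything else is hypothetical:

    #include <cmath>

    // Hypothetical standardised FLOP counter: charge the agreed cost per operation
    // regardless of how the app actually implements it.
    struct FlopCounter {
        double total = 0;
        void fft(double n)  { total += 5.0 * n * std::log2(n); }  // 5*N*log(N)
        void trig_single()  { total += 11.0; }                    // sin/cos used as single precision
        void trig_double()  { total += 17.0; }                    // quoted above as "(17?)"
        void add()          { total += 1.0; }
    };

    // e.g. one 128k-point FFT plus a million single-precision sin() calls:
    //   FlopCounter fc; fc.fft(131072); for (int i = 0; i < 1000000; ++i) fc.trig_single();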


The 'other' way I came up with that *should* remove the coarse scaling error is to accept the initial (unscaled) WU estimate as the minimum operations, or the minimum x some constant, to allow for some small overhead plus initial breathing room to prevent aborts ('make sure never to underestimate').

That's likely the main initial thing I'll be testing at Albert (when the time comes), because it allows the automatic scaling to compensate for SIMD, optimisation, and potentially multithreading while still keeping the automatic scaling for finetuning and sanity as intended.

That of course would rely on projects setting the minimum estimate (* some constant). The heuristic might go something like this, while saving a lot of the costly sanity checks currently in place:

// normal sanity checks here, minus some costly ones that won't be needed
// anymore when the system is stable in the engineering sense.
// look for outliers properly
...
// if not an outlier...
credit_multiplier = raw_flop_claim / wu_estimate;
if credit_multiplier < 1 then
    // ... must be SIMD, or optimised; there's an inbuilt underclaim without this
    // ... round this to steps if desired
    // raise some red flags if this is lower than say 1/6 or 1/8
    // ... could be missing some outlier or old/broken clients
    ...
    credit_multiplier = 1 / credit_multiplier;
else
    // either the estimate was spot on, or the application is multithreaded
    // and sending back the sum of elapsed per resource
    // (as good multithreaded apps should)
    // ... assume multithreaded for a high credit_multiplier, and allow whatever is
    // ... possible/consistent with the app version & known host resources
    if app_is_mt && host_has_mt then
        ... // allow it
    else
        ... // probably we have some coarse overestimate
        ... // allow for usage variation
        ... // adjust credit_multiplier and the app_ver_wu_scale used by the scheduler
    end
end
...
// A PID controller smooths this; tune for rapid convergence,
// which allows for hardware change (small initial overshoot).
// This is better than weighted sigma (undamped averages).
host_app_ver_scale = host_app_ver_update_est_scale( app_ver, credit_multiplier );
// PID the global app version scale too, used for self-tuning initial estimates
// globally and finding the 'most efficient' app.
// Tune for slow response.
wu_scale = wu_scale_update(app_ver, host_app_ver_scale);
new_credit_claim = raw_credit_claim * wu_scale * host_app_ver_scale;


Likely logic gremlins aside, cascading two controllers like that is fine, and creditnew currently does that. The problem is that when both are unstable it leads to the confusing effects we all see user side. Implementing these scales as PID controlled outputs allows noise rejection / damping, while potentially removing the need for certain costly sanity checks and database accesses. (e.g. no need to look up and adjust a database to average a bunch of values spanning a month)

The three knobs, P, I and D (which are 'gains'), can be set to 1,0,0 to emulate the behaviour of the current system (ignoring the logic changes above), set to a 'classic' preset, or manually tuned (one-time). Tuning won't affect the coarse scale, just stability and noise rejection. The invisible internal controls self-adjust, so no work there.
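A minimal sketch of the single-scale PID update being described, assuming one update per validated result; with kp=1, ki=0, kd=0 the output just follows each new raw value (roughly the undamped behaviour), while a lower kp plus a little ki gives the damping:

    // Hypothetical PID-style smoothing of a scale (host or app-version), updated
    // once per validated result.
    class PidScale {
    public:
        PidScale(double kp, double ki, double kd, double initial = 1.0)
            : kp_(kp), ki_(ki), kd_(kd), scale_(initial) {}

        double update(double raw_multiplier) {
            double error = raw_multiplier - scale_;  // distance of new evidence from current scale
            integral_ += error;
            double derivative = error - prev_error_;
            prev_error_ = error;
            scale_ += kp_ * error + ki_ * integral_ + kd_ * derivative;
            return scale_;
        }
        double value() const { return scale_; }

    private:
        double kp_, ki_, kd_;    // one-time tuned gains
        double integral_ = 0;    // internal state
        double prev_error_ = 0;  // internal state
        double scale_;           // current output (the scale actually applied)
    };

    // e.g. a fast-tuned per-host scale and a slow-tuned global app-version scale:
    //   PidScale host_scale(0.3, 0.02, 0.05), app_ver_scale(0.05, 0.005, 0.0);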

If any initial tuning at all is too difficult for a project, and no classic presets seem suitable, then a fuzzy assist is doable (and not as big a deal as it sounds)

All that would basically achieve is some convergence on the ('fair') COBBLESTONE_SCALE, noise immunity, and better response to hardware or usage pattern changes. I've been using a modified 6.10.58 client that implements the PID controller to track task estimates for a couple of years now. Client-side it's able to adapt in near-real-time to machine usage and hardware change, without intervention.
9) Message boards : Number crunching : Questions about iGPU and GPU card use (Message 1500167)
Posted 10 days ago by Profile jason_gee
... My questions are -- Wouldn't it be better to use the IGPU exclusively as the monitor, ignoring it in BOINC, so that the AP/OpenCL processing will not be interrupted during those times that games/DVDs are being used?


It's possible, but at the same time I know that the software and firmware underneath is more complex than that. In a perfect world I would hope so, while I know that forcing chipset drivers to stick, getting RAM and the MCH to play nicely, and finding decent drivers can be a challenge.

50/50 chance it would work 'out of the box', with the other 50% chance being a nightmare :D
10) Message boards : News : SETI@home now supports Intel GPUs (Message 1500163)
Posted 10 days ago by Profile jason_gee
EDIT: regarding operations counting - are you sure that FLOPS == work done no matter what algorithm is used? AstroPulse, for example, does merely c=a+b in most of its parts, while some other project could do something like c=exp(a)*sin(b). What will FLOPS counting give if the need for memory accesses is accounted for?


You can of course factor in that recent FFT developments reduced (serial) FFT algorithm complexity from k*n*log(n) ( O(n log n) ) to a little bit less (still O(n log n), just a smaller constant), but the optimal compute complexity still remains more or less what it was, and ignores all memory/storage accesses (full latency hiding is assumed, before and now). That's the first major change in Fourier analysis in, I think, 30 years or so.
11) Message boards : News : SETI@home now supports Intel GPUs (Message 1499798)
Posted 11 days ago by Profile jason_gee
Since Albert's willing to let us use their beta to test & tune some things (in time), we're hopeful for an Apollo 13 style rescue, over a Coors-Light party train disaster.

Any idea when the test begins? I already joined Albert to help with that.

Last I heard, they were wrestling with some infrastructure upgrades. I'll probably consult with the others on that during the week to try to get a rough idea of the timing.

The difference here is that we have to do it while the vehicle's in motion and fully loaded with passengers.

A nice challenge, no? :)

Sure :) nothing wrong with a little pressure :-X
12) Message boards : News : SETI@home now supports Intel GPUs (Message 1499455)
Posted 12 days ago by Profile jason_gee
The server uncompensates for attempts to compensate.

Then we are doomed... Snif!

Sorry, but I can't understand why you simply can't add a simple multiplier to the actual credit paid. Something like (not the actual values, just a guess):

MB Credit = MB Credit * 3.3

After all the CreditScrew calculations are made, nothing else. NO changes in the CreditScrew code itself (that's for another time).

IIRC, 3.3 was the number found by Jason as the discrepancy in the values (I could be wrong).

Something similar could be done for the AP section.


Basically the system is a lot like one of those mechanical governors on a steam engine (but with the wrong kind of ball weight, and some bent pushrods at the moment... hooked to another one for each passenger...).

You can wedge a brick in it to keep the throttle open, but then the boiler explodes and everyone dies, or the thing runs out of control, derails, & runs through downtown San Francisco :)
[ Then also, piling on bandaids is probably what created the bent pushrods in the first place ]

The way to fix it would be to unbolt the device, make all the adjustments and test, then reinstall and finetune. The difference here is that we have to do it while the vehicle's in motion and fully loaded with passengers. Since Albert's willing to let us use their beta to test & tune some things (in time), we're hopeful for an Apollo 13 style rescue, over a Coors-Light party train disaster.
13) Message boards : News : SETI@home now supports Intel GPUs (Message 1498965)
Posted 13 days ago by Profile jason_gee
I'll laugh at the Collatz joke when they start counting operations and using that to grant credit.


I suppose us [temporarily] switching back to flopcounts with a SIMD compensating credit multiplier is out, given the logistics...
14) Message boards : News : SETI@home now supports Intel GPUs (Message 1498621)
Posted 13 days ago by Profile jason_gee
...then the four known credit System design flaws currently under detailed analysis.


Jason, can you elaborate on this part of your response?


Sure, I'll summarise here. I did post more details in NC a while back, but these tend to get buried in mayhem. The Collatz April Fools joke post makes a nice summary too ;)

#1) Coarse scaling error: The FLOPs estimates from returned tasks at validation time, which get converted to the cobblestone scale to acquire a raw credit claim, are based on Boinc Whetstone [and elapsed time]... which for CPU implies that FPU, SSE-SSE3, and AVX are all the same. As the detected 'most efficient' application (stock CPU for multibeam) scales down every other application, there is an effective underclaim of between 2.85x and 8x (probably around 3.3, subject to isolating the actual SIMD performance of modern processors & applications). That's important because CreditNew is ignoring what's known as 'Instruction Level Parallelism'.

#2) Stability: In the engineering sense, credit awards show all the instabilities (ringing, overshoot, and skew) associated with an uncalibrated control system. Mathematically the system is a sigma-type feedback with some weighting, which is the same algorithm as a cheap DAC in a $5 disposable CD player. I'd like to see a properly stabilised PID controller implementation, one-time tuned, with proper damping... for each of the two scales (host scale and global app version scale). The current implementation appears to meet the requirements for chaos, which are 'sensitivity to initial conditions', some 'stirring' by the way validation times mix up estimates/averages, and self-similar-looking oscillations (different from 'random').

#3) No multithreading support: Milkyway users complained when MT plan class applications divided credits. Obviously elapsed time 'per resource' should be summed, making the total greater than wall time; instead the wall-clock elapsed time is used directly (see the sketch after these points).

#4) Overly sensitive to task estimates: This, really, in control theory terms, is a side effect of the earlier items. The task estimates only need to be a theoretical minimum (usually established from numerical computing theory). The problem here is that the system then receives shorter elapsed times than expected (i.e. impossible), then downscales estimates for everyone (feedback), and so attempts to convince us that magic is real, as opposed to some of the system's assumptions being faulty [such as the faulty assumption that stock CPU applications have no parallelism].
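On #3, a minimal sketch of the per-resource summing that seems to be implied - the structures here are hypothetical, not Boinc's actual ones:

    #include <vector>

    // For a multithreaded/compound app, sum elapsed time per compute resource,
    // rather than reporting wall-clock elapsed time directly.
    struct ResourceUsage {
        double elapsed;   // seconds this resource (CPU thread, GPU, ...) worked on the task
    };

    double total_resource_elapsed(const std::vector<ResourceUsage>& resources) {
        double total = 0;
        for (const auto& r : resources) total += r.elapsed;
        return total;     // >= wall time whenever more than one resource worked concurrently
    }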

The old system used a credit multiplier of 2.85, which was removed/disabled. That would have been about right for an SSE-SSE3 type SIMD implementation. Since AVX arrived for multibeam we need a larger figure, I believe around 3.3, but it'll be nicer to keep the working - self-scaling, self-correcting - aspects of CreditNew, while fixing those assumptions and damping the oscillations. Doing those things, and embedding a little foresight toward upcoming technology change, will probably be a wise investment now, more useful long term than just piling on more bandaids and workarounds.
15) Message boards : News : SETI@home now supports Intel GPUs (Message 1498401)
Posted 14 days ago by Profile jason_gee
Yeah, in theory it's possible to make a device independent OpenCL app. But thus far it doesn't seem to have been done. Jason, Josef or Raistmer will correct me if I'm wrong about that.


That's one of the challenges I'm currently facing, moving X-branch x42 toward fully dynamic heterogeneous capability.

There are a number of roadblocks, but none insurmountable. Most of those actually revolve first around boincapi and client limitations, then the four known credit System design flaws currently under detailed analysis.

The only barriers that really revolve around OpenCL per se are the slightly different implementations by hardware vendor, which Raistmer has already shown are manageable separately, so they can be handled with unified conditional code down the line.

Also, the Cuda builds cope with 6 quite different GPU architectures, though there are OS-induced latency issues to be solved there, by refocusing on new ways to determine optimal code paths. The literature suggests a combination of traditional (multibeam + fftw) wisdom-like runtime optimisation, plus some amount of hardware awareness and a search space narrowed by offline (install-time) optimisation (like mobile devices do), and hard dispatch, would yield the best usability results, though it requires some thought toward different user goals (like efficiency versus throughput versus display responsiveness or application sharing).
16) Message boards : Number crunching : Warning when using 7.3.14 (Message 1498218)
Posted 14 days ago by Profile jason_gee
Part of the problem here is that, under modern operating systems, limits on and with respect to physical RAM make no sense, as all memory is virtualised. That makes 'working set size' and page commit behaviour technically more relevant, though pretty far removed from typical user perception (except perhaps for the few who completely disable paging, and understand that up to 50% of Windows memory is non-paged kernel memory).
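For the curious, a minimal Windows-only sketch of reading the working set size - the quantity a physical-RAM 'limit' would really be talking about (link against psapi.lib):

    #include <windows.h>
    #include <psapi.h>
    #include <cstdio>

    int main() {
        PROCESS_MEMORY_COUNTERS pmc;
        // Query this process's own working set via the psapi counters.
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
            std::printf("working set: %zu MiB, peak: %zu MiB\n",
                        (size_t)(pmc.WorkingSetSize >> 20),
                        (size_t)(pmc.PeakWorkingSetSize >> 20));
        }
        return 0;
    }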

As the implementation appears to have been intended, the limit setting, along with subsequent aborts, will appear as nonsense, simply because applications do not use ANY physical memory directly.

If Boinc wants 'safeties', it needs to use ones that make sense, like the obvious:
"received an out of memory exception with XYZ application"... don't send that anymore, and somehow provide info as to why (with a possible reset)... application details page perhaps ?
17) Message boards : Number crunching : Warning when using 7.3.14 (Message 1498113)
Posted 15 days ago by Profile jason_gee
I keep telling people that newer versions don't necessarily make them better.


> Also, it will lead to lots of aborted jobs, which is bad for volunteer morale.

Now that's strange hearing that from Dr. D.A., are you sure that he sent it?

Cheers.


The impression I've been given from activity on the boincapi and CreditNew fronts is that Dr. A is quite aware there are problems, and needs development help fixing them due to low resources. I'm not at all surprised that a change tightening a questionable failsafe might create more work than it alleviates, and so be retracted for a rethink.
18) Message boards : Number crunching : 780ti not faster than a GTX 580 (Message 1498028)
Posted 15 days ago by Profile jason_gee
Yeah that'll do, thanks Juan for finding that, as I'm a bit tied up at the moment.

From the OP's query: even though I know nothing of this particular app, I can make some comments. If you see other similar hosts/cards achieving better times with similar work & application, the best approach is to first check every possible difference using the available information, like:
- Different App version ?
- Different OS ?
- Overclock / watercooling on host you're comparing against ?
- Different CPU ? (and associated Overclocks etc too)

There are many reasons a given application may perform well on one system with a particular GPU, and not on another with the same GPU. Working out where the key differences are is a start, and that extends through these levels and back:

1) GPU-application code (was it made scalable to the new GPU architecture?)
2) GPU library runtimes (different version of Cuda/OpenCL?)
3) Driver, user-mode level (is the device being shared? is the driver beta, outdated, or otherwise flaky?)
4) OS driver level (as with #3, but add in whether the OS is changing or misconfigured, such as power saving settings... e.g. Windows is also changing to share more nicely between applications)
5) Other hardware drivers (are any devices in the system using outdated BIOS, firmware [especially SSD], or drivers... especially PCI Express with generic Microsoft drivers dated 2006?)

So there's quite a lot to narrow down before a specific problem/fault could be pointed at, but hopefully gives some ideas.
19) Message boards : Number crunching : GPU Detection Failed, Error Code 1073740940 (Message 1495814)
Posted 20 days ago by Profile jason_gee
In general development my end, some business associates (non-boinc-seti related) have read me the riot act, demanding that x42 be a complete 'proper' refactor/re-engineering effort, and cross platform (WIN/Linux/MAC/Arm-Cuda-readyish). That's slowed things quite a bit, particularly with the amount of wrapping of Boinc libraries I'll have to do to meet standards...

I thought that was rather precocious of those business associates, considering you're (as far as I know) an unpaid volunteer developer. I would hope there was some payment, or at the very least some hardware,

Which brings us to this post at Einstein:

http://einstein.phys.uwm.edu/forum_thread.php?id=9686&nowrap=true#130145

Maybe whoever those business associates are, they'll supply you with one of those Jetson TK1 Development Kits ;)

Claggy


LoL, I thought so too. Periodic contract work that pays the bills, and no, not for nVidia, so I do what they say (if I want more work ;) ).

Still haven't received my T-shirt for helping NV with Cuda 6 development, so I'm not counting on any hardware. Those ARM+Kepler prototyping boards were sent out, I believe, to test Cuda + ARM toolchain functionality in advance of Maxwell (the 750ti being the architectural test of the GPU portion), though that's full of surmising.
20) Message boards : Number crunching : 194 (0xc2) EXIT_ABORTED_BY_CLIENT - "finish file present too long" (Message 1494667)
Posted 22 days ago by Profile jason_gee
I'm not sure whether that's the same problem or not. It looks as if the program reached the normal completion point, but couldn't close down properly for some reason. So, it started again, and crashed on the restart.

Still, lots of lovely debug information logged, so I'll save that for Jason - looks like the replacement wingmate is a very fast host, so it may disappear too soon.

Got one last evening similar to Juan's, on a stock cuda42 task, 3453302106, if you want to grab the debugging info before it vanishes.

Based on the timing, I think I caused this one myself, by shutting down and restarting BOINC several times while testing a theory related to the "Zombie" AP tasks that started this thread in the first place. (I'll probably post more on that theory in a separate thread when I can.) It seems as if BOINC actually shuts down before the applications do, and this task must have managed to "finish" after BOINC was already terminated. It then restarted at 86.16% but with the "finish file" already present.


Yeah, the Boinc client likes to aggressively terminate the processes if it doesn't hear what it wants to hear within fixed time periods (i.e. that's correct termination if the client killed the app, aborting the task).

Unfortunately that means rapidly starting/stopping can induce modes of failure totally outside of the scope of the application or boincapi embedded in it. This case is solely in the Boinc client's court (complete with some design issues).

That's especially likely when we're talking about multiple tasks being asked to shut down while still ramping up, simply because the system will be under high contention (loading from disk etc.), and the tasks may not yet have even reached the main code where they can listen for exit requests.

That's where Boinc has some issues with 'thread-safety', and fixed assumptions about timing that really only hold on 'Real-Time' operating systems like those used in car computers.


