Posts by jason_gee


1) Message boards : Number crunching : 1073741205 Error Code (Unknown Error) (Message 1796990)
Posted 12 days ago by Profile jason_gee
Just a sidenote/point-of-order: I'd advise keeping the applicable update in place, because trying to launch processes during shutdown is a legitimate vector for malware. Perhaps that's less of a concern for dedicated crunching machines, but in the larger scheme of things it's behaviour the Boinc client needs to change.
2) Message boards : Number crunching : GPU Wars 2016: News & Rumors (Message 1796258)
Posted 15 days ago by Profile jason_gee
Already? Seems soon for Ti/Titan, but I suppose many are waiting.
3) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1795935)
Posted 17 days ago by Profile jason_gee
We're going to a new world.


Oh, glad you see it too. I think I'm proud to live in these times.
4) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1795934)
Posted 17 days ago by Profile jason_gee
A task is just a bunch of bytes.

On that level, a computer program is just a bunch of bytes too. But a complete specification of a task comprises both the data bytes and the processing bytes, even if expressed as a choice rather than a prescription.


Well, not quite. The computational complexity may remain order N no matter what the program/hardware, but the communications (i.e. memory) complexity may be reduced to zero by having infinite registers. In that sense, for the cobblestone scale we only award computation, since memory transactions are more or less arbitrary [i.e. the defined and paid work is computational work, rather than communications work].
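As a rough back-of-envelope, that payment model is just FLOPs scaled by the cobblestone definition (200 credits per day on a machine sustaining 1 GFLOPS, as I understand it); memory traffic never enters into it. A minimal sketch, with a made-up task size:

    #include <cstdio>

    // Hedged sketch of cobblestone arithmetic: credit pays computation only.
    // The 200-per-GFLOPS-day constant is the BOINC definition as I recall it;
    // flops_done is a hypothetical task size, not a real SETI workunit figure.
    int main() {
        const double flops_done = 2.0e13;            // pretend task: 20 TFLOP
        const double gflops_day = 86400.0 * 1.0e9;   // FLOPs in one GFLOPS-day
        const double credit = 200.0 * flops_done / gflops_day;
        printf("claimed credit: %.2f cobblestones\n", credit);  // ~46.3
        return 0;
    }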
5) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795199)
Posted 20 days ago by Profile jason_gee
Hopefully things will work out.
BTW, I just downloaded another copy of the sah_v7_opt folder and I'm still getting the same error with the PetriR_raw2 files;
Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float*, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1


Probably one of the first things I'll end up looking at, because the autocorrelation streamlining is one of the safest areas, and should give a near constant improvement at all angle ranges.
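For what it's worth, that linker output is the classic signature of functions declared in a header but whose definitions never made it into the link. A hypothetical illustration (file and parameter names below are my guesses, not the actual tree):

    // cudaAcceleration.h (assumed): declarations seti_analyze() calls into.
    void cudaAcc_GetAutoCorrelation(float* dev_in, int fftlen, int numfft);
    void cudaAcc_FindAutoCorrelations(int fftlen, int numfft);

    // If the .cu file defining these (say, a cudaAcc_autocorr.cu) is absent
    // from the downloaded folder, or excluded from the build, every reference
    // in seti_cuda-analyzeFuncs.o dangles, and ld reports exactly the
    // "symbol(s) not found for architecture x86_64" shown above.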
6) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795192)
Posted 20 days ago by Profile jason_gee
...But you might consider the possibility that 'application launch order' might affect queuing, somewhere down the line.


Here is some of the detail from the Cuda handbook that pertains specifically to Windows WDDM (Vista+ drivers):

...On WDDM, if there are applications competing for time on the same GPU, Windows can and will swap memory objects out in order to enable each application to run. The Windows operating system tries to make this as efficient as possible, but as with all paging, having it never happen is much faster than having it ever happen.
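One hedged mitigation sketch: size the device allocation from what is actually free at startup, so WDDM has less cause to page our memory objects out when another app lands on the same GPU (the headroom fraction below is an assumption, not a tuned value):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        size_t freeB = 0, totalB = 0;
        cudaMemGetInfo(&freeB, &totalB);          // what the driver says is free now
        size_t budget = (size_t)(freeB * 0.75);   // leave headroom for neighbours
        void* buf = nullptr;
        if (cudaMalloc(&buf, budget) == cudaSuccess) {
            printf("allocated %zu of %zu free bytes\n", budget, freeB);
            cudaFree(buf);
        }
        return 0;
    }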
7) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795158)
Posted 20 days ago by Profile jason_gee

But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970.


You are right. Low AR makes pulsefinding run on one SM/SMX on NVIDIA GPUs. When PoTLen == PulsePoTLen the work cannot (currently) be divided across all SM units. So the hit is 16x on a 980, 12x on a 780, 5x on a 750, etc., depending on the number of SM units on the GPU.

I have done some experimenting with my 1080 and it runs guppi VLAR units in about 200-300 seconds. But it has an issue with not finding all pulses, or finding too many.

Would it be possible to make this change to the Baseline App and see if it still had problems finding the correct number of pulses? In my experience the Baseline App is very accurate, and might be useful very quickly if all the SMs could be used. Right now it seems the problem with the SIGBUS errors I was having is related to the OS. Apps built on Mountain Lion with Toolkit 7.5 don't produce any errors, so for now it appears the SIGBUS problem can be avoided.


Possible. This weekend, for me, will involve direct comparisons between Petri's modifications and the Baseline sources, then injecting the least-risky/widest-compatibility/biggest-impact components. Whether the strange pulses are a simple precision change or a logic breakage somewhere, I won't know for a while. Either way, the logic changes Petri and I chatted about seemed headed down the right path to me, so whatever the weirdness is will likely turn up along the way.
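For readers following along, a minimal CUDA sketch (not the project kernel) of why PoTLen == PulsePoTLen serialises things: one PoT per block means a grid of one block, and one block occupies one SM while the rest sit idle.

    #include <cuda_runtime.h>

    // Toy stand-in for pulsefinding: each block reduces its slice of the PoT
    // to a maximum. With gridDim.x == 1, the whole array funnels through one SM.
    __global__ void potChunkMax(const float* pot, int n, float* blockMax) {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        float m = -1e30f;
        // grid-stride loop: with one block, a single SM does all iterations
        for (int i = blockIdx.x * blockDim.x + tid; i < n;
             i += gridDim.x * blockDim.x)
            m = fmaxf(m, pot[i]);
        s[tid] = m;
        __syncthreads();
        // shared-memory tree reduction (blockDim.x must be a power of two)
        for (int step = blockDim.x / 2; step > 0; step >>= 1) {
            if (tid < step) s[tid] = fmaxf(s[tid], s[tid + step]);
            __syncthreads();
        }
        if (tid == 0) blockMax[blockIdx.x] = s[0];
    }

    // potChunkMax<<<1, 256, 256 * sizeof(float)>>>(d_pot, n, d_max);      // 1 SM busy
    // potChunkMax<<<numSMs, 256, 256 * sizeof(float)>>>(d_pot, n, d_max); // all SMs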
8) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795142)
Posted 20 days ago by Profile jason_gee
When two cuda50 tasks are running on the same GPU, fairly obviously, one will have started before the other - by anything between a fraction of a second and several minutes. It seems to me that the first to start consistently runs faster. This property is inheritable: when the first starter finishes, the second task becomes the 'first to start' and runs faster. A third task will start, becoming the 'second starter' for the time being, and accordingly run slowly.

I don't think that's purely the result of non-linear progress reporting (progress %age reporting moves more slowly at the start of the task), but it's easy to confuse it with that and I might have been confused. But you might consider the possibility that 'application launch order' might affect queuing, somewhere down the line.


The Cuda handbook explains there is only one DMA engine, so some software pipelining needs to happen if multiple threads or processes (with their own threads) want to use the device concurrently. In Petri's case he's raising efficiency and hiding latencies with Cuda streams, such that the optimum is a single instance. In my experience the latencies of the simpler model on Linux are smaller to start with. Whether or not these aspects change with Pascal & newer Linux+drivers, no idea as yet.

[Edit:] Correction: Kepler+ has two, but they have different priorities, and are probably saturating under the many small requests in baseline code + multiple instances/apps. Upping transfer sizes to over 4MiB for Fermi+, and doing some pipelining anyway, will probably improve things down the line.

Because the command buffer is shared between engines, applications must “software-pipeline” their requests in different streams...

So 'Classic' (Baseline) Cuda code is more likely to 'fight' under the demands of the new tasks.
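To make the handbook's point concrete, here's a hedged sketch of that software pipelining: chunked transfers alternated across two streams, so a copy engine and the SMs can overlap. The chunk count is arbitrary, the host buffer must be pinned (e.g. via cudaHostAlloc) for the async copies to actually overlap, and per the edit above, chunks over 4 MiB are probably where Fermi+ wants to be:

    #include <cuda_runtime.h>

    // Split one big host-to-device transfer (plus processing) into chunks
    // issued round-robin on two streams. Assumes n divides evenly by chunks.
    void pipelined(const float* hostIn /* pinned */, float* devBuf,
                   int n, int chunks) {
        cudaStream_t s[2];
        cudaStreamCreate(&s[0]);
        cudaStreamCreate(&s[1]);
        int chunk = n / chunks;
        for (int c = 0; c < chunks; ++c) {
            cudaStream_t st = s[c & 1];   // alternate streams
            cudaMemcpyAsync(devBuf + c * chunk, hostIn + c * chunk,
                            chunk * sizeof(float),
                            cudaMemcpyHostToDevice, st);
            // processChunk<<<grid, block, 0, st>>>(devBuf + c * chunk, chunk);
        }
        cudaStreamSynchronize(s[0]);
        cudaStreamSynchronize(s[1]);
        cudaStreamDestroy(s[0]);
        cudaStreamDestroy(s[1]);
    }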
9) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794999)
Posted 20 days ago by Profile jason_gee
Actually, the error message is on line 232 of the current file. Are we using an outdated version of seti_header.cpp?

Apart from Jason's provision for Android in 2014 (r2181), most of the file size growth was Eric's r3113 and r3212 for GBT - and the file does "Write a SETI work unit header to a file".


Not sure what missing a parameter addition would do. Is that a possibility for this build?
10) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794997)
Posted 20 days ago by Profile jason_gee
I believe that type of failure and exit status is the first I've experienced. I am running the latest beta BOINC Manager 7.6.29(x64) which I believe has had some code changed recently to fix Manager exits compared to the last stable release 7.6.22(x64). Richard probably could say just what the code jockeys played with in the latest beta.

[Edit] Looks like my copy of the beta is not the latest now. We're up to 7.6.33(x64)


Yeah, not a 'normal' situation IMO. Would need to be reproducible on demand to localise better.
11) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794993)
Posted 20 days ago by Profile jason_gee
I actually think it was the BOINC shutdown that froze on exit and then blue-screened the computer that did it. Strange thing is that I always wait till a quiescent period in BOINC activity before I initiate a shutdown. That means no work units are close to finishing, all recently completed work units have successfully uploaded and BOINC is not close to asking for network communication. Only when all those cases are met do I shutdown the Manager and close the client. I can only conclude that BOINC was reading those tasks when the computer blue-screened.



I'd class that as possibly reproducible [rather than Eddy/Eddie]. Can you try that? (Could take substantial hammering :) )
12) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794991)
Posted 20 days ago by Profile jason_gee
So does that mean that computer mangled the work units just when it grabbed them for processing?

Many possible layers between the server and client CPU, from download through reading from disk.

And we have seen that error message before, in other applications including CUDA, with no conclusive evidence that the data file has suffered any corruption at all.

It seemed (IIRC) to be more prevalent on task restarts than initial runs. I think that the code generating that error message dates from the original Berkeley CPU code: checking that for trigger points might give us a better handle on what's really happening under the hood.


Any prevalence more common than about once every three months on a given host would indicate a configuration, system, or indeed client or application issue. Anything less frequent than that, on sub-workstation grade componentry, indicates noise (radiation).
13) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794987)
Posted 20 days ago by Profile jason_gee
So does that mean that computer mangled the work units just when it grabbed them for processing?


Many possible layers between the server and client CPU, from download through reading from disk.
14) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794981)
Posted 20 days ago by Profile jason_gee
0xC0000018
STATUS_CONFLICTING_ADDRESSES
{Conflicting Address Range} The specified address range conflicts with the address space.


Probably, if not repeatable, a genuine bitflip (e.g. from cosmic rays or radioactive carbon in the processor/RAM). Workstation grade components with ECC memory reduce the probability of that. We've been referring to those as "Eddys in the spacetime continuum". ("Eddie? Who's Eddie?" "No, not who's Eddie, what's Eddie?" "What? What's Eddie doing in the spacetime continuum?")
15) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1794678)
Posted 21 days ago by Profile jason_gee
Don't see why "stock" should be made deliberately slower than "optimized".

See the 'Why need 5 different stock AMD OpenCL GPU applications?' thread.


Exactly
16) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1794587)
Posted 21 days ago by Profile jason_gee
If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down.

Different tasks, on different cards, with different applications significantly affect each other?


Newer OSX appears to have the largest (system/driver-stack) latencies I've come across on the three platforms so far. That's going to require scaling everything up to reduce and hide them effectively. Fortunately I *may* have found a way to get meaningful utilisation data on this platform, where monitoring tools for NV are quite limited (to be confirmed/rejected when I can). Petri's approach with Cuda streams should ultimately have the biggest impact on this platform.
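For anyone wanting to put a rough number on those latencies themselves, a crude sketch: time an empty kernel with CUDA events. What comes out is a driver-stack round trip, nothing project specific:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void noop() {}   // empty kernel: measures launch overhead only

    int main() {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        noop<<<1, 1>>>();
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("empty kernel launch round trip: %.3f ms\n", ms);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return 0;
    }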
17) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1794410)
Posted 22 days ago by Profile jason_gee
Obviously, that leads to the first such application being a bit of a doormat - allowing other projects to trample all over it. I think experienced users would know how to handle that and micro-manage their own machines to suit the new behaviour, but it could be a bit of a problem with a stock deployment for the public at large.


Acting like a 'normal/familiar' application would be the first priority, since they 'work'.

Mmmm, well, I'm not so sure that the simplistic model Boinc uses for resource management isn't the one thing it does well, even if it's not ideal. It wants me to run these tasks, with such and such resources; say no more.

I think if 'my' application was trampled on, I would rather yield than cripple a host. That's why standard default (single main device) Cuda applications will probably remain simple/familiar blocking-sync style, at least until the more sophisticated examples can at least equal the behaviour with minimal configuration (and preferably better it).

Naturally, parts of the adaptation process toward better/simpler behaviour will hit roadblocks, though I don't consider the ridiculous estimates a huge one in the scheme of things. More challenging to me is demonstrating why a single process [and single task] per resource is increasingly a bad idea.
18) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1794404)
Posted 22 days ago by Profile jason_gee
Yes - not specifically Resource Share (that's more of a long term objective that can be sorted out later), but overcommitment of resources. If SETI needs extra CPU for a certain work process, and another project has been given the green light by BOINC to use a full core's worth of CPU time, what happens?

At best, both tasks use the CPU, with a bit of thrashing as they swap contexts. That's well established in the CPU world, and both should make progress, even if slower than expected.

Other approaches exist, like the precautionary one being suggested for OpenCL: "reserve a full core for the app, whether it actually is going to use all the power of a core or not". That was the flaw I questioned when the idea of supporting VM tasks under the BOINC framework was described at the 2011 BOINC Workshop in London. Since the VMs handle their own despatch (and more - all their own communications as well), the outer BOINC layer doesn't know whether the inner VM is actually using its reserved resource - and hence, can't offer it to another process when idle.

Now that VMs are actively being used by CERN projects, I see that same question has come up again in discussion, but I don't think it's been answered yet.


Yeah, that's where the reed-bending bit comes in. Users and applications know more about the tasks and their hardware (provided sufficient information) than Boinc needs to know to do its job; that's project/application/task domain-specific knowledge. Since the first (Boinc-enabled) application of its kind will know what's going on, through the user and dispatch-support tools, it has the responsibility to yield in the same way the individual applications normally would (or hopefully better).

The only 'real' functional difference underneath is precisely about overcommit and contention: the app can spontaneously decide to shrink to a low-priority single CPU core, go full throttle, or something in between, depending on what Boinc gives, or what the user/tools ask for.
19) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1794398)
Posted 22 days ago by Profile jason_gee
I think Richard is more worried about resource share, if a heterogeneous app grabs devices BOINC has earmarked for other projects' apps?


Don't grab anything not issued to the stub apps :D

but then the stub app needs to know what it has available and tell boinc what it might want.

In effect, the app could say 'please give me as many GPU and CPU cores as possible' and boinc has to be able to reply 'ok you can have x CPU and y GPU cores' with an option for 'excuse me can you free up another core ' or ' have another'...

That's quite a revamp of current scheduling...


Not really, think about it.

The current client starts n applications with xyz resources (total).

I can make those n applications stubs: send them to sleep except to periodically update progress and/or state/checkpoint, and hand the resources over to another process.

Nothing that technically the current applications don't do with worker threads; they're just limited to one worker per process. A heterogeneous app would use the same total set of resources, and gain the advantage of being able to respond dynamically if Boinc adds or removes a resource mid-processing.
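A minimal sketch of what such a stub might look like against the BOINC API (get_shared_progress() here is hypothetical, standing in for whatever IPC the real heterogeneous worker process would expose):

    #include "boinc_api.h"   // boinc_init / boinc_fraction_done / boinc_finish
    #include <unistd.h>

    // Hypothetical stand-in for reading progress from the worker process.
    static double get_shared_progress() {
        static double p = 0.0;
        return p += 0.05;    // dummy: pretend the worker is advancing
    }

    int main() {
        boinc_init();                       // claim the resources BOINC issued
        double progress = 0.0;
        while (progress < 1.0) {
            progress = get_shared_progress();
            boinc_fraction_done(progress);  // keep the client's view current
            sleep(5);                       // otherwise stay out of the way
        }
        boinc_finish(0);                    // normal task exit
        return 0;
    }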
20) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1794396)
Posted 22 days ago by Profile jason_gee
I think Richard is more worried about resource share, if a heterogeneous app grabs devices BOINC has earmarked for other projects' apps?


Don't 'Grab' anything not issued to the stub apps [collectively] :D

Feyd-Rautha: [whispers] You see your death. My blade will finish you.
Paul Atreides: [voiceover] I will bend like a reed in the wind.
-- Dune

