Posts by jason_gee

1) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1732477)
Posted 3 days ago by Profile jason_gee
Yeah, x41zc will be a tad better, and the current work going in (some from Petri33 and some of my own) will end up being a fairly big step.

I'll certainly be trying to replicate soon, if only to completely understand whether that behaviour is an artefact of the exe's linkage or something environmental (and so how to avoid it).

For the purposes of general usage, you may consider x41zc usable live. Our delays in development are mostly to do with a massive switch to a different, team-oriented development model, which is giving us a culture shock on top of real-life demands. On the other hand, a sanity-check Linux build is due shortly, which I will add to downloads. All that is, though, is a baseline build environment check for all the new work going in. It will have some minor cosmetic tweaks but be more or less functionally equivalent, so using what's there now is fine if it works for you.
2) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731894)
Posted 5 days ago by Profile jason_gee
For Linux, the equivalent bible appears to be Program Library HOWTO: Shared Libraries. I can't be certain, but I don't recall any reference in this thread so far to running ldconfig after copying the new library files.

Yeah, that relates to the ld library path I mentioned, and seems to vary a bit by distribution as to where the conf files with exports are located, and the exact procedure. More just a heads up than anything, in that it does get 'a bit hairy' from there, which is the main reason I injected the executable's origin into the search path for later builds.
3) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731879)
Posted 5 days ago by Profile jason_gee
OK, well, that'll be it then I believe (or some twist on it). x41g is from before I injected the $ORIGIN tag (probably Aaron Haviland's original build). Either the boinc client or some other script might be exporting an LD_LIBRARY_PATH before running the app, but I guess the normal console bench doesn't.

What happens if you move the Cuda libs to the 64 bit subdirectory (where it happily picks up other libs) and double check the executable permission on them? Also, is the boinc client executing under a different user account by default? IIRC I run mine under my home directory/login, but am sure that probably isn't the default (will have to look at that myself).

Yeah, it's been tackling the small mysterious niggles that has made XBranch last. Could be one of those mysteries, but then we do have a development run in progress, so spotting any obvious easy fixes now would be handy.
4) Message boards : Number crunching : Updating GPU drivers in Linux. (Message 1731626)
Posted 6 days ago by Profile jason_gee
I fear my card is not supported by Linux for now.
Not really a way around win.

What made you think that?

I have read somewhere, maybe on Planet 3DNow, that Linux doesn't support new hardware.

You should at least be able to log in and change drivers.
That's a minimum to me.

If your old driver does not recognize your new GPU then your only way will be using the command line, the shell.

With nv cards I have to do that every time there is a kernel update or a driver update, so it doesn't seem to be a unique situation.
5) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731598)
Posted 6 days ago by Profile jason_gee
Missed the memory figure, thanks. I had apparently been looking at another thread, so missed that this was Kepler class. That makes attempts at replication easier, as the 680 is on my Ubuntu machine.

How are the file names of the CUDA libs? Is there any possibility that in bench x41g is picking the wrong CUDA version and crashing because of that? Gene said he put the libs in /usr/lib and not in the bench directory.

Yes, I suspect it's diverting to another set of libraries via a symlink or somesuch, which can get messed up by Cuda driver or toolkit installs etc., pointing to libraries that are not the precise version required and supplied.

The supplied libraries should indeed be in the bench folder, as the executable's origin is included in the search path. I can't recall the command to verify 'origin' is in there right now, but will be back on my Linux machine tonight.

Naturally I'll be thinking about the horrible way it dies, and possible ways to handle it better in future, such as manually loading the libraries and adding some detail. Missing/incorrect libraries do pretty horrible things on Windows too, so if push comes to shove I'll consider embedding them [licences permitting...].

objdump -p exename | grep RPATH

Should hopefully reveal an entry with $ORIGIN

The precise filenames required should be revealed by:
objdump -p exename | grep NEEDED
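
For anyone who would rather script the two checks above, here is a minimal Python sketch that reads the text printed by objdump -p and pulls out the NEEDED and search-path entries. The function names are made up for illustration. It also looks at RUNPATH, on the assumption that some newer linkers may record the search path under that tag instead of RPATH, in which case grepping for RPATH alone would show nothing.

```python
import re


def parse_objdump_dynamic(text):
    """Collect NEEDED, RPATH and RUNPATH entries from `objdump -p` output."""
    info = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in text.splitlines():
        match = re.match(r"\s*(NEEDED|RPATH|RUNPATH)\s+(\S.*)$", line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "NEEDED":
            info[tag].append(value)
        else:
            # Search paths are stored as a colon-separated directory list.
            info[tag].extend(value.split(":"))
    return info


def searches_origin(info):
    """True if the executable's own folder ($ORIGIN) is in its search path."""
    paths = info["RPATH"] + info["RUNPATH"]
    return any(p.startswith(("$ORIGIN", "${ORIGIN}")) for p in paths)


# Typical use (exename is a placeholder, as in the commands above):
#   import subprocess
#   out = subprocess.run(["objdump", "-p", "exename"],
#                        capture_output=True, text=True).stdout
#   info = parse_objdump_dynamic(out)
#   print(info["NEEDED"], searches_origin(info))
```

If searches_origin comes back False, the build probably predates the $ORIGIN change and will rely on the system library path instead.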
6) Message boards : Number crunching : Updating GPU drivers in Linux. (Message 1731538)
Posted 6 days ago by Profile jason_gee
If it's anything like the bootloader setup on my Ubuntu+Fedora rig, you should be able to press a key in the boot loader for boot options, and select the boot section to edit. Once in the editor, add 'text' before the splash option, and it should boot to a text-only prompt to let you reinstall drivers. At one point with nv I did have to put in some weird options as well, but that was only needed with a certain card and driver series. I suspect you won't need to go that far, and just getting to a command prompt will do.
7) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731506)
Posted 6 days ago by Profile jason_gee
and standalone mode should disable all expectations of heartbeats, PIDs, or any such reliance on a live client.

Yes, and for that build we're talking unmodified boincapi. I suspect verifying modes of failure on pressured systems is going to be tough, and I may be forced to raise the lower limit to 384MiB (and get one of my 9600 GSOs back) unless a viable test subject appears.

Hi Ben, hate to ask, but it looks like some of my builds are dying weirdly on low-end GPUs. Can I have one of the 9600 GSOs back?

Can I see the ldd output please? Just on the off chance that the driver install linked to system libraries instead of the supplied ones in the bench folder.
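
As a rough illustration of what to look for in that ldd output, the Python sketch below flags any Cuda library that resolves somewhere other than the bench folder, or is not found at all. The function names and the default name prefixes (libcudart, libcufft) are assumptions for the example, not something taken from the builds themselves.

```python
import os


def parse_ldd(ldd_text):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    resolved = {}
    for line in ldd_text.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no mapping
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            resolved[name] = None
            continue
        path = target.split("(")[0].strip()  # drop the "(0x...)" load address
        if path:
            resolved[name] = path
    return resolved


def stray_libs(ldd_text, folder, prefixes=("libcudart", "libcufft")):
    """Return the libraries of interest that do NOT resolve inside `folder`."""
    folder = os.path.abspath(folder)
    stray = {}
    for name, path in parse_ldd(ldd_text).items():
        if not name.startswith(prefixes):
            continue
        if path is None or os.path.dirname(os.path.abspath(path)) != folder:
            stray[name] = path
    return stray
```

An empty result from stray_libs would suggest the supplied libraries are the ones being picked up. Anything listed points at a system copy or a missing file.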

[Edit2:] The next step after verifying that ldd finds the libraries in the bench folder would be comparing the md5 checksums of those libraries against those in the seti project folder, which work. Two points of failure are eliminated if the libraries are found and there is no difference. A damaged cufft or cuda library in the bench folder seems possible.
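
The checksum comparison can be sketched in a few lines of Python. This is only an illustration: the function names and the lib*.so* filename pattern are assumptions, and the two folder paths are whatever applies on the machine in question.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large libraries are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_libs(bench_dir, project_dir, pattern="lib*.so*"):
    """Compare each library in bench_dir with the same-named file in project_dir."""
    report = {}
    for lib in sorted(Path(bench_dir).glob(pattern)):
        twin = Path(project_dir) / lib.name
        if not twin.is_file():
            report[lib.name] = "missing in project dir"
        elif md5sum(lib) == md5sum(twin):
            report[lib.name] = "identical"
        else:
            report[lib.name] = "DIFFERENT"
    return report
```

Any entry marked DIFFERENT would be the first candidate for a damaged or mismatched library.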

[Edit3:] Ben's put aside one of the cards, and I'll be seeing him tomorrow, so I should be able to factor Pre-Fermi + Ubuntu tests into the current development cycle, though it's unclear if I'll be able to replicate the fault precisely. [maybe if I load up the VRAM with something, hmmm]
8) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731499)
Posted 6 days ago by Profile jason_gee
It might be easier for Jason to list the minimum required elements that init_data.xml needs to supply.

Going from extremely vague memory (yeah, that's about how long it's been since I looked in one of those files), about the only relevant parameter in there I can think of is m_nbytes, which is the host memory requirement, and that should be low for any practical purposes on any machine capable of driving a Cuda capable GPU, especially under bench.

To me it still looks more like this sequence:
- Startup
- Initialise a Cuda device
- There is less than 256MiB (or whatever) total VRAM on the device, so do CUFFT plans early for paranoia
- Die nicely when those CUFFT plans fail.

The last step, which should be a temp exit (but in a dated build could be a hard exit), could be terminating via standard boincapi, which would be a problem as that tends to kill evidence of where things got to before failure.
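
To make the suspected sequence concrete, here is a toy Python model of it. None of this is the application's actual code: the threshold constant, the TemporaryExit exception and the make_fft_plans callback are all hypothetical stand-ins. The one point it tries to show is logging where things got to before leaving, so the evidence survives the exit.

```python
LOW_VRAM_BYTES = 256 * 1024 * 1024  # the "256MiB (or whatever)" threshold


class TemporaryExit(Exception):
    """Hypothetical stand-in for a temp exit that asks the client to retry later."""


def startup(total_vram_bytes, make_fft_plans, log=print):
    """Model of the startup path: plan early on low-VRAM devices, exit nicely on failure."""
    log("device initialised, %d MiB total VRAM" % (total_vram_bytes // 2**20))
    if total_vram_bytes >= LOW_VRAM_BYTES:
        return "plans-deferred"
    log("low VRAM: creating FFT plans early")
    try:
        make_fft_plans()
    except MemoryError as err:
        # Record the failure BEFORE exiting, so it is not lost.
        log("FFT planning failed: %s" % err)
        raise TemporaryExit("FFT plan failure") from err
    return "plans-ready"
```

In this model a hard exit would correspond to skipping the log call and terminating straight away, which is the case that leaves nothing to diagnose.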

Why those might fail under bench, but not live running, is a mystery I'll certainly have to think about. The Cuda 3.2 build is quite dated so has some variables like driver stability and boincapi revision to check out.

Assuming there is actually sufficient VRAM, then about the only thing there that jangles some memory neurons is that around that time someone was messing with the heartbeat mechanism, so inexplicably violent suicide seems plausible. Will investigate as I'm doing regression tests on some sanity check rebuilds soon. I guess either the problem will appear, or it won't. In either case, it'd be helpful if some 256MiB GPU people could hang around, because our tester's last one died a few weeks back.
9) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731418)
Posted 7 days ago by Profile jason_gee
Maybe it's not needed in the benchmark context.

Correct, you use the benchmark script's command line facilities, where you can add multiple variants of command line for a given app (in the readme, IIRC). There are also some properties the client would normally feed in via init_data.xml (like checkpoint period).

The error in question, though, is still most likely a hard error exit that choked inside some Cuda library call like CUFFT planning (possibly due to low memory), but segfaults because helper threads get buffer objects freed underneath them. Can't really help that without a few tweaks to the code, mentioned in the previous post.
10) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731415)
Posted 7 days ago by Profile jason_gee
I see. Noting the crash was in pthreads, it looks as though the Cuda library and/or pthreads itself might be balking at boincapi's exit procedure, in a similar way to Windows but with different symptoms. Anyway, it's a working theory and an incentive to harden that part of the code. Unfortunately that's in boincapi territory, which might mean I'll have to customise Linux BOINC as well. The devs seemed pretty adamant that no projects multithread and that they weren't going to fix the threading model, so we might be stuck with that behaviour until I can get out of some work for a stretch and port the boincapi customisations to the other platforms.

Glad it seems to work OK live, though obviously it means yet again I'll be working on something that I shouldn't have to be. I'm OK with that really, just annoyed with myself that I didn't push hard enough to get this fixed.

[Edit:] I did partially tweak the boincapi in the 6.0 build. If that works as expected I can do similar for other builds.
11) Message boards : Number crunching : benchmark stock vs. optimized -- problem (Message 1731030)
Posted 8 days ago by Profile jason_gee
Still puzzling over the seg fault of x41g application...

A while back there was a rash of Windows + Maxwell GPU errors related to Cuda 3.2 that I never managed to find the time to track down, and instead requested that the project stop sending Cuda 3.2 to Maxwell class GPUs. While the 650 isn't a Maxwell, I see no reason there couldn't be a similar driver or OS change over time that would reveal a similar limitation on Linux. IOW the Cuda60 build probably just works there because it's newer and you did updates to other parts of the system, while Cuda 3.2 seems to be in some kind of decay. Again, I'm still not sure of the origins of that, but for the purposes of performance and accuracy on Fermi and Kepler class GPUs, you can safely assume that x41zc is anywhere from the same as to better than x41g.

Since I didn't make the Cuda 3.2 build personally, I would check ldd to ensure the libraries it needs are present.

The [Cuda60 experimental] test build on my site has been getting the best unsolicited feedback of all the Linux builds I personally made so far, so worth comparing. As Arkayn mentions, work is underway, though very much at a snail's pace with work pressures.
12) Message boards : News : Andrew Siemion testifies to United States Congress (Message 1731006)
Posted 8 days ago by Profile jason_gee
Enjoyed that a lot :). I watched other pieces hoping to find some geocentrist or creationist testimony, but didn't come across any. I guess this was a select committee? (Probably for the best, I guess.)
13) Questions and Answers : GPU applications : NVIDIA GPU CUDA 32 & 42 WU Errors: Task Postponed 180.000000 Sec: CuFFT Plan Failure, Temporary Exit (Message 1730660)
Posted 8 days ago by Profile jason_gee
If those CUFFT plans fail, you are likely running out of video memory, in a way that bothers the CUFFT library's internally managed resources (these aren't in the scope of the app code, but of the Cuda libraries).

You will probably need to ensure you free as much VRAM as possible, which can be tricky with Cuda 3.2 onwards, because those libraries tend to bloat and use more memory with each Cuda version.

The other option is to force use of the Cuda23 build by using the Lunatics Installer.
14) Message boards : Number crunching : Alternative Instead Of BOINC v7.6.9? (Message 1730317)
Posted 9 days ago by Profile jason_gee
We can get stuff done to the client, but we can't tell the projects what to do.

Had to think about this for a bit in various contexts. An 'ideal' (unrealistic) solution would involve both the application and client behaving exactly as expected. I say unrealistic because we know there are and always will be misbehaving applications, and known buggy clients in circulation.

So I 'feel' that a more tolerant client solves a lot of problems. For a trivial example, why not an adjustable file open timeout in cc_config? (As opposed to the hard-wired 5 second magic number; it can still default to 5 seconds.)
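
As a sketch of the idea only (this is not BOINC's code, and no such cc_config option is claimed to exist), an adjustable open timeout could look something like the Python below. The function and parameter names are hypothetical. The timeout defaults to the same 5 seconds but is a parameter rather than a constant.

```python
import time


def open_with_timeout(path, mode="rb", timeout_s=5.0, retry_interval_s=0.1):
    """Keep retrying open() until it succeeds or timeout_s has elapsed.

    timeout_s plays the role of the configurable value; the default
    preserves the existing 5 second behaviour.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return open(path, mode)
        except OSError:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last error to the caller
            time.sleep(retry_interval_s)
```

A user on a slow or heavily loaded disk could then raise the value, while everyone else sees no change in behaviour.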

From the point of view of getting projects or users to update anything, well, a rule of thumb I try to use is to make something 'twice as good' in some metric (or combination of metrics). Small frequent manual updates can be good and all, for those that have the time, but it seems to me that placing the cost of fixes onto the customer might be unreasonable in a lot of circumstances. The extra time up front can be worth it, and I'm not really comfortable with the apparent trend to using the public as alpha testers, particularly with mission critical software.
15) Message boards : Number crunching : Alternative Instead Of BOINC v7.6.9? (Message 1730314)
Posted 9 days ago by Profile jason_gee
Yeah, the threading/sequence/syncing thing fits into the same problems as the IO and IPC, with OS dependence in each case. Flags *might* work to some extent, though I suspect leveraging C++'s features capable of hiding different implementations under the same interfaces would be better. Then best practices could be followed on each platform. That raises other design choices/issues though, to do with mixing object oriented and non object oriented code. To me that mix is the biggest stumbling block.
16) Message boards : Number crunching : Alternative Instead Of BOINC v7.6.9? (Message 1730313)
Posted 9 days ago by Profile jason_gee
Yeah, I'd say that's a fairly accurate assessment, as on one of the two occasions on the Mac that I spotted odd behaviour, the GUI had lost contact with the client, for reasons unknown. To be clear, that's happened once since I set up the Mac Pro ~2 months or so ago.

Having used all three platforms, I'm pretty convinced they are all 'as good' as one another with their own purposes in mind, but that one-size-fits-all solutions are likely to be problematic when you're trying to use low level mechanisms (like IO and interprocess communications).
17) Message boards : Number crunching : Alternative Instead Of BOINC v7.6.9? (Message 1730302)
Posted 9 days ago by Profile jason_gee
BOINC has released a new version of the BOINC client (7.6.6) which fixes a known bug causing ~3% validation errors on MilkyWay@home and other projects. Please update your clients soon to fix this issue.

(though they failed to mention that the fix is required for Windows only)

Also, fully aware that they (Milkyway) probably aren't software engineers, I'd just like to point out that the language used is a little imprecise (which I happen to know can throw some people for a loop, especially if there is a language barrier):

First, the original problematic (at least on Windows) implementation was as designed, therefore not a bug. Second, it appears to be not a 'design level fix' but a workaround. Not trying to be nitpicky, just more concerned with root causes than symptoms at this point.

Some totally different symptoms appear with 7.4.42 on Mac OS X that seem possibly connected to the same or similar design issues, but more research is needed on my part before putting forward suggestions.
18) Message boards : Number crunching : NVIDIA 355.98 (Message 1728837)
Posted 14 days ago by Profile jason_gee
I suspect most driver work would currently be directed toward DirectX12 refinement, and Cuda 7/7.5. Not sure if there would be much (if any) change with respect to the Cuda apps, though I suppose if on Win10 and/or using OpenCL apps there might be something important in them (or not).
19) Message boards : Number crunching : Need a Little Help With MB CUDA Settings in Linux (Message 1728836)
Posted 14 days ago by Profile jason_gee
Ooh, I can use that tip too, thanks for that :) The app side will definitely have to work out something that can be at least somewhat consistent across platforms. It probably won't happen short term, but will probably factor into the next major design change.
20) Message boards : Number crunching : Need a Little Help With MB CUDA Settings in Linux (Message 1728813)
Posted 14 days ago by Profile jason_gee
Hi FawkesGuy,
For Linux builds the graphics driver model is considerably different, enough that those settings don't make sense, so they are not used (and in fact mbcuda.cfg is never read there or on Mac).

About the only setting that could possibly apply at this time is some priority setting capability, but that is also too different to use the same mechanism, so automated re-nicing should be used if desired.
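
For what automated re-nicing might look like, here is a minimal Linux-only Python sketch, intended to be run from cron or a small loop. The function names are made up, and the process name you pass in is whatever the app is called on your machine. Note that lowering a nice value (raising priority) normally needs root, while raising it does not.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Find PIDs whose command name matches `name` by scanning /proc/<pid>/comm."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                comm = handle.read().rstrip("\n")
        except OSError:
            continue  # process went away or is not readable
        # The kernel truncates comm to 15 characters, so compare on that basis.
        if comm == name[:15]:
            pids.append(int(entry))
    return sorted(pids)


def renice(pids, niceness):
    """Set the nice value of each PID; may raise PermissionError without root."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

Something like renice(pids_by_name("your_app_name"), 0) run periodically would then stand in for the priority setting that the Windows config file provides.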

As we move to a more unified and automated development system, some of these things will change entirely in versions to come, be removed and replaced with better, more general mechanisms, and/or the (Windows) problems those settings address (not that well) will be avoided by engineering a completely different mechanism of action.


Copyright © 2015 University of California