Message boards :
Number crunching :
benchmark stock vs. optimized -- problem
Author | Message |
---|---|
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
I thought it would be a straightforward test, but... It's a Linux x64 system, GTX650, and it has been running the optimized application, setiathome_x41g_x86_64-pc-linux-gnu_cuda32, for many months with no problems. Boinc 7.4.23 and Nvidia driver 352.41. So, with the recent release of the stock application, ..7.08...opencl_nvidia_sah, I thought it would be fun, and maybe instructive, to run a benchmark of those two applications. I'm using the Lunatics v2.01.08 (Linux) benchmark package. I've used this before to compare stock vs. optimized AstroPulse. I set up one as an APP and the other as the REF_APP, copied a work unit, and put the libcudart and libcufft libraries in /usr/lib where they would be found (and strace says they were found and read), suspended all boinc tasks just to avoid resource contention, and then let ./benchmark run (as root, in directories far from the boinc projects tree). To my surprise, the "new" stock 7.08 application ran fine; but the "old" x41g version failed after 1 second with a seg fault (exit code 193). Evidently I am missing something like an environment variable or a command line parameter that is needed in the benchmark context. I am stumped. What on earth can cause a seg fault (in the benchmark context) for an application that runs perfectly well otherwise??? |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
Have you checked that both apps have "execute" permissions set ? _\|/_ U r s |
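For anyone following along, Urs's check takes two commands from the bench directory. The file names below are the ones mentioned later in this thread; adjust them to your own copies:

```shell
# Verify the execute bit on both apps (names from this thread; paths may differ).
ls -l setiathome_x41g_x86_64-pc-linux-gnu_cuda32 \
      setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah

# Set the bit if it is missing; without it the shell reports "Permission denied",
# not a seg fault, which is why this probably isn't Gene's problem.
chmod +x setiathome_x41g_x86_64-pc-linux-gnu_cuda32
```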
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
And lets get a test in on the newer beta release out there as well. http://www.arkayn.us/forum/index.php?action=tpmod;dl=item132 Jason should be getting another version ready fairly soon, just as soon as RL stops dumping paying work on him. |
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
Urs-- Yes, even 777 permissions on files and directories. If execute permission were missing, I'm sure Linux would have reported either "permission denied" or "file not found." ...see next paragraph... Arkayn-- Lunatics benchmark v2.01.08 GTX 650, AMD FX-4300, Nvidia 352.41 NO command line options for either application. all BOINC projects suspended. /APPS/ = setiathome_x41zc_x86_64-pc-linux-gnu_cuda60 /REF_APPS/ = setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah (with MultiBeam_Kernels_r2936.cl in working directory) Run times: <7.08> 792 seconds; <x41zc> 766 seconds rescmpv5_1 reports Results Strongly Similar Q=99.95%. Still puzzling over the seg fault of x41g application... |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Still puzzling over the seg fault of x41g application... A while back there was a rash of Windows + Maxwell GPU errors related to Cuda 3.2 that I never managed to find the time to track down, and instead requested the project stop sending Cuda 3.2 to Maxwell class GPUs. While the 650 isn't a Maxwell, I see no reason a similar driver or OS change over time couldn't reveal a similar limitation on Linux. IOW the Cuda60 build probably just works there because it's newer and you did updates to other parts of the system, while Cuda 3.2 seems to be in some kind of decay. Again, still not sure of the origins of that, but for the purposes of performance and accuracy on Fermi and Kepler class GPUs, you can safely assume that x41zc is the same as or better than x41g. Since I didn't make the Cuda 3.2 build personally, I would check ldd to ensure the libraries it needs are present. The [Cuda60 experimental] test build on my site has been getting the best unsolicited feedback of all the Linux builds I personally made so far, so worth comparing. As Arkayn mentions work is underway, though very much at a snail's pace with work pressures. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
The [Cuda60 experimental] test build on my site has been getting the best unsolicited feedback of all the Linux builds I personally made so far, so worth comparing. As Arkayn mentions work is underway, though very much at a snails pace with work pressures. 'my site' being http://jgopt.org/download.html If the tortoise gets there, I don't see any reason for the snail not to get there. A person who won't read has no advantage over one who can't read. (Mark Twain) |
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
Regarding the Strongly Similar comparison mentioned in a previous post, just up the page a few... The differences are in <peak_power> and <mean_power> at the 7th significant digit. I expect that to be due to differences in arithmetic ordering, or some other innocuous side effects. #jason Just to clarify. The x41g application is running fine, without errors in any validated results, in my normal boinc/seti directories and work flow. It is only in the benchmark context that x41g exits with SIGSEGV. So, obviously, ldd has no complaints. Top lines of stderr.txt (for SIGSEGV exit) follow: shmget in attach_shmem: Invalid argument 11:21:08 (28555): Can't set up shared mem: -1. Will run in standalone mode. setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 650, 1023 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 2 clockRate = 1058500 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 650 is okay SETI@home using CUDA accelerated device GeForce GTX 650 SIGSEGV: segmentation violation Stack trace (24 frames): ./setiathome_x41g_x86_64-pc-linux-gnu_cuda32(boinc_catch_signal+0x65)[0x551265] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f719749f8d0] /usr/lib/x86_64-linux-gnu/libcuda.so(+0x3b8bd0)[0x7f7192898bd0] ...snip... Comparing that with "normal" stderr it seems that the application doesn't get past the GPU initialization. The next lines should be --- Cuda Active: Plenty of total Global VRAM (>300MiB). All early cuFft plans postponed, to parallel with first chirp. I am experimenting with strace and valgrind to get more insight into the situation. So far, more confusing than illuminating. As an aside... in the normal boinc/seti task processing the contents of the <app_info.xml> file somehow gets used but I don't see any way to pass that information in the benchmark script. Maybe it's not needed in the benchmark context. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I see. Noting the crash was in pthreads, it looks as though the Cuda library and/or pthreads itself might be balking at boincapi's exit procedure, in a similar way as Windows does but with different symptoms. Anyway, it's a working theory and incentive to harden that part of code. Unfortunately that's in Boincapi territory, which might mean I'll have to customise Linux Boinc as well. The devs seemed pretty adamant that no projects multithread and they weren't going to fix the threading model, so we might be stuck with that behaviour until I can get out of some work for a stretch, and port the boincapi customisations to the other platforms. Glad it seems to work OK live, though obviously it means yet again I'll be working on something that I shouldn't have to be. I'm OK with that really, just annoyed with myself that I didn't push hard enough to get this fixed. [Edit:] I did partially tweak the boincapi in the 6.0 build. If that works as expected I can do similar for other builds. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Maybe it's not needed in the benchmark context. Correct, you use the benchmark script's command line facilities, where you can add multiple variants of command line for a given app (in the readme IIRC). There are also some properties the client would normally feed in the init_data.xml (like checkpoint period). The error in question, though, is still most likely a hard error exit that choked inside some Cuda library call like CUFFT planning (possibly due to low memory) but SIGSEGVs out because helper threads get buffer objects freed underneath them. Can't really help that without a few tweaks to the code, mentioned in the previous post. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
#jason Extrapolating from the Windows environment, which is all I have knowledge of: The modern way for BOINC to pass task parameters to a science app in normal running is via a file called 'init_data.xml'. The Windows version of the bench script package (let's call it that, rather than benchmark, to avoid confusion) is supplied with a cut-down version of init_data.xml (the full thing is immensely bloated, and contains a lot of information the science app has no business knowing). But buried in there are important science processing parameters, like which GPU to use (which type, and its device enumeration in both native and OpenCL modes). So I'm wondering if the difference between running and test modes might be some required initialisation data which is missing from the test bench version? You would need to compare a working copy of init_data.xml (found in the slot directory) with the supplied test version, and copy across anything "relevant": but be warned, that may become an exercise in looking for a needle in a haystack. It might be easier for Jason to list the minimum required elements that init_data.xml needs to supply. |
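Richard's comparison can be started with a plain diff. Both paths below are examples only, not Gene's actual locations; substitute a slot directory that is currently running the app, and the bench working directory:

```shell
# Compare a live slot's full init_data.xml with the bench package's cut-down
# copy. Every path here is an example; adjust to your installation.
diff /home/gene/BOINC/slots/0/init_data.xml /tmp/BENCH/init_data.xml
```

Expect a lot of noise from the bloated live copy; the interesting lines are elements present in the slot version but absent from the bench version.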
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
It might be easier for Jason to list the minimum required elements that init_data.xml needs to supply. Going from extremely vague memory (yeah, that's about how long since I looked in one of those files) about the only relevant parameter in there I can think of is m_nbytes, which is the host memory requirement, which should be low for any practical purposes on any machine capable of driving a Cuda capable GPU, especially under bench. To me it still looks more like this sequence: - Startup - Initialise a Cuda device - There is less than 256MiB (or whatever) total VRAM on the device, so do CUFFT plans early for paranoia - Die nicely when those CUFFT plans fail. The last step, which should be a temp exit (but in a dated build could be a hard exit) could be terminating via standard boincapi, which would be a problem as that tends to kill evidence of where things got to before failure. Why those might fail under bench, but not live running, is a mystery I'll certainly have to think about. The Cuda 3.2 build is quite dated so has some variables like driver stability and boincapi revision to check out. Assuming there is actually sufficient VRAM, then about the only thing there that jangles some memory neurons is that around that time someone was messing with the heartbeat mechanism, so inexplicably violent suicide seems plausible. Will investigate as I'm doing regression tests on some sanity check rebuilds soon. I guess either the problem will appear, or it won't. In either case, it'd be helpful if some 256MiB GPU people could hang around, because our tester's last one died a few weeks back. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Assuming there is actually sufficient VRAM, then about the only thing there that jangles some memory neurons is that around that time someone was messing with the heartbeat mechanism, so inexplicably violent suicide seems plausible. Will investigate as I'm doing regression tests on some sanity check rebuilds soon. I guess either the problem will appear, or it won't. In either case, it'd be helpful if some 256MiB GPU people could hang around, because our tester's last one died a few weeks back. Again by analogy with Windows behaviour, the first output from a bench run should be [ stderr ] and standalone mode should disable all expectations of heartbeats, PIDs, or any such reliance on a live client. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
and standalone mode should disable all expectations of heartbeats, PIDs, or any such reliance on a live client. Yes. and for that build we're talking unmodified boincapi. I suspect verifying modes of failure on pressured systems is going to be tough, and I may be forced to raise the lower limit to 384MiB (and get one of my 9600 GSOs back) unless a viable test subject appears. Hi Ben, hate to ask, but it looks like some of my builds are dying weirdly on low-end GPUs. Can I have one of the 9600GSOs back ? [Edit:] Can I see the ldd output please ? just on the off chance driver install linked to system libraries instead of the supplied ones in the bench folder. [Edit2:] next step after verifying ldd finds the libraries in the bench folder, would be comparing the md5 checksums of those libraries against those in the seti project folder, which work. 2 points of failure eliminated if the libraries are found and no difference. damaged cufft or cuda library in the bench folder seems possible. [Edit3:] Ben's put aside one of the cards, and will be seeing him tomorrow, so should be able to factor Pre-Fermi + Ubuntu tests into the current development cycle, though unclear if I'll be able to replicate the fault precisely. [maybe if I load up the VRAM with something, hmmm] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
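Jason's Edit2 check (comparing the bench-folder libraries against the known-good project copies) is a one-liner per library. The paths below are examples; the bench directory and BOINC project directory will differ per machine:

```shell
# Matching checksums rule out a damaged or wrong-version copy in the bench
# folder. All paths are examples; adjust to your own layout.
md5sum /tmp/BENCH/libcudart.so.3 \
       ~/BOINC/projects/setiathome.berkeley.edu/libcudart.so.3
md5sum /tmp/BENCH/libcufft.so.3 \
       ~/BOINC/projects/setiathome.berkeley.edu/libcufft.so.3
```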
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
- There is less that 256MiB (or whatever) total VRAM on the device, so do CUFFT plans early for paranoia Gene posted this earlier: setiathome_CUDA: Found 1 CUDA device(s): How's the file names of CUDA libs? Is there any possibility that in bench x41g is picking wrong CUDA version and crashing because of that? Gene said he put the libs in /usr/lib and not in bench directory. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Missed the memory figure, thanks. Had apparently been looking at another thread so missed this was Kepler class. That makes attempts at replication easier, as the 680's on my Ubuntu machine. How's the file names of CUDA libs? Is there any possibility that in bench x41g is picking wrong CUDA version and crashing because of that? Gene said he put the libs in /usr/lib and not in bench directory. Yes I suspect it's diverting to another set of libraries via a symlink or somesuch, which can get messed up by Cuda driver or toolkit installs etc, pointing to libraries not of the precise version required & supplied. The supplied libraries should indeed be in the bench folder, as the executable's origin is included in the search path. I can't recall the command to verify 'origin' is in there right now, but will be back onto my Linux machine tonight. Naturally I'll be thinking about the horrible way it dies, and possible ways to handle it better in future, such as manually loading the libraries and adding some detail. Missing/incorrect libraries do pretty horrible things on Windows too, so if push comes to shove I'll consider embedding them. [licences permitting... ] [Edit:] objdump -p exename | grep RPATH should hopefully reveal an entry with $ORIGIN. The precise filenames required should be revealed by: objdump -p exename | grep NEEDED "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
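Putting Jason's two objdump checks together against the binary named in this thread (output will vary; empty RPATH/RUNPATH output means no $ORIGIN was baked in, which is what Gene later reports):

```shell
BIN=./setiathome_x41g_x86_64-pc-linux-gnu_cuda32   # path as used in this thread

# Does the binary embed a runtime search path (possibly containing $ORIGIN)?
objdump -p "$BIN" | grep -E 'RPATH|RUNPATH'

# Which exact shared-library names must the dynamic loader find?
objdump -p "$BIN" | grep NEEDED
```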
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
Here's the ldd from the executable copy in the benchmark directory. gene64:> ldd setiathome_x41g_x86_64-pc-linux-gnu_cuda32 linux-vdso.so.1 (0x00007fff80180000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f94516a0000) libcudart.so.3 => /usr/lib/libcudart.so.3 (0x00007f9451450000) libcufft.so.3 => /usr/lib/libcufft.so.3 (0x00007f944f698000) libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f944f388000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f944f080000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f944ee68000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f944eab8000) /lib64/ld-linux-x86-64.so.2 (0x00007f94518e8000) libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f944e8b0000) librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f944e6a8000) The cuda libs are found in the /usr/lib directory. If I put them in the working directory of "benchmark" they are not found; in fact, an strace in that configuration shows no stat, lstat, or open calls to the working directory - only calls to various permutations of "standard" libraries. That's why I put them in /usr/lib, as that is one of the places actually searched. In the "live" seti directory the libcudart and libcufft ARE in the working directory but the relevant app_info.xml file has identified them, e.g. ... <file_ref> <file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name> <main_program/> </file_ref> <file_ref> <file_name>libcudart.so.3</file_name> </file_ref> <file_ref> <file_name>libcufft.so.3</file_name> </file_ref> ... With that app_info.xml information they apparently do not have to be put into a "lib" directory. ldd reports them as missing yet the application finds them anyway. [md5sums of the /usr/lib copies match the originals.] For the application copy in the benchmark directory, objdump -p setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep RPATH returns nothing. (?)
objdump -p setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep NEEDED returns NEEDED libpthread.so.0 NEEDED libcudart.so.3 NEEDED libcufft.so.3 NEEDED libstdc++.so.6 NEEDED libm.so.6 NEEDED libgcc_s.so.1 NEEDED libc.so.6 I think I got all the requested items above. Seeing the reference to init_data.xml I checked the size in a boinc slot and it is >8 Kbytes; in the benchmark package it is 2371 bytes. As noted in one of the posts in this thread there is likely a lot of "extra" stuff there. If there is something specific to try to add to the benchmark template I can do that. I appreciate all the "big guns" jumping on this. And I understand it is not a big, critical, issue since it does not affect the live application. One last wild thought -- can any selinux protection features be restricting the benchmark script? I.e., considering the script as "alien" code subject to restrictions not applied to the application as executed in user space via boinc? Not very likely, I guess, since other application versions do run o.k. via the script. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
OK well that'll be it then I believe (or some twist on it). x41g is before I injected the $origin tag (probably Aaron Haviland's original build). Either the boinc client or other script might be exporting an LD_LIBRARY_PATH before running the app perhaps, but I guess the normal console bench doesn't. What happens if you move the Cuda libs to the 64 bit subdirectory (where it happily picks up other libs) and double check the executable permission on them? Also, is the boinc client executing under a different user account by default? IIRC I run mine under my home directory/login, but am sure that probably isn't the default (will have to look at that myself). Yeah it's been tackling the small mysterious niggles that has made XBranch last. Could be one of those mysteries, but then we do have a development run in progress, so spotting any obvious easy fixes now would be handy. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Jason will remember that we hit similar problems working out which copy of a Windows DLL (library) file would be loaded, if there were multiple files of different versions but the same file name. David Anderson referred us to MSDN: DLL Search Order for Desktop Applications: for *Windows*, the default location was '1. The directory from which the application loaded.', which surprised David, who - as a Linux programmer - expected something different. For Linux, the equivalent bible appears to be Program Library HOWTO: Shared Libraries. I can't be certain, but I don't recall any reference in this thread so far to running ldconfig after copying the new library files. |
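Since this x41g build has no $ORIGIN RPATH, two standard Linux workarounds follow from the search order Richard links: refresh the loader cache after copying the libs into a standard directory, or point LD_LIBRARY_PATH at the bench folder before running. A sketch, with the bench directory name taken from Gene's strace output earlier in the thread (adjust to your own):

```shell
# (a) After copying libcudart.so.3 / libcufft.so.3 into /usr/lib,
#     refresh the dynamic loader cache (needs root) and confirm.
sudo ldconfig
ldconfig -p | grep libcudart

# (b) Or keep the libs in the bench folder and search it first.
cd /tmp/BENCH                                   # example bench directory
export LD_LIBRARY_PATH="$PWD${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
ldd ./setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep libcudart
```

Option (b) mirrors what the BOINC client effectively arranges for the live app, which would explain why ldd "reports them as missing yet the application finds them anyway".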
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
For Linux, the equivalent bible appears to be Program Library HOWTO: Shared Libraries. I can't be certain, but I don't recall any reference in this thread so far to running ldconfig after copying the new library files. Yeah, that relates to the ld library path I mentioned, and seems to vary a bit by distribution as to where the conf files with exports are located, and the exact procedure. More just a heads up than anything, in that it does get 'a bit hairy' from there, which is the main reason I injected the executable's origin in the search path for later builds. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
A few more tidbits regarding strace and responding to the question about the boinc user account (from Jason): I run the boinc client and boincmgr as user=gene in /home/gene/BOINC/... with the executables and cuda libs in ../projects/setiathome.berkeley.edu/ For the benchmark attempt I've put the cuda libs in /lib/x86_64-linux-gnu/ and the result is the same as when they were in /usr/lib/. In both cases ldd is happy and there are no run-time messages about missing libraries. I have the "strace -f -tt -o/tmp/trace ./benchmark" output file, when running the benchmark script as root. It is probably not much help, since it only logs the sys calls and a lot can happen that isn't logged. But for what it might be worth, here is what I see: (1) all the initialization, including library linking, appears to proceed as expected; (2) a "boinc_lockfile" is opened; (3) the (benchmark supplied) init_data.xml file is opened and read; (4) the work_unit.sah (soft link to the real data file) is opened and read; (5) the last sys call issued is: readlink("/proc/12481/exe","/tmp/BENCH/KWSN-Bench-Linux-MBv7"...,128) = 84 ...followed immediately by: --- SIGSEGV {si_signo=SIGSEGV,si_code=MAPERR,si_addr=0} --- Thinking about that readlink call, maybe it is an effect of the MAPERR and not the cause, even though strace logs them in the order shown. The name of the executable is one of the first things written to the stderr as part of the stack trace info. The immediately preceding call is an ioctl -> /dev/nvidia0. (6) everything after that I interpret as a "normal" error exit, unwinding all the threads and writing stack trace to stderr etc. Meanwhile, x41zc appears a bit better than stock 7.08 so that looks like the way I want to go. (As soon as it is "officially" ready.) I'll go ahead with a couple of larger work units (benchmark) for those two versions and post results here just for anyone's interest. |
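Gene's reading looks right: that readlink on /proc/<pid>/exe is how a process resolves its own executable path on Linux, which the crash handler needs in order to print the binary name at the top of the stack trace, so it is plausibly an effect of the crash rather than its cause. The same lookup can be reproduced from any shell:

```shell
# Resolve the running process's own binary path, exactly as the crash
# handler does via /proc. Here $$ is this shell's PID.
readlink "/proc/$$/exe"
```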
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.