benchmark stock vs. optimized -- problem

Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1730822 - Posted: 1 Oct 2015, 21:02:09 UTC

I thought it would be a straightforward test, but...

It's a Linux x64 system with a GTX 650, and it has been running the optimized application, setiathome_x41g_x86_64-pc-linux-gnu_cuda32, for many months with no problems. BOINC 7.4.23 and Nvidia driver 352.41.

So, with the recent release of the stock application, ..7.08...opencl_nvidia_sah, I thought it would be fun, and maybe instructive, to run a benchmark of those two applications.

I'm using the Lunatics v2.01.08 (Linux) benchmark package; I've used this before to compare stock vs. optimized AstroPulse. I set up one application as the APP and the other as the REF_APP, copied in a work unit, and put the libcudart and libcufft libraries in /usr/lib where they would be found (and strace says they were found and read). Then I suspended all BOINC tasks, just to avoid resource contention, and let ./benchmark run (as root, in directories far from the BOINC projects tree).
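
In shell terms the setup amounted to roughly the following (a sketch only; the paths are illustrative and the APPS/REF_APPS names follow the Lunatics package layout):

cd /tmp/BENCH                                   # bench package, well away from the BOINC tree
SETI=~/BOINC/projects/setiathome.berkeley.edu
cp $SETI/setiathome_x41g_x86_64-pc-linux-gnu_cuda32 APPS/
cp $SETI/setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah REF_APPS/
cp $SETI/libcudart.so.3 $SETI/libcufft.so.3 /usr/lib/   # CUDA libs where the loader will find them
# ...plus a copy of the work unit where the package expects it...
boinccmd --set_run_mode never                   # one way to suspend all BOINC work meanwhile
./benchmark                                     # run as root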

To my surprise, the "new" stock 7.08 application ran fine; but the "old" x41g version failed after 1 second with a seg fault (exit code 193).

Evidently I am missing something like an environment variable or a command line parameter that is needed in the benchmark context. I am stumped. What on earth can cause a seg fault (in the benchmark context) for an application that runs perfectly well otherwise???
ID: 1730822 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1730867 - Posted: 1 Oct 2015, 22:23:48 UTC - in response to Message 1730822.  
Last modified: 1 Oct 2015, 22:25:04 UTC

Have you checked that both apps have "execute" permissions set?
_\|/_
U r s
ID: 1730867 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1730950 - Posted: 2 Oct 2015, 2:09:07 UTC

And let's get a test in on the newer beta release out there as well.

http://www.arkayn.us/forum/index.php?action=tpmod;dl=item132

Jason should be getting another version ready fairly soon, just as soon as RL stops dumping paying work on him.

ID: 1730950 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1731014 - Posted: 2 Oct 2015, 6:31:38 UTC

Urs--
Yes, even 777 permissions on files and directories. If execute permission were missing, I'm sure Linux would have reported either "permission denied" or "file not found"... see next paragraph...

Arkayn--
Lunatics benchmark v2.01.08
GTX 650, AMD FX-4300, Nvidia 352.41

NO command line options for either application.
all BOINC projects suspended.

/APPS/ =
setiathome_x41zc_x86_64-pc-linux-gnu_cuda60
/REF_APPS/ =
setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah
(with MultiBeam_Kernels_r2936.cl in working directory)

Run times: <7.08> 792 seconds; <x41zc> 766 seconds
rescmpv5_1 reports Results Strongly Similar Q=99.95%.

Still puzzling over the seg fault of x41g application...
ID: 1731014 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731030 - Posted: 2 Oct 2015, 7:51:55 UTC - in response to Message 1731014.  
Last modified: 2 Oct 2015, 8:01:05 UTC

Still puzzling over the seg fault of x41g application...


A while back there was a rash of Windows + Maxwell GPU errors related to Cuda 3.2 that I never managed to find the time to track down, and instead I requested the project stop sending Cuda 3.2 to Maxwell-class GPUs. While the 650 isn't a Maxwell, I see no reason there couldn't be a similar driver or OS change over time that would reveal a similar limitation on Linux. IOW the Cuda60 build probably just works there because it's newer and you did updates to other parts of the system, while Cuda 3.2 seems to be in some kind of decay. Again, I'm still not sure of the origins of that, but for the purposes of performance and accuracy on Fermi- and Kepler-class GPUs, you can safely assume that x41zc ranges from the same as x41g to better.

Since I didn't make the Cuda 3.2 build personally, I would check ldd to ensure the libraries it needs are present.
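
Something along these lines, run from the bench folder, should show whether anything is unresolved:

ldd ./setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep -E 'cudart|cufft'
# any "not found" on the right-hand side means the loader can't locate that library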

The [Cuda60 experimental] test build on my site has been getting the best unsolicited feedback of all the Linux builds I've personally made so far, so it's worth comparing. As Arkayn mentions, work is underway, though very much at a snail's pace with work pressures.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731030 · Report as offensive
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1731068 - Posted: 2 Oct 2015, 9:47:58 UTC - in response to Message 1731030.  

The [Cuda60 experimental] test build on my site has been getting the best unsolicited feedback of all the Linux builds I've personally made so far, so it's worth comparing. As Arkayn mentions, work is underway, though very much at a snail's pace with work pressures.

'my site' being http://jgopt.org/download.html

If the tortoise gets there, I don't see any reason for the snail not to get there.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1731068 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1731406 - Posted: 3 Oct 2015, 6:04:33 UTC

Regarding the Strongly Similar comparison mentioned in a previous post, just up the page a few...

The differences are in <peak_power> and <mean_power> at the 7th significant digit. I expect that to be due to differences in arithmetic ordering, or some other innocuous side effects.

#jason
Just to clarify: the x41g application is running fine, without errors in any validated results, in my normal boinc/seti directories and workflow. It is only in the benchmark context that x41g exits with SIGSEGV. So, obviously, ldd has no complaints.

Top lines of stderr.txt (for SIGSEGV exit) follow:

shmget in attach_shmem: Invalid argument
11:21:08 (28555): Can't set up shared mem: -1. Will run in standalone mode.
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 650, 1023 MiB, regsPerBlock 65536
     computeCap 3.0, multiProcs 2 
     clockRate = 1058500 
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 650 is okay
SETI@home using CUDA accelerated device GeForce GTX 650
SIGSEGV: segmentation violation
Stack trace (24 frames):
./setiathome_x41g_x86_64-pc-linux-gnu_cuda32(boinc_catch_signal+0x65)[0x551265]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f719749f8d0]
/usr/lib/x86_64-linux-gnu/libcuda.so(+0x3b8bd0)[0x7f7192898bd0]
...snip...


Comparing that with a "normal" stderr, it seems that the application doesn't get past the GPU initialization. The next lines should be ---
Cuda Active: Plenty of total Global VRAM (>300MiB).
 All early cuFft plans postponed, to parallel with first chirp.



I am experimenting with strace and valgrind to get more insight into the situation. So far, more confusing than illuminating.
As an aside... in normal boinc/seti task processing, the contents of the <app_info.xml> file somehow get used, but I don't see any way to pass that information to the benchmark script. Maybe it's not needed in the benchmark context.
ID: 1731406 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731415 - Posted: 3 Oct 2015, 7:01:37 UTC - in response to Message 1731406.  
Last modified: 3 Oct 2015, 7:05:08 UTC

I see. Noting the crash was in pthreads, it looks as though the Cuda library and/or pthreads itself might be balking at boincapi's exit procedure, in a similar way to Windows but with different symptoms. Anyway, it's a working theory and an incentive to harden that part of the code. Unfortunately that's in boincapi territory, which might mean I'll have to customise Linux Boinc as well. The devs seemed pretty adamant that no projects multithread and that they weren't going to fix the threading model, so we might be stuck with that behaviour until I can get clear of work for a stretch and port the boincapi customisations to the other platforms.

Glad it seems to work OK live, though obviously it means yet again I'll be working on something that I shouldn't have to be. I'm OK with that really, just annoyed with myself that I didn't push hard enough to get this fixed.

[Edit:] I did partially tweak the boincapi in the 6.0 build. If that works as expected I can do similar for other builds.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731415 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731418 - Posted: 3 Oct 2015, 7:13:44 UTC - in response to Message 1731406.  
Last modified: 3 Oct 2015, 7:21:48 UTC

Maybe it's not needed in the benchmark context.


Correct; you use the benchmark script's command line facilities, where you can add multiple command line variants for a given app (it's in the readme, IIRC). There are also some properties the client would normally feed in via init_data.xml (like the checkpoint period).

The error exit in question, though, is still most likely a hard error exit that choked inside some Cuda library call like CUFFT planning (possibly due to low memory), but SIGSEGVs out because helper threads get buffer objects freed underneath them. Can't really help that without a few tweaks to the code, mentioned in the previous post.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731418 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731437 - Posted: 3 Oct 2015, 9:12:28 UTC - in response to Message 1731406.  

#jason
Just to clarify. The x41g application is running fine, without errors in any validated results, in my normal boinc/seti directories and work flow. It is only in the benchmark context that x41g exits with SIGSEGV. So, obviously, ldd has no complaints.

Extrapolating from the Windows environment, which is all I have knowledge of:

The modern way for BOINC to pass task parameters to a science app in normal running is via a file called 'init_data.xml'. The Windows version of the bench script package (let's call it that, rather than benchmark, to avoid confusion) is supplied with a cut-down version of init_data.xml (the full thing is immensely bloated, and contains a lot of information the science app has no business knowing). But buried in there are important science processing parameters, like which GPU to use (which type, and its device enumeration in both native and OpenCL modes).

So I'm wondering if the difference between running and test modes might be some required initialisation data which is missing from the test bench version? You would need to compare a working copy of init_data.xml (found in the slot directory) with the supplied test version, and copy across anything "relevant": but be warned, that may become an exercise in looking for a needle in a haystack. It might be easier for Jason to list the minimum required elements that init_data.xml needs to supply.
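
As a very rough illustration of the sort of cut-down file involved (field names recalled from memory of the BOINC client's init_data.xml, so treat them as an assumption and check against a live slot copy before relying on any of it; the values are purely illustrative):

cat > init_data.xml <<'EOF'
<app_init_data>
    <app_name>setiathome_v7</app_name>
    <checkpoint_period>60</checkpoint_period>
    <gpu_type>NVIDIA</gpu_type>
    <gpu_device_num>0</gpu_device_num>
    <gpu_opencl_dev_index>0</gpu_opencl_dev_index>
    <host_info>
        <m_nbytes>8000000000</m_nbytes>
    </host_info>
</app_init_data>
EOF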
ID: 1731437 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731499 - Posted: 3 Oct 2015, 14:43:24 UTC - in response to Message 1731437.  
Last modified: 3 Oct 2015, 14:44:47 UTC

It might be easier for Jason to list the minimum required elements that init_data.xml needs to supply.


Going from extremely vague memory (yeah, that's about how long it's been since I looked in one of those files), about the only relevant parameter in there I can think of is m_nbytes, which is the host memory requirement; that should be low for any practical purpose on any machine capable of driving a Cuda-capable GPU, especially under bench.

To me it still looks more like this sequence:
- Startup
- Initialise a Cuda device
- There is less than 256MiB (or whatever) total VRAM on the device, so do CUFFT plans early for paranoia
- Die nicely when those CUFFT plans fail.

The last step, which should be a temp exit (but in a dated build could be a hard exit), could be terminating via standard boincapi, which would be a problem as that tends to kill the evidence of where things got to before failure.

Why those might fail under bench, but not live running, is a mystery I'll certainly have to think about. The Cuda 3.2 build is quite dated so has some variables like driver stability and boincapi revision to check out.

Assuming there is actually sufficient VRAM, then about the only thing there that jangles some memory neurons is that around that time someone was messing with the heartbeat mechanism, so inexplicably violent suicide seems plausible. Will investigate as I'm doing regression tests on some sanity check rebuilds soon. I guess either the problem will appear, or it won't. In either case, it'd be helpful if some 256MiB GPU people could hang around, because our tester's last one died a few weeks back.
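
(For anyone wanting to rule the VRAM theory in or out before a bench run, something like this shows what the card actually has free at the time, assuming a driver new enough to support the query options:)

nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv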
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731499 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731504 - Posted: 3 Oct 2015, 14:58:00 UTC - in response to Message 1731499.  

Assuming there is actually sufficient VRAM, then about the only thing there that jangles some memory neurons is that around that time someone was messing with the heartbeat mechanism, so inexplicably violent suicide seems plausible. Will investigate as I'm doing regression tests on some sanity check rebuilds soon. I guess either the problem will appear, or it won't. In either case, it'd be helpful if some 256MiB GPU people could hang around, because our tester's last one died a few weeks back.

Again by analogy with Windows behaviour, the first output from a bench run should be

[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.

and standalone mode should disable all expectations of heartbeats, PIDs, or any such reliance on a live client.
ID: 1731504 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731506 - Posted: 3 Oct 2015, 15:04:14 UTC - in response to Message 1731504.  
Last modified: 3 Oct 2015, 15:49:39 UTC

and standalone mode should disable all expectations of heartbeats, PIDs, or any such reliance on a live client.


Yes, and for that build we're talking unmodified boincapi. I suspect verifying modes of failure on pressured systems is going to be tough, and I may be forced to raise the lower limit to 384MiB (and get one of my 9600 GSOs back) unless a viable test subject appears.

Hi Ben, hate to ask, but it looks like some of my builds are dying weirdly on low-end GPUs. Can I have one of the 9600GSOs back?


[Edit:]
Can I see the ldd output please? Just on the off chance the driver install linked to system libraries instead of the supplied ones in the bench folder.

[Edit2:] The next step, after verifying ldd finds the libraries in the bench folder, would be comparing the md5 checksums of those libraries against the ones in the seti project folder, which work. Two points of failure are eliminated if the libraries are found and there's no difference. A damaged cufft or cuda library in the bench folder seems possible.
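E.g. something like this (paths adjusted to wherever the bench copies and the live project copies actually live):

md5sum /usr/lib/libcudart.so.3 /usr/lib/libcufft.so.3
md5sum ~/BOINC/projects/setiathome.berkeley.edu/libcudart.so.3 ~/BOINC/projects/setiathome.berkeley.edu/libcufft.so.3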

[Edit3:] Ben's put aside one of the cards, and I'll be seeing him tomorrow, so I should be able to factor pre-Fermi + Ubuntu tests into the current development cycle, though it's unclear if I'll be able to replicate the fault precisely. [Maybe if I load up the VRAM with something, hmmm.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731506 · Report as offensive
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1731586 - Posted: 3 Oct 2015, 19:00:51 UTC - in response to Message 1731499.  

- There is less than 256MiB (or whatever) total VRAM on the device, so do CUFFT plans early for paranoia


Gene posted this earlier:

setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 650, 1023 MiB, regsPerBlock 65536
computeCap 3.0, multiProcs 2
clockRate = 1058500
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 650 is okay
SETI@home using CUDA accelerated device GeForce GTX 650



How are the file names of the CUDA libs? Is there any possibility that in bench x41g is picking up the wrong CUDA version and crashing because of that? Gene said he put the libs in /usr/lib and not in the bench directory.
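
One way to see exactly which copies the dynamic loader settles on (a standard glibc facility, nothing bench-specific) would be something along the lines of:

LD_DEBUG=libs ./setiathome_x41g_x86_64-pc-linux-gnu_cuda32 2>&1 | grep -Ei 'cudart|cufft'
# kill it once the "find library" / "trying file=" lines have gone past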
ID: 1731586 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731598 - Posted: 3 Oct 2015, 19:55:01 UTC - in response to Message 1731586.  
Last modified: 3 Oct 2015, 20:06:42 UTC

Missed the memory figure, thanks. Had apparently been looking at another thread, so missed that this was Kepler class. That makes attempts at replication easier, as the 680's on my Ubuntu machine.

How are the file names of the CUDA libs? Is there any possibility that in bench x41g is picking up the wrong CUDA version and crashing because of that? Gene said he put the libs in /usr/lib and not in the bench directory.


Yes, I suspect it's diverting to another set of libraries via a symlink or some such, which can get messed up by Cuda driver or toolkit installs etc., pointing to libraries not of the precise version required and supplied.

The supplied libraries should indeed be in the bench folder, as the executable's origin is included in the search path. I can't recall the command to verify 'origin' is in there right now, but I will be back onto my Linux machine tonight.

Naturally I'll be thinking about the horrible way it dies, and possible ways to handle it better in future, such as manually loading the libraries and adding some detail. Missing/incorrect libraries do pretty horrible things on Windows too, so if push comes to shove I'll consider embedding them [licences permitting...].

[Edit:]
objdump -p exename | grep RPATH

Should hopefully reveal an entry with $ORIGIN.

The precise filenames required should be revealed by:
objdump -p exename | grep NEEDED
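
For reference, baking the origin in at link time is typically just a linker flag, something like the following (not the actual build line, just the general shape):

g++ -o app main.o -L. -lcudart -lcufft -Wl,-rpath,'$ORIGIN'
objdump -p app | grep -E 'RPATH|RUNPATH'     # should now show $ORIGIN (newer linkers emit RUNPATH)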
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731598 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1731728 - Posted: 4 Oct 2015, 5:58:25 UTC

Here's the ldd from the executable copy in the benchmark directory.
gene64:> ldd setiathome_x41g_x86_64-pc-linux-gnu_cuda32 
        linux-vdso.so.1 (0x00007fff80180000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f94516a0000)
        libcudart.so.3 => /usr/lib/libcudart.so.3 (0x00007f9451450000)
        libcufft.so.3 => /usr/lib/libcufft.so.3 (0x00007f944f698000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f944f388000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f944f080000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f944ee68000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f944eab8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f94518e8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f944e8b0000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f944e6a8000)


The cuda libs are found in the /usr/lib directory. If I put them in the working directory of "benchmark" they are not found; in fact an strace in that configuration shows no stat, lstat, or open calls to the working directory, only calls to various permutations of the "standard" library locations. That's why I put them in /usr/lib, as that is one of the places actually looked in.

In the "live" seti directory the libcudart and libcufft ARE in the
working directory but the relevant app_info.xml file has identified them, eg.
 ...
       <file_ref>
       <file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
         <main_program/>
       </file_ref>
       <file_ref>
         <file_name>libcudart.so.3</file_name>
       </file_ref>
       <file_ref>
         <file_name>libcufft.so.3</file_name>
       </file_ref>
 ...


With that app_info.xml information they apparently do not have to be put into a "lib" directory. ldd reports them as missing yet the application finds them anyway.

[md5sums of the /usr/lib copies match the originals.]

for the application copy in the benchmark directory,
objdump -p setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep RPATH
returns nothing.(?)

objdump -p setiathome_x41g_x86_64-pc-linux-gnu_cuda32 | grep NEEDED
returns
  NEEDED               libpthread.so.0
  NEEDED               libcudart.so.3
  NEEDED               libcufft.so.3
  NEEDED               libstdc++.so.6
  NEEDED               libm.so.6
  NEEDED               libgcc_s.so.1
  NEEDED               libc.so.6


I think I got all the requested items above.

Seeing the reference to init_data.xml I checked the size in a boinc slot and it is >8 Kbytes; in the benchmark package it is 2371 bytes. As noted in one of the posts in this thread there is likely a lot of "extra" stuff there. If there is something specific to try to add to the benchmark template I can do that.

I appreciate all the "big guns" jumping on this. And I understand it is not a big, critical issue, since it does not affect the live application.
One last wild thought -- can any SELinux protection features be restricting the benchmark script? I.e., considering the script as "alien" code subject to restrictions not applied to the application as executed in user space via boinc? Not very likely, I guess, since other application versions do run OK via the script.
ID: 1731728 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731879 - Posted: 4 Oct 2015, 17:39:58 UTC - in response to Message 1731728.  

OK, well, that'll be it then I believe (or some twist on it). x41g is from before I injected the $ORIGIN tag (probably Aaron Haviland's original build). Either the boinc client or some other script might perhaps be exporting an LD_LIBRARY_PATH before running the app, but I guess the normal console bench doesn't.

What happens if you move the Cuda libs to the 64-bit subdirectory (where it happily picks up other libs) and double check the execute permission on them? Also, is the boinc client executing under a different user account by default? IIRC I run mine under my home directory/login, but I'm sure that probably isn't the default (will have to look at that myself).

Yeah, it's tackling the small, mysterious niggles that has made XBranch last. Could be one of those mysteries, but then we do have a development run in progress, so spotting any obvious easy fixes now would be handy.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731879 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731883 - Posted: 4 Oct 2015, 17:51:32 UTC - in response to Message 1731879.  

Jason will remember that we hit similar problems working out which copy of a Windows DLL (library) file would be loaded, if there were multiple files of different versions but the same file name. David Anderson referred us to MSDN: DLL Search Order for Desktop Applications: for *Windows*, the default location was '1. The directory from which the application loaded.', which surprised David, who - as a Linux programmer - expected something different.

For Linux, the equivalent bible appears to be Program Library HOWTO: Shared Libraries. I can't be certain, but I don't recall any reference in this thread so far to running ldconfig after copying the new library files.
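
For completeness, the two usual ways of making the loader see libraries in a non-standard directory (using /tmp/BENCH purely as an illustrative bench path) are roughly:

# per run: point the loader at the bench folder for this invocation only
export LD_LIBRARY_PATH=/tmp/BENCH:$LD_LIBRARY_PATH
./benchmark

# or system-wide: register the directory and refresh the loader cache (as root)
echo /tmp/BENCH > /etc/ld.so.conf.d/bench-cuda.conf
ldconfig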
ID: 1731883 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1731894 - Posted: 4 Oct 2015, 18:55:36 UTC - in response to Message 1731883.  

For Linux, the equivalent bible appears to be Program Library HOWTO: Shared Libraries. I can't be certain, but I don't recall any reference in this thread so far to running ldconfig after copying the new library files.


Yeah, that relates to the ld library path I mentioned, and it seems to vary a bit by distribution as to where the conf files with the exports are located, and the exact procedure. More just a heads-up than anything, in that it does get 'a bit hairy' from there, which is the main reason I injected the executable's origin into the search path for later builds.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1731894 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1732475 - Posted: 7 Oct 2015, 6:28:07 UTC

A few more tidbits regarding strace and responding to question about boinc user account (from Jason):

I run the boinc client and boincmgr as user=gene in /home/gene/BOINC/... with the executables and cuda libs in ../projects/setiathome.berkeley.edu/
For the benchmark attempt I've put the cuda libs in /lib/x86_64-linux-gnu/ and the result is the same as when they were in /usr/lib/. In both cases ldd is happy and there are no run-time messages about missing libraries.

I have the "strace -f -tt -o/tmp/trace ./benchmark" output file, when running the benchmark script as root. It is probably not much help, since it only logs the sys calls and a lot can happen that isn't logged. But for what it might be worth, here is what I see:
(1) all the initialization, including library linking, appears to proceed as expected;
(2) a "boinc_lockfile" is opened;
(3) the (benchmark supplied) init_data.xml file is opened and read;
(4) the work_unit.sah (soft link to the real data file) is opened and read;
(5) the last sys call issued is:
readlink("/proc/12481/exe","/tmp/BENCH/KWSN-Bench-Linux-MBv7"...,128) = 84
...followed immediately by:
--- SIGSEGV {si_signo=SIGSEGV,si_code=MAPERR,si_addr=0} ---

Thinking about that readlink call, maybe it is an effect of the MAPERR and not the cause, even though strace logs them in the order shown. The name of the executable is one of the first things written to stderr as part of the stack trace info. The immediately preceding call is an ioctl -> /dev/nvidia0.

(6) everything after that I interpret as a "normal" error exit, unwinding all the threads and writing stack trace to stderr etc.

Meanwhile, x41zc appears a bit better than stock 7.08, so that looks like the way I want to go (as soon as it is "officially" ready). I'll go ahead and benchmark a couple of larger work units with those two versions and post the results here for anyone's interest.
ID: 1732475 · Report as offensive