Long compute times after moving from Debian 7 to 9

Questions and Answers : Unix/Linux : Long compute times after moving from Debian 7 to 9

Matt Roberds

Joined: 16 Jun 99
Posts: 7
Credit: 1,455,455
RAC: 1
United States
Message 1911930 - Posted: 9 Jan 2018, 1:19:54 UTC

Hello all!

I recently upgraded the OS on a computer that I run Seti@Home on. I didn't change any of the hardware in the machine during the upgrade. Since the upgrade, work units seem to be taking much longer (6 times or more) to complete. Will cleaning out something (what?) under /var/lib/boinc and retrying help this situation any?

Old situation:
Debian 7 "wheezy", kernel 3.2.0
BOINC 7.0.27
Seti@Home 8.00 and 8.05 (there are binaries for both under my old /var/lib/boinc-client)
AMD Athlon 64 x2, 3.2 GHz, dual core, 4 GB RAM
No GPU computing

New situation:
Debian 9 "stretch", kernel 4.9.0
BOINC 7.6.33
Seti@Home 8.00 and 8.05 (as reported by boincmgr for different work units, plus I have binaries for both)
AMD Athlon 64 x2, 3.2 GHz, dual core, 4 GB RAM
No GPU computing

I didn't do a direct OS upgrade; I basically did a clean install of Debian 9. I kept my old root partition around for reference, but I'm not using any of the files in it.

The previous situation, on Debian 7, was that most workunits took maybe 3 to 4 hours to complete. I let Seti@Home fully use one core of the dual-core machine.

The new situation, on Debian 9, for two workunits so far: when the work unit starts, the "Progress" field shown by boincmgr starts counting up from 0.000% at a reasonable rate, such that I would expect the work unit to complete in about the same amount of time as previously - 3 to 4 hours. However, as the progress percentage increases, it slows down - a lot. When I first noticed this, I thought the task had gotten stuck somehow, but then I just let it run for a while, and saw that it was taking half an hour or more to gain 0.001 on the progress percentage. I tried restarting boinc-client (as root, /etc/init.d/boinc-client restart), but that didn't seem to help.

The first task I saw this on was, I think, a Breakthrough Listen one (the name started with blc04_2bit_guppi). I let it run until it reported 100.000% complete in boincmgr, and then let it keep running for a few more hours. When it finally got to something like 2.5 *days* elapsed time, as reported by boincmgr, I aborted the task.

The second task is running right now and is, I think, a "regular" task (name starts with 20fe07ag). When it started, the progress as reported by boincmgr increased at a good rate. However, it has slowed down as well - it's now into the 99% range and just over 19.5 hours elapsed time, both as reported by boincmgr. I am letting it run for now.

Both of those tasks were/are running under 8.05, per boincmgr. I have some other tasks that are ready to start that boincmgr says will run under 8.00.

I know the "remaining time" is always an estimate. I feel the "elapsed time" values are pretty close to right, based on when I've watched the workunits start. I *thought* "progress" was always an absolute (either you've processed 50% of the input file or you haven't), but reading some older threads here is making me doubt that assumption.

It seems weird that going to a new version of BOINC, but with the same versions of Seti@Home, would cause such a big change in processing time for each work unit.

It sort of feels like something a "clear cache and restart" would fix, but I'm not sure how much to blow away and how much to keep.

Thanks!

Matt Roberds
ID: 1911930
Matt Roberds

Message 1912303 - Posted: 11 Jan 2018, 5:16:55 UTC - in response to Message 1911930.  

I tried a few things, but haven't been able to get any improvement yet.

I stopped boinc-client, blew away the contents of /var/lib/boinc-client/slots/0, and restarted boinc-client. It gave me a new work unit to process with Seti@Home 8.05, but it had the same problem: it seems to asymptotically approach 100%, but never actually get there. I suspended the work unit after it took 10.5 hours to get to 93%. (On the exact same hardware, it previously only took 3 to 4 hours to complete a work unit.)
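The stop/clear/restart step above could be scripted roughly as follows. This is a minimal sketch, not a BOINC tool: `clear_boinc_slot` is a hypothetical helper, the default path is the /var/lib/boinc-client directory used on this install, and emptying a slot discards whatever checkpoint the science application left there, so the task in that slot will likely start over or error out.

```shell
#!/usr/bin/env bash
# Sketch only: clear_boinc_slot is a hypothetical helper, not part of BOINC.
# Emptying a slot discards any checkpoint the science app left there, so the
# task in that slot will likely start over or fail.
clear_boinc_slot() {
    local data_dir=${1:-/var/lib/boinc-client}   # BOINC data directory used above
    local slot=${2:-0}
    local slot_dir="$data_dir/slots/$slot"

    if [ ! -d "$slot_dir" ]; then
        echo "no such slot directory: $slot_dir" >&2
        return 1
    fi
    # Delete everything inside the slot, dotfiles included, but keep the
    # (now empty) directory itself.
    find "$slot_dir" -mindepth 1 -delete
}

# Intended use, as root, with the client stopped while the slot is cleared:
#   /etc/init.d/boinc-client stop
#   clear_boinc_slot /var/lib/boinc-client 0
#   /etc/init.d/boinc-client start
```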

I manually suspended the 8.05 workunits in my queue, and let an 8.00 workunit run, to see if it would have the same problem, and it does. At about 1 hour of elapsed time, it was at about 50% progress, but now it's at about 6.5 hours of elapsed time, and it's at 97.9% progress. This is an improvement over the 8.05 workunit, but it's not close to the original performance.
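The slowdown is easier to see as a rate per interval than as raw percentages. A minimal sketch, assuming the readings are copied by hand from boincmgr as "hours percent" pairs in time order; `progress_rates` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: turn (elapsed hours, progress %) readings copied from boincmgr
# into the average progress rate for each interval between readings.
# Input: one "hours percent" pair per line, in time order.
progress_rates() {
    awk 'NR > 1 {
             dt = $1 - prev_h
             if (dt > 0)   # skip repeated or out-of-order timestamps
                 printf "%.2f-%.2f h: %.2f %%/h\n", prev_h, $1, ($2 - prev_p) / dt
         }
         { prev_h = $1; prev_p = $2 }'
}

# Readings from the 8.00 work unit: start, about 1 h, about 6.5 h.
printf '0 0\n1 50\n6.5 97.9\n' | progress_rates
```

With those three readings it reports 50.00 %/h for the first hour and 8.71 %/h over the following 5.5 hours.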

My next action is probably to grab some work for the "backup" project I compute for (TheSkyNet POGS), and see if it has similar trouble completing work units.
ID: 1912303
Matt Roberds

Message 1913151 - Posted: 15 Jan 2018, 9:29:36 UTC - in response to Message 1912303.  

I grabbed a few TheSkyNet POGS work units, and let them crunch. They seem to work OK, and complete in a reasonable amount of time, compared to how they worked on my old OS install.

I thought I might discover that some library on my new install is somehow tripping up Seti@Home, but the binaries appear to be statically linked. setigraphics is dynamic and links to a bunch of X libraries, among others, but I'm not running setigraphics - just the plain setiathome executables.

The two binaries I have are setiathome_8.00_x86_64-pc-linux-gnu and setiathome_8.05_i686-pc-linux-gnu - maybe a little suspicious that one is "x86_64" and the other is "i686", but I had the same trouble with work units that ran under both 8.00 and 8.05.
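The static/dynamic check, and the i686 vs x86_64 split, can both be read from the description that `file` prints for each binary. A minimal sketch, assuming it is run from the directory holding the setiathome_* binaries (normally the project directory under /var/lib/boinc-client/projects/); `classify_linkage` is a hypothetical helper, and the exact wording `file` prints can vary between versions.

```shell
#!/usr/bin/env bash
# Sketch: report how each SETI@home binary is linked, based on file(1)'s text.
classify_linkage() {
    # $1 is the description printed by "file -b".
    case $1 in
        *"statically linked"*)  echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *)                      echo unknown ;;
    esac
}

for bin in setiathome_*; do
    [ -f "$bin" ] || continue        # no match: the glob stays literal, skip it
    desc=$(file -b "$bin")
    # The first field ("ELF 32-bit ..." vs "ELF 64-bit ...") shows i686 vs x86_64.
    printf '%s: %s, %s\n' "$bin" "$(classify_linkage "$desc")" "${desc%%,*}"
done
```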

I'm starting to run out of ideas.
ID: 1913151
Matt Roberds

Message 1915392 - Posted: 27 Jan 2018, 2:34:02 UTC - in response to Message 1913151.  

So I have two updates to report.

tl;dr: I let one work unit run uninterrupted; it got to 3 days and then aborted itself due to exceeding the elapsed time limit. I also set up a Debian 7 guest in a VM, on the same host hardware, and that VM can process workunits just fine.

The long version:

1. I let one work unit run all the way through, on my Debian 9 system, without ever manually suspending it, or restarting boinc-client. With previous work units, I had sometimes done either or both of the above, and I didn't know if that might have affected the processing.

It was a "normal" work unit... it didn't have "blc" at the beginning or "vlar" at the end. It exhibited the same behavior as previously: at first the "progress" percentage increased at a reasonable rate along with the elapsed time, but as it ran, it took more and more elapsed time to get even a 0.001 percentage point increase in progress. The progress did eventually make it to 100.00%, but the work unit kept computing. Finally, after a touch over 3 days, it aborted due to exceeding the elapsed time limit:

Aborting task 09mr07ae.4540.19295.8.35.76_1: exceeded elapsed time limit 264331.20 (3700655.74G/14.00G)

2. Using Qemu, I set up a Debian 7 guest machine on my Debian 9 system. The guest machine sees an AMD64 CPU, but an earlier version than the physical AMD64 CPU on the host. The guest also sees less RAM than the host has. I installed the boinc-client and boincmgr supplied by Debian, and connected them to my account. It downloaded the setiathome 8.05 binaries, and then some workunits - so far mostly "blc" ones.

Running inside the emulator, Seti@Home has finished 5 tasks so far, each one taking roughly 3 hours. This is roughly in line with what Seti@Home did when it was running under Debian 7 directly, not in the emulator. (I realize the emulation will slow things down a little.) So, hooray for getting work done, but boo for having to run it under emulation for that to happen. For completeness, the specs on the emulated system are: Debian 7 "wheezy", kernel 3.2.0; BOINC 7.0.27; Seti@Home 8.05, AMD64, 3.2 GHz, single core, 384 MB RAM.
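A guest like the one described above could be started with a command along these lines. This is a sketch, not the exact command used: `start_seti_vm` is a hypothetical wrapper, the disk image path and qcow2 format are assumptions, and only the single CPU and the 384 MB of RAM come from the post. Whether KVM acceleration was used isn't stated, so `-enable-kvm` is left out.

```shell
#!/usr/bin/env bash
# Sketch only: start_seti_vm is a hypothetical wrapper. The disk image path
# and the qcow2 format are assumptions; one CPU and 384 MB RAM match the post.
# QEMU can be overridden (for example QEMU=echo) to print the command instead.
start_seti_vm() {
    local image=$1
    local ram_mb=${2:-384}              # guest RAM in MB
    ${QEMU:-qemu-system-x86_64} \
        -m "$ram_mb" \
        -smp 1 \
        -drive "file=$image,format=qcow2"
}

# Example (hypothetical image path):
#   start_seti_vm "$HOME/vm/debian7-seti.qcow2" 384
```

Setting `QEMU=echo` prints the command line instead of starting the guest, which is a cheap way to check the arguments before booting anything.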

Conclusion: Something about the combination of Seti@Home 8.05 and Debian 9 doesn't work correctly. I don't know if Seti@Home's time estimates are not working right, or something about the way Debian builds the Linux kernel is odd, or something else entirely is happening.
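The abort message in item 1 also shows how the limit was set. Assuming the two figures in parentheses are the work unit's floating-point operation bound F_bound (3700655.74G) and the client's speed estimate R for the app version (14.00G per second), which is what the message format suggests, the limit is their quotient:

```latex
t_{\max} = \frac{F_{\text{bound}}}{R}
         = \frac{3\,700\,655.74 \times 10^{9}}{14.00 \times 10^{9}\ \text{s}^{-1}}
         \approx 264\,333\ \text{s}
         \approx 73.4\ \text{h}
         \approx 3.06\ \text{days}
```

The logged 264331.20 s differs slightly because the speed is printed rounded to 14.00G. Compared with the 3 to 4 hours a work unit normally took on this machine, the limit leaves a margin of roughly 18 to 24 times, so reaching it means the task ran far slower than usual rather than the limit being unusually tight.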
ID: 1915392
Matt Roberds

Message 1921830 - Posted: 1 Mar 2018, 5:16:52 UTC - in response to Message 1915392.  

Last update.

I've been running Seti@Home under Debian 7 in the Qemu virtual machine I described in my previous post for about a month now, and it seems to work fine. It runs both Seti@Home 8.00 and 8.05. I wish it worked natively on Debian 9, but using the virtual machine is probably as close as I'm going to get. I plan to run it in the VM from now on.

I initially had X installed on the virtual machine so boincmgr would work. I then downloaded and compiled boinctui 2.5.0 (a text-mode user interface for BOINC) and verified that it could report BOINC's status and control it. After that, I uninstalled X, so I could run the virtual machine with less RAM. I have it turned down to 96 MB, and it has about 12 MB free when Seti@Home is running; I haven't tried to trim it down any further than that.

A couple of notes on boinctui:

Press F9 to get into the menus. This isn't documented anywhere but the source.

The build system seems a bit broken, at least as shipped with 2.5.0. It tries to run autotools but doesn't succeed. What I eventually had to do to get it to build was to run autoconf myself:

autoconf configure.in >my-conf.sh
chmod 755 my-conf.sh
./my-conf.sh --prefix=/usr/local
make
su
make install


I hope this helps!
ID: 1921830



 