MB constantly "Restarting", never finishes


Questions and Answers : Unix/Linux : MB constantly "Restarting", never finishes

Author Message
Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1402126 - Posted: 12 Aug 2013, 10:50:34 UTC

The last time I checked closely was in June, but as best I can tell, everything was working fine until 1 August, when I ran out of Astropulse work.

I had been using just the Lunatics AP binary for a long time and thought I would add MB back into the app_info.xml file so I could do some work while the AP pipeline was refilled. But I don't think it's working: tasks I got on the 6th were never reported.

The log is full of "Restarting tasks..." events, but no "Computation...finished" events. And I noticed that whenever I check the GUI, all the running tasks have been running for less than 5 minutes; the run times should be pretty random. I've even caught it restarting all the tasks while I watched, but there aren't any errors or reasons for the restart in the log.
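One way to put numbers on this symptom is to count, per task, how many restart messages appear without a matching finish message. The sketch below is only an illustration: the function names are hypothetical, and the two message patterns ("Restarting task NAME" and "Computation for task NAME finished") are assumptions about the log wording that may need adjusting for a given client version.

```python
import re
from collections import Counter

# Assumed message wording; adjust these patterns to match the actual log.
RESTART = re.compile(r"Restarting task (\S+)")
FINISHED = re.compile(r"Computation for task (\S+) finished")


def summarize(log_lines):
    """Return (restart count per task, set of tasks that finished)."""
    restarts = Counter()
    finished = set()
    for line in log_lines:
        match = RESTART.search(line)
        if match:
            restarts[match.group(1)] += 1
            continue
        match = FINISHED.search(line)
        if match:
            finished.add(match.group(1))
    return restarts, finished


def stuck_tasks(log_lines, min_restarts=2):
    """List tasks restarted at least min_restarts times that never finished."""
    restarts, finished = summarize(log_lines)
    return sorted(
        task
        for task, count in restarts.items()
        if count >= min_restarts and task not in finished
    )
```

To use it, pass the lines of the client's log file (its location depends on how the client was installed) to `stuck_tasks` and look at which tasks keep coming back.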

I even reset my project and let the default MB binary get installed...same behavior.

I'm going back to AP-only, but I haven't got any new tasks yet.

Does someone else have this problem? I'm using Fedora 19, and they still haven't updated the boinc-manager package with the latest manager. I don't think that is the problem, but I did just upgrade to F19, so something else may have changed.
____________

Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1403118 - Posted: 14 Aug 2013, 17:02:12 UTC - in response to Message 1402126.

So, AP still works fine, like before MB v7 was released, and I'm turning in work units (when AP tasks are available, that is).

But, I cannot run MB correctly. So, no one else is having a problem with v7 MB, just me?
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1403489 - Posted: 15 Aug 2013, 16:35:48 UTC

Let your cache run down and try a project reset.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1407060 - Posted: 23 Aug 2013, 20:15:25 UTC - in response to Message 1403489.

I tried that. I get the same behavior. Do you mean I should try that again? That's okay, but I just want to make sure you know I did that once already.
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1407378 - Posted: 24 Aug 2013, 18:21:05 UTC

From the looks of it, you are trying to use the optimized apps. You'll need to make sure the installed files have the proper permissions for them to work.
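A permissions check of this kind can be sketched as follows. This is a minimal illustration, not part of any BOINC tooling: the function name, the directory, and the file names are all hypothetical, and it assumes a Linux system. It should be run as the same user the BOINC client runs as, since `os.access` tests the permissions of the calling user.

```python
import os


def permission_problems(directory, must_be_executable):
    """Return (filename, problem) pairs for files the current user cannot use.

    directory          -- folder holding the application files (hypothetical)
    must_be_executable -- set of file names that need the execute bit
    """
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        if not os.access(path, os.R_OK):
            problems.append((name, "not readable"))
        elif name in must_be_executable and not os.access(path, os.X_OK):
            problems.append((name, "not executable"))
    return problems
```

An empty list means every regular file is readable and every named binary is executable for the current user; it does not rule out other causes such as a wrong owner on the parent directory.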
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1407566 - Posted: 25 Aug 2013, 9:10:04 UTC - in response to Message 1407378.

Hmm, I've been using the optimized apps for a long time, and it's been working very well. If you are talking about file permissions, I checked and I don't see anything wrong; I'm not sure how that could have changed, either.

Besides, that cannot be the case with the official v7 that I got after resetting the project.

With no AP work units and the MB app broken, my systems are completely useless.

(Speaking of no AP work, I checked the message boards and I don't see any threads obviously related to it. Does anyone know how long it might be until the pipeline is full and sustained again? It seems like the shortage started 5 or 6 weeks ago.)
____________

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8401
Credit: 56,889,514
RAC: 77,931
United Kingdom
Message 1407590 - Posted: 25 Aug 2013, 12:15:30 UTC

Answering your second question.
APs are produced when the tapes are freshly loaded; they split far faster than the MBs, and fewer of them can be split from a tape (they are bigger and have less overlap than MBs).

A thought on your first: are you using the correct version of the Lunatics offering for your processor (the correct level of ss*, etc.)? Get it wrong and nothing will work (I know, I've got that tee-shirt).
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1407652 - Posted: 25 Aug 2013, 18:36:28 UTC - in response to Message 1407590.

Thanks for the information. That's good; I just wanted to make sure I didn't miss some announcement about AP work being absent for some period of time.

Regarding the Lunatics apps, I checked again because I updated to v7 a while ago and I certainly could have made a mistake. For the AP app, I'm sure I have one that works, since I installed it in January, according to timestamps, and it continues to work fine (when I can get work).

For MB, AFAIK, I have the right one, too. The latest Lunatics MB for my platform is r1848 compiled in June for Linux 64 w/ sse3, CPU only. My system claims sse sse2 sse41 sse42 sse4a and avx, plus Wikipedia also says sse3 is supported, so I think the sse3 version should be correct. I hesitate only because the Lunatics AP app I use uses avx and I couldn't find any avx or sse4 version of MB from them. I suppose I could try the sse2 version; it should also work.
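On Linux, `/proc/cpuinfo` lists SSE3 under the flag name `pni` rather than `sse3`, which may be why it does not show up in a list like the one above. The sketch below parses the text of `/proc/cpuinfo` and maps a few common feature names to the kernel's flag names; the function names and the alias table are illustrative and not exhaustive.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# The kernel's flag names differ from the usual feature names in a few cases.
ALIASES = {"sse3": "pni", "sse4.1": "sse4_1", "sse4.2": "sse4_2"}


def supports(flags, feature):
    """True if the named feature (e.g. 'sse3') is present in the flag set."""
    return ALIASES.get(feature, feature) in flags
```

For example, `supports(cpu_flags(open("/proc/cpuinfo").read()), "sse3")` reports whether the running machine advertises SSE3.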

But I'm still concerned by the fact that after the reset, it pulled down the "official" MBv7, which had the same problem. It makes me think there is some weird edge case my system is triggering in both the official and Lunatics apps. Does anyone know why a running work unit would just be restarted in the middle of processing? That's just strange, right? If it's not an error, and it's not reporting an error, then what code path would restart a work unit?
____________

tullio (Project donor)
Joined: 9 Apr 04
Posts: 3721
Credit: 383,087
RAC: 599
Italy
Message 1407660 - Posted: 25 Aug 2013, 19:07:07 UTC
Last modified: 25 Aug 2013, 19:08:09 UTC

I am running stock MBs on my HP laptop and Lunatics AP on my SUN WS, vintage 2007, which has been running 24/7 since January 2008. I only upgraded its RAM to 8 GB and also its disks. The main disk is a 500 GB Seagate Barracuda and the second disk is a 120 GB OCZ SSD, which I also use as a swap disk. My Linux is SuSE.
Tullio
____________

Porcelain Mouse
Joined: 17 May 99
Posts: 24
Credit: 19,541,839
RAC: 30,219
United States
Message 1408247 - Posted: 27 Aug 2013, 6:09:02 UTC - in response to Message 1407652.

I tried another reset w/o any app_info.xml file to get back to the normal situation and, again, it downloaded a new MB app, but had the same symptom as before.

I also tried again with Lunatics MBv7 sse3: same symptoms.

I also tried a different Lunatics MBv7 app, sse2: same symptoms.

It must be something wrong with my platform. I recently upgraded to Fedora 19. Is anyone else having problems with F19 and SETI@Home?

____________

Copyright © 2014 University of California