Restarting message.


Questions and Answers : Wish list : Restarting message.

Author Message
Buzz
Joined: 4 Jan 08
Posts: 6
Credit: 125
RAC: 0
Australia
Message 698804 - Posted: 10 Jan 2008, 5:30:57 UTC

I am getting a message about restarting a particular work unit as follows

SETI@home|Restarting task 02mr07ac.27614.4162.9.6.37_1 using setiathome_enhanced version 527

This has been occurring for several days now (I don't check the messages very often) with a frequency of 10 - 60 minutes (it varies). There are about 120 of them so far for the same work unit.

The percentage complete for this work unit never seems to get above about 12% and drops back to 0 each time the message is issued.

I do not know whether this indicates a problem or not.

Does it?

Profile Ageless
Joined: 9 Jun 99
Posts: 12466
Credit: 2,690,600
RAC: 1,166
Netherlands
Message 698890 - Posted: 10 Jan 2008, 10:22:22 UTC

It doesn't. It tells you that the Seti application is doing its hourly switch with another project's application. Only projects whose tasks run for less than the 'switch every x minutes' setting, and those running through a wrapper, will not show this behaviour, as they'll run in one sweep.

Applications that checkpoint and halt at about the switch time show this message when they restart. The restart isn't done from zero percent; it's done from wherever the last checkpoint was written.
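
To illustrate the checkpoint-and-resume behaviour described above, here is a minimal Python sketch. It is not the SETI@home or BOINC code; the function name `run_task`, the JSON checkpoint file, and the `stop_after` parameter (which simulates the client switching away from the task) are all assumptions made for the example.

```python
import json
import os


def run_task(total_steps, checkpoint_path, stop_after=None):
    """Run a task step by step, resuming from the last checkpoint if one exists.

    Returns (first_step_run, steps_completed). Hypothetical helper, not BOINC's API.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # A restart picks up where the checkpoint left off, not at zero.
        with open(checkpoint_path) as f:
            start = json.load(f)["next_step"]

    done = start
    for step in range(start, total_steps):
        if stop_after is not None and step - start >= stop_after:
            break  # simulate the client preempting the task
        # ... the real work for this step would go here ...
        done = step + 1
        # Write the checkpoint to a temporary file, then rename it, so an
        # interruption never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_step": done}, f)
        os.replace(tmp_path, checkpoint_path)
    return start, done
```

A task that keeps dropping back to 0%, as in the original question, behaves as if the checkpoint file were never written or never read back, so `start` is always 0.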
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Buzz
Joined: 4 Jan 08
Posts: 6
Credit: 125
RAC: 0
Australia
Message 698918 - Posted: 10 Jan 2008, 16:02:01 UTC - in response to Message 698890.

Ok. Thanks again for your help!

Cheers

Richard Haynes
Volunteer tester
Joined: 21 Jan 08
Posts: 1
Credit: 37,246
RAC: 0
United Kingdom
Message 709314 - Posted: 7 Feb 2008, 16:57:15 UTC

I would like to question that theory. I do not have that option set, yet one of my units keeps switching as described by the original poster. I have three units in my tasks list, yet only one keeps switching back to zero; not any other percentage, just zero. I'm getting extremely frustrated, as it has now spent 5 hours and never got above 7%. In the same time the other two have reached around 40%, and only one of them runs at a time. If this is an issue for multiple users, then the team needs to have a look at the units, as there is clearly a problem. If anyone can help, contact me at hainus1@hotmail.com. Thanks

Profile Ageless
Joined: 9 Jun 99
Posts: 12466
Credit: 2,690,600
RAC: 1,166
Netherlands
Message 709320 - Posted: 7 Feb 2008, 17:30:30 UTC - in response to Message 709314.

Just abort the task that is returning to zero percent. It'll probably be something in the task that doesn't like to be checkpointed or otherwise do what it's supposed to.

As for emailing... never put your email address in a legible form on the forums. The place here is crawling with spam bots trying to find things like that.

____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Profile Ageless
Joined: 9 Jun 99
Posts: 12466
Credit: 2,690,600
RAC: 1,166
Netherlands
Message 712156 - Posted: 13 Feb 2008, 18:32:12 UTC

Something has come up on this. If you still have the problem, do you use BOINC's CPU throttling?
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Profile Thanar
Joined: 14 May 99
Posts: 47
Credit: 1,518,626
RAC: 1,684
Greece
Message 712846 - Posted: 15 Feb 2008, 9:18:34 UTC

Ageless,

I've been using CPU throttling almost for ever (at least on my laptops) and it's been more than 6 months now that it gives me no issues whatsoever. What has come up in regards to restarting?

The only issue I've had with tasks restarting is connected to the heartbeat mechanism BCC is using, especially when it comes to certain sync-DNS-lookups (this has also been fixed with the latest betas).
____________

Profile Ageless
Joined: 9 Jun 99
Posts: 12466
Credit: 2,690,600
RAC: 1,166
Netherlands
Message 713043 - Posted: 15 Feb 2008, 18:28:01 UTC - in response to Message 712846.
Last modified: 15 Feb 2008, 18:30:31 UTC

I've been using CPU throttling almost for ever (at least on my laptops) and it's been more than 6 months now that it gives me no issues whatsoever. What has come up in regards to restarting?

When using CPU throttling, tasks would tend to restart. Especially on systems that were only attached to one project.

A fix should be in 5.10.42, which can be gotten from http://boinc.berkeley.edu/download_all.php
Anyone still reading and having the problem, please try it out and report back.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Profile Thanar
Joined: 14 May 99
Posts: 47
Credit: 1,518,626
RAC: 1,684
Greece
Message 713099 - Posted: 15 Feb 2008, 19:51:08 UTC - in response to Message 713043.

When using CPU throttling, tasks would tend to restart. Especially on systems that were only attached to one project.


Nope, never had that problem, at least on the throttled machines, all running OSX.

As I said earlier, I used to have a restarting issue (exit with zero status), but that was due to sync-DNS-lookups forcing the heartbeat to fail, which is a different problem.

5.10.42 brought async-DNS-lookups from what I can see, thus eliminating restarting tasks completely for me.
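
A rough Python sketch of why a blocking call in the client can make the heartbeat fail. This is not BOINC's code: the `HeartbeatMonitor` class and the 30-second timeout are assumptions for illustration, and time is passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen by a running task (hypothetical helper)."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s  # assumed value, not taken from BOINC
        self.last_beat = now

    def beat(self, now):
        # Called whenever the client manages to send a heartbeat.
        self.last_beat = now

    def timed_out(self, now):
        # The task gives up if the client has been silent for too long.
        return now - self.last_beat > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=30, now=0)
monitor.beat(now=1)
# The client now blocks (for example in a synchronous DNS lookup) and sends
# nothing, so the task sees only silence.
stalled_briefly = monitor.timed_out(now=20)
stalled_too_long = monitor.timed_out(now=121)
```

With an asynchronous lookup the client can keep calling `beat` while it waits, so the timeout is never reached.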

What the other people are talking about is stubborn WUs which occasionally get through the checkup servers, that refuse to get over a CERTAIN % no matter what.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 530,050
RAC: 324
United States
Message 713654 - Posted: 16 Feb 2008, 15:33:10 UTC

Also about the time of 5.10.42, there was a fix to directory scans in BOINC. Instead of opening each and every file to get the file size, the information is retrieved from the directory. This cut the time to find the total size of a directory from > 120 seconds to < 1 second on one of my computers. Those 120 seconds were longer than the heartbeat timeout, so every 2 to 3 minutes there would be a 2 minute pause, which restarted the task that was running.
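
The same idea can be sketched in Python, as an analogy only (this is not the BOINC change itself, and `directory_size` is a name chosen for the example). `os.scandir` yields directory entries without opening any file; on Windows the size generally comes straight from the directory listing, while on Unix `entry.stat()` still makes one system call per file.

```python
import os


def directory_size(path):
    """Sum the sizes of the regular files directly inside `path`.

    No file is opened; sizes are taken from the directory entries.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

For a directory holding a few thousand files per task, skipping the open and close on every file is where a saving of this kind would come from.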
____________


BOINC WIKI

Profile Thanar
Joined: 14 May 99
Posts: 47
Credit: 1,518,626
RAC: 1,684
Greece
Message 713779 - Posted: 16 Feb 2008, 18:16:50 UTC

I guess you have hundreds of tasks waiting to run... Wow ((c)Neo)...
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 530,050
RAC: 324
United States
Message 713786 - Posted: 16 Feb 2008, 18:23:00 UTC - in response to Message 713779.

I guess you have hundreds of tasks waiting to run... Wow ((c)Neo)...

Actually, no. It was one project that insisted on having a few thousand files per task...
____________


BOINC WIKI


Copyright © 2014 University of California