Restarting message.

Questions and Answers : Wish list : Restarting message.

Buzz

Joined: 4 Jan 08
Posts: 6
Credit: 125
RAC: 0
Australia
Message 698804 - Posted: 10 Jan 2008, 5:30:57 UTC

I am getting a message about restarting a particular work unit, as follows:

SETI@home|Restarting task 02mr07ac.27614.4162.9.6.37_1 using setiathome_enhanced version 527

This has been occurring for several days now (I don't check the messages very often) with a frequency of 10 - 60 minutes (it varies). There are about 120 of them so far for the same work unit.

The percentage complete for this work unit never seems to get above about 12% and drops back to 0 each time the message is issued.

I do not know whether this indicates a problem or not.

Does it?
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 698890 - Posted: 10 Jan 2008, 10:22:22 UTC

It doesn't. It tells you that the SETI application is doing its hourly switch with another project's application. Only projects whose tasks finish in less than the 'switch every x minutes' interval, and those running through a wrapper, will not show this behaviour, as they run in one sweep.

Applications that checkpoint and halt at about the switch time show this message when they restart. The restart isn't done from zero percent; it's done from wherever the task was at its last checkpoint.
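
For anyone wondering what "restarting from a checkpoint" looks like in practice, here is a minimal Python sketch. It is not the SETI@home or BOINC code; `run_task`, the JSON checkpoint file, and the `stop_after` parameter (which stands in for the client switching to another application) are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


def run_task(total_steps, checkpoint_path, stop_after=None):
    """Work through total_steps, resuming from a saved checkpoint if one exists.

    stop_after (hypothetical) simulates the client pre-empting the task
    after that many steps in this run. Returns the steps completed so far.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        # Resume from the last saved position instead of from zero.
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    steps_this_run = 0
    while done < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # pre-empted; progress is already on disk
        done += 1  # stand-in for one unit of real work
        steps_this_run += 1
        # Write to a temporary file and rename it, so an interrupted
        # write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": done}, f)
        os.replace(tmp_path, checkpoint_path)
    return done


checkpoint = os.path.join(tempfile.mkdtemp(), "state.json")
print(run_task(10, checkpoint, stop_after=4))  # first run is pre-empted
print(run_task(10, checkpoint))                # second run resumes and finishes
```

If the checkpoint file is never written or cannot be read back, every restart begins at zero again, which would match the symptom described in the first post.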
Buzz

Joined: 4 Jan 08
Posts: 6
Credit: 125
RAC: 0
Australia
Message 698918 - Posted: 10 Jan 2008, 16:02:01 UTC - in response to Message 698890.  

Ok. Thanks again for your help!

Cheers
Richard Haynes
Volunteer tester

Joined: 21 Jan 08
Posts: 1
Credit: 146,742
RAC: 0
United Kingdom
Message 709314 - Posted: 7 Feb 2008, 16:57:15 UTC

I would like to question that theory. I do not have that option set, yet one of my units keeps switching as described by the original poster. I have three units in my tasks list, yet only one keeps switching back to zero; not any other percentage, just zero. I'm getting extremely frustrated, as it has now spent 5 hours and never got above 7%. Yet in the same time the other two have reached around 40%, and only one of them runs at a time. If this is an issue for multiple users, then the team needs to have a look at the units, as there is clearly a problem. If anyone can help, contact me at hainus1@hotmail.com. Thanks
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 709320 - Posted: 7 Feb 2008, 17:30:30 UTC - in response to Message 709314.  

Just abort the task that is returning to zero percent. It'll probably be something in the task that doesn't like to be checkpointed or otherwise do what it's supposed to.

As for emailing... never put your email address in a legible form on the forums. The place here is crawling with spam bots trying to find things like that.

Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 712156 - Posted: 13 Feb 2008, 18:32:12 UTC

Something has come up on this. If you still have the problem, do you use BOINC's CPU throttling?
Profile Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 712846 - Posted: 15 Feb 2008, 9:18:34 UTC

Ageless,

I've been using CPU throttling almost for ever (at least on my laptops) and it's been more than 6 months now that it gives me no issues whatsoever. What has come up in regards to restarting?

The only issue I've had in regards to tasks restarting is connected to the heartbeat mechanism BCC is using, especially when it comes to certain sync-DNS-lookups (this has also been fixed with the latest betas).
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 713043 - Posted: 15 Feb 2008, 18:28:01 UTC - in response to Message 712846.  
Last modified: 15 Feb 2008, 18:30:31 UTC

Thanar wrote: "I've been using CPU throttling almost for ever (at least on my laptops) and it's been more than 6 months now that it gives me no issues whatsoever. What has come up in regards to restarting?"

When using CPU throttling, tasks would tend to restart. Especially on systems that were only attached to one project.

A fix should be in 5.10.42, which can be downloaded from http://boinc.berkeley.edu/download_all.php
Anyone still reading and having the problem, please try it out and report back.
Profile Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 713099 - Posted: 15 Feb 2008, 19:51:08 UTC - in response to Message 713043.  

Jord wrote: "When using CPU throttling, tasks would tend to restart. Especially on systems that were only attached to one project."


Nope, never had that problem, at least on the throttled machines, all running OSX.

As I said earlier, I used to have a restarting issue (exit with zero status), but that was due to sync-DNS-lookups causing the heartbeat to fail, and that is a different problem.

5.10.42 brought async-DNS-lookups from what I can see, thus eliminating restarting tasks completely for me.
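
To illustrate why a blocking lookup can starve the heartbeat while a background one cannot, here is a toy Python model. It is a sketch under stated assumptions, not the BOINC client's logic: the function names are hypothetical, the 30-second `HEARTBEAT_TIMEOUT` is an illustrative value rather than a documented one, and time is simulated rather than measured.

```python
HEARTBEAT_TIMEOUT = 30.0  # assumed, illustrative timeout in seconds


def worst_heartbeat_gap(lookup_durations, offloaded, tick=1.0):
    """Largest gap in seconds between heartbeats in a toy client main loop.

    Each pass through the loop costs `tick` seconds and starts one lookup.
    If `offloaded` is False, the loop waits for the lookup to finish before
    it can send the next heartbeat (synchronous lookup). If True, the lookup
    runs in the background and does not delay the loop (asynchronous lookup).
    """
    worst = 0.0
    for duration in lookup_durations:
        gap = tick if offloaded else tick + duration
        worst = max(worst, gap)
    return worst


def task_would_restart(lookup_durations, offloaded):
    """True if some heartbeat gap exceeds the assumed timeout."""
    return worst_heartbeat_gap(lookup_durations, offloaded) > HEARTBEAT_TIMEOUT


# One slow 45-second lookup among otherwise fast ones.
lookups = [0.1, 45.0, 0.2]
print(task_would_restart(lookups, offloaded=False))
print(task_would_restart(lookups, offloaded=True))
```

In this model a single slow synchronous lookup is enough to push the gap past the timeout, while the same lookup handled in the background leaves the heartbeat interval unchanged.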

What the other people are talking about is stubborn WUs that occasionally get through the checkup servers and refuse to get past a certain % no matter what.
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 713654 - Posted: 16 Feb 2008, 15:33:10 UTC

Also, at about the time of 5.10.42, there was a fix to directory scans in BOINC. Instead of opening each and every file to get its size, the information is now retrieved from the directory. This cut the time to find the total size of a directory from > 120 seconds to < 1 second on one of my computers. Those 120 seconds were longer than the heartbeat, so every 2 to 3 minutes there would be a 2 minute pause, which restarted the task that was running.
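
The difference between the two scanning strategies can be sketched in Python as follows. This is only an illustration of the idea, not BOINC's own code, and both function names are hypothetical. `os.scandir` is used for the second variant because its entries carry file information obtained while listing the directory (on Windows the size comes with the listing; on Unix one `stat` call per file is still needed, but the file is never opened).

```python
import os
import tempfile


def dir_size_by_opening(path):
    """Slow variant: open every file and seek to its end to learn its size."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full, "rb") as f:
                total += f.seek(0, os.SEEK_END)
    return total


def dir_size_from_listing(path):
    """Faster variant: take each size from the directory entry's metadata."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                total += entry.stat().st_size
    return total


# Small demonstration on a temporary directory with three files.
demo_dir = tempfile.mkdtemp()
for i, size in enumerate([10, 200, 3000]):
    with open(os.path.join(demo_dir, f"file{i}.dat"), "wb") as f:
        f.write(b"\0" * size)

print(dir_size_by_opening(demo_dir), dir_size_from_listing(demo_dir))
```

Both functions return the same total; the gain is in how much work each file costs, which matters when a task directory holds thousands of files.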


BOINC WIKI
Profile Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 713779 - Posted: 16 Feb 2008, 18:16:50 UTC

I guess you have hundreds of tasks waiting to run... Wow ((c)Neo)...
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 713786 - Posted: 16 Feb 2008, 18:23:00 UTC - in response to Message 713779.  

Thanar wrote: "I guess you have hundreds of tasks waiting to run... Wow ((c)Neo)..."

Actually, no. It was one project that insisted on having a few thousand files per task...


BOINC WIKI



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.