Long CPU times

Message boards : Number crunching : Long CPU times

Bill G · Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1468132 - Posted: 24 Jan 2014, 11:30:21 UTC
Last modified: 24 Jan 2014, 11:30:56 UTC

Just noticed these rather long CPU times on a computer that errored out against one of my WUs. I hope the operator is able to fix this soon. I've sent him an email asking him to come here, and I hope he sees it.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2381109

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1468138 - Posted: 24 Jan 2014, 11:43:48 UTC - in response to Message 1468132.  

LOL - long indeed, about 600 years.

The interesting thing is that he's running BOINC v5.10.45, which doesn't report elapsed time and CPU time separately. If you look at his pending and valid tasks, you'll see that the server has recorded identical times in both columns - I think that's a server workround for credit purposes.
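Richard's guess about the identical values in both columns can be sketched roughly as follows. This is a hypothetical illustration, not the actual SETI@home or BOINC server code: the function name `record_times` is invented, and it assumes an old client's missing elapsed time arrives as `None` (or zero).

```python
from typing import Optional, Tuple


def record_times(cpu_time: float, elapsed_time: Optional[float]) -> Tuple[float, float]:
    """Return (elapsed, cpu) as the server might store them.

    Hypothetical: clients such as 5.10.x send no separate elapsed time,
    so the CPU time is copied into both columns (e.g. for credit purposes).
    """
    if elapsed_time is None or elapsed_time <= 0:
        # No elapsed time reported: reuse CPU time for both fields.
        return cpu_time, cpu_time
    # Newer clients report both values; store them as given.
    return elapsed_time, cpu_time
```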

It's just possible that this may be outside the user's control, though a reboot or other TLC is never a bad idea. On the other hand, if it is a server problem, I don't see how it could have happened, or what we could do about it.
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1468147 - Posted: 24 Jan 2014, 12:13:12 UTC

The errors are -177 "runtime exceeded" aborts - and yes, I'd abort that too if it had been running for 600 years!

So I reckon something local on the host drives the CPU time to unreasonable values. BOINC sees an unreasonable CPU time and aborts the task with -177.
The task is returned to the server, which makes a sanity check on the CPU time: it can't be larger than (time received - time sent). The run time (for credit purposes) is then set to the maximum possible time, i.e. time received - time sent.
That explains the discrepancy between the fields (the host has a few good tasks, and a pending one with an unreasonable run time).
Looking through the errors, that host seems to have been having problems with this for quite some time.
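The sequence above can be sketched as a client-side limit check followed by a server-side clamp. This is a minimal sketch under stated assumptions, not BOINC's actual code: `client_check`, `server_clamp`, the `Report` fields, the 100,000 s limit and the 3-day turnaround are all hypothetical. Only the -177 value comes from the errored tasks.

```python
from dataclasses import dataclass

ERR_RSC_LIMIT_EXCEEDED = -177  # exit status seen on the errored tasks

SECONDS_PER_YEAR = 365.25 * 24 * 3600


@dataclass
class Report:
    cpu_time: float       # seconds, as reported by the client
    sent_time: float      # server timestamp (s) when the task was sent
    received_time: float  # server timestamp (s) when the result came back


def client_check(cpu_time: float, cpu_limit: float) -> int:
    """Client side (hypothetical): abort with -177 once CPU time passes the limit."""
    return ERR_RSC_LIMIT_EXCEEDED if cpu_time > cpu_limit else 0


def server_clamp(report: Report) -> float:
    """Server side (hypothetical): a task can't run longer than it was out."""
    max_possible = report.received_time - report.sent_time
    return min(report.cpu_time, max_possible)


# A task claiming 600 years of CPU time, returned 3 days after it was sent.
r = Report(cpu_time=600 * SECONDS_PER_YEAR, sent_time=0.0, received_time=3 * 86400.0)
print(client_check(r.cpu_time, cpu_limit=100_000.0))  # -177
print(server_clamp(r))                                # 259200.0 (3 days)
```

The clamp means a corrupted CPU time can never claim more credit than the wall-clock window the server itself observed.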

So, what on earth might lead to a host suddenly getting an insane CPU time field? Memory glitch?
A person who won't read has no advantage over one who can't read. (Mark Twain)
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.