Long CPU times



Message boards : Number crunching : Long CPU times

Author Message
Bill G (Project donor)
Joined: 1 Jun 01
Posts: 347
Credit: 42,015,329
RAC: 70,923
United States
Message 1468132 - Posted: 24 Jan 2014, 11:30:21 UTC
Last modified: 24 Jan 2014, 11:30:56 UTC

Just noticed these rather long CPU times on a computer that errored out on one of my WUs. I hope the operator is able to fix this soon. I sent him an email asking him to come here, and I hope he sees it.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2381109

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8551
Credit: 50,450,983
RAC: 51,283
United Kingdom
Message 1468138 - Posted: 24 Jan 2014, 11:43:48 UTC - in response to Message 1468132.

LOL - long indeed, about 600 years.

The interesting thing is that he's running BOINC v5.10.45, which doesn't report elapsed time and CPU time separately. If you look at his pending and valid tasks, you'll see that the server has recorded identical times in both columns - I think that's a server workround for credit purposes.
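A minimal sketch of the kind of server workaround described above, assuming (hypothetically) that when an old client such as v5.10.45 reports no elapsed time, the server simply copies the CPU time into the elapsed-time column. `TaskReport` and `normalize_times` are illustrative names, not BOINC's actual server code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskReport:
    cpu_time: float                       # seconds, as reported by the client
    elapsed_time: Optional[float] = None  # None when the client doesn't report it


def normalize_times(report: TaskReport) -> TaskReport:
    # Hypothetical rule: a client that only reports CPU time gets that
    # same value stored as elapsed time, so both columns end up identical.
    if report.elapsed_time is None:
        report.elapsed_time = report.cpu_time
    return report
```

With this rule, `normalize_times(TaskReport(cpu_time=7200.0))` ends up with `elapsed_time == 7200.0`, which would match the identical values seen in both columns for this host.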

It's just possible that this may be outside the user's control, though a reboot or other TLC is never a bad idea. On the other hand, if it is a server problem, I don't see how it could have happened, or what we could do about it.

William (Project donor)
Volunteer tester
Joined: 14 Feb 13
Posts: 1602
Credit: 9,469,424
RAC: 265
Message 1468147 - Posted: 24 Jan 2014, 12:13:12 UTC

The errors are -177 "runtime exceeded" aborts. Yes, I'd abort that too if it had been running for 600 years!

So I reckon something local on the host makes it report unreasonable CPU times. BOINC sees an unreasonable CPU time and aborts the task with -177.
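A rough sketch of the client-side check, assuming (this is an assumption, not confirmed in the thread) that the limit is the workunit's `rsc_fpops_bound` divided by an estimate of the host's speed in FLOPS. The -177 value is the exit status quoted above; the constant name and `check_cpu_limit` are illustrative.

```python
ERR_RSC_LIMIT_EXCEEDED = -177  # exit status seen on this host's errored tasks


def check_cpu_limit(cpu_time: float, rsc_fpops_bound: float, host_flops: float) -> int:
    """Return 0 if the task may keep running, or -177 if it should be aborted.

    Assumed model: max allowed CPU seconds = FLOP bound / host FLOPS.
    """
    max_cpu_time = rsc_fpops_bound / host_flops
    if cpu_time > max_cpu_time:
        return ERR_RSC_LIMIT_EXCEEDED
    return 0
```

With hypothetical numbers (a bound of 1e15 FLOPs on a 2e9 FLOPS host gives a limit of 500,000 s), an hour of CPU time passes, while a corrupted value of around 600 years (about 1.9e10 s) trips the abort immediately.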
The task is returned to the server, which runs a sanity check on the CPU time: it can't be larger than the time received minus the time sent. The run time (for credit purposes) is therefore set to the maximum possible time, i.e. time received minus time sent.
That explains the discrepancy between the fields (the host has a few good tasks, and a pending one with an unreasonable run time).
Looking through the errors, that host seems to have been having problems with this for quite some time.
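The server-side sanity check described above amounts to clamping the reported time to the window the task was actually out. A minimal sketch, assuming both timestamps are Unix seconds; `sanitize_cpu_time` is a hypothetical name, not the real server function.

```python
def sanitize_cpu_time(reported_cpu: float, sent_time: float, received_time: float) -> float:
    # A task cannot have run longer than it was away from the server,
    # so anything beyond (received - sent) is replaced by that maximum.
    max_possible = received_time - sent_time
    return min(reported_cpu, max_possible)
```

For a task sent at t = 1000 and returned one day later (t = 87400), a reported 1.9e10 s is cut to 86,400 s, while a sane 3,600 s passes through unchanged.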

So, what on earth might lead to a host suddenly getting an insane CPU time field? Memory glitch?
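One way a memory glitch could produce an absurd value, assuming the CPU time is held as a 64-bit IEEE 754 double: a single flipped exponent bit multiplies the value by a power of two. This is an illustration of the idea only, not a diagnosis of this host; `flip_bit` is a hypothetical helper.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit (0 = least significant) of its IEEE 754 double inverted."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits ^= 1 << bit
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    return y
```

For example, 3600.0 (one hour) has bit 56 clear; flipping it adds 16 to the exponent, giving 3600 × 65536 = 235,929,600 s, roughly 7.5 years. Higher exponent bits give far larger jumps.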
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)


Copyright © 2014 University of California