Clock cycles gone Wild!

Message boards : Number crunching : Clock cycles gone Wild!

Dave Mickey
Joined: 19 Oct 99
Posts: 178
Credit: 11,122,965
RAC: 0
United States
Message 219741 - Posted: 22 Dec 2005, 13:33:57 UTC

This host

http://setiathome.berkeley.edu/show_host_detail.php?hostid=327482

did something interesting, as part of the fallout of the most recent Big Outage:

It had done this workunit

http://setiathome.berkeley.edu/workunit.php?wuid=40716138

which generated a -6 error (I had seen it mentioned in some other threads). It looks like everybody else got the same thing on that WU; something about a bad header.

I hardly ever notice errored-out units except for noisy (-9) ones, but OK, fine. I wasn't the only one.

Then it was working on its next one:

http://setiathome.berkeley.edu/workunit.php?wuid=40860511

when I noticed it. It was taking a long time; in fact it was up to about 2X the typical time. And it had only started this WU 15 hours earlier, but had accumulated 30 hours of CPU time.

Then I also noticed that this host had begun requesting downloads of more WUs a good bit more frequently than it should (in the time it returned one, it downloaded about 5 more). So I looked at it a bit, using BoincView. I have BV set to query hosts every minute, and each minute this host reported having consumed 2 minutes of CPU time (in the CPU Time column of the Work tab).
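For anyone who wants to check their own hosts, the ratio described here (2 minutes of CPU time per 1-minute poll) is easy to compute from successive readings. The sketch below is a hypothetical helper, not a BoincView feature: it assumes you have noted the wall-clock time and the CPU Time column at each poll, and it flags any interval where CPU time outruns wall time by more than 10% per CPU. The `cpus` and `tolerance` parameters are illustrative choices; Win98 only uses one CPU, so `cpus=1` is the relevant setting there.

```python
from typing import List, Tuple

# Each sample: (wall_clock_seconds, reported_cpu_seconds), both cumulative,
# copied from successive polls (hypothetical input format).
Sample = Tuple[float, float]


def cpu_to_wall_ratios(samples: List[Sample]) -> List[float]:
    """Return CPU-seconds gained per wall-clock second between successive polls."""
    ratios = []
    for (w0, c0), (w1, c1) in zip(samples, samples[1:]):
        elapsed = w1 - w0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order polls
        ratios.append((c1 - c0) / elapsed)
    return ratios


def looks_time_warped(samples: List[Sample], cpus: int = 1,
                      tolerance: float = 0.1) -> bool:
    """True if any interval claims more CPU time than the host's CPUs could supply."""
    limit = cpus * (1.0 + tolerance)
    return any(r > limit for r in cpu_to_wall_ratios(samples))


# Polls one minute apart, CPU time advancing two minutes per poll.
warped = [(0, 0), (60, 120), (120, 240)]
print(cpu_to_wall_ratios(warped))   # [2.0, 2.0]
print(looks_time_warped(warped))    # True
```

A steady ratio of about 2.0 on a single-CPU Win98 box is physically impossible, so it points at the accounting rather than the cruncher.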

This was driving the work buffer (from the BV Hosts tab) down so quickly that it kept asking for more work, approaching too-much-for-deadline territory. But I decided to let it finish that one and see what happened. It did, and started the next WU. But this WU also behaved as if it were running at double speed: the CPU time reported was 2X real time.

Also, reporting the 2X unit drove my "efficiency" on the web site to something like 1.5. (Does the web site stat allow that on purpose for multi-proc or HT units, or should it be capped at 1.0?)

Anyway, enough was enough. So I rebooted the miscreant, and my time warp came to an end - CPU time = real time. This stopped my work buffer drain, and restored normalcy. Except that it had doubled the "completion time" estimate of all the units on hand. But no big deal, cuz they are completing at normal rates in the real-time world.

This is all on BV 1.2.2 (no BOINCmgr running), BOINC 5.2.13, the TMR maniacally optimized app, and Win98. My 5.2.13 install is fairly new (upgraded from 5.2.5), but then I had never seen a -6 before, either. I have to suspect it's related to Win98, just because I've really come to hate the MS consumer products compared to the pro stuff (NT4, 2K Pro). But who knows!?

But maybe you want to watch out for those -6 error units....

Dave
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 219987 - Posted: 22 Dec 2005, 23:50:51 UTC - in response to Message 219741.  

Similar, but not identical, observation here.

Both my Windows 98 machines are running BOINC 5.2.13, 50:50 on Seti and Einstein. I'm watching them with BV 1.3 beta 2, still on default 5-second refresh.

Occasionally, all the timing figures will double up as Dave describes - but usually only for 10 seconds or so, and then they flick back to normal. I haven't had a bad workunit on either machine, so in my case at least it isn't linked to the outage. I haven't noticed it enough to work out any pattern: it just seems to be random.

But each of the two machines has reported one Einstein WU today, and both went into time-warp mode just as they finished crunching: they reported double the normal time for the result, and claimed double credit. (They haven't cheated on a Seti unit yet, so I feel I can own up here!) Subsequent units have double the estimated completion time, but seem to be crunching at normal speed (without a reboot).

I also have a Win 2000 and a Win XP Home box: I haven't noticed any time-warping on either of these. Yet.

Richard
Tern
Volunteer tester
Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 220055 - Posted: 23 Dec 2005, 1:33:50 UTC
Last modified: 23 Dec 2005, 1:48:38 UTC

Win95, Win98, and WinME have no idea what "CPU time" is. They fake it. This is why they aren't even _supported_ on Rosetta, which cares a lot more about CPU time than SETI does.

Generally you get the other case, where it decides that the last 2 hours it spent on a result was actually 0 seconds, so you should claim 0 credit. This is the first I've heard of it trying to make up for those 0 credit claims by doubling a new one...
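To make this concrete: on NT-family Windows a client can ask the OS for a process's CPU time (the Win32 `GetProcessTimes` call), but that call is not implemented on Windows 95/98/Me, so a client there has to approximate. The sketch below assumes the simplest fallback, counting wall-clock time while the task runs as CPU time. It is not BOINC's actual code, and the `double_count` flag is a hypothetical bug included only to show how a single accounting slip would reproduce the exact 2X symptom reported in this thread.

```python
class WallClockCpuEstimator:
    """Approximate CPU time as wall-clock time while a task is running.

    Hypothetical stand-in for an OS that cannot report per-process CPU time.
    """

    def __init__(self, start: float) -> None:
        self.last_tick = start
        self.cpu_seconds = 0.0

    def tick(self, now: float, double_count: bool = False) -> float:
        """Credit the interval since the previous tick and return the running total."""
        delta = now - self.last_tick
        self.last_tick = now
        self.cpu_seconds += delta
        if double_count:
            # Hypothetical bug: the same interval is credited a second time.
            self.cpu_seconds += delta
        return self.cpu_seconds


normal = WallClockCpuEstimator(start=0)
warped = WallClockCpuEstimator(start=0)
for t in range(60, 601, 60):          # ten one-minute updates
    normal.tick(t)
    warped.tick(t, double_count=True)
print(normal.cpu_seconds, warped.cpu_seconds)   # 600.0 1200.0
```

The opposite failure (hours of work recorded as 0 seconds) would fit an estimator that loses its accumulated total, for example by restarting from zero; that is also a guess about the mechanism, not a description of the real client.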

EDIT: Wasn't this thread made into a movie, or a TV show, or something? Something about Mardi Gras? Or was that something else...
Dave Mickey
Joined: 19 Oct 99
Posts: 178
Credit: 11,122,965
RAC: 0
United States
Message 220143 - Posted: 23 Dec 2005, 3:39:32 UTC

Seems that whatever a -6 error does to the Boinc client
(or maybe the science app?) does a left bit shift on something
in Win98. Although note the system clock remained correct.
I really wish there was a real TaskMgr app for 98.
Anyone know of such a thing?
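The left-shift guess does at least fit the numbers: shifting a non-negative integer left by one bit doubles it, which is exactly 15 hours of run time showing up as 30 hours of CPU time. This only illustrates the arithmetic; it is not evidence that BOINC or the science app actually shifts anything, and `left_shift_one` is a hypothetical name.

```python
def left_shift_one(seconds: int) -> int:
    """Shift a non-negative integer left by one bit, which doubles it."""
    return seconds << 1


wall_seconds = 15 * 3600                  # 15 hours on the WU
reported_seconds = left_shift_one(wall_seconds)
print(reported_seconds / 3600)            # 30.0 hours of "CPU time"
```

Ordinary multiplication by 2, or adding the same interval twice, would look identical from the outside, so the 2X figure alone can't tell these causes apart.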

(Turning to the judge...)
Your honor, as to Mr. Michaels' question, I have no knowledge
of such movies, videos, or anything transpiring in New Orleans.
I have never been there.

And if I had, I assert my 5th Amendment privilege
against self-incrimination at this time.

Dave
Steve Cressman
Volunteer tester
Joined: 6 Jun 02
Posts: 583
Credit: 65,644
RAC: 0
Canada
Message 220706 - Posted: 24 Dec 2005, 3:47:32 UTC - in response to Message 220143.  

Dave wrote: "I really wish there was a real TaskMgr app for 98. Anyone know of such a thing?"


I use Process Explorer from http://www.sysinternals.com/; it gives you all the info you need.
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8

And God said, "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.



 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.