Would a periodic System Integrity test alongside the benchmark be possible to implement?

FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 27718 - Posted: 18 Sep 2004, 15:14:25 UTC
Last modified: 18 Sep 2004, 15:16:26 UTC

Hi :)

I was just wondering about something: with the migration to BOINC getting underway, I now also run some long-running Projects like CPDN.

And with the additional time-sharing among Projects, a single Workunit, be it LHC, SETI or CPDN, is exposed to its running System/Host for significantly longer than ever before (e.g. compared to running dedicated, non-BOINC Project Clients).

Since absolute stability and correctness of computation are paramount, and not all Projects offer detailed Result monitoring (where a Result failing to validate would be the only indication of a possible Problem), I was wondering if a periodic stability Test could be implemented.

For example, it could run together with the benchmarks, which are already repeated periodically.
(something like the Prime95 DC Client's Self-Test)

It would give me a warm, fuzzy feeling if the BOINC Client on each machine could actually tell me as soon as an error is detected, and then stop until the User has corrected the Problem.
(and, for example, offer the Option of either completely restarting a Workunit or stepping back to the last known Checkpoint/saved state or the like, to improve the chances of returning a correct result)
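To make the stop-and-step-back idea a bit more concrete, here is a rough Python sketch. It is not BOINC code and uses no BOINC API: `run_workunit`, `system_ok` and `advance` are names made up for illustration, and the "Workunit" is just a counter. The assumption is that a Checkpoint only counts as trustworthy if a system test passed right when it was taken.

```python
from typing import Callable, Tuple


def run_workunit(
    total_steps: int,
    check_every: int,
    system_ok: Callable[[], bool],
    advance: Callable[[int], int] = lambda state: state + 1,
) -> Tuple[str, int, int]:
    """Run a toy Workunit; return (status, step, state).

    status is 'finished' or 'halted'. All names here are hypothetical.
    """
    state, step = 0, 0
    checkpoint = (step, state)  # last state confirmed by a passed test
    while step < total_steps:
        state = advance(state)
        step += 1
        # Test the system every `check_every` steps and once at the very end.
        if step % check_every == 0 or step == total_steps:
            if not system_ok():
                # Discard everything computed since the last passed test
                # and stop until the User has fixed the Problem.
                step, state = checkpoint
                return ("halted", step, state)
            checkpoint = (step, state)
    return ("finished", step, state)
```

With a test that fails on its third call, `run_workunit(10, 3, ...)` stops at the check after step 9 and hands back the state saved at step 6, so only three steps of work are lost instead of the whole Workunit.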

Just Ideas, but something in that direction would be really nice to have, especially since e.g. SETI currently doesn't allow checking whether finished Results have validated okay.
(with CPDN it seems even more critical, since a single Model could easily run for several months, and each seems to be a unique Model with no one to compare against)

I'd hate to waste a month's worth of work just because a Stick of RAM turned flaky, or something else in one of the Systems produces spurious Bit errors :(

Of course, such a System self-test would be just a quick check-up, but applied periodically together with the Benchmark, it could detect at least the common errors (the kind that might not freeze/crash the machine) within one Minute.
(the Prime95 Self-Test is the best example I can think of; other examples might exist)
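As a rough illustration of what such a quick check-up could look like, here is a minimal Python sketch. It is not how Prime95 or BOINC do it: as far as I know the Prime95 Self-Test compares against stored known-good results, while this sketch simply runs the same deterministic workload several times and checks that all runs agree. `workload_digest` and `self_test` are made-up names, and the workload itself (a small integer and floating-point loop) is an arbitrary stand-in.

```python
import hashlib
import struct


def workload_digest(seed: int, rounds: int = 20000) -> str:
    """Run a deterministic integer + float workload and hash every step."""
    h = hashlib.sha256()
    x = seed & 0xFFFFFFFF
    acc = 0.0
    for i in range(1, rounds + 1):
        # Linear congruential step: exercises integer arithmetic.
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
        # Running sum: exercises floating-point arithmetic.
        acc += (x % 1000) / i
        # Feed both intermediate values into the digest.
        h.update(struct.pack("<Id", x, acc))
    return h.hexdigest()


def self_test(repeats: int = 3, seed: int = 12345, run=workload_digest) -> bool:
    """Return True if all repeated runs produce the identical digest."""
    digests = {run(seed) for _ in range(repeats)}
    return len(digests) == 1
```

A Client could call `self_test()` next to the periodic Benchmark and stop crunching when it returns `False`. Comparing runs against each other only catches intermittent errors; a fault that corrupts every run in the same way would need a stored reference result to compare against.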
___________________________________________
Scientific Network : 36200 MHz «» 8204 MB «» 815.0 GB


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.