Everybody's Wish (fer-sure!)

Questions and Answers : Wish list : Everybody's Wish (fer-sure!)

Steven Douglas Huddleston
Joined: 25 Jun 99
Posts: 7
Credit: 1,271,414
RAC: 0
Puerto Rico
Message 158958 - Posted: 29 Aug 2005, 3:18:01 UTC
Last modified: 29 Aug 2005, 3:33:29 UTC

Oh, how nice it would be to have a cute little button you could click to have all of your completed Work Units restored to the client_state.xml file after a reset! After all, the work is there in the project directory, each unit with its associated result file. Losing all of that time because of some BOINC freak-out seems just plain evil to me.

Isn't there a clever coder out there who can make this right?

Here's what happened:

StartServiceCtrlDispatcher being called.
This may take several seconds. Please wait.
2005-08-28 09:27:45 [---] Starting BOINC client version 4.45 for windows_intelx86
2005-08-28 09:27:45 [---] Executing as a daemon
2005-08-28 09:27:45 [---] Data directory: C:\Program Files\BOINC
2005-08-28 09:27:45 [---] BOINC is running as a service and as a non-system user.
2005-08-28 09:27:45 [---] No application graphics will be available.
2005-08-28 09:27:45 [---] Can't parse file info in state file
2005-08-28 09:27:45 [---] State file has different major version (0.00); resetting projects
2005-08-28 09:27:45 [SETI@home] Resetting project
2005-08-28 09:27:45 [---] request_reschedule_cpus: exit_tasks
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found
2005-08-28 09:27:45 [SETI@home] PERS_FILE_XFER_SET::remove(): not found

...etc...

The work of a whole week (on a P4 HT running at 3.00 GHz, 2 CPUs), gone in an instant!

(Sob!) Boo hoo hooooooo!

Mafú
"Experience is the ocean you cross to get from knowledge to truth."
The Kongaloid Website
ID: 158958
Ariane Von WolfLand
Joined: 21 Aug 05
Posts: 480
Credit: 211
RAC: 0
Message 159266 - Posted: 29 Aug 2005, 16:22:28 UTC



Steven,

Apparently you imagine yourself in a market, about to bargain for a great deal, or winning a lottery prize or the jackpot, and at the end of the year calculating your rate of profit or loss.
ID: 159266
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 159345 - Posted: 29 Aug 2005, 18:35:12 UTC

What did the reset? Did you reset the Seti project yourself? If you did, why did you do that?

If you did it to get more work, then please read the front page news, the server status and the technical news for all that has happened in the past couple of weeks. It'll also explain why the schedulers are down.

Also check up on this thread, as well as any other thread in the Number Crunching forum.

If you don't mind doing some other work at this moment, there's a whole host of other projects out there that you can attach to. Use Boincstats as a guideline. It links to the projects that are up and running, although only Einstein, LHC, SZTAKI, CPDN and Predictor have work at this moment.



ID: 159345
Steven Douglas Huddleston
Joined: 25 Jun 99
Posts: 7
Credit: 1,271,414
RAC: 0
Puerto Rico
Message 159370 - Posted: 29 Aug 2005, 18:55:48 UTC - in response to Message 159266.  



Steven,

Apparently you imagine yourself in a market, about to bargain for a great deal, or winning a lottery prize or the jackpot, and at the end of the year calculating your rate of profit or loss.


??? I have no idea what you are talking about. I thought this was the "Wish List", so I made "a wish".

Here is a week's worth of good work done for a really cool cause, gone to waste due to a freaky software quirk.

I think you have mistaken my meaning (which is common with polyglots). I'm not competing with anyone; it's just the waste of energy (effort) that I find appalling. Some of us are really interested in the science of it, and try our best to be as efficient as possible.

This problem could easily be eliminated by coding a backup procedure into the software for whenever it is about to reset. How can losing properly processed work units be good for the project? Isn't the idea of distributed computing all about doing more work in less time?

It hurts when our good-hearted efforts vanish for *no good reason*.
Mafú
"Experience is the ocean you cross to get from knowledge to truth."
The Kongaloid Website
ID: 159370
Steven Douglas Huddleston
Joined: 25 Jun 99
Posts: 7
Credit: 1,271,414
RAC: 0
Puerto Rico
Message 159388 - Posted: 29 Aug 2005, 19:07:54 UTC - in response to Message 159345.  

What did the reset? Did you reset the Seti project yourself? If you did, why did you do that?


You know? I have no idea! No, I most certainly *DID NOT* reset it.

No, Ageless, I know better than that.

I was browsing something or other (not SETI related) and the browser locked up (!@#& Microsoft!), so I had to reboot the non-responsive system. If you look at the snippet of log I posted, you can see it's from the startup. I think BOINC wasn't able to close its files properly before the reboot, which is strange because it was a Task Manager shutdown.

(Sigh!) Oh-gwey, C'est la vie!
Mafú
"Experience is the ocean you cross to get from knowledge to truth."
The Kongaloid Website
ID: 159388
Fred1701
Joined: 25 Aug 00
Posts: 3
Credit: 7,307,451
RAC: 0
France
Message 159453 - Posted: 29 Aug 2005, 20:41:46 UTC

Hi,

I also have this kind of problem sometimes.
It always happens when my computer resets or shuts down, due to a power failure for example.

I'm still subscribed to the project (although not always; sometimes I also have to subscribe again), but I've lost all my work.
And this time it's very annoying, since I can't get any work units to replenish my work tank...

If there are any logs or anything else you need to analyze the problem, do not hesitate to ask.

Fred
ID: 159453
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 159465 - Posted: 29 Aug 2005, 20:51:54 UTC - in response to Message 159388.  
Last modified: 29 Aug 2005, 20:53:08 UTC

Nice to see you look at it that way. Mostly, people respond to their OS crashing with "Boinc is shit". It's good to read the other side of the story for once. :)

A backup option for the client_state.xml, now there's a good one. I'll be asking for this as well. Although I am not quite sure how they would implement this prior to a crash, since crashes never make themselves known. ;)

That said, a backup of the client_state.xml file is made each time the file is written: client_state_prev.xml

The trouble with Windows crashes (for whatever reason) is that they usually happen while something is being written to disk. So if the client_state.xml file is being written at that moment, or Boinc is writing the science application's status to the drive when the crash hits, there's not much that can be done about it. Well, except for you to not auto-start Boinc on startup (difficult as hell when it's a service), first check whether the prev file is still there, then rename or copy that one to the client_state.xml file. And only then start up Boinc.
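
A rough sketch of that manual check in Python, for anyone who wants to script it. The two file names are the ones mentioned above; everything else is an assumption made for illustration: the helper names, the "ends with </client_state>" completeness test, and the client_state.damaged.xml name for the set-aside copy. It is not how Boinc itself works, and it should only be run while the client is stopped.

```python
import shutil
from pathlib import Path

# Assumption: a completely written state file ends with this closing tag.
END_MARKER = "</client_state>"


def looks_complete(path: Path, end_marker: str = END_MARKER) -> bool:
    """Return True if the file exists, is non-empty and ends with end_marker."""
    if not path.is_file():
        return False
    text = path.read_text(encoding="utf-8", errors="replace").rstrip()
    return bool(text) and text.endswith(end_marker)


def restore_state_if_damaged(data_dir: Path) -> str:
    """Copy client_state_prev.xml over a damaged client_state.xml.

    Returns "ok", "restored" or "no usable backup".
    """
    state = data_dir / "client_state.xml"
    prev = data_dir / "client_state_prev.xml"

    if looks_complete(state):
        return "ok"                      # nothing to do
    if not looks_complete(prev):
        return "no usable backup"        # the prev file is missing or damaged too

    if state.exists():
        # Keep the damaged file for inspection (hypothetical file name).
        shutil.copy2(state, data_dir / "client_state.damaged.xml")
    shutil.copy2(prev, state)
    return "restored"
```

Calling restore_state_if_damaged(Path(...)) with the Boinc data directory before starting the client would cover the case described above, as long as the prev file has not already been overwritten.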

@Fred: The Seti project is still down, so no one can get any work.
As for losing work, it sucks, but sometimes you can't help it.
ID: 159465
Steven Douglas Huddleston
Joined: 25 Jun 99
Posts: 7
Credit: 1,271,414
RAC: 0
Puerto Rico
Message 159645 - Posted: 30 Aug 2005, 0:16:08 UTC - in response to Message 159465.  
Last modified: 30 Aug 2005, 0:19:23 UTC

Mostly, people respond to their OS crashing with "Boinc is shit". It's good to read the other side of the story for once. :)


Ahh...yes! Well, most people today are too young to remember an age where "instant gratification" was not in the lexicon. I'm a career coder, so I know what the poor guys at SETI & BOINC must be going through.

A backup option for the client_state.xml, now there's a good one. I'll be asking for this as well. Although I am not quite sure how they would implement this prior to a crash, since crashes never make themselves known. ;)


There is an even better idea! Let's get a discussion started on this, see what ideas may pop up. Who knows what un-harnessed resources are waiting to be tapped in that most-ignored-of-the-distributed-computing-peripherals: The User!

That said, a backup of the client_state.xml file is made... each time the file is written: client_state_prev.xml


Practically useless, unless you discover something is terribly wrong and stop everything before BOINC restarts (and makes a new copy of the already modified client_state). This is like the "last known good configuration" in Windows. If you realize something went wrong before you log on, then you are saved. The problem is you don't usually know something is wrong until after you log on.

...for you to not auto start Boinc on startup (difficult as hell when it's a service), and first check if the prev file is still there, then rename or copy that one to the client_state.xml file. And only then start up Boinc.


Hmmm...how about this: Why delete finished work units at all during a reset? Why not just reset the units in progress (and whatever is left in the queue, if you like), but leave the finished work intact?

As for a backup strategy, it should not be too hard to add a configuration option that copies all the critical files to a backup directory (say, "C:/.../BOINC/BACKUP") at a user-configurable interval (every 10 minutes, every hour, whatever!). A "RESET" could then just return to that backup state, and at least not everything would be lost. Two redundant backup file sets would eliminate the problem altogether, since a reboot that happens during the backup-write process would leave the second set intact. If your interval is set to "every hour", then the most you'd lose in the worst-case scenario is two hours of work!!!

I think this is one update that would be greatly appreciated (returning to your comment on user judgments) by all but the most hard-core, die-hard misanthropes out there.

Wouldn't that help to reduce the "BOINC-flame-queues" considerably? I betcha that's bogging down the servers some. :D
Mafú
"Experience is the ocean you cross to get from knowledge to truth."
The Kongaloid Website
ID: 159645
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 159652 - Posted: 30 Aug 2005, 0:53:10 UTC

You should post that over on the Boinc forums, you know. :)
ID: 159652
Steven Douglas Huddleston
Joined: 25 Jun 99
Posts: 7
Credit: 1,271,414
RAC: 0
Puerto Rico
Message 159684 - Posted: 30 Aug 2005, 2:18:43 UTC - in response to Message 159652.  

You should post that over on the Boinc forums, you know. :)


Now why didn't I think of that? (It must be my grief-stricken state.)

The deed is done! Thanks for the "wake-up-slap".
Mafú
"Experience is the ocean you cross to get from knowledge to truth."
The Kongaloid Website
ID: 159684