BOINC going berserk.

Profile Rudolfensis

Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216632 - Posted: 17 Dec 2005, 19:04:19 UTC
Last modified: 17 Dec 2005, 19:06:13 UTC

Thought this would be the best place for it, since what happened isn't too specific, nor do I know why it happened in the first place.

I had problems with BOINC communications, the old problem where it won't connect to the manager. Not much of a problem, I just start the manager manually and then I start BOINC until, after quite a few boots, by magic it connects again.

I've uninstalled Norton Antivirus 2005 this morning, the only really big change since yesterday in the registry. I started BOINC this afternoon, the whole thing went nuts, I had no settings, no computer ID, no projects, nothing, nada.

I checked the directories, I could see all units and all project directories. Slowly, BOINC went through the different connections and updated the projects. Things are back to normal, even though the little bugger created a new computer ID, which I had to merge with my previous one, which I would have preferred to keep, but heck!

Now... only one problem remains. There are units from the old ID which seem to be invisible to BOINC. 3 units, to be more precise:

ID 41512302
ID 41195501
ID 40996345
Dated 16, 15 and 14 December.

How in the world can I force BOINC to see these units? They are there, in the directory:

28se04ab.13834.17920.665908.245
23no04ab.2166.10001.11064.11
29se04ab.12732.11344.272152.1
All of them are in the 353 KB range, thus valid units.

I've restarted BOINC multiple times, to no avail. Should I detach? Back up the directory first, copy the units back and reattach?

I'm thinking that the uninstallation of NAV2005 might have triggered a small mishap in the registry or open tasks. The change of computer ID may be the problem and may not be the problem, I can't even tell.
ID: 216632
Profile Mr.Pernod
Volunteer tester
Joined: 8 Feb 04
Posts: 350
Credit: 1,015,988
RAC: 0
Netherlands
Message 216637 - Posted: 17 Dec 2005, 19:11:14 UTC

sounds like your client_state.xml got corrupted somehow.
if you are able to somehow re-insert the file information on those three results into your current client_state.xml, you can crunch them, but I don't think it's worth the risk to the new work you have on your computer.
if those three units were "ready to run", just let them time out and they will be resent to someone to be crunched at a later date.
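
Before editing anything by hand, it may help to confirm which files on disk the state file no longer mentions. The following is a minimal sketch, not BOINC's own logic: it assumes that client_state.xml records each file name inside a `<name>` element, and the function name `orphaned_files` is made up for this illustration. It uses a plain text search rather than an XML parser so that it still works on a partly damaged file.

```python
import re
from pathlib import Path


def orphaned_files(state_xml_text, project_dir):
    """Return names of files in project_dir that the state file never mentions.

    Assumption: file names appear as <name>...</name> values in the state file.
    """
    # Collect every <name> value, whatever element it sits in.
    known = set(re.findall(r"<name>\s*([^<]+?)\s*</name>", state_xml_text))
    return sorted(
        p.name
        for p in Path(project_dir).iterdir()
        if p.is_file() and p.name not in known
    )
```

Run against the real state file and project directory, any workunit file that shows up in the returned list is present on disk but unknown to the client.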
ID: 216637
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216651 - Posted: 17 Dec 2005, 19:28:19 UTC - in response to Message 216637.  

Very good suggestion there. I did look into the XML, but I think it may not be possible: I saw that there is an XML signature for each unit, and I don't think I could get past that one. I don't want to let them time out, this would neg me, I do want to force the units to be crunched as one of the units was progressing to around 75%. There's got to be another way to make BOINC see these units. I will try a very low-level recovery of the XML itself; if I do get it, I'd have the XML signatures, which would allow me to try and paste them inside the new one.
ID: 216651
Profile MJKelleher
Volunteer tester
Joined: 1 Jul 99
Posts: 2048
Credit: 1,575,401
RAC: 0
United States
Message 216665 - Posted: 17 Dec 2005, 19:37:13 UTC - in response to Message 216651.  

I don't want to let them time out, this would neg me,

If that's a concern, don't worry. For each of the results you lose, your quota will go down by one. For every successful result you send, your quota will double up to the max of 100/cpu. You'd never notice.
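
The quota rule described above can be sketched in a few lines. This is only an illustration of the arithmetic as stated in this post, not the server's actual code; the function name `update_quota` and the floor of 1 are assumptions.

```python
MAX_QUOTA_PER_CPU = 100  # maximum stated in the post


def update_quota(quota, succeeded, max_quota=MAX_QUOTA_PER_CPU):
    """Apply one returned or lost result to a host's daily quota."""
    if succeeded:
        # A successful result doubles the quota, capped at the maximum.
        return min(quota * 2, max_quota)
    # A lost result lowers the quota by one (floor of 1 is an assumption).
    return max(quota - 1, 1)


# Three lost results followed by a single success.
quota = 100
for ok in (False, False, False, True):
    quota = update_quota(quota, ok)
```

Starting from the maximum, three lost results bring the quota to 97, and the next successful result puts it straight back at 100.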

If you do want to go ahead and try a repair of the xml file, my only suggestion would be the obvious of making a backup before you start to work with it, so you can restore a working file if it doesn't work. Good luck!
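
One way to take that backup is a timestamped copy next to the original, made while BOINC is not running. A minimal sketch using only the Python standard library; the helper name `backup_file` and the `.bak` naming scheme are made up for this example.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy a file to '<name>.<timestamp>.bak' beside it and return the new path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

Calling `backup_file("client_state.xml")` from inside the BOINC directory leaves the original untouched, so a failed edit can be undone by copying the backup back over it.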

MJ

ID: 216665
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216706 - Posted: 17 Dec 2005, 20:16:24 UTC - in response to Message 216665.  

If you do want to go ahead and try a repair of the xml file, my only suggestion would be the obvious of making a backup before you start to work with it, so you can restore a working file if it doesn't work. Good luck!
MJ


Right about the backup, always a -very- good suggestion. Well, it may seem hopeless, I did try two very powerful file recovery tools. Even the most powerful one couldn't recover the client_state files I was interested in, the ones from this morning. The strange thing is that there is a gap between Dec 9 and Dec 17, with nothing in between as far as recovering client_state or even its own backup, client_state_prev. I see nearly a thousand files there, as BOINC makes regular backups of that file. Why it wouldn't have made these backups during that time period is a mystery to me, since I'm quite sure it must have made them.

I was also concerned about delays: if I do let them time out, it will delay others from getting their credits. I mean, this is almost funny, the files are there, the units are there, and yet BOINC cannot see them.

ID: 216706
Profile MJKelleher
Volunteer tester
Joined: 1 Jul 99
Posts: 2048
Credit: 1,575,401
RAC: 0
United States
Message 216723 - Posted: 17 Dec 2005, 20:28:08 UTC - in response to Message 216706.  

I was also concerned about delays: if I do let them time out, it will delay others from getting their credits. I mean, this is almost funny, the files are there, the units are there, and yet BOINC cannot see them.

One of those results has already awarded credit to the other three hosts who got it. The others have two returns, and just need the third to come back. This kind of occurrence is one of the reasons that SETI does send four results out, so that if one has a problem, three will still be able to validate. Even if those thirds don't come back, the other hosts aren't being denied credit, it'll just take a little bit longer to happen.
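
The arithmetic behind that can be written out as a small sketch. The numbers (four copies sent, three needed to validate) come from this post; the function names are made up for illustration and this is not the project's validator.

```python
def results_still_needed(returned, quorum=3):
    """How many more results must come back before validation can happen."""
    return max(quorum - returned, 0)


def spare_results(sent=4, quorum=3):
    """How many of the copies sent out can go missing without blocking validation."""
    return sent - quorum
```

With two results returned, `results_still_needed(2)` gives 1, and `spare_results()` gives 1, which is the single lost copy each workunit can absorb.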

MJ

ID: 216723
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216773 - Posted: 17 Dec 2005, 21:15:21 UTC
Last modified: 17 Dec 2005, 21:27:58 UTC

Here's a small picture of both BOINC and the project directory.

Picture of BOINC Itself

and

Picture of project directory

You can see clearly the 14,15,16 ones.
ID: 216773
Profile Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 216777 - Posted: 17 Dec 2005, 21:17:22 UTC - in response to Message 216773.  

Here's a small picture of both BOINC and the project directory.



and



You can see clearly the 14,15,16 ones.



It sure looks like your data has been corrupted....
ID: 216777
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216784 - Posted: 17 Dec 2005, 21:22:49 UTC - in response to Message 216777.  

But that would be the case only for units which would be in progress. It couldn't be the case for a unit which wasn't touched yet.
ID: 216784
Profile MJKelleher
Volunteer tester
Joined: 1 Jul 99
Posts: 2048
Credit: 1,575,401
RAC: 0
United States
Message 216791 - Posted: 17 Dec 2005, 21:28:53 UTC - in response to Message 216784.  

But that would be the case only for units which would be in progress. It couldn't be the case for a unit which wasn't touched yet.

Your three hidden (to BOINC) results look ok, but you've got three others in the directory that will odds-on fail. They're the ones that don't have a file size of 361,xxx. There was a spate of bad results sent out from the server recently, they'll pass through the system quickly.
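
That size check is easy to script. A rough sketch, assuming "361,xxx" means a size between 361,000 and 361,999 bytes; the function name `suspicious_by_size` is made up, and it will also flag anything else in the directory that is not a workunit, such as the application executable.

```python
from pathlib import Path


def suspicious_by_size(directory, low=361_000, high=361_999):
    """Return names of files whose size in bytes falls outside [low, high]."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not (low <= p.stat().st_size <= high)
    )
```

Pointed at the project directory, the returned list should contain the short workunits that are likely to fail, plus any non-workunit files.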

MJ

ID: 216791
Profile Rush
Volunteer tester
Joined: 3 Apr 99
Posts: 3131
Credit: 302,569
RAC: 0
United Kingdom
Message 216800 - Posted: 17 Dec 2005, 21:37:02 UTC - in response to Message 216632.  
Last modified: 17 Dec 2005, 21:59:58 UTC

I've uninstalled Norton Antivirus 2005 this morning, the only really big change since yesterday in the registry. I started BOINC this afternoon, the whole thing went nuts, I had no settings, no computer ID, no projects, nothing, nada. >snip< I'm thinking that the uninstallation of NAV2005 might have triggered a small mishap in the registry or open tasks. The change of computer ID may be the problem and may not be the problem, I can't even tell.

Yvan, Norton is a complete bastard on your system. You might try the Symantec Norton Removal Tool.

Whether you use their own uninstall program or use Add/Remove Programs, Norton leaves crap all over your system and occasionally wreaks havoc. This tool may help, but keep in mind it will scour your system. After that, use RegCleaner or RegSupreme to get rid of whatever is left over.

This may help. 8^]

Edit=name correction. My apologies. 8^]

Cordially,
Rush

elrushbo2@theobviousgmail.com
Remove the obvious...
ID: 216800
Profile Lampros
Joined: 17 Jun 02
Posts: 279
Credit: 13,973,726
RAC: 0
Canada
Message 216811 - Posted: 17 Dec 2005, 21:45:07 UTC - in response to Message 216800.  


Yves, Norton is a complete bastard on your system.

How true that statement is!
I've used a number of Symantec's products and had all sorts of problems.
Now I won't touch any of them, no matter how good the reviews.

ID: 216811
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216813 - Posted: 17 Dec 2005, 21:45:25 UTC - in response to Message 216800.  
Last modified: 17 Dec 2005, 21:47:40 UTC


Yves, Norton is a complete bastard on your system. You might try the Symantec Norton Removal Tool.


Yvan actually :) This is way funny since this is exactly what I was looking for. As I was browsing the directories, my firewall sprung up and told me that LiveUpdate wanted to update. Funny since I had uninstalled NAV2005 two hours ago! But, to be fair, there is no proof that NAV caused this. I actually don't have a clue as to what might have happened here. It's not that much of a big deal when you think about it, no stats lost. But still, I'm annoyed when my computer doesn't do exactly as it is told. We have a great relationship going on, I am Ze Master, it is ze slave. Well... it's not always the case ;)

Kind of ironic to use a Symantec product to remove another Symantec product. But this is digressing much from my original topic :)
ID: 216813
Profile Rudolfensis
Joined: 20 Nov 99
Posts: 60
Credit: 427,273
RAC: 0
Papua New Guinea
Message 216819 - Posted: 17 Dec 2005, 21:49:18 UTC - in response to Message 216791.  

They're the ones that don't have a file size of 361,xxx. There was a spate of bad results sent out from the server recently, they'll pass through the system quickly.

MJ


Yeah, saw those. And I did suspect they wouldn't go far, I do get those from time to time. I believe the server can't do much but pass them along.

ID: 216819


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.