Rom, Mikey2345 is not the only one...

.
Volunteer tester

Joined: 3 Apr 99
Posts: 410
Credit: 16,559
RAC: 0
Message 52559 - Posted: 10 Dec 2004, 4:49:51 UTC
Last modified: 10 Dec 2004, 9:38:16 UTC

> The issue causing the host to be recreated with every RPC has been identified
> and a fix has been checked in.
>
> Basically the problem was that BOINC couldn't write to the state file and so
> every time BOINC was restarted on that machine, it would create a new host.
>
> This mostly happened in schools where the administrator/teacher set up the
> software and then the students, with less-privileged accounts, started up
> the software but couldn't modify the state files or store the workunits.
>
> For the short term we are just going to have BOINC shut itself down when it
> detects this condition; eventually we'll have a smart enough setup to allow
> the person setting up the software to configure BOINC to run as a service.
>
> Mikey2345 is currently removing the BOINC software from the school's computers
> until the bug fix is deployed. The bug will be fixed in the next public
> release of BOINC.
>
>
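The short-term fix described in the quote (shut down when the state file cannot be written) could look roughly like the sketch below. This is only an illustration in Python, not the actual BOINC client code; the function names and the idea of probing the data directory with a temporary file are assumptions.

```python
# Hypothetical sketch, not BOINC's real implementation: probe the data
# directory at startup and refuse to run if it is not writable.
import sys
import tempfile


def can_write_state(data_dir: str) -> bool:
    """Return True if a file can be created and written in data_dir."""
    try:
        # Creating a real file is more reliable than checking permission
        # bits; the probe file is deleted when the block exits.
        with tempfile.NamedTemporaryFile(dir=data_dir) as probe:
            probe.write(b"probe")
            probe.flush()
        return True
    except OSError:
        return False


def startup_check(data_dir: str) -> int:
    """Return 0 if the client may continue, 1 if it should shut down."""
    if not can_write_state(data_dir):
        print(f"cannot write to {data_dir}; shutting down", file=sys.stderr)
        return 1
    return 0
```

A client would call `startup_check()` on its data directory before contacting the scheduler, so a restricted account never gets far enough to register as a new host.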
Hi Rom,

Mikey2345 is not the only one causing this problem! When I look through my pending WUs, many of them are waiting for results from so-called "New" hosts, and when I look at some of those, it seems there are a lot of Mikeys out there!

For example, look at this:
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=4842014
:-(

I have actually wondered whether I should bother crunching more WUs until the problem of somebody's machines downloading a vast number of WUs onto replicated hosts has been solved. My own machines hold only 1-2 WUs at a time and have not had any problems downloading or uploading for a long time, so....

Sincerely Lena
ID: 52559
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 52616 - Posted: 10 Dec 2004, 12:36:35 UTC - in response to Message 52559.  
Last modified: 10 Dec 2004, 12:37:46 UTC

Post removed by author. I posted the wrong thing, sorry.

ID: 52616
Profile PT

Joined: 19 May 99
Posts: 231
Credit: 902,910
RAC: 0
United Kingdom
Message 52619 - Posted: 10 Dec 2004, 13:02:25 UTC - in response to Message 52559.  
Last modified: 10 Dec 2004, 13:02:41 UTC

Dear Lena,
I’ve seen some hair-raising examples as well, and obviously a bug is involved in some of the cases. To this we have to add other situations where WUs get lost: my hard disk crashed once and all my WUs disappeared into cyberspace. Another time I made a severe mistake and, once again, my WUs joined my earlier WUs in that famous space! I guess I’m not the only one who has lost some, so this will always happen to all of us!
I have had WUs pending since July, so bad things do happen.

Happy crunching
ID: 52619
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 52645 - Posted: 10 Dec 2004, 16:09:56 UTC

I just took a look at the WU: one of the computers was created in June and the other in September, so any problem there is something completely different from Mikey2345 creating a "new" computer all the time.

As for WUs never being returned, I can't rule out a bug somewhere, but the most likely reasons are users resetting, using too large a cache so the work passes the deadline, or a crash losing the work. Connection problems can also mean the host never got the info about new work, or never managed to upload and report the result.

None of these is completely fixable, so some WUs will always be re-sent to get enough results.
ID: 52645
Profile Benher
Volunteer developer
Volunteer tester

Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 52653 - Posted: 10 Dec 2004, 16:55:34 UTC

On the plus side...

When the new "Average turnaround time" value and prioritization of WUs are implemented...any WU past deadline will be sent to a host machine with one day turnaround.


ID: 52653
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 52657 - Posted: 10 Dec 2004, 17:41:09 UTC - in response to Message 52653.  

> On the plus side...
>
> When the new "Average turnaround time" value and prioritization of WUs are
> implemented...any WU past deadline will be sent to a host machine with one day
> turnaround.
>
Does this also mean that the cache size will be automatically adjusted for people who have it set too high? It seems silly to perpetuate the situation by allowing people to set whatever cache size they like and get so many units that they cannot possibly return them within the 2 weeks allotted.
Maybe a small routine could catch people downloading like Mikey2345, check their average return time, and make sure that they do not get any more units than they can handle. It could be a simple report, because people getting more units could be like me: I have just added several more machines to the "farm" I have doing BOINC, having moved them over from Classic. That way the report could be viewed by a human being and decided on a case-by-case basis. To a computer it would look as if I had suddenly asked for a bunch more units and could not possibly return them within 2 weeks, when in fact, with the new machines on the account, I can.
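
The kind of report suggested above could be sketched like this. It is a rough illustration only: the `Host` fields, the function names, and the per-host time estimate are assumptions, not anything taken from the BOINC scheduler. The 14-day limit is the 2-week deadline mentioned in this post.

```python
# Hypothetical sketch of a "too much work queued" report for human review.
# Field names and the simple time estimate are assumptions for illustration.
from dataclasses import dataclass

DEADLINE_DAYS = 14.0  # the 2-week return limit mentioned in the post


@dataclass
class Host:
    name: str
    queued_units: int     # workunits currently held by this host
    days_per_unit: float  # average crunch time per unit, in days
    cpus: int = 1         # units the host can crunch in parallel


def days_to_clear_queue(host: Host) -> float:
    """Estimate how long the host needs to finish everything it holds."""
    return host.queued_units * host.days_per_unit / host.cpus


def flag_for_review(hosts: list[Host]) -> list[Host]:
    """Return hosts whose queue cannot be finished before the deadline."""
    return [h for h in hosts if days_to_clear_queue(h) > DEADLINE_DAYS]


farm = [
    Host("hoarder", queued_units=100, days_per_unit=0.5),
    Host("new-farm-box", queued_units=4, days_per_unit=0.5, cpus=2),
]
for host in flag_for_review(farm):
    print(f"review: {host.name} needs {days_to_clear_queue(host):.1f} days")
```

Because the estimate is made per host rather than per account, adding new machines to a farm would not by itself trigger a flag; only a single machine holding more than it can finish would show up in the report.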

ID: 52657
Profile Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 52680 - Posted: 10 Dec 2004, 22:09:06 UTC

Right now this really only delays the awarding of credit, unless all the other hosts are suffering from this bug. That would mean the same workunit would have to be sent to four different hosts suffering from the same bug. Numerically speaking, it isn't very likely.
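
To put rough numbers on "not very likely", here is a minimal sketch. It assumes the four copies of a workunit go to independently chosen hosts and that some share of results is sent to affected hosts; the shares used below are made-up illustrations, not measured project figures.

```python
# Illustrative arithmetic only: the affected shares are assumptions,
# and real work distribution may not be independent between copies.

def prob_all_copies_lost(affected_share: float, copies: int = 4) -> float:
    """Chance that every copy of a workunit lands on an affected host."""
    if not 0.0 <= affected_share <= 1.0:
        raise ValueError("affected_share must be between 0 and 1")
    return affected_share ** copies


for share in (0.01, 0.05, 0.10):
    print(f"{share:.0%} of results affected -> "
          f"{prob_all_copies_lost(share):.6%} of workunits lose all four")
```

Even if one result in ten went to an affected host, under these assumptions only about one workunit in ten thousand would lose all four copies.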

I'm not trying to make light of this bug, but please understand, the dev team would have to roll back the Alpha group from testing the 4.5x branch to take this fix for the public release right now. That in turn could delay the public launch of E@H by a few weeks.

We are a few weeks away from the E@H launch and the 4.5x/4.6x rollout. The 4.5x/4.6x branch has a fix for this bug, plus the graphical UI for Mac, Linux, and Solaris, plus graphical science application support for those platforms. This will bring all platforms into feature parity with each other. It's a big release.

On the Windows side, there is a new installer which can be configured for mass deployments, configuring Windows to run BOINC as a service, and a few other nifty features.

Trying to fix this now would be costly for all the projects.

I hope you all understand, we are trying to get this done as quickly as possible.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 52680
Profile Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 52681 - Posted: 10 Dec 2004, 22:15:23 UTC - in response to Message 52680.  

Thanks for the update and clarifications, Rom.

I am very excited about all the new changes coming
and wish you all great success and best wishes for the
holidays....



Timmy
ID: 52681
Profile Contact
Volunteer tester
Joined: 16 Jan 00
Posts: 195
Credit: 2,249,004
RAC: 0
Canada
Message 52701 - Posted: 11 Dec 2004, 0:41:16 UTC - in response to Message 52680.  

Rom, thanks for this info. Every sentence gold.
Yes. Big release. Continue. Don't rush.
Timmy, thanks for pushing this issue. That’s entertainment!
Merry Christmas to all.
A few weeks away from E@H eh? Cool...

ID: 52701

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.