BOINC needs an overhaul

Profile hiamps
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 975048 - Posted: 1 Mar 2010, 16:23:01 UTC - in response to Message 974932.  

BUG REPORT

BOINC uses Alpha, a two-way e-mail list, to report bugs.
1) Alpha is a horrible name for this error reporting list: the user knows he isn't running Alpha or Beta software. He is running release software and will continue to look in vain for a place to report errors or bugs in release software.
2) Users should not have to have their e-mail boxes stuffed with other error reports to report a bug. They are not developers!
3) Users should not have to be familiar with arcane log switches and know how to edit config files to report bugs.

END BUG REPORT

SUGGESTION

Place a "Report a bug" item in a menu that automatically generates the necessary items for the developers and sends the report, along with user contact information in case more detail is needed to find the bug.

END SUGGESTION
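
To make the suggestion concrete, here is a minimal sketch of what such a menu item might assemble behind the scenes. It is only an illustration: `build_bug_report`, the field names, and the idea of handing it the recent log lines are all assumptions made for this example, not an existing BOINC feature or API.

```python
import json
import platform
from datetime import datetime, timezone


def build_bug_report(description, contact, log_lines, max_log_lines=200):
    """Assemble a bug report as a JSON string.

    Hypothetical sketch: the field names and the idea of passing in
    recent log lines are assumptions, not an existing BOINC feature.
    """
    lines = list(log_lines)
    # Keep only the most recent lines to bound the report size.
    recent = lines[-max_log_lines:] if max_log_lines > 0 else []
    report = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "contact": contact,
        # Gathered automatically so the user never touches a config file.
        "system": {
            "os": platform.system(),
            "os_release": platform.release(),
            "machine": platform.machine(),
        },
        "recent_log": recent,
    }
    return json.dumps(report, indent=2)
```

The user would only type the description; everything else is collected for them, which is the point of the suggestion.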

Stop forcing the users to be developers, give them the respect to be just a user.

Effective interfaces do not concern the user with the inner workings of the system. That goes for error reporting too!

Thank you Gary well put!
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 975048 · Report as offensive
Profile Odan

Send message
Joined: 8 May 03
Posts: 91
Credit: 15,331,177
RAC: 0
United Kingdom
Message 975050 - Posted: 1 Mar 2010, 16:25:42 UTC - in response to Message 975039.  

Thank you Richard for your rational discussion of the matter.
Some of us apply more emotion than logic to the situation, and then things degrade to personal attacks when others do not agree with our thoughts or opinions.
Which does little to advance any resolution to the subject at hand.



I think someone has been at the catnip :)
ID: 975050 · Report as offensive
Profile hiamps
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 975052 - Posted: 1 Mar 2010, 16:29:15 UTC - in response to Message 974944.  

SNIP

I would suggest that the 2*cpu's rule be rejected on these grounds.

Sorry, you have not made your case.

1) Multi projects are encouraged. So if you cannot get work from one, you could get work from another.

2) There are real cases where a limit is required as explained.

If you can come up with a better limit, propose it and how to calculate it.
I am so tired of people in this Seti forum trying to get me to run other projects. I signed on to Seti in the Beginning and that is what I want to run. I never liked the idea of Boinc and that was why I left for a while. If I didn't run seti I would probably Fold.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 975052 · Report as offensive
Profile hiamps
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 975056 - Posted: 1 Mar 2010, 16:36:37 UTC - in response to Message 974961.  

Well, by my reckoning, out of the 22 or so people who have expressed their opinion about 15 would like to see an overhaul... 70% or so.

The whole voting process itself is fundamentally flawed.

Winterknight wrote:

"Well I for one think having to keep a computer on waiting for uploads to finish before it can get any downloads is STUPID."


... and he's voting for a ground-up rewrite because surely this broken feature would not be included.

Hiamps says:
Quick as possible for WHO? The backoff made mine go into 24 hr wait even after things were going again. Was not good for the project either if machines can't start working again. Plus boinc can't handle large loads which are growing with every Nvidia card added.

... and he'd vote for the re-write because it fixes this problem.

... as would msattler because it prevents him from getting his credit fix.

Someone else says "BOINC doesn't cache enough, by default, to work through the weekly outage -- that can't be right!" and guess what, if you're harnessing a waste product (idle clock cycles on computers that would be turned on anyway) then you're wasting a waste product, and it is not the end of the world.

Just for fun, let's define "code bloat" -- this is the list of all "features" that, in the sole opinion of one individual, are useless or actually do damage.

Get a complete list of "code bloat" from everyone here and the entire BOINC code base is gone.

I won't catalog the rest because they're all pet-peeves, and the "voters" are certain that the brain damaged behaviour they see would surely not be intentionally reinstated as part of a ground-up re-write.

The idea of voting is flawed because the people doing the voting can agree by voting that they want "better" without exploring the fact that each voter has a different (perhaps radically different) idea of better.

Fifteen votes, and fifteen different ideas of what an improved "overhauled" BOINC would do.

(Hell, I'll vote "yes" for better, just be aware of the fact that by "better" I'm pretty much rejecting every stupid idea suggested in this forum -- by which I mean nearly all of them.)

Here is an idea on how to "fix" BOINC.

Take the whole credit system and hide it. Call it canonical credit, and don't publish it anywhere. Ever.

Take the canonical credit granted over the past 30 days for all projects, and figure out how much credit was granted per second for each project over the prior 2,592,000 seconds, and grant that to each project for each CPU second BOINC uses. If all projects are out of work, grant credit anyway.

If credit is over-granted compared to canonical credit, the credit rate will adjust downward some.

Now, no one cares about uploads, and there are no "pending" credits. Credit simply rises like clockwork.

Hide all work units. Remove all of the client-side logs. Encrypt all of the data files (work units, client state, etc.).

Replace the "update project" button with a code segment that reports "your update is complete" and nothing else. Maybe a delay so it is believable, or give some bonus cobblestones -- they're meaningless anyway.

This way, everyone sees things moving in a nice smooth positive direction, and nobody can see what happens.

There. Fixed.

... and the developers are free to work through the actual issues without being molested by crazy people who buy computers and throw good money to the power companies just to amass valueless credits.

As far as I can tell, just about everything else in the forums (with the possible exception of the raccoon and TLTPW threads in the Cafe) is pure whining from people who can't be bothered to ask "why" before they start graphically demonstrating their ignorance.

Boy, how about all the times I have 2 units running on one core? How about the way BOINC gets confused when the cache is not really that big? Buy yourself a CUDA card or two (which of course BOINC isn't designed for) and watch how fast BOINC gets totally confused. How many other problems have I posted in the past, Ned? It is a whole lot more than the 24-hour backoff. Wonder why I have a need to push the buttons all the darn time? If it were working like it should, I wouldn't need the buttons. Personally, I saw this thread as a way for some users to let off steam after the recent crash, but it seems the ones against Luke's idea are making it a bigger issue.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 975056 · Report as offensive
DJStarfox

Send message
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 975067 - Posted: 1 Mar 2010, 17:28:36 UTC - in response to Message 974642.  

I have a few comments about various posters' comments.

Comparing it to a commercial business, isn't it good customer service to reply or answer your calls and emails so the customer knows they have been acknowledged?


Comparing a nil-funded, open source project with a commercial business is an entirely unfair comparison. These are volunteers who contribute their time on the side of their real jobs.

Disaster???
The project wide backoff is a disaster.


If you've studied queuing theory, then you'll know that there is some logic to having an exponential backoff when communication fails. Certainly, the logic needs a good analysis, but there are mathematical ways to calculate an appropriate value without BOINC waiting 23 hours. Did the BOINC developers do the math before programming those values? I don't know.
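
For anyone who has not met the idea, a capped exponential backoff with random jitter can be sketched like this. It is an illustration only, not BOINC's actual code: the name `backoff_delay`, the 60-second base, and the 4-hour cap are assumptions picked for the example.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a retry delay in seconds after `failures` consecutive failures.

    Hypothetical sketch: base, cap and the jitter scheme are assumptions,
    not values taken from the BOINC client.
    """
    if failures < 1:
        return 0.0
    # Limit the exponent so very long failure streaks cannot overflow.
    exponent = min(failures - 1, 32)
    # Double the ceiling on each failure, but never exceed the cap.
    ceiling = min(cap, base * 2 ** exponent)
    # "Full jitter": pick uniformly in [0, ceiling] so that many clients
    # that failed at the same moment do not all retry at the same moment.
    return rng.uniform(0.0, ceiling)
```

With these assumed values the ceiling reaches the 4-hour cap on the ninth consecutive failure, so no client would ever wait anything like 23 hours. Whether a cap that low suits a project with hundreds of thousands of hosts is exactly the kind of question the math should answer.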

[BOINC Developers] need to train themselves... [for] multi-resource, multi-core, multi-threading, multi-application, multi-everything projects... [with] a thorough code review (not re-write), concentrating on areas where the entity-scopes are wrong.

Richard made a lot of good points, and I agree some quality control and a lot of beta testers would be very helpful for the BOINC core code. All of that is contingent on a code review and communicating a clear vision of what BOINC should and should not do.

As a side note, I believe the future of distributed grid computing lies in virtualization technology. A good start is the CernVM application. Without a clear vision, communicated to the developers, testers, and the public volunteers, this project will continue to struggle with acceptance and technical limitations.
ID: 975067 · Report as offensive
Luke
Volunteer developer
Avatar

Send message
Joined: 31 Dec 06
Posts: 2546
Credit: 817,560
RAC: 0
New Zealand
Message 975079 - Posted: 1 Mar 2010, 18:11:56 UTC - in response to Message 975014.  

Thank you Richard for taking a civil approach on the discussion, even if our views don't completely align.

Why can't all posters be like him?

One of the first rules of an argument is to attack only the argument, not the person stating it. That just shows weakness.
- Luke.
ID: 975079 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 975082 - Posted: 1 Mar 2010, 18:14:55 UTC - in response to Message 975045.  

Thank you Richard for your rational discussion of the matter.
Some of us apply more emotion than logic to the situation, and then things degrade to personal attacks when others do not agree with our thoughts or opinions.
Which does little to advance any resolution to the subject at hand.


OK, who hacked into SETI user accounts and stole Mark's ID?

I think it was some bad cats on a modem pool in the Ukraine.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 975082 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 975090 - Posted: 1 Mar 2010, 18:36:42 UTC - in response to Message 975079.  

Thank you Richard for taking a civil approach on the discussion, even if our views don't completely align.

Why can't all posters be like him?

One of the first rules of an argument is to attack only the argument, not the person stating it. That just shows weakness.

Exactly, Luke.

We should be able to 'argue' a bit without going down each other's throats.
I can be guilty at times myself.

A pity we can not all be so well mannered and insightful as Mr. Haselgrove.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 975090 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 975097 - Posted: 1 Mar 2010, 19:05:51 UTC - in response to Message 975079.  

Thank you Richard for taking a civil approach on the discussion, even if our views don't completely align.

Why can't all posters be like him?

One of the first rules of an argument is to attack only the argument, not the person stating it. That just shows weakness.

I disagree. The first rule of an argument is to win!

One of my favorite quotes.
"The worst part in an argument is when you realize you are wrong."
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 975097 · Report as offensive
Profile RFP
Avatar

Send message
Joined: 21 Jan 10
Posts: 44
Credit: 29,197
RAC: 0
United States
Message 975114 - Posted: 1 Mar 2010, 20:20:59 UTC

Sigh.

Whatever happened to "anything worth doing is worth doing correctly"? True, this is not a commercial endeavor, but habits established on this type of project tend to follow one into the commercial world. Is it any wonder that everyday devices such as cars, which use computer controls, go toes up? Again, back to the fact that this is not a commercial effort: there should not be any time-to-market (TTM) constraints or project managers looking over your shoulder going 'is it done yet?'. As so many have pointed out, this is done on a shoestring budget and uses volunteer developers and testers, so why not try for perfection?
ID: 975114 · Report as offensive
Profile Bill Walker
Avatar

Send message
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 975124 - Posted: 1 Mar 2010, 20:48:42 UTC - in response to Message 975114.  
Last modified: 1 Mar 2010, 20:50:07 UTC

... so why not try for perfection?


Can anyone prove they are not trying? As you said, shoestring budgets mean it could take a very long time to get there.

Any time you have finite resources, in a commercial project or otherwise, you have to prioritize those resources. Things like getting the next grant and doing the science are probably more important than keeping us credit hounds busy.

And before everybody jumps on me, I still haven't seen any proof that server outages, upload/download delays, credit issues, etc. affect the quality or the timeliness of the science. Sure, in general a delay in an upload might delay the science, but can you prove that is actually the limiting factor right now in SETI? Or in any other BOINC project?

I'm much more concerned about the lack of new data from the Telescope. How will quicker uploads or downloads help that?

ID: 975124 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30638
Credit: 53,134,872
RAC: 32
United States
Message 975151 - Posted: 1 Mar 2010, 22:02:22 UTC - in response to Message 975048.  

Effective interfaces do not concern the user with the inner workings of the system. That goes for error reporting too!

Thank you Gary well put!

Specific things can get fixed, general ill ease is a lot harder to deal with.

ID: 975151 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30638
Credit: 53,134,872
RAC: 32
United States
Message 975201 - Posted: 2 Mar 2010, 2:26:50 UTC

This is a BOINC part that needs a total re-design!

BUG REPORT

Scheduler Credit/Debit

Situation:
User attaches to LHC -- or any bursty project -- and sets his crunch percent high in the project because he wants to grab a work unit if one becomes available.

Short term this does not cause a big issue, or not an issue if the project has work.

Long term it is a big issue. Consider that two years down the pike ALL the other projects will be claiming they have over-crunched two years of data. Even if the user detaches from LHC, these two years of credit will still exist on the other projects. This can cause the scheduler to not fetch work units that would complete on time from a project with a short deadline, because the crunch time is owed to someone else who no longer exists. A denial of crunch time.

Possible Solution:

Write the credit/debit routine to be a debt driven one, not a credit driven one. That way when a user detaches from a project the rest of the projects will reset to normal expected values.

Better, use a double entry system of bookkeeping, where both the credit and debit are kept track of by each project as switching to a debt only driven system may have its own pitfalls. Accountants do double entry for a reason.

Additionally, don't accumulate debit/credit if the project is unable to supply any work units over a time frame. Perhaps if the project can't deliver a work unit in a week's time then further work unit requests that are not fulfilled will not incur debit/credit until the project sends a work unit. This would prevent dead projects from running up infinite balances. There may also be need for a sanity check on the total amount of debit/credit allowed, perhaps a month's maximum.

END BUG REPORT
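
The proposal above can be sketched in a few lines. This is a toy model under stated assumptions, not BOINC's real scheduler: the `DebtLedger` class, its method names, and the one-month `MAX_DEBT` cap are all hypothetical. It shows three of the ideas: projects without work do not accrue anything, balances are clamped, and detaching a project re-centres the remaining balances.

```python
MAX_DEBT = 30 * 24 * 3600.0  # assumed cap: one month, in seconds


class DebtLedger:
    """Hypothetical per-project debt ledger; not BOINC's real scheduler."""

    def __init__(self, shares):
        # shares: project name -> resource share (any positive weights)
        self.shares = dict(shares)
        self.debt = {name: 0.0 for name in shares}

    def account(self, elapsed, used, has_work):
        """Record one accounting period.

        elapsed:  wall-clock seconds in the period
        used:     project -> CPU seconds actually spent on it
        has_work: project -> True if the project could supply work
        """
        # Only projects that could supply work compete for the period.
        active = [p for p in self.shares if has_work.get(p, False)]
        total = sum(self.shares[p] for p in active)
        if total <= 0:
            return
        for p in active:
            owed = elapsed * self.shares[p] / total
            change = owed - used.get(p, 0.0)
            # Clamp so no project can run up an unbounded balance.
            self.debt[p] = max(-MAX_DEBT, min(MAX_DEBT, self.debt[p] + change))

    def detach(self, project):
        """Drop a project and re-centre the rest so debts sum to zero."""
        self.shares.pop(project, None)
        self.debt.pop(project, None)
        if self.debt:
            mean = sum(self.debt.values()) / len(self.debt)
            for p in self.debt:
                self.debt[p] -= mean
```

In this model a bursty project with a 90% share builds up no balance while it has nothing to send, and once it is detached the project left behind is back at zero instead of carrying two years of phantom over-crunching.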

ID: 975201 · Report as offensive
Profile Pappa
Volunteer tester
Avatar

Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 975209 - Posted: 2 Mar 2010, 2:56:06 UTC - in response to Message 975201.  
Last modified: 2 Mar 2010, 2:58:28 UTC

This is a BOINC part that needs a total re-design!

BUG REPORT

Scheduler Credit/Debit

Situation:
User attaches to LHC -- or any bursty project -- and sets his crunch percent high in the project because he wants to grab a work unit if one becomes available.

Short term this does not cause a big issue, or not an issue if the project has work.

Long term it is a big issue. Consider that two years down the pike ALL the other projects will be claiming they have over-crunched two years of data. Even if the user detaches from LHC, these two years of credit will still exist on the other projects. This can cause the scheduler to not fetch work units that would complete on time from a project with a short deadline, because the crunch time is owed to someone else who no longer exists. A denial of crunch time.

Possible Solution:

Write the credit/debit routine to be a debt driven one, not a credit driven one. That way when a user detaches from a project the rest of the projects will reset to normal expected values.

Better, use a double entry system of bookkeeping, where both the credit and debit are kept track of by each project as switching to a debt only driven system may have its own pitfalls. Accountants do double entry for a reason.

Additionally, don't accumulate debit/credit if the project is unable to supply any work units over a time frame. Perhaps if the project can't deliver a work unit in a week's time then further work unit requests that are not fulfilled will not incur debit/credit until the project sends a work unit. This would prevent dead projects from running up infinite balances. There may also be need for a sanity check on the total amount of debit/credit allowed, perhaps a month's maximum.

END BUG REPORT


Go sign up for the Boinc Alpha mailing list and file it! You are perfectly capable of doing that, you should not require a "middleman!"

Please bring the appropriate debug logs.

Regards

Edit:
You do not even have to be a Boinc Alpha tester, valid input is welcome.
Please consider a Donation to the Seti Project.

ID: 975209 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30638
Credit: 53,134,872
RAC: 32
United States
Message 975258 - Posted: 2 Mar 2010, 6:11:08 UTC - in response to Message 975209.  

You are perfectly capable of doing that, you should not require a "middleman!"

Try to understand this! It is the error, yes, a human factors one, which pervades BOINC and much open source software.

ID: 975258 · Report as offensive
Sirius B Project Donor
Volunteer tester
Avatar

Send message
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 975270 - Posted: 2 Mar 2010, 7:46:31 UTC - in response to Message 975258.  


Try to understand this! It is the error, yes, a human factors one, which pervades BOINC and much open source software.


It's not just open source software. IMV, it's down to programming. Look at Microsoft, NVidia etc etc.... As a system builder, I'm getting customers coming back complaining that their display is out of whack...why? simple....Windows Automatic Updates with updated display drivers amongst other problems.

On my own systems, I've had to return to 185.85 & 182.50 to keep my displays working as they should & keep NVidia updates unchecked.

As for Boinc, I no longer keep on top of the development cycle as 6.10.24 crashed several systems. Find 6.10.19 ok.

It would be nice for developers to return to structured programming.

Also, even though I'm not a programmer, with technology moving at its current pace, Boinc should be keeping pace as well.
ID: 975270 · Report as offensive
Lionel

Send message
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 975277 - Posted: 2 Mar 2010, 10:04:53 UTC


I agree...and it needs to be from the ground up...
ID: 975277 · Report as offensive
Profile The Gas Giant
Volunteer tester
Avatar

Send message
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 975434 - Posted: 3 Mar 2010, 6:19:12 UTC

All up I think BOINC works very well. I don't always like how it handles work fetch and how it works out what to crunch and what it preempts. I also don't like how it can lose all the stats if your computer freezes at the wrong point in time - which happens all too regularly with 'doze. I also don't like the way GPU jobs are scheduled (FIFO is just all wrong and needs to be fixed).

I'm no fan boy - but it has come a very long way since its inception and is many times better than the very early versions. And OMG look at the number of projects using it.

It looks like it's doing exactly what it was designed to do.

I also wish reporting bugs / interacting with the devs was easier.

YMMV
ID: 975434 · Report as offensive
Profile The Gas Giant
Volunteer tester
Avatar

Send message
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 975649 - Posted: 4 Mar 2010, 5:22:20 UTC
Last modified: 4 Mar 2010, 5:24:04 UTC

Speaking of not liking the work scheduler, see this link

PrimeGrid has gotten into deadline issues due to some other weird BOINC issue (I'll try and catch that one next) so now BOINC is going to crunch every wu it thinks is in deadline trouble until the sum of the estimated time to completion is OK again.

This means that each wu will remain in RAM (removing them is not the answer) and cause my machine to start paging to disk. UGH!
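
For readers wondering how a client could decide that a wu is "in deadline trouble", one simple way is to simulate the queue in earliest-deadline-first order and see which tasks would finish late. The sketch below is only an illustration of that idea on a single CPU; the function name and the tuple layout are assumptions, and it is not the simulation BOINC itself runs.

```python
def tasks_in_deadline_trouble(tasks, now=0.0):
    """Return names of tasks that would miss their deadline.

    tasks: list of (name, remaining_seconds, deadline) tuples.
    Hypothetical sketch: simulates running the tasks one at a time in
    earliest-deadline-first order on a single CPU.
    """
    clock = now
    late = []
    for name, remaining, deadline in sorted(tasks, key=lambda t: t[2]):
        clock += remaining  # this task finishes after everything before it
        if clock > deadline:
            late.append(name)
    return late
```

A check like this only says which tasks are at risk. It does not require starting all of them at once, which is what leaves every wu sitting in RAM.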
ID: 975649 · Report as offensive
Profile Kibble (KB7TIB)
Avatar

Send message
Joined: 6 Dec 99
Posts: 27
Credit: 10,121,469
RAC: 2
United States
Message 975677 - Posted: 4 Mar 2010, 8:18:31 UTC

@ WinterKnight:


There is a case for users to participate in more than one project, but it can never be forced and if a user wants to do only one project then as I see it this rule makes BOINC broken. Find a different solution that doesn't affect projects that don't need it.


Please try this point out at the LHC@home fora. LOL!

Regards

ID: 975677 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.