4.42 has been posted.

Message boards : Number crunching : 4.42 has been posted.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 112306 - Posted: 18 May 2005, 3:10:03 UTC - in response to Message 112300.  

At this time, the stable release is probably better for modem connections, or you can keep running 4.42 and reporting so the folks working on it can see how it looks from where you are.

-- Ned


Ned, I know it's not the best. I want them to be aware of issues so they can fix them. If I post problems here, and someone (or many someones) sees it and responds with similar issues, then it's more likely that Berkeley will realize there is a problem and FIX the problem. I don't like software telling me that I can only have less than a day's work and still share my CPU cycles with the projects the way I choose to do so.

Tony, here is the problem as I see it:

Everyone who is arguing that the new scheduler is badly broken is arguing that they don't see the "proper short-term mix" in their queue. They're right.

The problem is that under the old scheduler, you could easily pass deadlines, especially with projects like Einstein where they have a longish work unit and a short deadline.

People get mad when work is done, but they don't get credit because they missed a deadline.

So we have two incompatible goals: process everything in a strict order and strict proportion, but don't let work go over deadline.

The new scheduler does this by allowing one project to get priority as long as the deadlines are approaching fast, and then effectively locks that project out until the others are caught up.

Your dialup issue is a little different: BOINC doesn't download to fill the cache when it is worried about deadlines, and that probably still needs tuning.

If you get an official answer, it will probably be in the form of a newer release that works more smoothly in your environment.

-- Ned

P.S. With as much effort as folks put into BOINC, farms, etc., I wonder why the really dedicated crunchers don't use a cable/DSL router with "dial backup" and just live on the backup feature -- no DSL or cable at all.
ID: 112306

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 112307 - Posted: 18 May 2005, 3:11:09 UTC - in response to Message 112108.  

Okay, I found a quirk.

I have been running in deadline mode since 4:07 hours local time, the three Sshh units went out as promised and in time. The Lattice unit wasn't on my system anymore, since I have detached again, after updating to 4.42 last night.

Okay, the last of the Sshh units ran till 18:31:55 local time. And since that time I have been running Einstein@Home, while the preference change is every 90 minutes, but it's in the deadlock of the new thing: 17/05/2005 18:31:55||New CPU scheduler policy: highest debt first.

This way I can never download new work, as there will always be something that will keep my BOINC busy with a highest this or deadline that. What's next? That CPDN unit I got has highest STD and lowest LTD, so it will run to its end, for the next 4 weeks?

There is a CPU scheduler policy (nearest deadline or highest ST debt first) and there is also a download policy (allowed or not allowed) and they have distinct reasons.

Nearest deadline CPU mode occurs if:
1) A result is due within 24 hours.
or
2) A result is due within 2* the queue size.
or
3) Arrange the results in deadline order and start adding up the remaining processing times. If at any point the cumulative remaining processing time for the WUs with earlier deadlines is greater than 80% of the time to that deadline, the host is overcommitted.

No work is allowed if:
1) See #3 above.
2) Add the required fractions of time until the deadline of each WU. If the total is greater than 0.8, do not allow more work to be downloaded.
3) There are more than x (default 5) projects with work on the system. The default can be changed by editing the global_prefs.xml file.

Work will be requested in any case if:
1) A CPU is out of work.
2) Less than the min queue amount of work total for all projects is on the system.

Work will not be requested from a project if it has a negative LT debt unless a CPU is out of work. (This is the resource share balancing mechanism.)
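
The rules above can be sketched in Python roughly as follows. This is only an illustration of the rules as stated in this post, not the BOINC client code: the `Result` class, the function names and the use of hours as the unit are all hypothetical, and the way `should_request_work` combines the rules is one possible reading.

```python
from dataclasses import dataclass


@dataclass
class Result:
    project: str
    hours_to_deadline: float  # wall-clock hours until the deadline
    hours_remaining: float    # estimated processing hours still needed


def overcommitted(results, limit=0.8):
    """Rule 3: walk the results in deadline order, summing remaining work."""
    total = 0.0
    for r in sorted(results, key=lambda r: r.hours_to_deadline):
        total += r.hours_remaining
        if total > limit * r.hours_to_deadline:
            return True
    return False


def nearest_deadline_mode(results, min_queue_hours):
    """CPU policy: True means nearest deadline first, False means highest ST debt first."""
    due_within_a_day = any(r.hours_to_deadline <= 24 for r in results)
    due_within_two_queues = any(
        r.hours_to_deadline <= 2 * min_queue_hours for r in results
    )
    return due_within_a_day or due_within_two_queues or overcommitted(results)


def downloads_blocked(results, max_projects=5, limit=0.8):
    """Download policy: True means no new work is allowed."""
    if any(r.hours_to_deadline <= 0 for r in results):
        return True  # already past a deadline; avoids dividing by zero below
    fractions = sum(r.hours_remaining / r.hours_to_deadline for r in results)
    projects_with_work = {r.project for r in results}
    return (
        overcommitted(results, limit)
        or fractions > limit
        or len(projects_with_work) > max_projects
    )


def should_request_work(lt_debt, cpu_idle, queued_hours, min_queue_hours, blocked):
    """Per-project request decision (one interpretation of how the rules combine)."""
    if cpu_idle:
        return True   # an idle CPU overrides everything else
    if lt_debt < 0:
        return False  # resource share balancing: project has had more than its share
    if queued_hours < min_queue_hours:
        return True   # below the min queue: ask regardless of the download policy
    return not blocked
```

For example, a single result due in 7 days with 5 hours of work left puts the host in nearest deadline mode when the queue is set to 4 days (168 <= 2 * 96), but not when it is set to 1 day.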


BOINC WIKI
ID: 112307

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 112311 - Posted: 18 May 2005, 3:14:36 UTC - in response to Message 112281.  
Last modified: 18 May 2005, 3:20:14 UTC

John,

The LTD and STD amounts are fun enough and I hope you can teach everyone about them in due time, but please then take out the resource shares, as they are being overridden about 90% of the time anyway!

Or give us the old BOINC as an option, where we, the users, dictate what project should crunch first, by setting the resource shares and a considerable connect to server time.
ID: 112311

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 112314 - Posted: 18 May 2005, 3:17:06 UTC - in response to Message 112160.  

4.42 is now ignoring the resource share preferences set. If I want to run a project more than anything else, I get buffed out by the LTD and STD of other projects attached. I was to be running only Einstein, Seti & CPDN units and none of the project I have set to take most (as its LTD was -70,000 and a bit), so BOINC decides I can't download new units for this project...

So tell me why I am still setting a resource share between projects, when it will be ignored by the new BOINC anyway? Since the new BOINC runs on short term debts (STDs) and Long Term Debts (LTDs) only?

Even changing the resource setting to even higher, then manually updating the project, doesn't tell BOINC that the resource setting has changed.

The resource share is used in calculating the new debt from the old debt. If a project has a huge negative LT debt, it must have been doing some extra crunching recently (-70,000 seconds is a bit less than 20 hours).

debt += wall time * resource share - project CPU time.
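
A small worked sketch of that update, using the sign convention from the paragraph above (a project that has crunched more than its share ends up with negative debt). The function name and the normalisation of resource shares to fractions are assumptions made for illustration, not taken from the BOINC source.

```python
def update_debts(debts, shares, cpu_seconds, wall_seconds):
    """Return new debts after wall_seconds of wall time.

    debts:       {project: current debt in seconds}
    shares:      {project: resource share, any positive scale}
    cpu_seconds: {project: CPU seconds actually used in this period}
    """
    total_share = sum(shares.values())
    new_debts = {}
    for project, debt in debts.items():
        # Time the project was entitled to, according to its resource share.
        entitled = wall_seconds * shares[project] / total_share
        used = cpu_seconds.get(project, 0.0)
        new_debts[project] = debt + entitled - used
    return new_debts


# Two projects with equal shares; project A gets the CPU for 20 hours straight.
debts = update_debts(
    debts={"A": 0.0, "B": 0.0},
    shares={"A": 50, "B": 50},
    cpu_seconds={"A": 72000.0},
    wall_seconds=72000.0,
)
print(debts)  # {'A': -36000.0, 'B': 36000.0}
```

With a single CPU the debts still sum to zero afterwards: A owes 10 hours and B is owed 10 hours, which is what later keeps A from downloading more work until B has caught up.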


BOINC WIKI
ID: 112314

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 112317 - Posted: 18 May 2005, 3:22:37 UTC - in response to Message 112306.  

Your dialup issue is a little different: BOINC doesn't download to fill the cache when it is worried about deadlines, and that probably still needs tuning.

If you get an official answer, it will probably be in the form of a newer release that works more smoothly in your environment.


Ned, I agree with everything you say.

Except, the dial up issue needs more than a little tuning. I only had 3 minutes left of a wu that wasn't due for 12 more days, and it still wouldn't download new work. I don't think the scheduler was "worried about deadlines". If it was, they better get to work on the definition of "worried".

Unless they know of this problem, I don't see a fix making it into the next version. I won't fix the dishwasher unless my family tells me it's broke. They also need to fix the problem that lets PPAH and LHC continue to download even with HIGH negative LTD.
ID: 112317

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 112318 - Posted: 18 May 2005, 3:24:04 UTC - in response to Message 112314.  
Last modified: 18 May 2005, 3:25:09 UTC

If a project has a huge negative LT debt, it must have been doing some extra crunching recently (-70,000 seconds is a bit less than 20 hours).

Uhuh... 3 Sshh units of 9 hours each (on my PC), 2.5 of them that it had to run in the past 24 hours in deadline mode... Because BOINC doesn't follow separate preferences anymore. So it didn't run the units for 2 hours against 1 hour request for all the other projects.

But okay, blame me. Blame my setting for 0.2 days, so I get 3 units for Sshh and 2 units for every other project that has 8 hour units... I'll admit, it's me!

ID: 112318

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 112331 - Posted: 18 May 2005, 3:53:35 UTC

When a project is processing, its LT and ST debts will be reduced. When it is not processing, its LT and ST debts will be increased. ST debt is only calculated for projects that have work to be processed on the computer. When the host enters earliest deadline first mode, the LT and ST debts of the project running will be reduced, and all others will be increased. The time will be made up later by intentionally not allowing download of new work - so resource share balancing is deferred, but not dropped.

4.42 is supposed to fill the cache from any project with a positive LT debt until the cache is at or above min cache for all projects. The problem is if all projects with a positive debt are not handing out work.

Resetting a project will move its debt closer to 0. (It actually is set to 0, and then the LT debts are shifted so that the mean is 0 again - moving the debt for the just reset project away from 0).
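
The effect of a reset on the LT debts can be illustrated with a few lines of Python. This is a sketch of the two steps described above (zero the debt, then shift everything so the mean is 0 again); the function name and the example numbers are made up for illustration.

```python
def reset_project_debt(lt_debts, project):
    """Zero one project's LT debt, then re-centre all debts on a mean of 0."""
    debts = dict(lt_debts)  # work on a copy, leave the input untouched
    debts[project] = 0.0
    mean = sum(debts.values()) / len(debts)
    return {name: debt - mean for name, debt in debts.items()}


before = {"A": -70000.0, "B": 35000.0, "C": 35000.0}
after = reset_project_debt(before, "A")
for name, debt in after.items():
    print(name, round(debt, 2))
```

Here project A goes from -70,000 to 0 and is then shifted by the new mean (70,000 / 3), so it ends up at roughly -23,333: closer to 0 than before, but not exactly 0.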

As to the reason for going into earliest deadline first if a deadline is within 2 * the min queue: if a result is due within the current min queue time, it is going to be reported late. If it is within 2 * the min queue, it can be reported on time only if it is completed by the next connection.
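
That reasoning can be written down as a tiny classification, assuming the host connects once per min queue interval and, in the worst case, has only just connected. The function name and the returned labels are hypothetical.

```python
def reporting_outlook(hours_to_deadline, min_queue_hours):
    """Classify a result by how many connections fit in before its deadline."""
    if hours_to_deadline < min_queue_hours:
        # No connection is guaranteed before the deadline.
        return "late"
    if hours_to_deadline < 2 * min_queue_hours:
        # Exactly one connection is guaranteed before the deadline.
        return "must finish before next connection"
    # At least two connections fit in before the deadline.
    return "ok"


# With a 4 day (96 hour) queue, a result due in 7 days is already in the tight band.
print(reporting_outlook(168, 96))  # must finish before next connection
print(reporting_outlook(168, 24))  # ok
```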

The resource sharing will balance better if the min queue setting is about the same as the actual duration between connections.

The only time that a dial up user should have trouble is if all of the projects with positive debt are not handing out work at the moment. Otherwise, (s)he should be able to fill the cache from the projects with positive debt. I am still thinking about how to fix the problem with modem users and no positive debt projects with work without giving up on resource shares.


BOINC WIKI
ID: 112331

Contact
Volunteer tester
Joined: 16 Jan 00
Posts: 197
Credit: 2,249,004
RAC: 0
Canada
Message 112343 - Posted: 18 May 2005, 4:20:01 UTC - in response to Message 112331.  

I am still thinking about how to fix the problem with modem users and no positive debt projects with work without giving up on resource shares.

Well sir. Thanks for still thinking. Thanks for your previous thinking.
Your contribution to the intelligence of BOINC is large.
Thanks also for your forum input. Keep it up!

ID: 112343

The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 112363 - Posted: 18 May 2005, 5:47:46 UTC

Just a couple of comments about the posts below;
----------
Ned Ludd wrote:

"The problem is that under the old scheduler, you could easily pass deadlines, especially with projects like Einstein where they have a longish work unit and a short deadline."

I have never missed a deadline on any project (unless I have had a computer problem and it has been down for a period of time), so to me the old scheduler worked sufficiently well IF the cache size was kept to about a maximum of half the deadline time. So since I run Predictor (as well as LHC and SAH on this machine) I like to set my cache at 4 days. This also works well for weekend outages and since they overestimate their completion times I really only get 3 days of work anyway.

I run Einstein, LHC and Predictor on another machine. And yes with changing resource share between the projects I got close to a deadline with Einstein (really only because they underestimate the wu completion time and I had too much work - but it would have been fine if I hadn't altered the resource share).

-----------------------
John McLeod wrote:

"Nearest deadline CPU mode occurs if:
1) A result is due within 24 hours.
or
2) A result is due within 2* the queue size.
or
3) Arrange the results in deadline order and start adding up the remaining processing times. If at any point the cumulative remaining processing time for the WUs with earlier deadlines is greater than 80% of the time to that deadline, the host is overcommitted."

Now I see my specific problem: I set my cache size to 4 days. Since I run Predictor, it ends up triggering #2 and BOINC goes into deadline mode. This is specifically a problem with projects that have shorter deadlines.

So all up, if someone is attached to both a short deadline project and a longer deadline project, these constraints limit the cache size at which BOINC can still effectively apply the resource sharing the user has set up and keep a cache of WUs from each project. Otherwise it will always be in deadline mode and then LT debt mode, and the shorter deadline project will crunch first, then not get any more WUs until the LT debt of the longer deadline project has been satisfied.

Wow, convoluted and not very usable. Maybe #2 can be shortened to 1.1 to 1.5 times the queue length to make it more usable for now and fine tune it later.

Development, gotta love it!

Live long and crunch!

Paul.
ID: 112363

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 112369 - Posted: 18 May 2005, 6:18:15 UTC - in response to Message 112363.  
Last modified: 18 May 2005, 6:18:53 UTC

Just a couple of comments about the posts below;
----------
Ned Ludd wrote:

"The problem is that under the old scheduler, you could easily pass deadlines, especially with projects like Einstein where they have a longish work unit and a short deadline."

I have never missed a deadline on any project (unless I have had a computer problem and it has been down for a period of time), so to me the old scheduler worked sufficiently well IF the cache size was kept to about a maximum of half the deadline time. So since I run Predictor (as well as LHC and SAH on this machine) I like to set my cache at 4 days. This also works well for weekend outages and since they overestimate their completion times I really only get 3 days of work anyway.

Things would be a whole lot easier if everyone studied and set things up reasonably -- your settings are fine.

I remember a post (quite a while ago) where the user was having trouble getting and reporting work, so they kept raising what is now "connect every 'x' days" and I think according to his last post had it near 100 days.

That was about the time we got the 10 day limit.

It is entirely possible for someone to crunch Einstein and set BOINC to connect every 10 days. The new scheduler handles that.

If you crunch more than one project, I don't think there is much difference between connect every 4 days and connect every 1/4 days. The long-term-debt mechanism works well with a short cache.

ID: 112369

Daykay
Joined: 18 Dec 00
Posts: 647
Credit: 739,559
RAC: 0
Australia
Message 112379 - Posted: 18 May 2005, 8:05:29 UTC
Last modified: 18 May 2005, 8:10:00 UTC

I'd have to say with the new developments in scheduling, I have been lowering my "Connect every x days" preference in an attempt to avoid so much time spent in Panic mode.

At this time I have gone from connecting every 3 days down to 1 day. This has still failed to fix my problems with the new scheduler.

One of my projects (Shh) has 33% resource share, and the WUs have very short deadlines and each WU takes around 8 hours to complete. Thus I am often in Panic mode.

Since I have changed to a more frequent connection time BOINC holds fewer WUs in cache.

Pirates also has a 33% resource share and is frequently requesting more work (even though I am running in Panic mode).

CPDN, for which I now have 2 WUs, is the next most highly resourced project with a share of 10%. These WUs manage to get some CPU cycles after I have reported a batch of my Shh WUs. It might even be as much as 50% over the last 140 hours, though probably slightly less.

Einstein, LHC and SETI with resource shares of 7.67%, 7.33% and 8.33% respectively, have not downloaded any new WUs (as far as I can tell) during the last 140 hours (approx).

So to sum up it looks like Shh is getting slightly more than it should. CPDN is getting much more than it should. Einstein, LHC and SETI are getting ripped off. Pirates is behaving as I would like, though I'm not sure it should be asking for work while the client is in Panic mode.

(edit)
Oh and some further information, for the record, I have an always on adsl connection so I care not how often it connects to the servers.
Kolch - Crunching for the BOINC@Australia team since July 2004.
Search for your own intelligence...
ID: 112379

ksnash
Joined: 28 Nov 99
Posts: 402
Credit: 528,725
RAC: 0
United States
Message 112381 - Posted: 18 May 2005, 8:17:13 UTC

Well I guess I will be crunching on einstein until Setiathome goes into deadline mode. I have gained back the two days for einstein but nothing else is running.
ID: 112381

Anthony Brixey
Joined: 24 Jun 00
Posts: 102
Credit: 1,757,916
RAC: 0
United Kingdom
Message 112397 - Posted: 18 May 2005, 10:39:55 UTC - in response to Message 112331.  
Last modified: 18 May 2005, 10:40:16 UTC

"The resource sharing will balance better if the min queue setting is about the same as the actual duration between connections."

Boinc is trying to do two things with one number. It would be better to have a ‘Cache size’ as well as a ‘Connect every’ setting.

Anthony
ID: 112397

Martin P.
Joined: 19 May 99
Posts: 294
Credit: 27,230,961
RAC: 2
Austria
Message 112401 - Posted: 18 May 2005, 10:51:04 UTC - in response to Message 112194.  
Last modified: 18 May 2005, 11:26:20 UTC

I agree on all counts. Since 4.35 I've been asking (on the forums) if resource share was respected in "panic" mode, but have NEVER got an answer. I see Rom and JM7 respond to questions before and after mine. They leave mine alone. I'm wondering if it's because of the uproar the real answer would cause.
tony

Actually, Tony, you can answer this yourself by observation.

In panic mode, resource share is not respected -- it is after a "panic" to finish work that may run past deadline.

When the panic is over, the other projects (which got "locked out") have accumulated debt, and that debt gets paid back.

Resource shares are part of the "debt accumulation" so if a project gets a big boost from "panic mode" then BOINC won't even download another WU until the other projects get their share.


Ned,

but this is exactly the problem, especially with SETI@Home and Einstein@Home that have different deadlines (2 weeks vs. 1 week). My client finished a work unit for E@H correctly and also reported it. Then it started with SETI@Home which has a deadline of June 1st in "Panic-mode". With this scheduling scheme it will never again download any E@H WUs since the only WUs it has are SETI@Home and these are of course always earlier than none.


The Panic mode should only be activated if the deadline comes within twice the time estimated to finish a WU.


ID: 112401

Scottatron
Joined: 15 Jul 03
Posts: 94
Credit: 220,389
RAC: 0
Australia
Message 112404 - Posted: 18 May 2005, 11:53:33 UTC

Well, I have once again downgraded, due to a machine finishing its last PP@H WU and not requesting any more.

The machine had around 5 SETI WUs left, and on that machine enough to keep it busy for around 11 hours.

So, with all this talk of debt accumulating etc, how can debt not be at its maximum when the project runs out of work?
ID: 112404

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 112413 - Posted: 18 May 2005, 12:38:11 UTC - in response to Message 112404.  

Well, I have once again downgraded, due to a machine finishing its last PP@H WU and not requesting any more.

The machine had around 5 SETI WUs left, and on that machine enough to keep it busy for around 11 hours.

So, with all this talk of debt accumulating etc, how can debt not be at its maximum when the project runs out of work?

The instant a project runs out of work, it is likely to have near its minimum debt as it has just finished using the CPU. If it had a large positive LT debt to begin with, it may still have a positive LT debt. If, on the other hand, it had a low LT debt, the LT debt is now likely to be negative. Leave it alone, and it should recover.


BOINC WIKI
ID: 112413

MrMaxx
Joined: 22 Apr 99
Posts: 135
Credit: 1,645,913
RAC: 1
United States
Message 112425 - Posted: 18 May 2005, 13:52:02 UTC
Last modified: 18 May 2005, 13:54:55 UTC

I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release?
Thanks.
(later)
Just downloaded and upgraded to 4.42. It appears that this WAS a bug and that it was fixed in this version... Thanks to the programmers for fixing it! I needed the "no new work" feature, but also wanted to be able to report my completed WUs! :-)
ID: 112425

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 112432 - Posted: 18 May 2005, 14:03:50 UTC - in response to Message 112425.  

I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release?
Thanks.



Not a bug. Under 4.41 it wouldn't upload until 24 hrs before it's due.
I believe that is corrected in 4.42.

Same old disclaimer: you run development versions at your own risk, etc.

When you download the development versions, look at the list of new features and issues.

Always download the most current version before posting a problem.
If it is still a problem, then post the question.

Always do a clean install, i.e. under Windows, use Add and Remove Programs.
This will not affect the data files and you won't lose work.



ID: 112432

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 112433 - Posted: 18 May 2005, 14:10:49 UTC - in response to Message 112432.  

I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release?
Thanks.



Not a bug. Under 4.41 it wouldn't upload until 24 hrs before it's due.
I believe that is corrected in 4.42.

Same old disclaimer: you run development versions at your own risk, etc.

When you download the development versions, look at the list of new features and issues.

Always download the most current version before posting a problem.
If it is still a problem, then post the question.

Always do a clean install, i.e. under Windows, use Add and Remove Programs.
This will not affect the data files and you won't lose work.



It was a bug alright. Under the wrong circumstances, it would not contact a scheduler at all.


BOINC WIKI
ID: 112433

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 112434 - Posted: 18 May 2005, 14:14:04 UTC - in response to Message 112433.  



It was a bug alright. Under the wrong circumstances, it would not contact a scheduler at all.

I stand corrected, it was a bug.

Hi John!



ID: 112434
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.