BOINC doesn't get new work properly???


jravin
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 914984 - Posted: 6 Jul 2009, 22:31:02 UTC

I have a machine with 8 cores. It is set as SETI 700 share and Einstein 100 share (I want to work mostly on SETI). E tasks take about 12 hours each on my machine. I had nothing queued at all, and I set my local cache at 2 days. I expected to get 4 or 5 WUs from E (4x12 = 48Hrs = 2days, right?), but instead E d/l about 30 WUs (2 days for EACH of the 8 cores).
What is that all about? It seems like a bug to me; shouldn't BOINC factor in the shares when deciding how much work to d/l for the cache of a given project?
Especially since the SETI WUs I also got are not being run because the E WUs are on a short leash (due July 20).
Is this the way it is supposed to work? Seems wrong, somehow... I would like to understand this bug? feature?
Thanks for any explanations!
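
The arithmetic behind the two numbers can be sketched roughly as follows. This is only an illustration, not BOINC's actual work-fetch code: the function name `tasks_to_fill_cache` and its `share_fraction` parameter are made up for the example, and it assumes every task takes the same 12 hours.

```python
def tasks_to_fill_cache(cache_days, cores, task_hours, share_fraction=1.0):
    """Number of tasks needed to keep `cores` busy for `cache_days`.

    share_fraction scales the buffer by a project's resource share
    (an assumption of this sketch, not confirmed BOINC behaviour).
    """
    core_hours = cache_days * 24 * cores * share_fraction
    return core_hours / task_hours


# Whole machine, share ignored: 2 days on 8 cores of 12-hour tasks.
whole_machine = tasks_to_fill_cache(2, 8, 12)

# Same buffer scaled by Einstein's share, 100 out of 700 + 100.
share_weighted = tasks_to_fill_cache(2, 8, 12, share_fraction=100 / 800)

print(whole_machine, share_weighted)  # 32.0 4.0
```

The first figure is close to the roughly 30 tasks that were downloaded; the second matches the 4 that were expected.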
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 914985 - Posted: 6 Jul 2009, 22:34:49 UTC - in response to Message 914984.

I have a machine with 8 cores. It is set as SETI 700 share and Einstein 100 share (I want to work mostly on SETI). E tasks take about 12 hours each on my machine. I had nothing queued at all, and I set my local cache at 2 days. I expected to get 4 or 5 WUs from E (4x12 = 48Hrs = 2days, right?), but instead E d/l about 30 WUs (2 days for EACH of the 8 cores).
What is that all about? It seems like a bug to me; shouldn't BOINC factor in the shares when deciding how much work to d/l for the cache of a given project?
Especially since the SETI WUs I also got are not being run because the E WUs are on a short leash (due July 20).
Is this the way it is supposed to work? Seems wrong, somehow... I would like to understand this bug? feature?
Thanks for any explanations!

SETI is having trouble delivering work right now.

So, your machine gets Einstein, and accumulates "debt" so that it knows how much extra Einstein it did while SETI was effectively down.

BOINC will do the work it has in such a way that it does not miss deadlines.

Then, at some time in the future, it will do only SETI until the "debt" is paid back.
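
A toy model of this "debt" bookkeeping might look like the sketch below. It is a simplification for illustration, not the client's real accounting: the function `update_debts` is hypothetical, and the one-day, all-Einstein scenario is an assumed example using the 700/100 shares and 8 cores from the first post.

```python
def update_debts(debts, shares, cpu_hours_done):
    """Return new per-project debts after one accounting period.

    Each project is "owed" its share of the total CPU time spent;
    debt grows by what it was owed minus what it actually received.
    """
    total_share = sum(shares.values())
    total_hours = sum(cpu_hours_done.values())
    new_debts = {}
    for project, share in shares.items():
        owed = total_hours * share / total_share
        new_debts[project] = debts[project] + owed - cpu_hours_done[project]
    return new_debts


shares = {"SETI": 700, "Einstein": 100}
debts = {"SETI": 0.0, "Einstein": 0.0}

# One day in which all 8 cores run only Einstein (8 * 24 = 192 core-hours).
debts = update_debts(debts, shares, {"SETI": 0.0, "Einstein": 192.0})
print(debts)  # {'SETI': 168.0, 'Einstein': -168.0}
```

In this model SETI ends the day owed 168 core-hours and Einstein is overdrawn by the same amount, which is what gets paid back once SETI work is available again.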
____________

jravin
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 915004 - Posted: 6 Jul 2009, 23:05:04 UTC - in response to Message 914985.

But I did get some work from SETI at the same time I got the Einstein work; about 20 WUs averaging about 1 hour each. None of them have been run, and won't be, if you are right. Why did it over-d/l for E? Can't it do so later if there is no SETI work?
Why don't my explicit desires for work shares count? And why can't it TELL me that it wants to do other than I ask?

If you ask me, this sucks.

It means that how I want to share my machine is simply being ignored. A 2 day cache for E is 4 WUs at my desired share. If it runs them faster because there is no work for SETI, then it should d/l more to maintain the 4 WU cache, not simply flood my machine with 30 E WUs.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 531,285
RAC: 347
United States
Message 915020 - Posted: 6 Jul 2009, 23:26:46 UTC - in response to Message 915004.

But I did get some work from SETI at the same time I got the Einstein work; about 20WU averaging about 1 hour each. None of them have been run, and won't be, if you are right. Why did it over d/l for E? Can't it do so later if there is no SETI work?
Why don't my explicit desires for work shares count? And why can't it TELL me that it wants to do other than I ask?

If you ask me, this sucks.

It means that how I want to share my machine is being simply ignored. A 2 day cache for E is 4 WUS at my desired share. If he runs them faster because there is no work for SETI, then he should d/l more to maintain the 4 WU cache, not simply flood my machine with 30 E WUS.

How you want to share your machine is only being ignored short term. Over the long term, the CPU time used should approximately balance out.

Also, because of the large amount of Einstein work downloaded, the CPU scheduler is running in Earliest Deadline First mode in order to ensure that it all gets done on time. Einstein will pay back the CPU time later by not downloading work.

What is the balance you have between "Connect Every X" and "Extra work"? Connect every X should be set to approximate the actual duration between connections.
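
Earliest Deadline First can be illustrated with a few lines of code. This is a bare-bones sketch rather than the BOINC scheduler itself: the `Task` type and `pick_edf` function are invented for the example, and the SETI deadlines are hypothetical (only the 20 July Einstein deadline comes from this thread).

```python
from datetime import datetime
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    project: str
    deadline: datetime


def pick_edf(tasks, cores):
    """Choose which tasks to run: the `cores` tasks whose deadlines come first."""
    return sorted(tasks, key=lambda t: t.deadline)[:cores]


tasks = [
    Task("seti_1", "SETI", datetime(2009, 8, 1)),           # hypothetical deadline
    Task("einstein_1", "Einstein", datetime(2009, 7, 20)),
    Task("einstein_2", "Einstein", datetime(2009, 7, 20)),
    Task("seti_2", "SETI", datetime(2009, 7, 28)),          # hypothetical deadline
]

running = pick_edf(tasks, cores=2)
print([t.name for t in running])  # ['einstein_1', 'einstein_2']
```

With enough early-deadline Einstein tasks to occupy every core, the SETI tasks wait regardless of resource share, which matches what is being seen here.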
____________


BOINC WIKI

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915050 - Posted: 6 Jul 2009, 23:55:27 UTC - in response to Message 915004.

If you ask me, this sucks.

I didn't ask you. You asked a question, and I answered you.

Let me ask you a question:

Which would (to use your terminology) suck more:

1) Not having the exact proportion of work in your queue at all times.

2) Having idle cores because one of your projects was out of work.

If you answered #2, then you want BOINC to do what it's doing.

If you look in the client_state.xml file, you will see that BOINC knows how much extra Einstein it has done -- and it knows that it will need to do more SETI to come back to your desired resource share.

Not only does this work when a project is having trouble, but it means that you can have 7 projects divide equally on your eight cores, or you can ask for one project to get 90% of your processing and spread the remaining 10% across the remainder and it all works.

____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 531,285
RAC: 347
United States
Message 915060 - Posted: 7 Jul 2009, 0:05:08 UTC - in response to Message 915050.

If you ask me, this sucks.

I didn't ask you. You asked a question, and I answered you.

Let me ask you a question:

Which would (to use your terminology) suck more:

1) Not having the exact proportion of work in your queue at all times.

2) Having idle cores because one of your projects was out of work.

If you answered #2, then you want BOINC to do what it's doing.

If you look in the client_state.xml file, you will see that BOINC knows how much extra Einstein it has done -- and it knows that it will need to do more SETI to come back to your desired resource share.

Not only does this work when a project is having trouble, but it means that you can have 7 projects divide equally on your eight cores, or you can ask for one project to get 90% of your processing and spread the remaining 10% across the remainder and it all works.

Or, about 30% on one project and the remaining 70% spread across 60 or so projects...
____________


BOINC WIKI

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915073 - Posted: 7 Jul 2009, 0:19:31 UTC - in response to Message 915060.

If you ask me, this sucks.

I didn't ask you. You asked a question, and I answered you.

Let me ask you a question:

Which would (to use your terminology) suck more:

1) Not having the exact proportion of work in your queue at all times.

2) Having idle cores because one of your projects was out of work.

If you answered #2, then you want BOINC to do what it's doing.

If you look in the client_state.xml file, you will see that BOINC knows how much extra Einstein it has done -- and it knows that it will need to do more SETI to come back to your desired resource share.

Not only does this work when a project is having trouble, but it means that you can have 7 projects divide equally on your eight cores, or you can ask for one project to get 90% of your processing and spread the remaining 10% across the remainder and it all works.

Or, about 30% on one project and the remaining 70% spread across 60 or so projects...

... or any of an almost infinite number of combinations that don't exactly have a 1:1 ratio between projects and cores.
____________

jravin
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 915096 - Posted: 7 Jul 2009, 0:41:17 UTC

John and Ned:

All I am asking for is that it keep user preferences in mind when loading up caches. In my case, as I stated in my 2nd post above, d/l 4 E WUs ( a 2 day cache by my desires), then do it again if it goes through faster (because in this case, SETI is having problems). That way, whatever SETI I get is not blocked by Einstein BECAUSE MY MACHINE IS OVERLOADED by incorrect (by my lights) amounts of work d/l for E and therefore in semi-permanent EDF. In such case, the workload would not have nearly as much "debt" over time between projects.

Obviously, it can't be perfect. For example, 1 Climate Prediction WU is many SETI WUs in duration. But it could (and should, IMO) be better than it currently is.
____________

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 678
Credit: 12,776,263
RAC: 2,076
New Zealand
Message 915114 - Posted: 7 Jul 2009, 0:55:20 UTC - in response to Message 915096.

It IS keeping your user preference in mind.

You told it to cache 2 days' work. That's what it tries to do.

It went to SETI to get some work units, but there were none.

So it filled up the 2 day cache with what was available.

Next week there may be plenty of SETI workunits available, and it will lay off the Einstein ones for a while to balance things out.

The system is just giving priority to the cache setting before the project balance.

You may also get a situation where it can't maintain the balance exactly as you wish, because one project can't deliver work regularly. That's just the way things are. If it let the cache run down and your machine went idle to maintain the ratio, you would be even more peeved, right?

Ian

jravin
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 915130 - Posted: 7 Jul 2009, 1:14:11 UTC - in response to Message 915114.



The system is just giving priority to the cache setting before the project balance.

You may also get a situation where it can't maintain the balance exactly as you wish, because one project can't deliver work regularly. That's just the way things are. If it let the cache run down and your machine went idle to maintain the ratio, you would be even more peeved, right?

Ian


Well, the current SETI probs are why I went to Einstein in the first place. But by ignoring my desired shares, the caching has overloaded my machine with Einstein and SETI can't even run... Does it help if I have to DETACH from E to get SETI to run in the next week or so? That causes more overall project problems (I think) than simply using my desired shares when d/l work in the first place. Certainly, if one project is temporarily dead, d/l more for the live one. But only by (cache x share). You can always do that again, when needed, as I already mentioned above. By carrying (n) days cache for the entire machine for each project, instead of (n x share) days, you overload the d/l process for some projects, do you not?
____________

Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 4,339,814
RAC: 25
Norway
Message 915139 - Posted: 7 Jul 2009, 1:36:06 UTC - in response to Message 915130.
Last modified: 7 Jul 2009, 1:37:30 UTC

Well, because of the current SETI probs, that's why I went to Einstein in the first place. But by ignoring my desired shares, the caching has overloaded my machine with Einstein and SETI can't even run... Does it help if I have to DETACH from E to get SETI to run in the next week or so? That causes more overall project problems (I think) than simply using my desired shares in d/l work in the first place. Certainly, if one project is temporarily dead, d/l more for the live one. But only by (cache x share). You can always do that again, when needed, as I already mentioned above. By not carrying (n) days cache for the entire machine for each project, instead of (n x share) days, you overload the d/l process for some projects, do you not?

The computer is newly-attached to Einstein, right?

If so, Einstein@home starts with zero long-term debt, and if SETI@home was the only other attached project, it also has zero long-term debt. In this scenario, the BOINC client fills up with work from all projects with zero long-term debt until your cache settings are fulfilled. Since SETI@home didn't give you any, Einstein@home filled up the whole 2-day cache.

Then your computer started crunching Einstein@home work, and fairly quickly the long-term debt for Einstein dropped below zero. In this scenario, at least in my experience with BOINC v6.6.xx, the BOINC client will keep "Connect about every N days" filled from whatever project, but the "additional days" won't be filled by projects with negative long-term debt, except if there is an idle CPU...

So, apart from the initial attach to a new project getting a full cache, it should work more or less like you want it to, normally only getting more Einstein@home work in case of an idle CPU, if "Connect about every N days" is set to zero...
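
The rule described above can be paraphrased in code. This is only a sketch of the behaviour as reported in this post for v6.6.xx, not the client's actual logic: the function `projects_to_ask` and its arguments are hypothetical, and it folds "Connect about every N days" and "additional days" into a single cache for simplicity.

```python
def projects_to_ask(long_term_debt, cache_is_full, cpu_idle):
    """Pick projects to request work from, per the rule described above.

    long_term_debt maps project name -> debt (negative means the project
    has already received more than its share).
    """
    if cpu_idle:
        # An idle CPU overrides the debt check: any project may supply work.
        return sorted(long_term_debt)
    if cache_is_full:
        return []
    # Otherwise only projects that are not "overdrawn" fill the cache.
    return sorted(p for p, debt in long_term_debt.items() if debt >= 0)


# Freshly attached: both projects at zero debt, so both are asked.
print(projects_to_ask({"SETI": 0.0, "Einstein": 0.0}, False, False))

# After a stretch of Einstein-only crunching, only SETI is asked.
print(projects_to_ask({"SETI": 168.0, "Einstein": -168.0}, False, False))
```

The debt figures are the illustrative ones for a single all-Einstein day on 8 cores, not values read from a real client_state.xml.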
____________
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

Ianab
Volunteer tester
Send message
Joined: 11 Jun 08
Posts: 678
Credit: 12,776,263
RAC: 2,076
New Zealand
Message 915153 - Posted: 7 Jul 2009, 1:57:42 UTC - in response to Message 915130.

Ahh.. so Einstein has downloaded WAY too much work initially?

That would be a problem with their project giving you a wrong correction factor. SETI generally underestimates a machine's performance, and the cache won't fill up properly until some work units have been returned and things find their natural level. Others err the other way and send out more work than expected.

Either way your CPU should still be crunching, even if it's got a week's unwanted Einstein cached.

As there are problems getting SETI work right now, you may as well let it run. It shouldn't be affecting anything, and the worst that could happen is that some Einstein tasks don't get completed before the deadline and get abandoned.

You can set Einstein to no new work, let it crunch for a few days, and manually report the work back. That way you don't get any more sent to you. Once the correction factor gets more accurate it will work more like you wish. Hopefully by then SETI will have more work units and things will run as you wish.

If you still have a heap of Einstein units left, you can abort them and manually report them; they just get issued to someone else, no hassle. Detaching from the project will just set you back to square one. So I think you have to return some Einstein units to fix your problem.

Ian

jravin
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 915175 - Posted: 7 Jul 2009, 2:41:29 UTC - in response to Message 915153.

Ian:

Thanks for the suggestions. I will manually abort the WUs rather than detach, so others can get them quickly. I hate when I have no quick wingman for a WU - I have an AP WU from May 13 still pending because it was given at least once to someone who was apparently using a coal powered Pentium -1 to process SETI...
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915191 - Posted: 7 Jul 2009, 3:28:33 UTC - in response to Message 915096.

John and Ned:

All I am asking for is that it keep user preferences in mind when loading up caches. In my case, as I stated in my 2nd post above, d/l 4 E WUs ( a 2 day cache by my desires), then do it again if it goes through faster (because in this case, SETI is having problems). That way, whatever SETI I get is not blocked by Einstein BECAUSE MY MACHINE IS OVERLOADED by incorrect (by my lights) amounts of work d/l for E and therefore in semi-permanent EDF. In such case, the workload would not have nearly as much "debt" over time between projects.

Obviously, it can't be perfect. For example, 1 Climate Prediction WU is many SETI WUs in duration. But it could (and should, IMO) be better than it currently is.

... and we're telling you that it is doing exactly what you want.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 915192 - Posted: 7 Jul 2009, 3:30:27 UTC - in response to Message 915130.



The system is just giving priority to the cache setting before the project balance.

You may also get a situation where it can't maintain the balance exactly as you wish, because one project can't deliver work regularly. That's just the way things are. If it let the cache run down and your machine went idle to maintain the ratio, you would be even more peeved, right?

Ian


Well, because of the current SETI probs, that's why I went to Einstein in the first place. But by ignoring my desired shares, the caching has overloaded my machine with Einstein and SETI can't even run... Does it help if I have to DETACH from E to get SETI to run in the next week or so? That causes more overall project problems (I think) than simply using my desired shares in d/l work in the first place. Certainly, if one project is temporarily dead, d/l more for the live one. But only by (cache x share). You can always do that again, when needed, as I already mentioned above. By not carrying (n) days cache for the entire machine for each project, instead of (n x share) days, you overload the d/l process for some projects, do you not?

Even if it has too much Einstein, based on incorrect estimates, BOINC will detect that, and will do more SETI later to make up for it.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,509,439
RAC: 40,763
Australia
Message 915232 - Posted: 7 Jul 2009, 6:00:25 UTC - in response to Message 915192.
Last modified: 7 Jul 2009, 6:02:41 UTC

If you run more than one project, and want your project shares to be honoured in the shorter term, then don't run with a cache.
Any cache when running more than one project will mean it will take a much longer time for project shares to be met.
____________
Grant
Darwin NT.

Ageless
Joined: 9 Jun 99
Posts: 12476
Credit: 2,697,025
RAC: 1,440
Netherlands
Message 915261 - Posted: 7 Jul 2009, 10:05:45 UTC

What no one has explained yet is that BOINC will learn how long tasks take. You attached that machine to Einstein on the 3rd of July. It takes a couple of tasks before BOINC has a good estimate of how long these tasks run, and then for one kind of task only.

At both Einstein and Seti there are two applications, two kinds of tasks, both of them with variable run-time. So, for example, doing a run of Seti Enhanced short tasks will teach BOINC that these are short. Then when you get a couple of long(er) tasks in, the estimated time to completion is off, as these tasks will just take longer than the shorties did.

So at that time you will get more in than you may have wanted, but if you let BOINC run unhampered, without intrusions, it'll learn about these longer tasks again and not ask for as many the next time it asks for work. Get an Astropulse in and the whole cycle starts from the beginning...

The same goes for Einstein, where there are the normal tasks and the Arecibo Binary tasks, both with their own run time. Attaching newly to a project starts you off with no estimates whatsoever, other than those given by the project. Those estimates are taken broadly, as there are a lot of machines out there and none is the same as the next. It will also start you off with no debt, so it can happen that the project you were attached to already will do its work last, as it has the highest debt.

That depends on the deadlines, though. Each project has its own deadlines for tasks. When Seti has shorter deadlines than Einstein has, the Seti work will run first. Especially if you run a large cache.

As I said, BOINC will learn from its experience. It will keep track of how long tasks take and set the so-called task duration correction factor (TDCF) according to the estimate that comes with the task and the time the last tasks took. With this information it can estimate more accurately how long tasks of the same range take.

Each project has only one TDCF, though, so when a project has two applications with varying task lengths, it has to switch that one TDCF between them. This may give problems in the long run. A way around it is to set your project preferences to accept work for one application only.
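
The effect of sharing one TDCF between two applications can be shown with a toy calculation. The smoothing rule, the `alpha` value, and all task durations below are assumptions made for illustration; BOINC's real update rule is different in detail, and the function names are hypothetical.

```python
def update_tdcf(tdcf, project_estimate_hours, actual_hours, alpha=0.5):
    """Blend the old factor with the latest actual/estimated ratio."""
    ratio = actual_hours / project_estimate_hours
    return (1 - alpha) * tdcf + alpha * ratio


def corrected_estimate(tdcf, project_estimate_hours):
    """Run-time estimate the client would use after applying the factor."""
    return tdcf * project_estimate_hours


tdcf = 1.0
# Two short tasks: the project estimated 2 hours, each really took 1.
for _ in range(2):
    tdcf = update_tdcf(tdcf, project_estimate_hours=2.0, actual_hours=1.0)

# A long task from the other application, estimated accurately at 10 hours,
# is now scaled by the factor learned from the short tasks.
print(tdcf, corrected_estimate(tdcf, 10.0))  # 0.625 6.25
```

Because the long task now looks shorter than it is, a work request sized from this estimate brings in more tasks than the cache really needs, until the factor climbs back up.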

A 2-day cache is normally not considered a large cache... unless you have only just started with a project, when BOINC hasn't got a clue what to expect, whether the estimates given by the project are anywhere near correct, etc.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Bill Walker
Joined: 4 Sep 99
Posts: 3461
Credit: 2,216,857
RAC: 1,088
Canada
Message 915293 - Posted: 7 Jul 2009, 12:04:18 UTC
Last modified: 7 Jul 2009, 12:08:44 UTC

JRavin, another way to look at what people are telling you is that the workshare will eventually sort out, but it may take weeks or even months.

In my case, I often take one of my crunchers with me on road trips, with irregular connections and crunching hours per day. In the last few months it seemed every time I could connect, SETI was having some sort of problem. My SETI RAC went way down, from 700 to 300, but my backup projects' RAC went up. I've been at home 2 whole weeks now, and even with the current upload/download problems my SETI queue has filled up. I'm now crunching SETI just about full time, my backup projects are running at way under their workshare, and they are not requesting new work.

If I look at the actual work split between projects over the last 48 hours, it looks nothing like what I set in my preferences. But if I look at the split over the last 6 months, it is pretty much spot on.

Edit - Another example of what this means: my one remaining Rosetta WU just went into High Priority mode. Because of the increased SETI crunching in the last few days, this WU was ignored until its deadline got too close. Now nothing will download new tasks until this WU finishes. Again, workshares and download shares get out of whack, but only until this WU finishes (probably today). Then things will gradually drift back to "normal".
____________

jravin
Send message
Joined: 25 Mar 02
Posts: 994
Credit: 106,612,157
RAC: 91,809
United States
Message 915442 - Posted: 7 Jul 2009, 22:41:23 UTC

Thank you all for your help and suggestions (even snarky Ned).

What it all boils down to is that "that's the way it is" and that's that. Too bad (that's why it's called Software ENGINEERING - you make incremental changes as time goes on to correct (IMO) bugs - or unintended features).

I've only been a programmer since 1964 or so - so what do I know? But I will tell you that ego shouldn't be involved in programming decisions. ("That's the way it is" isn't a fact, just ego sticking out).

But I certainly understand BOINC better now for this thread, so, again, thanks.
____________

alephnull
Volunteer tester
Joined: 16 Mar 03
Posts: 119
Credit: 162,003,725
RAC: 325
United States
Message 915455 - Posted: 7 Jul 2009, 23:02:31 UTC - in response to Message 915442.

Thank you all for your help and suggestions (even snarky Ned).

What it all boils down to is that "that's the way it is" and that's that. Too bad (that's why it's called Software ENGINEERING - you make incremental changes as time goes on to correct (IMO) bugs - or unintended features).

I've only been a programmer since 1964 or so - so what do I know? But I will tell you that ego shouldn't be involved in programming decisions. ("That's the way it is" isn't a fact, just ego sticking out).

But I certainly understand BOINC better now for this thread, so, again, thanks.


It doesn't sound to me like any of the items mentioned here are bugs or unintended features. If I understand all this correctly, BOINC is working as intended; it's more a matter of a temporary short-term imbalance due to insufficient work from one (or possibly several) projects, and it will even out as time goes on.

In a perfect world where all projects were sending out work, the time allocation per project would be right on, I would imagine. Is this the case? If so, then this is a matter of some difficulties with one project (SETI in this case) and will be resolved once those issues are fixed; therefore, this is neither a bug nor an unintended feature. Insofar as my computer is currently doing work, I can reasonably expect that SETI will "catch up" later when all is well, right? If so, I'm happy with all that.
____________


Copyright © 2014 University of California