BOINC 4.45 Overcommits Itself.

Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 138684 - Posted: 19 Jul 2005, 0:46:59 UTC
Last modified: 19 Jul 2005, 1:04:53 UTC

Sorry if this has been discussed before. Did a keyword search for 4.45 and it found nothing.
I switched from 4.19 to 4.45 2 days ago. I had one unit each of Climate, Einstein, Protein and Seti in progress. I use dial-up connection, so only allow internet access twice a day. This new version said it was overcommitted till it had finished the Einstein unit, then went back to "round robin" processing.
After uploading, it downloaded one Einstein, said it was overcommitted, and requested 0 seconds for the others.
I have looked at the John McLeod VII formula. One Einstein unit with 8 hours of estimated processing time and a deadline a week away produces the result 0.05, and the Climate unit, with 420 hours to go before next March, produces 0.08. Both are way, way smaller than the 0.8 McLeod says is necessary to produce the overcommitted state.
So what is happening?
ID: 138684 · Report as offensive
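For reference, the arithmetic in message 138684 can be reproduced with a short sketch. It assumes the figure in question is simply the remaining processing time divided by the time left until the deadline, which is what the quoted numbers imply. The function name `commitment_ratio` is hypothetical and not taken from BOINC, and the Climate deadline of about 225 days is a guess at what "next March" means.

```python
def commitment_ratio(remaining_hours: float, hours_to_deadline: float) -> float:
    """Fraction of the time left before the deadline that the work unit needs."""
    if hours_to_deadline <= 0:
        raise ValueError("deadline must be in the future")
    return remaining_hours / hours_to_deadline


# Einstein: 8 hours of work, deadline one week (168 hours) away.
einstein = commitment_ratio(8, 7 * 24)

# Climate: 420 hours of work; deadline ASSUMED to be about 225 days away.
climate = commitment_ratio(420, 225 * 24)

THRESHOLD = 0.8  # value the post attributes to John McLeod VII

for name, ratio in (("Einstein", einstein), ("Climate", climate)):
    print(f"{name}: {ratio:.2f} (over threshold: {ratio > THRESHOLD})")
```

Both ratios sit far below 0.8, but only if the machine is assumed to be crunching around the clock; the sketch ignores any time the computer spends switched off.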
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 138692 - Posted: 19 Jul 2005, 1:01:54 UTC
Last modified: 19 Jul 2005, 1:03:02 UTC

Sorry to be so pessimistic, but give up. The BOINC guys really don't care or pay attention here, despite what some pea brains on this project seem to think. Look at the absolute flop the past week has been in the minds of the client volunteers. A planned outage that started before my vacation has led to chaos for the last week, without published resolution.
May this Farce be with You
ID: 138692 · Report as offensive
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 138701 - Posted: 19 Jul 2005, 1:12:30 UTC - in response to Message 138684.  

I think what is happening is that the method for deciding how much work to download was changed between v4.19 and v4.45. After a couple of days of working through the WUs already cached, you should see the CC stay with round-robin scheduling.

Jim

ID: 138701 · Report as offensive
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 138707 - Posted: 19 Jul 2005, 1:26:16 UTC

Well, I will wait and see. It does seem peculiar, though, and might put some people off continuing with BOINC.
ID: 138707 · Report as offensive
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 138708 - Posted: 19 Jul 2005, 1:28:56 UTC - in response to Message 138707.  

Peculiar? Yes... but I'm glad you asked. At least we got one possible problem clarified.

I would ask that you report your findings in the next day or two. I'm interested to find out if I was correct.

Jim

ID: 138708 · Report as offensive
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 138722 - Posted: 19 Jul 2005, 1:50:47 UTC

Things are happening - just a bit.
After processing the Einstein unit for 3 hours out of 8, it suddenly switched to round-robin doing Climate and back to overcommitted doing Einstein three times in 4 seconds, and has now settled on the latter.
Yes, I will report back on progress.
ID: 138722 · Report as offensive
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 138724 - Posted: 19 Jul 2005, 1:54:49 UTC - in response to Message 138722.  
Last modified: 19 Jul 2005, 1:56:24 UTC

Another thing that has changed recently is the way that the ETC (estimated time to completion) is calculated. This too can make BOINC think it is overcommitted. It would also explain why it thinks it is ready for round robin one time and then thinks it is overcommitted the next: the ETC could be changing while the calculations are made.

I don't know if this includes v4.45 or not. In fact, I'm not even sure if it is a BOINC issue or a project issue. I don't remember :( Sorry. But the end result is still the same: it can still make BOINC think it is overcommitted.

Jim

ID: 138724 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 138728 - Posted: 19 Jul 2005, 1:58:24 UTC - in response to Message 138724.  

No, that feature is in CC 4.7x only, as far as I know.
I think it's fair to assume that it's just the adjustment to the new scheduler that is causing these strange things. Just don't change anything, and let BOINC work it out by itself.
ID: 138728 · Report as offensive
RichaG
Volunteer tester
Joined: 20 May 99
Posts: 1690
Credit: 19,287,294
RAC: 36
United States
Message 138752 - Posted: 19 Jul 2005, 2:26:05 UTC
Last modified: 19 Jul 2005, 2:26:25 UTC

ID: 138752 · Report as offensive
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 138761 - Posted: 19 Jul 2005, 2:37:10 UTC

No, but I have now, thanks. It all sounds a bit complicated. Where can I see these LTD figures? I have my cache set to 0.75 days.
ID: 138761 · Report as offensive
eberndl
Joined: 12 Oct 01
Posts: 539
Credit: 619,111
RAC: 3
Canada
Message 138764 - Posted: 19 Jul 2005, 2:39:35 UTC
Last modified: 19 Jul 2005, 2:43:47 UTC

One option is BoincDV... that's what I use. I believe that BOINCView also allows you to see the long-term debt and short-term debt (LTD and STD).
ID: 138764 · Report as offensive
Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 138803 - Posted: 19 Jul 2005, 3:21:13 UTC

I think what is happening is that the way the time stats are kept was changed between 4.1x and 4.4x (not sure exactly when). In the earlier version, active_frac was the percentage of total time that BOINC was running a project. In the later version, active_frac is the percentage of the time BOINC is on that it can run a project, so to get the same number as the older version, active_frac must be multiplied by on_frac.
Some numeric examples:
The computer is on 12 hours a day, BOINC is always running.
Old: active_frac = 0.5, on_frac = 0.5
New: active_frac = 0.99, on_frac = 0.5
When multiplied, the old produces 0.25 and the new produces ~0.5. So for a while after upgrading this computer, BOINC thinks it has half as much time to work on things as it really does.

The good news is that it will correct itself; you don't need to do anything.
BOINC WIKI

BOINCing since 2002/12/8
ID: 138803 · Report as offensive
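The numeric example in message 138803 can be written out as a small sketch. The variable names follow the post; the helper `usable_fraction` is hypothetical, and the real client's bookkeeping may well differ.

```python
def usable_fraction(on_frac: float, active_frac: float) -> float:
    """Share of wall-clock time available for crunching when active_frac
    is interpreted the newer way, i.e. multiplied by on_frac."""
    return on_frac * active_frac


ON_FRAC = 0.5           # computer is on 12 hours a day
OLD_ACTIVE_FRAC = 0.5   # older meaning: share of total time BOINC runs
NEW_ACTIVE_FRAC = 0.99  # newer meaning: share of on-time BOINC can run

# Just after upgrading, the stored value still carries the old meaning,
# so multiplying it by on_frac counts the off-time twice.
stale = usable_fraction(ON_FRAC, OLD_ACTIVE_FRAC)

# Once the statistics have caught up with the new definition.
settled = usable_fraction(ON_FRAC, NEW_ACTIVE_FRAC)

print(f"stale estimate:   {stale:.3f}")
print(f"settled estimate: {settled:.3f}")
```

The stale estimate is roughly half the settled one, which matches the post's point that the client temporarily believes it has half as much time as it really does.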
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 138906 - Posted: 19 Jul 2005, 5:32:46 UTC

As John says, it corrects itself over time. The 4.45 versions work a little differently from what you are accustomed to with 4.3x and lower. On my systems I saw interesting oddness for about a week. After that the BOINC client settled into a "groove" and I have not had a problem with it keeping work.

ID: 138906 · Report as offensive
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 139424 - Posted: 20 Jul 2005, 3:33:23 UTC
Last modified: 20 Jul 2005, 3:34:40 UTC

The weird wanderings of 4.45, part 2.
Last night's cliff-hanging state was continued after boot-up today.
Supposed to be running 4 projects, but it is convinced it is overcommitted with one Climate and one Einstein. Using earliest deadline, processed only the Einstein till it ran out, then switched to Climate for 30 mins. At last said it was allowing work fetch.
Did not ask Einstein, asked Protein but that was busy, asked SETI and got one unit. Immediately said it was overcommitted, but this time processed that and Climate alternately, although it made no mention of round-robin. Then, when SETI was 80% done, as it switched back to Climate, said it would allow work fetch.
At this stage, all STDs were zero except for Climate (+300), and LTDs were: Protein +25K, SETI +4K, Climate -11K and Einstein -19K.
Requested and got one Protein unit, requested 0 from SETI (and uploaded all pending units!!!!), made no other requests.
Continued to process Climate for an odd time of 50 mins, switched to the new Protein unit for 10 secs, and switched to the dregs of the SETI unit, then immediately said no work fetch because overcommitted!
After 30 mins said will allow fetch, got a second Protein unit, but immediately said no more fetch because overcommitted.

Things are certainly happening, but are getting complicated, so perhaps I will log no more detailed reports, except perhaps a final one to say if I am happy with its progress.
ID: 139424 · Report as offensive
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 139425 - Posted: 20 Jul 2005, 3:39:26 UTC - in response to Message 139424.  

Ok.... you got me. It's beyond the very limited scope of my expertise.

Jim

ID: 139425 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 139524 - Posted: 20 Jul 2005, 10:47:27 UTC
Last modified: 20 Jul 2005, 11:01:28 UTC

Rubernicus
Could you please go to "your account", "General Preferences" and set your "Connect To" setting to 2 days instead of 0.75 and see how that works for you?

tony

[edit] Oh yeah, you'll need to do a project update on the project where you changed the preference. Watch your "messages" tab to make sure it has seen the preference update.
ID: 139524 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 139684 - Posted: 20 Jul 2005, 17:34:49 UTC - in response to Message 139524.  

Increasing the "connect to" setting will not help.

The reason for the overcommitment is the time that the computer spends off. The more time it spends off, the less work will fit in before deadlines.


BOINC WIKI
ID: 139684 · Report as offensive
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 139702 - Posted: 20 Jul 2005, 17:58:28 UTC - in response to Message 139684.  

This is why I keep my computer running 24/7, and only reboot when an update for my virus program or a Windows update requires it. Otherwise, that little hamster keeps on runnin.

Jeremy
ID: 139702 · Report as offensive
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 139854 - Posted: 20 Jul 2005, 22:34:59 UTC - in response to Message 139702.  
Last modified: 20 Jul 2005, 22:35:21 UTC

Imagine what that little hamster would look like. He would have a heart and lungs that are super, super efficient. Legs of steel... shoot, he would be the BIONIC BOINC hamster!

(scary thought)

Jim


ID: 139854 · Report as offensive
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 143189 - Posted: 26 Jul 2005, 19:21:28 UTC

I increased the "connect to" to 2 days and after more than a week there has been no change. I note that whenever it requests more work it asks for 172800 secs (= 2 days), but only ever gets one work unit.
So I grew impatient, and when it was allowing work fetch, I kept updating Protein Predictor, getting 1 unit at a time, till hey presto, the answer came up:
26.07.05 05-48-43|ProteinPredictorAtHome|Message from server: (won't finish in time) Computer on 31.0% of time, BOINC on 32.0% of that, this project gets 25.0% of that.
This is wrong, and is misleading the work calculation. This machine has been away for repair for four months, and this period is being included in the "off" time. The figures should read "Computer on 90% of time, BOINC on 100% of that, ...". How do I change these figures?
ID: 143189 · Report as offensive
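The percentages in the server message from message 143189 multiply together, which helps show why so little work is being sent. The following is a sketch of that arithmetic only: the helper names and the 10-hour work unit are invented for illustration, and the server's actual "won't finish in time" check is not shown anywhere in the thread.

```python
def project_share_of_wall_time(on: float, active: float, share: float) -> float:
    """Fraction of wall-clock time one project effectively receives."""
    return on * active * share


def wall_hours_needed(cpu_hours: float, fraction: float) -> float:
    """Elapsed hours required to deliver cpu_hours at the given fraction."""
    return cpu_hours / fraction


# Figures quoted in the server message.
reported = project_share_of_wall_time(0.31, 0.32, 0.25)

# Figures the poster believes are correct for this machine.
believed = project_share_of_wall_time(0.90, 1.00, 0.25)

WORK_UNIT_CPU_HOURS = 10  # HYPOTHETICAL work unit size, not from the thread

for label, fraction in (("reported", reported), ("believed", believed)):
    hours = wall_hours_needed(WORK_UNIT_CPU_HOURS, fraction)
    print(f"{label}: {fraction:.4f} of wall time, "
          f"{hours:.0f} elapsed hours for a {WORK_UNIT_CPU_HOURS}-hour unit")
```

With the reported figures the project gets about 2.5% of wall-clock time, roughly a ninth of what it would get with the figures the poster expects, so even a modest work unit can look as if it will miss its deadline.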


 