How does work fetch in 7.0.25 work?


Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224868 - Posted: 29 Apr 2012, 12:43:37 UTC
Last modified: 29 Apr 2012, 12:44:58 UTC

I have been using 7.0.25 since it was released, and in general I have been pleased with it. It's tidier about which work is running, and it is better at finishing jobs instead of leaving a lot of tasks hanging in an unfinished state (when running multiple projects).

For the last few weeks I have been running SETI exclusively.
On my non-GPU computers it works perfectly: lots of WUs cached, and the machines are always crunching.

But on my two GPU machines, not so well. BOINC normally keeps lots of WUs for the GPUs, but the CPUs often run dry.

My fastest computer especially is having problems. The cache is set at 10/10 days, but it still often runs dry.

Right now I have 8 WUs crunching and 2 in the cache, enough for 2 hours at most. Reporting WUs back should make BOINC fetch work (and lots of it, if it is to last 10 days).
But BOINC responds: Reporting 3 completed tasks, not requesting new tasks.

WHY?

I have been keeping an eye on it, and BOINC will run the cache completely dry, leaving the CPU with nothing to do or with work on only a few cores, before it starts asking for work.

Sometimes it's without CPU work for many hours.

It's not WUs waiting to be downloaded that prevent fetching; my download queue is empty.

If I enable another project with CPU crunching, BOINC starts fetching immediately for that project. If I then disable work fetch for the other project, it starts fetching for SETI as well. But that only lasts a short time, then nothing is fetched again.

This is very annoying behaviour, and it has been going on for weeks. Letting things settle down doesn't seem to help.

I think this is a serious bug in work fetch in 7.0.25. It should try to honour the cache settings for both GPU and CPU when the system crunches on both.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8459
Credit: 48,737,922
RAC: 83,036
United Kingdom
Message 1224873 - Posted: 29 Apr 2012, 12:55:42 UTC - in response to Message 1224868.

I recommend you enable, to start with, the

<sched_op_debug>

logging option described in client configuration. That will show you, in enough detail but without excess bloating, whether your issue is the client not requesting work, or the SETI server not supplying it.
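
For reference, a minimal sketch of what enabling that flag looks like. It assumes the usual cc_config.xml layout, where log flags sit inside a <log_flags> element, and it only builds the text; where the file lives and how the client is told to re-read it depends on your installation. The helper name build_cc_config is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given log flags.

    build_cc_config is a hypothetical helper, not part of BOINC.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["sched_op_debug"])
print(config_text)
```

Save the printed text as cc_config.xml in the BOINC data directory only after checking it against the client configuration documentation for your version.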

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224878 - Posted: 29 Apr 2012, 13:05:25 UTC - in response to Message 1224873.
Last modified: 29 Apr 2012, 13:15:29 UTC

I have enabled that setting.

The issue is the client not asking for work.

It's not the "project has no work available" response I'm seeing; the client simply doesn't ask for any work.

I just asked BOINC to report finished work. It's down to the last 8 WUs running, with nothing in the CPU cache and lots in the GPU cache.

29-04-2012 15:05:48 | SETI@home | Reporting 4 completed tasks, not requesting new tasks
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:06:02 | SETI@home | Scheduler request completed

In 15 minutes, two more WUs will be finished, and the CPU will only be running on 6 out of eight cores. In 1h30m no WUs will be crunching, except on the GPU...
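
A quick way to check a longer log for the same symptom is to pull the requested seconds out of the [sched_op] lines. The sketch below assumes the line format shown in this post; other client versions may word these lines differently, and the function name work_requests is just for illustration.

```python
import re

# Matches lines such as:
#   ... | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
SCHED_OP = re.compile(
    r"\[sched_op\]\s+(?P<resource>\w+) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_text):
    """Map resource name to the seconds of work requested for it."""
    requests = {}
    for line in log_text.splitlines():
        match = SCHED_OP.search(line)
        if match:
            requests[match.group("resource")] = float(match.group("seconds"))
    return requests


sample = """\
29-04-2012 15:05:48 | SETI@home | Reporting 4 completed tasks, not requesting new tasks
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:06:02 | SETI@home | Scheduler request completed
"""

requests = work_requests(sample)
is_requesting = any(seconds > 0 for seconds in requests.values())
print(requests, "requesting work:", is_requesting)
```

If every resource shows 0.00 seconds, the client is not asking for work at all, which is a different problem from the server having none to send.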
____________

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38850
Credit: 576,946,495
RAC: 523,438
United States
Message 1224881 - Posted: 29 Apr 2012, 13:13:54 UTC

It doesn't.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224889 - Posted: 29 Apr 2012, 13:27:35 UTC
Last modified: 29 Apr 2012, 13:31:36 UTC

Now something is happening:

My CPU finished one of its WUs, and BOINC reacted to this.

29-04-2012 15:20:38 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and ATI
29-04-2012 15:20:38 | SETI@home | [sched_op] CPU work request: 2744569.84 seconds; 0.00 devices
29-04-2012 15:20:38 | SETI@home | [sched_op] ATI work request: 82425.41 seconds; 0.00 devices
29-04-2012 15:20:42 | SETI@home | Scheduler request completed: got 7 new tasks

These 7 new tasks are all for the GPU, which has plenty of work. They are all stalled in download; meanwhile the CPU has finished one more WU and is now running on only 6 cores. BOINC won't fetch as long as other WUs are downloading, and it could take some time just to get these 7 WUs downloaded.

This should not happen with a cache that's supposed to be 10 days, or whatever maximum limit is set on the number of WUs...

Why hasn't boinc requested CPU work a long time ago?
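
To put the request sizes in the log above into perspective, here is a rough conversion from seconds to days. It assumes the CPU figure is summed over all eight cores of this machine (an assumption, not something the log states); the function name is made up for the example.

```python
SECONDS_PER_DAY = 86400


def request_in_days(seconds, devices=1):
    """Convert a requested number of seconds into days per device."""
    return seconds / SECONDS_PER_DAY / devices


cpu_days = request_in_days(2744569.84)                       # about 31.8 days in total
ati_days = request_in_days(82425.41)                         # about 0.95 days
cpu_days_per_core = request_in_days(2744569.84, devices=8)   # about 4 days per core

print(round(cpu_days, 2), round(ati_days, 2), round(cpu_days_per_core, 2))
```

Under that assumption the CPU request is roughly 33 times the size of the GPU request, yet all 7 tasks returned were for the GPU.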
____________

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 23764
Credit: 32,547,472
RAC: 24,393
Germany
Message 1224890 - Posted: 29 Apr 2012, 13:32:56 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.

____________

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224891 - Posted: 29 Apr 2012, 13:34:58 UTC - in response to Message 1224890.
Last modified: 29 Apr 2012, 13:39:43 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.


I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8459
Credit: 48,737,922
RAC: 83,036
United Kingdom
Message 1224898 - Posted: 29 Apr 2012, 13:45:28 UTC - in response to Message 1224891.

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.

I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?

It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.

As Joe predicted before v7.0.25 was made the recommended version, there will be a lot of users asking this question.
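
The general idea of a hysteresis policy can be sketched in a few lines. This is a generic two-threshold illustration, not BOINC's actual work-fetch code: the names BufferPrefs, min_days and extra_days are invented here, and the sketch ignores per-resource accounting, backoff and stalled downloads.

```python
from dataclasses import dataclass


@dataclass
class BufferPrefs:
    min_days: float    # hypothetical name for the first cache setting (low-water mark)
    extra_days: float  # hypothetical name for the second cache setting (extra buffer)


def hysteresis_request(buffered_days, prefs):
    """Days of work to ask for under a simple two-threshold policy.

    Nothing is requested while the buffer is at or above the low-water mark.
    Once it drops below, one large request refills it to min_days + extra_days.
    """
    if buffered_days >= prefs.min_days:
        return 0.0
    return prefs.min_days + prefs.extra_days - buffered_days


prefs = BufferPrefs(min_days=0.5, extra_days=10.0)
print(hysteresis_request(3.0, prefs))  # above the low-water mark: no request
print(hysteresis_request(0.4, prefs))  # below it: one big request
```

The effect is fewer scheduler contacts, each asking for more work, which matches the "less often, but in larger amounts" description.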

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38850
Credit: 576,946,495
RAC: 523,438
United States
Message 1224900 - Posted: 29 Apr 2012, 13:47:51 UTC - in response to Message 1224898.
Last modified: 29 Apr 2012, 13:49:20 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.

I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?

It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.

As Joe predicted before v7.0.25 was made the recommended version, there will be a lot of users asking this question.


It's the new policy of catch as catch can.
Otherwise known as 'screw it up as much as possible'.

The latest innovation from 'whatsamatta u'.
____________

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224902 - Posted: 29 Apr 2012, 13:55:00 UTC - in response to Message 1224898.
Last modified: 29 Apr 2012, 13:59:38 UTC


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.


OK, that sort of explains it.

So setting BOINC to ask for work often will work better, because the downloads normally have a hard time getting through.

I still don't understand why the decision was made to give the GPU preference in getting work, when the CPU is asking for a LOT more work than the GPU.

My computer is getting work now, but only for the GPU. Meanwhile only 4 cores are crunching. If BOINC had requested work for the CPU only, since it needs work the most, this problem wouldn't exist.

While I was writing this, the CPU finally got work, and I can see BOINC has started 6 new tasks on the CPU, meaning it was down to crunching on 2 cores.

I still think something is wrong with the way this works.

It's not easy to set up when the system does the opposite of what you would expect.
____________

TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 461
Credit: 17,824,563
RAC: 3,342
Sweden
Message 1224904 - Posted: 29 Apr 2012, 14:01:16 UTC - in response to Message 1224898.


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.


And it does not work for me either when my highest-resource-share project has no work: the client asks it for lots of work, which it often doesn't have.
Running several projects, all on the GPU, then fills the queue/cache with tasks from the projects with lower resource share, so the work request for the highest-resource-share project doesn't work that well.

I am working around this by setting NNT (No New Tasks) once I have enough tasks for the other projects and the cache is filled to maybe half (3 days of 6).
That will keep the highest resource project asking for work more frequently.

And then I control it instead of having a program do it for me.

:)

____________
TRuEQ & TuVaLu

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8618
Credit: 23,654,093
RAC: 18,635
United Kingdom
Message 1224918 - Posted: 29 Apr 2012, 15:28:34 UTC - in response to Message 1224902.


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.


OK, that sort of explains it.

So setting BOINC to ask for work often will work better, because the downloads normally have a hard time getting through.

I still don't understand why the decision was made to give the GPU preference in getting work, when the CPU is asking for a LOT more work than the GPU.

My computer is getting work now, but only for the GPU. Meanwhile only 4 cores are crunching. If BOINC had requested work for the CPU only, since it needs work the most, this problem wouldn't exist.

While I was writing this, the CPU finally got work, and I can see BOINC has started 6 new tasks on the CPU, meaning it was down to crunching on 2 cores.

I still think something is wrong with the way this works.

It's not easy to set up when the system does the opposite of what you would expect.

BOINC just allocates to the most powerful device, usually the GPU, not recognising that, say, the 4 cores of a CPU, although each is less powerful, have a greater combined crunching capability.

Previous versions would also do this; it is not a version 7 induced bug.
I complained about this at the same time that I posted the problem of APR and outliers.

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224960 - Posted: 29 Apr 2012, 17:22:53 UTC - in response to Message 1224918.
Last modified: 29 Apr 2012, 17:24:09 UTC

Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me for about 1 day and some hours.

BOINC reports that I have a 13-day cache, but that's not correct (I have 119 WUs + 2 APs, and I crunch ~4-5 WUs an hour; APs take about half a day).

But anyway, if it fetches again before running out of work, I'll be more than happy.
____________

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3618
Credit: 48,515,086
RAC: 38,897
United States
Message 1224962 - Posted: 29 Apr 2012, 17:37:31 UTC - in response to Message 1224960.

Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me for about 1 day and some hours.

BOINC reports that I have a 13-day cache, but that's not correct (I have 119 WUs + 2 APs, and I crunch ~4-5 WUs an hour; APs take about half a day).

But anyway, if it fetches again before running out of work, I'll be more than happy.


Take your cache that BOINC reports and divide by the number of cores you have.
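
As a rough check using the numbers quoted above: the sketch assumes the 8 cores mentioned earlier in the thread, ignores the two AP tasks, and treats the 4-5 WUs an hour as the rate for the whole cache. The function names are made up for the example.

```python
def hours_from_throughput(tasks, tasks_per_hour):
    """Hours the cache lasts at a given completion rate."""
    return tasks / tasks_per_hour


def wall_clock_days(reported_days, cores):
    """Reported cache divided by the number of cores working through it."""
    return reported_days / cores


low_hours = hours_from_throughput(119, 5)   # 23.8 hours at 5 WUs an hour
high_hours = hours_from_throughput(119, 4)  # 29.75 hours at 4 WUs an hour
days_per_core = wall_clock_days(13, 8)      # 1.625 days, i.e. 39 hours

print(low_hours, high_hours, days_per_core)
```

Both estimates land at somewhere between one and two days of real time, in line with "about 1 day and some hours" rather than 13 days.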
____________

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,506,908
RAC: 2,981
Denmark
Message 1224972 - Posted: 29 Apr 2012, 17:58:53 UTC - in response to Message 1224962.
Last modified: 29 Apr 2012, 18:01:36 UTC

Correct.

But then that isn't a 10-day cache, which I have told it to maintain... :)

So it should still be requesting work, and it's not.
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12377
Credit: 6,679,244
RAC: 8,663
United States
Message 1225017 - Posted: 29 Apr 2012, 19:19:29 UTC - in response to Message 1224972.

Correct.

But then that isn't a 10-day cache, which I have told it to maintain... :)

So it should still be requesting work, and it's not.

Correct, and under the new scheduler, if any project doesn't send work, project backoff controls the cache size. As the maximum backoff is 24 hours, the average cache is now under 12 hours; essentially the setting is ignored.

If you want to cache a lot of work, the only way now is to tell BOINC it can only connect to the net once a week. I haven't tried this with the report work immediately flag to see if that gets around DA's tampering and keeps your RAC high.

____________

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38850
Credit: 576,946,495
RAC: 523,438
United States
Message 1225024 - Posted: 29 Apr 2012, 19:32:41 UTC - in response to Message 1225017.

Correct.

But then that isn't a 10-day cache, which I have told it to maintain... :)

So it should still be requesting work, and it's not.

Correct, and under the new scheduler, if any project doesn't send work, project backoff controls the cache size. As the maximum backoff is 24 hours, the average cache is now under 12 hours; essentially the setting is ignored.

If you want to cache a lot of work, the only way now is to tell BOINC it can only connect to the net once a week. I haven't tried this with the report work immediately flag to see if that gets around DA's tampering and keeps your RAC high.

Tut tut.
Tampering with DA's fine craftsmanship?

I wouldn't dare even..........
____________

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8618
Credit: 23,654,093
RAC: 18,635
United Kingdom
Message 1225092 - Posted: 29 Apr 2012, 21:19:09 UTC - in response to Message 1224972.

Correct.

But then that isn't a 10-day cache, which I have told it to maintain... :)

So it should still be requesting work, and it's not.

With 0.5/10 you have asked for a 0.5-day cache with 10 days extra. Try 10/0.5; that would be a 10-day cache with up to 0.5 days extra.

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3618
Credit: 48,515,086
RAC: 38,897
United States
Message 1225151 - Posted: 30 Apr 2012, 0:12:16 UTC - in response to Message 1225092.

Correct.

But then that isn't a 10-day cache, which I have told it to maintain... :)

So it should still be requesting work, and it's not.

With 0.5/10 you have asked for a 0.5-day cache with 10 days extra. Try 10/0.5; that would be a 10-day cache with up to 0.5 days extra.


Nope, he has 7.0.25 installed and that is what he had and was not getting work.
____________

Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 536
Credit: 1,642,324
RAC: 546
Greece
Message 1225169 - Posted: 30 Apr 2012, 1:21:57 UTC - in response to Message 1224902.

I still think something is wrong with the way this works.


That's because there is. To prove it, go back and reset your cache to 10 + 10 and then go and uncheck "Use GPU" in Seti prefs. Then go to Boinc Manager, hit update, and watch Boinc magically ask for a bunch of CPU tasks:)

And before you ask, I'm pretty sure no-one really knows why this works...


Copyright © 2014 University of California