How does work fetch in 7.0.25 work?

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224868 - Posted: 29 Apr 2012, 12:43:37 UTC
Last modified: 29 Apr 2012, 12:44:58 UTC

I have been using 7.0.25 since it was released, and in general I have been pleased with it. It's tidier about which work is running, and it's better at finishing jobs instead of leaving a lot of tasks hanging in an unfinished state (when running multiple projects).

For the last few weeks I have been running SETI exclusively.
On my non-GPU computers it works perfectly: lots of WUs cached, and the machines are always crunching.

But on my two GPU machines, not so well. BOINC normally keeps lots of WUs for the GPUs, but the CPUs often run dry.

Especially my fastest computer is having problems. Cache is set at 10/10 days, but still it often runs dry.

Right now I have 8 WUs crunching and 2 in the cache, enough for 2 hours at most. Reporting WUs back should make BOINC fetch work (and lots of it, if it is to last 10 days).
But BOINC responds: "Reporting 3 completed tasks, not requesting new tasks".

WHY?

I have been keeping an eye on it, and BOINC will run the cache completely dry, leaving the CPU with nothing to do, or working on only a few of the cores, before it starts asking for work.

Sometimes it's without CPU work for many hours.

It's not WUs waiting to be downloaded that prevent fetching; my download queue is empty.

If I enable another project with CPU crunching, it starts fetching immediately for that project, and if I then disable work fetch for the other project, it starts fetching for SETI. But that only lasts for a short time, then nothing is fetched again.

This is very annoying behaviour, and it has been going on for weeks. Letting things settle down doesn't seem to help.

I think this is a serious bug in work fetch in 7.0.25. It should try to honour the cache settings for both GPU and CPU when the system crunches on both.
ID: 1224868
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1224873 - Posted: 29 Apr 2012, 12:55:42 UTC - in response to Message 1224868.  

I recommend you enable, to start with, the

<sched_op_debug>

logging option described in client configuration. That will show you, in enough detail but without excess bloating, whether your issue is the client not requesting work, or the SETI server not supplying it.
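
For reference, a minimal sketch of what enabling that flag looks like. It assumes the usual layout of the client's cc_config.xml (a log_flags section inside cc_config); the script only builds and prints the XML, it does not touch the BOINC data directory, and the helper name build_cc_config is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config document with the given log flags set to 1.

    Assumption: log flags live under <cc_config><log_flags>, as described
    in the BOINC client configuration documentation.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return root


config = build_cc_config(["sched_op_debug"])
ET.indent(config)  # pretty-print; needs Python 3.9 or newer
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

If the data directory already has a cc_config.xml, the flag would go into its existing log_flags section rather than replacing the file.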
ID: 1224873
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224878 - Posted: 29 Apr 2012, 13:05:25 UTC - in response to Message 1224873.  
Last modified: 29 Apr 2012, 13:15:29 UTC

I have enabled that setting.

The issue is the client not asking for work.

It's not the "project has no work available" response I'm seeing; the client simply doesn't ask for any work.

I just asked BOINC to report finished work. It's down to the last 8 WUs running, with nothing in the CPU cache and lots in the GPU cache.

29-04-2012 15:05:48 | SETI@home | Reporting 4 completed tasks, not requesting new tasks
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:06:02 | SETI@home | Scheduler request completed

In 15 minutes, two more WUs will be finished, and the CPU will only be running on 6 out of 8 cores. In 1h30m no WUs will be crunching, except on the GPU...
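
A minimal sketch of how log lines like these could be checked automatically, assuming the line layout shown in the excerpt above. The regular expression and the function name work_requests are illustrative, not part of BOINC.

```python
import re

LOG = """\
29-04-2012 15:05:48 | SETI@home | Reporting 4 completed tasks, not requesting new tasks
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:06:02 | SETI@home | Scheduler request completed
"""

# Matches e.g. "[sched_op] CPU work request: 0.00 seconds; 0.00 devices"
REQUEST_RE = re.compile(
    r"\[sched_op\] (?P<resource>\w+) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_text):
    """Return {resource: requested_seconds} for every [sched_op] request line."""
    requests = {}
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            requests[match["resource"]] = float(match["seconds"])
    return requests


requests = work_requests(LOG)
# True only if at least one resource asked for a non-zero amount of work.
client_asked = any(seconds > 0 for seconds in requests.values())
print(requests, "client asked for work:", client_asked)
```

A request of zero seconds for every resource means the client itself decided not to ask, which separates this case from the server having nothing to send.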
ID: 1224878
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1224881 - Posted: 29 Apr 2012, 13:13:54 UTC

It doesn't.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1224881
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224889 - Posted: 29 Apr 2012, 13:27:35 UTC
Last modified: 29 Apr 2012, 13:31:36 UTC

Now something is happening:

My CPU finished one of its WUs, and BOINC reacted to this.

29-04-2012 15:20:38 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and ATI
29-04-2012 15:20:38 | SETI@home | [sched_op] CPU work request: 2744569.84 seconds; 0.00 devices
29-04-2012 15:20:38 | SETI@home | [sched_op] ATI work request: 82425.41 seconds; 0.00 devices
29-04-2012 15:20:42 | SETI@home | Scheduler request completed: got 7 new tasks

These 7 new tasks are all for the GPU, which has plenty of work. They are all stalled in download; meanwhile the CPU has finished one more WU and is now only running on 6 cores. BOINC won't fetch as long as other WUs are downloading, and it could take some time just to get these 7 WUs downloaded.

This should not happen with a cache that's supposed to be 10 days, or whatever maximum limit is set on the number of WUs...

Why didn't BOINC request CPU work a long time ago?
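
To put the requested amounts in perspective, here is a small worked conversion from seconds to days. It assumes the CPU figure is summed across all cores and that the machine has the eight cores mentioned earlier in the thread; both are assumptions, not something the log states.

```python
SECONDS_PER_DAY = 86_400
CORES = 8  # assumption: the eight cores mentioned earlier in the thread

cpu_request_s = 2_744_569.84  # from the [sched_op] CPU line in the log
ati_request_s = 82_425.41     # from the [sched_op] ATI line in the log

cpu_days_total = cpu_request_s / SECONDS_PER_DAY
cpu_days_per_core = cpu_days_total / CORES  # only valid if the total is per-machine
ati_days = ati_request_s / SECONDS_PER_DAY

print(f"CPU: {cpu_days_total:.1f} core-days, {cpu_days_per_core:.1f} days per core")
print(f"ATI: {ati_days:.2f} days")
```

Under those assumptions the CPU request is roughly 32 core-days, or about 4 days per core, against a little under one day for the GPU, yet all 7 tasks sent were GPU tasks.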
ID: 1224889
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1224890 - Posted: 29 Apr 2012, 13:32:56 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.



With each crime and every kindness we birth our future.
ID: 1224890
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224891 - Posted: 29 Apr 2012, 13:34:58 UTC - in response to Message 1224890.  
Last modified: 29 Apr 2012, 13:39:43 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.


I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?
ID: 1224891
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1224898 - Posted: 29 Apr 2012, 13:45:28 UTC - in response to Message 1224891.  

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.

I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?

It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.

As Joe predicted before v7.0.25 was made the recommended version, there will be a lot of users asking this question.
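
To make the idea concrete, here is a toy sketch of a hysteresis-style fetch rule. It is not BOINC's actual code: the function name, the two thresholds, and the way the two cache preferences are mapped onto them are all assumptions made for illustration.

```python
def work_to_request(buffered_days, min_days, extra_days):
    """Toy hysteresis rule (illustrative only, not the client's real logic).

    Stay quiet while the buffer is at or above the low-water mark
    (min_days); once it drops below, ask for enough work to reach the
    high-water mark (min_days + extra_days).
    """
    if buffered_days >= min_days:
        return 0.0  # above the low-water mark: no request
    return (min_days + extra_days) - buffered_days


# With a 0.5 / 10 setting: no request at 3 days buffered, a large one at 0.4 days.
quiet = work_to_request(buffered_days=3.0, min_days=0.5, extra_days=10)
refill = work_to_request(buffered_days=0.4, min_days=0.5, extra_days=10)
print(quiet, round(refill, 2))
```

Under this simple rule a 10 / 10 setting would ask for work whenever fewer than 10 days are buffered, which is not what the logs earlier in the thread show, so the real client evidently weighs other factors as well.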
ID: 1224898
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1224900 - Posted: 29 Apr 2012, 13:47:51 UTC - in response to Message 1224898.  
Last modified: 29 Apr 2012, 13:49:20 UTC

You probably need to readjust your preferences.
Your first setting is too high, so BOINC asks too late.
If you have 10 / 0.5, change it to 0.5 / 10.

I'll try that.

Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work?

It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.

As Joe predicted before v7.0.25 was made the recommended version, there will be a lot of users asking this question.


It's the new policy of catch as catch can.
Otherwise known as 'screw it up as much as possible'.

The latest innovation from 'whatsamatta u'.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1224900
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224902 - Posted: 29 Apr 2012, 13:55:00 UTC - in response to Message 1224898.  
Last modified: 29 Apr 2012, 13:59:38 UTC


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.


OK, that sort of explains it.

So setting BOINC to ask for work often will work better, because the downloads normally have a hard time getting through.

I still don't understand why the decision was made to give the GPU preference in getting work when the CPU is asking for a LOT more work than the GPU.

My computer is getting work now, but only for the GPU. Meanwhile only 4 cores are crunching. If BOINC had requested work for the CPU only, because it needs work the most, this problem wouldn't exist.

While I was writing this, the CPU finally got work, and I can see BOINC has started 6 new tasks on the CPU, meaning it was down to crunching on 2 cores.

I still think something is wrong with the way this works.

It's not easy to set up when the system does the opposite of what you would expect.
ID: 1224902
TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1224904 - Posted: 29 Apr 2012, 14:01:16 UTC - in response to Message 1224898.  


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.

That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex.


It also doesn't work for me when my highest-resource-share project has no work: the client asks it for lots of work, which it often doesn't have.
And running several projects, all on the GPU, fills the queue/cache with tasks from the projects with lower resource share, so the request for work from the highest-share project doesn't work that well.

I work around this by using NNT (No New Tasks) once I have enough tasks for the other projects and the cache is about half full (3 days of 6).
That keeps the highest-share project asking for work more frequently.

Then I control it, instead of having a program do it for me.

:)

TRuEQ & TuVaLu
ID: 1224904
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1224918 - Posted: 29 Apr 2012, 15:28:34 UTC - in response to Message 1224902.  


It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts.


OK, that sort of explains it.

So setting BOINC to ask for work often will work better, because the downloads normally have a hard time getting through.

I still don't understand why the decision was made to give the GPU preference in getting work when the CPU is asking for a LOT more work than the GPU.

My computer is getting work now, but only for the GPU. Meanwhile only 4 cores are crunching. If BOINC had requested work for the CPU only, because it needs work the most, this problem wouldn't exist.

While I was writing this, the CPU finally got work, and I can see BOINC has started 6 new tasks on the CPU, meaning it was down to crunching on 2 cores.

I still think something is wrong with the way this works.

It's not easy to set up when the system does the opposite of what you would expect.

BOINC just allocates to the most powerful device, usually the GPU, not recognising that, say, the 4 cores of a CPU, although each is less powerful, have a greater combined crunching capability.

Previous versions would also do this; it is not a bug introduced by version 7.
I complained about this at the same time that I posted the problem of APR and outliers.
ID: 1224918
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224960 - Posted: 29 Apr 2012, 17:22:53 UTC - in response to Message 1224918.  
Last modified: 29 Apr 2012, 17:24:09 UTC

Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me about 1 day and some hours.

BOINC reports that I have a 13-day cache, but that's not correct (I have 119 WUs + 2 APs, and I crunch ~4-5 WUs an hour; an AP takes about half a day).

But anyway, if it fetches again before running out of work, I'll be more than happy.
ID: 1224960
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1224962 - Posted: 29 Apr 2012, 17:37:31 UTC - in response to Message 1224960.  

Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me about 1 day and some hours.

BOINC reports that I have a 13-day cache, but that's not correct (I have 119 WUs + 2 APs, and I crunch ~4-5 WUs an hour; an AP takes about half a day).

But anyway, if it fetches again before running out of work, I'll be more than happy.


Take your cache that BOINC reports and divide by the number of cores you have.
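
As a rough check of that rule of thumb, the sketch below works through the numbers quoted above. The core count of 8 is taken from earlier in the thread and is an assumption here, and the two APs are ignored for simplicity.

```python
CORES = 8                 # assumption: the eight-core machine described earlier
reported_cache_days = 13  # the figure BOINC reports
wus_cached = 119
wus_per_hour = (4, 5)     # stated crunch rate, low and high estimate

# Reported cache spread over all cores, in wall-clock hours.
wall_clock_hours = reported_cache_days * 24 / CORES

# Independent estimate from the stated crunch rate.
hours_low = wus_cached / wus_per_hour[1]
hours_high = wus_cached / wus_per_hour[0]

print(f"Reported cache per core: {wall_clock_hours:.1f} hours")
print(f"From crunch rate: {hours_low:.1f} to {hours_high:.1f} hours")
```

Both estimates land in the range of one to two days of wall-clock time, which fits the "about 1 day and some hours" observation far better than the reported 13 days.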

ID: 1224962
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1224972 - Posted: 29 Apr 2012, 17:58:53 UTC - in response to Message 1224962.  
Last modified: 29 Apr 2012, 18:01:36 UTC

Correct.

But then that isn't a 10-day cache, which is what I have told it to maintain... :)

So it should still be requesting work, and it's not.
ID: 1224972
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 1225017 - Posted: 29 Apr 2012, 19:19:29 UTC - in response to Message 1224972.  

Correct.

But then that isn't a 10-day cache, which is what I have told it to maintain... :)

So it should still be requesting work, and it's not.

Correct, and under the new scheduler, if any project doesn't send work, project backoff controls the cache size. As the maximum backoff is 24 hours, the average cache is now under 12 hours; essentially, the setting is ignored.

If you want to cache a lot of work, the only way now is to tell BOINC it can only connect to the net once a week. I haven't tried this with the "report work immediately" flag to see if that gets around DA's tampering and keeps your RAC high.

ID: 1225017
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1225024 - Posted: 29 Apr 2012, 19:32:41 UTC - in response to Message 1225017.  

Correct.

But then that isn't a 10-day cache, which is what I have told it to maintain... :)

So it should still be requesting work, and it's not.

Correct, and under the new scheduler, if any project doesn't send work, project backoff controls the cache size. As the maximum backoff is 24 hours, the average cache is now under 12 hours; essentially, the setting is ignored.

If you want to cache a lot of work, the only way now is to tell BOINC it can only connect to the net once a week. I haven't tried this with the "report work immediately" flag to see if that gets around DA's tampering and keeps your RAC high.

Tut tut.
Tampering with DA's fine craftsmanship?

I wouldn't dare even..........
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1225024
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1225092 - Posted: 29 Apr 2012, 21:19:09 UTC - in response to Message 1224972.  

Correct.

But then that isn't a 10-day cache, which is what I have told it to maintain... :)

So it should still be requesting work, and it's not.

With 0.5/10 you have asked for a 0.5-day cache with 10 days extra. Try 10/0.5; that would be a 10-day cache with up to 0.5 days extra.
ID: 1225092
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1225151 - Posted: 30 Apr 2012, 0:12:16 UTC - in response to Message 1225092.  

Correct.

But then that isn't a 10-day cache, which is what I have told it to maintain... :)

So it should still be requesting work, and it's not.

With 0.5/10 you have asked for a 0.5-day cache with 10 days extra. Try 10/0.5; that would be a 10-day cache with up to 0.5 days extra.


Nope, he has 7.0.25 installed and that is what he had and was not getting work.

ID: 1225151
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1225169 - Posted: 30 Apr 2012, 1:21:57 UTC - in response to Message 1224902.  

I still think something is wrong with the way this works.


That's because there is. To prove it, go back and reset your cache to 10 + 10, then uncheck "Use GPU" in your SETI prefs. Then go to BOINC Manager, hit Update, and watch BOINC magically ask for a bunch of CPU tasks :)

And before you ask, I'm pretty sure no one really knows why this works...
ID: 1225169