How does work fetch in 7.0.25 work?
Author | Message |
---|---|
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
I have been using 7.0.25 since it was released, and in general I have been pleased with it. It's tidier about which work is running, and it is better at finishing up jobs instead of leaving a lot of tasks hanging in an unfinished state (when running multiple projects). For the last few weeks I have been running SETI exclusively. On my non-GPU computers it works perfectly: lots of WUs cached, and the machines are always crunching. But on my two GPU machines, not so well. BOINC normally keeps lots of WUs for the GPUs, but the CPUs often run dry. My fastest computer especially is having problems. The cache is set at 10/10 days, but it still often runs dry. Right now I have 8 WUs crunching and 2 in the cache, enough for 2 hours at most. Reporting WUs back should make BOINC fetch work (and lots of it, if it should last 10 days). But BOINC responds: "Reporting 3 completed tasks, not requesting new tasks". WHY? I have been keeping an eye on it, and BOINC will run the cache completely dry, leaving the CPU with nothing to do, or working on only a few of the cores, before it starts asking for work. Sometimes it is without CPU work for many hours. It's not WUs waiting to be downloaded that prevent fetching; my download queue is empty. If I enable another project with CPU crunching, BOINC starts fetching immediately for that project, and after that it starts fetching for SETI if I disable work fetch for the other project. But that only lasts for a short time, then nothing is fetched again. This is very annoying behaviour, and it has been going on for weeks. Letting things settle down doesn't seem to help. I think this is a serious bug in work fetch in 7.0.25. It should try to honour the cache settings for both GPU and CPU when the system crunches on both. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I recommend you enable, to start with, the <sched_op_debug> logging option described in client configuration. That will show you, in enough detail but without excess bloating, whether your issue is the client not requesting work, or the SETI server not supplying it. |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
I have enabled that setting. The issue is the client not asking for work. It's not the "project has no work available" response I'm seeing; the client simply doesn't ask for any work. I just asked BOINC to report finished work. It's down to the last 8 WUs running, with nothing in the CPU cache and lots in the GPU cache.
29-04-2012 15:05:48 | SETI@home | Reporting 4 completed tasks, not requesting new tasks
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:06:02 | SETI@home | Scheduler request completed
In 15 minutes, two more WUs will be finished, and the CPU will only be running on 6 out of eight cores. In 1h30m no WUs will be crunching, except on the GPU... |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
It doesn't. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
Now something is happening: my CPU finished one of its WUs, and BOINC reacted to this.
29-04-2012 15:20:38 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and ATI
29-04-2012 15:20:38 | SETI@home | [sched_op] CPU work request: 2744569.84 seconds; 0.00 devices
29-04-2012 15:20:38 | SETI@home | [sched_op] ATI work request: 82425.41 seconds; 0.00 devices
29-04-2012 15:20:42 | SETI@home | Scheduler request completed: got 7 new tasks
These 7 new tasks are all for the GPU, which has plenty of work. They are all stalled in download; meanwhile the CPU has finished one more WU and is now only running on 6 cores. BOINC won't fetch as long as other WUs are downloading, and it could take some time just to get these 7 WUs downloaded. This should not happen with a cache that's supposed to be 10 days, or whatever max limit is set on the number of WUs..... Why didn't BOINC request CPU work a long time ago? |
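For readers comparing the two log excerpts above, here is a minimal Python sketch (not part of BOINC) that pulls the `[sched_op]` work-request figures out of such lines and converts the seconds to days. The regular expression is an assumption based only on the lines quoted in this thread, so other client versions may format them differently. The thread does not say how the client totals the request across cores, so the sketch just converts the printed number.

```python
import re

# Sample lines copied from the two posts above.
LOG = """\
29-04-2012 15:05:48 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-04-2012 15:05:48 | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
29-04-2012 15:20:38 | SETI@home | [sched_op] CPU work request: 2744569.84 seconds; 0.00 devices
29-04-2012 15:20:38 | SETI@home | [sched_op] ATI work request: 82425.41 seconds; 0.00 devices
"""

# Pattern inferred from the quoted lines only (an assumption, not a BOINC spec).
PATTERN = re.compile(
    r"^(?P<when>\S+ \S+) \| (?P<project>[^|]+?) \| \[sched_op\] "
    r"(?P<resource>\w+) work request: (?P<seconds>[\d.]+) seconds; "
    r"(?P<devices>[\d.]+) devices$"
)

def parse_requests(text):
    """Return (timestamp, project, resource, requested_days) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            days = float(m["seconds"]) / 86400.0  # seconds -> days
            rows.append((m["when"], m["project"], m["resource"], days))
    return rows

for when, project, resource, days in parse_requests(LOG):
    status = "no work requested" if days == 0 else f"{days:.1f} days requested"
    print(f"{when}  {project}  {resource:>3}: {status}")
```

Run against a saved copy of the event log, this makes it easy to spot the stretches where the CPU request stays at zero even though cores are about to go idle.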
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Probably you need to readjust your preferences. Your first setting is too high, so BOINC asks too late. If you have 10 / 0.5, change it to 0.5 / 10. With each crime and every kindness we birth our future. |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
"Probably you need to readjust your preferences." I'll try that. Why should it work? Shouldn't BOINC try to keep work for 10 days anyway? Or does it somehow control how often it will ask for work? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Probably you need to readjust your preferences." It's a new policy called "hysteresis work fetch" - the plan being to request work less often, but in larger amounts. That works well for the majority of BOINC projects, where the limiting factor is the scheduler/database complex: it doesn't work so well here, when the limiting factor is so frequently the download server/communications complex. As Joe predicted before v7.0.25 was made the recommended version, there will be a lot of users asking this question. |
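To illustrate the general idea of "less often, but in larger amounts", here is a toy two-threshold (hysteresis) buffer in Python. It is a sketch under stated assumptions, not BOINC's actual work-fetch code: `HysteresisBuffer`, `low_hours`, `high_hours` and `count_requests` are hypothetical names, the thresholds are not claimed to correspond to the two BOINC preference fields, and every request is assumed to be filled instantly and in full (which, as noted above, is exactly what often does not happen here).

```python
from dataclasses import dataclass

@dataclass
class HysteresisBuffer:
    """Toy two-threshold policy (hypothetical; not BOINC's real algorithm)."""
    low_hours: int   # stay silent while the buffer is at or above this level
    high_hours: int  # when a request is made, ask for enough to reach this level

    def request_hours(self, buffered_hours: int) -> int:
        """Hours of work to ask for, given the hours currently buffered."""
        if buffered_hours >= self.low_hours:
            return 0  # above the low-water mark: do not contact the server
        return self.high_hours - buffered_hours

def count_requests(policy: HysteresisBuffer, start_hours: int, run_hours: int) -> int:
    """Drain one hour of work per hour and count how many requests are made.

    Assumes each request is satisfied immediately and completely.
    """
    buffered, requests = start_hours, 0
    for _ in range(run_hours):
        buffered = max(0, buffered - 1)
        ask = policy.request_hours(buffered)
        if ask > 0:
            requests += 1
            buffered += ask
    return requests

# Thresholds far apart: rare, large requests.
wide = HysteresisBuffer(low_hours=12, high_hours=252)
# Thresholds close together: frequent, small top-ups.
narrow = HysteresisBuffer(low_hours=240, high_hours=252)

print(count_requests(wide, start_hours=252, run_hours=720))
print(count_requests(narrow, start_hours=252, run_hours=720))
```

Over the simulated 30 days the widely spaced thresholds contact the server only a couple of times, while the closely spaced ones top up every few hours. The toy model does not reproduce the CPU starvation reported in this thread; it only shows why spacing the thresholds apart reduces scheduler traffic.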
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"Probably you need to readjust your preferences." It's the new policy of catch as catch can. Otherwise known as 'screw it up as much as possible'. The latest innovation from 'whatsamatta u'. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
OK, that sort of explains it. So setting BOINC to ask for work often will work better, because the downloads normally have a hard time getting through. I still don't understand why the decision was made to give the GPU preference in getting work when the CPU is asking for a LOT more work than the GPU. My computer is getting work now, but only for the GPU. Meanwhile only 4 cores are crunching. If BOINC had requested work for the CPU only, because it needs work the most, this problem wouldn't exist. While I was writing this, the CPU finally got work, and I can see BOINC has started 6 new tasks on the CPU, meaning it was down to crunching on 2 cores. I still think something is wrong with the way this works. It's not easy to set up when the system does the opposite of what you would expect. |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
It does not work for me either when my highest-resource-share project asks for lots of work that it often doesn't have. Running several projects, all on GPU, then fills the queue/cache with tasks from the projects with lower resource share, so the work request for the highest-resource-share project doesn't work that well. I get around this by using NNT (No New Tasks) once I have enough tasks for the other projects and the cache is filled to maybe half (3 days of 6). That keeps the highest-resource-share project asking for work more frequently. And then I control it instead of having a program do it for me. :) TRuEQ & TuVaLu |
W-K 666 Joined: 18 May 99 Posts: 19060 Credit: 40,757,560 RAC: 67 |
BOINC just allocates to the most powerful device, usually the GPU, not recognising that although each of, say, the 4 cores of a CPU is less powerful, their combined crunching capability is greater. The previous versions would also do this; it is not a bug introduced by version 7. I complained about this at the same time that I posted the problem of APR and outliers. |
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me for about 1 day and some hours. BOINC reports that I have a 13-day cache, but that's not correct (I have 119 WUs + 2 APs, and I crunch ~4-5 WUs an hour (APs take ~half a day)). But anyway, if it fetches again before running out of work, I'll be more than happy. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"Well, setting the cache to 0.5/10 seems to have helped. I now have a large enough cache to last me for about 1 day and some hours." Take the cache that BOINC reports and divide by the number of cores you have. |
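The numbers in the last two posts can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 8 CPU cores (the "6 out of eight cores" mentioned earlier in the thread), takes the "divide by the number of cores" rule at face value, ignores the 2 AP tasks, and uses hypothetical variable and function names.

```python
CORES = 8              # assumption: the 8-core machine described earlier in the thread
REPORTED_DAYS = 13     # cache size as shown by BOINC
WUS_CACHED = 119       # multibeam WUs in the cache (the 2 AP tasks are ignored)

def wall_clock_days(reported_days: float, cores: int) -> float:
    """Apply the rule of thumb from the post above: reported cache / core count."""
    return reported_days / cores

def hours_of_work(wus: int, wus_per_hour: float) -> float:
    """How long the cached WUs last at a given whole-machine throughput."""
    return wus / wus_per_hour

print(f"Reported cache per core: {wall_clock_days(REPORTED_DAYS, CORES):.2f} days")
for rate in (4, 5):
    print(f"At {rate} WU/hour: {hours_of_work(WUS_CACHED, rate):.1f} hours")
```

Both estimates land in the same range, a little over a day, which matches the "about 1 day and some hours" observed.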
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
Correct. But then that isn't a 10-day cache, which is what I have told it to maintain... :) So it should still be requesting work, and it's not. |
Gary Charpentier Joined: 25 Dec 00 Posts: 30649 Credit: 53,134,872 RAC: 32 |
"Correct." Correct, and under the new scheduler, if any project doesn't send work, project backoff controls the cache size. As the maximum backoff is 24 hours, the average cache is now under 12 hours; essentially the setting is ignored. If you want to cache a lot of work, the only way now is to tell BOINC it can only connect to the net once a week. I haven't tried this with the report work immediately flag to see if that gets around DA's tampering and keeps your RAC high. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"Correct." Tut tut. Tampering with DA's fine craftsmanship? I wouldn't dare even.......... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
W-K 666 Joined: 18 May 99 Posts: 19060 Credit: 40,757,560 RAC: 67 |
"Correct." With 0.5/10 you have asked for a 0.5-day cache with 10 days extra. Try 10/0.5; that would be a 10-day cache with up to 0.5 day extra. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"Correct." Nope, he has 7.0.25 installed and that is what he had and was not getting work. |
shizaru Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
"I still think something is wrong with the way this works." That's because there is. To prove it, go back and reset your cache to 10 + 10, then uncheck "Use GPU" in the SETI prefs. Then go to BOINC Manager, hit Update, and watch BOINC magically ask for a bunch of CPU tasks :) And before you ask, I'm pretty sure no one really knows why this works... |