it's the AP Splitter processes killing the Scheduler |
Message boards : Number crunching : it's the AP Splitter processes killing the Scheduler
| Author | Message |
|---|---|
|
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. | |
| ID: 1304755 · | |
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. We need to celebrate: finally a light at the end of the tunnel! AP splitting stopped... everything is back to working normally... Seti is alive again! | |
| ID: 1304758 · | |
|
I haven't had too much trouble reporting, but I just checked the log of one of my machines (UNIMATRIX02) and the last download was on Nov. 5 (yes, a WEEK ago), and I still have 959 ghosts on the machine. And have had since at least Nov. 7, when I got the Ghost Detector. | |
| ID: 1304799 · | |
I don't know. What I do know is that the ghosts will start downloading as "resent lost task" as soon as you start downloading again. There's no need to delete the ghosts. You will get them on a subsequent download. | |
| ID: 1304805 · | |
|
@tbret: | |
| ID: 1304806 · | |
I haven't had too much trouble reporting, but I just checked the log of one of my machines (UNIMATRIX02) and the last download was on Nov. 5 (yes, a WEEK ago), and I still have 959 ghosts on the machine. And have had since at least Nov. 7, when I got the Ghost Detector. Your host 6750873, which hasn't gotten new work lately, is shown as having 1567 tasks in progress. If only 959 of those are ghosts, there must be 608 in your cache. That's considerably above the limits which are in effect. Once the host completes and reports enough work that the Scheduler will consider sending more, the ghosts should be resent. Consider them tasks in the bank: even if the splitters die and don't produce any for a while, those WUs are already split and available for download. IIRC a detach/reattach (aka Remove/Add with BOINC 7.0.x) would indeed change their status to "Abandoned". That action is totally separate from any consideration of whether the WUs actually were downloaded, since the first step deletes the project directory and everything in it, as well as the client_state entries for the project. Joe | |
| ID: 1305091 · | |
Not true. If the server thinks I have 959 more than I actually do, it will not ever send me any when I actually get to 0. Right? | |
| ID: 1305314 · | |
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. We might be fixing that shortly. The Lab has Synergy loaded down heavy here of late. ____________ Executive Director GPU Users Group Inc. - brad@gpuug.org | |
| ID: 1305322 · | |
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. Ah. Then can you stress to them, please - and with some force - that there is no need to gallop through splitting the tapes for AP so fast. In the short term, like before the next fresh tape appears in the queue, they could experiment with disabling the AP splitters on Synergy, and see how Lando gets on on its own. I did suggest that myself a week ago, but they chose not to act on it. | |
| ID: 1305324 · | |
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. Wilco. ____________ Executive Director GPU Users Group Inc. - brad@gpuug.org | |
| ID: 1305326 · | |
Wrong. The way it works is that what you actually have on your system is checked before a download, and if you have fewer than 200 (both your systems have CPU and GPU) it will send you more WUs, up to a total of 100 each for CPU and GPU. It will keep sending lost tasks until they are all gone. I can tell you it works that way, as I am now getting close to only 2000 ghosts on one of my computers. | |
| ID: 1305327 · | |
|
@Bill G: | |
| ID: 1305341 · | |
I just reported over 1,300 tasks with a max per report of 250, (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. With the feeder now holding 200 tasks at a time, I question the wisdom of allowing 180+ tasks to be sent out in one contact that subsequently times out. I was getting timeouts with 80 tasks sent when the feeder was still holding 100 tasks, then having to get them resent 20 at a time (or 10 at a time at Seti Beta). Best to limit the tasks sent to something like 60, so the Scheduler contacts are smaller and more likely to get through, and so lessen the database lookups. Claggy | |
| ID: 1305352 · | |
|
Just do the right thing: keep the AP splitters stopped and raise the limits. Everything will be OK in a few days. | |
| ID: 1305354 · | |
|
Well, Unimatrix02 is down to 1212 In Progress, 253 On Board (959 Ghosts) now, so should find out today what the Servers think of sending some actual WUs. | |
| ID: 1305752 · | |
@Bill G: Does look like you were right - the machine that had 959 ghosts now has 956 in progress, which means he has been getting some ghosts resent, or he'd be out of work. Let's hope SETI can keep up with his hunger for WUs - 100/day isn't going to make it - he's a GPU-only machine. And he eats about 200-250/day. | |
| ID: 1305978 · | |
And they chose not to act on it again. All AP splitters are running after maintenance. Actually, the splitters on Lando were not running after maintenance, but they are running again... It did look just before the maintenance break in Cricket, and crunchers did get their work units without timeouts from the server... | |
| ID: 1306120 · | |
|
Due to the limits on the number of tasks, and the fact that it isn't possible to get new work and almost impossible to even report work while the AP splitters are running, I have run out of GPU work on both of my systems, will run out of CPU work on one of them in the next 40 minutes, and by the end of the day will have no work on either of my systems. | |
| ID: 1306164 · | |
|
Grant, you are a long-playing record that has got stuck, and a very wrong one at that. | |
| ID: 1306168 · | |
Over the weekend there was NO AP PRODUCTION, and the servers were behaving just as badly as they are now with AP production. Over the weekend I didn't run out of work. There were still some Scheduler timeouts, but not every request resulted in one. Overnight, it turns out, the AP splitters were cranking out the work again, and every single request resulted in a timeout. It may not be the cause, but with such a high correlation there's a pretty good chance it's related. ____________ Grant Darwin NT. | |
| ID: 1306169 · | |
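
For anyone trying to follow the numbers quoted in this thread, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and this is not how the BOINC Scheduler itself computes anything. It only reproduces the sums from the posts: the 250-task maximum per report, and tasks on board as "in progress" minus ghosts.

```python
import math


def scheduler_contacts(tasks_to_report: int, max_per_report: int) -> int:
    """Hypothetical helper: contacts needed when each report is capped."""
    return math.ceil(tasks_to_report / max_per_report)


def tasks_on_board(in_progress: int, ghosts: int) -> int:
    """Hypothetical helper: tasks really on the host.

    'in_progress' is what the server shows for the host; 'ghosts' are
    tasks the server assigned that never arrived on the host.
    """
    return in_progress - ghosts


# Figures taken from the posts in this thread.
print(scheduler_contacts(1300, 250))  # 1,300 tasks at 250 per report
print(tasks_on_board(1567, 959))      # host 6750873, per Joe's post
print(tasks_on_board(1212, 959))      # Unimatrix02 a few days later
```

With the figures from the posts, this gives six contacts for 1,300 tasks, 608 tasks on board for the first snapshot, and 253 for the later one, which matches what the posters worked out by hand.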
| Copyright © 2013 University of California |