Message boards :
Number crunching :
it's the AP Splitter processes killing the Scheduler
tbret (Joined: 28 May 99, Posts: 3380, Credit: 296,162,071, RAC: 40)
I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait. Richard, I think you hit the nail on the head. 96GB of RAM isn't enough to keep Synergy from flogging the disks when it's running all of those processes.
juan BFP (Joined: 16 Mar 07, Posts: 9786, Credit: 572,710,851, RAC: 3,799)
[quote]I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait.[/quote] We need to celebrate: finally a light at the end of the tunnel! AP-split stopped... everything is back to normal... Seti is alive again!
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
I haven't had too much trouble reporting, but I just checked the log of one of my machines (UNIMATRIX02) and the last download was on Nov. 5 (yes, a WEEK ago), and I still have 959 ghosts on the machine. And have had since at least Nov. 7, when I got the Ghost Detector. My other machine (FERMIBOX2), which has 0 ghosts, gets at least an occasional d/l. But not enough. Is this ever going to be fixed???? Can I get rid of the ghosts (once I run down to empty except for ghosts) by doing a detach/re-attach to SETI? Or what will happen if I do that?
tbret (Joined: 28 May 99, Posts: 3380, Credit: 296,162,071, RAC: 40)
I don't know. What I do know is that the ghosts will start downloading as "resent lost task" as soon as you start downloading again. There's no need to delete the ghosts. You will get them on a subsequent download.
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
@tbret: I hope you are right. But I'm not sanguine about the prospects.
Josef W. Segur (Joined: 30 Oct 99, Posts: 4504, Credit: 1,414,761, RAC: 0)
[quote]I haven't had too much trouble reporting, but I just checked the log of one of my machines (UNIMATRIX02) and the last download was on Nov. 5 (yes, a WEEK ago), and I still have 959 ghosts on the machine. And have had since at least Nov. 7, when I got the Ghost Detector.[/quote] Your host 6750873, which hasn't gotten new work lately, is shown as having 1567 tasks in progress. If only 959 of those are ghosts, there must be 608 in your cache. That's considerably above the limits which are in effect. Once the host completes and reports enough work that the Scheduler will consider sending more, the ghosts should be resent. Consider them tasks in the bank; even if the splitters die and don't produce any for a while, those WUs are already split and available for download. IIRC a detach/reattach (aka Remove/Add with BOINC 7.0.x) would indeed change their status to "Abandoned". That action is totally separate from any consideration of whether the WUs actually were downloaded, since the first step deletes the project directory and everything in it, as well as the client_state entries for the project. Joe
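The resend mechanism Joe describes can be sketched roughly like this: the server compares the tasks it believes a host holds against the tasks the host actually reports, and re-sends the difference (the "ghosts"). This is an illustrative sketch, not the actual BOINC server code; all names and the batch size are assumptions.

```python
# Hypothetical sketch of "resend lost task" logic: the scheduler's view
# of a host's in-progress tasks, minus what the client reports holding,
# yields the ghosts to resend. Names/structures are illustrative only.

def find_ghosts(server_in_progress: set, client_reported: set) -> set:
    """Tasks the server thinks the host has, but the host never received."""
    return server_in_progress - client_reported

def build_resend_list(server_in_progress, client_reported, max_per_reply=20):
    # Resends go out in small batches, a few per scheduler contact.
    ghosts = sorted(find_ghosts(server_in_progress, client_reported))
    return ghosts[:max_per_reply]

# Example: server believes 5 tasks are out, client only holds 3.
server = {"wu_01", "wu_02", "wu_03", "wu_04", "wu_05"}
client = {"wu_01", "wu_03", "wu_05"}
print(build_resend_list(server, client))  # ['wu_02', 'wu_04']
```

Note that a detach/reattach destroys the client-side half of this comparison, which is why the ghosts would then be marked "Abandoned" rather than resent.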
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
Not true. If the server thinks I have 959 more than I actually do, it will never send me any when I actually get to 0. Right?
Slavac (Joined: 27 Apr 11, Posts: 1932, Credit: 17,952,639, RAC: 0)
[quote]I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait.[/quote] We might be fixing that shortly. The Lab has Synergy loaded down heavy here of late. Executive Director GPU Users Group Inc. - brad@gpuug.org
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14650, Credit: 200,643,578, RAC: 874)
[quote]I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait.[/quote] Ah. Then can you stress to them, please - and with some force - that there is no need to gallop through splitting the tapes for AP so fast. In the short term, like before the next fresh tape appears in the queue, they could experiment with disabling the AP splitters on Synergy, and see how Lando gets on on its own. I did suggest that myself a week ago, but they chose not to act on it.
Slavac (Joined: 27 Apr 11, Posts: 1932, Credit: 17,952,639, RAC: 0)
[quote]I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait.[/quote] Wilco. Executive Director GPU Users Group Inc. - brad@gpuug.org
Bill G (Joined: 1 Jun 01, Posts: 1282, Credit: 187,688,550, RAC: 182)
Wrong. The way it works is that what you actually have on your system is checked before a download, and if you have fewer than 200 (both your systems have CPU and GPU) it will send you more WUs, up to a total of 100 each for CPU and GPU. It will keep sending lost tasks until they are all gone. I can tell you it works that way, as I am now getting close to only 2000 ghosts on one of my computers. SETI@home classic workunits 4,019 SETI@home classic CPU time 34,348 hours
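The top-up behaviour Bill G describes can be sketched as follows. This is a simplified illustration under the assumption of a flat cap of 100 tasks per device class (the limit in effect at the time, per his post), not the real scheduler implementation.

```python
# Illustrative sketch of the per-device limit: before sending work, the
# scheduler checks how many tasks the host actually holds and tops up
# to a cap per device class (CPU and GPU). Cap value assumed from the
# thread; function and field names are hypothetical.

PER_DEVICE_LIMIT = 100

def tasks_to_send(on_board_cpu: int, on_board_gpu: int) -> dict:
    """How many tasks the host is owed, per device class."""
    return {
        "cpu": max(0, PER_DEVICE_LIMIT - on_board_cpu),
        "gpu": max(0, PER_DEVICE_LIMIT - on_board_gpu),
    }

# A host holding 40 CPU and 100 GPU tasks is owed 60 CPU tasks, 0 GPU.
print(tasks_to_send(40, 100))  # {'cpu': 60, 'gpu': 0}
```

The key point for the ghost question is that the check is against what the host *reports* holding, not against the server's (inflated) in-progress count, so a host drained of real work still qualifies for resends.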
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
@Bill G: I sure hope you are right - soon I will know for sure...
Claggy (Joined: 5 Jul 99, Posts: 4654, Credit: 47,537,079, RAC: 4)
[quote]I just reported over 1,300 tasks with a max per report of 250 (in other words, six Scheduler contacts) without a hang, a timeout, or a wait.[/quote] With the feeder now holding 200 tasks at a time, I question the wisdom of allowing 180+ tasks to be sent out in one contact, which then times out. I was getting timeouts with 80 tasks sent when the feeder was still holding 100 tasks, then having to get them resent 20 at a time (or 10 at a time at Seti Beta). Best to limit the tasks sent to something like 60, so the scheduler contacts are smaller and more likely to get through, and so lessen the database lookups. Claggy
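Claggy's suggestion amounts to capping the batch size per scheduler reply and spreading a large backlog over several small contacts. A minimal sketch, using his proposed cap of 60 (everything else here is illustrative):

```python
# Sketch of capping tasks per scheduler contact: a host owed many tasks
# gets them over several small replies instead of one huge one, so each
# reply is cheaper to build and less likely to time out.

def batches(task_ids, per_contact=60):
    """Yield successive scheduler replies, each at most per_contact tasks."""
    for i in range(0, len(task_ids), per_contact):
        yield task_ids[i:i + per_contact]

pending = ["wu_%04d" % n for n in range(185)]   # 185 tasks owed to a host
replies = list(batches(pending))
print([len(r) for r in replies])  # [60, 60, 60, 5] -> four small contacts
```

The trade-off is more scheduler contacts overall, but each one does fewer database lookups, which is exactly the bottleneck being discussed.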
juan BFP (Joined: 16 Mar 07, Posts: 9786, Credit: 572,710,851, RAC: 3,799)
Just do the right thing: keep the AP-splitters stopped and raise the limits, and everything will be OK in a few days.
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
Well, Unimatrix02 is down to 1212 In Progress, 253 On Board (959 Ghosts) now, so I should find out today what the Servers think of sending some actual WUs.
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)
@Bill G: Does look like you were right - the machine that had 959 ghosts now has 956 in progress, which means he has been getting some ghosts resent, or he'd be out of work. Let's hope SETI can keep up with his hunger for WUs - 100/day isn't going to make it - he's a GPU-only machine, and he eats about 200-250/day.
WezH (Joined: 19 Aug 99, Posts: 576, Credit: 67,033,957, RAC: 95)
And they chose not to act on it again. All AP splitters are running after maintenance. Actually, the splitters on Lando were not running after maintenance, but they are running again... Things did look good on the Cricket graphs just before the maintenance break, and crunchers were getting their work units without timeouts from the server... "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address -
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13727, Credit: 208,696,464, RAC: 304)
Due to the limits on the number of tasks, and the fact that it isn't possible to get new work & almost impossible to even report work while the AP splitters are running, I have run out of GPU work on both of my systems, will run out of CPU work on one of them in the next 40 minutes, & by the end of the day will have no work on either of my systems. Please, please, [i]please[/i] can someone let the staff know that limiting the number of tasks hasn't helped in the slightest. When it does start to help, it will only be because everyone is out of work. Until the Scheduler is fixed they need to stop all AP production & distribution. They need to fix the Scheduler problem. EDIT - this problem only started 3 (or was it 4?) weeks ago after the weekly outage. Whatever changes they made then to cause the problem, please undo them. Grant Darwin NT
rob smith (Joined: 7 Mar 03, Posts: 22189, Credit: 416,307,556, RAC: 380)
Grant, you are a long-playing record that has got stuck, and a very wrong one at that. Over the weekend there was NO AP PRODUCTION, and the servers were behaving just as badly as they are now with AP production. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13727, Credit: 208,696,464, RAC: 304)
[quote]Over the weekend there was NO AP PRODUCTION, and the servers were behaving just as bad as they are now with AP production.[/quote] Over the weekend I didn't run out of work. There were still some Scheduler timeouts, but not every request resulted in one. Overnight, it turns out, the AP splitters were cranking out work again - and every single request resulted in a timeout. It may not be the cause, but with such a high correlation there's a pretty good chance it's related. Grant Darwin NT
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.