Panic Mode On (78) Server Problems?
Author | Message |
---|---|
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Anybody got any idea whether we're being haunted by Astropulse ghosts, to the same degree? I would just stop the AP splitters. MB was chugging along just fine until the AP work entered the picture again. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Fred E. Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
The question is - is stopping the MB splitters enough, or should I ask them to stop AP as well? Numbers for AP are not as large, but I have 8 lost AP that were not sent when scheduler gave me new work instead. "Results" out in field are pretty big for MB - 10,685,000 and 137k for AP. Suggest they try to shut off both. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
One rig couldn't connect to the SETI servers and displayed the message: don't need a network connection?! After reinstalling BOINC several times, my account settings in SETI were altered?! And in Malaria and Rosetta, too. My BOOT-drive (C:) was also shared. In too many cases, it's not a good idea to run multiple projects, on 1 host. Apologies to my wingmen for a few hundred MB WUs, which could not be uploaded due to this network issue and timed out. |
fscheel Joined: 13 Apr 12 Posts: 73 Credit: 11,135,641 RAC: 0 |
As Marcel said..."Just shoot up here amongst us, one of us has got to have some relief." :) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14645 Credit: 200,643,578 RAC: 874 |
"I would just stop the AP splitters." I sympathise with that assessment of the possible trigger, but I think we've got beyond that point now. The database is horrendously bloated - well over 10.5 million tasks supposedly 'out in the field', which is 50% more than usual. The biggest problem right now isn't communications - when you get resends, they come down quite smoothly - but the slow "thinking time" response of the scheduler. I don't think simply stopping AP production on its own will free up enough scheduler and database resources to stop the timeouts and the creation of new ghosts. On the other hand, I agree stopping MB production is a drastic step, and it will impact loads of volunteers with hosts like my little one-core server - which has been plinking along exactly as designed, getting new tasks as needed (and rotating through three different projects). No surplus fat to live off there - one task in progress it says, and I can see it running now. But my gut feeling is saying, quite strongly, that recovery from this problem is going to take an outage of some sort - and the sooner we start it, the shorter it will be. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"I would just stop the AP splitters." Well, I would not be the one to second guess your intuition, Richard. You seldom are off base. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13715 Credit: 208,696,464 RAC: 304 |
There are going to be 10s of thousands of timeouts when we run out of shorties & all the VLAR resends start going to NVidia GPUs. Overnight i got a few resends on my systems, but it's still mostly timeouts. There's been the odd HTTP error, couldn't reach Scheduler, and when it was reached Project has no tasks available. But mostly it's still timeouts. Been that way pretty much since a few hours after the last weekly outage, and i notice the database queries are still over 1,000/s where less than 800/s is the usual number. Grant Darwin NT |
Highlander Joined: 5 Oct 99 Posts: 167 Credit: 37,987,668 RAC: 16 |
Regarding AP ghosts: on my host http://setiathome.berkeley.edu/show_host_detail.php?hostid=5553346 which only does AP, there are no ghost units at all so far. My other host, which is set up as MB only, has some, around 100. And only for this host I manually suspend network communication (open once a day for 2-3 hours) because I don't want to put extra load on the scheduler because of NNT. - Performance is not a simple linear function of the number of CPUs you throw at the problem. - |
Keith White Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
What I'm seeing now. This is about MB units, I don't do APs. Now that the "ready to send" queue has dried up, I've been able to get scheduler requests completed even when allowing new tasks. Not getting any new units but I would expect that with the ready queue running on vapors. However Ghost Detector shows I currently have 196 ghost units that were assigned but never downloaded, and I should be getting those eventually. All ATI, almost all shorties BTW. Problem is that's nearly half of my "in progress" units. And the vast majority of the ones I do have are shorties. I'm guessing I'm down to a couple of days worth of units locally right now. Que sera sera. "Life is just nature's way of keeping meat fresh." - The Doctor |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14645 Credit: 200,643,578 RAC: 874 |
I've just had a note back from Eric: "I've stopped the splitters and doubled the httpd timeout..." The splitters are already showing red/orange on the server status page, and 'ready to send' is as near zero as makes no difference (there'll always be a few errors and timeouts to resend). So I'm going to turn off NNT and see what happens - let's see if we can help get this beast back under control. |
fscheel Joined: 13 Apr 12 Posts: 73 Credit: 11,135,641 RAC: 0 |
"I've just had a note back from Eric:" I have NNT turned off on three machines with empty caches and lots of ghost tasks. So far all I get is "Project has no tasks available". |
Fred E. Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
I decided to try to get some of my lost tasks since the splitters are disabled and there is no new work available. However, I didn't get any of my 467 lost tasks. I got the "no tasks available" message on two attempts. Will have to see how this plays out. I'm back to NNT for now. Database queries are still high but I got fast responses on those requests - 7 and 8 seconds for work requests ain't bad. Bet Synergy would always run well without the splitting duties. Need a pool to guess how many ghosts are out there, but guess we won't see the number. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. |
rob smith Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
Lost tasks come back "automagically"..... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
"In too many cases, it's not a good idea to run multiple projects, on 1 host." What data do you have to support that assumption? And your personal experiences are too small a data point to be statistically significant. I offset them with mine of running 8 projects without having any of the problems you have. At the moment I've been out of SETI gpu work for a while, running Einstein instead. I'm down to 56 cpu APs (about 8 days worth) that are alternating with LHC and Rosetta. When the servers first started screwing up this weekend I went NNT on SETI. Call it foresight, experience, luck, whatever, but I have 0 ghosts and continuous work to do on my cruncher. When the lab boys get this little snafu fixed I'll be waiting, because I have what seems to be in short supply around here: patience. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14645 Credit: 200,643,578 RAC: 874 |
Lost tasks come back "automagically"..... Not yet, they haven't. I'm with Fred and fscheel so far - really quick turnround on requests, but always "Project has no tasks available". I'll let them keep asking, and see what happens over the next few hours. |
AllanB Joined: 2 Sep 12 Posts: 282 Credit: 425,090 RAC: 0 |
I had set NNT so to run down my cache to switch off my machine. When I looked after it had finished crunching I had 20 or so ghosts, so I unset NNT for a while to try and see if I could get them. I now appear to have 329 tasks I haven't got. This is much more than my cache is set for, will I still get them? PS I have received none so far as well. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14645 Credit: 200,643,578 RAC: 874 |
"I had set NNT so to run down my cache to switch off my machine. When I looked after it had finished crunching I had 20 or so ghosts, so I unset NNT for a while to try and see if I could get them. I now appear to have 329 tasks I haven't got." You should get them as you request work over the next few days, no more than 20 tasks per request. Your computer won't be suddenly overwhelmed with work, and if you don't get through them all in time, don't worry - they'll get sent to somebody else instead. "PS I have received none so far as well." One of my requests has got a resend now, but other machines are still dry. Never mind, the journey of a thousand WUs starts with a single crunch... |
Fred E. Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
"I had set NNT so to run down my cache to switch off my machine. When I looked after it had finished crunching I had 20 or so ghosts, so I unset NNT for a while to try and see if I could get them. I now appear to have 329 tasks I haven't got." When they start flowing again, you should get some until BOINC stops asking for work due to the cache setting. No request = no work. You may have to increase that setting, or just be patient and get them over a period of days. Edit: Richard beat me to it. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. |
Keith White Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
Well there was a dip in the cricket graphs and about then I got 20 members of my Ghost Army downloaded. However now that the cricket graphs are back up, I'm getting scheduler timeouts again as it's trying to get my ghosts and report newly done units. :( "Life is just nature's way of keeping meat fresh." - The Doctor |
AllanB Joined: 2 Sep 12 Posts: 282 Credit: 425,090 RAC: 0 |
Just got two lots of 20!! |
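
For anyone wondering how long a backlog of ghosts might take to come back, here is a rough sketch of the arithmetic behind Richard's "no more than 20 tasks per request". The limit of 20 is taken from his post; the function and constant names are made up for illustration and this is not BOINC's scheduler code. It only gives a lower bound, since it assumes every request succeeds and returns a full batch of resends, which has clearly not been the case so far.

```python
import math

# Per Richard's post: at most 20 lost tasks are resent per scheduler request.
# The constant name is hypothetical, not a BOINC setting.
MAX_RESENDS_PER_REQUEST = 20


def requests_needed(ghosts, per_request=MAX_RESENDS_PER_REQUEST):
    """Minimum number of successful scheduler requests needed to recover
    `ghosts` lost tasks, assuming each request returns a full batch."""
    if ghosts < 0 or per_request <= 0:
        raise ValueError("ghosts must be >= 0 and per_request must be > 0")
    return math.ceil(ghosts / per_request)


if __name__ == "__main__":
    # Ghost counts reported in this thread.
    for who, ghosts in [("AllanB", 329), ("Fred E.", 467), ("Keith White", 196)]:
        print(f"{who}: {ghosts} ghosts -> at least {requests_needed(ghosts)} requests")
```

So AllanB's 329 ghosts would need at least 17 successful requests, Fred's 467 at least 24, and Keith's 196 at least 10. How many days that takes depends on how often BOINC asks for work, which in turn depends on the cache setting, as Fred notes.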
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.