Message boards :
Number crunching :
Panic Mode On (8) Server problems
Author | Message |
---|---|
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Results out in the field" is very high also and has been going up for the past couple of hours at least. I don't think I've ever seen network traffic that high except during recovery and/or when they are backing up data offsite. There seems to be a lot of something going out. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
Not looking good- ready to send is down to 60,000 & result creation rate is down to 0.62. I'm also now unable to return completed work (system connect errors), which I expect is due to the server load. Grant Darwin NT |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Ready to send is now down to 0, so work will only be available as produced. That's bad news in the sense that many requests will return a no work available reply, but good news for getting, returning, and reporting since the Cricket graphs have dropped considerably. I had a dozen downloads which had been struggling for 2 hours getting little bits and pieces then timing out. They took off ten minutes ago and downloaded at my full dial-up rate. Joe |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
....the Cricket graphs have dropped considerably. I had a dozen downloads which had been struggling for 2 hours getting little bits and pieces then timing out. That's because the ready to send queue is now down to 0 & the splitters still aren't splitting; result creation rate is less than 2. With the workload at the time it ran out, it'd have to be around 20-25/s just to meet demand, let alone try & build up some reserves. I've suspended my network activity till things settle down again. No point hammering the servers when they've got nothing to serve... Grant Darwin NT |
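Grant's 20-25/s figure is a simple demand calculation: results returned per day divided by seconds per day. A minimal sketch, where the daily return volume is an assumed round number for illustration, not one taken from the project's status page:

```python
# Back-of-envelope check of the splitter rate needed to keep up with demand.
# RESULTS_RETURNED_PER_DAY is an assumption for illustration only.

RESULTS_RETURNED_PER_DAY = 2_000_000  # assumed demand at the time
SECONDS_PER_DAY = 86_400

required_rate = RESULTS_RETURNED_PER_DAY / SECONDS_PER_DAY
print(f"splitters must create ~{required_rate:.1f} results/s just to break even")
```

At roughly two million results a day that works out to about 23/s, which lands inside the 20-25/s range quoted above; any sustained rate below that drains whatever buffer exists.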
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
A couple of the splitters that weren't running now are, but the result creation rate is only about 10/s. Looks like there's something still gumming up the works. Grant Darwin NT |
BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
In the ongoing saga of BOINC project issues this past week, Rosetta appears to have totally lost its connectivity to the internet as of about 20 minutes ago. Nada, vapor. They (like SETI) were having a problem generating work, which seemed to have been solved yesterday, but now they're in vapor land. That makes three projects in trouble at the moment -- Einstein (which may resurface this coming week), SETI (with no work available for the past couple of days) and Rosetta. For me, it just pushes work share to functioning projects.

At the beginning of this year, my accumulated work split was: Einstein 26.7%, SETI 19.5%, World Grid 17.2%, Rosetta 13.8%, Climate 13.7%, Spinhenge 5.4%, Predictor 2.2%, Climate BBC 1.5%. My weekly split back then was: Spinhenge 28.2%, Rosetta 22.1%, Climate 17.0%, SETI 14.1%, World Grid 10.1%, Einstein 8.5%. In the interim, I added Malaria (replacing the dead Predictor project). So my cumulative credit split 8 months later: Einstein 19.4%, SETI 17.1%, Climate 15.7%, Rosetta 15.5%, World Grid 14.0%, Spinhenge 13.0%, Malaria 2.7%, Predictor 1.5%, Climate BBC 1.1%. My current weekly splits are: Spinhenge 25.2%, Malaria 19.9%, Climate 19.8%, Rosetta 14.6%, SETI 10.4%, World Grid 5.9%, Einstein 4.2%.

What drives resource share settings for me are project reliability, clean work units, and project status communication. Spinhenge has been very solid, and as a small project it's quite good at letting folks know when they have issues. Malaria is a new project for me; it has run quite well (just periodic 1-hour bumps over the past few months). Climate has occasional problems -- but mostly with getting trickles reported, and as a long-cycle project those problems resolve themselves, plus they have solid forum moderation. Rosetta has been quite solid (notwithstanding the recent issues), but has had some problems with work units that generated false positives for some AV software.
SETI is the 'big boy' project -- and the workload sometimes makes it difficult to keep it running regularly. It has (in my view) the best admin support communication, plus of course the largest (and very active) user community. But I can't run it on workstations which I don't monitor closely because of the periodic 'black hole' workunits (which particularly bother multi-core workstations, it seems). So I can only run it on the locally accessed farm. World Grid seems to run well, but it is a rather different kettle of fish -- I don't participate in its newsgroups. Then there is Einstein -- I added it as my second project years ago when SETI was having teething problems, and it ran very solidly for a long time. But over time, the switch to their mid-range work units has been less happy for me, so I've reduced its share of cycles. Also, whenever they complete one major batch of work units and move to a newer 'formulation', the transition has not been pretty. We are seeing that now -- and I may not reactivate for it until the dust settles (at a guess, in two weeks). |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
I have my prefs set to .1 and 4 days. Same here, got about 2 days worth left to go. Grant Darwin NT |
Andre Howard Send message Joined: 16 May 99 Posts: 124 Credit: 217,463,217 RAC: 0 |
I have my prefs set to .1 and 4 days. Seems to be a popular preference setting -- same here too. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Well I hope that the boyz (Eric and Matt) can squeeze some more horsepower out of the splitters tomorrow or by Tuesday's outrage at the latest. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
I have my prefs set to .1 and 4 days. Since running with a 4 day cache I think I've only run out of work twice in the last 3-4 years, and one of those times was only for a few hours. Most outages are only a day or so, but a 4 day buffer gives you enough work for these extended ones. Grant Darwin NT |
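The ".1 and 4 days" shorthand several posters use refers to the two BOINC cache preferences: "connect about every X days" and "maintain enough work for an additional Y days". A minimal sketch of how those two numbers combine into a work buffer, assuming the classic client behaviour of fetching enough work to cover their sum:

```python
# How the two BOINC cache prefs translate into a target work buffer.
# Assumes the classic client behaviour: target = connect interval + extra days.

connect_every_days = 0.1   # "Computer is connected ... about every" pref
extra_days = 4.0           # "Maintain enough work for an additional" pref
SECONDS_PER_DAY = 86_400

buffer_seconds = (connect_every_days + extra_days) * SECONDS_PER_DAY
print(f"target buffer: ~{buffer_seconds / 3600:.0f} hours of work")
```

With these settings the client aims to hold roughly 98 hours of estimated work per host, which is why a 4-day cache rides out the multi-day outages described in this thread.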
BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
It makes sense to set a higher work buffer if you're running only one or two projects. For me, a buffer of 1.5 days works OK -- when a project has problems sending work, one of the other 4 active projects tends to pick up the slack. What I did do recently is increase my 'three project' workstations to four or five as insurance. My machines still have about 2 or 3 days of work left on them, so I am not in any trouble yet. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
It makes sense to set a higher work buffer if you're running only one or two projects. For me, a buffer of 1.5 days works OK -- when a project has problems sending work, one of the other 4 active projects tends to pick up the slack. The only problem with having a work buffer when connected to more than one project is that people tend to get in a lather when BOINC starts paying back any long-term debt that's accumulated & doesn't download or process any work for their other projects for a while. The consternation & anguish can be considerable. Grant Darwin NT |
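The "long-term debt" Grant mentions is the mechanism the older BOINC client used to keep work shares fair across projects. A toy sketch of the idea, assuming the classic scheme (each project's debt grows by its fair share of elapsed time, shrinks by the CPU time it actually received, and the debts are normalised to sum to zero); the project names and numbers are illustrative only:

```python
# Toy illustration of BOINC-style long-term debt accounting.
# A project that sends no work for days accumulates positive debt,
# so when its work returns, the client favours it for a while --
# which is the "payback" behaviour that alarms users.

def update_debts(debts, shares, cpu_used, dt):
    """debts/shares/cpu_used are dicts keyed by project; dt is elapsed seconds."""
    total_share = sum(shares.values())
    for p in debts:
        # fair share of the elapsed time, minus CPU time actually granted
        debts[p] += dt * shares[p] / total_share - cpu_used.get(p, 0.0)
    mean = sum(debts.values()) / len(debts)
    return {p: d - mean for p, d in debts.items()}  # normalise to zero mean

# Two equal-share projects; "SETI" sends no work for a full day.
debts = {"SETI": 0.0, "Einstein": 0.0}
shares = {"SETI": 100, "Einstein": 100}
debts = update_debts(debts, shares, {"Einstein": 86_400.0}, dt=86_400.0)
print(debts)  # SETI ends up with positive debt, Einstein with negative
```

After one day of outage the starved project carries half a day of positive debt, so once it has work again the client runs it heavily until the debts even out.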
BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
One of the things that is different with the current lack-of-work problem here is that, like Einstein's, this one is something of a self-inflicted problem. Typically, outages here are due to various surprises -- a particular server spilling its guts, a connectivity problem, some previously unknown code problem, that sort of thing. It seems in this case that, in working on getting out the new-style work units, something didn't get 'volume tested', and so the rollout has caused a fairly extensive (several days now) problem. I'm hoping it gets resolved soon though. Einstein has gotten bitten (even worse) with their new work unit rollout -- they have been pretty much offline for a week now. I am really glad that the other projects I've attached to have been running reasonably well this past week. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
Could the storm be over? Outgoing traffic down to 68Mb/s, incoming down to 10.6Mb/s. Best of all, the ready-to-send buffer is full & staying full while the result creation rate is around 15. Grant Darwin NT |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Lookin' good to me.......at least on the surface. I am sure there are database issues that are still being dealt with..as well as AP rollout issues. But at least hosts asking for work are likely to get it now, and uploads no longer seem to be an issue. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Dorphas Send message Joined: 16 May 99 Posts: 118 Credit: 8,007,247 RAC: 0 |
grrrrrrrrrrrrrrr.. :( |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Hmmmm.......something for the boyz to look at in the morning..... The Scarecrow graphs show workunits awaiting validation steadily on the rise...... The status page shows ap_validate on bruno disabled....not sure if these are all AP WUs awaiting validation or if something else is afoot. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
Been getting occasional upload/download errors, Ok on 2nd or 3rd attempt. Had a look at the network graphs & it shows a lot of traffic spikes, mostly download but a few upload in there as well. And the overall traffic trend appears to be upward.... Grant Darwin NT |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.