Cogent down, May 7th, midnight
Author | Message |
---|---|
betonklaus Joined: 28 Feb 03 Posts: 10 Credit: 31,836,074 RAC: 19 |
However it is, keep smiling :-) Live the day, it might be your last..... |
slavko.sk Joined: 27 Jun 00 Posts: 346 Credit: 417,028 RAC: 0 |
|
MikeSW17 Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
I wonder what is going on at Cogent. By their own statistics (http://www.cogentco.com/htdocs/stats.php) their Mean Time to Repair (MTTR) nearly doubled to 4.13 hours in March (the last month for which figures are given). This outage has now lasted 14 hours. If they carry on like this, with a Mean Time Between Failures (MTBF) of just over two days (going by our recent experience), it won't be long before MTTR exceeds MTBF. The future of the internet may not be rosy if, as they claim, Cogent is in the top 10 of internet backbone providers. |
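The point about MTTR catching up with MTBF can be made concrete with the usual steady-state availability formula, A = MTBF / (MTBF + MTTR). The minimal Python sketch below assumes MTBF means the mean up-time between failures and takes "just over two days" as a hypothetical 50 hours; the 4.13-hour and 14-hour repair times are the figures from the post.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the link is up.

    Assumes mtbf_hours is the mean up-time between failures and
    mttr_hours is the mean down-time per failure.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf = 50.0  # hypothetical: "just over two days" between failures
    cases = [
        ("March MTTR", 4.13),   # Cogent's published March figure
        ("this outage", 14.0),  # length of this outage so far
        ("MTTR == MTBF", mtbf), # the point the post warns about
    ]
    for label, mttr in cases:
        print(f"{label:>14}: {availability(mtbf, mttr):.1%} up")
```

Once MTTR equals MTBF the link is up only half the time, whatever the actual MTBF is.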
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
<blockquote>Before anyone says 'Increase your cache', I have an odd requirement. Some weekends I get hold of other PCs, and I may do as I wish with them over the weekend, including running BOINC. But I only have a 1-hour download window on Friday night, and a few hours on Sunday night/Monday morning to return the work. So I d/l 3 days' work and process it, and then I cannot upload it. Also, BOINC must be uninstalled and totally deleted (including any work) by Monday morning. So it's sorry to all you chaps waiting for credit for work I've done but deleted before updating.</blockquote> You COULD just back up all the projects to a CD, each computer with its own directory, and then copy them back the next weekend, finishing them and returning the ones you couldn't do. It's more of a pain, but at least YOU won't be the problem with the unfinished units anymore. If you are a programmer you could even automate the process: make the directories by copying them off the CD onto each computer on Friday evening, then copy them back to a CD and delete the directories on Sunday night/Monday morning. It would ALMOST be like a backup process. |
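Automating the CD shuffle described above might look something like the Python sketch below. It is a hypothetical helper, not part of BOINC: the function names, the per-host directory layout under `backup_root`, and the use of the machine's hostname as the directory name are all assumptions. It should only be run while the BOINC client is shut down, and restored work still has to be returned before its deadline to be of any use.

```python
import shutil
import socket
from pathlib import Path
from typing import Optional


def _host_dir(backup_root: Path, host: Optional[str]) -> Path:
    # One sub-directory per computer, named after its hostname by default.
    return backup_root / (host or socket.gethostname())


def save_boinc_dir(boinc_dir: Path, backup_root: Path,
                   host: Optional[str] = None) -> Path:
    """Sunday night: copy the whole BOINC data directory to backup_root/<host>."""
    target = _host_dir(backup_root, host)
    if target.exists():
        shutil.rmtree(target)  # replace last weekend's copy
    shutil.copytree(boinc_dir, target)  # creates missing parent directories
    return target


def restore_boinc_dir(backup_root: Path, boinc_dir: Path,
                      host: Optional[str] = None) -> Path:
    """Friday evening: copy the saved directory back onto this computer."""
    source = _host_dir(backup_root, host)
    shutil.copytree(source, boinc_dir, dirs_exist_ok=True)  # Python 3.8+
    return boinc_dir


def wipe_boinc_dir(boinc_dir: Path) -> None:
    """Monday morning: delete every trace of the BOINC data directory."""
    shutil.rmtree(boinc_dir)
```

The weekend routine would then be `restore_boinc_dir` on Friday, and `save_boinc_dir` followed by `wipe_boinc_dir` (plus burning `backup_root` to CD) before handing the PC back.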
JRinJRZY Joined: 15 Mar 03 Posts: 15 Credit: 79,489 RAC: 0 |
<blockquote><blockquote>I spoke with Cogent at 11:30 EDT asking for any kind of update or info. Evidently the number posted on here is the network backbone operations department, and they were not too pleased that John Q. Citizen called to ask about the status of the outage. Not happy at all. </blockquote> It's not so much "leave it to the pros" as "we are not their customers". If everyone called Cogent we'd be launching a Denial of Service attack against their NOC personnel. We don't want to do that; we want them to work the problem. Besides, the outage could be on the campus side of the circuit.</blockquote> That is exactly what the Cogent technician told me when I called: if this continues, it will result in a denial-of-service attack. |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
I'm beginning to wonder what with this extended Cogent outage if some consideration and planning for an alternative link with fail-over capability might begin to make sense. I admit this might well be overkill, and it definitely would be overkill if the primary link had proven ongoing high reliability.... Multiple 12 hour plus outages in a week should not be something that passes without consideration of alternatives. |
MikeSW17 Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
True, we are not the Cogent customer. Just as importantly, we don't know what Service Level Agreement they have with Berkeley. These problems, although we perceive them as unacceptable, may be within the terms of the SLA (e.g. no weekend cover, 24-hour response rather than 4-hour, etc.). I have no idea what Cogent offers, but it is reasonable to assume that if there were any cost savings to be had, then given the limited project funds, they would have been taken if reasonable in the context of a non-commercial operation. I have to agree that we must not involve ourselves, because we are ignorant of the details. That said, after 16 hours, someone must know what the fault is even if no repair time has been set. It would be nice to know. |
MikeSW17 Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
<blockquote>I'm beginning to wonder what with this extended Cogent outage if some consideration and planning for an alternative link with fail-over capability might begin to make sense. I admit this might well be overkill, and it definitely would be overkill if the primary link had proven ongoing high reliability.... Multiple 12 hour plus outages in a week should not be something that passes without consideration of alternatives. </blockquote> I don't see that ever happening, Barry. Cogent was chosen, presumably, for their low rates. The cost of an alternate route would almost certainly exceed the cost of picking a more reliable provider in the first place. Remember this is not a commercial operation, and no-one loses anything (besides good karma) through these outages. |
Miklos M. Joined: 5 May 99 Posts: 955 Credit: 136,115,648 RAC: 73 |
<blockquote>I appreciate the desire to call Cogent and complain, but all parties involved are aware of the problem and are looking into it. That's all we know so far, and basically all we can do is wait. - Matt</blockquote> Matt, thank you for the update. Any more news on when we could be connecting again? Nick |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
<blockquote>I don't see that ever happening, Barry. Cogent was chosen, presumably, for their low rates. The cost of an alternate route would almost certainly exceed the cost of picking a more reliable provider in the first place. Remember this is not a commercial operation, and no-one loses anything (besides good karma) through these outages. </blockquote> You are probably correct, but low rates for unreliable connectivity don't amount to a bargain. |
Graeme of Boinc UK Joined: 25 Nov 02 Posts: 114 Credit: 1,250,273 RAC: 0 |
<blockquote>Just a reminder: I believe I heard somewhere that the Cogent line is donated to them, so it costs them nothing. So don't all call and piss Cogent off to the point that they pull the connection, which is fully within their rights if it is only donated and SETI is not a paying customer.</blockquote> Come on, you lot! Send an informative email; don't pick up the phone and blast the person on the end of the line! That generally gets you nowhere. SETI@home are now aware of the problem, so there is no need for any further action on our part. There is a huge difference between being made aware and complaining! Hopefully normal service will be restored soon. I do seem to remember S@H mentioning that they have to pay for bandwidth, so it may be that Cogent are not donating anything to the project. If that is the case, then I believe SETI@home has every right to take this matter up with Cogent, through the correct channels, regarding the poor service provided to a paying customer. Out of work now, so the PC is going to have a rest. I hope the hard drive does not stall when I come to restart it; it has been running constantly for almost ten months! Regards, Graeme. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
<blockquote>I'm beginning to wonder what with this extended Cogent outage if some consideration and planning for an alternative link with fail-over capability might begin to make sense.</blockquote> Backups and fail-overs are expensive, and an essential part of the BOINC design is to support "millions of users with a modest server complex", and I would not call a site that is multihomed, or that has a backup/fail-over link, modest (see http://boinc.berkeley.edu/grid_paper_04.pdf for a paper on the design goals). An essential part of that is a BOINC client that can handle an outage, or even multiple outages. ... and so far, folks with an appropriate cache get along just fine. For those running BOINC 4.3x, the client will crunch SETI exclusively for a while after the outage to "make up" lost time. That doesn't apply to SETI-only crunchers; they need to set their cache appropriately. |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
<blockquote>Backups and fail-overs are expensive, and an essential part of the BOINC design is to support "millions of users with a modest server complex" and I would not call a site that is multihomed or with a backup/fail-over link modest. An essential part of that is a BOINC client that can handle an outage, or even multiple outages. </blockquote> I understand. I don't know what a fail-over configuration of the sort designed to support thousands ('millions') of users would cost. Still, it strikes me that a design to support millions of internet users that has multiple single points of failure is one that, of course, MUST be designed to handle multiple outages. The client and its caching design do address that to some degree. I don't know what funding is available, or what sort of trade-offs are possible. I do suspect the on-site folks are by now (18 hours into an outage) in *close* contact with problem solvers at Cogent, trying to get a handle not only on the problem but also on reliable solutions. 90% reliability is not something you design for. |
ABT Chuck P Joined: 15 May 99 Posts: 91 Credit: 316,669 RAC: 0 |
<blockquote> ... and so far, folks with an appropriate cache get along just fine. For those running BOINC 4.3x the client will crunch SETI exclusively for a while after the outage to "make up" lost time. That doesn't apply to SETI-only crunchers, they need to set their cache appropriately.</blockquote> That's provided the 4.3x client will let the second (or any further) project try to connect for more work. My HT machine finished all its SETI units early in the AM. It finished one of its two Einstein units this afternoon and made no attempt to grab more Einstein work. The machine ran on one processor until that unit finished and still refused to download more Einstein work. I finally rolled back to core client 4.19 and increased the Einstein connect setting from 0.3 to 0.5 before I got 3 Einies. |
fcumglen Joined: 15 Mar 02 Posts: 14 Credit: 736,862 RAC: 0 |
<blockquote>I wonder what is going on at Cogent. By their own statistics (http://www.cogentco.com/htdocs/stats.php) their Mean Time to Repair pretty nearly doubled to 4.13 hours in March (last month that figures are given). This outage has now lasted 14 hours. If they carry on, with a MTBF of just over two days (from our recent experiences), it won't be long before MTTR exceeds MTBF. The future of the internet may not be rosy if, as they claim, Cogent is in the top 10 of internet backbone providers. </blockquote> I think we all have to ask how Cogent got to where they are now; it seems very "fishy", as we say here. |
fcumglen Joined: 15 Mar 02 Posts: 14 Credit: 736,862 RAC: 0 |
<blockquote>Here we go again then! This might be a cheaper service, but if it doesn't work... what's the point?</blockquote> Have a close look at who owns Cogent and who finances a company that only loses money! |
John Cropper Joined: 3 May 00 Posts: 444 Credit: 416,933 RAC: 0 |
<blockquote>Have a close look at who owns Cogent and who finances a company that only loses money!</blockquote> Don't blame the French; they traded some perfectly good wine and cheese to the Australian consortium that USED to own it. I just hope that Berkeley is getting a HEFTY discount for all these outages. Most commercial connectivity contracts I've been associated with have a penalty clause for outages. Stewie: So, is there any tread left on the tires? Or at this point would it be like throwing a hot dog down a hallway? Fox Sunday (US) at 9PM ET/PT |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
Pessimism mode on: I have this sinking feeling that this outage may not get worked on by the folks remaining at Cogent until sometime on Monday, resulting in the link being up either late on Monday or even later, and, as a result of a multiple day outage, seeing a 24 hour plus recovery period at Berkeley (perhaps just in time for another Cogent outage). Pessimism mode off On the other hand, this outage has provided me with a reason to set up an Einstein account. |
Captain Avatar Joined: 17 May 99 Posts: 15133 Credit: 529,088 RAC: 0 |
<blockquote>On the other hand, this outage has provided me with a reason to set up an Einstein account.</blockquote> 'Bout time, Barry! |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
Same with me: I just joined up with Einstein as well, allocating 90.91% of resources to SETI and 9.09% to Einstein. I changed my SETI cache size from 10 days to 6 and set Einstein to 3. Hopefully I set it up OK; we'll see. Jeremy KB7RZF |
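A 90.91% / 9.09% split is what you get when each project's fraction is its resource share divided by the sum of all shares. The post only gives the percentages, so the shares of 100 and 10 in this Python sketch are an assumption (any 10:1 ratio gives the same split), and `share_fractions` is a hypothetical helper, not a BOINC function.

```python
from typing import Dict


def share_fractions(shares: Dict[str, float]) -> Dict[str, float]:
    """Turn per-project resource shares into fractions of the whole."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in shares.items()}


if __name__ == "__main__":
    # Hypothetical shares: 100 for SETI@home, 10 for Einstein@Home.
    fractions = share_fractions({"SETI@home": 100, "Einstein@Home": 10})
    for name, fraction in fractions.items():
        print(f"{name}: {fraction:.2%}")
```

Running it prints `SETI@home: 90.91%` followed by `Einstein@Home: 9.09%`.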
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.