Cogent down, May 7th, midnight




Message boards : Number crunching : Cogent down, May 7th, midnight

betonklaus
Joined: 28 Feb 03
Posts: 10
Credit: 9,559,576
RAC: 10,175
Germany
Message 108622 - Posted: 7 May 2005, 19:46:58 UTC

Whatever happens - keep smiling :-)
____________
Live the day - it could be your last.....

slavko.sk
Joined: 27 Jun 00
Posts: 346
Credit: 408,100
RAC: 0
Slovakia
Message 108623 - Posted: 7 May 2005, 19:51:04 UTC

Connection is still down.
____________
ALL GLORY TO THE HYPNOTOAD!
Need help?

MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 108642 - Posted: 7 May 2005, 21:13:29 UTC
Last modified: 7 May 2005, 21:14:48 UTC

I wonder what is going on at Cogent.

By their own statistics (http://www.cogentco.com/htdocs/stats.php), their Mean Time to Repair (MTTR) nearly doubled to 4.13 hours in March (the last month for which figures are given).

This outage has now lasted 14 hours.

If this carries on, with a Mean Time Between Failures (MTBF) of just over two days (going by our recent experience), it won't be long before MTTR exceeds MTBF.
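A quick way to see what these numbers mean: steady-state availability is MTBF / (MTBF + MTTR). The sketch below plugs in the figures from this post, assuming an MTBF of about 48 hours (the "just over two days" estimate from our own experience, not an official Cogent figure) and treating the MTTR as the mean outage length.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the link is up,
    assuming it alternates between up periods (mean MTBF) and repairs (mean MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Assumed inputs: ~48 h MTBF (rough estimate from recent outages),
# Cogent's published March MTTR of 4.13 h, this outage's 14 h so far,
# and the point where MTTR catches up with MTBF.
MTBF_HOURS = 48.0
for mttr in (4.13, 14.0, MTBF_HOURS):
    print(f"MTTR {mttr:5.2f} h -> availability {availability(MTBF_HOURS, mttr):.1%}")
```

This prints about 92.1%, 77.4% and 50.0%: once MTTR equals MTBF, the link is down half the time.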

The future of the internet may not be rosy if, as they claim, Cogent is in the top 10 of internet backbone providers.

____________

mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 108644 - Posted: 7 May 2005, 21:39:46 UTC - in response to Message 108501.

<blockquote>Before anyone says 'Increase your cache', I have an odd requirement.
Some weekends I get hold of other PCs. I may do as I wish with them over the weekend, including running BOINC, but I only have a 1-hour download window on Friday night and a few hours on Sunday night/Monday morning to return the work. So I download 3 days' work and process it, but then I cannot upload it. Also, BOINC must be uninstalled and totally deleted (including any work) by Monday morning. So, sorry to all you chaps waiting for credit for work I've done but deleted before it could be reported.</blockquote>
You COULD just back up all the projects to a CD, each computer in its own directory, and then copy them back the next weekend to finish and return the work you couldn't before. It's more of a pain, but at least YOU won't be the problem with the unfinished units anymore.
If you are a programmer, you could even automate the process: copy the directories off the CD onto each computer on Friday evening, then copy them back to a CD and delete them on Sunday night or Monday morning.
It would ALMOST be like a backup process.
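The weekend shuttle mikey describes could be scripted roughly like this. It is only a sketch: the function names and paths are hypothetical, it assumes the BOINC client has been shut down before copying (so no files change mid-copy), and it simply mirrors the whole data directory to a staging folder or removable drive rather than doing anything BOINC-specific.

```python
import shutil
from pathlib import Path


def backup_boinc(data_dir: Path, backup_root: Path, machine: str,
                 delete_after: bool = False) -> Path:
    """Sunday night: copy the whole BOINC data directory to backup_root/<machine>.

    Assumes the BOINC client is already stopped. With delete_after=True the
    original directory is removed once the copy has finished.
    """
    target = backup_root / machine  # one directory per computer
    if target.exists():
        shutil.rmtree(target)  # replace last weekend's copy for this machine
    shutil.copytree(data_dir, target)
    if delete_after:
        shutil.rmtree(data_dir)
    return target


def restore_boinc(backup_root: Path, machine: str, data_dir: Path) -> None:
    """Friday evening: put this machine's saved directory back before starting BOINC."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_root / machine, data_dir)
```

Usage would be something like `backup_boinc(Path("C:/BOINC"), Path("E:/boinc_backups"), "pc1", delete_after=True)` on Sunday night and `restore_boinc(Path("E:/boinc_backups"), "pc1", Path("C:/BOINC"))` the next Friday (both paths hypothetical).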

____________

JRinJRZY
Joined: 15 Mar 03
Posts: 15
Credit: 79,489
RAC: 0
United States
Message 108650 - Posted: 7 May 2005, 22:00:05 UTC - in response to Message 108612.

<blockquote><blockquote>I spoke with Cogent at 11:30 EDT asking for any kind of update or info. Evidently the number posted on here is for the network backbone operations department, and they were not too pleased that John Q. Citizen called to ask about the status of the outage. Not happy at all.
</blockquote>
It's not so much "leave it to the pros" as "we are not their customers"

If everyone called Cogent we'd be launching a Denial of Service attack against their NOC personnel.

We don't want to do that, we want them to work the problem.

Besides, the outage could be on the campus side of the circuit.</blockquote>

That is exactly what the Cogent technician told me when I called: if this continues, it will amount to a denial-of-service attack.
____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,452,087
RAC: 3,265
United States
Message 108653 - Posted: 7 May 2005, 22:11:41 UTC - in response to Message 108650.

With this extended Cogent outage, I'm beginning to wonder whether some consideration and planning for an alternative link with fail-over capability might start to make sense.

I admit this might well be overkill, and it definitely would be overkill if the primary link had proven ongoing high reliability....

Multiple 12 hour plus outages in a week should not be something that passes without consideration of alternatives.
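One way to weigh the fail-over idea: if the site is offline only when every link is down at once, and link failures are independent, combined availability is 1 minus the product of each link's unavailability. Independence is an optimistic assumption (two links can still share campus equipment), and the 90% figure below is purely illustrative, not a measured number for this link.

```python
def combined_availability(*links: float) -> float:
    """Availability of redundant links with fail-over.

    Assumes failures are independent, so the site is offline only
    when every link is down at the same time.
    """
    all_down = 1.0
    for a in links:
        all_down *= 1.0 - a  # probability this particular link is down
    return 1.0 - all_down


# Illustrative: one 90%-available link versus two such links with fail-over.
print(combined_availability(0.90))
print(combined_availability(0.90, 0.90))
```

Under these assumptions, two 90% links give about 99%, which is why fail-over is tempting; whether it is worth the cost is a separate question.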

____________

MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 108661 - Posted: 7 May 2005, 22:50:25 UTC
Last modified: 7 May 2005, 22:51:07 UTC

True, we are not the Cogent customer.

Just as importantly, we don't know what Service Level Agreement (SLA) they have with Berkeley.
These problems, although we perceive them as unacceptable, may be within the terms of the SLA (e.g. no weekend cover, a 24-hour response rather than 4-hour, etc.).

I have no idea what Cogent offers, but it is reasonable to assume that if there are any cost savings to be had, then, given the limited project funds, they would have been taken where reasonable for a non-commercial operation.

I have to agree that we must not involve ourselves, because we are ignorant of the details.

That said, after 16 hours, someone must know what the fault is even if no repair time has been set. It would be nice to know.



____________

MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 108662 - Posted: 7 May 2005, 22:58:59 UTC - in response to Message 108653.

<blockquote>With this extended Cogent outage, I'm beginning to wonder whether some consideration and planning for an alternative link with fail-over capability might start to make sense.

I admit this might well be overkill, and it definitely would be overkill if the primary link had proven ongoing high reliability....

Multiple 12 hour plus outages in a week should not be something that passes without consideration of alternatives.
</blockquote>

I don't see that ever happening, Barry. Cogent was chosen, presumably, for their low rates. The costs of an alternate route would almost certainly exceed the cost of picking a more reliable provider in the first place.
Remember this is not a commercial operation, and no-one loses anything (besides good karma) through these outages.

____________

Miklos M.
Joined: 5 May 99
Posts: 769
Credit: 16,991,193
RAC: 14,496
United States
Message 108668 - Posted: 7 May 2005, 23:19:57 UTC - in response to Message 108615.

<blockquote>I appreciate the desire to call Cogent and complain, but all parties involved are aware of the problem and are looking into it. That's all we know so far, and basically all we can do is wait.

- Matt</blockquote>

Matt, thank you for the update. Any more news on when we could be connecting again?

Nick
____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,452,087
RAC: 3,265
United States
Message 108672 - Posted: 7 May 2005, 23:35:30 UTC - in response to Message 108662.

You are probably correct, but low rates for unreliable connectivity doesn't amount to a bargain.


<blockquote>
I don't see that ever happening, Barry. Cogent was chosen, presumably, for their low rates. The costs of an alternate route would almost certainly exceed the cost of picking a more reliable provider in the first place.
Remember this is not a commercial operation, and no-one loses anything (besides good karma) through these outages.
</blockquote>
____________

Graeme of Boinc UK
Joined: 25 Nov 02
Posts: 114
Credit: 1,250,273
RAC: 0
United Kingdom
Message 108673 - Posted: 7 May 2005, 23:45:20 UTC - in response to Message 108539.

<blockquote>Just a reminder: I believe I heard somewhere that the Cogent line is donated to them, so it costs them nothing. So don't all call and piss Cogent off to the point that they pull the connection, which is fully within their rights if it is only donated by them and SETI is not a paying customer.</blockquote>

Come on you lot!
Send an informative email; don't pick up the phone and blast the person on the other end of the line!
That generally gets you nowhere.
Seti@Home are now aware of the problem, so there is no need for any further action on our part.
There is a huge difference between being made aware and complaining!
Hopefully normal service will be restored soon.
I do seem to remember S@H mentioning that they have to pay for bandwidth, so it may be that Cogent are not donating anything to the project.
If this is the case, then I believe that Seti@Home has every right to take this matter up with Cogent through the correct channels of communication regarding the poor service provided to a paying customer.

Off to work now, so the PC is going to have a rest.
I hope the hard drive does not stall when I come to restart it; it has been running constantly for almost ten months!

Regards,
Graeme.

____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 108679 - Posted: 7 May 2005, 23:59:18 UTC - in response to Message 108653.

<blockquote>With this extended Cogent outage, I'm beginning to wonder whether some consideration and planning for an alternative link with fail-over capability might start to make sense.</blockquote>
Backups and fail-overs are expensive. An essential part of the BOINC design is to support "millions of users with a modest server complex", and I would not call a site that is multihomed, or that has a backup/fail-over link, modest (see http://boinc.berkeley.edu/grid_paper_04.pdf for a paper on the design goals).

An essential part of that is a BOINC client that can handle an outage, or even multiple outages.

... and so far, folks with an appropriate cache get along just fine. For those running BOINC 4.3x, the client will crunch SETI exclusively for a while after the outage to "make up" lost time.

That doesn't apply to SETI-only crunchers; they need to set their cache appropriately.
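As a back-of-the-envelope check, a cache only bridges an outage if it holds more work than the outage is long. The sketch below assumes the cache was full when the outage began and that a cache setting of N days means N × 24 hours of crunching on hand; the real BOINC client's work-fetch logic is more involved, so treat this as rough arithmetic only. The 18-hour outage length is taken from later in this thread.

```python
HOURS_PER_DAY = 24.0


def idle_hours(cache_days: float, outage_hours: float) -> float:
    """Hours a host sits without work during an outage.

    Simplification (assumed): the cache was full when the outage began,
    and one 'day' of cache equals 24 hours of crunching.
    """
    return max(0.0, outage_hours - cache_days * HOURS_PER_DAY)


def cache_covers(cache_days: float, outage_hours: float) -> bool:
    """True if a full cache outlasts the outage."""
    return idle_hours(cache_days, outage_hours) == 0.0


# Illustrative: an 18-hour outage against a few cache settings.
for days in (0.5, 1.0, 3.0):
    print(f"{days} day cache: idle {idle_hours(days, 18.0):.0f} h")
```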
____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,452,087
RAC: 3,265
United States
Message 108681 - Posted: 8 May 2005, 0:08:34 UTC - in response to Message 108679.

<blockquote>
Backups and fail-overs are expensive, and an essential part of the BOINC design is to support "millions of users with a modest server complex" and I would not call a site that is multihomed or with a backup/fail-over link modest.

An essential part of that is a BOINC client that can handle an outage, or even multiple outages.
</blockquote>

I understand. I don't know what the costs are for a fail-over configuration of the sort designed to support thousands ('millions') of users. Still, it strikes me that a design meant to support millions of internet users, yet with multiple single points of failure, is one which of course MUST be designed to handle multiple outages. The client and its caching design do address that to some degree.

I don't know what the available funding is, or what sort of trade-offs are available.

I do suspect the on-site folks are by now (18 hours into the outage) in *close* contact with problem solvers at Cogent, trying to get a handle not only on the problem but also on reliable solutions.

90% reliability is not something you design for.
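To put a number on that: at availability A, the expected downtime over a week is (1 - A) × 168 hours. A minimal sketch, using round availability figures chosen for illustration rather than anything measured:

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_downtime_hours(availability: float) -> float:
    """Expected hours offline per week at a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_WEEK


for a in (0.90, 0.99, 0.999):
    print(f"{a:.1%} available -> {weekly_downtime_hours(a):.1f} h down per week")
```

At 90% that works out to about 16.8 hours offline a week, the same order as the 12-hour-plus outages mentioned earlier in the thread.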


____________

ABT Chuck P
Volunteer tester
Joined: 15 May 99
Posts: 91
Credit: 316,669
RAC: 0
United States
Message 108686 - Posted: 8 May 2005, 0:20:23 UTC - in response to Message 108679.

<blockquote>

... and so far, folks with an appropriate cache get along just fine. For those running BOINC 4.3x the client will crunch SETI exclusively for a while after the outage to "make up" lost time.

That doesn't apply to SETI-only crunchers, they need to set their cache appropriately.</blockquote>

====================
That's provided the 4.3x client will let the second (or further) project try to connect for more work. My HT machine finished all its SETI units early in the morning. It finished one of its 2 Einstein units this afternoon and made no attempt to grab more Einstein work. The machine ran on one processor until that unit finished and still refused to download more Einstein work.

I finally rolled back to core client 4.19 and increased the Einstein connect interval from 0.3 to 0.5 before I got 3 Einstein units.
____________

fcumglen
Joined: 15 Mar 02
Posts: 14
Credit: 736,862
RAC: 0
France
Message 108698 - Posted: 8 May 2005, 0:34:44 UTC - in response to Message 108642.

<blockquote>I wonder what is going on at Cogent.

By their own statistics (http://www.cogentco.com/htdocs/stats.php), their Mean Time to Repair nearly doubled to 4.13 hours in March (the last month for which figures are given).

This outage has now lasted 14 hours.

If they carry on, with an MTBF of just over two days (from our recent experience), it won't be long before MTTR exceeds MTBF.

The future of the internet may not be rosy if, as they claim, Cogent is in the top 10 of internet backbone providers.
</blockquote>

I think we all have to ask how Cogent got to be where they are now.
It seems very "fishy", as we say here.

fcumglen
Joined: 15 Mar 02
Posts: 14
Credit: 736,862
RAC: 0
France
Message 108704 - Posted: 8 May 2005, 0:42:38 UTC - in response to Message 108438.

<blockquote>Here we go again then! This might be a cheaper service but if it doesn't work....what's the point?</blockquote>


Have a close look at who owns Cogent, and who finances a company that only loses money!!!!!!!!!!!!!!!!!

John Cropper
Joined: 3 May 00
Posts: 444
Credit: 416,933
RAC: 0
United States
Message 108716 - Posted: 8 May 2005, 1:06:42 UTC - in response to Message 108704.

<blockquote>Have a close look at who owns Cogent, and who finances a company that only loses money!!!!!!!!!!!!!!!!!</blockquote>

Don't blame the French...they traded some perfectly good wine and cheese to the Australian consortium that USED to own it.

I just hope that Berkeley is getting a HEFTY discount for all these outages. Most commercial connectivity contracts I've been associated with have a penalty clause for outages.
____________

Stewie: So, is there any tread left on the tires? Or at this point would it be like throwing a hot dog down a hallway?

Fox Sunday (US) at 9PM ET/PT

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,452,087
RAC: 3,265
United States
Message 108726 - Posted: 8 May 2005, 1:23:23 UTC - in response to Message 108716.

Pessimism mode on:

I have a sinking feeling that the folks remaining at Cogent may not work on this outage until sometime on Monday, so the link won't be up until late Monday or even later. After a multi-day outage, we would then see a recovery period of 24 hours or more at Berkeley (perhaps just in time for another Cogent outage).

Pessimism mode off

On the other hand, this outage has provided me with a reason to set up an Einstein account.
____________

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 516,716
RAC: 0
United States
Message 108740 - Posted: 8 May 2005, 1:50:17 UTC - in response to Message 108726.



<blockquote>On the other hand, this outage has provided me with a reason to set up an Einstein account.</blockquote>

'Bout time, Barry!
____________

KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9464
Credit: 3,112,738
RAC: 10
United States
Message 108743 - Posted: 8 May 2005, 1:56:14 UTC

Same here: I just joined Einstein as well, and allocated 90.91% of resources to SETI and 9.09% to Einstein. I changed my cache size for SETI from 10 days to 6 and set Einstein to 3. Hopefully I set it up OK; we'll see.
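Percentages like 90.91% / 9.09% come from BOINC's resource shares: each project gets its share divided by the sum of all shares. Jeremy doesn't say which share values he entered; any 10:1 ratio (for example 100 and 10, assumed here) produces exactly these figures.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Each project's fraction of CPU time: its share divided by the total of all shares."""
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


# Hypothetical share values in a 10:1 ratio.
for name, frac in share_fractions({"SETI@home": 100, "Einstein@Home": 10}).items():
    print(f"{name}: {frac:.2%}")
```

This prints 90.91% for SETI@home and 9.09% for Einstein@Home; only the ratio matters, so 10 and 1 would give the same split.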

Jeremy
KB7RZF
____________



Copyright © 2014 University of California