X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/155ed3e2a8fe981b6b447721a485a67be8771f2e..3882048f60f47ce7edb89cc3816978cff32551f9:/blog/data/2012-10-26-system-downtime.txt

diff --git a/blog/data/2012-10-26-system-downtime.txt b/blog/data/2012-10-26-system-downtime.txt
index bce5c87b0e..fa25a3e772 100644
--- a/blog/data/2012-10-26-system-downtime.txt
+++ b/blog/data/2012-10-26-system-downtime.txt
@@ -1,5 +1,5 @@
 Title: 12 years of outages - summarised by Stuart Kendrick
-Tags: english, nuug, standard
+Tags: english, nuug, standard, usenix
 Date: 2012-10-26 14:20
 I work at the University of Oslo
@@ -19,7 +19,7 @@ it every time.
 article by Stuart Kendrick from Fred Hutchinson Cancer Research
 Center titled "What
-Takes Us Down" (also
+Takes Us Down" (longer version also
 available from his own site), where he reports what he found when he
 processed the outage reports (both planned and unplanned) from the
@@ -28,10 +28,11 @@ etc etc.
 The article is a good read to get some empirical data on what kind
 of problems affect a data centre, but what really inspired me was the
 kind of reporting they had put in place since 2000.
-The centre set up a mailing list, and send fairly standardised
-messages to this list when a outage was planned or when it already
-occurred. Here is the two example from the article: First the
-unplanned outage:
+The centre set up a mailing list, and started to send fairly
+standardised messages to this list when an outage was planned or when
+it had already occurred, to announce the plan and get feedback on the
+assumptions on scope and user impact. Here are the two examples from the
+article. First the unplanned outage:
Subject: Exchange 2003 Cluster Issues