
[homepage.git] / blog / data / 2012-10-26-system-downtime.txt
diff --git a/blog/data/2012-10-26-system-downtime.txt b/blog/data/2012-10-26-system-downtime.txt
index 821c05bb5666ccfcd875f7c2feb76f9507c0a754..fa25a3e772f77819d66780cb2abfe16220aa05f1 100644
--- a/blog/data/2012-10-26-system-downtime.txt
+++ b/blog/data/2012-10-26-system-downtime.txt
@@ -1,29 +1,26 @@
  Title: 12 years of outages - summarised by Stuart Kendrick
-Tags: english, nuug, standard
-Date: 2012-10-26 10:20
+Tags: english, nuug, standard, usenix
+Date: 2012-10-26 14:20
  
-<p>I work at the <ahref="http://www.uio.no/">University of Oslo</a>
+<p>I work at the <a href="http://www.uio.no/">University of Oslo</a>
  looking after the computers, mostly on the unix side, but in general
  all over the place.  I am also a member (and currently leader) of
-<ahref="http://www.nuug.no/">the NUUG association</a>, which in turn
-make me a member of <ahref="http://www.usenix.org/">USENIX</a>.  NUUG
+<a href="http://www.nuug.no/">the NUUG association</a>, which in turn
+makes me a member of <a href="http://www.usenix.org/">USENIX</a>.  NUUG
  is a member organisation for us in Norway interested in free
  software, open standards and unix-like operating systems, and USENIX
-is a US based member organisation with similar targets.  I tend to
-distill it down to the simple statement that all the skilled computer
-people are members of NUUG, which while a goal is still not quite
-reflected in reality.  And thanks to these memberships, I get all
-issues of the great USENIX magazine
-<ahref="https://www.usenix.org/publications/login">;login:</a> in the
+is a US based member organisation with similar targets.  And thanks to
+these memberships, I get all issues of the great USENIX magazine
+<a href="https://www.usenix.org/publications/login">;login:</a> in the
  mail several times a year.  The magazine is great, and I read most of
  it every time.</p>
  
  <p>In the last issue of the USENIX magazine ;login:, there is an
-article by <ahref="http://www.skendric.com/">Stuart Kendrick</a> from
+article by <a href="http://www.skendric.com/">Stuart Kendrick</a> from
  Fred Hutchinson Cancer Research Center titled
-<ahref="https://www.usenix.org/publications/login/october-2012-volume-37-number-5/what-takes-us-down">What
-Takes Us Down</a> (also
-<ahref="http://www.skendric.com/problem/incident-analysis/2012-06-30/What-Takes-Us-Down.pdf">available
+"<a href="https://www.usenix.org/publications/login/october-2012-volume-37-number-5/what-takes-us-down">What
+Takes Us Down</a>" (longer version also
+<a href="http://www.skendric.com/problem/incident-analysis/2012-06-30/What-Takes-Us-Down.pdf">available
  from his own site</a>), where he reports what he found when he
  processed the outage reports (both planned and unplanned) from the
  last twelve years and classified them according to cause, time of day,
@@ -31,10 +28,11 @@ etc etc.  The article is a good read to get some empirical data on
  what kind of problems affect a data centre, but what really inspired
  me was the kind of reporting they had put in place since 2000.</p>
  
-<p>The centre set up a mailing list, and send fairly standardised
-messages to this list when a outage was planned or when it already
-occurred.  Here is the two example from the article: First the
-unplanned outage:
+<p>The centre set up a mailing list, and started to send fairly
+standardised messages to this list when an outage was planned or when
+it had already occurred, to announce the plan and get feedback on the
+assumptions on scope and user impact.  Here are the two examples from the
+article: First the unplanned outage:
  
  <blockquote><pre>
  Subject:     Exchange 2003 Cluster Issues
@@ -67,20 +65,20 @@ User Impact: All users on H2 will be isolated from the network during
              this work. Afterward, they will have gigabit
              connectivity.
  Technician:  [xxx]
-<blockquote><pre>
+</pre></blockquote>
  
  <p>He notes in his article that the date formats and other fields have
  been a bit too free form to make it easy to automatically process them
  into a database for further analysis, and I would have used ISO 8601
  dates myself to make it easier to process (in other words I would ask
-people to write '2012-06-16 06:00' instead of the start time format
-listed above).  There are also other issues with the format that could
-be improved, read the article for the details.</p>
+people to write '2012-06-16 06:00 +0000' instead of the start time
+format listed above).  There are also other issues with the format
+that could be improved; read the article for the details.</p>
  
  <p>I find standardising outage messages to be such a
  good idea that I would like to get it implemented here at the
  university too.  We do register
-<ahref="http://www.uio.no/tjenester/it/aktuelt/planlagte-tjenesteavbrudd/">planned
+<a href="http://www.uio.no/tjenester/it/aktuelt/planlagte-tjenesteavbrudd/">planned
  changes and outages in a calendar</a>, and report them to a mailing
  list, but we do not do so in a structured format and there is not a
  report to the same location for unplanned outages.  Perhaps something