Bla bla.

diff --git a/blog/data/2012-10-26-system-downtime.txt b/blog/data/2012-10-26-system-downtime.txt
index aa75978d166972cf95d94e2f1ea1f798c74ad0c3..77be821575d4edb1cf0951e9eaba2aeed2e01820 100644
--- a/blog/data/2012-10-26-system-downtime.txt
+++ b/blog/data/2012-10-26-system-downtime.txt
@@ -19,7 +19,7 @@ it every time.</p>
  article by <a href="http://www.skendric.com/">Stuart Kendrick</a> from
  Fred Hutchinson Cancer Research Center titled
  "<a href="https://www.usenix.org/publications/login/october-2012-volume-37-number-5/what-takes-us-down">What
-Takes Us Down</a>" (also
+Takes Us Down</a>" (longer version also
  <a href="http://www.skendric.com/problem/incident-analysis/2012-06-30/What-Takes-Us-Down.pdf">available
  from his own site</a>), where he report what he found when he
  processed the outage reports (both planned and unplanned) from the
@@ -28,10 +28,11 @@ etc etc.  The article is a good read to get some empirical data on
  what kind of problems affect a data centre, but what really inspired
  me was the kind of reporting they had put in place since 2000.<p>
  
-<p>The centre set up a mailing list, and send fairly standardised
-messages to this list when a outage was planned or when it already
-occurred.  Here is the two example from the article: First the
-unplanned outage:
+<p>The centre set up a mailing list, and started to send fairly
+standardised messages to this list when a outage was planned or when
+it already occurred, to announce the plan and get feedback on the
+assumtions on scope and user impact.  Here is the two example from the
+article: First the unplanned outage:
  
  <blockquote><pre>
  Subject:     Exchange 2003 Cluster Issues
@@ -70,9 +71,9 @@ Technician:  [xxx]
  been a bit too free form to make it easy to automatically process them
  into a database for further analysis, and I would have used ISO 8601
  dates myself to make it easier to process (in other words I would ask
-people to write '2012-06-16 06:00' instead of the start time format
-listed above).  There are also other issues with the format that could
-be improved, read the article for the details.</p>
+people to write '2012-06-16 06:00 +0000' instead of the start time
+format listed above).  There are also other issues with the format
+that could be improved, read the article for the details.</p>
  
  <p>I find the idea of standardising outage messages seem to be such a
  good idea that I would like to get it implemented here at the
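
Note on the last hunk: the new text suggests timestamps of the form
'2012-06-16 06:00 +0000'. As a minimal sketch of why that form is easy
to process automatically, the Python snippet below parses it with the
standard library only. The helper name parse_outage_start and the
format string are illustrative assumptions; neither the blog post nor
the article prescribes them.

```python
from datetime import datetime, timezone

# Assumed layout: date, 24-hour time, and a numeric UTC offset,
# matching the '2012-06-16 06:00 +0000' example in the patch.
STAMP_FORMAT = "%Y-%m-%d %H:%M %z"


def parse_outage_start(text: str) -> datetime:
    """Hypothetical helper: parse a start time and normalise it to UTC."""
    parsed = datetime.strptime(text.strip(), STAMP_FORMAT)
    # The explicit offset makes the value timezone-aware, so converting
    # to UTC is unambiguous and reports from different sites compare cleanly.
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(parse_outage_start("2012-06-16 06:00 +0000").isoformat())
```

Because the offset is part of the timestamp, two reports written in
different local times can be compared directly once both are
normalised to UTC.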