+Title: Some notes on fault tolerant storage systems
+Tags: english, sysadmin, raid
+Date: 2017-11-01 15:30
+
+<p>If you care about how fault tolerant your storage is, you might
+find these articles and papers interesting. They have shaped how I
+think when designing a storage system.</p>
+
+<ul>
+
+<li>USENIX ;login: <a
+href="https://www.usenix.org/publications/login/summer2017/ganesan">Redundancy
+Does Not Imply Fault Tolerance: Analysis of Distributed Storage
+Reactions to Single Errors and Corruptions</a> by Aishwarya Ganesan,
+Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi
+H. Arpaci-Dusseau</li>
+
+
+<li>ZDNet
+<a href="http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/">Why
+RAID 5 stops working in 2009</a> by Robin Harris</li>
+
+<li>ZDNet
+<a href="http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/">Why
+RAID 6 stops working in 2019</a> by Robin Harris</li>
+
+<li>USENIX FAST'07
+<a href="http://research.google.com/archive/disk_failures.pdf">Failure
+Trends in a Large Disk Drive Population</a> by Eduardo Pinheiro,
+Wolf-Dietrich Weber and Luiz André Barroso</li>
+
+<li>USENIX ;login: <a
+href="https://www.usenix.org/system/files/login/articles/hughes12-04.pdf">Data
+Integrity: Finding Truth in a World of Guesses and Lies</a> by Doug
+Hughes</li>
+
+<li>USENIX FAST'08
+<a href="https://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/">An
+Analysis of Data Corruption in the Storage Stack</a> by
+L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C.
+Arpaci-Dusseau, and R. H. Arpaci-Dusseau</li>
+
+<li>USENIX FAST'07 <a
+href="https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/">Disk
+failures in the real world: what does an MTTF of 1,000,000 hours mean
+to you?</a> by B. Schroeder and G. A. Gibson</li>
+
+<li>USENIX FAST'08 <a
+href="https://www.usenix.org/events/fast08/tech/full_papers/jiang/jiang_html/">Are
+Disks the Dominant Contributor for Storage Failures? A Comprehensive
+Study of Storage Subsystem Failure Characteristics</a> by Weihang
+Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky</li>
+
+<li>SIGMETRICS 2007
+<a href="http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf">An
+analysis of latent sector errors in disk drives</a> by
+L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler</li>
+
+</ul>
+
+<p>Several of these research papers are based on data collected from
+hundreds of thousands or even millions of disks, and their findings
+are eye-opening. The short story is simple: do not implicitly trust
+RAID or redundant storage systems. Details matter.</p>