<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
</description>
 </item>

 <item>
 <title>Some notes on fault tolerant storage systems</title>
 <link>http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</link>
 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</guid>
 <pubDate>Wed, 1 Nov 2017 15:35:00 +0100</pubDate>
 <description><p>If you care about how fault tolerant your storage is, you might
find these articles and papers interesting. They have shaped how I
think when designing a storage system.</p>

<ul>

<li>USENIX ;login: <a
href="https://www.usenix.org/publications/login/summer2017/ganesan">Redundancy
Does Not Imply Fault Tolerance. Analysis of Distributed Storage
Reactions to Single Errors and Corruptions</a> by Aishwarya Ganesan,
Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi
H. Arpaci-Dusseau</li>

<li>ZDNet
<a href="http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/">Why
RAID 5 stops working in 2009</a> by Robin Harris</li>

<li>ZDNet
<a href="http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/">Why
RAID 6 stops working in 2019</a> by Robin Harris</li>

<li>USENIX FAST'07
<a href="http://research.google.com/archive/disk_failures.pdf">Failure
Trends in a Large Disk Drive Population</a> by Eduardo Pinheiro,
Wolf-Dietrich Weber and Luiz André Barroso</li>

<li>USENIX ;login: <a
href="https://www.usenix.org/system/files/login/articles/hughes12-04.pdf">Data
Integrity. Finding Truth in a World of Guesses and Lies</a> by Doug
Hughes</li>

<li>USENIX FAST'08
<a href="https://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/">An
Analysis of Data Corruption in the Storage Stack</a> by
L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C.
Arpaci-Dusseau, and R. H. Arpaci-Dusseau</li>

<li>USENIX FAST'07 <a
href="https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/">Disk
failures in the real world: what does an MTTF of 1,000,000 hours mean
to you?</a> by B. Schroeder and G. A. Gibson.</li>

<li>USENIX FAST'08 <a
href="https://www.usenix.org/events/fast08/tech/full_papers/jiang/jiang_html/">Are
Disks the Dominant Contributor for Storage Failures? A Comprehensive
Study of Storage Subsystem Failure Characteristics</a> by Weihang
Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky</li>

<li>SIGMETRICS 2007
<a href="http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf">An
analysis of latent sector errors in disk drives</a> by
L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler</li>

</ul>

<p>Several of these research papers are based on data collected from
hundreds of thousands or millions of disks, and their findings are eye
opening. The short story is: do not blindly trust RAID or other
redundant storage systems. The details matter. And unfortunately there
are few options on Linux that address all the identified issues. Both
ZFS and Btrfs do a fairly good job, but each has legal and practical
issues of its own. I wonder how cluster file systems like Ceph do in
this regard. After all, there is an old saying: you know you have a
distributed system when the crash of a computer you have never heard
of stops you from getting any work done. The same holds true if fault
tolerance does not work.</p>
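<p>To make the "details matter" point concrete, here is a back-of-envelope
sketch (my own illustration, not taken from the articles) of the arithmetic
behind the "Why RAID 5 stops working" argument: with a vendor-quoted
unrecoverable read error (URE) rate of one per 10^14 bits, rebuilding a
large array means reading so much data that hitting at least one URE
becomes quite likely.</p>

```python
import math

def rebuild_failure_probability(data_read_bytes, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while
    reading data_read_bytes, modelling UREs as independent events
    (a simplification; real errors are often correlated)."""
    expected_errors = data_read_bytes * 8 * ure_per_bit
    return 1 - math.exp(-expected_errors)

# Rebuilding a 12-disk RAID 5 of 2 TB disks means reading the 11
# surviving disks in full:
p = rebuild_failure_probability(11 * 2e12)
print("%.0f%%" % (p * 100))  # roughly an 80% chance the rebuild hits a URE
```

<p>The exponential model assumes independent errors; the latent sector
error study listed above found that errors tend to cluster, so the real
picture can differ, and usually not in your favour.</p>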
<p>Just remember: in the end, it does not matter how redundant or how
fault tolerant your storage is if you do not continuously monitor its
status to detect and replace failed disks.</p>

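<p>As a minimal sketch of what such monitoring can look for (my own
illustration, assuming Linux software RAID, where /proc/mdstat shows a
status field like [UU] for a healthy mirror and [U_] when a member disk
is gone):</p>

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose /proc/mdstat status field
    contains '_', i.e. arrays running with a missing member disk."""
    failed = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r'^(md\d+)\s*:', line)
        if m:
            current = m.group(1)
        elif current and re.search(r'\[[U_]*_[U_]*\]', line):
            failed.append(current)
    return failed

# Example /proc/mdstat excerpt: md0 is healthy, md1 has lost a disk.
sample = """\
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks [2/2] [UU]
md1 : active raid1 sdb2[1]
      976630336 blocks [2/1] [U_]
"""
print(degraded_arrays(sample))  # ['md1']
```

<p>In practice a check like this belongs in a cron job or a monitoring
system, reading the real /proc/mdstat and combined with SMART health
checks, since an array can report itself clean while its member disks
are quietly accumulating errors.</p>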