Petter Reinholdtsen

Some notes on fault tolerant storage systems
1st November 2017

If you care about how fault tolerant your storage is, you might find these articles and papers interesting. They have formed how I think of when designing a storage system.

Several of these research papers are based on data collected from hundred thousands or millions of disk, and their findings are eye opening. The short story is simply do not implicitly trust RAID or redundant storage systems. Details matter. And unfortunately there are few options on Linux addressing all the identified issues. Both ZFS and Btrfs are doing a fairly good job, but have legal and practical issues on their own. I wonder how cluster file systems like Ceph do in this regard.

Just remember, in the end, it do not matter how redundant, or how fault tolerant your storage is, if you do not continuously monitor its status to detect and replace failed disks.

Tags: english, raid, sysadmin.

Created by Chronicle v4.6