X-Git-Url: https://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/99cdd493109edaa5a3f6cb3e7b259b30725fee12..8136fa4ace9c41f8859841f7936029597a0a20bd:/blog/tags/raid/index.html diff --git a/blog/tags/raid/index.html b/blog/tags/raid/index.html index 13e4bd660b..88d5feff3c 100644 --- a/blog/tags/raid/index.html +++ b/blog/tags/raid/index.html @@ -20,6 +20,100 @@

Entries tagged "raid".

+
+
+ Some notes on fault tolerant storage systems +
+
+ 1st November 2017 +
+
+

If you care about how fault tolerant your storage is, you might +find these articles and papers interesting. They have shaped how I +think when designing a storage system.

+ + + +

Several of these research papers are based on data collected from +hundreds of thousands or millions of disks, and their findings are eye +opening. The short story is simply: do not implicitly trust RAID or +redundant storage systems. Details matter. And unfortunately there +are few options on Linux addressing all the identified issues. Both +ZFS and Btrfs are doing a fairly good job, but have legal and +practical issues of their own. I wonder how cluster file systems like +Ceph do in this regard. After all, there is an old saying: you know +you have a distributed system when the crash of a computer you have +never heard of stops you from getting any work done. The same holds +true if fault tolerance does not work.

+ +

Just remember, in the end, it does not matter how redundant or how +fault tolerant your storage is if you do not continuously monitor its +status to detect and replace failed disks.

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

+ +
+
+ + + Tags: english, raid, sysadmin. + + +
+
+
+
How to figure out which RAID disk to replace when it fail @@ -98,6 +192,67 @@ disk(s) is failing when the RAID is running short on disks.

Archive