X-Git-Url: https://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/17a7cc8c1a60f7341f92e0f757b83700189144a7..53083125605a1321aafb179d30cb62de65f67094:/blog/tags/sysadmin/index.html diff --git a/blog/tags/sysadmin/index.html b/blog/tags/sysadmin/index.html index 3735b9cd85..4ba881fc4e 100644 --- a/blog/tags/sysadmin/index.html +++ b/blog/tags/sysadmin/index.html @@ -20,6 +20,100 @@

Entries tagged "sysadmin".

+
+
+ Some notes on fault tolerant storage systems +
+
+ 1st November 2017 +
+
+

If you care about how fault tolerant your storage is, you might +find these articles and papers interesting. They have shaped how I +think when designing a storage system.

+ + + +

Several of these research papers are based on data collected from +hundreds of thousands or millions of disks, and their findings are eye +opening. The short story is simply: do not implicitly trust RAID or +redundant storage systems. Details matter. And unfortunately there +are few options on Linux addressing all the identified issues. Both +ZFS and Btrfs are doing a fairly good job, but have legal and +practical issues of their own. I wonder how cluster file systems like +Ceph do in this regard. After all, there is an old saying, you know +you have a distributed system when the crash of a computer you have +never heard of stops you from getting any work done. The same holds +true if fault tolerance does not work.

+ +

Just remember, in the end, it does not matter how redundant, or how +fault tolerant your storage is, if you do not continuously monitor its +status to detect and replace failed disks.

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

+ +
+
+ + + Tags: english, raid, sysadmin. + + +
+
+
+
Detecting NFS hangs on Linux without hanging yourself... @@ -330,6 +424,19 @@ and DHCP updates I wanted to do.

Archive