Title: s3ql, a locally mounted cloud file system - nice free software
Tags: english, debian, personvern, sikkerhet
<p>For a while now, I have been looking for a sensible off site backup
solution to use at home.  My requirements are that it must be cheap
and locally encrypted (in other words, I keep the keys, and the
storage provider does not have access to my private files).  One idea
my friends and I have had over the years has been to use Google mail
as storage, writing a Linux block device storing blocks as emails in
the mail service provided by Google, and thus get heaps of free space.
On top of this one can add encryption, RAID and volume management to
have lots of (fairly slow, I admit that) cheap and encrypted storage.
But I never found the time to implement such a system.  Then, the last
few weeks, I have looked at a system called
<a href="https://bitbucket.org/nikratio/s3ql/">S3QL</a>, a locally
mounted network backed file system with the features I need.</p>
<p>S3QL is a fuse file system with a local cache and cloud storage,
handling several different storage providers, any with an Amazon S3,
Google Drive or OpenStack API.  There are heaps of such providers.  It
can also use a local directory as storage, which combined with sshfs
allows for file storage on any ssh server.  S3QL includes support for
encryption, compression, de-duplication, snapshots and immutable file
systems, allowing me to mount the remote storage as a local mount
point, look at and use the files as if they were local, while the
content is stored in the cloud as well.  This allows me to have a
backup that should survive a fire.  The file system can not be shared
between several machines at the same time, as only one can mount it at
a time, but any machine with the encryption key and access to the
storage service can mount it if it is unmounted.</p>
<p>It is simple to use.  I'm using it on Debian Wheezy, where the
package is already included.  So to get started, run <tt>apt-get
install s3ql</tt>.  Next, pick a storage provider.  I ended up picking
Greenqloud, after reading their nice recipe on
<a href="https://greenqloud.zendesk.com/entries/44611757-How-To-Use-S3QL-to-mount-a-StorageQloud-bucket-on-Debian-Wheezy">how
to use s3ql with their Amazon S3 service</a>, because I trust the laws
in Iceland more than those in the USA when it comes to keeping my data
safe and private, and thus would rather spend money on a company in
Iceland.  Another nice recipe is available in the article
<a href="http://www.admin-magazine.com/HPC/Articles/HPC-Cloud-Storage">S3QL
Filesystem for HPC Storage</a> by Jeff Layton in the HPC section of
Admin magazine.  When the provider is picked, figure out how to get
the API key needed to connect to the storage API.  With Greenqloud,
the key did not show up until I had added payment details to my
account.</p>
<p>Armed with the API access details, it is time to create the file
system.  First, create a new bucket in the cloud.  This bucket is the
file system storage area.  I picked a bucket name reflecting the
machine that was going to store data there, but any name will do.
I'll refer to it as <tt>bucket-name</tt> below.  In addition, one
needs the API login and password, and a locally created password.
Store it all in ~root/.s3ql/authinfo2 like this:</p>
<p><blockquote><pre>
[s3c]
storage-url: s3c://s.greenqloud.com:443/bucket-name
backend-login: API-login
backend-password: API-password
fs-passphrase: local-password
</pre></blockquote></p>
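<p>Note that the authinfo2 file contains both the API credentials and
the local passphrase in clear text, so make sure it is only readable
by root, for example like this:</p>

<p><blockquote><pre>
# chmod 600 /root/.s3ql/authinfo2
</pre></blockquote></p>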
<p>I create my local passphrase using <tt>pwgen 50</tt> or similar,
but any sensible way to create a fairly random password should do.
Armed with these details, it is now time to run mkfs, entering the API
details and password to create it:</p>
<p><blockquote><pre>
# mkdir -m 700 /var/lib/s3ql-cache
# mkfs.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl s3c://s.greenqloud.com:443/bucket-name
Enter backend password:
Before using S3QL, make sure to read the user's guide, especially
the 'Important Rules to Avoid Loosing Data' section.
Enter encryption password:
Confirm encryption password:
Generating random encryption key...
Creating metadata tables...
Compressing and uploading metadata...
Wrote 0.00 MB of compressed metadata.
# </pre></blockquote></p>
<p>The next step is mounting the file system to make the storage
available.</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 4 upload threads.
Downloading and decompressing metadata...
Mounting filesystem...
# df -h /s3ql
Filesystem                              Size  Used Avail Use% Mounted on
s3c://s.greenqloud.com:443/bucket-name  1.0T     0  1.0T   0% /s3ql
</pre></blockquote></p>
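<p>The default cache is fairly small.  If I read the mount.s3ql
documentation correctly, a larger cache can be requested with the
<tt>--cachesize</tt> option, which takes a value in KiB.  Something
like this should give a 1 GiB cache (a sketch only, adjust it to your
disk space and usage pattern):</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
  --ssl --allow-root --cachesize 1048576 \
  s3c://s.greenqloud.com:443/bucket-name /s3ql
</pre></blockquote></p>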
<p>The file system is now ready for use.  I use rsync to store my
backups in it, and as the metadata used by rsync is downloaded at
mount time, no network traffic (and storage cost) is triggered by
running rsync.  To unmount, one should not use the normal umount
command, as this will not flush the cache to the cloud storage, but
instead run the umount.s3ql command like this:</p>

<p><blockquote><pre>
# umount.s3ql /s3ql
</pre></blockquote></p>
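<p>A sketch of what such an rsync run can look like (the source and
target directories are just examples):</p>

<p><blockquote><pre>
# rsync -aH --delete /home/ /s3ql/backup/home/
</pre></blockquote></p>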
<p>There is a fsck command available to check the file system and
correct any problems detected.  This can be used if the local server
crashes while the file system is mounted, to reset the "already
mounted" flag.  This is what it looks like when processing a working
file system:</p>

<p><blockquote><pre>
# fsck.s3ql --force --ssl s3c://s.greenqloud.com:443/bucket-name
Using cached metadata.
File system seems clean, checking anyway.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 5000 objects so far..
..processed 10000 objects so far..
..processed 15000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Backing up old metadata...
Compressing and uploading metadata...
Wrote 0.89 MB of compressed metadata.
</pre></blockquote></p>
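<p>If the cache directory or authinfo2 file is not in the default
location, I assume the same <tt>--cachedir</tt> and <tt>--authfile</tt>
options used with mkfs.s3ql and mount.s3ql above can be given to
fsck.s3ql as well, something like this:</p>

<p><blockquote><pre>
# fsck.s3ql --force --ssl --cachedir /var/lib/s3ql-cache \
  --authfile /root/.s3ql/authinfo2 s3c://s.greenqloud.com:443/bucket-name
</pre></blockquote></p>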
<p>Thanks to the cache, working on files that fit in the cache is very
quick, about the same speed as local file access.  Uploading large
amounts of data is for me limited by the bandwidth out of and into my
house.  Uploading 685 MiB with a 100 MiB cache gave me 305 kiB/s,
which is very close to my upload speed, and downloading the same
Debian installation ISO gave me 610 kiB/s, close to my download speed.
Both were measured using <tt>dd</tt>.  So for me, the bottleneck is my
network, not the file system code.</p>
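<p>For those curious how this can be measured with dd, here is a rough
sketch of the kind of commands I am talking about (the ISO file name
is just an example):</p>

<p><blockquote><pre>
# dd if=debian-installer.iso of=/s3ql/test.iso bs=1M
# dd if=/s3ql/test.iso of=/dev/null bs=1M
</pre></blockquote></p>

<p>dd reports the transfer rate when it is done, and as the file is
larger than the cache, the second run has to fetch most of the content
from the cloud storage.</p>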
<p>I mentioned that only one machine can mount the file system at a
time.  If another machine tries, it is told that the file system is
still mounted elsewhere:</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 8 upload threads.
Backend reports that fs is still mounted elsewhere, aborting.
</pre></blockquote></p>
<p>The file content is uploaded when the cache is full, while the
metadata is uploaded once every 24 hours by default.  To ensure the
file system content is flushed to the cloud, one can either umount the
file system, or ask s3ql to flush the cache and metadata using the
s3qlctrl command, like this:</p>

<p><blockquote><pre>
# s3qlctrl upload-meta /s3ql
# s3qlctrl flushcache /s3ql
</pre></blockquote></p>
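<p>If you want the content pushed to the cloud more often than the
cache size and the 24 hour metadata interval dictate, the flush can be
run from cron.  A sketch of a /etc/cron.d/ entry doing this every
night (assuming the file system is mounted on /s3ql):</p>

<p><blockquote><pre>
# Flush the S3QL cache and upload fresh metadata every night at 03:30
30 3 * * * root /usr/bin/s3qlctrl flushcache /s3ql &amp;&amp; /usr/bin/s3qlctrl upload-meta /s3ql
</pre></blockquote></p>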
<p>If you are curious about how much space your data uses in the
cloud, and how much compression and deduplication cut down on the
storage usage, you can use s3qlstat on the mounted file system to get
a report like this:</p>

<p><blockquote><pre>
# s3qlstat /s3ql
Directory entries: 9141
Total data size: 22049.38 MB
After de-duplication: 21955.46 MB (99.57% of total)
After compression: 21877.28 MB (99.22% of total, 99.64% of de-duplicated)
Database size: 2.39 MB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in cache)
</pre></blockquote></p>
<p>I mentioned earlier that there are several possible suppliers of
storage.  I did not try to locate them all, but am aware of at least
<a href="https://www.greenqloud.com/">Greenqloud</a>,
<a href="http://drive.google.com/">Google Drive</a>,
<a href="http://aws.amazon.com/s3/">Amazon S3 web services</a>,
<a href="http://www.rackspace.com/">Rackspace</a> and
<a href="http://crowncloud.net/">Crowncloud</a>.  The latter even
accepts payment in Bitcoin.  Pick one that suits your needs.  Some of
them provide several GiB of free storage, but the price models are
quite different, and you will have to figure out what suits you
best.</p>
<p>While researching this blog post, I had a look at research papers
and posters discussing the S3QL file system.  There are several, which
told me that the file system is getting a critical look from the
science community, and this increased my confidence in using it.  One
nice example is the paper
"<a href="http://www.lanl.gov/orgs/adtsc/publications/science_highlights_2013/docs/pg68_69.pdf">An
Innovative Parallel Cloud Storage System using OpenStack’s SwiftObject
Store and Transformative Parallel I/O Approach</a>" by Hsing-Bung
Chen, Benjamin McClelland, David Sherrill, Alfred Torrez, Parks Fields
and Pamela Smith.  Please have a look.</p>
<p>If you do not want a locally mounted file system, and want
something that works without the Linux fuse file system, I would like
to mention the <a href="http://www.tarsnap.com/">Tarsnap service</a>,
which also provides locally encrypted backup using a command line
client.  It has a nicer access control system, where one can split out
read and write access, allowing some systems to write to the backup
and others to only read from it.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&label=PetterReinholdtsenBlog">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>