Title: s3ql, a locally mounted cloud file system - nice free software
Tags: english, debian, personvern, sikkerhet
<p>For a while now, I have been looking for a sensible off site backup
solution to use at home.  My requirements are that it must be cheap
and locally encrypted (in other words, I keep the keys, and the
storage provider does not have access to my private files).  One idea
my friends and I have had over the years has been to use Google mail
as storage, writing a Linux block device storing blocks as emails in
the mail service provided by Google, and thus get heaps of free space.
On top of this one can add encryption, RAID and volume management to
have lots of (fairly slow, I admit that) cheap and encrypted storage.
But I never found the time to implement such a system.  Then, the last
few weeks, I have looked at a system called
<a href="https://bitbucket.org/nikratio/s3ql/">S3QL</a>, a locally
mounted network backed file system with the features I need.</p>
<p>S3QL is a fuse file system with a local cache and cloud storage,
handling several different storage providers, any with an Amazon S3,
Google Drive or OpenStack API.  There are heaps of such providers.  It
can also use a local directory as storage, which combined with sshfs
allows for file storage on any ssh server.  S3QL includes support for
encryption, compression, de-duplication, snapshots and immutable file
systems, allowing me to mount the remote storage as a local mount
point, look at and use the files as if they were local, while the
content is stored in the cloud as well.  This allows me to have a
backup that should survive a fire.  The file system can not be shared
between several machines at the same time, as only one can mount it at
a time, but any machine with the encryption key and access to the
storage service can mount it if it is unmounted.</p>
<p>It is simple to use.  I'm using it on Debian Wheezy, where the
package is already included.  So to get started, run <tt>apt-get
install s3ql</tt>.  Next, pick a storage provider.  I ended up picking
Greenqloud, after reading their nice recipe on
<a href="https://greenqloud.zendesk.com/entries/44611757-How-To-Use-S3QL-to-mount-a-StorageQloud-bucket-on-Debian-Wheezy">how
to use s3ql with their Amazon S3 service</a>, because I trust the laws
in Iceland more than those in the USA when it comes to keeping my data
safe and private, and thus would rather spend money on a company in
Iceland.  Another nice recipe is available in the article
<a href="http://www.admin-magazine.com/HPC/Articles/HPC-Cloud-Storage">S3QL
Filesystem for HPC Storage</a> by Jeff Layton in the HPC section of
Admin magazine.  When the provider is picked, figure out how to get
the API key needed to connect to the storage API.  With Greenqloud,
the key did not show up until I had added payment details to my
account.</p>
<p>Armed with the API access details, it is time to create the file
system.  First, create a new bucket in the cloud.  This bucket is the
file system storage area.  I picked a bucket name reflecting the
machine that was going to store data there, but any name will do.
I'll refer to it as <tt>bucket-name</tt> below.  In addition, one
needs the API login and password, and a locally created password.
Store it all in ~root/.s3ql/authinfo2 like this:</p>
<p><blockquote><pre>
[s3c]
storage-url: s3c://s.greenqloud.com:443/bucket-name
backend-login: API-login
backend-password: API-password
fs-passphrase: local-password
</pre></blockquote></p>
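<p>Note that the authinfo2 file contains both the API credentials and
the local passphrase in clear text, so make sure it is only readable
by root, for example like this:</p>

<p><blockquote><pre>
# chmod 600 /root/.s3ql/authinfo2
</pre></blockquote></p>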
<p>I create my local passphrase using <tt>pwgen 50</tt> or similar,
but any sensible way to create a fairly random password should do.
Armed with these details, it is now time to run mkfs, entering the API
details and password to create it:</p>
<p><blockquote><pre>
# mkdir -m 700 /var/lib/s3ql-cache
# mkfs.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl s3c://s.greenqloud.com:443/bucket-name
Enter backend password:
Before using S3QL, make sure to read the user's guide, especially
the 'Important Rules to Avoid Loosing Data' section.
Enter encryption password:
Confirm encryption password:
Generating random encryption key...
Creating metadata tables...
Compressing and uploading metadata...
Wrote 0.00 MB of compressed metadata.
# </pre></blockquote></p>
<p>The next step is mounting the file system to make the storage
available.</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 4 upload threads.
Downloading and decompressing metadata...
Mounting filesystem...
# df -h /s3ql
Filesystem                              Size  Used Avail Use% Mounted on
s3c://s.greenqloud.com:443/bucket-name  1.0T     0  1.0T   0% /s3ql
</pre></blockquote></p>
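<p>The default cache is fairly small.  If I read the mount.s3ql
documentation correctly, a larger cache can be requested with the
<tt>--cachesize</tt> option, which takes a value in KiB.  Something
like this should give a 1 GiB cache (a sketch only, adjust it to your
disk space and usage pattern):</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
  --ssl --allow-root --cachesize 1048576 \
  s3c://s.greenqloud.com:443/bucket-name /s3ql
</pre></blockquote></p>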
<p>The file system is now ready for use.  I use rsync to store my
backups in it, and as the metadata used by rsync is downloaded at
mount time, no network traffic (and storage cost) is triggered by
running rsync.  To unmount, one should not use the normal umount
command, as this will not flush the cache to the cloud storage, but
instead run the umount.s3ql command like this:</p>

<p><blockquote><pre>
# umount.s3ql /s3ql
</pre></blockquote></p>
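<p>A sketch of what such an rsync run can look like (the source and
target directories are just examples):</p>

<p><blockquote><pre>
# rsync -aH --delete /home/ /s3ql/backup/home/
</pre></blockquote></p>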
<p>There is a fsck command available to check the file system and
correct any problems detected.  This can be used if the local server
crashes while the file system is mounted, to reset the "already
mounted" flag.  This is what it looks like when processing a working
file system:</p>

<p><blockquote><pre>
# fsck.s3ql --force --ssl s3c://s.greenqloud.com:443/bucket-name
Using cached metadata.
File system seems clean, checking anyway.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 5000 objects so far..
..processed 10000 objects so far..
..processed 15000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Backing up old metadata...
Compressing and uploading metadata...
Wrote 0.89 MB of compressed metadata.
</pre></blockquote></p>
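<p>If the cache directory or authinfo2 file is not in the default
location, I assume the same <tt>--cachedir</tt> and <tt>--authfile</tt>
options used with mkfs.s3ql and mount.s3ql above can be given to
fsck.s3ql as well, something like this:</p>

<p><blockquote><pre>
# fsck.s3ql --force --ssl --cachedir /var/lib/s3ql-cache \
  --authfile /root/.s3ql/authinfo2 s3c://s.greenqloud.com:443/bucket-name
</pre></blockquote></p>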
<p>Thanks to the cache, working on files that fit in the cache is very
quick, about the same speed as local file access.  Uploading large
amounts of data is for me limited by the bandwidth out of and into my
house.  Uploading 685 MiB with a 100 MiB cache gave me 305 kiB/s,
which is very close to my upload speed, and downloading the same
Debian installation ISO gave me 610 kiB/s, close to my download speed.
Both were measured using <tt>dd</tt>.  So for me, the bottleneck is my
network, not the file system code.</p>
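<p>For those curious how this can be measured with dd, here is a rough
sketch of the kind of commands I am talking about (the ISO file name
is just an example):</p>

<p><blockquote><pre>
# dd if=debian-installer.iso of=/s3ql/test.iso bs=1M
# dd if=/s3ql/test.iso of=/dev/null bs=1M
</pre></blockquote></p>

<p>dd reports the transfer rate when it is done, and as the file is
larger than the cache, the second run has to fetch most of the content
from the cloud storage.</p>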
<p>I mentioned that only one machine can mount the file system at a
time.  If another machine tries, it is told that the file system is
still mounted elsewhere:</p>

<p><blockquote><pre>
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 8 upload threads.
Backend reports that fs is still mounted elsewhere, aborting.
</pre></blockquote></p>
<p>The file content is uploaded when the cache is full, while the
metadata is uploaded once every 24 hours by default.  To ensure the
file system content is flushed to the cloud, one can either umount the
file system, or ask s3ql to flush the cache and metadata using the
s3qlctrl command, like this:</p>

<p><blockquote><pre>
# s3qlctrl upload-meta /s3ql
# s3qlctrl flushcache /s3ql
</pre></blockquote></p>
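<p>If you want the content pushed to the cloud more often than the
cache size and the 24 hour metadata interval dictate, the flush can be
run from cron.  A sketch of a /etc/cron.d/ entry doing this every
night (assuming the file system is mounted on /s3ql):</p>

<p><blockquote><pre>
# Flush the S3QL cache and upload fresh metadata every night at 03:30
30 3 * * * root /usr/bin/s3qlctrl flushcache /s3ql &amp;&amp; /usr/bin/s3qlctrl upload-meta /s3ql
</pre></blockquote></p>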
<p>If you are curious about how much space your data uses in the
cloud, and how much compression and deduplication cut down on the
storage usage, you can use s3qlstat on the mounted file system to get
a report like this:</p>

<p><blockquote><pre>
# s3qlstat /s3ql
Directory entries: 9141
Total data size: 22049.38 MB
After de-duplication: 21955.46 MB (99.57% of total)
After compression: 21877.28 MB (99.22% of total, 99.64% of de-duplicated)
Database size: 2.39 MB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in cache)
</pre></blockquote></p>
<p>I mentioned earlier that there are several possible suppliers of
storage.  I did not try to locate them all, but am aware of at least
<a href="https://www.greenqloud.com/">Greenqloud</a>,
<a href="http://drive.google.com/">Google Drive</a>,
<a href="http://aws.amazon.com/s3/">Amazon S3 web services</a>,
<a href="http://www.rackspace.com/">Rackspace</a> and
<a href="http://crowncloud.net/">Crowncloud</a>.  The latter even
accepts payment in Bitcoin.  Pick one that suits your needs.  Some of
them provide several GiB of free storage, but the price models are
quite different, and you will have to figure out what suits you
best.</p>
<p>While researching this blog post, I had a look at research papers
and posters discussing the S3QL file system.  There are several, which
told me that the file system is getting a critical look from the
science community, and this increased my confidence in using it.  One
nice example is the paper
"<a href="http://www.lanl.gov/orgs/adtsc/publications/science_highlights_2013/docs/pg68_69.pdf">An
Innovative Parallel Cloud Storage System using OpenStack’s SwiftObject
Store and Transformative Parallel I/O Approach</a>" by Hsing-Bung
Chen, Benjamin McClelland, David Sherrill, Alfred Torrez, Parks Fields
and Pamela Smith.  Please have a look.</p>
<p>If you do not want a locally mounted file system, and want
something that works without the Linux fuse file system, I would like
to mention the <a href="http://www.tarsnap.com/">Tarsnap service</a>,
which also provides locally encrypted backup using a command line
client.  It has a nicer access control system, where one can split out
read and write access, allowing some systems to write to the backup
and others to only read from it.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&label=PetterReinholdtsenBlog">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>