Generated.

diff --git a/blog/index.rss b/blog/index.rss

index 70f6ecb37a14481db35b5ba212be17b012217315..c5fcc608cb617f0d34ce95189ddd2668a097d7f2 100644 (file)
--- a/blog/index.rss
+++ b/blog/index.rss
@@ -6,6 +6,288 @@
                  <link>http://people.skolelinux.org/pere/blog/</link>
                  <atom:link href="http://people.skolelinux.org/pere/blog/index.rss" rel="self" type="application/rss+xml" />
         
+       <item>
+               <title>s3ql, a locally mounted cloud file system - nice free software</title>
+               <link>http://people.skolelinux.org/pere/blog/s3ql__a_locally_mounted_cloud_file_system___nice_free_software.html</link>
+               <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/s3ql__a_locally_mounted_cloud_file_system___nice_free_software.html</guid>
+                <pubDate>Wed, 9 Apr 2014 11:30:00 +0200</pubDate>
+               <description>&lt;p&gt;For a while now, I have been looking for a sensible offsite backup
+solution for use at home.  My requirements are simple: it must be
+cheap and locally encrypted (in other words, I keep the encryption
+keys, and the storage provider does not have access to my private
+files).  One idea my friends and I had many years ago, before the
+cloud storage providers showed up, was to use Google mail as storage,
+writing a Linux block device storing blocks as emails in the mail
+service provided by Google, and thus get heaps of free space.  On top
+of this one can add encryption, RAID and volume management to get
+lots of (fairly slow, I admit) cheap and encrypted storage.  But I
+never found time to implement such a system.  The last few weeks,
+however, I have looked at a system called
+&lt;a href=&quot;https://bitbucket.org/nikratio/s3ql/&quot;&gt;S3QL&lt;/a&gt;, a locally
+mounted network backed file system with the features I need.&lt;/p&gt;
+
+&lt;p&gt;S3QL is a FUSE file system with a local cache and cloud storage,
+handling several different storage providers, any with an Amazon S3,
+Google Drive or OpenStack API.  There are heaps of such storage
+providers.  S3QL can also use a local directory as storage, which
+combined with sshfs allows for file storage on any ssh server.  S3QL
+includes support for encryption, compression, de-duplication,
+snapshots and immutable file systems, allowing me to mount the remote
+storage as a local mount point, and look at and use the files as if
+they were local, while the content is stored in the cloud as well.
+This allows me to have a backup that should survive fire.  The file
+system can not be shared between several machines at the same time,
+as only one can mount it at a time, but any machine with the
+encryption key and access to the storage service can mount it if it
+is unmounted.&lt;/p&gt;
+
+&lt;p&gt;It is simple to use.  I&#39;m using it on Debian Wheezy, where the
+package is included already.  So to get started, run &lt;tt&gt;apt-get
+install s3ql&lt;/tt&gt;.  Next, pick a storage provider.  I ended up picking
+Greenqloud, after reading their nice recipe on
+&lt;a href=&quot;https://greenqloud.zendesk.com/entries/44611757-How-To-Use-S3QL-to-mount-a-StorageQloud-bucket-on-Debian-Wheezy&quot;&gt;how
+to use s3ql with their Amazon S3 service&lt;/a&gt;, because I trust the laws
+in Iceland more than those in the USA when it comes to keeping my
+personal data safe and private, and thus would rather spend money on
+a company in Iceland.  Another nice recipe is available from the article
+&lt;a href=&quot;http://www.admin-magazine.com/HPC/Articles/HPC-Cloud-Storage&quot;&gt;S3QL
+Filesystem for HPC Storage&lt;/a&gt; by Jeff Layton in the HPC section of
+Admin magazine.  When the provider is picked, figure out how to get
+the API key needed to connect to the storage API.  With Greenqloud,
+the key did not show up until I had added payment details to my
+account.&lt;/p&gt;
+
+&lt;p&gt;Armed with the API access details, it is time to create the file
+system.  First, create a new bucket in the cloud.  This bucket is the
+file system storage area.  I picked a bucket name reflecting the
+machine that was going to store data there, but any name will do.
+I&#39;ll refer to it as &lt;tt&gt;bucket-name&lt;/tt&gt; below.  In addition, one needs
+the API login and password, and a locally created password.  Store it
+all in ~root/.s3ql/authinfo2 like this:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+[s3c]
+storage-url: s3c://s.greenqloud.com:443/bucket-name
+backend-login: API-login
+backend-password: API-password
+fs-passphrase: local-password
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
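+
+&lt;p&gt;The file holds both the API password and the file system
+passphrase, so it should be readable by root only.  S3QL is expected
+to reject an authinfo2 file that other users can read, but treat that
+as an assumption and check the documentation for your version.  A
+minimal sketch, assuming the file was created as root in the location
+used above:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# chmod 700 /root/.s3ql
+# chmod 600 /root/.s3ql/authinfo2
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;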
+
+&lt;p&gt;I create my local passphrase using &lt;tt&gt;pwgen 50&lt;/tt&gt; or similar,
+but any sensible way to create a fairly random password should do it.
+Armed with these details, it is now time to run mkfs, entering the API
+details and password to create it:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mkdir -m 700 /var/lib/s3ql-cache
+# mkfs.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --ssl s3c://s.greenqloud.com:443/bucket-name
+Enter backend login: 
+Enter backend password: 
+Before using S3QL, make sure to read the user&#39;s guide, especially
+the &#39;Important Rules to Avoid Loosing Data&#39; section.
+Enter encryption password: 
+Confirm encryption password: 
+Generating random encryption key...
+Creating metadata tables...
+Dumping metadata...
+..objects..
+..blocks..
+..inodes..
+..inode_blocks..
+..symlink_targets..
+..names..
+..contents..
+..ext_attributes..
+Compressing and uploading metadata...
+Wrote 0.00 MB of compressed metadata.
+# &lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
+
+&lt;p&gt;The next step is mounting the file system to make the storage available:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
+Using 4 upload threads.
+Downloading and decompressing metadata...
+Reading metadata...
+..objects..
+..blocks..
+..inodes..
+..inode_blocks..
+..symlink_targets..
+..names..
+..contents..
+..ext_attributes..
+Mounting filesystem...
+# df -h /s3ql
+Filesystem                              Size  Used Avail Use% Mounted on
+s3c://s.greenqloud.com:443/bucket-name  1.0T     0  1.0T   0% /s3ql
+#
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
+
+&lt;p&gt;The file system is now ready for use.  I use rsync to store my
+backups in it, and as the metadata used by rsync is downloaded at
+mount time, no network traffic (and storage cost) is triggered when
+rsync checks which files have changed.  To unmount, one should not use
+the normal umount command, as this will not flush the cache to the
+cloud storage, but instead run the umount.s3ql command like this:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# umount.s3ql /s3ql
+# 
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
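+
+&lt;p&gt;A complete backup run can be sketched like this, assuming the file
+system was created as shown above and that /home/ is the directory to
+back up.  The rsync options and the paths are only examples, not a
+tested recipe:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
+# rsync -a --delete /home/ /s3ql/home/
+# umount.s3ql /s3ql
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;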
+
+&lt;p&gt;There is a fsck command available to check the file system and
+correct any problems detected.  This can be used if the local server
+crashes while the file system is mounted, to reset the &quot;already
+mounted&quot; flag.  This is what it looks like when processing a working
+file system:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# fsck.s3ql --force --ssl s3c://s.greenqloud.com:443/bucket-name
+Using cached metadata.
+File system seems clean, checking anyway.
+Checking DB integrity...
+Creating temporary extra indices...
+Checking lost+found...
+Checking cached objects...
+Checking names (refcounts)...
+Checking contents (names)...
+Checking contents (inodes)...
+Checking contents (parent inodes)...
+Checking objects (reference counts)...
+Checking objects (backend)...
+..processed 5000 objects so far..
+..processed 10000 objects so far..
+..processed 15000 objects so far..
+Checking objects (sizes)...
+Checking blocks (referenced objects)...
+Checking blocks (refcounts)...
+Checking inode-block mapping (blocks)...
+Checking inode-block mapping (inodes)...
+Checking inodes (refcounts)...
+Checking inodes (sizes)...
+Checking extended attributes (names)...
+Checking extended attributes (inodes)...
+Checking symlinks (inodes)...
+Checking directory reachability...
+Checking unix conventions...
+Checking referential integrity...
+Dropping temporary indices...
+Backing up old metadata...
+Dumping metadata...
+..objects..
+..blocks..
+..inodes..
+..inode_blocks..
+..symlink_targets..
+..names..
+..contents..
+..ext_attributes..
+Compressing and uploading metadata...
+Wrote 0.89 MB of compressed metadata.
+# 
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
+
+&lt;p&gt;Thanks to the cache, working on files that fit in the cache is very
+quick, about the same speed as local file access.  Uploading large
+amounts of data is for me limited by the bandwidth out of and into my
+house.  Uploading 685 MiB with a 100 MiB cache gave me 305 kiB/s,
+which is very close to my upload speed, and downloading the same
+Debian installation ISO gave me 610 kiB/s, close to my download speed.
+Both were measured using &lt;tt&gt;dd&lt;/tt&gt;.  So for me, the bottleneck is my
+network, not the file system code.  I do not know what a good cache
+size would be, but suspect that the cache should be larger than your
+working set.&lt;/p&gt;
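+
+&lt;p&gt;The cache size is chosen at mount time.  A minimal sketch, assuming
+the &lt;tt&gt;--cachesize&lt;/tt&gt; option of mount.s3ql takes the size in KiB
+(check &lt;tt&gt;mount.s3ql --help&lt;/tt&gt; for your version); the 1 GiB value
+is only an example:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --cachesize 1048576 --ssl --allow-root \
+  s3c://s.greenqloud.com:443/bucket-name /s3ql
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;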
+
+&lt;p&gt;I mentioned that only one machine can mount the file system at a
+time.  If another machine tries, it is told that the file system is
+busy:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
+Using 8 upload threads.
+Backend reports that fs is still mounted elsewhere, aborting.
+#
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
+
+&lt;p&gt;The file content is uploaded when the cache is full, while the
+metadata is uploaded once every 24 hours by default.  To ensure the
+file system content is flushed to the cloud, one can either unmount
+the file system, or ask s3ql to flush the cache and metadata using
+s3qlctrl:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# s3qlctrl upload-meta /s3ql
+# s3qlctrl flushcache /s3ql
+# 
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
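+
+&lt;p&gt;The metadata upload interval can also be changed when mounting.  A
+sketch, assuming the &lt;tt&gt;--metadata-upload-interval&lt;/tt&gt; option of
+mount.s3ql takes the interval in seconds (verify this against the
+documentation for your version); one hour is only an example value:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
+  --metadata-upload-interval 3600 --ssl --allow-root \
+  s3c://s.greenqloud.com:443/bucket-name /s3ql
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;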
+
+&lt;p&gt;If you are curious about how much space your data uses in the
+cloud, and how much compression and deduplication cut down on the
+storage usage, you can use s3qlstat on the mounted file system to get
+a report:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# s3qlstat /s3ql
+Directory entries:    9141
+Inodes:               9143
+Data blocks:          8851
+Total data size:      22049.38 MB
+After de-duplication: 21955.46 MB (99.57% of total)
+After compression:    21877.28 MB (99.22% of total, 99.64% of de-duplicated)
+Database size:        2.39 MB (uncompressed)
+(some values do not take into account not-yet-uploaded dirty blocks in cache)
+#
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
+
+&lt;p&gt;I mentioned earlier that there are several possible suppliers of
+storage.  I did not try to locate them all, but am aware of at least
+&lt;a href=&quot;https://www.greenqloud.com/&quot;&gt;Greenqloud&lt;/a&gt;,
+&lt;a href=&quot;http://drive.google.com/&quot;&gt;Google Drive&lt;/a&gt;,
+&lt;a href=&quot;http://aws.amazon.com/s3/&quot;&gt;Amazon S3 web services&lt;/a&gt;,
+&lt;a href=&quot;http://www.rackspace.com/&quot;&gt;Rackspace&lt;/a&gt; and
+&lt;a href=&quot;http://crowncloud.net/&quot;&gt;Crowncloud&lt;/a&gt;.  The latter even
+accepts payment in Bitcoin.  Pick one that suits your needs.  Some of
+them provide several GiB of free storage, but the price models are
+quite different and you will have to figure out what suits you
+best.&lt;/p&gt;
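+
+&lt;p&gt;For those who prefer their own server over a commercial provider,
+the local directory storage mentioned earlier can be combined with
+sshfs.  A sketch, assuming a hypothetical ssh server
+backup.example.org, a hypothetical directory /srv/s3ql on it, and the
+&lt;tt&gt;local://&lt;/tt&gt; storage URL scheme for directory storage:&lt;/p&gt;
+
+&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
+# mkdir -p /mnt/sshfs
+# sshfs user@backup.example.org:/srv/s3ql /mnt/sshfs
+# mkfs.s3ql --cachedir /var/lib/s3ql-cache local:///mnt/sshfs
+# mount.s3ql --cachedir /var/lib/s3ql-cache --allow-root \
+  local:///mnt/sshfs /s3ql
+&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;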
+
+&lt;p&gt;While researching this blog post, I had a look at research papers
+and posters discussing the S3QL file system.  There are several, which
+told me that the file system is being critically examined by the
+scientific community and increased my confidence in using it.  One nice
+poster is titled
+&quot;&lt;a href=&quot;http://www.lanl.gov/orgs/adtsc/publications/science_highlights_2013/docs/pg68_69.pdf&quot;&gt;An
+Innovative Parallel Cloud Storage System using OpenStack’s SwiftObject
+Store and Transformative Parallel I/O Approach&lt;/a&gt;&quot; by Hsing-Bung
+Chen, Benjamin McClelland, David Sherrill, Alfred Torrez, Parks Fields
+and Pamela Smith.  Please have a look.&lt;/p&gt;
+
+&lt;p&gt;Given my problems with different file systems earlier, I decided to
+check out the mounted S3QL file system to see if it would be usable as
+a home directory (in other words, that it provided POSIX semantics when
+it comes to locking and umask handling etc).  Running
+&lt;a href=&quot;http://people.skolelinux.org/pere/blog/Testing_if_a_file_system_can_be_used_for_home_directories___.html&quot;&gt;my
+test code to check file system semantics&lt;/a&gt;, I was happy to discover that
+no error was found.  So the file system can be used for home
+directories, if one chooses to do so.&lt;/p&gt;
+
+&lt;p&gt;If you do not want a locally mounted file system, and want something
+that works without the Linux FUSE file system, I would like to mention the
+&lt;a href=&quot;http://www.tarsnap.com/&quot;&gt;Tarsnap service&lt;/a&gt;, which also
+provides locally encrypted backup using a command line client.  It has
+a nicer access control system, where one can split out read and write
+access, allowing some systems to write to the backup and others to
+only read from it.&lt;/p&gt;
+
+&lt;p&gt;As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+&lt;b&gt;&lt;a href=&quot;bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&amp;label=PetterReinholdtsenBlog&quot;&gt;15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&lt;/a&gt;&lt;/b&gt;.&lt;/p&gt;
+</description>
+       </item>
+       
         <item>
                 <title>EU-domstolen bekreftet i dag at datalagringsdirektivet er ulovlig</title>
                 <link>http://people.skolelinux.org/pere/blog/EU_domstolen_bekreftet_i_dag_at_datalagringsdirektivet_er_ulovlig.html</link>
@@ -616,97 +898,5 @@ workstation, LTSP client or LTSP server.&lt;/p&gt;
  </description>
         </item>
         
-       <item>
-               <title>How should RFC 822-formatted email be stored in a NOARK5 database?</title>
-               <link>http://people.skolelinux.org/pere/blog/Hvordan_b_r_RFC_822_formattert_epost_lagres_i_en_NOARK5_database_.html</link>
-               <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Hvordan_b_r_RFC_822_formattert_epost_lagres_i_en_NOARK5_database_.html</guid>
-                <pubDate>Fri, 7 Mar 2014 15:20:00 +0100</pubDate>
-               <description>&lt;p&gt;A few weeks ago, NXC&#39;s free software licensed
-NOARK5 solution was
-&lt;a href=&quot;http://www.nuug.no/aktiviteter/20140211-noark/&quot;&gt;presented at
-NUUG&lt;/a&gt; (video
-&lt;a href=&quot;https://www.youtube.com/watch?v=JCb_dNS3MHQ&quot;&gt;on youtube
-for now&lt;/a&gt;), and it made me take a closer look at NOARK5,
-the standard for archive management in the Norwegian public sector.  I
-wonder whether this core could be useful in a couple of my projects, and
-for one of them storing email is the most relevant use.  I could not find
-any recommendation on how RFC 822-formatted email (aka Internet email)
-should be stored in NOARK5, even though I know that some archives make a
-PDF printout of the email with their email program and then archive the
-PDF (or even worse, make a paper printout and store the image of the
-email as PDF in the archive).&lt;/p&gt;
-
-&lt;p&gt;Not many formats are accepted by Riksarkivet (the National Archives)
-for long-term preservation of public archives, and PDF and XML are the
-most relevant in that respect.  It struck me that there had to be some
-suitable XML representation, and perhaps agreement on which one
-should be used, so I plucked up my courage and asked
-&lt;a href=&quot;http://samdok.com/&quot;&gt;SAMDOK&lt;/a&gt;, a group associated with
-Arkivverket that seems to work on NOARK interoperability, whether they
-had any recommendations:&lt;/p&gt;
-
-&lt;p&gt;&lt;blockquote&gt;
-&lt;p&gt;Hi.&lt;/p&gt;
-
-&lt;p&gt;I am not sure whether this is the right forum for my question, but I
-wonder whether a recommendation has been defined for how RFC
-822-formatted email (aka ordinary Internet email) should be stored and
-handled in NOARK5, so that all information in the email is preserved
-(e.g. Received lines).  Is there a recommended XML mapping like the
-one described at
-&amp;lt;URL: &lt;a href=&quot;https://www.informit.com/articles/article.aspx?p=32074&quot;&gt;https://www.informit.com/articles/article.aspx?p=32074&lt;/a&gt; &amp;gt;?  My
-goal is that it should be possible to store the email in a NOARK5 core
-and get out an identically formatted copy of the original email when
-needed.&lt;/p&gt;
-&lt;/blockquote&gt;&lt;/p&gt;
-
-&lt;p&gt;The mail recipient at SAMDOK thought the question should rather be
-put directly to Riksarkivet, and today I received a reply from there,
-written by senior adviser Geir Ivar Tungesvik:&lt;/p&gt;
-
-&lt;p&gt;&lt;blockquote&gt;
-&lt;p&gt;Riksarkivet has no recommendations regarding conversion from
-email to XML.  The archive creator is free to define/use its own
-format if desired.  Including, as asked about, a format where it is
-possible to re-establish the email format from the XML.  XML (email)
-documents must be referenced in the archive structure, and a valid
-XML schema (.xsd) for the XML files must be attached. The archive
-creator is thus free to do as they please, as long as it is documented
-and an extraction can be produced on delivery to the repository.&lt;/p&gt;
-
-&lt;p&gt;The mandatory requirements of the Noark 5 standard must thus be
-fulfilled - following dialogue with Riksarkivet in connection with
-approval. For public archives, the files loependeJournal.xml
-and offentligJournal.xml are particularly important. Private archives
-that want to relate to the Noark 5 standard are of course free to use
-whatever is relevant to them of the mandatory requirements.&lt;/p&gt;
-&lt;/blockquote&gt;&lt;/p&gt;
-
-&lt;p&gt;It thus looks to me as if there is a small need to
-standardise XML storage of RFC 822-formatted messages.  Does anyone
-know of a good specification for this?  In addition to the one
-mentioned above, I have come across several relevant descriptions
-(search for &quot;rfc 822 xml&quot; and you will find relevant alternatives).&lt;/p&gt;
-
-&lt;ul&gt;
-
-&lt;li&gt;&lt;a href=&quot;http://www.openhealth.org/xmtp/&quot;&gt;XML MIME Transformation
-protocol (XMTP)&lt;/a&gt; from OpenHealth, last updated 2001.&lt;/li&gt;
-
-&lt;li&gt;&lt;a href=&quot;https://tools.ietf.org/html/draft-klyne-message-rfc822-xml-03&quot;&gt;An
-XML format for mail and other messages&lt;/a&gt;, a draft from the IETF dated
-2001.&lt;/li&gt;
-
-&lt;li&gt;&lt;a href=&quot;http://www.informit.com/articles/article.aspx?p=32074&quot;&gt;xMail:
-E-mail as XML&lt;/a&gt;, an article from 2003 describing the Python module
-rfc822, which outputs an XML representation of an RFC 822-formatted email.&lt;/li&gt;
-
-&lt;/ul&gt;
-
-&lt;p&gt;Are there other and better specifications for such storage?  Send
-me an email if you have input.&lt;/p&gt;
-</description>
-       </item>
-       
          </channel>
  </rss>