X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/6f2eff6f2c1badf27a0a32707a40d70c77c7b149..b6b6575e368fa0e8d3ac34a3a09aa1e21132be0e:/blog/index.html diff --git a/blog/index.html b/blog/index.html index 33b50e52e1..7cee2eb902 100644 --- a/blog/index.html +++ b/blog/index.html @@ -19,6 +19,502 @@ +
+
Some notes on fault tolerant storage systems
+
1st November 2017
+

If you care about how fault tolerant your storage is, you might +find these articles and papers interesting. They have shaped how I +think when designing a storage system.

+ + + +

Several of these research papers are based on data collected from +hundreds of thousands or millions of disks, and their findings are eye +opening. The short story is simply: do not implicitly trust RAID or +redundant storage systems. Details matter. And unfortunately there +are few options on Linux addressing all the identified issues. Both +ZFS and Btrfs are doing a fairly good job, but have legal and +practical issues of their own. I wonder how cluster file systems like +Ceph do in this regard. After all, as the old saying goes, you know you have +a distributed system when the crash of a computer you have never heard +of stops you from getting any work done. The same holds true if fault +tolerance does not work.

+ +

Just remember, in the end, it does not matter how redundant, or how +fault tolerant your storage is, if you do not continuously monitor its +status to detect and replace failed disks.

+
+
+ + + Tags: english, raid, sysadmin. + + +
+
+
+ +
+
Web services for writing academic LaTeX papers as a team
+
31st October 2017
+

I was surprised today to learn that a friend in academia did not +know there are easily accessible web services for writing +LaTeX documents as a team. I thought it was common knowledge, but to +make sure at least my readers are aware of it, I would like to mention +these useful services for writing LaTeX documents. Some of them even +provide a WYSIWYG editor to ease writing even further.

+ +

There are two commercial services available, +ShareLaTeX and +Overleaf. They are very easy to +use. Just start a new document, select which publisher to write for +(i.e. which LaTeX style to use), and start writing. Note, these two +have announced their intention to join forces, so soon there will be only +one joint service. I've used both for different documents, and they +work just fine. +ShareLaTeX is free +software, while Overleaf is not. According to an +announcement from Overleaf, they plan to keep the ShareLaTeX code +base maintained as free software.

+ +But these two are not the only alternatives. +Fidus Writer is another free +software solution with the +source available on GitHub. I have not used it myself. Several +others can be found on the nice +AlternativeTo +web service. + +

If you like Google Docs or Etherpad, but would like to write +documents in LaTeX, you should check out these services. You can even +host your own, if you want to. :)

+ +
+
+ + + Tags: english. + + +
+
+
+ +
+
Locating IMDB IDs of movies in the Internet Archive using Wikidata
+
25th October 2017
+

Recently, I needed to automatically check the copyright status of a +set of The Internet Movie Database +(IMDB) entries, to figure out which of the movies they refer +to can be freely distributed on the Internet. This proved to be +harder than it sounds. IMDB certainly lists movies without any +copyright protection, where the copyright protection has expired or +where the movie is licensed using a permissive license like one from +Creative Commons. These are mixed with copyright protected movies, +and there seems to be no way to separate these classes of movies using +the information in IMDB.

+ +

First I tried to look up entries manually in IMDB, +Wikipedia and +The Internet Archive, to get a +feel for how to do this. It is hard to know for sure using these sources, +but it should be possible to be reasonably confident a movie is "out +of copyright" with a few hours of work per movie. As I needed to check +almost 20,000 entries, this approach was not sustainable. I simply +cannot work around the clock for about 6 years to check this data +set.

+ +

I asked the people behind The Internet Archive if they could +introduce a new metadata field in their metadata XML for IMDB ID, but +was told that they leave it completely to the uploaders to update the +metadata. Some of the metadata entries had IMDB links in the +description, but I found no way to download all metadata files in bulk +to locate those, and put that approach aside.

+ +

In the process I noticed several Wikipedia articles about movies +had links to both IMDB and The Internet Archive, and it occurred to me +that I could use the Wikipedia RDF data set to locate entries with +both, to at least get a lower bound on the number of movies on The +Internet Archive with an IMDB ID. This is useful based on the +assumption that movies distributed by The Internet Archive can be +legally distributed on the Internet. With some help from the RDF +community (thank you DanC), I was able to come up with this query to +pass to the SPARQL interface on +Wikidata: + +

+SELECT ?work ?imdb ?ia ?when ?label
+WHERE
+{
+  ?work wdt:P31/wdt:P279* wd:Q11424.
+  ?work wdt:P345 ?imdb.
+  ?work wdt:P724 ?ia.
+  OPTIONAL {
+        ?work wdt:P577 ?when.
+        ?work rdfs:label ?label.
+        FILTER(LANG(?label) = "en").
+  }
+}
+

+ +

If I understand the query right, for every film entry anywhere in +Wikipedia, it will return the IMDB ID and The Internet Archive ID, and +when the movie was released and its English title, if both +of the latter two are available (they share one OPTIONAL block, so +neither is returned unless both match). At the moment the result set contains +2338 entries. Of course, it depends on volunteers including both +correct IMDB and The Internet Archive IDs in the Wikipedia articles +for the movie. It should be noted that the result will include +duplicates if the movie has entries in several languages. There are +some bogus entries, either because The Internet Archive ID contains a +typo or because the movie is not available from The Internet Archive. +I did not verify the IMDB IDs, as I am unsure how to do that +automatically.

+ +

I wrote a small Python script to extract the data set from Wikidata +and check if the XML metadata for the movie is available from The +Internet Archive, and after around 1.5 hours it produced a list of 2097 +free movies and their IMDB ID. In total, 171 entries in Wikidata lack +the referenced Internet Archive entry. I assume the 70 "disappearing" +entries (i.e. 2338-2097-171) are duplicate entries.
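
The script itself is not reproduced here. As a minimal sketch of how such a script could be structured: the function names, the de-duplication key and the `_meta.xml` URL pattern below are assumptions for illustration, not taken from the original script. It builds the request URL for the Wikidata SPARQL endpoint, collapses duplicate result rows and checks whether an Internet Archive metadata file can be fetched.

```python
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(sparql):
    """Return a GET URL asking the endpoint for JSON formatted results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def unique_movies(result_doc):
    """Map (work URI, Internet Archive ID) to IMDB ID, dropping duplicate rows.

    result_doc is a parsed SPARQL JSON result, where every row lives in
    result_doc["results"]["bindings"] and every variable has a "value".
    """
    movies = {}
    for row in result_doc["results"]["bindings"]:
        key = (row["work"]["value"], row["ia"]["value"])
        movies.setdefault(key, row["imdb"]["value"])
    return movies


def metadata_url(ia_id):
    """Assumed location of the metadata XML for an Internet Archive item."""
    return "https://archive.org/download/{0}/{0}_meta.xml".format(ia_id)


def has_metadata(ia_id):
    """Return True if the metadata XML answers a HEAD request with 200."""
    request = Request(metadata_url(ia_id), method="HEAD")
    try:
        with urlopen(request, timeout=30) as response:
            return response.status == 200
    except URLError:  # also covers HTTPError, e.g. 404 for a mistyped ID
        return False
```

Counting the keys returned by `unique_movies()` and the IDs for which `has_metadata()` is false would give numbers comparable to the ones reported above, assuming the duplicates are plain repeated rows.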

+ +

This is not too bad, given that The Internet Archive reports that it +contains 5331 +feature films at the moment, but it also means more than 3000 +movies are missing on Wikipedia or are missing the pair of references +on Wikipedia.

+ +

I was curious about the distribution by release year, and made a +little graph to show how the number of free movies is spread over the +years:

+ +

+ +

I expect the relative distribution of the remaining 3000 movies to +be similar.

+ +

If you want to help, and want to ensure Wikipedia can be used to +cross reference The Internet Archive and The Internet Movie Database, +please make sure entries like this are listed under the "External +links" heading on the Wikipedia article for the movie:

+ +

+* {{Internet Archive film|id=FightingLady}}
+* {{IMDb title|id=0036823|title=The Fighting Lady}}
+

+ +

Please verify the links on the final page, to make sure you did not +introduce a typo.

+ +

Here is the complete list, if you want to correct the 171 +identified Wikipedia entries with broken links to The Internet +Archive: Q1140317, +Q458656, +Q458656, +Q470560, +Q743340, +Q822580, +Q480696, +Q128761, +Q1307059, +Q1335091, +Q1537166, +Q1438334, +Q1479751, +Q1497200, +Q1498122, +Q865973, +Q834269, +Q841781, +Q841781, +Q1548193, +Q499031, +Q1564769, +Q1585239, +Q1585569, +Q1624236, +Q4796595, +Q4853469, +Q4873046, +Q915016, +Q4660396, +Q4677708, +Q4738449, +Q4756096, +Q4766785, +Q880357, +Q882066, +Q882066, +Q204191, +Q204191, +Q1194170, +Q940014, +Q946863, +Q172837, +Q573077, +Q1219005, +Q1219599, +Q1643798, +Q1656352, +Q1659549, +Q1660007, +Q1698154, +Q1737980, +Q1877284, +Q1199354, +Q1199354, +Q1199451, +Q1211871, +Q1212179, +Q1238382, +Q4906454, +Q320219, +Q1148649, +Q645094, +Q5050350, +Q5166548, +Q2677926, +Q2698139, +Q2707305, +Q2740725, +Q2024780, +Q2117418, +Q2138984, +Q1127992, +Q1058087, +Q1070484, +Q1080080, +Q1090813, +Q1251918, +Q1254110, +Q1257070, +Q1257079, +Q1197410, +Q1198423, +Q706951, +Q723239, +Q2079261, +Q1171364, +Q617858, +Q5166611, +Q5166611, +Q324513, +Q374172, +Q7533269, +Q970386, +Q976849, +Q7458614, +Q5347416, +Q5460005, +Q5463392, +Q3038555, +Q5288458, +Q2346516, +Q5183645, +Q5185497, +Q5216127, +Q5223127, +Q5261159, +Q1300759, +Q5521241, +Q7733434, +Q7736264, +Q7737032, +Q7882671, +Q7719427, +Q7719444, +Q7722575, +Q2629763, +Q2640346, +Q2649671, +Q7703851, +Q7747041, +Q6544949, +Q6672759, +Q2445896, +Q12124891, +Q3127044, +Q2511262, +Q2517672, +Q2543165, +Q426628, +Q426628, +Q12126890, +Q13359969, +Q13359969, +Q2294295, +Q2294295, +Q2559509, +Q2559912, +Q7760469, +Q6703974, +Q4744, +Q7766962, +Q7768516, +Q7769205, +Q7769988, +Q2946945, +Q3212086, +Q3212086, +Q18218448, +Q18218448, +Q18218448, +Q6909175, +Q7405709, +Q7416149, +Q7239952, +Q7317332, +Q7783674, +Q7783704, +Q7857590, +Q3372526, +Q3372642, +Q3372816, +Q3372909, +Q7959649, +Q7977485, +Q7992684, +Q3817966, +Q3821852, +Q3420907, +Q3429733, +Q774474

+
+
+ + + Tags: english, opphavsrett. + + +
+
+
+ +
+
A one-way wall on the border?
+
14th October 2017
+

I find it fascinating how many of the people being locked inside +the proposed border wall between the USA and Mexico support the idea. The +proposal to keep Mexicans out reminds me of +the +propaganda twist from the East German government calling the wall +the “Antifascist Bulwark” after erecting the Berlin Wall, claiming +that the wall was erected to keep enemies from creeping into East +Germany, while it was obvious to the people locked inside it that it +was erected to keep the people from escaping.

+ +

Do the people in the USA supporting this wall really believe it is a +one-way wall, only keeping people on the outside from getting in, +while not keeping people on the inside from getting out?

+
+
+ + + Tags: english. + + +
+
+
+ +
+
Generating 3D prints in Debian using Cura and Slic3r(-prusa)
+
9th October 2017
+

At my nearby maker space, +Sonen, I heard the story that it +was easier to generate gcode files for their 3D printers (Ultimaker 2+) +on Windows and MacOS X than Linux, because the software involved had +to be manually compiled and set up on Linux while premade packages +worked out of the box on Windows and MacOS X. I found this annoying, +as the software involved, +Cura, is free software +and should be trivial to get up and running on Linux if someone took +the time to package it for the relevant distributions. I even found +a request for adding it to +Debian from 2013, which had seen some activity over the years but +never resulted in the software showing up in Debian. So a few days +ago I offered my help to try to improve the situation.

+ +

Now I am very happy to see that all the packages required by a +working Cura in Debian are uploaded into Debian and waiting in the NEW +queue for the ftpmasters to have a look. You can track the progress +on +the +status page for the 3D printer team.

+ +

The uploaded packages are a bit behind upstream, and were uploaded +now to get slots in the NEW +queue while we work on updating the packages to the latest +upstream version.

+ +

On a related note, two competitors for Cura, which I found harder +to use and was unable to configure correctly for Ultimaker 2+ in the +short time I spent on it, are already in Debian. If you are looking +for 3D printer "slicers" and want something already available in +Debian, check out +slic3r and +slic3r-prusa. +The latter is a fork of the former.

+
+
+ + + Tags: 3d-printer, debian, english. + + +
+
+
+
Mangler du en skrue, eller har du en skrue løs?
4th October 2017
@@ -351,322 +847,6 @@ one frequency?

-
-
Norwegian Bokmål edition of Debian Administrator's Handbook is now available
-
25th July 2017
-

- -

I finally received a copy of the Norwegian Bokmål edition of -"The Debian Administrator's -Handbook". This test copy arrived in the mail a few days ago, and -I am very happy to hold the result in my hand. We spent around one and a half years translating it. This paperback edition -is available -from lulu.com. If you buy it quickly, you save 25% on the list -price. The book is also available for download in electronic form as -PDF, EPUB and Mobipocket, and can be -read online -as a web page.

- -

This is the second book I have published (the first was the book -"Free Culture" by Lawrence Lessig -in -English, -French -and -Norwegian -Bokmål), and I am very excited to finally wrap up this -project. I hope -"Håndbok -for Debian-administratoren" will be well received.

-
-
- - - Tags: debian, debian-handbook, english. - - -
-
-
- -
-
«The report does not look at information security related to personal integrity»
-
27th June 2017
-

I came across the text -«Killing -car privacy by federal mandate» by Leonid Reyzin on Freedom to -Tinker today, and I am glad to see a good walk-through of why it -is an unreasonable intrusion into privacy to let every car broadcast its -position and movement by radio. The proposal in question, based on -Dedicated Short Range Communication (DSRC), is called Basic Safety Message -(BSM) in the USA and Cooperative Awareness Message (CAM) in Europe, and the -Norwegian Public Roads Administration (Vegvesenet) is one of those who seem willing to -require all cars to give up yet another piece of the citizens' privacy. -I recommend that everyone read it. - -

While looking a bit into DSRC in cars in Norway, I came across a quote -I find illustrative of how official Norway handles -issues around the citizens' privacy, in the SINTEF report -«Informasjonssikkerhet -i AutoPASS-brikker» by Trond Foss:

- -

-«The report does not look at information security related to personal - integrity.» -

- -

That is apparently how easily it can be done when assessing -information security. It is presumably enough that the people at the top can say -that «Privacy is taken care of», which is the popular, empty -phrase that makes many believe the integrity of individuals is looked after. -The quote made me wonder how often the same approach, simply -disregarding the need for personal integrity, is chosen when someone decides to -make way for yet another intrusion into the privacy of people in -Norway. It rarely gets any reaction, after all. The story of the -reactions to the outsourcing by Helse Sør-Øst is sadly an -exception and the tip of the iceberg. I think I will keep saying no -to AutoPASS and stay as far away from the Norwegian health care system -as I can, until they have demonstrated and documented that they value -the privacy and personal integrity of the individual higher than short-term -gain and benefit to society.

-
-
- - - Tags: norsk, personvern, sikkerhet. - - -
-
-
- -
-
Updated sales numbers for my Free Culture paper editions
-
12th June 2017
-

It is pleasing to see that the work we put into publishing new -editions of the classic Free -Culture book by the founder of the Creative Commons movement, -Lawrence Lessig, is still being appreciated. I had a look at the -latest sales numbers for the paper edition today. Not too impressive, -but happy to see some buyers still exist. All the revenue from the -books is sent to the Creative -Commons Corporation, and they receive the largest cut if you buy -directly from Lulu. Most books are sold via Amazon, with Ingram -second and only a small fraction directly from Lulu. The ebook -edition is available for free from -Github.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Title / language          Quantity
                          2016 jan-jun   2016 jul-dec   2017 jan-may
Culture Libre / French    3              6              15
Fri kultur / Norwegian    7              1              0
Free Culture / English    14             27             16
Total                     24             34             31
- -

A bit sad to see the low sales numbers for the Norwegian edition, and -a bit surprising that the English edition is still selling so well.

- -

If you would like to translate and publish the book in your native -language, I would be happy to help make it happen. Please get in -touch.

-
-
- - - Tags: docbook, english, freeculture. - - -
-
-
- -
-
Release 0.1.1 of free software archive system Nikita announced
-
10th June 2017
-

I am very happy to report that the -Nikita Noark 5 -core project tagged its second release today. The free software -solution is an implementation of the Norwegian archive standard Noark -5 used by government offices in Norway. These were the changes in -version 0.1.1 since version 0.1.0 (from NEWS.md): - -

- -

If this sounds interesting to you, please contact us on IRC (#nikita -on irc.freenode.net) or email -(nikita-noark -mailing list).

-
-
- - - Tags: english, nuug, offentlig innsyn, standard. - - -
-
-
- -
-
Idea for storing trusted timestamps in a Noark 5 archive
-
7th June 2017
-

This is a copy of -an -email I posted to the nikita-noark mailing list. Please follow up -there if you would like to discuss this topic. The background is that -we are making a free software archive system based on the Norwegian -Noark -5 standard for government archives.

- -

I've been wondering a bit lately how trusted timestamps could be -stored in Noark 5. -Trusted -timestamps can be used to verify that some information -(document/file/checksum/metadata) has not been changed since a -specific time in the past. This is useful to verify the integrity of -the documents in the archive.

- -

Then it occurred to me, perhaps the trusted timestamps could be -stored as dokument variants (i.e. dokumentobjekt referred to from -dokumentbeskrivelse) with the filename set to the hash it is -stamping?

- -

Given a "dokumentbeskrivelse" with an associated "dokumentobjekt", -a new dokumentobjekt is associated with "dokumentbeskrivelse" with the -same attributes as the stamped dokumentobjekt except these -attributes:

- - - -

This assumes a service following -IETF RFC 3161 is -used, which specifies the given MIME type for replies and the .tsr file -ending for the content of such trusted timestamp. As far as I can -tell from the Noark 5 specifications, it is OK to have several -variants/renderings of a dokument attached to a given -dokumentbeskrivelse objekt. It might be stretching it a bit to make -some of these variants represent crypto-signatures useful for -verifying the document integrity instead of representing the dokument -itself.

- -

Using the source of the service in formatDetaljer allows several -timestamping services to be used. This is useful to spread the risk -of key compromise over several organisations. It would only be a -problem to trust the timestamps if all of the organisations are -compromised.

- -

The following oneliner on Linux can be used to generate the tsr -file. $inputfile is the path to the file to checksum, and $sha256 is the -SHA-256 checksum of the file (i.e. the ".tsr" value mentioned -above).

- -

-openssl ts -query -data "$inputfile" -cert -sha256 -no_nonce \
-  | curl -s -H "Content-Type: application/timestamp-query" \
-      --data-binary "@-" http://zeitstempel.dfn.de > "$sha256.tsr"
-
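
The oneliner above assumes $sha256 has already been set. As a minimal sketch of how the file name could be derived (the function name `tsr_filename` and the chunk size are hypothetical choices, not part of the proposal), the SHA-256 checksum of the input file can be computed in Python and combined with the .tsr ending:

```python
import hashlib


def tsr_filename(path, chunk_size=65536):
    """Return '<hex SHA-256 of the file content>.tsr' for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archive documents need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() + ".tsr"
```

The hex digest is the same value `sha256sum` prints for the file, so either tool could be used to fill in $sha256.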

- -

To verify the timestamp, you first need to download the public key -of the trusted timestamp service, for example using this command:

- -

-wget -O ca-cert.txt \
-  https://pki.pca.dfn.de/global-services-ca/pub/cacert/chain.txt
-

- -

Note, the public key should be stored alongside the timestamps in -the archive to make sure it is also available 100 years from now. It -is probably a good idea to standardise how and where to store such -public keys, to make it easier to find for those trying to verify -documents 100 or 1000 years from now. :)

- -

The verification itself is a simple openssl command:

- -

-openssl ts -verify -data "$inputfile" -in "$sha256.tsr" \
-  -CAfile ca-cert.txt -text
-

- -

Is there any reason this approach would not work? Is it somehow against -the Noark 5 specification?

-
-
- - - Tags: english, offentlig innsyn, standard. - - -
-
-
-

RSS feed