Petter Reinholdtsen

Updated sales number for my Free Culture paper editions
12th June 2017

It is pleasing to see that the work we put into publishing new editions of the classic Free Culture book by Lawrence Lessig, the founder of the Creative Commons movement, is still being appreciated. I had a look at the latest sales numbers for the paper editions today. Not too impressive, but some buyers still exist. All the revenue from the books is sent to the Creative Commons Corporation, and they receive the largest cut if you buy directly from Lulu. Most books are sold via Amazon, with Ingram second and only a small fraction directly from Lulu. The ebook edition is available for free from Github.

Quantity sold per period:

Title / language         2016 jan-jun   2016 jul-dec   2017 jan-may
Culture Libre / French              3              6             15
Fri kultur / Norwegian              7              1              0
Free Culture / English             14             27             16
Total                              24             34             31

It is a bit sad to see the low sales numbers for the Norwegian edition, and a bit surprising that the English edition is still selling so well.

If you would like to translate and publish the book in your native language, I would be happy to help make it happen. Please get in touch.

Tags: docbook, english, freeculture.
Release 0.1.1 of free software archive system Nikita announced
10th June 2017

I am very happy to report that the Nikita Noark 5 core project tagged its second release today. The free software solution is an implementation of the Norwegian archive standard Noark 5 used by government offices in Norway. These were the changes in version 0.1.1 since version 0.1.0 (from NEWS.md):

If this sounds interesting to you, please contact us on IRC (#nikita on irc.freenode.net) or email (the nikita-noark mailing list).

Tags: english, nuug, offentlig innsyn, standard.
Idea for storing trusted timestamps in a Noark 5 archive
7th June 2017

This is a copy of an email I posted to the nikita-noark mailing list. Please follow up there if you would like to discuss this topic. The background is that we are making a free software archive system based on the Norwegian Noark 5 standard for government archives.

I've been wondering a bit lately how trusted timestamps could be stored in Noark 5. Trusted timestamps can be used to verify that some information (document/file/checksum/metadata) has not been changed since a specific time in the past. This is useful to verify the integrity of the documents in the archive.

Then it occurred to me: perhaps the trusted timestamps could be stored as dokument variants (ie dokumentobjekt referred to from dokumentbeskrivelse) with the filename set to the hash it is stamping?

Given a "dokumentbeskrivelse" with an associated "dokumentobjekt", a new dokumentobjekt is associated with the "dokumentbeskrivelse" with the same attributes as the stamped dokumentobjekt, except for these attributes:

This assumes a service following IETF RFC 3161 is used, which specifies the given MIME type for replies and the .tsr file ending for the content of such a trusted timestamp. As far as I can tell from the Noark 5 specifications, it is OK to have several variants/renderings of a dokument attached to a given dokumentbeskrivelse object. It might be stretching it a bit to make some of these variants represent crypto-signatures useful for verifying the document integrity instead of representing the dokument itself.

Using the source of the service in formatDetaljer allows several timestamping services to be used. This is useful to spread the risk of key compromise over several organisations. It would only be a problem to trust the timestamps if all of the organisations were compromised.

The following one-liner on Linux can be used to generate the tsr file. $inputfile is the path to the file to checksum, and $sha256 is the SHA-256 checksum of the file (ie the base name of the ".tsr" file mentioned above).

openssl ts -query -data "$inputfile" -cert -sha256 -no_nonce \
  | curl -s -H "Content-Type: application/timestamp-query" \
      --data-binary "@-" http://zeitstempel.dfn.de > $sha256.tsr
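The $sha256 value used for the reply file name can be derived from the file itself. Here is a minimal sketch, assuming the sha256sum tool from GNU coreutils is available (the tsr_name helper is a hypothetical name, not part of any existing tool):

```shell
# tsr_name FILE - print the ".tsr" file name derived from the
# SHA-256 checksum of FILE (sketch; assumes GNU coreutils sha256sum).
tsr_name() {
    sha256=$(sha256sum "$1" | awk '{print $1}')
    printf '%s.tsr\n' "$sha256"
}
```

Running it on the input file gives the redirect target used in the one-liner above.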

To verify the timestamp, you first need to download the public key of the trusted timestamp service, for example using this command:

wget -O ca-cert.txt \
  https://pki.pca.dfn.de/global-services-ca/pub/cacert/chain.txt

Note, the public key should be stored alongside the timestamps in the archive to make sure it is also available 100 years from now. It is probably a good idea to standardise how and where to store such public keys, to make them easier to find for those trying to verify documents 100 or 1000 years from now. :)

The verification itself is a simple openssl command:

openssl ts -verify -data $inputfile -in $sha256.tsr \
  -CAfile ca-cert.txt -text

Is there any reason this approach would not work? Is it somehow against the Noark 5 specification?

Tags: english, offentlig innsyn, standard.
When the Nynorsk translation fails in an exam...
3rd June 2017

Aftenposten reports today about errors in the exam assignment for the exam in politics and human rights, where the text in the Bokmål and Nynorsk editions differed. The assignment text is reproduced in the article, and I became curious whether the free translation solution Apertium would have done a better job than Utdanningsdirektoratet. It looks like it would have.

Here is the Bokmål version of the exam assignment:

Drøft utfordringene knyttet til nasjonalstatenes og andre aktørers rolle og muligheter til å håndtere internasjonale utfordringer, som for eksempel flykningekrisen.

Vedlegge er eksempler på tekster som kan gi relevante perspektiver på temaet:

  1. Flykningeregnskapet 2016, UNHCR og IDMC
  2. «Grenseløst Europa for fall» A-Magasinet, 26. november 2015

Apertium translates this as follows:

Drøft utfordringane knytte til nasjonalstatane sine og rolla til andre aktørar og høve til å handtera internasjonale utfordringar, som til dømes *flykningekrisen.

Vedleggja er døme på tekster som kan gje relevante perspektiv på temaet:

  1. *Flykningeregnskapet 2016, *UNHCR og *IDMC
  2. «*Grenseløst Europa for fall» A-Magasinet, 26. november 2015

Words that were not understood are marked with an asterisk (*) and need an extra language check. But no words have disappeared, as was the case in the assignment the students were presented with at the exam. I do suspect, though, that "andre aktørers rolle og muligheter til ..." should have been translated as "rolla til andre aktørar og deira høve til ..." or something similar, but that is perhaps nitpicking. It just underlines that proofreading is always needed after automatic translation.

Tags: debian, norsk, stavekontroll.
Email as an archive format in Riksarkivaren's regulations?
27th April 2017

These days, with a deadline of 1 May, Riksarkivaren (the National Archivist of Norway) has a public consultation out on its regulations. As you can see, there is not much time left before the deadline, which expires on Sunday. These regulations list which formats are acceptable for archiving in Noark 5 solutions in Norway.

I found the consultation documents at Norsk Arkivråd after being tipped off on the mailing list of the free software project Nikita Noark5-Core, which is creating a Noark 5 Tjenestegrensesnitt (service interface). I am involved in the Nikita project, and thanks to my interest in the service interface project I have read quite a few Noark 5 related documents, and to my surprise discovered that standard email is not on the list of approved formats that can be archived. The consultation with its Sunday deadline is an excellent opportunity to try to do something about it. I am working on my own consultation response, and wonder if others are interested in supporting the proposal to allow archiving email as email in the archive.

Are you already writing your own consultation response? If so, you could consider including a statement about email storage. I do not think much is needed. Here is a short suggested text:

We refer to the consultation sent out 2017-02-17 (Riksarkivaren's reference 2016/9840 HELHJO), and take the liberty of submitting some input on the revision of the Regulations on supplementary technical and archival provisions on the handling of public archives (Riksarkivarens forskrift).

Very much of our communication today takes place by email. We therefore propose that Internet email, as described in IETF RFC 5322, https://tools.ietf.org/html/rfc5322, should be included as an approved document format. We propose that the regulation's list of approved document formats at submission in § 5-16 be amended to include Internet email.

As part of the work on the service interface we have tested how email can be stored in a Noark 5 structure, and we are writing a proposal for how this can be done, which will be sent to Arkivverket as soon as it is finished. Those interested can follow the progress on the web.

Update 2017-04-28: Today the consultation response I wrote was submitted by the NUUG association.

Tags: norsk, offentlig innsyn, standard.
The public electronic mail journal blocks access for selected web clients
20th April 2017

Today I discovered that OEP, the web site publishing public mail journals from Norwegian government agencies, has started blocking certain types of web clients from getting access. I do not know how many are affected, but at least libwww-perl and curl are blocked. To test for yourself, run the following:

% curl -v -s https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 404 Not Found
% curl -v -s --header 'User-Agent:Opera/12.0' https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 200 OK
%

Here you can see that the service returns «404 Not Found» for curl in its default setup, while it returns «200 OK» if curl claims to be Opera version 12.0. Offentlig elektronisk postjournal started the blocking 2017-03-02.

The blocking will make it a bit harder to fetch information from oep.no automatically. Could the blocking have been introduced to prevent automated collection of information from OEP, like Pressens Offentlighetsutvalg did to document how the ministries obstruct access to information in the report «Slik hindrer departementer innsyn», published in January 2017? It seems unlikely, as it is trivial to change the User-Agent to something new.

Is there any legal basis for the public sector to discriminate between web clients the way it is done here, where access is granted or denied depending on what the client claims its name is? As OEP is owned by DIFI and operated by Basefarm, perhaps there are documents exchanged between these two parties one could request access to in order to understand what has happened. But DIFI's mail journal shows only two documents between DIFI and Basefarm in the last year. Mimes brønn is next, I think.

Tags: norsk, offentlig innsyn.
Free software archive system Nikita now able to store documents
19th March 2017

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard documents the requirements for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification documents a REST web service for storing, searching and retrieving documents and metadata in such an archive. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced its support for the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such an archive myself, my first use case is to store and analyse public mail journal metadata published by the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches.

If you would like to help make sure there is a free software alternative for the archives, please join our IRC channel (#nikita on irc.freenode.net) and the project mailing list.

When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool, archive-pdf, to upload a PDF file to the archive using this API. The tool is very simple at the moment; it finds existing fonds, series and files, asking the user to select which one to use if more than one exists. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
~/src//noark5-tester$

You can see here how the fonds (arkiv) and series (arkivdel) only had one option, while the user needs to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.

In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently uses the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.

The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collects defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, for lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those that want to test the current code base. The API tester is implemented in Python.

Tags: english, nuug, offentlig innsyn, standard.
Detecting NFS hangs on Linux without hanging yourself...
9th March 2017

Over the years, administrating thousands of NFS-mounting Linux computers at a time, I often needed a way to detect if a machine was experiencing an NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too. So you want to be able to detect this without risking that the detection process gets stuck too. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:

nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK

It is hard to know if the hang is still going on, and it is hard to be sure that looking in dmesg is going to work. If there are lots of other messages in dmesg, the lines might have rotated out of sight before they are noticed.

While reading through the NFS client implementation in the Linux kernel code, I came across some statistics that seem to give a way to detect it. The om_timeouts sunrpc value in the kernel will increase every time the above log entry is inserted into dmesg. And after digging a bit further, I discovered that this value shows up in /proc/self/mountstats on Linux.

The mountstats content seems to be shared between files using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount points for the machine. I assume this will not show lazily umounted NFS points, nor NFS mount points in a different process context (ie with a different filesystem view), but that does not worry me.

The content for an NFS mount point looks similar to this:

[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
        age:    7863311
        caps:   caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0 
        bytes:  166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
             SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
              LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
              ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
            READLINK: 125 125 0 20472 18620 0 1112 1118
                READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
               WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
              CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
               MKDIR: 3680 3680 0 773980 993920 26 23990 24245
             SYMLINK: 903 903 0 233428 245488 6 5865 5917
               MKNOD: 80 80 0 20148 21760 0 299 304
              REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
               RMDIR: 3367 3367 0 645112 484848 22 5782 6002
              RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
                LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
             READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
         READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
              FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
              FSINFO: 2 2 0 232 328 0 1 1
            PATHCONF: 1 1 0 116 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]

The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experienced per file system operation, here 22 write timeouts and 5 access timeouts. If these numbers are increasing, I believe the machine is experiencing an NFS hang. Unfortunately the timeout value does not start to increase right away. The NFS operations need to time out first, and this can take a while. The exact timeout value depends on the setup. For example, the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.
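A small sketch of how such a check could be scripted. The nfs_timeouts helper below is a hypothetical name, and simply prints each per-op operation whose timeout counter (the third number after the operation name) is non-zero, assuming the mountstats layout shown above:

```shell
# nfs_timeouts [FILE] - print NFS operations with a non-zero timeout
# count from a mountstats file (default /proc/self/mountstats).
# Sketch only; assumes the per-op layout shown above, where the third
# number after the operation name ($4 in awk terms) is the timeout counter.
nfs_timeouts() {
    awk '
        /per-op statistics/ { inops = 1; next }
        /^device /          { inops = 0 }      # next mount point begins
        inops && NF >= 4 && $4 + 0 > 0 { print $1, $4 }
    ' "${1:-/proc/self/mountstats}"
}
```

On the example above this would print "WRITE: 22" and "ACCESS: 5". Run periodically, growing values would suggest an ongoing hang.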

The only way I have found to get the timeout count on Debian and Red Hat Enterprise Linux is to peek in /proc/. But according to the Solaris 10 System Administration Guide: Network Services, the 'nfsstat -c' command can be used to get these timeout values. This does not work on Linux, as far as I can tell. I asked Debian about this, but have not seen any replies yet.

Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.

Tags: debian, english, sysadmin.
How does it feel to be wiretapped, when you should be doing the wiretapping...
8th March 2017

So the new president of the United States of America claims to be surprised to discover that he was wiretapped during the election, before he was elected president. He even claims this must be illegal. Well, duh: if there is one thing the confirmations from Snowden documented, it is that the entire population of the USA is wiretapped, one way or another. Of course the presidential candidates were wiretapped, alongside the senators, judges and the rest of the people in the USA.

Next, the Federal Bureau of Investigation asked the Department of Justice to publicly reject the claims that Donald Trump was wiretapped illegally. I fail to see the relevance, given that I am sure the surveillance industry in the USA believes it has all the legal backing it needs to conduct mass surveillance on the entire world.

There is even the director of the FBI stating that he never saw an order requesting wiretapping of Donald Trump. That is not very surprising, given how the FISA court works, with all its activity being secret. Perhaps he only heard about it?

What I find most sad in this story is how Norwegian journalists present it. In a news report on the radio the other day from the Norwegian Broadcasting Corporation (NRK), I heard the journalist claim that 'the FBI denies any wiretapping', while the reality is that 'the FBI denies any illegal wiretapping'. There is a fundamental and important difference, and it makes me sad that the journalists are unable to grasp it.

Update 2017-03-13: Looks like The Intercept reports that US Senator Rand Paul confirms what I state above.

Tags: english, surveillance.
Norwegian Bokmål translation of The Debian Administrator's Handbook complete, proofreading in progress
3rd March 2017

For almost a year now, we have been working on making a Norwegian Bokmål edition of The Debian Administrator's Handbook. Now, thanks to the tireless effort of Ole-Erik, Ingrid and Andreas, the initial translation is complete, and we are working on the proofreading to ensure consistent language and use of correct computer science terms. The plan is to make the book available on paper, as well as in electronic form. For that to happen, the proofreading must be completed and all the figures need to be translated. If you want to help out, get in touch.

A fresh PDF edition of the book in A4 format (the final book will have smaller pages), created every morning, is available for proofreading. If you find any errors, please visit Weblate and correct them. The state of the translation, including figures, is a useful source for those providing Norwegian Bokmål screenshots and figures.

Tags: debian, debian-handbook, english.


Created by Chronicle v4.6