Petter Reinholdtsen

Epost inn som arkivformat i Riksarkivarens forskrift?

27th April 2017

I disse dager, med frist 1. mai, har Riksarkivaren ute en høring på sin forskrift. Som en kan se er det ikke mye tid igjen før fristen som går ut på søndag. Denne forskriften er det som lister opp hvilke formater det er greit å arkivere i Noark 5-løsninger i Norge.

Jeg fant høringsdokumentene hos Norsk Arkivråd etter å ha blitt tipset på epostlisten til fri programvareprosjektet Nikita Noark5-Core, som lager et Noark 5 Tjenestegresesnitt. Jeg er involvert i Nikita-prosjektet og takket være min interesse for tjenestegrensesnittsprosjektet har jeg lest en god del Noark 5-relaterte dokumenter, og til min overraskelse oppdaget at standard epost ikke er på listen over godkjente formater som kan arkiveres. Høringen med frist søndag er en glimrende mulighet til å forsøke å gjøre noe med det. Jeg holder på med egen høringsuttalelse, og lurer på om andre er interessert i å støtte forslaget om å tillate arkivering av epost som epost i arkivet.

Er du igang med å skrive egen høringsuttalelse allerede? I så fall kan du jo vurdere å ta med en formulering om epost-lagring. Jeg tror ikke det trengs så mye. Her et kort forslag til tekst:

Viser til høring sendt ut 2017-02-17 (Riksarkivarens referanse 2016/9840 HELHJO), og tillater oss å sende inn noen innspill om revisjon av Forskrift om utfyllende tekniske og arkivfaglige bestemmelser om behandling av offentlige arkiver (Riksarkivarens forskrift).

Svært mye av vår kommuikasjon foregår i dag på e-post. Vi foreslår derfor at Internett-e-post, slik det er beskrevet i IETF RFC 5322, https://tools.ietf.org/html/rfc5322. bør inn som godkjent dokumentformat. Vi foreslår at forskriftens oversikt over godkjente dokumentformater ved innlevering i § 5-16 endres til å ta med Internett-e-post.

Som del av arbeidet med tjenestegrensesnitt har vi testet hvordan epost kan lagres i en Noark 5-struktur, og holder på å skrive et forslag om hvordan dette kan gjøres som vil bli sendt over til arkivverket så snart det er ferdig. De som er interesserte kan følge fremdriften på web.

Tags: norsk, offentlig innsyn.

Offentlig elektronisk postjournal blokkerer tilgang for utvalgte webklienter

20th April 2017

Jeg oppdaget i dag at nettstedet som publiserer offentlige postjournaler fra statlige etater, OEP, har begynt å blokkerer enkelte typer webklienter fra å få tilgang. Vet ikke hvor mange det gjelder, men det gjelder i hvert fall libwww-perl og curl. For å teste selv, kjør følgende:

% curl -v -s https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 404 Not Found
% curl -v -s --header 'User-Agent:Opera/12.0' https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 200 OK
%

Her kan en se at tjenesten gir «404 Not Found» for curl i standardoppsettet, mens den gir «200 OK» hvis curl hevder å være Opera versjon 12.0. Offentlig elektronisk postjournal startet blokkeringen 2017-03-02.

Blokkeringen vil gjøre det litt vanskeligere å maskinelt hente informasjon fra oep.no. Kan blokkeringen være gjort for å hindre automatisert innsamling av informasjon fra OEP, slik Pressens Offentlighetsutvalg gjorde for å dokumentere hvordan departementene hindrer innsyn i rapporten «Slik hindrer departementer innsyn» som ble publiserte i januar 2017. Det virker usannsynlig, da det jo er trivielt å bytte User-Agent til noe nytt.

Finnes det juridisk grunnlag for det offentlige å diskriminere webklienter slik det gjøres her? Der tilgang gis eller ikke alt etter hva klienten sier at den heter? Da OEP eies av DIFI og driftes av Basefarm, finnes det kanskje noen dokumenter sendt mellom disse to aktørene man kan be om innsyn i for å forstå hva som har skjedd. Men postjournalen til DIFI viser kun to dokumenter det siste året mellom DIFI og Basefarm. Mimes brønn neste, tenker jeg.

Tags: norsk, offentlig innsyn.

Free software archive system Nikita now able to store documents

19th March 2017

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard document the requirement for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification document a REST web service for storing, searching and retrieving documents and metadata in such archive. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced it supported the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such archive myself, personally my first use case is to store and analyse public mail journal metadata published from the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches.

If you would like to help make sure there is a free software alternatives for the archives, please join our IRC channel (#nikita on irc.freenode.net) and the project mailing list.

When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool archive-pdf to upload a PDF file to the archive using this API. The tool is very simple at the moment, and find existing fonds, series and files while asking the user to select which one to use if more than one exist. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
~/src//noark5-tester$

You can see here how the fonds (arkiv) and serie (arkivdel) only had one option, while the user need to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.

In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently use the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.

The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collect defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, in lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those that want to test the current code base. The API tester is implemented in Python.

Tags: english, nuug, offentlig innsyn, standard.

Detecting NFS hangs on Linux without hanging yourself...

9th March 2017

Over the years, administrating thousand of NFS mounting linux computers at the time, I often needed a way to detect if the machine was experiencing NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too. So you want to be able to detect this without risking the detection process getting stuck too. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:

nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK

It is hard to know if the hang is still going on, and it is hard to be sure looking in dmesg is going to work. If there are lots of other messages in dmesg the lines might have rotated out of site before they are noticed.

While reading through the nfs client implementation in linux kernel code, I came across some statistics that seem to give a way to detect it. The om_timeouts sunrpc value in the kernel will increase every time the above log entry is inserted into dmesg. And after digging a bit further, I discovered that this value show up in /proc/self/mountstats on Linux.

The mountstats content seem to be shared between files using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount point for the machine. I assume this will not show lazy umounted NFS points, nor NFS mount points in a different process context (ie with a different filesystem view), but that does not worry me.

The content for a NFS mount point look similar to this:

[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
        age:    7863311
        caps:   caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0 
        bytes:  166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
             SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
              LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
              ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
            READLINK: 125 125 0 20472 18620 0 1112 1118
                READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
               WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
              CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
               MKDIR: 3680 3680 0 773980 993920 26 23990 24245
             SYMLINK: 903 903 0 233428 245488 6 5865 5917
               MKNOD: 80 80 0 20148 21760 0 299 304
              REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
               RMDIR: 3367 3367 0 645112 484848 22 5782 6002
              RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
                LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
             READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
         READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
              FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
              FSINFO: 2 2 0 232 328 0 1 1
            PATHCONF: 1 1 0 116 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]

The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experiences per file system operation. Here 22 write timeouts and 5 access timeouts. If these numbers are increasing, I believe the machine is experiencing NFS hang. Unfortunately the timeout value do not start to increase right away. The NFS operations need to time out first, and this can take a while. The exact timeout value depend on the setup. For example the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.

The only way I have been able to get working on Debian and RedHat Enterprise Linux for getting the timeout count is to peek in /proc/. But according to Solaris 10 System Administration Guide: Network Services, the 'nfsstat -c' command can be used to get these timeout values. But this do not work on Linux, as far as I can tell. I asked Debian about this, but have not seen any replies yet.

Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.

Tags: debian, english, sysadmin.

How does it feel to be wiretapped, when you should be doing the wiretapping...

8th March 2017

So the new president in the United States of America claim to be surprised to discover that he was wiretapped during the election before he was elected president. He even claim this must be illegal. Well, doh, if it is one thing the confirmations from Snowden documented, it is that the entire population in USA is wiretapped, one way or another. Of course the president candidates were wiretapped, alongside the senators, judges and the rest of the people in USA.

Next, the Federal Bureau of Investigation ask the Department of Justice to go public rejecting the claims that Donald Trump was wiretapped illegally. I fail to see the relevance, given that I am sure the surveillance industry in USA believe they have all the legal backing they need to conduct mass surveillance on the entire world.

There is even the director of the FBI stating that he never saw an order requesting wiretapping of Donald Trump. That is not very surprising, given how the FISA court work, with all its activity being secret. Perhaps he only heard about it?

What I find most sad in this story is how Norwegian journalists present it. In a news reports the other day in the radio from the Norwegian National broadcasting Company (NRK), I heard the journalist claim that 'the FBI denies any wiretapping', while the reality is that 'the FBI denies any illegal wiretapping'. There is a fundamental and important difference, and it make me sad that the journalists are unable to grasp it.

Update 2017-03-13: Look like The Intercept report that US Senator Rand Paul confirm what I state above.

Tags: english, surveillance.

Norwegian Bokmål translation of The Debian Administrator's Handbook complete, proofreading in progress

3rd March 2017

For almost a year now, we have been working on making a Norwegian Bokmål edition of The Debian Administrator's Handbook. Now, thanks to the tireless effort of Ole-Erik, Ingrid and Andreas, the initial translation is complete, and we are working on the proof reading to ensure consistent language and use of correct computer science terms. The plan is to make the book available on paper, as well as in electronic form. For that to happen, the proof reading must be completed and all the figures need to be translated. If you want to help out, get in touch.

A fresh PDF edition in A4 format (the final book will have smaller pages) of the book created every morning is available for proofreading. If you find any errors, please visit Weblate and correct the error. The state of the translation including figures is a useful source for those provide Norwegian bokmål screen shots and figures.

Tags: debian, debian-handbook, english.

Unlimited randomness with the ChaosKey?

1st March 2017

A few days ago I ordered a small batch of the ChaosKey, a small USB dongle for generating entropy created by Bdale Garbee and Keith Packard. Yesterday it arrived, and I am very happy to report that it work great! According to its designers, to get it to work out of the box, you need the Linux kernel version 4.1 or later. I tested on a Debian Stretch machine (kernel version 4.9), and there it worked just fine, increasing the available entropy very quickly. I wrote a small test oneliner to test. It first print the current entropy level, drain /dev/random, and then print the entropy level for five seconds. Here is the situation without the ChaosKey inserted:

% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
     cat /proc/sys/kernel/random/entropy_avail; \
     sleep 1; \
  done
300
0+1 oppføringer inn
0+1 oppføringer ut
28 byte kopiert, 0,000264565 s, 106 kB/s
4
8
12
17
21
%

The entropy level increases by 3-4 every second. In such case any application requiring random bits (like a HTTPS enabled web server) will halt and wait for more entrpy. And here is the situation with the ChaosKey inserted:

% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
     cat /proc/sys/kernel/random/entropy_avail; \
     sleep 1; \
  done
1079
0+1 oppføringer inn
0+1 oppføringer ut
104 byte kopiert, 0,000487647 s, 213 kB/s
433
1028
1031
1035
1038
%

Quite the difference. :) I bought a few more than I need, in case someone want to buy one here in Norway. :)

Update: The dongle was presented at Debconf last year. You might find the talk recording illuminating. It explains exactly what the source of randomness is, if you are unable to spot it from the schema drawing available from the ChaosKey web site linked at the start of this blog post.

Tags: debian, english.

Detect OOXML files with undefined behaviour?

21st February 2017

I just noticed the new Norwegian proposal for archiving rules in the goverment list ECMA-376 / ISO/IEC 29500 (aka OOXML) as valid formats to put in long term storage. Luckily such files will only be accepted based on pre-approval from the National Archive. Allowing OOXML files to be used for long term storage might seem like a good idea as long as we forget that there are plenty of ways for a "valid" OOXML document to have content with no defined interpretation in the standard, which lead to a question and an idea.

Is there any tool to detect if a OOXML document depend on such undefined behaviour? It would be useful for the National Archive (and anyone else interested in verifying that a document is well defined) to have such tool available when considering to approve the use of OOXML. I'm aware of the officeotron OOXML validator, but do not know how complete it is nor if it will report use of undefined behaviour. Are there other similar tools available? Please send me an email if you know of any such tool.

Tags: english, nuug, standard.

Ruling ignored our objections to the seizure of popcorn-time.no (#domstolkontroll)

13th February 2017

A few days ago, we received the ruling from my day in court. The case in question is a challenge of the seizure of the DNS domain popcorn-time.no. The ruling simply did not mention most of our arguments, and seemed to take everything ØKOKRIM said at face value, ignoring our demonstration and explanations. But it is hard to tell for sure, as we still have not seen most of the documents in the case and thus were unprepared and unable to contradict several of the claims made in court by the opposition. We are considering an appeal, but it is partly a question of funding, as it is costing us quite a bit to pay for our lawyer. If you want to help, please donate to the NUUG defense fund.

The details of the case, as far as we know it, is available in Norwegian from the NUUG blog. This also include the ruling itself.

Tags: english, nuug, offentlig innsyn, opphavsrett.

A day in court challenging seizure of popcorn-time.no for #domstolkontroll

3rd February 2017

On Wednesday, I spent the entire day in court in Follo Tingrett representing the member association NUUG, alongside the member association EFN and the DNS registrar IMC, challenging the seizure of the DNS name popcorn-time.no. It was interesting to sit in a court of law for the first time in my life. Our team can be seen in the picture above: attorney Ola Tellesbø, EFN board member Tom Fredrik Blenning, IMC CEO Morten Emil Eriksen and NUUG board member Petter Reinholdtsen.

The case at hand is that the Norwegian National Authority for Investigation and Prosecution of Economic and Environmental Crime (aka Økokrim) decided on their own, to seize a DNS domain early last year, without following the official policy of the Norwegian DNS authority which require a court decision. The web site in question was a site covering Popcorn Time. And Popcorn Time is the name of a technology with both legal and illegal applications. Popcorn Time is a client combining searching a Bittorrent directory available on the Internet with downloading/distribute content via Bittorrent and playing the downloaded content on screen. It can be used illegally if it is used to distribute content against the will of the right holder, but it can also be used legally to play a lot of content, for example the millions of movies available from the Internet Archive or the collection available from Vodo. We created a video demonstrating legally use of Popcorn Time and played it in Court. It can of course be downloaded using Bittorrent.

I did not quite know what to expect from a day in court. The government held on to their version of the story and we held on to ours, and I hope the judge is able to make sense of it all. We will know in two weeks time. Unfortunately I do not have high hopes, as the Government have the upper hand here with more knowledge about the case, better training in handling criminal law and in general higher standing in the courts than fairly unknown DNS registrar and member associations. It is expensive to be right also in Norway. So far the case have cost more than NOK 70 000,-. To help fund the case, NUUG and EFN have asked for donations, and managed to collect around NOK 25 000,- so far. Given the presentation from the Government, I expect the government to appeal if the case go our way. And if the case do not go our way, I hope we have enough funding to appeal.

From the other side came two people from Økokrim. On the benches, appearing to be part of the group from the government were two people from the Simonsen Vogt Wiik lawyer office, and three others I am not quite sure who was. Økokrim had proposed to present two witnesses from The Motion Picture Association, but this was rejected because they did not speak Norwegian and it was a bit late to bring in a translator, but perhaps the two from MPA were present anyway. All seven appeared to know each other. Good to see the case is take seriously.

If you, like me, believe the courts should be involved before a DNS domain is hijacked by the government, or you believe the Popcorn Time technology have a lot of useful and legal applications, I suggest you too donate to the NUUG defense fund. Both Bitcoin and bank transfer are available. If NUUG get more than we need for the legal action (very unlikely), the rest will be spend promoting free software, open standards and unix-like operating systems in Norway, so no matter what happens the money will be put to good use.

If you want to lean more about the case, I recommend you check out the blog posts from NUUG covering the case. They cover the legal arguments on both sides.

Tags: english, nuug, offentlig innsyn, opphavsrett.