1 Title: Free software archive system Nikita now able to store documents
2 Tags: english, standard, nuug, offentlig innsyn
3 Date: 2017-03-19 08:00
4
5 <p>The <a href="https://github.com/hiOA-ABI/nikita-noark5-core">Nikita
6 Noark 5 core project</a> is implementing the Norwegian standard for
7 keeping an electronic archive of government documents.
8 <a href="http://www.arkivverket.no/arkivverket/Offentlig-forvaltning/Noark/Noark-5/English-version">The
9 Noark 5 standard</a> documents the requirements for data systems used by
10 the archives in the Norwegian government, and the Noark 5 web interface
11 specification documents a REST web service for storing, searching and
12 retrieving documents and metadata in such an archive. I've been involved
13 in the project since a few weeks before Christmas, when the Norwegian
14 Unix User Group
15 <a href="https://www.nuug.no/news/NOARK5_kjerne_som_fri_programvare_f_r_epostliste_hos_NUUG.shtml">announced
16 it supported the project</a>. I believe this is an important project,
17 and hope it can make it possible for the government archives in the
18 future to use free software to keep the archives we citizens depend
19 on. But as I do not hold such an archive myself, my first use
20 case is to store and analyse public mail journal metadata published
21 from the government. I find it useful to have a clear use case in
22 mind when developing, to make sure the system scratches one of my
23 itches.</p>
24
25 <p>If you would like to help make sure there is a free software
26 alternative for the archives, please join our IRC channel
27 (<a href="irc://irc.freenode.net/%23nikita">#nikita on
28 irc.freenode.net</a>) and
29 <a href="https://lists.nuug.no/mailman/listinfo/nikita-noark">the
30 project mailing list</a>.</p>
31
32 <p>When I got involved, the web service could store metadata about
33 documents. But a few weeks ago, a new milestone was reached when it
34 became possible to store full text documents too. Yesterday, I
35 completed an implementation of a command line tool
36 <tt>archive-pdf</tt> to upload a PDF file to the archive using this
37 API. The tool is very simple at the moment: it finds existing
38 <a href="https://en.wikipedia.org/wiki/Fonds">fonds</a>, series and
39 files, asking the user to select which one to use if more than
40 one exists. Once a file is identified, the PDF is associated with the
41 file and uploaded, using the title extracted from the PDF itself. The
42 process is fairly similar to visiting the archive, opening a cabinet,
43 locating a file and storing a piece of paper in the archive. Here is
44 a test run directly after populating the database with test data using
45 our API tester:</p>
46
47 <p><blockquote><pre>
48 ~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
49 using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
50 using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446
51
52 0 - Title of the test case file created 2017-03-18T23:49:32.103446
53 1 - Title of the test file created 2017-03-18T23:49:32.103446
54 Select which mappe you want (or search term): 0
55 Uploading mangelmelding/mangler.pdf
56 PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
57 File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
58 ~/src//noark5-tester$
59 </pre></blockquote></p>
60
61 <p>You can see here how the fonds (arkiv) and series (arkivdel) each had only
62 one option, while the user needs to choose which file (mappe) to use
63 among the two created by the API tester. The <tt>archive-pdf</tt>
64 tool can be found in the git repository for the API tester.</p>
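<p>The selection step shown in the test run can be sketched roughly as
follows. This is a minimal illustration and not the actual
<tt>archive-pdf</tt> code: the <tt>choose</tt> function, the
(title, URL) tuples and the local substring handling of the "search
term" are all assumptions made for this example.</p>

```python
def choose(kind, candidates, ask=input):
    """Pick one (title, url) entry, prompting only when there is a choice.

    Hypothetical helper: the names and the data layout are assumptions,
    not taken from the real archive-pdf tool.
    """
    if not candidates:
        raise LookupError(f"no {kind} found")
    if len(candidates) == 1:
        # Only one option, so use it without asking.
        print(f"using {kind}: {candidates[0][0]}")
        return candidates[0]
    while True:
        for index, (title, _url) in enumerate(candidates):
            print(f"{index} - {title}")
        answer = ask(f"Select which {kind} you want (or search term): ").strip()
        if answer.isdecimal() and int(answer) < len(candidates):
            return candidates[int(answer)]
        # Assumption: any other input is a search term that narrows
        # the list by case-insensitive substring match on the title.
        matches = [c for c in candidates if answer.lower() in c[0].lower()]
        if len(matches) == 1:
            return matches[0]
        if matches:
            candidates = matches
```

<p>Passing the prompt function as the <tt>ask</tt> parameter keeps the
logic easy to exercise without a terminal, for example with
<tt>ask=lambda prompt: "0"</tt>.</p>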
65
66 <p>In the project, I have been mostly working on
67 <a href="https://github.com/petterreinholdtsen/noark5-tester">the API
68 tester</a> so far, while getting to know the code base. The API
69 tester currently uses
70 <a href="https://en.wikipedia.org/wiki/HATEOAS">the HATEOAS links</a>
71 to traverse the entire exposed service API and verify that the exposed
72 operations and objects match the specification, as well as trying to
73 create objects holding metadata and upload a simple XML file to
74 store. The tester has proved very useful for finding flaws in our
75 implementation, as well as flaws in the reference site and the
76 specification.</p>
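<p>To illustrate what traversing a service through its HATEOAS links
means, here is a small sketch of a breadth-first walk. It is not the
API tester's real code. It assumes each JSON response carries a
<tt>_links</tt> list of objects with <tt>rel</tt> and <tt>href</tt>
keys, and it uses a made-up in-memory service (<tt>FAKE_SERVICE</tt>)
in place of real HTTP requests.</p>

```python
from collections import deque


def crawl(root_url, fetch):
    """Visit every URL reachable from root_url by following _links.

    fetch(url) must return the parsed JSON document for that URL.
    The "_links" layout with "rel"/"href" entries is an assumption.
    Returns the URLs in the order they were visited.
    """
    visited = []
    seen = {root_url}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        visited.append(url)
        document = fetch(url)
        for link in document.get("_links", []):
            href = link.get("href")
            # Skip malformed links and anything already queued,
            # so self links and cycles do not cause endless loops.
            if href is None or href in seen:
                continue
            seen.add(href)
            queue.append(href)
    return visited


# Hypothetical stand-in for a running service, for illustration only.
FAKE_SERVICE = {
    "/": {"_links": [{"rel": "arkivstruktur", "href": "/arkivstruktur"}]},
    "/arkivstruktur": {"_links": [
        {"rel": "self", "href": "/arkivstruktur"},
        {"rel": "arkiv", "href": "/arkivstruktur/arkiv"},
    ]},
    "/arkivstruktur/arkiv": {"_links": []},
}

urls = crawl("/", FAKE_SERVICE.__getitem__)
```

<p>A real tester would pass a <tt>fetch</tt> function that performs an
HTTP GET and parses the JSON body, and would compare each visited
object against the specification instead of only recording its
URL.</p>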
77
78 <p>The test document I uploaded is a summary of all the specification
79 defects we have collected so far while implementing the web service.
80 There are several unclear and conflicting parts of the specification,
81 and we have
82 <a href="https://github.com/petterreinholdtsen/noark5-tester/tree/master/mangelmelding">started
83 writing down</a> the questions we get from implementing it. We use a
84 format inspired by how <a href="http://www.opengroup.org/austin/">The
85 Austin Group</a> collects defect reports for the POSIX standard with
86 <a href="http://www.opengroup.org/austin/mantis.html">their
87 instructions for the MANTIS defect tracker system</a>, for lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a <a href="https://github.com/petterreinholdtsen/noark5-tester/blob/master/mangelmelding/sendt/2017-03-15-mangel-prosess.md">request for a procedure for submitting defect reports</a> :).</p>
88
89 <p>The Nikita project is implemented using Java and Spring, and is
90 fairly easy to get up and running using Docker containers for those
91 that want to test the current code base. The API tester is
92 implemented in Python.</p>