Did you ever wonder where the web traffic really flows to reach the
web servers, and who owns the network equipment it is flowing through?
It is possible to get a glimpse of this using traceroute, but it
is hard to find all the details. Many years ago, I wrote a system to
map the Norwegian Internet (trying to figure out if our plans for a
network game service would get low enough latency, and who we needed
to talk to about setting up game servers close to the users). Back
then I used traceroute output from many locations (I asked my friends
to run a script and send me their traceroute output) to create the
graph and the map. The output from traceroute typically looks like
this:

traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
 1 uio-gw10.uio.no (129.240.202.1) 0.447 ms 0.486 ms 0.621 ms
 2 uio-gw8.uio.no (129.240.24.229) 0.467 ms 0.578 ms 0.675 ms
 3 oslo-gw1.uninett.no (128.39.65.17) 0.385 ms 0.373 ms 0.358 ms
 4 te3-1-2.br1.fn3.as2116.net (193.156.90.3) 1.174 ms 1.172 ms 1.153 ms
 5 he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.627 ms he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48) 3.172 ms he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.857 ms
 6 ae1.ar8.oslosda310.as2116.net (195.0.242.39) 0.662 ms 0.637 ms ae0.ar8.oslosda310.as2116.net (195.0.242.23) 0.622 ms
 7 89.191.10.146 (89.191.10.146) 0.931 ms 0.917 ms 0.955 ms
 8 * * *
 9 * * *
[...]

This shows the DNS names and IP addresses of (at least some of) the
network equipment involved in getting the data traffic from me to the
www.stortinget.no server, and how long it took in milliseconds for a
packet to reach the equipment and return to me. Three packets are
sent, and sometimes the packets do not follow the same path. This
is shown for hop 5, where three different IP addresses replied to the
traceroute request.

There are many ways to measure trace routes. Other good traceroute
implementations I use are traceroute (using ICMP packets), mtr (which
can do ICMP, UDP and TCP) and scapy (a Python library with ICMP, UDP
and TCP traceroute and a lot of other capabilities). All of them are
easily available in Debian.

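To give an idea of how little code a trace takes with scapy, here is a
minimal sketch (assuming the python3-scapy package is installed). Note
that scapy sends TCP SYN packets towards port 80 by default, it needs
to run as root to send raw packets, and the SVG graph requires
graphviz:

from scapy.all import traceroute

# Trace towards www.stortinget.no, limiting the number of hops probed.
res, unans = traceroute(["www.stortinget.no"], maxttl=20, verbose=0)
res.show()                                   # hop-by-hop result table
res.graph(target="> stortinget-trace.svg")   # AS ownership graph as SVG
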
This time around, I wanted to know the geographic location of the
different route points, to visualize how visiting a web page spreads
information about the visit to a lot of servers around the globe. The
background is that a web site today often will ask the browser to
fetch the parts required to display the content (for example HTML,
JSON, fonts, JavaScript, CSS and video) from many servers. This will
leak information about the visit to those controlling these servers
and to anyone able to peek at the data traffic passing by (like your
ISP, the ISP's backbone provider, FRA, GCHQ, NSA and others).

Let's pick an example, the Norwegian parliament web site
www.stortinget.no. It is read daily by all members of parliament and
their staff, as well as political journalists, activists and many
other citizens of Norway. A visit to the www.stortinget.no web site
will ask your browser to contact 8 other servers: ajax.googleapis.com,
insights.hotjar.com, script.hotjar.com, static.hotjar.com,
stats.g.doubleclick.net, www.google-analytics.com,
www.googletagmanager.com and www.netigate.se. I extracted this by
asking PhantomJS to visit the Stortinget web page and tell me all the
URLs PhantomJS downloaded to render the page (in HAR format using
their netsniff example; I am very grateful to Gorm for showing me how
to do this). My goal is to visualize network traces to all IP
addresses behind these DNS names, to show where visitors' personal
information is spread when visiting the page.

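Pulling the list of hosts out of such a HAR file only takes a few
lines of Python. This is a hypothetical sketch; the file name
stortinget.har and the way the HAR was saved (for example
"phantomjs netsniff.js https://www.stortinget.no > stortinget.har")
are my assumptions, not part of the original setup:

import json
from urllib.parse import urlparse

# A HAR file is JSON; every requested URL sits in log.entries[].request.url.
with open('stortinget.har') as f:
    har = json.load(f)

hosts = sorted({urlparse(entry['request']['url']).hostname
                for entry in har['log']['entries']})
for host in hosts:
    print(host)
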
![map of combined traces for URLs used by www.stortinget.no using GeoIP]()

When I had a look around for options, I could not find any good free
software tools to do this, and decided I needed my own traceroute
wrapper outputting KML based on locations looked up using GeoIP. KML
is easy to work with and easy to generate, and understood by several
of the GIS tools I have available. I got good help from my NUUG
colleague Anders Einar with this, and the result can be seen in my
kmltraceroute git repository. Unfortunately, the quality of the free
GeoIP databases I could find (and the for-pay databases my friends had
access to) is not up to the task. The IP addresses of central Internet
infrastructure would typically be placed near the controlling
company's main office, and not where the router is really located, as
you can see from the KML file I created using the GeoLite City dataset
from MaxMind.

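To show the idea behind such a wrapper, here is a rough sketch, not
the actual kmltraceroute code: it looks up a few hop addresses in a
MaxMind city database and writes one KML placemark per located hop.
The sketch uses the geoip2 module and a GeoLite2-City.mmdb file purely
as an illustration, while the experiment described above used the
older GeoLite City dataset:

import geoip2.database
import geoip2.errors

hops = ['129.240.202.1', '128.39.65.17', '193.156.90.3']  # example hop IPs

placemarks = []
with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
    for ip in hops:
        try:
            location = reader.city(ip).location
        except geoip2.errors.AddressNotFoundError:
            continue
        if location.latitude is None or location.longitude is None:
            continue
        # KML wants coordinates as longitude,latitude.
        placemarks.append(
            '<Placemark><name>%s</name><Point><coordinates>%f,%f'
            '</coordinates></Point></Placemark>'
            % (ip, location.longitude, location.latitude))

with open('trace.kml', 'w') as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
              + '\n'.join(placemarks) + '\n</Document></kml>\n')
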
![scapy traceroute graph for URLs used by www.stortinget.no]()

I also had a look at the visual traceroute graph created by the scapy
project, showing IP network ownership (aka AS owner) for the IP
addresses in question. The graph displays a lot of useful information
about the traceroute in SVG format, and gives a good indication of who
controls the network equipment involved, but it does not include
geolocation. The graph makes it possible to see that the information
is made available at least to UNINETT, Catchcom, Stortinget, Nordunet,
Google, Amazon, Telia, Level 3 Communications and NetDNA.

![example geotraceroute view for www.stortinget.no]()

In the process, I came across the web service GeoTraceroute by Salim
Gasmi. Its methodology of combining guesses based on DNS names,
various location databases and finally using latency times to rule out
candidate locations seemed to do a very good job of guessing the
correct geolocation. But it could only do one trace at a time, did not
have a sensor in Norway and did not make the geolocations easily
available for post-processing. So I contacted the developer and asked
if he would be willing to share the code (he declined, at least until
he has had time to clean it up), but he was interested in providing
the geolocations in a machine-readable format, and willing to set up a
sensor in Norway. So since yesterday, it is possible to run traces
from Norway in this service thanks to a sensor node set up by the NUUG
association, and to get the trace in KML format for further
processing.

![map of combined traces for URLs used by www.stortinget.no using geotraceroute]()

Here we can see that a lot of the traffic passes through Sweden on its
way to Denmark, Germany, Holland and Ireland. Plenty of places where
the Snowden revelations confirmed that the traffic is read by various
actors without your best interest as their top priority.

Combining KML files is trivial using a text editor, so I could loop
over all the hosts behind the URLs imported by www.stortinget.no, ask
for the KML file from GeoTraceroute, and create a combined KML file
with all the traces (unfortunately only one of the IP addresses behind
each DNS name is traced this time; to get them all, one would have to
request traces from GeoTraceroute using IP addresses instead of DNS
names). That might be the next step in this project.

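The merging can of course also be scripted. Here is a rough sketch of
one way to do it; the input file names are made up for the
illustration, and real KML files from GeoTraceroute may contain styles
and folders worth copying as well:

import xml.etree.ElementTree as ET

NS = 'http://www.opengis.net/kml/2.2'
ET.register_namespace('', NS)

# Collect every Placemark from the input traces into one Document.
combined = ET.Element('{%s}kml' % NS)
document = ET.SubElement(combined, '{%s}Document' % NS)

for path in ['trace-ajax.googleapis.com.kml', 'trace-www.netigate.se.kml']:
    for placemark in ET.parse(path).iter('{%s}Placemark' % NS):
        document.append(placemark)

ET.ElementTree(combined).write('combined.kml', xml_declaration=True,
                               encoding='utf-8')
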
Armed with these tools, I find it a lot easier to figure out where the
IP traffic moves and who controls the boxes involved in moving it.
And every time the link crosses for example the Swedish border, we can
be sure Swedish Signal Intelligence (FRA) is listening, as GCHQ does
in Britain and the NSA does in the USA and on cables around the globe.
(Hm, what should we tell them? :) Keep that in mind if you ever send
anything unencrypted over the Internet.

PS: KML files are drawn using the KML viewer from Ivan Rublev, as it
was less cluttered than the local Linux application Marble. There are
heaps of other options too.

As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

20th April 2017

I discovered today that OEP, the web site publishing public mail
journals (postjournaler) from Norwegian government agencies, has
started blocking certain types of web clients from getting access. I
do not know how many are affected, but at least libwww-perl and curl
are blocked. To test it yourself, run the following:

% curl -v -s https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 404 Not Found
% curl -v -s --header 'User-Agent:Opera/12.0' https://www.oep.no/pub/report.xhtml?reportId=3 2>&1 |grep '< HTTP'
< HTTP/1.1 200 OK
%

Here you can see that the service returns «404 Not Found» for curl
with its default settings, while it returns «200 OK» if curl claims to
be Opera version 12.0. Offentlig elektronisk postjournal started the
blocking 2017-03-02.

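The same check can be scripted, for example in Python. This is a
minimal sketch, assuming the service still behaves the way the curl
test above shows; whether the default Python-urllib User-Agent is
blocked the same way as curl is my assumption:

import urllib.error
import urllib.request

URL = 'https://www.oep.no/pub/report.xhtml?reportId=3'

def status(user_agent=None):
    """Return the HTTP status code, optionally faking the User-Agent."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    request = urllib.request.Request(URL, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.getcode()
    except urllib.error.HTTPError as error:
        return error.code

print('default client name:', status())    # expected 404 if blocked
print('Opera/12.0:', status('Opera/12.0'))  # expected 200
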
The blocking will make it a bit harder to collect information from
oep.no automatically. Could the blocking have been put in place to
prevent automated collection of information from OEP, like the one
Pressens Offentlighetsutvalg did to document how the ministries
obstruct access to information in the report «Slik hindrer
departementer innsyn», published in January 2017? It seems unlikely,
as it is trivial to change the User-Agent to something new.

Is there any legal basis for a public body to discriminate between web
clients the way it is done here, where access is granted or denied
depending on what the client claims its name is? As OEP is owned by
DIFI and operated by Basefarm, there may be documents exchanged
between these two parties that one could request access to in order to
understand what has happened. But the public mail journal of DIFI
only shows two documents between DIFI and Basefarm during the last
year. Mimes brønn next, I guess.

4th January 2017

Do you have a large iCalendar file with lots of old entries, and would
like to archive them to save space and resources? At least those of
us using KOrganizer know that turning an event set on and off becomes
slower and slower the more entries are in the set. While working on
migrating our calendars to a Radicale CalDAV server on our Freedombox
server, my loved one wondered if I could find a way to split up the
calendar file she had in KOrganizer, and I set out to write a tool. I
spent a few days writing and polishing the system, and it is now ready
for general consumption. The code for ical-archiver is publicly
available from a git repository on github. The system is written in
Python and depends on the vobject Python module.

To use it, locate the iCalendar file you want to operate on and give
it as an argument to the ical-archiver script. This will generate a
set of new files, one file per component type per year for all
components expiring more than two years in the past. The vevent,
vtodo and vjournal entries are handled by the script. The remaining
entries are stored in a 'remaining' file.

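The core idea is easy to sketch with vobject. The following is a
rough, hypothetical illustration and not the actual ical-archiver
code; it simply groups events by the year of their start time, while
the real script looks at when components expire and only archives
those more than two years in the past:

import vobject

# Read the calendar and bucket the VEVENT components per year.
with open('calendar.ics') as f:
    cal = vobject.readOne(f)

by_year = {}
for event in cal.contents.get('vevent', []):
    if 'dtstart' not in event.contents:
        continue
    year = event.dtstart.value.year
    by_year.setdefault(year, []).append(event)

# Write one iCalendar file per year.
for year, events in sorted(by_year.items()):
    out = vobject.iCalendar()
    for event in events:
        out.add(event)
    with open('calendar-vevent-%d.ics' % year, 'w') as f:
        f.write(out.serialize())
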
This is what a test run of the real script can look like:

% ical-archiver t/2004-2016.ics
Found 3612 vevents
Found 6 vtodos
Found 2 vjournals
Writing t/2004-2016.ics-subset-vevent-2004.ics
Writing t/2004-2016.ics-subset-vevent-2005.ics
Writing t/2004-2016.ics-subset-vevent-2006.ics
Writing t/2004-2016.ics-subset-vevent-2007.ics
Writing t/2004-2016.ics-subset-vevent-2008.ics
Writing t/2004-2016.ics-subset-vevent-2009.ics
Writing t/2004-2016.ics-subset-vevent-2010.ics
Writing t/2004-2016.ics-subset-vevent-2011.ics
Writing t/2004-2016.ics-subset-vevent-2012.ics
Writing t/2004-2016.ics-subset-vevent-2013.ics
Writing t/2004-2016.ics-subset-vevent-2014.ics
Writing t/2004-2016.ics-subset-vjournal-2007.ics
Writing t/2004-2016.ics-subset-vjournal-2011.ics
Writing t/2004-2016.ics-subset-vtodo-2012.ics
Writing t/2004-2016.ics-remaining.ics
%

As you can see, the original file is untouched and new files are
written with names derived from the original file. If you are happy
with their content, the *-remaining.ics file can replace the original
and the others can be archived or imported as historical calendar
collections.

The script should probably be improved a bit. The error handling when
discovering broken entries is not good, and I am not sure yet if it
makes sense to split different entry types into separate files or not.
The program is thus likely to change. If you find it interesting,
please get in touch. :)

19th March 2017

The Nikita Noark 5 core project is implementing the Norwegian standard
for keeping an electronic archive of government documents. The Noark
5 standard documents the requirements for data systems used by the
archives in the Norwegian government, and the Noark 5 web interface
specification documents a REST web service for storing, searching and
retrieving documents and metadata in such an archive. I've been
involved in the project since a few weeks before Christmas, when the
Norwegian Unix User Group announced it supported the project. I
believe this is an important project, and hope it can make it possible
for the government archives in the future to use free software to keep
the archives we citizens depend on. But as I do not hold such an
archive myself, my first personal use case is to store and analyse
public mail journal metadata published by the government. I find it
useful to have a clear use case in mind when developing, to make sure
the system scratches one of my itches.

If you would like to help make sure there is a free software
alternative for the archives, please join our IRC channel (#nikita on
irc.freenode.net) and the project mailing list.

When I got involved, the web service could store metadata about
documents. But a few weeks ago, a new milestone was reached when it
became possible to store full text documents too. Yesterday, I
completed an implementation of a command line tool archive-pdf to
upload a PDF file to the archive using this API. The tool is very
simple at the moment; it finds existing fonds, series and files,
asking the user to select which one to use if more than one exists.
Once a file is identified, the PDF is associated with the file and
uploaded, using the title extracted from the PDF itself. The process
is fairly similar to visiting the archive, opening a cabinet, locating
a file and storing a piece of paper in it. Here is a test run
directly after populating the database with test data using our API
tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
 PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
 File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
~/src//noark5-tester$

You can see here how the fonds (arkiv) and series (arkivdel) only had
one option, while the user needs to choose which file (mappe) to use
among the two created by the API tester. The archive-pdf tool can be
found in the git repository for the API tester.

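The "title extracted from the PDF itself" step is the easiest piece to
illustrate. A hypothetical sketch follows; the real archive-pdf tool
may well use a different PDF library:

from pypdf import PdfReader

# Read the document information dictionary and print its title, if any.
reader = PdfReader('mangelmelding/mangler.pdf')
print(reader.metadata.title if reader.metadata else None)
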
In the project, I have been mostly working on the API tester so far,
while getting to know the code base. The API tester currently uses
the HATEOAS links to traverse the entire exposed service API and
verify that the exposed operations and objects match the
specification, as well as trying to create objects holding metadata
and to upload a simple XML file to store. The tester has proved very
useful for finding flaws in our implementation, as well as flaws in
the reference site and the specification.

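The traversal idea itself is simple to sketch. The following is only
a rough illustration and not the actual API tester; the base URL and
the '_links', 'href' and 'rel' field names are my assumptions about
the service layout, and the real Noark 5 interface may name things
differently:

import json
import urllib.request

def get_json(url):
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))

def walk(url, seen):
    """Follow every link reachable from url, visiting each URL only once."""
    if url in seen:
        return
    seen.add(url)
    for link in get_json(url).get('_links', []):
        href = link.get('href')
        if href:
            print(link.get('rel'), href)
            walk(href, seen)

# Example entry point for a locally running service.
walk('http://localhost:8092/noark5v4/', set())
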
The test document I uploaded is a summary of all the specification
defects we have collected so far while implementing the web service.
There are several unclear and conflicting parts of the specification,
and we have started writing down the questions we get from
implementing it. We use a format inspired by how The Austin Group
collects defect reports for the POSIX standard with their instructions
for the MANTIS defect tracker system, in lack of an official way to
structure defect reports for Noark 5 (our first submitted defect
report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly
easy to get up and running using Docker containers for those who want
to test the current code base. The API tester is implemented in
Python.