Title: Where did that package go? — geolocated IP traceroute
Tags: english, debian, surveillance, web, stortinget, personvern, nuug, kart
<p>Did you ever wonder where web traffic really flows to reach the
web servers, and who owns the network equipment it is flowing through?
It is possible to get a glimpse of this using traceroute, but it
is hard to find all the details. Many years ago, I wrote a system to
map the Norwegian Internet (trying to figure out if our plans for a
network game service would get low enough latency, and who we needed
to talk to about setting up game servers close to the users). Back
then I used traceroute output from many locations (I asked my friends
to run a script and send me their traceroute output) to create the
graph and the map. The output from traceroute typically looks like
this:</p>
<pre>
traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
1 uio-gw10.uio.no (129.240.202.1) 0.447 ms 0.486 ms 0.621 ms
2 uio-gw8.uio.no (129.240.24.229) 0.467 ms 0.578 ms 0.675 ms
3 oslo-gw1.uninett.no (128.39.65.17) 0.385 ms 0.373 ms 0.358 ms
4 te3-1-2.br1.fn3.as2116.net (193.156.90.3) 1.174 ms 1.172 ms 1.153 ms
5 he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.627 ms he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48) 3.172 ms he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.857 ms
6 ae1.ar8.oslosda310.as2116.net (195.0.242.39) 0.662 ms 0.637 ms ae0.ar8.oslosda310.as2116.net (195.0.242.23) 0.622 ms
7 89.191.10.146 (89.191.10.146) 0.931 ms 0.917 ms 0.955 ms
</pre>
<p>This shows the DNS names and IP addresses of (at least some of the)
network equipment involved in getting the data traffic from me to the
www.stortinget.no server, and how long it took in milliseconds for a
packet to reach the equipment and return to me. Three packets are
sent, and sometimes the packets do not follow the same path. This
is shown for hop 5, where the three probes were answered by two
different IP addresses.</p>
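<p>To work with such output in a program, each hop line can be split
into the addresses that replied and their round trip times. The
following is only a minimal sketch, assuming IPv4 output in the format
shown above. The name <tt>parse_hop</tt> is made up for this example,
and lines where no router replied simply give an empty result.</p>

```python
import re

# One responder: "name (ip)" followed by one or more "N.NNN ms" timings.
RESPONDER = re.compile(
    r"(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)((?:\s+\d+\.\d+ ms)+)")
RTT = re.compile(r"(\d+\.\d+) ms")


def parse_hop(line):
    """Parse one hop line into (hop number, {ip: [rtt in ms, ...]})."""
    hop, rest = line.strip().split(None, 1)
    replies = {}
    for _name, ip, timings in RESPONDER.findall(rest):
        # The same address may show up twice on a line, so collect by IP.
        rtts = [float(t) for t in RTT.findall(timings)]
        replies.setdefault(ip, []).extend(rtts)
    return int(hop), replies
```

<p>Fed the line for hop 5 above, this returns two addresses, one of
them with two timings, which is how a path change between probes shows
up in the data.</p>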
<p>There are many ways to measure trace routes. Other good traceroute
implementations I use are traceroute (using ICMP packets), mtr (can do
ICMP, UDP and TCP) and scapy (Python library with ICMP, UDP and TCP
traceroute and a lot of other capabilities). All of them are easily
available in <a href="https://www.debian.org/">Debian</a>.</p>
<p>This time around, I wanted to know the geographic location of
different route points, to visualize how visiting a web page spreads
information about the visit to a lot of servers around the globe. The
background is that a web site today often will ask the browser to
fetch the parts required to display the content (for example HTML,
JSON, fonts, JavaScript, CSS, video) from many servers. This will
leak information about the visit to those controlling these servers
and anyone able to peek at the data traffic passing by (like your ISP,
the ISP's backbone provider, FRA, GCHQ, NSA and others).</p>
<p>Let's pick an example, the Norwegian parliament web site
www.stortinget.no. It is read daily by all members of parliament and
their staff, as well as political journalists, activists and many other
citizens of Norway. A visit to the www.stortinget.no web site will
ask your browser to contact 8 other servers: ajax.googleapis.com,
insights.hotjar.com, script.hotjar.com, static.hotjar.com,
stats.g.doubleclick.net, www.google-analytics.com,
www.googletagmanager.com and www.netigate.se. I extracted this by
asking <a href="http://phantomjs.org/">PhantomJS</a> to visit the
Stortinget web page and tell me all the URLs PhantomJS downloaded to
render the page (in HAR format using
<a href="https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js">their
netsniff example</a>. I am very grateful to Gorm for showing me how
to do this). My goal is to visualize network traces to all IP
addresses behind these DNS names, to show where visitors' personal
information is spread when visiting the page.</p>
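<p>Getting from a HAR capture to the list of other servers is a small
job, as a HAR file is JSON with one entry per request under
<tt>log.entries</tt>. The following is a rough sketch of that step and
not the exact script I used. The function name
<tt>third_party_hosts</tt> and the file name in the comment are made
up for this example.</p>

```python
import json
from urllib.parse import urlsplit


def third_party_hosts(har, first_party):
    """Return the sorted host names requested in a HAR capture,
    leaving out the site that was visited."""
    hosts = {urlsplit(entry["request"]["url"]).hostname
             for entry in har["log"]["entries"]}
    hosts.discard(first_party)
    hosts.discard(None)  # data: URLs and similar have no host name
    return sorted(hosts)


# Typical use, assuming the capture was saved as stortinget.har:
#   with open("stortinget.har") as f:
#       print(third_party_hosts(json.load(f), "www.stortinget.no"))
```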
<p align="center"><a href="www.stortinget.no-geoip.kml"><img
src="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-geoip-small.png" alt="map of combined traces for URLs used by www.stortinget.no using GeoIP"/></a></p>
<p>When I had a look around for options, I could not find any good
free software tools to do this, and decided I needed my own traceroute
wrapper outputting KML based on locations looked up using GeoIP. KML
is easy to work with and easy to generate, and understood by several
of the GIS tools I have available. I got good help from my NUUG
colleague Anders Einar with this, and the result can be seen in
<a href="https://github.com/petterreinholdtsen/kmltraceroute">my
kmltraceroute git repository</a>. Unfortunately, the quality of the
free GeoIP databases I could find (and the for-pay databases my
friends had access to) is not up to the task. The IP addresses of
central Internet infrastructure would typically be placed near the
controlling company's main office, and not where the router is really
located, as you can see from <a href="www.stortinget.no-geoip.kml">the
KML file I created</a> using the GeoLite City dataset from MaxMind.</p>
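<p>To give an idea of how little is needed to generate KML, here is a
small sketch that turns a list of located hops into one point per hop
and a line joining them. It is not the code from the kmltraceroute
repository. The function name <tt>hops_to_kml</tt> and its input
format are assumptions made for this example, and the GeoIP lookup
that would provide the coordinates is left out.</p>

```python
from xml.sax.saxutils import escape


def hops_to_kml(name, hops):
    """Return a KML document for one trace.

    hops is a list of (label, latitude, longitude) tuples in hop order.
    """
    # KML coordinates are written longitude first, then latitude.
    coords = [f"{lon},{lat}" for _label, lat, lon in hops]
    points = "".join(
        f"<Placemark><name>{escape(label)}</name>"
        f"<Point><coordinates>{coord}</coordinates></Point></Placemark>\n"
        for (label, _lat, _lon), coord in zip(hops, coords))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        f"<name>{escape(name)}</name>\n"
        f"{points}"
        f"<Placemark><name>{escape(name)} path</name><LineString>"
        f"<coordinates>{' '.join(coords)}</coordinates>"
        "</LineString></Placemark>\n"
        "</Document></kml>\n")
```

<p>The most common mistake when writing KML by hand is swapping
latitude and longitude, which is why the sketch builds the coordinate
strings in one place only.</p>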
<p align="center"><a href="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-scapy.svg"><img
src="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-scapy-small.png" alt="scapy traceroute graph for URLs used by www.stortinget.no"/></a></p>
<p>I also had a look at the visual traceroute graph created by
<a href="http://www.secdev.org/projects/scapy/">the scapy project</a>,
showing IP network ownership (aka AS owner) for the IP address in
question.
<a href="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-scapy.svg">The
graph displays a lot of useful information about the traceroute in SVG
format</a>, and gives a good indication of who controls the network
equipment involved, but it does not include geolocation. This graph
makes it possible to see that the information is made available at least to
UNINETT, Catchcom, Stortinget, Nordunet, Google, Amazon, Telia, Level
3 Communications and NetDNA.</p>
<p align="center"><a href="https://geotraceroute.com/index.php?node=4&host=www.stortinget.no"><img
src="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-small.png" alt="example geotraceroute view for www.stortinget.no"/></a></p>
<p>In the process, I came across the
<a href="https://geotraceroute.com/">web service GeoTraceroute</a> by
Salim Gasmi. Its methodology of combining guesses based on DNS names
and various location databases, and finally using latency times to
rule out candidate locations, seemed to do a very good job of guessing
the correct geolocation. But it could only do one trace at a time, did
not have a sensor in Norway and did not make the geolocations easily
available for postprocessing. So I contacted the developer and asked
if he would be willing to share the code. He declined until he has had
time to clean it up, but he was interested in providing the
geolocations in a machine readable format, and willing to set up a
sensor in Norway. So since yesterday, it is possible to run traces
from Norway in this service thanks to a sensor node set up by
<a href="https://www.nuug.no/">the NUUG association</a>, and get the
trace in KML format for further processing.</p>
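<p>I have not seen the GeoTraceroute code, so the following is only my
guess at how latency can rule out candidate locations, not a
description of how the service does it. A signal in optical fibre
covers roughly 200 km per millisecond, so a router answering in a
given round trip time can not be further away than half that time
allows. All names and coordinates in the sketch are made up for
illustration.</p>

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: about 2/3 of the speed of light in vacuum, in km per ms.
FIBRE_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great circle distance between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def plausible(candidates, probe, rtt_ms):
    """Keep the candidate locations a reply could have come from.

    candidates maps a name to (latitude, longitude), probe is the
    position of the sensor and rtt_ms the measured round trip time.
    """
    # The packet travels out and back, so only half the time counts.
    limit_km = rtt_ms / 2 * FIBRE_KM_PER_MS
    return [name for name, pos in candidates.items()
            if distance_km(probe, pos) <= limit_km]
```

<p>This is a generous upper bound. Real paths are longer than the
great circle and routers add delay, so the check can only exclude
locations, never confirm one.</p>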
<p align="center"><a href="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-kml-join.kml"><img
src="http://www.hungry.com/~pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-kml-join.png" alt="map of combined traces for URLs used by www.stortinget.no using geotraceroute"/></a></p>
<p>Here we can see that a lot of traffic passes through Sweden on its
way to Denmark, Germany, Holland and Ireland. Plenty of places where the
Snowden confirmations verified that the traffic is read by various actors
without your best interest as their top priority.</p>
<p>Combining KML files is trivial using a text editor, so I could loop
over all the hosts behind the URLs imported by www.stortinget.no,
ask for the KML file from GeoTraceroute, and create a combined KML
file with all the traces. Unfortunately, only one of the IP addresses
behind each DNS name is traced this time. To get them all, one would
have to request traces using IP addresses instead of DNS names from
GeoTraceroute. That might be the next step in this project.</p>
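<p>The same joining can be scripted instead of done in a text editor.
This is a sketch under the assumption that the files use the KML 2.2
namespace and that copying the Placemark elements is enough. I have
not checked it against everything GeoTraceroute puts in its files, and
shared styles defined outside the Placemarks are lost. The function
name <tt>merge_kml</tt> is made up for this example.</p>

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def merge_kml(documents):
    """Merge KML strings by copying every Placemark into one Document."""
    # Write the KML namespace as the default one, without a prefix.
    ET.register_namespace("", KML_NS)
    root = ET.Element("{%s}kml" % KML_NS)
    merged = ET.SubElement(root, "{%s}Document" % KML_NS)
    for text in documents:
        tree = ET.fromstring(text)
        # iter() finds Placemarks at any depth, also inside Folders.
        for placemark in tree.iter("{%s}Placemark" % KML_NS):
            merged.append(placemark)
    return ET.tostring(root, encoding="unicode")
```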
<p>Armed with these tools, I find it a lot easier to figure out where
the IP traffic moves and who controls the boxes involved in moving it.
And every time the link crosses for example the Swedish border, we can
be sure Swedish Signal Intelligence (FRA) is listening, as GCHQ does in
Britain and NSA does in the USA and on cables around the globe. (Hm,
what should we tell them? :) Keep that in mind if you ever send anything
unencrypted over the Internet.</p>
<p>PS: KML files are drawn using
<a href="http://ivanrublev.me/kml/">the KML viewer from Ivan
Rublev</a>, as it was less cluttered than the local Linux application
Marble. There are heaps of other options too.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>