<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<title>Petter Reinholdtsen - Entries from January 2017</title>
<description>Entries from January 2017</description>
<link>http://people.skolelinux.org/pere/blog/</link>
<title>Where did that package go? &mdash; geolocated IP traceroute</title>
<link>http://people.skolelinux.org/pere/blog/Where_did_that_package_go___mdash__geolocated_IP_traceroute.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Where_did_that_package_go___mdash__geolocated_IP_traceroute.html</guid>
<pubDate>Mon, 9 Jan 2017 12:20:00 +0100</pubDate>
<description><p>Did you ever wonder where the web traffic really flows to reach the
web servers, and who owns the network equipment it is flowing through?
It is possible to get a glimpse of this by using traceroute, but it
is hard to find all the details. Many years ago, I wrote a system to
map the Norwegian Internet (trying to figure out if our plans for a
network game service would get low enough latency, and who we needed
to talk to about setting up game servers close to the users). Back
then I used traceroute output from many locations (I asked my friends
to run a script and send me their traceroute output) to create the
graph and the map. The output from traceroute typically looks like
this:</p>

<p><pre>
traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
 1  uio-gw10.uio.no (129.240.202.1)  0.447 ms  0.486 ms  0.621 ms
 2  uio-gw8.uio.no (129.240.24.229)  0.467 ms  0.578 ms  0.675 ms
 3  oslo-gw1.uninett.no (128.39.65.17)  0.385 ms  0.373 ms  0.358 ms
 4  te3-1-2.br1.fn3.as2116.net (193.156.90.3)  1.174 ms  1.172 ms  1.153 ms
 5  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.627 ms  he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48)  3.172 ms  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.857 ms
 6  ae1.ar8.oslosda310.as2116.net (195.0.242.39)  0.662 ms  0.637 ms  ae0.ar8.oslosda310.as2116.net (195.0.242.23)  0.622 ms
 7  89.191.10.146 (89.191.10.146)  0.931 ms  0.917 ms  0.955 ms
</pre></p>
<p>This shows the DNS names and IP addresses of (at least some of the)
network equipment involved in getting the data traffic from me to the
www.stortinget.no server, and how long it took in milliseconds for a
packet to reach the equipment and return to me. Three packets are
sent, and sometimes the packets do not follow the same path. This
is shown for hop 5, where three different IP addresses replied to the
traceroute request.</p>
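<p>The hop lines in such output follow a regular enough shape to be
parsed programmatically. The sketch below is my own illustration, not
part of any tool mentioned here, and it is simplified: on hops answered
by several routers (like hop 5 above), only the first host is captured.</p>

```python
import re

# Matches lines like " 1  uio-gw10.uio.no (129.240.202.1)  0.447 ms  0.486 ms  0.621 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)((?:\s+[\d.]+ ms)+)")

def parse_hop(line):
    """Parse one traceroute hop line into (hop, host, ip, [rtt_ms, ...]).

    Returns None for lines that are not hop lines (e.g. the header).
    Simplified: only the first responding host of a hop is captured.
    """
    m = HOP_RE.match(line)
    if not m:
        return None
    hop, host, ip, times = m.groups()
    return int(hop), host, ip, [float(t) for t in re.findall(r"([\d.]+) ms", times)]
```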
<p>There are many ways to measure trace routes. Other good traceroute
implementations I use are traceroute (using ICMP packets), mtr (can do
ICMP, UDP and TCP) and scapy (a Python library with ICMP, UDP and TCP
traceroute and a lot of other capabilities). All of them are easily
available in <a href="https://www.debian.org/">Debian</a>.</p>
<p>This time around, I wanted to know the geographic location of the
different route points, to visualize how visiting a web page spreads
information about the visit to a lot of servers around the globe. The
background is that a web site today often asks the browser to fetch
the parts required to display the content (for example HTML, JSON,
fonts, JavaScript, CSS and video) from many servers. This leaks
information about the visit to those controlling these servers and to
anyone able to peek at the data traffic passing by (like your ISP,
the ISP's backbone provider, FRA, GCHQ, NSA and others).</p>
<p>Let's pick an example, the Norwegian parliament web site
www.stortinget.no. It is read daily by all members of parliament and
their staff, as well as political journalists, activists and many other
citizens of Norway. A visit to the www.stortinget.no web site will
ask your browser to contact 8 other servers: ajax.googleapis.com,
insights.hotjar.com, script.hotjar.com, static.hotjar.com,
stats.g.doubleclick.net, www.google-analytics.com,
www.googletagmanager.com and www.netigate.se. I extracted this by
asking <a href="http://phantomjs.org/">PhantomJS</a> to visit the
Stortinget web page and tell me all the URLs PhantomJS downloaded to
render the page (in HAR format using
<a href="https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js">their
netsniff example</a>. I am very grateful to Gorm for showing me how
to do this). My goal is to visualize network traces to all IP
addresses behind these DNS names, to show where visitors' personal
information is spread when visiting the page.</p>
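<p>Extracting the host list from a HAR capture like the one netsniff
produces is a few lines of Python. The helper below is my own sketch,
not part of PhantomJS, and it assumes the standard HAR layout with
request URLs under log.entries:</p>

```python
import json
from urllib.parse import urlparse

def hosts_from_har(har_path):
    """List the unique hostnames contacted according to a HAR capture."""
    with open(har_path) as f:
        har = json.load(f)
    return sorted({urlparse(entry["request"]["url"]).hostname
                   for entry in har["log"]["entries"]})
```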
<p align="center"><a href="www.stortinget.no-geoip.kml"><img
src="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-geoip-small.png" alt="map of combined traces for URLs used by www.stortinget.no using GeoIP"/></a></p>
<p>When I had a look around for options, I could not find any good
free software tools to do this, and decided I needed my own traceroute
wrapper outputting KML based on locations looked up using GeoIP. KML
is easy to work with and easy to generate, and understood by several
of the GIS tools I have available. I got good help from my NUUG
colleague Anders Einar with this, and the result can be seen in
<a href="https://github.com/petterreinholdtsen/kmltraceroute">my
kmltraceroute git repository</a>. Unfortunately, the quality of the
free GeoIP databases I could find (and the for-pay databases my
friends had access to) is not up to the task. The IP addresses of
central Internet infrastructure would typically be placed near the
controlling company's main office, and not where the router is really
located, as you can see from <a href="www.stortinget.no-geoip.kml">the
KML file I created</a> using the GeoLite City dataset from MaxMind.</p>
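<p>The KML output step of such a wrapper can be sketched as below. This
is my own illustration, not code from kmltraceroute, and it assumes the
hop coordinates have already been looked up in a GeoIP database such as
GeoLite City:</p>

```python
def trace_to_kml(hops):
    """Render a traceroute as a KML document with one placemark per hop.

    hops: list of (name, latitude, longitude) tuples, e.g. produced by
    looking each router IP up in a GeoIP database.
    """
    placemarks = []
    for name, lat, lon in hops:
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%f,%f</coordinates></Point>"
            "</Placemark>" % (name, lon, lat))  # KML uses lon,lat order
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks) + "</Document></kml>")
```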
<p align="center"><a href="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-scapy.svg"><img
src="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-scapy-small.png" alt="scapy traceroute graph for URLs used by www.stortinget.no"/></a></p>
<p>I also had a look at the visual traceroute graph created by
<a href="http://www.secdev.org/projects/scapy/">the scapy project</a>,
showing IP network ownership (aka AS owner) for the IP address in
question.
<a href="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-scapy.svg">The
graph displays a lot of useful information about the traceroute in SVG
format</a>, and gives a good indication of who controls the network
equipment involved, but it does not include geolocation. The graph
makes it possible to see that the information is made available to at
least UNINETT, Catchcom, Stortinget, Nordunet, Google, Amazon, Telia,
Level 3 Communications and NetDNA.</p>
<p align="center"><a href="https://geotraceroute.com/index.php?node=4&host=www.stortinget.no"><img
src="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-small.png" alt="example geotraceroute view for www.stortinget.no"/></a></p>
<p>In the process, I came across the
<a href="https://geotraceroute.com/">web service GeoTraceRoute</a> by
Salim Gasmi. Its methodology of combining guesses based on DNS names,
various location databases and finally using latency times to rule out
candidate locations seemed to do a very good job of guessing the
correct geolocation. But it could only do one trace at a time, did not
have a sensor in Norway and did not make the geolocations easily
available for postprocessing. So I contacted the developer and asked
if he would be willing to share the code (he refused until he had time
to clean it up), but he was interested in providing the geolocations
in a machine readable format, and willing to set up a sensor in
Norway. So since yesterday, it is possible to run traces from Norway
in this service thanks to a sensor node set up by
<a href="https://www.nuug.no/">the NUUG association</a>, and to get
the trace in KML format for further processing.</p>
<p align="center"><a href="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-kml-join.kml"><img
src="http://people.skolelinux.org/pere/blog/images/2017-01-09-www.stortinget.no-geotraceroute-kml-join.png" alt="map of combined traces for URLs used by www.stortinget.no using geotraceroute"/></a></p>
<p>Here we can see that a lot of traffic passes through Sweden on its
way to Denmark, Germany, Holland and Ireland. Plenty of places where,
as the Snowden revelations confirmed, the traffic is read by various
actors without your best interests as their top priority.</p>
<p>Combining KML files is trivial using a text editor, so I could loop
over all the hosts behind the URLs imported by www.stortinget.no, ask
for the KML file from GeoTraceRoute, and create a combined KML file
with all the traces (unfortunately, only one of the IP addresses
behind each DNS name is traced this time; to get them all, one would
have to request traces using IP numbers instead of DNS names from
GeoTraceRoute). That might be the next step in this project.</p>
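<p>The combination step can of course also be scripted instead of done
in a text editor. The sketch below is my own (merge_kml is a helper
name of my choosing, not from any tool mentioned here); it copies the
Document contents of several KML files into one, using only the Python
standard library:</p>

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize without an "ns0:" prefix

def merge_kml(paths, out_path):
    """Copy the Document children of each input KML file into one output file."""
    root = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(root, "{%s}Document" % KML_NS)
    for path in paths:
        src = ET.parse(path).getroot().find("{%s}Document" % KML_NS)
        if src is not None:
            for child in src:
                doc.append(child)
    ET.ElementTree(root).write(out_path, encoding="UTF-8", xml_declaration=True)
```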
<p>Armed with these tools, I find it a lot easier to figure out where
the IP traffic moves and who controls the boxes involved in moving it.
And every time the link crosses for example the Swedish border, we can
be sure Swedish Signals Intelligence (FRA) is listening, as GCHQ does
in Britain and the NSA in the USA and on cables around the globe. (Hm,
what should we tell them? :) Keep that in mind if you ever send
anything unencrypted over the Internet.</p>
<p>PS: The KML files are drawn using
<a href="http://ivanrublev.me/kml/">the KML viewer from Ivan
Rublev</a>, as it was less cluttered than the local Linux application
Marble. There are heaps of other options too.</p>
</description>
<title>Introducing ical-archiver to split out old iCalendar entries</title>
<link>http://people.skolelinux.org/pere/blog/Introducing_ical_archiver_to_split_out_old_iCalendar_entries.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Introducing_ical_archiver_to_split_out_old_iCalendar_entries.html</guid>
<pubDate>Wed, 4 Jan 2017 12:20:00 +0100</pubDate>
<description><p>Do you have a large
<a href="https://icalendar.org/">iCalendar</a>
file with lots of old entries, and would like to archive them to save
space and resources? At least those of us using KOrganizer know that
turning an event set on and off becomes slower and slower the more
entries are in the set. While working on migrating our calendars to a
<a href="http://radicale.org/">Radicale CalDAV server</a> on our
<a href="https://freedomboxfoundation.org/">Freedombox server</a>, my
loved one wondered if I could find a way to split up the calendar file
she had in KOrganizer, and I set out to write a tool. I spent a few
days writing and polishing the system, and it is now ready for general
use. The <a href="https://github.com/petterreinholdtsen/ical-archiver">code for
ical-archiver</a> is publicly available from a git repository on
GitHub. The system is written in Python and depends on
<a href="http://eventable.github.io/vobject/">the vobject Python
module</a>.</p>
<p>To use it, locate the iCalendar file you want to operate on and
give it as an argument to the ical-archiver script. This will
generate a set of new files, one file per component type per year for
all components expiring more than two years in the past. The vevent,
vtodo and vjournal entries are handled by the script. The remaining
entries are stored in a 'remaining' file.</p>
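<p>The core idea, bucketing components by the year of their start
date, can be sketched like this. The real script parses with the
vobject module, also handles VTODO and VJOURNAL, and writes the output
files, so treat this as a simplified illustration with names of my own
choosing:</p>

```python
import re
from collections import defaultdict

def split_events_by_year(ics_text):
    """Group the VEVENT blocks of an iCalendar file by their DTSTART year.

    Simplified sketch of what ical-archiver does; events without a
    parseable DTSTART end up in an "unknown" bucket.
    """
    by_year = defaultdict(list)
    for event in re.findall(r"BEGIN:VEVENT.*?END:VEVENT", ics_text, re.S):
        match = re.search(r"DTSTART[^:\n]*:(\d{4})", event)
        by_year[match.group(1) if match else "unknown"].append(event)
    return dict(by_year)
```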
<p>This is what a test run can look like:</p>

<p><pre>
% ical-archiver t/2004-2016.ics
Writing t/2004-2016.ics-subset-vevent-2004.ics
Writing t/2004-2016.ics-subset-vevent-2005.ics
Writing t/2004-2016.ics-subset-vevent-2006.ics
Writing t/2004-2016.ics-subset-vevent-2007.ics
Writing t/2004-2016.ics-subset-vevent-2008.ics
Writing t/2004-2016.ics-subset-vevent-2009.ics
Writing t/2004-2016.ics-subset-vevent-2010.ics
Writing t/2004-2016.ics-subset-vevent-2011.ics
Writing t/2004-2016.ics-subset-vevent-2012.ics
Writing t/2004-2016.ics-subset-vevent-2013.ics
Writing t/2004-2016.ics-subset-vevent-2014.ics
Writing t/2004-2016.ics-subset-vjournal-2007.ics
Writing t/2004-2016.ics-subset-vjournal-2011.ics
Writing t/2004-2016.ics-subset-vtodo-2012.ics
Writing t/2004-2016.ics-remaining.ics
</pre></p>
<p>As you can see, the original file is untouched and new files are
written with names derived from the original file. If you are happy
with their content, the *-remaining.ics file can replace the original,
and the others can be archived or imported as historical calendar
collections.</p>
<p>The script should probably be improved a bit. The error handling
when discovering broken entries is not good, and I am not sure yet if
it makes sense to split the different entry types into separate files
or not. The program is thus likely to change. If you find it
interesting, please get in touch. :)</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&label=PetterReinholdtsenBlog">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>