X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/b0e0987d30dc410df79d1033c314e2dbbf4fb26f..cd775096bf095558010073a71ca149dbfe487c3f:/blog/index.html diff --git a/blog/index.html b/blog/index.html index 863f4aeab5..032f861b4b 100644 --- a/blog/index.html +++ b/blog/index.html @@ -20,40 +20,85 @@
-
How does it feel to be wiretapped, when you should be doing the wiretapping...
-
8th March 2017
-

So the new president in the United States of America claim to be -surprised to discover that he was wiretapped during the election -before he was elected president. He even claim this must be illegal. -Well, doh, if it is one thing the confirmations from Snowden -documented, it is that the entire population in USA is wiretapped, one -way or another. Of course the president candidates were wiretapped, -alongside the senators, judges and the rest of the people in USA.

- -

Next, the Federal Bureau of Investigation ask the Department of -Justice to go public rejecting the claims that Donald Trump was -wiretapped illegally. I fail to see the relevance, given that I am -sure the surveillance industry in USA according to themselves believe -they have all the legal backing they need to conduct mass surveillance -on the entire world.

- -

There is even the director of the FBI stating that he never saw an -order requesting wiretapping of Donald Trump. That is not very -surprising, given how the FISA court work, with all its activity being -secret. Perhaps he only heard about it?

- -

What I find most sad in this story is how Norwegian journalists -present it. In a news reports the other day in the radio from the -Norwegian National broadcasting Company (NRK), I heard the journalist -claim that 'the FBI denies any wiretapping', while the reality is that -'the FBI denies any illegal wiretapping'. There is a fundamental and -important difference, and it make me sad that the journalists are -unable to grasp it.

+ +
13th December 2017
+


While looking at
+the scanned copies
+for the copyright renewal entries for movies published in the USA,
+an idea occurred to me.  The number of renewals per year is so small
+that it should be fairly quick to transcribe them all and add
+references to the corresponding IMDB title IDs.  This would give the
+(presumably) complete list of movies published 28 years earlier that
+did _not_ enter the public domain for the transcribed year.  By
+fetching the list of USA movies published 28 years earlier and
+subtracting the movies with renewals, we should be left with movies
+registered in IMDB that are now in the public domain.  For the year
+1955 (which is the one I have looked at the most), the total number of
+pages to transcribe is 21.  For the years 1950 to 1978, it should be
+in the range of 500-600 pages.  It is just a few days of work, and
+spread among a small group of people it should be doable in a few
+weeks of spare time.

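The subtraction step described above is a plain set difference once both lists exist; a minimal sketch in Python, using hypothetical IMDB title IDs (only tt0017588 is taken from the post, the others are made up):

```python
# Sketch of the proposed subtraction.  published_1927 would come from
# a list of USA movies in IMDB, renewed_1955 from the transcribed
# renewal entries; the IDs below are illustrative.
published_1927 = {"tt0017588", "tt0018037", "tt0018455"}
renewed_1955 = {"tt0017588"}  # e.g. ADAM AND EVIL was renewed

# Movies published 28 years earlier with no renewal should now be in
# the public domain in the USA.
public_domain = published_1927 - renewed_1955
print(sorted(public_domain))
```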
+ +


A typical copyright renewal entry looks like this (the first one
+listed for 1955):

+ +

+ ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer + Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); + 10Jun55; R151558. +

+ +


The movie title as well as the registration and renewal dates are
+easy enough to locate programmatically (split on the first comma and
+look for DDmmmYY).  The rest of the text is not required to find the
+movie in IMDB, but is useful to confirm the correct movie is found.  I
+am not quite sure what the L and R numbers mean, but suspect they are
+reference numbers into the archive of the US Copyright Office.

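The parsing rule just described (split on the first comma, look for DDmmmYY dates and the L/R numbers) can be sketched like this; the regular expressions are my own guesses at the format, not a tested transcription tool:

```python
import re

# Hypothetical parser for a renewal entry: title before the first
# comma, dates in DDmmmYY format, L/R reference numbers after them.
entry = ("ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer "
         "Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); "
         "10Jun55; R151558.")

title = entry.split(",", 1)[0]
dates = re.findall(r"\b(\d{2}[A-Z][a-z]{2}\d{2})\b", entry)
numbers = re.findall(r"\b([LR]\d+)\b", entry)

print(title)    # movie title
print(dates)    # registration and renewal dates
print(numbers)  # L and R reference numbers
```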
+ +

Tracking down the equivalent IMDB title ID is probably going to be +a manual task, but given the year it is fairly easy to search for the +movie title using for example +http://www.imdb.com/find?q=adam+and+evil+1927&s=all. +Using this search, I find that the equivalent IMDB title ID for the +first renewal entry from 1955 is +http://www.imdb.com/title/tt0017588/.

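Generating such search URLs from transcribed entries could easily be automated; a small sketch using only the standard library, with the q and s parameters taken from the example URL above:

```python
from urllib.parse import urlencode

def imdb_search_url(title, year):
    # Build a search URL like the one used above for ADAM AND EVIL.
    query = "%s %d" % (title.lower(), year)
    return "http://www.imdb.com/find?" + urlencode({"q": query, "s": "all"})

print(imdb_search_url("Adam and Evil", 1927))
```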
+ +


I suspect the best way to do this would be to make a specialised
+web service to make it easy for contributors to transcribe and track
+down IMDB title IDs.  In the web service, once an entry is
+transcribed, the title and year could be extracted from the text and a
+search in IMDB conducted, letting the user pick the equivalent IMDB
+title ID right away.  By spreading the work among volunteers, it would
+also be possible to have at least two people transcribe the same
+entries, to discover any typos introduced.  But I will need help to
+make this happen, as I lack the spare time to do all of this on my
+own.  If you would like to help, please get in touch.  Perhaps you can
+draft a web service for crowdsourcing the task?

+ +


Note, Project Gutenberg already has some
+transcribed
+copies of the US Copyright Office renewal protocols, but I have
+not been able to find any film renewals there, so I suspect they only
+have copies of renewals for written works.  I have not been able to
+find transcribed versions of movie renewals anywhere so far.  Perhaps
+they exist somewhere?

+ +


I would love to figure out methods for finding all the public
+domain works in other countries too, but that is a lot harder.  At
+least for Norway and Great Britain, such work involves tracking down
+the people involved in making the movie and figuring out when they
+died.  It is hard enough to figure out who was part of making a movie,
+and I do not know how to automate such a procedure without a registry
+of every person involved in making movies and their year of death.

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -61,33 +106,49 @@ unable to grasp it.

- -
3rd March 2017
-

For almost a year now, we have been working on making a Norwegian -Bokmål edition of The Debian -Administrator's Handbook. Now, thanks to the tireless effort of -Ole-Erik, Ingrid and Andreas, the initial translation is complete, and -we are working on the proof reading to ensure consistent language and -use of correct computer science terms. The plan is to make the book -available on paper, as well as in electronic form. For that to -happen, the proof reading must be completed and all the figures need -to be translated. If you want to help out, get in touch.

- -

A - -fresh PDF edition in A4 format (the final book will have smaller -pages) of the book created every morning is available for -proofreading. If you find any errors, please -visit -Weblate and correct the error. The -state -of the translation including figures is a useful source for those -provide Norwegian bokmål screen shots and figures.

+ +
5th December 2017
+


Three years ago, a presumed lost animation film,
+Empty Socks from
+1927, was discovered in the Norwegian National Library.  At the
+time it was discovered, it was generally assumed to be copyrighted by
+The Walt Disney Company, and I blogged about
+my
+reasoning to conclude that it would enter the Norwegian
+equivalent of the public domain in 2053, based on my understanding of
+Norwegian Copyright Law.  But a few days ago, I came across
+a
+blog post claiming the movie was already in the public domain, at
+least in the USA.  The reasoning is as follows: The film was released
+in November or December 1927 (sources disagree), and its copyright was
+presumably registered that year.  At that time, right holders of
+movies registered by the copyright office received government
+protection for their work for 28 years.  After 28 years, the copyright
+had to be renewed if they wanted the government to protect it further.
+The blog post I found claims such a renewal did not happen for this
+movie, and thus it entered the public domain in 1956.  Yet someone
+else claims the copyright was renewed and the movie is still copyright
+protected.  Can anyone help me figure out which claim is correct?  I
+have not been able to find Empty Socks in Catalog of copyright
+entries. Ser.3 pt.12-13 v.9-12 1955-1958 Motion Pictures
+available
+from the University of Pennsylvania, neither on
+page
+45 for the first half of 1955, nor on
+page
+119 for the second half of 1955.  It is of course possible that
+the renewal entry was left out of the printed catalog by mistake.  Is
+there some way to rule out this possibility?  Please help, and update
+the Wikipedia page with your findings.
+
+

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -95,77 +156,124 @@ provide Norwegian bokmål screen shots and figures.

- -
1st March 2017
-

A few days ago I ordered a small batch of -the ChaosKey, a small -USB dongle for generating entropy created by Bdale Garbee and Keith -Packard. Yesterday it arrived, and I am very happy to report that it -work great! According to its designers, to get it to work out of the -box, you need the Linux kernel version 4.1 or later. I tested on a -Debian Stretch machine (kernel version 4.9), and there it worked just -fine, increasing the available entropy very quickly. I wrote a small -test oneliner to test. It first print the current entropy level, -drain /dev/random, and then print the entropy level for five seconds. -Here is the situation without the ChaosKey inserted:

- -
-% cat /proc/sys/kernel/random/entropy_avail; \
-  dd bs=1M if=/dev/random of=/dev/null count=1; \
-  for n in $(seq 1 5); do \
-     cat /proc/sys/kernel/random/entropy_avail; \
-     sleep 1; \
-  done
-300
-0+1 oppføringer inn
-0+1 oppføringer ut
-28 byte kopiert, 0,000264565 s, 106 kB/s
-4
-8
-12
-17
-21
-%
-
- -

The entropy level increases by 3-4 every second. In such case any -application requiring random bits (like a HTTPS enabled web server) -will halt and wait for more entrpy. And here is the situation with -the ChaosKey inserted:

- -
-% cat /proc/sys/kernel/random/entropy_avail; \
-  dd bs=1M if=/dev/random of=/dev/null count=1; \
-  for n in $(seq 1 5); do \
-     cat /proc/sys/kernel/random/entropy_avail; \
-     sleep 1; \
-  done
-1079
-0+1 oppføringer inn
-0+1 oppføringer ut
-104 byte kopiert, 0,000487647 s, 213 kB/s
-433
-1028
-1031
-1035
-1038
-%
-
- -

Quite the difference. :) I bought a few more than I need, in case -someone want to buy one here in Norway. :)

- -

Update: The dongle was presented at Debconf last year. You might -find the talk -recording illuminating. It explains exactly what the source of -randomness is, if you are unable to spot it from the schema drawing -available from the ChaosKey web site linked at the start of this blog -post.

+ +
28th November 2017
+


It would be easier to locate the movie you want to watch in
+the Internet Archive, if the
+metadata about each movie were more complete and accurate.  In the
+archiving community, a well known saying states that good metadata is
+a love letter to the future.  The metadata in the Internet Archive
+could use a face lift for the future to love us back.  Here is a
+proposal for a small improvement that would make the metadata more
+useful today.  I've been unable to find any document describing the
+various standard fields available when uploading videos to the
+archive, so this proposal is based on my best guess and on searching
+through several of the existing movies.

+ +


I have a few use cases in mind.  First of all, I would like to be
+able to count the number of distinct movies in the Internet Archive,
+without duplicates.  I would further like to identify the IMDB title
+ID of each movie in the Internet Archive, to be able to look up an
+IMDB title ID and know if I can fetch the video from there and share
+it with my friends.

+ +


Second, I would like the Butter data provider for The Internet
+Archive
+(available
+from github), to list as many of the good movies as possible.  The
+plugin currently does a search in the archive with the following
+parameters:

+ +

+collection:moviesandfilms
+AND NOT collection:movie_trailers
+AND -mediatype:collection
+AND format:"Archive BitTorrent"
+AND year
+

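For reference, the same search can be expressed as a request against the archive's advancedsearch endpoint; the sketch below only builds the request URL, and the endpoint name and extra parameters are my assumption based on common use of the archive's search API, not taken from the Butter plugin source:

```python
from urllib.parse import urlencode

# The query string used by the Butter data provider, as listed above.
query = ('collection:moviesandfilms AND NOT collection:movie_trailers '
         'AND -mediatype:collection AND format:"Archive BitTorrent" '
         'AND year')

# Assumed endpoint and parameter names; adjust to the real API docs.
params = urlencode({"q": query, "fl[]": "identifier,year,title",
                    "rows": 50, "output": "json"})
url = "https://archive.org/advancedsearch.php?" + params
print(url)
```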
+ +


Most of the cool movies that fail to show up in Butter do so
+because the 'year' field is missing.  The 'year' field is populated
+from the year part of the 'date' field, and should hold the date or
+year the movie was released.  Two such examples are
+Ben Hur
+from 1905 and
+Caminandes
+2: Gran Dillama from 2013, where the year metadata field is
+missing.

+
+So, my proposal is simply this: for every movie in The Internet
+Archive where an IMDB title ID exists, please fill in these metadata
+fields (note, they can be updated long after the video was uploaded,
+but as far as I can tell, only by the uploader):
+
+
+ +
mediatype
+
Should be 'movie' for movies.
+ +
collection
+
Should contain 'moviesandfilms'.
+ +
title
+
The title of the movie, without the publication year.
+ +
date
+

The date or year the movie was released.  This makes the movie show
+up in Butter, as well as making it possible to know the age of the
+movie, which is useful to figure out its copyright status.
+ +
director
+

The director of the movie.  This makes it easier to know if the
+correct movie is found in movie databases.
+ +
publisher
+
The production company making the movie. Also useful for +identifying the correct movie.
+ +
links
+ +
Add a link to the IMDB title page, for example like this: <a
+href="http://www.imdb.com/title/tt0028496/">Movie in
+IMDB</a>.  This makes it easier to find duplicates and allows
+counting the number of unique movies in the Archive.  Other external
+references, like to TMDB, could be added like this too.
+ +
+ +
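Expressed as a single record, the proposal above amounts to something like the following; all field values are placeholders for illustration except the example IMDB link, which reuses the one from the proposal:

```python
# Hypothetical metadata record following the proposed fields above.
# All values except the links example are placeholders.
proposed_metadata = {
    "mediatype": "movie",
    "collection": "moviesandfilms",
    "title": "Example Movie",          # title without publication year
    "date": "1936",                    # release date or year
    "director": "Jane Example",        # placeholder name
    "publisher": "Example Pictures",   # placeholder name
    "links": '<a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>',
}

for field, value in proposed_metadata.items():
    print("%s: %s" % (field, value))
```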


I did consider proposing a custom field for the IMDB title ID (for
+example 'imdb_title_url', 'imdb_code' or simply 'imdb'), but suspect
+it will be easier to simply place it in the links free text field.

+ +


I created
+a
+list of IMDB title IDs for several thousand movies in the Internet
+Archive, but I also got a list of several thousand movies without
+such an IMDB title ID (and quite a few duplicates).  It would be great
+if this data set could be integrated into the Internet Archive
+metadata to be available for everyone in the future, but with the
+current policy of leaving metadata editing to the uploaders, it will
+take a while before this happens.  If you have uploaded movies to the
+Internet Archive, you can help.  Please consider following my proposal
+above for your movies, to ensure they are properly
+counted. :)

+ +


The list is mostly generated using Wikidata, which, based on
+Wikipedia articles, makes it possible to link between IMDB and movies
+in the Internet Archive.  But there are lots of movies without a
+Wikipedia article, and some movies where only a collection page exists
+(like for the
+Caminandes example above, where there are three movies but only
+one Wikidata entry).

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -173,34 +281,76 @@ post.

- -
21st February 2017
-

I just noticed -the -new Norwegian proposal for archiving rules in the goverment list -ECMA-376 -/ ISO/IEC 29500 (aka OOXML) as valid formats to put in long term -storage. Luckily such files will only be accepted based on -pre-approval from the National Archive. Allowing OOXML files to be -used for long term storage might seem like a good idea as long as we -forget that there are plenty of ways for a "valid" OOXML document to -have content with no defined interpretation in the standard, which -lead to a question and an idea.

- -

Is there any tool to detect if a OOXML document depend on such -undefined behaviour? It would be useful for the National Archive (and -anyone else interested in verifying that a document is well defined) -to have such tool available when considering to approve the use of -OOXML. I'm aware of the -officeotron OOXML -validator, but do not know how complete it is nor if it will -report use of undefined behaviour. Are there other similar tools -available? Please send me an email if you know of any such tool.

+ +
18th November 2017
+


A month ago, I blogged about my work to
+automatically
+check the copyright status of IMDB entries, and to try to count
+the number of movies listed in IMDB that are legal to distribute on
+the Internet.  I have continued to look for good data sources, and
+have identified a few more.  The code used to extract information from
+various data sources is available in
+a
+git repository, currently available from github.

+ +


So far I have identified 3186 unique IMDB title IDs.  To gain a
+better understanding of the structure of the data set, I created a
+histogram of the year associated with each movie (typically the
+release year).  It is interesting to notice where the peaks and dips
+in the graph are located.  I wonder why they are placed there.  I
+suspect World War II caused the dip around 1940, but what caused the
+peak around 2010?

+ +

+ +


I've so far identified ten sources of IMDB title IDs for movies in
+the public domain or with a free license.  These are the statistics
+reported when running 'make stats' in the git repository:

+ +
+  249 entries (    6 unique) with and   288 without IMDB title ID in free-movies-archive-org-butter.json
+ 2301 entries (  540 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
+  830 entries (   29 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
+ 2109 entries (  377 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
+  291 entries (  122 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
+  144 entries (  135 unique) with and     0 without IMDB title ID in free-movies-manual.json
+  350 entries (    1 unique) with and   801 without IMDB title ID in free-movies-publicdomainmovies.json
+    4 entries (    0 unique) with and   124 without IMDB title ID in free-movies-publicdomainreview.json
+  698 entries (  119 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
+    8 entries (    8 unique) with and   196 without IMDB title ID in free-movies-vodo.json
+ 3186 unique IMDB title IDs in total
+
+ +


The entries without an IMDB title ID are candidates for growing the
+data set, but might equally well be duplicates of entries already
+listed with an IMDB title ID in one of the other sources, or represent
+movies that lack an IMDB title ID.  I've seen examples of all these
+situations when peeking at the entries without an IMDB title ID.
+Based on these data sources, the number of movies listed in IMDB that
+are legal to distribute on the Internet is somewhere between 3186 and
+4713.
+

It would greatly improve the accuracy of this measurement if the
+various sources added IMDB title IDs to their metadata.  I have
+tried to reach the people behind the various sources to ask if they
+are interested in doing this, without any replies so far.  Perhaps you
+can help me get in touch with the people behind VODO, Public Domain
+Torrents, Public Domain Movies and Public Domain Review, to try to
+convince them to add more metadata to their movie entries?
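The bounds quoted above follow directly from the statistics table: 3186 is the number of unique IMDB title IDs found, and the upper bound adds every entry still lacking an ID, each of which could at best be a distinct IMDB-listed movie. A quick check of the arithmetic:

```python
# "without IMDB title ID" counts, copied from the 'make stats' output above.
without_id = [288, 0, 0, 0, 0, 0, 801, 124, 118, 196]

lower_bound = 3186                      # unique IMDB title IDs found
upper_bound = lower_bound + sum(without_id)
print(lower_bound, upper_bound)
```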
It would be great for improving the accuracy of this measurement, +if the various sources added IMDB title ID to their metadata. I have +tried to reach the people behind the various sources to ask if they +are interested in doing this, without any replies so far. Perhaps you +can help me get in touch with the people behind VODO, Public Domain +Torrents, Public Domain Movies and Public Domain Review to try to +convince them to add more metadata to their movie entries?

+ +


Another way you could help is by adding pages to Wikipedia about
+movies that are legal to distribute on the Internet.  If such a page
+exists and includes a link to both IMDB and The Internet Archive, the
+script used to generate free-movies-archive-org-wikidata.json should
+pick up the mapping as soon as Wikidata is updated.

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -208,33 +358,87 @@ available? Please send me an email if you know of any such tool.

- -
13th February 2017
-

A few days ago, we received the ruling from -my -day in court. The case in question is a challenge of the seizure -of the DNS domain popcorn-time.no. The ruling simply did not mention -most of our arguments, and seemed to take everything ØKOKRIM said at -face value, ignoring our demonstration and explanations. But it is -hard to tell for sure, as we still have not seen most of the documents -in the case and thus were unprepared and unable to contradict several -of the claims made in court by the opposition. We are considering an -appeal, but it is partly a question of funding, as it is costing us -quite a bit to pay for our lawyer. If you want to help, please -donate to the -NUUG defense fund.

- -

The details of the case, as far as we know it, is available in -Norwegian from -the NUUG -blog. This also include -the -ruling itself.

+ +
1st November 2017
+


If you care about how fault tolerant your storage is, you might
+find these articles and papers interesting.  They have shaped how I
+think when designing a storage system.

+ + + +


Several of these research papers are based on data collected from
+hundreds of thousands or millions of disks, and their findings are
+eye-opening.  The short story is: simply do not implicitly trust RAID
+or redundant storage systems.  Details matter.  And unfortunately
+there are few options on Linux addressing all the identified issues.
+Both ZFS and Btrfs are doing a fairly good job, but have legal and
+practical issues of their own.  I wonder how cluster file systems like
+Ceph do in this regard.  After all, there is an old saying: you know
+you have a distributed system when the crash of a computer you have
+never heard of stops you from getting any work done.  The same holds
+true if fault tolerance does not work.

+ +


Just remember, in the end, it does not matter how redundant or how
+fault tolerant your storage is, if you do not continuously monitor its
+status to detect and replace failed disks.

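As a trivial illustration of such monitoring on Linux software RAID, one could periodically inspect /proc/mdstat, where a degraded array shows a missing member as an underscore in the [UU] status string. A minimal sketch, assuming Linux md devices (systems without them simply report nothing to check):

```python
import os
import re

def degraded_md_arrays(mdstat_path="/proc/mdstat"):
    """Return md array status strings indicating a failed member,
    e.g. '[U_]' instead of '[UU]'.  Returns [] when no md devices exist."""
    if not os.path.exists(mdstat_path):
        return []
    with open(mdstat_path) as f:
        text = f.read()
    # Healthy arrays look like [UU]; a '_' marks a missing/failed disk.
    return [s for s in re.findall(r"\[[U_]+\]", text) if "_" in s]

if __name__ == "__main__":
    bad = degraded_md_arrays()
    if bad:
        print("WARNING: degraded RAID arrays:", bad)
    else:
        print("No degraded md arrays found")
```

In practice this check would run from cron or a monitoring system, paired with an alert channel you actually read.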
+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -242,86 +446,49 @@ ruling itself.

- -
3rd February 2017
-

- -

On Wednesday, I spent the entire day in court in Follo Tingrett -representing the member association -NUUG, alongside the member -association EFN and the DNS registrar -IMC, challenging the seizure of the DNS name popcorn-time.no. It -was interesting to sit in a court of law for the first time in my -life. Our team can be seen in the picture above: attorney Ola -Tellesbø, EFN board member Tom Fredrik Blenning, IMC CEO Morten Emil -Eriksen and NUUG board member Petter Reinholdtsen.

- -

The -case at hand is that the Norwegian National Authority for -Investigation and Prosecution of Economic and Environmental Crime (aka -Økokrim) decided on their own, to seize a DNS domain early last -year, without following -the -official policy of the Norwegian DNS authority which require a -court decision. The web site in question was a site covering Popcorn -Time. And Popcorn Time is the name of a technology with both legal -and illegal applications. Popcorn Time is a client combining -searching a Bittorrent directory available on the Internet with -downloading/distribute content via Bittorrent and playing the -downloaded content on screen. It can be used illegally if it is used -to distribute content against the will of the right holder, but it can -also be used legally to play a lot of content, for example the -millions of movies -available from the -Internet Archive or the collection -available from Vodo. We created -a -video demonstrating legally use of Popcorn Time and played it in -Court. It can of course be downloaded using Bittorrent.

- -

I did not quite know what to expect from a day in court. The -government held on to their version of the story and we held on to -ours, and I hope the judge is able to make sense of it all. We will -know in two weeks time. Unfortunately I do not have high hopes, as -the Government have the upper hand here with more knowledge about the -case, better training in handling criminal law and in general higher -standing in the courts than fairly unknown DNS registrar and member -associations. It is expensive to be right also in Norway. So far the -case have cost more than NOK 70 000,-. To help fund the case, NUUG -and EFN have asked for donations, and managed to collect around NOK 25 -000,- so far. Given the presentation from the Government, I expect -the government to appeal if the case go our way. And if the case do -not go our way, I hope we have enough funding to appeal.

- -

From the other side came two people from Økokrim. On the benches, -appearing to be part of the group from the government were two people -from the Simonsen Vogt Wiik lawyer office, and three others I am not -quite sure who was. Økokrim had proposed to present two witnesses -from The Motion Picture Association, but this was rejected because -they did not speak Norwegian and it was a bit late to bring in a -translator, but perhaps the two from MPA were present anyway. All -seven appeared to know each other. Good to see the case is take -seriously.

- -

If you, like me, believe the courts should be involved before a DNS -domain is hijacked by the government, or you believe the Popcorn Time -technology have a lot of useful and legal applications, I suggest you -too donate to -the NUUG defense fund. Both Bitcoin and bank transfer are -available. If NUUG get more than we need for the legal action (very -unlikely), the rest will be spend promoting free software, open -standards and unix-like operating systems in Norway, so no matter what -happens the money will be put to good use.

- -

If you want to lean more about the case, I recommend you check out -the blog -posts from NUUG covering the case. They cover the legal arguments -on both sides.

+ +
31st October 2017
+


I was surprised today to learn that a friend in academia did not
+know there are easily available web services for writing
+LaTeX documents as a team.  I thought it was common knowledge, but to
+make sure at least my readers are aware of it, I would like to mention
+these useful services for writing LaTeX documents.  Some of them even
+provide a WYSIWYG editor to ease writing even further.

+ +


There are two commercial services available,
+ShareLaTeX and
+Overleaf.  They are very easy to
+use.  Just start a new document, select which publisher to write for
+(i.e. which LaTeX style to use), and start writing.  Note, these two
+have announced their intention to join forces, so soon they will be
+one joint service.  I've used both for different documents, and they
+work just fine.  While
+ShareLaTeX is free
+software, the latter is not.  According to an
+announcement from Overleaf, they plan to keep the ShareLaTeX code
+base maintained as free software.

+
+But these two are not the only alternatives.
+Fidus Writer is another free
+software solution with the
+source available on github.  I have not used it myself.  Several
+others can be found on the nice
+AlternativeTo
+web service.
+

If you like Google Docs or Etherpad, but would like to write +documents in LaTeX, you should check out these services. You can even +host your own, if you want to. :)

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -329,41 +496,292 @@ on both sides.

- -
12th January 2017
-

I dag fikk jeg en skikkelig gladmelding. Bakgrunnen er at før jul -arrangerte Nasjonalbiblioteket -et -seminar om sitt knakende gode tiltak «verksregister». Eneste -måten å melde seg på dette seminaret var å sende personopplysninger -til Google via Google Skjemaer. Dette syntes jeg var tvilsom praksis, -da det bør være mulig å delta på seminarer arrangert av det offentlige -uten å måtte dele sine interesser, posisjon og andre -personopplysninger med Google. Jeg ba derfor om innsyn via -Mimes brønn i -avtaler -og vurderinger Nasjonalbiblioteket hadde rundt dette. -Personopplysningsloven legger klare rammer for hva som må være på -plass før en kan be tredjeparter, spesielt i utlandet, behandle -personopplysninger på sine vegne, så det burde eksistere grundig -dokumentasjon før noe slikt kan bli lovlig. To jurister hos -Nasjonalbiblioteket mente først dette var helt i orden, og at Googles -standardavtale kunne brukes som databehandlingsavtale. Det syntes jeg -var merkelig, men har ikke hatt kapasitet til å følge opp saken før -for to dager siden.

- -

Gladnyheten i dag, som kom etter at jeg tipset Nasjonalbiblioteket -om at Datatilsynet underkjente Googles standardavtaler som -databehandleravtaler i 2011, er at Nasjonalbiblioteket har bestemt seg -for å avslutte bruken av Googles Skjemaer/Apps og gå i dialog med DIFI -for å finne bedre måter å håndtere påmeldinger i tråd med -personopplysningsloven. Det er fantastisk å se at av og til hjelper -det å spørre hva i alle dager det offentlige holder på med.

+ +
25th October 2017
+


Recently, I needed to automatically check the copyright status of a
+set of The Internet Movie Database
+(IMDB) entries, to figure out which of the movies they refer
+to can be freely distributed on the Internet.  This proved to be
+harder than it sounds.  IMDB certainly lists movies without any
+copyright protection, where the copyright protection has expired or
+where the movie is licensed using a permissive license like one from
+Creative Commons.  But these are mixed in with copyright protected
+movies, and there seems to be no way to separate these classes of
+movies using the information in IMDB.

+ +


First I tried to look up entries manually in IMDB,
+Wikipedia and
+The Internet Archive, to get a
+feel for how to do this.  It is hard to know for sure using these
+sources, but it should be possible to be reasonably confident a movie
+is "out of copyright" with a few hours of work per movie.  As I needed
+to check almost 20,000 entries, this approach was not sustainable.  I
+simply can not work around the clock for about 6 years to check this
+data set.

+ +


I asked the people behind The Internet Archive if they could
+introduce a new metadata field in their metadata XML for the IMDB ID,
+but was told that they leave it completely to the uploaders to update
+the metadata.  Some of the metadata entries had IMDB links in the
+description, but I found no way to download all the metadata files in
+bulk to locate those entries, and put that approach aside.

+ +


In the process I noticed several Wikipedia articles about movies
+had links to both IMDB and The Internet Archive, and it occurred to me
+that I could use the Wikipedia RDF data set to locate entries with
+both, to at least get a lower bound on the number of movies on The
+Internet Archive with an IMDB ID.  This is useful based on the
+assumption that movies distributed by The Internet Archive can be
+legally distributed on the Internet.  With some help from the RDF
+community (thank you DanC), I was able to come up with this query to
+pass to the SPARQL interface on
+Wikidata:
+
+

+SELECT ?work ?imdb ?ia ?when ?label
+WHERE
+{
+  ?work wdt:P31/wdt:P279* wd:Q11424.
+  ?work wdt:P345 ?imdb.
+  ?work wdt:P724 ?ia.
+  OPTIONAL {
+        ?work wdt:P577 ?when.
+        ?work rdfs:label ?label.
+        FILTER(LANG(?label) = "en").
+  }
+}
+

+ +


If I understand the query right, for every film entry anywhere in
+Wikipedia, it will return the IMDB ID and The Internet Archive ID, as
+well as when the movie was released and its English title, if either
+or both of the latter two are available.  At the moment the result set
+contains 2338 entries.  Of course, it depends on volunteers including
+both correct IMDB and The Internet Archive IDs in the Wikipedia
+articles for each movie.  It should be noted that the result will
+include duplicates if the movie has entries in several languages.
+There are some bogus entries, either because The Internet Archive ID
+contains a typo or because the movie is not available from The
+Internet Archive.  I did not verify the IMDB IDs, as I am unsure how
+to do that automatically.

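A query like the one above can be submitted programmatically to the Wikidata Query Service SPARQL endpoint, which accepts a 'query' parameter; the sketch below only constructs the request and does not send it, and the User-Agent string is a made-up example:

```python
from urllib.parse import urlencode
from urllib.request import Request

# The query from the post, verbatim.
SPARQL = """
SELECT ?work ?imdb ?ia ?when ?label
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q11424.
  ?work wdt:P345 ?imdb.
  ?work wdt:P724 ?ia.
  OPTIONAL {
        ?work wdt:P577 ?when.
        ?work rdfs:label ?label.
        FILTER(LANG(?label) = "en").
  }
}
"""

def build_request(query):
    # Construct (but do not perform) an HTTP request to the endpoint.
    url = "https://query.wikidata.org/sparql?" + urlencode(
        {"query": query, "format": "json"})
    return Request(url, headers={"User-Agent": "free-movies-sketch/0.1"})

req = build_request(SPARQL)
print(req.full_url[:60])
```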
+ +


I wrote a small Python script to extract the data set from Wikidata
+and check if the XML metadata for each movie is available from The
+Internet Archive, and after around 1.5 hours it produced a list of
+2097 free movies and their IMDB IDs.  In total, 171 entries in
+Wikidata lack the referred Internet Archive entry.  I assume the 70
+"disappearing" entries (i.e. 2338 - 2097 - 171) are duplicate
+entries.

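 The script itself is not reproduced in the post, but the central check, whether The Internet Archive serves XML metadata for an item, can be sketched like this. The function names are my own; the `_meta.xml` URL layout is the usual archive.org item layout: ```python # Sketch: verify that an Internet Archive item exists by asking for # its XML metadata file, plus the bookkeeping from the text. import urllib.error import urllib.request def metadata_url(identifier): """URL of the XML metadata file for an archive.org item.""" return ("https://archive.org/download/%s/%s_meta.xml" % (identifier, identifier)) def item_exists(identifier, timeout=30): """True if the metadata file can be fetched (needs network access).""" try: with urllib.request.urlopen(metadata_url(identifier), timeout=timeout) as response: return response.getcode() == 200 except urllib.error.URLError: return False # Bookkeeping matching the numbers in the text: total, free_movies, broken = 2338, 2097, 171 duplicates = total - free_movies - broken print(duplicates) # → 70 ```   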
+ +

 This is not too bad, given that The Internet Archive reports that it +contains 5331 +feature films at the moment, but it also means more than 3000 +movies are either missing from Wikipedia or are missing the pair of +references on Wikipedia.
        

+ +

I was curious about the distribution by release year, and made a +little graph to show how the amount of free movies is spread over the +years:

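 The graph itself is an image, but tallying entries per release year from the result set is a one-liner with `collections.Counter`. The sample data below is made up; Wikidata returns release dates as ISO timestamps: ```python # Sketch: tally free movies by release year, given the ?when values # returned by the query (timestamps like "1944-01-01T00:00:00Z"). from collections import Counter def year_distribution(timestamps): """Count entries per four-digit release year.""" return Counter(ts[:4] for ts in timestamps if ts) # Made-up sample data for illustration: sample = ["1944-01-01T00:00:00Z", "1944-06-01T00:00:00Z", "1955-03-01T00:00:00Z"] print(sorted(year_distribution(sample).items())) # → [('1944', 2), ('1955', 1)] ```   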
+ +

+ +

I expect the relative distribution of the remaining 3000 movies to +be similar.

+ +

If you want to help, and want to ensure Wikipedia can be used to +cross reference The Internet Archive and The Internet Movie Database, +please make sure entries like this are listed under the "External +links" heading on the Wikipedia article for the movie:

+ +

+* {{Internet Archive film|id=FightingLady}}
+* {{IMDb title|id=0036823|title=The Fighting Lady}}
+

+ +

Please verify the links on the final page, to make sure you did not +introduce a typo.

+ +

Here is the complete list, if you want to correct the 171 +identified Wikipedia entries with broken links to The Internet +Archive: Q1140317, +Q458656, +Q458656, +Q470560, +Q743340, +Q822580, +Q480696, +Q128761, +Q1307059, +Q1335091, +Q1537166, +Q1438334, +Q1479751, +Q1497200, +Q1498122, +Q865973, +Q834269, +Q841781, +Q841781, +Q1548193, +Q499031, +Q1564769, +Q1585239, +Q1585569, +Q1624236, +Q4796595, +Q4853469, +Q4873046, +Q915016, +Q4660396, +Q4677708, +Q4738449, +Q4756096, +Q4766785, +Q880357, +Q882066, +Q882066, +Q204191, +Q204191, +Q1194170, +Q940014, +Q946863, +Q172837, +Q573077, +Q1219005, +Q1219599, +Q1643798, +Q1656352, +Q1659549, +Q1660007, +Q1698154, +Q1737980, +Q1877284, +Q1199354, +Q1199354, +Q1199451, +Q1211871, +Q1212179, +Q1238382, +Q4906454, +Q320219, +Q1148649, +Q645094, +Q5050350, +Q5166548, +Q2677926, +Q2698139, +Q2707305, +Q2740725, +Q2024780, +Q2117418, +Q2138984, +Q1127992, +Q1058087, +Q1070484, +Q1080080, +Q1090813, +Q1251918, +Q1254110, +Q1257070, +Q1257079, +Q1197410, +Q1198423, +Q706951, +Q723239, +Q2079261, +Q1171364, +Q617858, +Q5166611, +Q5166611, +Q324513, +Q374172, +Q7533269, +Q970386, +Q976849, +Q7458614, +Q5347416, +Q5460005, +Q5463392, +Q3038555, +Q5288458, +Q2346516, +Q5183645, +Q5185497, +Q5216127, +Q5223127, +Q5261159, +Q1300759, +Q5521241, +Q7733434, +Q7736264, +Q7737032, +Q7882671, +Q7719427, +Q7719444, +Q7722575, +Q2629763, +Q2640346, +Q2649671, +Q7703851, +Q7747041, +Q6544949, +Q6672759, +Q2445896, +Q12124891, +Q3127044, +Q2511262, +Q2517672, +Q2543165, +Q426628, +Q426628, +Q12126890, +Q13359969, +Q13359969, +Q2294295, +Q2294295, +Q2559509, +Q2559912, +Q7760469, +Q6703974, +Q4744, +Q7766962, +Q7768516, +Q7769205, +Q7769988, +Q2946945, +Q3212086, +Q3212086, +Q18218448, +Q18218448, +Q18218448, +Q6909175, +Q7405709, +Q7416149, +Q7239952, +Q7317332, +Q7783674, +Q7783704, +Q7857590, +Q3372526, +Q3372642, +Q3372816, +Q3372909, +Q7959649, +Q7977485, +Q7992684, +Q3817966, +Q3821852, +Q3420907, +Q3429733, +Q774474

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -371,86 +789,30 @@ det å spørre hva i alle dager det offentlige holder på med.

- -
11th January 2017
-

 I read with interest a news story at -digi.no -and -NRK -about how it is not just me: NAV (the Norwegian Labour and Welfare -Administration) also geolocates IP addresses, and the IP addresses of -those submitting benefit report cards are analysed to see if the cards -are sent in from foreign IP addresses. Police prosecutor in Drammen, -Hans Lyder Haare, is quoted by NRK saying «The two were, among other -things, exposed by IP addresses. One can see that the report card -comes from abroad.»
        

- -

 I think it is good that it is becoming better known that IP addresses -are tied to individuals, and that collected information is used to -geolocate people, also by actors here in Norway. I see it as yet -another argument for using -Tor as much as possible to -make IP geolocation harder, so one can protect one's privacy and avoid -sharing one's physical location with strangers.
        

- -

 But there is one thing about this news that worries me. I was tipped -off (thanks #nuug) about -NAV's -privacy statement, which under the heading «Privacy and statistics» -reads:
        

+ +
14th October 2017
+

 I find it fascinating how many of the people being locked inside +the proposed border wall between the USA and Mexico support the idea. +The proposal to keep Mexicans out reminds me of +the +propaganda twist from the East German government calling the wall +the “Antifascist Bulwark” after erecting the Berlin Wall, claiming +that the wall was erected to keep enemies from creeping into East +Germany, while it was obvious to the people locked inside it that it +was erected to keep the people from escaping.
        

+ +

 Do the people in the USA supporting this wall really believe it is a +one-way wall, only keeping people on the outside from getting in, +while not keeping people on the inside from getting out?
        

-

- -

 «When you visit nav.no, you leave electronic traces. The traces are -created because your browser automatically sends a number of pieces of -information to NAV's server each time you request a page. This is, -for example, information about which browser and version you use, and -your Internet address (IP address). For each page shown, the following -information is stored:
        

- -
    -
 • which page you are viewing 
  • -
 • date and time 
  • -
 • which browser you use 
  • -
 • your IP address 
  • -
- -

 None of the information will be used to identify individuals. NAV -uses this information to generate aggregate statistics showing, among -other things, which pages are most popular. The statistics are a tool -for improving our services.»
        

- -

- -

 I fail to see how analysing visitors' IP addresses to find out who -submits benefit report cards via the web from an IP address abroad can -be done without conflicting with the claim that «none of the -information will be used to identify individuals». It thus seems to me -that NAV is violating its own privacy statement, which Datatilsynet -(the Norwegian Data Protection Authority) told me in early December is -probably a violation of the Personal Data Act. -
        

 In addition, the privacy statement is quite misleading, given that -NAV's web pages not only provide NAV with personal data, but also ask -the user's browser to contact five other web servers -(script.hotjar.com, static.hotjar.com, vars.hotjar.com, -www.google-analytics.com and www.googletagmanager.com), making -personal data available to the companies Hotjar and Google, and to -anyone able to listen in on the traffic along the way (such as FRA, -GCHQ and NSA). I also fail to see how such spreading of personal data -can be in line with the requirements of the Personal Data Act, or with -NAV's privacy statement.
        

- -

 Perhaps NAV should take a careful look at its privacy statement? Or -perhaps Datatilsynet should?
        

+

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

- Tags: norsk, nuug, personvern, surveillance. + Tags: english.
@@ -458,164 +820,52 @@ kanskje Datatilsynet bør gjøre det?

- -
9th January 2017
-

 Did you ever wonder where the web traffic really flows to reach the -web servers, and who owns the network equipment it is flowing through? -It is possible to get a glimpse of this by using traceroute, but it -is hard to find all the details. Many years ago, I wrote a system to -map the Norwegian Internet (trying to figure out if our plans for a -network game service would get low enough latency, and who we needed -to talk to about setting up game servers close to the users). Back -then I used traceroute output from many locations (I asked my friends -to run a script and send me their traceroute output) to create the -graph and the map. The output from traceroute typically looks like -this: -
        

-traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
- 1  uio-gw10.uio.no (129.240.202.1)  0.447 ms  0.486 ms  0.621 ms
- 2  uio-gw8.uio.no (129.240.24.229)  0.467 ms  0.578 ms  0.675 ms
- 3  oslo-gw1.uninett.no (128.39.65.17)  0.385 ms  0.373 ms  0.358 ms
- 4  te3-1-2.br1.fn3.as2116.net (193.156.90.3)  1.174 ms  1.172 ms  1.153 ms
- 5  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.627 ms he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48)  3.172 ms he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.857 ms
- 6  ae1.ar8.oslosda310.as2116.net (195.0.242.39)  0.662 ms  0.637 ms ae0.ar8.oslosda310.as2116.net (195.0.242.23)  0.622 ms
- 7  89.191.10.146 (89.191.10.146)  0.931 ms  0.917 ms  0.955 ms
- 8  * * *
- 9  * * *
-[...]
-

- -

 This shows the DNS names and IP addresses of (at least some of the) -network equipment involved in getting the data traffic from me to the -www.stortinget.no server, and how long it took in milliseconds for a -packet to reach the equipment and return to me. Three packets are -sent, and sometimes the packets do not follow the same path. This -is shown for hop 5, where three different IP addresses replied to the -traceroute request.
        

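 Pulling the hop numbers and responding IP addresses out of output like the sample above is straightforward to script. This is a sketch of my own, not part of the tools described here, and real traceroute output has more variants than it covers: ```python # Sketch: extract the hop number and responding IP addresses from one # line of the traceroute output shown above. Handles the common # "name (ip) t ms" fields and the "* * *" timeout lines. import re IP_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)") def parse_hop(line): """Return (hop number, unique IPs in reply order), or None.""" fields = line.split() if not fields or not fields[0].isdigit(): return None # header line or malformed input ips = [] for ip in IP_RE.findall(line): if ip not in ips: ips.append(ip) return int(fields[0]), ips hop = parse_hop(" 5 he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.627 ms " "he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48) 3.172 ms " "he16-1-1.cr1.san110.as2116.net (195.0.244.234) 2.857 ms") print(hop) # → (5, ['195.0.244.234', '195.0.244.48']) ```   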
- -

 There are many ways to measure trace routes. Other good traceroute -implementations I use are traceroute (using ICMP packets), mtr (can do -ICMP, UDP and TCP) and scapy (a Python library with ICMP, UDP and TCP -traceroute and a lot of other capabilities). All of them are easily -available in Debian.
        

- -

 This time around, I wanted to know the geographic location of -different route points, to visualize how visiting a web page spreads -information about the visit to a lot of servers around the globe. The -background is that a web site today often will ask the browser to -fetch the parts (for example HTML, JSON, fonts, JavaScript, CSS, -video) required to display the content from many servers. This will -leak information about the visit to those controlling these servers -and anyone able to peek at the data traffic passing by (like your ISP, -the ISP's backbone provider, FRA, GCHQ, NSA and others).
        

- -

 Let's pick an example, the Norwegian parliament web site -www.stortinget.no. It is read daily by all members of parliament and -their staff, as well as political journalists, activists and many -other citizens of Norway. A visit to the www.stortinget.no web site -will ask your browser to contact 8 other servers: ajax.googleapis.com, -insights.hotjar.com, script.hotjar.com, static.hotjar.com, -stats.g.doubleclick.net, www.google-analytics.com, -www.googletagmanager.com and www.netigate.se. I extracted this by -asking PhantomJS to visit the -Stortinget web page and tell me all the URLs PhantomJS downloaded to -render the page (in HAR format using -their -netsniff example; I am very grateful to Gorm for showing me how -to do this). My goal is to visualize network traces to all IP -addresses behind these DNS names, to show where visitors' personal -information is spread when visiting the page.
        

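 Extracting the list of contacted hosts from such a HAR capture takes only a few lines. This is my own sketch against the standard HAR 1.2 structure, not the script I used at the time: ```python # Sketch: list the hosts a page pulls content from, given a HAR # capture such as the one produced by the PhantomJS netsniff example. import json from urllib.parse import urlsplit def har_hosts(har): """Return the sorted set of host names requested in a HAR capture.""" entries = har["log"]["entries"] return sorted({urlsplit(entry["request"]["url"]).hostname for entry in entries}) # A tiny made-up capture for illustration; a real one would be read # with json.load() from the netsniff output file. capture = {"log": {"entries": [ {"request": {"url": "https://www.stortinget.no/"}}, {"request": {"url": "https://ajax.googleapis.com/ajax/libs/x.js"}}, ]}} print(har_hosts(capture)) # → ['ajax.googleapis.com', 'www.stortinget.no'] ```   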
- -

map of combined traces for URLs used by www.stortinget.no using GeoIP

- -

 When I had a look around for options, I could not find any good -free software tools to do this, and decided I needed my own traceroute -wrapper outputting KML based on locations looked up using GeoIP. KML -is easy to work with and easy to generate, and understood by several -of the GIS tools I have available. I got good help from my NUUG -colleague Anders Einar with this, and the result can be seen in -my -kmltraceroute git repository. Unfortunately, the quality of the -free GeoIP databases I could find (and the for-pay databases my -friends had access to) is not up to the task. The IP addresses of -central Internet infrastructure would typically be placed near the -controlling company's main office, and not where the router is really -located, as you can see from the -KML file I created using the GeoLite City dataset from MaxMind. -
        

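 The KML needed for this is simple enough to emit by hand. A minimal sketch turning an ordered list of (longitude, latitude) hop locations into a KML path; the element names follow the KML 2.2 specification, the rest is my own scaffolding rather than the kmltraceroute code: ```python # Sketch: emit a minimal KML document with one line string through # the geolocated hops of a trace, ready for a KML viewer. def trace_to_kml(name, points): """points: ordered (longitude, latitude) pairs along the trace.""" coords = " ".join("%f,%f" % point for point in points) return ('<?xml version="1.0" encoding="UTF-8"?>\n' '<kml xmlns="http://www.opengis.net/kml/2.2">' "<Placemark><name>%s</name>" "<LineString><coordinates>%s</coordinates></LineString>" "</Placemark></kml>" % (name, coords)) kml = trace_to_kml("www.stortinget.no", [(10.72, 59.94), (18.06, 59.33)]) ```   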
scapy traceroute graph for URLs used by www.stortinget.no

- -

 I also had a look at the visual traceroute graph created by -the scapy project, -showing IP network ownership (aka AS owner) for the IP addresses in -question. -The -graph displays a lot of useful information about the traceroute in SVG -format, and gives a good indication of who controls the network -equipment involved, but it does not include geolocation. This graph -makes it possible to see that the information is made available at -least to UNINETT, Catchcom, Stortinget, Nordunet, Google, Amazon, -Telia, Level 3 Communications and NetDNA.
        

- -

example geotraceroute view for www.stortinget.no

- -

 In the process, I came across the -web service GeoTraceroute by -Salim Gasmi. Its methodology of combining guesses based on DNS names, -various location databases and finally using latency times to rule out -candidate locations seemed to do a very good job of guessing the -correct geolocation. But it could only do one trace at a time, did -not have a sensor in Norway and did not make the geolocations easily -available for postprocessing. So I contacted the developer and asked -if he would be willing to share the code (he refused until he had time -to clean it up), but he was interested in providing the geolocations -in a machine readable format, and willing to set up a sensor in -Norway. So since yesterday, it is possible to run traces from Norway -in this service thanks to a sensor node set up by -the NUUG association, and get the -trace in KML format for further processing.
        

- -

map of combined traces for URLs used by www.stortinget.no using geotraceroute

- -

 Here we can see that a lot of traffic passes through Sweden on its -way to Denmark, Germany, the Netherlands and Ireland. Plenty of -places where the Snowden confirmations verified that the traffic is -read by various actors without your best interest as their top -priority.
        

- -

 Combining KML files is trivial using a text editor, so I could loop -over all the hosts behind the URLs imported by www.stortinget.no and -ask for the KML file from GeoTraceroute, and create a combined KML -file with all the traces (unfortunately only one of the IP addresses -behind each DNS name is traced this time; to get them all, one would -have to request traces using IP numbers instead of DNS names from -GeoTraceroute). That might be the next step in this project.
        

- -

 Armed with these tools, I find it a lot easier to figure out where -the IP traffic moves and who controls the boxes involved in moving it. -And every time the link crosses for example the Swedish border, we can -be sure Swedish Signals Intelligence (FRA) is listening, as GCHQ does -in Britain and the NSA in the USA, on cables around the globe. (Hm, -what should we tell them? :) Keep that in mind if you ever send -anything unencrypted over the Internet.
        

- -

PS: KML files are drawn using -the KML viewer from Ivan -Rublev, as it was less cluttered than the local Linux application -Marble. There are heaps of other options too.

+ +
9th October 2017
+

 At my nearby maker space, +Sonen, I heard the story that it +was easier to generate gcode files for their 3D printers (Ultimaker +2+) on Windows and MacOS X than on Linux, because the software +involved had to be manually compiled and set up on Linux, while +premade packages worked out of the box on Windows and MacOS X. I +found this annoying, as the software involved, +Cura, is free software +and should be trivial to get up and running on Linux if someone took +the time to package it for the relevant distributions. I even found +a request from 2013 for adding it into +Debian, which had seen some activity over the years but +never resulted in the software showing up in Debian. So a few days +ago I offered my help to try to improve the situation.
        

+ +

Now I am very happy to see that all the packages required by a +working Cura in Debian are uploaded into Debian and waiting in the NEW +queue for the ftpmasters to have a look. You can track the progress +on +the +status page for the 3D printer team.

+ +

 The uploaded packages are a bit behind upstream, and were uploaded +now to get slots in the NEW +queue while we work on updating the packages to the latest +upstream version.
        

+ +

 On a related note, two competitors of Cura, which I found harder +to use and was unable to configure correctly for the Ultimaker 2+ in +the short time I spent on them, are already in Debian. If you are +looking for 3D printer "slicers" and want something already available +in Debian, check out +slic3r and +slic3r-prusa. +The latter is a fork of the former.
        

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address -15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

+15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -623,77 +873,35 @@ activities, please send Bitcoin donations to my address
- -
4th January 2017
-

 Do you have a large iCalendar -file with lots of old entries, and would like to archive them to save -space and resources? At least those of us using KOrganizer know that -turning an event set on and off becomes slower and slower the more -entries are in the set. While working on migrating our calendars to a -Radicale CalDAV server on our -Freedombox server, my -loved one wondered if I could find a way to split up the calendar file -she had in KOrganizer, and I set out to write a tool. I spent a few -days writing and polishing the system, and it is now ready for general -consumption. The -code for -ical-archiver is publicly available from a git repository on -github. The system is written in Python and depends on -the vobject Python -module.
        

- -

To use it, locate the iCalendar file you want to operate on and -give it as an argument to the ical-archiver script. This will -generate a set of new files, one file per component type per year for -all components expiring more than two years in the past. The vevent, -vtodo and vjournal entries are handled by the script. The remaining -entries are stored in a 'remaining' file.

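 The naming scheme and the age cut-off can be sketched as follows. This is my reimplementation of the behaviour described above, not the actual ical-archiver code: ```python # Sketch: reproduce the output file naming used by ical-archiver and # the "expired more than two years in the past" cut-off. import datetime def subset_filename(source, component, year): """Per-type, per-year archive file name, as in the sample run below.""" return "%s-subset-%s-%d.ics" % (source, component, year) def remaining_filename(source): """File holding the entries that are kept.""" return "%s-remaining.ics" % source def should_archive(end_year, today=None): """True for components that expired more than two years ago.""" today = today or datetime.date.today() return end_year < today.year - 2 print(subset_filename("t/2004-2016.ics", "vevent", 2004)) # → t/2004-2016.ics-subset-vevent-2004.ics ```   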
- -

This is what a test run can look like: - -

-% ical-archiver t/2004-2016.ics 
-Found 3612 vevents
-Found 6 vtodos
-Found 2 vjournals
-Writing t/2004-2016.ics-subset-vevent-2004.ics
-Writing t/2004-2016.ics-subset-vevent-2005.ics
-Writing t/2004-2016.ics-subset-vevent-2006.ics
-Writing t/2004-2016.ics-subset-vevent-2007.ics
-Writing t/2004-2016.ics-subset-vevent-2008.ics
-Writing t/2004-2016.ics-subset-vevent-2009.ics
-Writing t/2004-2016.ics-subset-vevent-2010.ics
-Writing t/2004-2016.ics-subset-vevent-2011.ics
-Writing t/2004-2016.ics-subset-vevent-2012.ics
-Writing t/2004-2016.ics-subset-vevent-2013.ics
-Writing t/2004-2016.ics-subset-vevent-2014.ics
-Writing t/2004-2016.ics-subset-vjournal-2007.ics
-Writing t/2004-2016.ics-subset-vjournal-2011.ics
-Writing t/2004-2016.ics-subset-vtodo-2012.ics
-Writing t/2004-2016.ics-remaining.ics
-%
-

- -

 As you can see, the original file is untouched and new files are -written with names derived from the original file. If you are happy -with their content, the *-remaining.ics file can replace the original -and the others can be archived or imported as historical calendar -collections.
        

- -

 The script should probably be improved a bit. The error handling -when discovering broken entries is not good, and I am not sure yet if -it makes sense to split different entry types into separate files or -not. The program is thus likely to change. If you find it -interesting, please get in touch. :)
        

- -

As usual, if you use Bitcoin and want to show your support of my -activities, please send Bitcoin donations to my address -15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

+ +
4th October 2017
+
 When I work on various projects, I constantly need different screws. +The latest project I am working on is building +a case for an +HDMI touch screen to be used with a Raspberry Pi. The case is +assembled with screws and bolts, and I have been unsure where to get +hold of the right screws. The nearby Clas Ohlson and Jernia stores +rarely have what I need. But the other day I got a fantastic tip for +those of us living in Oslo. +Zachariassen Jernvare AS in +Hegermannsgate +23A at Torshov has a fantastic selection, and is open between 09:00 +and 17:00. They sell screws, nuts, bolts, washers etc. in bulk, and +so far I have found everything I have been looking for. They also +stock most other hardware, such as tools, lamps, wires, etc. I hope +they have enough customers to keep going for a long time, as this is +a shop I will be visiting often. The shop is a real find to have in +the neighbourhood for those of us who like to build things +ourselves. :)
        

+ +

 As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b. 

- Tags: english, standard. + Tags: norsk.
@@ -715,7 +923,23 @@ activities, please send Bitcoin donations to my address
  • February (3)
  • -
  • March (3)
  • +
  • March (5)
  • + +
  • April (2)
  • + +
  • June (5)
  • + +
  • July (1)
  • + +
  • August (1)
  • + +
  • September (3)
  • + +
  • October (5)
  • + +
  • November (3)
  • + +
  • December (2)
  • @@ -967,7 +1191,7 @@ activities, please send Bitcoin donations to my address

    Tags