So the new president of the United States of America claims to be surprised to discover that he was wiretapped during the election, before he was elected president. He even claims this must be illegal. Well, doh, if there is one thing the Snowden confirmations documented, it is that the entire population of the USA is wiretapped, one way or another. Of course the presidential candidates were wiretapped, alongside the senators, judges and the rest of the people in the USA.

Next, the Federal Bureau of Investigation asked the Department of Justice to publicly reject the claims that Donald Trump was wiretapped illegally. I fail to see the relevance, given that I am sure the surveillance industry in the USA believes, according to itself, that it has all the legal backing it needs to conduct mass surveillance on the entire world.

There is even the director of the FBI stating that he never saw an order requesting wiretapping of Donald Trump. That is not very surprising, given how the FISA court works, with all its activity being secret. Perhaps he only heard about it?

What I find most sad in this story is how Norwegian journalists present it. In a news report on the radio the other day from the Norwegian National Broadcasting Company (NRK), I heard the journalist claim that 'the FBI denies any wiretapping', while the reality is that 'the FBI denies any illegal wiretapping'. There is a fundamental and important difference, and it makes me sad that the journalists are unable to grasp it.
While looking at the scanned copies of the copyright renewal entries for movies published in the USA, an idea occurred to me. The number of renewals per year is so small that it should be fairly quick to transcribe them all and add references to the corresponding IMDB title IDs. This would give the (presumably) complete list of movies published 28 years earlier that did _not_ enter the public domain for the transcribed year. By fetching the list of USA movies published 28 years earlier and subtracting the movies with renewals, we should be left with the movies registered in IMDB that are now in the public domain. For the year 1955 (which is the one I have looked at the most), the total number of pages to transcribe is 21. For the 28 years from 1950 to 1978, it should be in the range of 500-600 pages. That is just a few days of work, and spread among a small group of people it should be doable in a few weeks of spare time.
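Once both lists exist as plain text files, the subtraction itself is a simple set difference. Here is a minimal Python sketch of the idea; the file names and the one-IMDB-title-ID-per-line format are only my assumptions for illustration:

import sys

def read_ids(filename):
    """Return the set of non-empty lines in the given file."""
    with open(filename) as f:
        return set(line.strip() for line in f if line.strip())

# Hypothetical input files: all USA movies from 1927 registered in
# IMDB, and the movies with renewals transcribed from the 1955 catalog.
published = read_ids('imdb-usa-1927.txt')
renewed = read_ids('renewals-1955.txt')

# What remains should be the movies now in the public domain in the USA.
for title_id in sorted(published - renewed):
    print(title_id)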
A typical copyright renewal entry looks like this (the first one listed for 1955):
  ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer
  Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH);
  10Jun55; R151558.
The movie title as well as the registration and renewal dates are easy enough to locate with a program (split on the first comma and look for DDmmmYY). The rest of the text is not required to find the movie in IMDB, but is useful to confirm the correct movie is found. I am not quite sure what the L and R numbers mean, but suspect they are reference numbers into the archive of the US Copyright Office.
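To illustrate, here is a rough Python sketch of such a parser applied to the entry above. The regular expressions for the DDmmmYY dates and the L/R numbers are my guesses, and a real transcription effort would surely meet entries this simple approach can not handle:

import re

entry = """ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer
Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH);
10Jun55; R151558."""

def parse_renewal(text):
    text = ' '.join(text.split())
    # The movie title is everything up to the first comma.
    title, rest = text.split(',', 1)
    # Dates like 17Aug27 and 10Jun55 (DDmmmYY).
    dates = re.findall(r'\b\d{1,2}[A-Z][a-z]{2}\d{2}\b', rest)
    # The L (registration?) and R (renewal?) reference numbers.
    numbers = re.findall(r'\b[LR]\d+\b', rest)
    return {'title': title.strip(), 'dates': dates, 'numbers': numbers}

print(parse_renewal(entry))
# {'title': 'ADAM AND EVIL', 'dates': ['17Aug27', '10Jun55'],
#  'numbers': ['L24293', 'R151558']}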
Tracking down the equivalent IMDB title ID is probably going to be a manual task, but given the year it is fairly easy to search for the movie title using for example http://www.imdb.com/find?q=adam+and+evil+1927&s=all. Using this search, I find that the equivalent IMDB title ID for the first renewal entry from 1955 is http://www.imdb.com/title/tt0017588/.
I suspect the best way to do this would be to make a specialised web service making it easy for contributors to transcribe entries and track down IMDB title IDs. In the web service, once an entry is transcribed, the title and year could be extracted from the text and a search in IMDB conducted, for the user to pick the equivalent IMDB title ID right away. By spreading the work among volunteers, it would also be possible to have at least two persons transcribe the same entries, to be able to discover any typos introduced. But I will need help to make this happen, as I lack the spare time to do all of this on my own. If you would like to help, please get in touch. Perhaps you can draft a web service for crowdsourcing the task?
Note, Project Gutenberg already has some transcribed copies of the US Copyright Office renewal protocols, but I have not been able to find any film renewals there, so I suspect they only have copies of renewals for written works. I have not been able to find any transcribed versions of movie renewals so far. Perhaps they exist somewhere?
I would love to figure out methods for finding all the public domain works in other countries too, but that is a lot harder. At least for Norway and Great Britain, such work involves tracking down the people involved in making the movie and figuring out when they died. It is hard enough to figure out who was part of making a movie, and I do not know how to automate such a procedure without a registry of every person involved in making movies and their year of death.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
For almost a year now, we have been working on making a Norwegian Bokmål edition of The Debian Administrator's Handbook. Now, thanks to the tireless effort of Ole-Erik, Ingrid and Andreas, the initial translation is complete, and we are working on the proofreading to ensure consistent language and use of correct computer science terms. The plan is to make the book available on paper, as well as in electronic form. For that to happen, the proofreading must be completed and all the figures need to be translated. If you want to help out, get in touch.
A fresh PDF edition of the book in A4 format (the final book will have smaller pages), created every morning, is available for proofreading. If you find any errors, please visit Weblate and correct the error. The state of the translation, including figures, is a useful source for those providing Norwegian Bokmål screenshots and figures.
Three years ago, a presumed lost animation film, Empty Socks from 1927, was discovered in the Norwegian National Library. At the time it was discovered, it was generally assumed to be copyrighted by The Walt Disney Company, and I blogged about my reasoning to conclude that it would enter the Norwegian equivalent of the public domain in 2053, based on my understanding of Norwegian copyright law. But a few days ago, I came across a blog post claiming the movie was already in the public domain, at least in the USA. The reasoning is as follows: The film was released in November or December 1927 (sources disagree), and its copyright was presumably registered that year. At that time, right holders of movies registered by the copyright office received government protection for their work for 28 years. After 28 years, the copyright had to be renewed if they wanted the government to protect it further. The blog post I found claims such a renewal did not happen for this movie, and thus it entered the public domain in 1956. Yet someone claims the copyright was renewed and the movie is still copyright protected. Can anyone help me figure out which claim is correct? I have not been able to find Empty Socks in Catalog of copyright entries. Ser.3 pt.12-13 v.9-12 1955-1958 Motion Pictures available from the University of Pennsylvania, neither on page 45 for the first half of 1955, nor on page 119 for the second half of 1955. It is of course possible that the renewal entry was left out of the printed catalog by mistake. Is there some way to rule out this possibility? Please help, and update the Wikipedia page with your findings.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
A few days ago I ordered a small batch of the ChaosKey, a small USB dongle for generating entropy created by Bdale Garbee and Keith Packard. Yesterday it arrived, and I am very happy to report that it works great! According to its designers, to get it to work out of the box you need Linux kernel version 4.1 or later. I tested it on a Debian Stretch machine (kernel version 4.9), and there it worked just fine, increasing the available entropy very quickly. I wrote a small one-liner to test it. It first prints the current entropy level, drains /dev/random, and then prints the entropy level once per second for five seconds. Here is the situation without the ChaosKey inserted:
% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
      cat /proc/sys/kernel/random/entropy_avail; \
      sleep 1; \
  done
300
0+1 oppføringer inn
0+1 oppføringer ut
28 byte kopiert, 0,000264565 s, 106 kB/s
4
8
12
17
21
%
The entropy level increases by 3-4 every second. In such a case, any application requiring random bits (like a HTTPS enabled web server) will halt and wait for more entropy. And here is the situation with the ChaosKey inserted:
% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
      cat /proc/sys/kernel/random/entropy_avail; \
      sleep 1; \
  done
1079
0+1 oppføringer inn
0+1 oppføringer ut
104 byte kopiert, 0,000487647 s, 213 kB/s
433
1028
1031
1035
1038
%
Quite the difference. :) I bought a few more than I need, in case someone wants to buy one here in Norway. :)
Update: The dongle was presented at Debconf last year. You might find the talk recording illuminating. It explains exactly what the source of randomness is, if you are unable to spot it from the schematic drawing available from the ChaosKey web site linked at the start of this blog post.
It would be easier to locate the movie you want to watch in the Internet Archive if the metadata about each movie were more complete and accurate. In the archiving community, a well known saying states that good metadata is a love letter to the future. The metadata in the Internet Archive could use a face lift for the future to love us back. Here is a proposal for a small improvement that would make the metadata more useful today. I've been unable to find any document describing the various standard fields available when uploading videos to the archive, so this proposal is based on my best guess and on searching through several of the existing movies.
I have a few use cases in mind. First of all, I would like to be able to count the number of distinct movies in the Internet Archive, without duplicates. I would further like to identify the IMDB title ID of each movie in the Internet Archive, to be able to look up an IMDB title ID and know if I can fetch the video from there and share it with my friends.
Second, I would like the Butter data provider for The Internet Archive (available from github) to list as many of the good movies as possible. The plugin currently does a search in the archive with the following parameters:
collection:moviesandfilms
AND NOT collection:movie_trailers
AND -mediatype:collection
AND format:"Archive BitTorrent"
AND year
Most of the cool movies that fail to show up in Butter do so because the 'year' field is missing. The 'year' field is populated from the year part of the 'date' field, and should contain when the movie was released (date or year). Two such examples are Ben Hur from 1905 and Caminandes 2: Gran Dillama from 2013, where the year metadata field is missing.
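To get an idea how many items are affected, one can walk through the collection and report the items missing the 'year' field. Here is a sketch using the internetarchive Python module, limited to the first 100 search results to keep it quick; I have not run it against the full collection:

from internetarchive import search_items, get_item

# Report items in the collection searched by Butter that lack the
# 'year' metadata field.
for count, result in enumerate(search_items('collection:moviesandfilms')):
    if count >= 100:  # only peek at the first 100 items
        break
    item = get_item(result['identifier'])
    if 'year' not in item.metadata:
        print(item.identifier, item.metadata.get('title'))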
So, my proposal is simply, for every movie in The Internet Archive where an IMDB title ID exists, please fill in these metadata fields (note, they can be updated long after the video was uploaded, but as far as I can tell, only by the uploader):

- mediatype: Should be 'movies' for movies.

- collection: Should contain 'moviesandfilms'.

- title: The title of the movie, without the publication year.

- date: The date or year the movie was released. This makes the movie show up in Butter, makes it possible to know the age of the movie, and is useful when figuring out its copyright status.

- director: The director of the movie. This makes it easier to check whether the correct movie was found in movie databases.

- publisher: The production company making the movie. Also useful for identifying the correct movie.

- links: Add a link to the IMDB title page, for example like this: <a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>. This makes it easier to find duplicates and allows counting the number of unique movies in the Archive. Other external references, like to TMDB, could be added the same way.
I did consider proposing a custom field for the IMDB title ID (for example 'imdb_title_url', 'imdb_code' or simply 'imdb'), but suspect it will be easier to simply place it in the links free text field.
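Uploaders who want to follow this proposal do not have to click through the web interface for every movie; the same internetarchive Python module can do it. In this sketch every value is a placeholder, the field names simply follow the proposal above (I have not verified how the archive treats all of them through the API), and the change will only be accepted if you are the uploader (or an archive admin):

from internetarchive import modify_metadata

# Placeholder values; substitute the data for your own upload.
md = {
    'mediatype': 'movies',
    'collection': 'moviesandfilms',
    'title': 'Example Movie Title',
    'date': '1936',
    'director': 'Example Director',
    'publisher': 'Example Production Company',
    'links': '<a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>',
}

response = modify_metadata('example-movie-item', metadata=md)
print(response.status_code)  # 200 if the update was accepted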
I created a list of IMDB title IDs for several thousand movies in the Internet Archive, but I also got a list of several thousand movies without such an IMDB title ID (and quite a few duplicates). It would be great if this data set could be integrated into the Internet Archive metadata, to be available for everyone in the future, but with the current policy of leaving metadata editing to the uploaders, it will take a while before this happens. If you have uploaded movies to the Internet Archive, you can help. Please consider following my proposal above for your movies, to ensure that each movie is properly counted. :)
The list is mostly generated using Wikidata, which, based on Wikipedia articles, makes it possible to link between IMDB and movies in the Internet Archive. But there are lots of movies without a Wikipedia article, and some movies where only a collection page exists (like the Caminandes example above, where there are three movies but only one Wikidata entry).
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
I just noticed that the new Norwegian proposal for archiving rules in the government lists ECMA-376 / ISO/IEC 29500 (aka OOXML) as a valid format to put in long term storage. Luckily, such files will only be accepted based on pre-approval from the National Archive. Allowing OOXML files to be used for long term storage might seem like a good idea, as long as we forget that there are plenty of ways for a "valid" OOXML document to have content with no defined interpretation in the standard, which leads to a question and an idea.

Is there any tool to detect if an OOXML document depends on such undefined behaviour? It would be useful for the National Archive (and anyone else interested in verifying that a document is well defined) to have such a tool available when considering whether to approve the use of OOXML. I'm aware of the officeotron OOXML validator, but do not know how complete it is, nor whether it reports use of undefined behaviour. Are there other similar tools available? Please send me an email if you know of any such tool.
A month ago, I blogged about my work to automatically check the copyright status of IMDB entries, and to try to count the number of movies listed in IMDB that are legal to distribute on the Internet. I have continued to look for good data sources, and have identified a few more. The code used to extract information from the various data sources is available in a git repository, currently hosted on github.
So far I have identified 3186 unique IMDB title IDs. To gain a better understanding of the structure of the data set, I created a histogram of the year associated with each movie (typically the release year). It is interesting to notice where the peaks and dips in the graph are located. I wonder why they are placed there. I suspect World War II caused the dip around 1940, but what caused the peak around 2010?
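For those who want to reproduce the histogram, a sketch like this should be close, assuming each JSON data file holds a list of entries with a 'year' field (the actual structure in the repository may differ slightly):

import glob
import json
from collections import Counter

# Count movies per year across the free-movies-*.json data files.
years = Counter()
for filename in glob.glob('free-movies-*.json'):
    with open(filename) as f:
        for movie in json.load(f):
            if movie.get('year'):
                years[int(movie['year'])] += 1

# Print a crude text histogram, one row per year.
for year in sorted(years):
    print('%4d %s' % (year, '*' * years[year]))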
I've so far identified ten sources of IMDB title IDs for movies in the public domain or with a free license. These are the statistics reported when running 'make stats' in the git repository:
  249 entries (   6 unique) with and  288 without IMDB title ID in free-movies-archive-org-butter.json
 2301 entries ( 540 unique) with and    0 without IMDB title ID in free-movies-archive-org-wikidata.json
  830 entries (  29 unique) with and    0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
 2109 entries ( 377 unique) with and    0 without IMDB title ID in free-movies-imdb-pd.json
  291 entries ( 122 unique) with and    0 without IMDB title ID in free-movies-letterboxd-pd.json
  144 entries ( 135 unique) with and    0 without IMDB title ID in free-movies-manual.json
  350 entries (   1 unique) with and  801 without IMDB title ID in free-movies-publicdomainmovies.json
    4 entries (   0 unique) with and  124 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries ( 119 unique) with and  118 without IMDB title ID in free-movies-publicdomaintorrents.json
    8 entries (   8 unique) with and  196 without IMDB title ID in free-movies-vodo.json
 3186 unique IMDB title IDs in total
The entries without an IMDB title ID are candidates for increasing the data set, but might equally well be duplicates of entries already listed with an IMDB title ID in one of the other sources, or represent movies that lack an IMDB title ID. I've seen examples of all these situations when peeking at the entries without an IMDB title ID. Based on these data sources, the lower bound for the number of movies listed in IMDB that are legal to distribute on the Internet is between 3186 and 4713.
It would be great for improving the accuracy of this measurement if the various sources added IMDB title IDs to their metadata. I have tried to reach the people behind the various sources to ask if they are interested in doing this, without any replies so far. Perhaps you can help me get in touch with the people behind VODO, Public Domain Torrents, Public Domain Movies and Public Domain Review, to try to convince them to add more metadata to their movie entries?
Another way you could help is by adding pages to Wikipedia about movies that are legal to distribute on the Internet. If such a page exists and includes a link to both IMDB and The Internet Archive, the script used to generate free-movies-archive-org-wikidata.json should pick up the mapping as soon as Wikidata is updated.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
A few days ago, we received the ruling from my day in court. The case in question is a challenge of the seizure of the DNS domain popcorn-time.no. The ruling simply did not mention most of our arguments, and seemed to take everything Økokrim said at face value, ignoring our demonstration and explanations. But it is hard to tell for sure, as we still have not seen most of the documents in the case and thus were unprepared for, and unable to contradict, several of the claims made in court by the opposition. We are considering an appeal, but it is partly a question of funding, as it is costing us quite a bit to pay for our lawyer. If you want to help, please donate to the NUUG defense fund.

The details of the case, as far as we know them, are available in Norwegian from the NUUG blog. This also includes the ruling itself.
If you care about how fault tolerant your storage is, you might find these articles and papers interesting. They have shaped how I think when designing a storage system.
- USENIX ;login: Redundancy Does Not Imply Fault Tolerance. Analysis of Distributed Storage Reactions to Single Errors and Corruptions by Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau

- ZDNet Why RAID 5 stops working in 2009 by Robin Harris

- ZDNet Why RAID 6 stops working in 2019 by Robin Harris

- USENIX FAST'07 Failure Trends in a Large Disk Drive Population by Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso

- USENIX ;login: Data Integrity. Finding Truth in a World of Guesses and Lies by Doug Hughes

- USENIX FAST'08 An Analysis of Data Corruption in the Storage Stack by L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau

- USENIX FAST'07 Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? by B. Schroeder and G. A. Gibson

- USENIX ;login: Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics by Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky

- SIGMETRICS 2007 An analysis of latent sector errors in disk drives by L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler
Several of these research papers are based on data collected from hundreds of thousands or millions of disks, and their findings are eye opening. The short story: do not implicitly trust RAID or redundant storage systems. Details matter. And unfortunately there are few options on Linux addressing all the identified issues. Both ZFS and Btrfs are doing a fairly good job, but have legal and practical issues of their own. I wonder how cluster file systems like Ceph do in this regard. After all, there is an old saying: you know you have a distributed system when the crash of a computer you have never heard of stops you from getting any work done. The same holds true if fault tolerance does not work.

Just remember, in the end, it does not matter how redundant or how fault tolerant your storage is if you do not continuously monitor its status to detect and replace failed disks.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
On Wednesday, I spent the entire day in court in Follo Tingrett representing the member association NUUG, alongside the member association EFN and the DNS registrar IMC, challenging the seizure of the DNS name popcorn-time.no. It was interesting to sit in a court of law for the first time in my life. Our team can be seen in the picture above: attorney Ola Tellesbø, EFN board member Tom Fredrik Blenning, IMC CEO Morten Emil Eriksen and NUUG board member Petter Reinholdtsen.

The case at hand is that the Norwegian National Authority for Investigation and Prosecution of Economic and Environmental Crime (aka Økokrim) decided on their own to seize a DNS domain early last year, without following the official policy of the Norwegian DNS authority, which requires a court decision. The web site in question was a site covering Popcorn Time. And Popcorn Time is the name of a technology with both legal and illegal applications. Popcorn Time is a client combining searching a Bittorrent directory available on the Internet with downloading/distributing content via Bittorrent and playing the downloaded content on screen. It can be used illegally if it is used to distribute content against the will of the right holder, but it can also be used legally to play a lot of content, for example the millions of movies available from the Internet Archive or the collection available from Vodo. We created a video demonstrating legal use of Popcorn Time and played it in court. It can of course be downloaded using Bittorrent.

I did not quite know what to expect from a day in court. The government held on to their version of the story and we held on to ours, and I hope the judge is able to make sense of it all. We will know in two weeks time. Unfortunately I do not have high hopes, as the government has the upper hand here, with more knowledge about the case, better training in handling criminal law, and in general higher standing in the courts than a fairly unknown DNS registrar and member associations. It is expensive to be right, also in Norway. So far the case has cost more than NOK 70 000. To help fund the case, NUUG and EFN have asked for donations, and have managed to collect around NOK 25 000 so far. Given the presentation from the government, I expect the government to appeal if the case goes our way. And if the case does not go our way, I hope we have enough funding to appeal.

From the other side came two people from Økokrim. On the benches, appearing to be part of the group from the government, were two people from the Simonsen Vogt Wiig law office, and three others I am not quite sure who were. Økokrim had proposed to present two witnesses from The Motion Picture Association, but this was rejected because they did not speak Norwegian and it was a bit late to bring in a translator, but perhaps the two from MPA were present anyway. All seven appeared to know each other. Good to see the case is taken seriously.

If you, like me, believe the courts should be involved before a DNS domain is hijacked by the government, or you believe the Popcorn Time technology has a lot of useful and legal applications, I suggest you too donate to the NUUG defense fund. Both Bitcoin and bank transfer are available. If NUUG gets more than we need for the legal action (very unlikely), the rest will be spent promoting free software, open standards and unix-like operating systems in Norway, so no matter what happens the money will be put to good use.

If you want to learn more about the case, I recommend you check out the blog posts from NUUG covering the case. They cover the legal arguments on both sides.
I was surprised today to learn that a friend in academia did not know there are easily available web services for writing LaTeX documents as a team. I thought it was common knowledge, but to make sure at least my readers are aware of it, I would like to mention these useful services for writing LaTeX documents. Some of them even provide a WYSIWYG editor to ease writing even further.

There are two commercial services available, ShareLaTeX and Overleaf. They are very easy to use. Just start a new document, select which publisher to write for (i.e. which LaTeX style to use), and start writing. Note, these two have announced their intention to join forces, so soon there will be only one joint service. I've used both for different documents, and they work just fine. ShareLaTeX is free software, while Overleaf is not. According to an announcement from Overleaf, they plan to keep the ShareLaTeX code base maintained as free software.

But these two are not the only alternatives. Fidus Writer is another free software solution, with the source available on github. I have not used it myself. Several others can be found on the nice AlternativeTo web service.

If you like Google Docs or Etherpad, but would like to write documents in LaTeX, you should check out these services. You can even host your own, if you want to. :)
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
Today I received a really welcome piece of news. The background is that before Christmas, the National Library of Norway (Nasjonalbiblioteket) organised a seminar about its rather excellent «verksregister» (work registry) initiative. The only way to sign up for the seminar was to send personal information to Google via Google Forms. I found this a questionable practice, as it should be possible to attend seminars organised by the public sector without having to share one's interests, position and other personal information with Google. I therefore requested access, via Mimes brønn, to the agreements and assessments Nasjonalbiblioteket had made around this. The Norwegian Personal Data Act sets clear limits on what must be in place before one can ask third parties, especially abroad, to process personal information on one's behalf, so thorough documentation should exist before something like this can be legal. Two lawyers at Nasjonalbiblioteket initially believed this was perfectly fine, and that Google's standard agreement could be used as a data processing agreement. I found that strange, but did not have the capacity to follow up on the case until two days ago.

Today's good news, which came after I tipped off Nasjonalbiblioteket that Datatilsynet (the Norwegian Data Protection Authority) rejected Google's standard agreements as data processing agreements back in 2011, is that Nasjonalbiblioteket has decided to stop using Google Forms/Apps and to start a dialogue with DIFI to find better ways to handle sign-ups in line with the Personal Data Act. It is fantastic to see that it sometimes helps to ask what on earth the public sector is up to.
Recently, I needed to automatically check the copyright status of a set of entries in The Internet Movie Database (IMDB), to figure out which of the movies they refer to can be freely distributed on the Internet. This proved to be harder than it sounds. IMDB for sure lists movies without any copyright protection, where the copyright protection has expired, or where the movie is licensed using a permissive license like one from Creative Commons. But these are mixed in with copyright protected movies, and there seems to be no way to separate the classes of movies using the information in IMDB.
First I tried to look up entries manually in IMDB, Wikipedia and The Internet Archive, to get a feel for how to do this. It is hard to know for sure using these sources, but it should be possible to be reasonably confident a movie is "out of copyright" with a few hours of work per movie. As I needed to check almost 20,000 entries, this approach was not sustainable. I simply can not work around the clock for about 6 years to check this data set.
I asked the people behind The Internet Archive if they could introduce a new metadata field in their metadata XML for the IMDB ID, but was told that they leave it completely to the uploaders to update the metadata. Some of the metadata entries had IMDB links in the description, but I found no way to download all the metadata files in bulk to locate those, and put that approach aside.
In the process I noticed that several Wikipedia articles about movies had links to both IMDB and The Internet Archive, and it occurred to me that I could use the Wikipedia RDF data set to locate entries with both, to at least get a lower bound on the number of movies on The Internet Archive with an IMDB ID. This is useful, based on the assumption that movies distributed by The Internet Archive can be legally distributed on the Internet. With some help from the RDF community (thank you, DanC), I was able to come up with this query to pass to the SPARQL interface on Wikidata:
SELECT ?work ?imdb ?ia ?when ?label
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q11424.
  ?work wdt:P345 ?imdb.
  ?work wdt:P724 ?ia.
  OPTIONAL {
        ?work wdt:P577 ?when.
        ?work rdfs:label ?label.
        FILTER(LANG(?label) = "en").
  }
}
If I understand the query right, for every film entry anywhere in Wikipedia, it will return the IMDB ID and The Internet Archive ID, as well as when the movie was released and its English title, if either or both of the latter two are available. At the moment the result set contains 2338 entries. Of course, it depends on volunteers including both correct IMDB and The Internet Archive IDs in the Wikipedia articles for each movie. It should be noted that the result will include duplicates if a movie has entries in several languages. There are some bogus entries, either because The Internet Archive ID contains a typo or because the movie is not available from The Internet Archive. I did not verify the IMDB IDs, as I am unsure how to do that automatically.

I wrote a small python script to extract the data set from Wikidata and check if the XML metadata for each movie is available from The Internet Archive, and after around 1.5 hours it produced a list of 2097 free movies and their IMDB IDs. In total, 171 entries in Wikidata lack the referred Internet Archive entry. I assume the 70 "disappearing" entries (i.e. 2338-2097-171) are duplicates.
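The availability check itself is simple, as the metadata service at archive.org seems to answer with an empty JSON document for unknown identifiers. Here is a sketch of the per-entry check using the requests module, leaving out the SPARQL handling:

import requests

def on_archive(ia_id):
    """Return True if The Internet Archive has metadata for the item."""
    reply = requests.get('https://archive.org/metadata/%s' % ia_id).json()
    return bool(reply.get('metadata'))

print(on_archive('FightingLady'))        # True, the item exists
print(on_archive('no-such-item-xyzzy'))  # False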
This is not too bad, given that The Internet Archive reports that it contains 5331 feature films at the moment, but it also means more than 3000 movies are missing from Wikipedia, or are missing the pair of references on Wikipedia.
I was curious about the distribution by release year, and made a little graph to show how the number of free movies is spread over the years:
I expect the relative distribution of the remaining 3000 movies to be similar.
If you want to help, and want to ensure Wikipedia can be used to cross reference The Internet Archive and The Internet Movie Database, please make sure entries like these are listed under the "External links" heading on the Wikipedia article for the movie:
* {{Internet Archive film|id=FightingLady}}
* {{IMDb title|id=0036823|title=The Fighting Lady}}
Please verify the links on the final page, to make sure you did not introduce a typo.
Here is the complete list, if you want to correct the 171 identified Wikipedia entries with broken links to The Internet Archive: Q1140317, Q458656, Q458656, Q470560, Q743340, Q822580, Q480696, Q128761, Q1307059, Q1335091, Q1537166, Q1438334, Q1479751, Q1497200, Q1498122, Q865973, Q834269, Q841781, Q841781, Q1548193, Q499031, Q1564769, Q1585239, Q1585569, Q1624236, Q4796595, Q4853469, Q4873046, Q915016, Q4660396, Q4677708, Q4738449, Q4756096, Q4766785, Q880357, Q882066, Q882066, Q204191, Q204191, Q1194170, Q940014, Q946863, Q172837, Q573077, Q1219005, Q1219599, Q1643798, Q1656352, Q1659549, Q1660007, Q1698154, Q1737980, Q1877284, Q1199354, Q1199354, Q1199451, Q1211871, Q1212179, Q1238382, Q4906454, Q320219, Q1148649, Q645094, Q5050350, Q5166548, Q2677926, Q2698139, Q2707305, Q2740725, Q2024780, Q2117418, Q2138984, Q1127992, Q1058087, Q1070484, Q1080080, Q1090813, Q1251918, Q1254110, Q1257070, Q1257079, Q1197410, Q1198423, Q706951, Q723239, Q2079261, Q1171364, Q617858, Q5166611, Q5166611, Q324513, Q374172, Q7533269, Q970386, Q976849, Q7458614, Q5347416, Q5460005, Q5463392, Q3038555, Q5288458, Q2346516, Q5183645, Q5185497, Q5216127, Q5223127, Q5261159, Q1300759, Q5521241, Q7733434, Q7736264, Q7737032, Q7882671, Q7719427, Q7719444, Q7722575, Q2629763, Q2640346, Q2649671, Q7703851, Q7747041, Q6544949, Q6672759, Q2445896, Q12124891, Q3127044, Q2511262, Q2517672, Q2543165, Q426628, Q426628, Q12126890, Q13359969, Q13359969, Q2294295, Q2294295, Q2559509, Q2559912, Q7760469, Q6703974, Q4744, Q7766962, Q7768516, Q7769205, Q7769988, Q2946945, Q3212086, Q3212086, Q18218448, Q18218448, Q18218448, Q6909175, Q7405709, Q7416149, Q7239952, Q7317332, Q7783674, Q7783704, Q7857590, Q3372526, Q3372642, Q3372816, Q3372909, Q7959649, Q7977485, Q7992684, Q3817966, Q3821852, Q3420907, Q3429733, Q774474.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
I read with interest a news story at digi.no and NRK reporting that it is not just me: NAV also geolocates IP addresses, and the IP addresses of those submitting benefit report cards are analysed to see if a report card is submitted from a foreign IP address. Police attorney Hans Lyder Haare in Drammen is quoted by NRK as saying: «The two were, among other things, exposed by IP addresses. One can see that the report card comes from abroad.»

I think it is good that it becomes better known that IP addresses are tied to individuals, and that collected information is used to geolocate people, also by actors here in Norway. I see it as yet another argument for using Tor as much as possible, to make IP geolocation harder, so that one can protect one's privacy and avoid sharing one's physical location with strangers.

But there is one thing about this news that worries me. I was tipped off (thanks, #nuug) about NAV's privacy statement, which under the heading «Personvern og statistikk» (privacy and statistics) reads:

«When you visit nav.no, you leave electronic traces behind. The traces are created because your browser automatically sends a series of details to NAV's server every time you ask for a page to be shown. For example, this is information about which browser and version you use, and your Internet address (IP address). For each page shown, the following information is stored:

- which page you are looking at
- date and time
- which browser you use
- your IP address

None of this information will be used to identify individuals. NAV uses the information to generate aggregate statistics showing, among other things, which pages are the most popular. The statistics are a tool for improving our services.»

I fail to see how analysing the visitors' IP addresses, to see who submits report cards via the web from an IP address abroad, can be done without conflicting with the claim that «none of this information will be used to identify individuals». It thus looks to me like NAV is breaking its own privacy statement, which, as Datatilsynet told me at the beginning of December, is probably a violation of the Personal Data Act.

In addition, the privacy statement is quite misleading, given that NAV's web pages not only provide NAV with personal information, but also ask the users' browsers to contact five other web servers (script.hotjar.com, static.hotjar.com, vars.hotjar.com, www.google-analytics.com and www.googletagmanager.com), making personal information available to the companies Hotjar and Google, and to anyone able to listen in on the traffic along the way (like FRA, GCHQ and NSA). I fail to see how such spreading of personal information can be in line with the requirements of the Personal Data Act, or with NAV's own privacy statement.

Perhaps NAV should take a close look at its own privacy statement? Or perhaps Datatilsynet should?

I find it fascinating how many of the people being locked inside the proposed border wall between the USA and Mexico support the idea. The proposal to keep Mexicans out reminds me of the propaganda twist from the East German government, which called the Berlin Wall the "Antifascist Bulwark" after erecting it, claiming that the wall was erected to keep enemies from creeping into East Germany, while it was obvious to the people locked inside it that it was erected to keep the people from escaping.

Do the people in the USA supporting this wall really believe it is a one-way wall, only keeping people on the outside from getting in, while not keeping the people on the inside from getting out?
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
Did you ever wonder where the web traffic really flows to reach the web servers, and who owns the network equipment it is flowing through? It is possible to get a glimpse of this using traceroute, but it is hard to find all the details. Many years ago, I wrote a system to map the Norwegian Internet (trying to figure out if our plans for a network game service would get low enough latency, and who we needed to talk to about setting up game servers close to the users). Back then I used traceroute output from many locations (I asked my friends to run a script and send me their traceroute output) to create the graph and the map. The output from traceroute typically looks like this:
traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
 1  uio-gw10.uio.no (129.240.202.1)  0.447 ms  0.486 ms  0.621 ms
 2  uio-gw8.uio.no (129.240.24.229)  0.467 ms  0.578 ms  0.675 ms
 3  oslo-gw1.uninett.no (128.39.65.17)  0.385 ms  0.373 ms  0.358 ms
 4  te3-1-2.br1.fn3.as2116.net (193.156.90.3)  1.174 ms  1.172 ms  1.153 ms
 5  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.627 ms  he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48)  3.172 ms  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.857 ms
 6  ae1.ar8.oslosda310.as2116.net (195.0.242.39)  0.662 ms  0.637 ms  ae0.ar8.oslosda310.as2116.net (195.0.242.23)  0.622 ms
 7  89.191.10.146 (89.191.10.146)  0.931 ms  0.917 ms  0.955 ms
 8  * * *
 9  * * *
[...]
This shows the DNS names and IP addresses of (at least some of) the network equipment involved in getting the data traffic from me to the www.stortinget.no server, and how long it took in milliseconds for a packet to reach the equipment and return to me. Three packets are sent, and sometimes the packets do not follow the same path. This is shown for hop 5, where three different IP addresses replied to the traceroute request.
There are many ways to measure trace routes. Other good traceroute implementations I use are traceroute (using ICMP packets), mtr (can do ICMP, UDP and TCP) and scapy (a python library with ICMP, UDP and TCP traceroute and a lot of other capabilities). All of them are easily available in Debian.
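As an example of the scapy variant, a trace can be done in a few lines of Python. This is only a sketch, and scapy needs root privileges to send the raw packets:

from scapy.all import traceroute  # run as root

# TCP traceroute to the example target used below, at most 30 hops.
answers, unanswered = traceroute('www.stortinget.no', maxttl=30)

# Each answer pairs the probe sent with the reply received, giving
# the IP address of the router answering at each hop.
for sent, received in answers:
    print(sent.ttl, received.src)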
This time around, I wanted to know the geographic location of the different route points, to visualize how visiting a web page spreads information about the visit to a lot of servers around the globe. The background is that a web site today will often ask the browser to fetch the parts (for example HTML, JSON, fonts, JavaScript, CSS, video) required to display the content from many servers. This will leak information about the visit to those controlling these servers and to anyone able to peek at the data traffic passing by (like your ISP, the ISP's backbone provider, FRA, GCHQ, NSA and others).
Let's pick an example, the Norwegian parliament web site www.stortinget.no. It is read daily by all members of parliament and their staff, as well as political journalists, activists and many other citizens of Norway. A visit to the www.stortinget.no web site will ask your browser to contact 8 other servers: ajax.googleapis.com, insights.hotjar.com, script.hotjar.com, static.hotjar.com, stats.g.doubleclick.net, www.google-analytics.com, www.googletagmanager.com and www.netigate.se. I extracted this by asking PhantomJS to visit the Stortinget web page and tell me all the URLs PhantomJS downloaded to render the page (in HAR format, using their netsniff example; I am very grateful to Gorm for showing me how to do this). My goal is to visualize the network traces to all the IP addresses behind these DNS names, to show where visitors' personal information is spread when visiting the page.
When I had a look around for options, I could not find any good free software tools to do this, and decided I needed my own traceroute wrapper outputting KML based on locations looked up using GeoIP. KML is easy to work with and easy to generate, and is understood by several of the GIS tools I have available. I got good help from my NUUG colleague Anders Einar with this, and the result can be seen in my kmltraceroute git repository. Unfortunately, the quality of the free GeoIP databases I could find (and the for-pay databases my friends had access to) is not up to the task. The IP addresses of central Internet infrastructure are typically placed near the controlling company's main office, and not where the router is really located, as you can see from the KML file I created using the GeoLite City dataset from MaxMind.
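The core of such a wrapper does not have to be big. Here is a sketch of the idea, using the maxminddb Python module and the newer GeoLite2 City database rather than the exact code in the kmltraceroute repository; the hop addresses are taken from the traceroute output above:

import maxminddb

# Look up each traceroute hop in the GeoLite2 City database and emit
# a KML path.  Note that KML wants longitude,latitude ordering.
reader = maxminddb.open_database('GeoLite2-City.mmdb')
hops = ['129.240.202.1', '128.39.65.17', '193.156.90.3']

coordinates = []
for ip in hops:
    record = reader.get(ip)
    if record and 'location' in record:
        location = record['location']
        coordinates.append('%f,%f' % (location['longitude'],
                                      location['latitude']))

print('''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>traceroute</name>
    <LineString><coordinates>%s</coordinates></LineString>
  </Placemark>
</kml>''' % ' '.join(coordinates))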
I also had a look at the visual traceroute graph created by the scapy project, showing IP network ownership (aka AS owner) for the IP addresses in question. The graph displays a lot of useful information about the traceroute in SVG format, and gives a good indication of who controls the network equipment involved, but it does not include geolocation. The graph makes it possible to see that the information is made available to at least UNINETT, Catchcom, Stortinget, Nordunet, Google, Amazon, Telia, Level 3 Communications and NetDNA.
In the process, I came across the web service GeoTraceroute by Salim Gasmi. Its methodology of combining guesses based on DNS names, various location databases, and finally latency times to rule out candidate locations seemed to do a very good job of guessing the correct geolocation. But it could only do one trace at a time, did not have a sensor in Norway, and did not make the geolocations easily available for postprocessing. So I contacted the developer and asked if he would be willing to share the code (he refused until he had time to clean it up), but he was interested in providing the geolocations in a machine readable format, and willing to set up a sensor in Norway. So since yesterday, it is possible to run traces from Norway in this service, thanks to a sensor node set up by the NUUG association, and to get the trace in KML format for further processing.
Here we can see that a lot of the traffic passes through Sweden on its way to Denmark, Germany, Holland and Ireland. Plenty of places where the Snowden confirmations verified that the traffic is read by various actors without your best interest as their top priority.
Combining KML files is trivial using a text editor, so I could loop over all the hosts behind the URLs imported by www.stortinget.no, ask for the KML file from GeoTraceroute, and create a combined KML file with all the traces (unfortunately, only one of the IP addresses behind each DNS name is traced this time; to get them all, one would have to request traces using IP numbers instead of DNS names from GeoTraceroute). That might be the next step in this project.
Armed with these tools, I find it a lot easier to figure out where the IP traffic moves and who controls the boxes involved in moving it. And every time the link crosses for example the Swedish border, we can be sure Swedish Signals Intelligence (FRA) is listening, as GCHQ does in Britain and the NSA in the USA and on cables around the globe. (Hm, what should we tell them? :) Keep that in mind if you ever send anything unencrypted over the Internet.
PS: The KML files are drawn using the KML viewer from Ivan Rublev, as it was less cluttered than the local Linux application Marble. There are heaps of other options too.
At my nearby maker space, Sonen, I heard the story that it was easier to generate gcode files for their 3D printers (Ultimaker 2+) on Windows and MacOS X than on Linux, because the software involved had to be manually compiled and set up on Linux, while premade packages worked out of the box on Windows and MacOS X. I found this annoying, as the software involved, Cura, is free software and should be trivial to get up and running on Linux if someone took the time to package it for the relevant distributions. I even found a request from 2013 for adding it to Debian, which had seen some activity over the years but never resulted in the software showing up in Debian. So a few days ago I offered my help to try to improve the situation.
Now I am very happy to see that all the packages required by a working Cura in Debian are uploaded into Debian and waiting in the NEW queue for the ftpmasters to have a look. You can track the progress on the status page for the 3D printer team.
The uploaded packages are a bit behind upstream, and were uploaded now to get slots in the NEW queue while we work on updating the packages to the latest upstream version.
On a related note, two competitors of Cura, which I found harder to use and was unable to configure correctly for the Ultimaker 2+ in the short time I spent on them, are already in Debian. If you are looking for 3D printer "slicers" and want something already available in Debian, check out slic3r and slic3r-prusa. The latter is a fork of the former.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Do you have a large iCalendar file with lots of old entries, and would like to archive them to save space and resources? At least those of us using KOrganizer know that turning an event set on and off becomes slower and slower the more entries are in the set. While working on migrating our calendars to a Radicale CalDAV server on our Freedombox server, my loved one wondered if I could find a way to split up the calendar file she had in KOrganizer, and I set out to write a tool. I spent a few days writing and polishing the system, and it is now ready for general consumption. The code for ical-archiver is publicly available from a git repository on github. The system is written in Python and depends on the vobject Python module.
To use it, locate the iCalendar file you want to operate on and give it as an argument to the ical-archiver script. This will generate a set of new files, one file per component type per year for all components expiring more than two years in the past. The vevent, vtodo and vjournal entries are handled by the script. The remaining entries are stored in a 'remaining' file.
This is what a test run can look like:
% ical-archiver t/2004-2016.ics
Found 3612 vevents
Found 6 vtodos
Found 2 vjournals
Writing t/2004-2016.ics-subset-vevent-2004.ics
Writing t/2004-2016.ics-subset-vevent-2005.ics
Writing t/2004-2016.ics-subset-vevent-2006.ics
Writing t/2004-2016.ics-subset-vevent-2007.ics
Writing t/2004-2016.ics-subset-vevent-2008.ics
Writing t/2004-2016.ics-subset-vevent-2009.ics
Writing t/2004-2016.ics-subset-vevent-2010.ics
Writing t/2004-2016.ics-subset-vevent-2011.ics
Writing t/2004-2016.ics-subset-vevent-2012.ics
Writing t/2004-2016.ics-subset-vevent-2013.ics
Writing t/2004-2016.ics-subset-vevent-2014.ics
Writing t/2004-2016.ics-subset-vjournal-2007.ics
Writing t/2004-2016.ics-subset-vjournal-2011.ics
Writing t/2004-2016.ics-subset-vtodo-2012.ics
Writing t/2004-2016.ics-remaining.ics
%
As you can see, the original file is untouched, and new files are written with names derived from the original file. If you are happy with their content, the *-remaining.ics file can replace the original, and the others can be archived or imported as historical calendar collections.
The script should probably be improved a bit. The error handling when discovering broken entries is not good, and I am not sure yet if it makes sense to split different entry types into separate files or not. The program is thus likely to change. If you find it interesting, please get in touch. :)
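For the curious, the heart of the splitting can be expressed in a few lines using vobject. This is a condensed sketch of the approach, not the actual ical-archiver code:

import vobject

# Bucket calendar components by type and start year.
with open('t/2004-2016.ics') as f:
    calendar = vobject.readOne(f.read())

buckets = {}
for component in calendar.components():
    if component.name in ('VEVENT', 'VTODO', 'VJOURNAL'):
        try:
            year = component.dtstart.value.year
        except AttributeError:
            year = None  # no DTSTART, goes in the 'remaining' bucket
        buckets.setdefault((component.name, year), []).append(component)

for (name, year), components in sorted(buckets.items(), key=str):
    print(name, year, len(components))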
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.