X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/b6b6575e368fa0e8d3ac34a3a09aa1e21132be0e..031a871837ca3d2f892293be2dbf1de9ed672a26:/blog/index.html diff --git a/blog/index.html b/blog/index.html index 7cee2eb902..c733bad196 100644 --- a/blog/index.html +++ b/blog/index.html @@ -20,83 +20,111 @@
-
Some notes on fault tolerant storage systems
-
1st November 2017
-

If you care about how fault tolerant your storage is, you might find these articles and papers interesting. They have shaped how I think when designing a storage system.

[links to the referenced articles and papers]

Several of these research papers are based on data collected from hundreds of thousands or millions of disks, and their findings are eye opening. The short story is simply: do not implicitly trust RAID or redundant storage systems. Details matter. And unfortunately there are few options on Linux addressing all the identified issues. Both ZFS and Btrfs do a fairly good job, but have legal and practical issues of their own. I wonder how cluster file systems like Ceph do in this regard. After all, as the old saying goes, you know you have a distributed system when the crash of a computer you have never heard of stops you from getting any work done. The same holds true if fault tolerance does not work.

- -

Just remember, in the end, it does not matter how redundant or how fault tolerant your storage is, if you do not continuously monitor its status to detect and replace failed disks.
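To make that concrete, here is a minimal sketch of such a check (not from the original post), assuming smartmontools is installed and that /dev/sda and /dev/sdb are the devices to watch:

# Sketch only: assumes smartmontools is installed; usually needs to run as root.
import subprocess

def disk_is_healthy(device):
    """Return True if 'smartctl -H' reports the overall health test as PASSED."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return "PASSED" in result.stdout

for device in ("/dev/sda", "/dev/sdb"):
    status = "OK" if disk_is_healthy(device) else "REPLACE?"
    print(device, status)

A cron job mailing the output of something like this is a cheap first step before setting up proper monitoring.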

+ +
12th February 2018
+

I find myself fascinated by an article in Dagbladet about China's handling of Xinjiang, and in particular the following excerpt:

+ +

+ +

«In the southwestern city of Kashgar, closer to the border with Central Asia, it is now reported that 120,000 Uighurs are interned in so-called re-education camps. At the same time, an extensive health check programme has been introduced, with collection and storage of DNA samples from absolutely all inhabitants. The most advanced surveillance methods are being tested out here. Programs for recognising faces and voices are in place in the region. There, the local authorities have started installing GPS systems in all vehicles and dedicated tracking apps on mobile phones.

+ +

The police methods reach so deeply into people's daily lives that resistance to the Beijing regime is growing.»

+ +

+ +

Unfortunately, the description does not deviate all that much from the state of affairs here in Norway.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data collection                                             China   Norway
Collection and storage of DNA samples from the population   Yes     Partially; planned for all newborns
Face recognition                                            Yes     Yes
Voice recognition                                           Yes     No
Position tracking of mobile phones                          Yes     Yes
Position tracking of cars                                   Yes     Yes
+ +

In Norway, the situation around the Norwegian Institute of Public Health's storage of DNA information on behalf of the police, where they refused to delete information the police were not allowed to keep, has made it clear that DNA is kept for quite a long time. In addition, there are countless biobanks stored indefinitely, and there are plans to introduce permanent storage of DNA material from all newborn babies (with the option of asking for deletion).

+ +

In Norway, a system for face recognition is in place, which an NRK article from 2015 reports is active at Gardermoen, and which is also used to analyse images collected by the authorities. Is it used in more places as well? Central Oslo, for example, is packed with surveillance cameras controlled by the police and other authorities.

+ +

I am not aware of Norway having any system for identifying people by voice recognition.

+ +

Position tracking of mobile phones is routinely available to, among others, the police, NAV and Finanstilsynet, in line with the requirements in the telephone companies' licenses. In addition, smartphones report their position to the developers of countless mobile apps, from which authorities and others can extract the information when needed. There is no need for a dedicated app for this.

+ +

Position tracking of cars is routinely available via a dense network of measuring points along the roads (automatic toll stations, toll tag registration, automatic speed cameras and other road cameras). In addition, it has been decided that all new cars must be sold with equipment for GPS tracking (eCall).

+ +

It sure is a good thing we live in a liberal democracy, and not in a surveillance state. Or do we?

- Tags: english, raid, sysadmin. + Tags: norsk, surveillance.
@@ -104,41 +132,22 @@ status to detect and replace failed disks.

- -
31st October 2017
-

I was surprised today to learn that a friend in academia did not know there are easily available web services for writing LaTeX documents as a team. I thought it was common knowledge, but to make sure at least my readers are aware of it, I would like to mention these useful services for writing LaTeX documents. Some of them even provide a WYSIWYG editor to ease writing even further.

- -

There are two commercial services available, ShareLaTeX and Overleaf. They are very easy to use. Just start a new document, select which publisher to write for (i.e. which LaTeX style to use), and start writing. Note, these two have announced their intention to join forces, so soon there will only be one joint service. I've used both for different documents, and they work just fine. ShareLaTeX is free software, while the latter is not. According to an announcement from Overleaf, they plan to keep the ShareLaTeX code base maintained as free software.

But these two are not the only alternatives. Fidus Writer is another free software solution with the source available on GitHub. I have not used it myself. Several others can be found on the nice alternativeTo web service.

If you like Google Docs or Etherpad, but would like to write documents in LaTeX, you should check out these services. You can even host your own, if you want to. :)

- + +
11th February 2018
+
+ +

We write 2018, and it is 30 years since Unicode was introduced. Most of us in Norway have come to expect the use of our alphabet to just work with any computer system. But it is apparently beyond the reach of the computers printing receipts at a restaurant. Recently I visited a Peppes Pizza restaurant, and noticed a few details on the receipt. Notice how 'ø' and 'å' are replaced with strange symbols in 'Servitør', 'Å BETALE', 'Beløp pr. gjest', 'Takk for besøket.' and 'Vi gleder oss til å se deg igjen'.
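For the curious, this kind of mangling is the classic symptom of text encoded with one character set and decoded with another. A small illustration (the receipt printer most likely uses some other single-byte code page than the latin-1 used here):

# Illustration: UTF-8 encoded Norwegian text decoded with a single-byte
# code page (here latin-1) turns each 'ø' and 'å' into two strange symbols.
text = "Servitør: Takk for besøket. Vi gleder oss til å se deg igjen"
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)
# ServitÃ¸r: Takk for besÃ¸ket. Vi gleder oss til Ã¥ se deg igjen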

+ +

I would say this state of affairs is past sad and well into embarrassing.

+ +

I removed personal and private information to be nice.

@@ -151,288 +160,81 @@ host your own, if you want to. :)

- -
25th October 2017
-

Recently, I needed to automatically check the copyright status of a set of The Internet Movie Database (IMDB) entries, to figure out which of the movies they refer to can be freely distributed on the Internet. This proved to be harder than it sounds. IMDB certainly lists movies without any copyright protection, where the copyright protection has expired or where the movie is licensed using a permissive license like one from Creative Commons. These are mixed with copyright protected movies, and there seems to be no way to separate these classes of movies using the information in IMDB.

- -

First I tried to look up entries manually in IMDB, Wikipedia and The Internet Archive, to get a feel for how to do this. It is hard to know for sure using these sources, but it should be possible to be reasonably confident a movie is "out of copyright" with a few hours of work per movie. As I needed to check almost 20,000 entries, this approach was not sustainable. I simply cannot work around the clock for about 6 years to check this data set.

- -

I asked the people behind The Internet Archive if they could introduce a new metadata field in their metadata XML for the IMDB ID, but was told that they leave it completely to the uploaders to update the metadata. Some of the metadata entries had IMDB links in the description, but I found no way to download all metadata files in bulk to locate those, and put that approach aside.

- -

In the process I noticed several Wikipedia articles about movies had links to both IMDB and The Internet Archive, and it occurred to me that I could use the Wikipedia RDF data set to locate entries with both, to at least get a lower bound on the number of movies in The Internet Archive with an IMDB ID. This is useful based on the assumption that movies distributed by The Internet Archive can be legally distributed on the Internet. With some help from the RDF community (thank you DanC), I was able to come up with this query to pass to the SPARQL interface on Wikidata:

+
7th January 2018
+

I've continued to track down lists of movies that are legal to distribute on the Internet, and have identified more than 11,000 title IDs in The Internet Movie Database (IMDB) so far. Most of them (57%) are feature films from the USA published before 1923. I've also tracked down more than 24,000 movies I have not yet been able to map to an IMDB title ID, so the real number could be a lot higher. According to the front web page of Retro Film Vault, there are 44,000 public domain films, so I guess there are still some left to identify.

+ +

The complete data set is available from a public git repository, including the scripts used to create it. Most of the data is collected using web scraping, for example from the "product catalog" of companies selling copies of public domain movies, but any source I find believable is used. I've so far had to throw out three sources because I did not trust the public domain status of the movies listed.

+ +

Anyway, this is the summary of the 28 collected data sources so far:

SELECT ?work ?imdb ?ia ?when ?label
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q11424.
  ?work wdt:P345 ?imdb.
  ?work wdt:P724 ?ia.
  OPTIONAL {
        ?work wdt:P577 ?when.
        ?work rdfs:label ?label.
        FILTER(LANG(?label) = "en").
  }
}
+ 2352 entries (   66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
+ 2302 entries (  120 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
+  195 entries (   63 unique) with and   200 without IMDB title ID in free-movies-cinemovies.json
+   89 entries (   52 unique) with and    38 without IMDB title ID in free-movies-creative-commons.json
+  344 entries (   28 unique) with and   655 without IMDB title ID in free-movies-fesfilm.json
+  668 entries (  209 unique) with and  1064 without IMDB title ID in free-movies-filmchest-com.json
+  830 entries (   21 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
+   19 entries (   19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
+ 6822 entries ( 6669 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
+  137 entries (    0 unique) with and     0 without IMDB title ID in free-movies-imdb-externlist.json
+ 1205 entries (   57 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
+   84 entries (   20 unique) with and   167 without IMDB title ID in free-movies-infodigi-pd.json
+  158 entries (  135 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
+  113 entries (    4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
+  182 entries (  100 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
+  229 entries (   87 unique) with and     1 without IMDB title ID in free-movies-manual.json
+   44 entries (    2 unique) with and    64 without IMDB title ID in free-movies-openflix.json
+  291 entries (   33 unique) with and   474 without IMDB title ID in free-movies-profilms-pd.json
+  211 entries (    7 unique) with and     0 without IMDB title ID in free-movies-publicdomainmovies-info.json
+ 1232 entries (   57 unique) with and  1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
+   46 entries (   13 unique) with and    81 without IMDB title ID in free-movies-publicdomainreview.json
+  698 entries (   64 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
+ 1758 entries (  882 unique) with and  3786 without IMDB title ID in free-movies-retrofilmvault.json
+   16 entries (    0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
+   63 entries (   16 unique) with and   141 without IMDB title ID in free-movies-vodo.json
+11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID
 

-

If I understand the query right, for every film entry anywhere in Wikipedia, it will return the IMDB ID and The Internet Archive ID, and when the movie was released and its English title, if either or both of the latter two are available. At the moment the result set contains 2338 entries. Of course, it depends on volunteers including both correct IMDB and The Internet Archive IDs in the Wikipedia articles for the movies. It should be noted that the result will include duplicates if the movie has entries in several languages. There are some bogus entries, either because The Internet Archive ID contains a typo or because the movie is not available from The Internet Archive. I did not verify the IMDB IDs, as I am unsure how to do that automatically.

- -

I wrote a small Python script to extract the data set from Wikidata and check if the XML metadata for the movie is available from The Internet Archive, and after around 1.5 hours it produced a list of 2097 free movies and their IMDB IDs. In total, 171 entries in Wikidata lack the referred Internet Archive entry. I assume the 70 "disappearing" entries (i.e. 2338-2097-171) are duplicate entries.
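The script itself is not part of this post, but a rough sketch of the idea could look like the following. It assumes the SPARQL query above is stored in query.rq, and uses the Internet Archive JSON metadata endpoint as a stand-in for the XML metadata check:

# Rough sketch, not the original script.  The Wikidata query service may
# want a descriptive User-Agent header for heavier use.
import json
import urllib.parse
import urllib.request

def run_query(sparql):
    url = ("https://query.wikidata.org/sparql?format=json&query="
           + urllib.parse.quote(sparql))
    with urllib.request.urlopen(url) as response:
        return json.load(response)["results"]["bindings"]

def archive_item_exists(item_id):
    # The metadata endpoint answers with an empty JSON object for unknown items.
    url = "https://archive.org/metadata/" + urllib.parse.quote(item_id)
    with urllib.request.urlopen(url) as response:
        return bool(json.load(response))

sparql = open("query.rq").read()
for row in run_query(sparql):
    imdb, ia = row["imdb"]["value"], row["ia"]["value"]
    if archive_item_exists(ia):
        print(imdb, ia)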

- -

This is not too bad, given that The Internet Archive reports to contain 5331 feature films at the moment, but it also means more than 3000 movies are missing from Wikipedia or are missing the pair of references on Wikipedia.

- -

I was curious about the distribution by release year, and made a little graph to show how the number of free movies is spread over the years:

- -

- -

I expect the relative distribution of the remaining 3000 movies to be similar.

- -

If you want to help, and want to ensure Wikipedia can be used to cross reference The Internet Archive and The Internet Movie Database, please make sure entries like this are listed under the "External links" heading on the Wikipedia article for the movie:

- -

* {{Internet Archive film|id=FightingLady}}
* {{IMDb title|id=0036823|title=The Fighting Lady}}
-

- -

Please verify the links on the final page, to make sure you did not introduce a typo.

- -

Here is the complete list, if you want to correct the 171 -identified Wikipedia entries with broken links to The Internet -Archive: Q1140317, -Q458656, -Q458656, -Q470560, -Q743340, -Q822580, -Q480696, -Q128761, -Q1307059, -Q1335091, -Q1537166, -Q1438334, -Q1479751, -Q1497200, -Q1498122, -Q865973, -Q834269, -Q841781, -Q841781, -Q1548193, -Q499031, -Q1564769, -Q1585239, -Q1585569, -Q1624236, -Q4796595, -Q4853469, -Q4873046, -Q915016, -Q4660396, -Q4677708, -Q4738449, -Q4756096, -Q4766785, -Q880357, -Q882066, -Q882066, -Q204191, -Q204191, -Q1194170, -Q940014, -Q946863, -Q172837, -Q573077, -Q1219005, -Q1219599, -Q1643798, -Q1656352, -Q1659549, -Q1660007, -Q1698154, -Q1737980, -Q1877284, -Q1199354, -Q1199354, -Q1199451, -Q1211871, -Q1212179, -Q1238382, -Q4906454, -Q320219, -Q1148649, -Q645094, -Q5050350, -Q5166548, -Q2677926, -Q2698139, -Q2707305, -Q2740725, -Q2024780, -Q2117418, -Q2138984, -Q1127992, -Q1058087, -Q1070484, -Q1080080, -Q1090813, -Q1251918, -Q1254110, -Q1257070, -Q1257079, -Q1197410, -Q1198423, -Q706951, -Q723239, -Q2079261, -Q1171364, -Q617858, -Q5166611, -Q5166611, -Q324513, -Q374172, -Q7533269, -Q970386, -Q976849, -Q7458614, -Q5347416, -Q5460005, -Q5463392, -Q3038555, -Q5288458, -Q2346516, -Q5183645, -Q5185497, -Q5216127, -Q5223127, -Q5261159, -Q1300759, -Q5521241, -Q7733434, -Q7736264, -Q7737032, -Q7882671, -Q7719427, -Q7719444, -Q7722575, -Q2629763, -Q2640346, -Q2649671, -Q7703851, -Q7747041, -Q6544949, -Q6672759, -Q2445896, -Q12124891, -Q3127044, -Q2511262, -Q2517672, -Q2543165, -Q426628, -Q426628, -Q12126890, -Q13359969, -Q13359969, -Q2294295, -Q2294295, -Q2559509, -Q2559912, -Q7760469, -Q6703974, -Q4744, -Q7766962, -Q7768516, -Q7769205, -Q7769988, -Q2946945, -Q3212086, -Q3212086, -Q18218448, -Q18218448, -Q18218448, -Q6909175, -Q7405709, -Q7416149, -Q7239952, -Q7317332, -Q7783674, -Q7783704, -Q7857590, -Q3372526, -Q3372642, -Q3372816, -Q3372909, -Q7959649, -Q7977485, -Q7992684, -Q3817966, -Q3821852, -Q3420907, -Q3429733, -Q774474

+

I keep finding more data sources. I found the cinemovies source just a few days ago, and as you can see from the summary, it extended my list with 63 movies. Check out the mklist-* scripts in the git repository if you are curious how the lists are created. Many of the titles are extracted using searches on IMDB, where I look for the title and year, and accept search results with only one movie listed if the year matches. This allows me to automatically use many lists of movies without IMDB title ID references, at the cost of increasing the risk of wrongly identifying an IMDB title ID as public domain. So far my random manual checks have indicated that the method is solid, but I really wish all lists of public domain movies would include a unique movie identifier like the IMDB title ID. It would make the job of counting movies in the public domain a lot easier.
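The heuristic can be illustrated with a few lines of code. This is only a sketch of the idea, not the actual mklist-* code, and search_imdb is a placeholder for whatever search backend is used:

# Sketch of the title/year matching heuristic described above.
def find_title_id(title, year, search_imdb):
    """Accept a search result only when it lists exactly one movie
    and that movie has the expected year."""
    hits = search_imdb(title)          # -> list of (title_id, year) pairs
    if len(hits) == 1 and hits[0][1] == year:
        return hits[0][0]
    return None

# Example with a stand-in search backend and a made-up title ID:
print(find_title_id("Some Title", 1927, lambda t: [("tt0000000", 1927)]))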

+ +

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -440,26 +242,402 @@ Archive: Q1140317,
- -
14th October 2017
-

I find it fascinating how many of the people being locked inside the proposed border wall between the USA and Mexico support the idea. The proposal to keep Mexicans out reminds me of the propaganda twist from the East German government calling the wall the "Antifascist Bulwark" after erecting the Berlin Wall, claiming that the wall was erected to keep enemies from creeping into East Germany, while it was obvious to the people locked inside it that it was erected to keep the people from escaping.

- -

Do the people in the USA supporting this wall really believe it is a one-way wall, only keeping people on the outside from getting in, while not keeping people on the inside from getting out?

+ +
20th December 2017
+

Yesterday I appeared in Follo district court as an expert witness and presented my investigations into counting movies in the public domain, related to the NUUG association's involvement in the case about Økokrim's seizure and later confiscation of the DNS domain popcorn-time.no. I talked about several things, but mostly about my assessment of how the film industry has measured how illegal Popcorn Time is. As far as I can tell, the film industry's measurement has been passed on unchanged by the Norwegian police, and the courts have relied on the measurement when assessing Popcorn Time both in Norway and abroad (the 99% figure is also cited in foreign court decisions).

+ +

Ahead of my testimony I wrote a note, mostly to myself, with the points I wanted to get across. Here is a copy of the note I wrote and gave to the prosecution. Oddly enough the judges did not want the note, so if I understood the court process correctly only the histogram graph was added to the documentation in the case. The judges were apparently only interested in relating to what I said in court, not what I had written beforehand. In any case, I assume that more people than me may find the text useful, and therefore publish it here. I attach a transcript of document 09,13, which is the central document I comment on.

+ +

Comments on «Evaluation of (il)legality» for Popcorn Time

+ +

Summary

+ +

The measurement method Økokrim relies on when claiming that 99% of the movies available from Popcorn Time are shared illegally has weaknesses.

+ +

Whoever assessed whether movies can be legally shared has not succeeded in identifying movies that can be shared legally, and has apparently assumed that only very old movies can be shared legally. Økokrim assumes there is only one movie, the Charlie Chaplin movie «The Circus» from 1928, that can be shared freely among those observed as available via various Popcorn Time variants. I find three more among the observed movies: «The Brain That Wouldn't Die» from 1962, «God's Little Acre» from 1958 and «She Wore a Yellow Ribbon» from 1949. It is quite possible there are more. There are thus at least four times as many movies that can legally be shared on the Internet in the data set Økokrim relies on when claiming that less than 1% can be shared legally.

+ +

Second, the sample obtained by searching for random words taken from the Dale-Chall word list deviates from the year distribution of the movie catalogues used as a whole, which affects the ratio between movies that can be legally shared and movies that cannot. In addition, picking the upper part (the first five) of the search results introduces a deviation from the correct year distribution, which affects the share of works in the public domain in the search result.

+ +

What is measured is not the (il)legality of the use of Popcorn Time, but the (il)legality of the content of bittorrent movie catalogues that are maintained independently of Popcorn Time.

+ +

Documents discussed: 09,12, 09,13, 09,14, 09,18, 09,19, 09,20.

+ +

Detailed comments

+ +

Økokrim has explained to the courts that at least 99% of everything available from various Popcorn Time variants is shared illegally on the Internet. I became curious about how they arrived at this figure, and this note is a collection of comments on the measurement Økokrim refers to. Part of the background for why I chose to look at the case is that I am interested in identifying and counting how many artistic works have fallen into the public domain or for other reasons can legally be shared on the Internet, and I was therefore interested in how the one percent that perhaps can be shared legally had been found.

+ +

The 99% share comes from an uncredited and undated note that sets out to document a method for measuring how (il)legal various Popcorn Time variants are.

+ +

Briefly summarised, the method document explains that because it is not possible to obtain a complete list of all movie titles available via Popcorn Time, something meant to be a representative sample is created by picking 50 search words longer than three characters from the word list known as Dale-Chall. For each search word a search is done, and the first five movies in the search result are collected, until 100 unique movie titles have been found. If 50 search words were not enough to reach 100 unique movie titles, more movies from each search result were added. If this was still not enough, more randomly chosen search words were picked and searched for until 100 unique movie titles had been identified.

+ +

Then, for each of the movie titles, it was «assessed whether it was reasonable to expect that the work was protected by copyright, by looking at whether the movie was available in IMDB, as well as the director, the release year, when it was released for specific market areas, and which production and distribution companies were registered» (my translation).

+ +

The method is reproduced in both of the uncredited documents 09,13 and 09,19, and described from page 47 of document 09,20, slides dated 2017-02-01. The latter is credited to Geerart Bourlon from Motion Picture Association EMEA. The method appears to have several weaknesses that bias the results. It starts by stating that it is not possible to retrieve a complete list of all movie titles available, and that this is the background for the choice of method. This assumption is not consistent with what is written in document 09,12, which also lacks author and date. Document 09,12 explains how the entire catalogue content was downloaded and counted. Document 09,12 is possibly the same report that was referred to in the judgment from Oslo District Court 2017-11-03 (case 17-093347TVI-OTIR/05) as the report of 1 June 2017 by Alexander Kind Petersen, but I have not compared the documents word for word to verify this.

+ +

IMDB is short for The Internet Movie Database, a well-regarded commercial web service actively used by both the film industry and others to keep track of which feature films (and a good number of other films) exist or are in production, along with information about these films. The data quality is high, with few errors and few missing films. IMDB does not show information about the copyright status of the films on the info page for each film. As part of the IMDB service there are lists of films, made by volunteers, enumerating what is assumed to be works in the public domain.

+ +

There are several sources that can be used to find films that are in the public domain or have terms of use making it legal for everyone to share them on the Internet. In the last few weeks I have tried to collect and cross-link these lists in an attempt to count the number of films in the public domain. By starting from such lists (and, for the Internet Archive, its published films), I have so far managed to identify more than 11,000 films, mainly feature films.

The vast majority of the entries are taken from IMDB itself, based on the fact that all films made in the USA before 1923 have fallen into the public domain. The corresponding cut-off for Great Britain is 1912-07-01, but this makes up only a very small part of the feature films in IMDB (19 in total). Another large share comes from the Internet Archive, where I have identified films with a reference to IMDB. The Internet Archive, which is based in the USA, has a policy of only publishing films that are legal to distribute. During the work I have come across several films that have been removed from the Internet Archive, which leads me to conclude that the people controlling the Internet Archive take an active approach to only having legal content there, even though it is largely run by volunteers. Another large list of films comes from the commercial company Retro Film Vault, which sells public domain films to the TV and film industry. I have also made use of lists of films claimed to be in the public domain, namely Public Domain Review, Public Domain Torrents and Public Domain Movies (.net and .info), as well as lists of films with Creative Commons licensing from Wikipedia, VODO and The Hill Productions. I have done some spot checks by evaluating films that are only mentioned on a single list. Where I have found errors that made me doubt the judgment of those who made the list, I have discarded the list completely (this applies to one list from IMDB).

+ +

By starting from works that can be assumed to be legally shared on the Internet (from, among others, the Internet Archive, Public Domain Torrents, Public Domain Review and Public Domain Movies), and linking them to entries in IMDB, I have so far managed to identify more than 11,000 films (mainly feature films) that there is reason to believe can be legally distributed by everyone on the Internet. As additional sources, lists of films assumed or claimed to be in the public domain have been used. These sources come from communities working to make available to the public all works that have fallen into the public domain or have terms of use that permit sharing.

In addition to the more than 11,000 films where the IMDB title ID has been identified, I have found more than 20,000 entries where I have not yet had the capacity to track down the IMDB title ID. Some of these are probably duplicates of the IMDB entries identified so far, but hardly all of them. Retro Film Vault claims to have 44,000 public domain film works in its catalogue, so it is possible that the real number is considerably higher than what I have managed to identify so far. The conclusion is that 11,000 is a lower bound on how many films in IMDB can legally be shared on the Internet. According to statistics from IMDB there are 4.6 million titles registered, of which 3 million are TV series episodes. I have not found out how they are distributed per year.

+ +

If one distributes by year all the title IDs in IMDB that are claimed to be legally shareable on the Internet, one gets the following histogram:

+ +

+ +

In the histogram one can see that the effect of missing registration, or missing renewal of registration, is that many films released in the USA before 1978 are in the public domain today. In addition, one can see that there are several films released in recent years with terms of use that permit sharing, possibly due to the rise of the Creative Commons movement.

+ +

For machine analysis of the catalogues I have written a small program that connects to the bittorrent catalogues used by various Popcorn Time variants and downloads the complete list of movies in the catalogues, which confirms that it is possible to fetch a complete list of all available movie titles. I have looked at four bittorrent catalogues. One is used by the client available from www.popcorntime.sh and is named 'sh' in this document. The second is, according to document 09,12, used by the client available from popcorntime.ag and popcorntime.sh and is named 'yts' in this document. The third is used by the web pages available from popcorntime-online.tv and is named 'apidomain' in this document. The fourth is used by the client available from popcorn-time.to according to document 09,12, and is named 'ukrfnlge' in this document.

+ +

The method Økokrim relies on states in its point four that judgment is a suitable way of finding out whether a movie can be legally shared on the Internet or not, and says that it was «assessed whether it was reasonable to expect that the work was protected by copyright». First, it is not enough to establish whether a movie is «protected by copyright» to know whether it is legal to share it on the Internet or not, as there are several movies with copyright terms of use that permit sharing on the Internet. Examples of this are Creative Commons licensed movies such as Citizenfour from 2014 and Sintel from 2010. In addition, there are several movies that are now in the public domain because of missing registration or missing renewal of registration, even though both the director, the production company and the distributor would have wanted protection. Examples of this are Plan 9 from Outer Space from 1959 and Night of the Living Dead from 1968. All movies from the USA that were in the public domain before 1989-03-01 remained in the public domain, as the Berne Convention, which took effect in the USA at that time, was not given retroactive force. If there is anything the story of the song «Happy birthday» tells us, where payment for use was collected for decades even though the song was not really protected by copyright law, it is that every single work must be assessed carefully and in detail before one can establish whether the work is in the public domain or not; it is not enough to believe self-declared rights holders. More examples of works in the public domain misclassified as protected are found in document 09,18, which lists search results for the client referred to as popcorntime.sh and which, according to the note, contains only one movie (The Circus from 1928) that with some doubt can be assumed to be in the public domain.

+ +

On a quick read-through of document 09,18, which contains screenshots from use of a Popcorn Time variant, I found mentioned both the movie «The Brain That Wouldn't Die» from 1962, which is available from the Internet Archive and which according to Wikipedia is in the public domain in the USA because it was released in 1962 without a 'copyright' notice, and the movie «God's Little Acre» from 1958, which has been published on Wikipedia, where it is stated that the black and white version is in the public domain. It is not clear from document 09,18 whether the movie mentioned there is the black and white version. For capacity reasons, and because the movie list in document 09,18 is not machine readable, I have not tried to check all the movies listed there against the list of movies assumed to be legally distributable on the Internet.

+ +

From a machine pass over the list of IMDB references under the spreadsheet tab 'Unique titles' in document 09,14, I additionally found the movie «She Wore a Yellow Ribbon» from 1949, which is probably also misclassified. The movie «She Wore a Yellow Ribbon» is available from the Internet Archive and marked as public domain there. There thus appear to be at least four times as many movies that can legally be shared on the Internet than what is assumed when claiming that at least 99% of the content is illegal. I do not rule out that closer investigation may uncover more. The point is in any case that the method's step of judging whether it was «reasonable to expect that the work was protected by copyright» makes the method unreliable.

+ +

The measurement method described picks random search terms from the Dale-Chall word list. That word list contains 3000 simple English words that fourth graders in the USA are expected to understand. It is not explained why exactly this word list was chosen, and it is unclear to me whether it is suitable for getting a representative sample of movies. Many of the words give an empty search result. By simulating similar searches I see large deviations from the distribution in the catalogue for single measurements. This suggests that single measurements of 100 movies, done the way the measurement method describes, are not well suited to finding the share of illegal content in the bittorrent catalogues.

+ +

One can counter this large deviation for single measurements by doing many searches and merging the results. I have tested this by carrying out 100 single measurements (i.e. measuring (100x100=) 10,000 randomly chosen movies), which gives a smaller, but still significant, deviation compared to counting movies per year in the entire catalogue.
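A minimal sketch of this kind of sampling simulation (not the actual measurement code) could look like the following, assuming the full catalogue is available as a list of (title, year) pairs and the Dale-Chall words as a plain list; the real searches are also sorted by the number of sharers, which is ignored here:

# Sketch: pick random Dale-Chall words, "search" the catalogue, keep the top
# five hits per word, and compare the year distribution of the sample with
# the year distribution of the full catalogue.
import random
from collections import Counter

def sample_years(catalogue, words, sample_size=100, hits_per_word=5):
    sample = {}
    words = list(words)
    while len(sample) < sample_size and words:
        word = words.pop(random.randrange(len(words)))
        hits = [entry for entry in catalogue if word in entry[0].lower()]
        for title, year in hits[:hits_per_word]:
            sample.setdefault(title, year)
    return Counter(sample.values())

# catalogue = [("Some title", 1955), ...]; words = Dale-Chall word list
# print(sample_years(catalogue, words))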

+ +

The measurement method takes the top five of each search result. The search results are sorted by the number of bittorrent clients registered as sharers in the catalogues, which may bias the sample towards the movies that are popular among those who use the bittorrent catalogues, without saying anything about which content is available or which content is shared with Popcorn Time clients. I have tried to measure how large such a bias might be by comparing the distribution when taking the bottom 5 of the search result instead. The deviation between these two methods is clearly visible in the histograms for several of the catalogues. Here are histograms of movies found in the complete catalogue (green line), and movies found by searching for words from Dale-Chall. Graphs marked 'top' take the first 5 of the search result, while those marked 'bottom' take the last 5. One can see here that the results are affected considerably by whether one looks at the first or the last movies in a search hit.

+ +

[Histograms for the four catalogues, comparing the year distribution of the complete catalogue with the 'top' and 'bottom' samples from the Dale-Chall searches.]

+ +

It is worth noting that the bittorrent catalogues discussed are not made for use with Popcorn Time. For example, the YTS catalogue, used by the client downloaded from popcorntime.sh, belongs to an independent file sharing related web site, YTS.AG, with a separate user community. The measurement method proposed by Økokrim thus does not measure the (il)legality of the use of Popcorn Time, but the (il)legality of the content of these catalogues.

+ +
+ +

The method from Økokrim's document 09,13 in the criminal case about the DNS seizure.

+ +

1. Evaluation of (il)legality

+ +

1.1. Methodology + +

Due to its technical configuration, Popcorn Time applications don't allow to make a full list of all titles made available. In order to evaluate the level of illegal operation of PCT, the following methodology was applied:

+ +
    + +
  1. A random selection of 50 keywords, greater than 3 letters, was made from the Dale-Chall list that contains 3000 simple English words. The selection was made by using a Random Number Generator.

  2. For each keyword, starting with the first randomly selected keyword, a search query was conducted in the movie section of the respective Popcorn Time application. For each keyword, the first five results were added to the title list until the number of 100 unique titles was reached (duplicates were removed).

  3. For one fork, .CH, insufficient titles were generated via this approach to reach 100 titles. This was solved by adding any additional query results above five for each of the 50 keywords. Since this still was not enough, another 42 random keywords were selected to finally reach 100 titles.

  4. It was verified whether or not there is a reasonable expectation that the work is copyrighted by checking if they are available on IMDb, also verifying the director, the year when the title was released, the release date for a certain market, the production company/ies of the title and the distribution company/ies.
+ +

1.2. Results

+ +

Between 6 and 9 June 2016, four forks of Popcorn Time were investigated: popcorn-time.to, popcorntime.ag, popcorntime.sh and popcorntime.ch. An excel sheet with the results is included in Appendix 1. Screenshots were secured in separate Appendixes for each respective fork, see Appendix 2-5.

+ +

For each fork, out of 100, de-duplicated titles it was possible to retrieve data according to the parameters set out above that indicate that the title is commercially available. Per fork, there was 1 title that presumably falls within the public domain, i.e. the 1928 movie "The Circus" by and with Charles Chaplin.

+ +

Based on the above it is reasonable to assume that 99% of the movie content of each fork is copyright protected and is made available illegally.

+ +

This exercise was not repeated for TV series, but considering that besides production companies and distribution companies also broadcasters may have relevant rights, it is reasonable to assume that at least a similar level of infringement will be established.

+ +

Based on the above it is reasonable to assume that 99% of all the content of each fork is copyright protected and are made available illegally.

@@ -467,43 +645,38 @@ while not keeping people in the inside from getting out?

- -
9th October 2017
-

At my nearby maker space, Sonen, I heard the story that it was easier to generate gcode files for their 3D printers (Ultimaker 2+) on Windows and MacOS X than on Linux, because the software involved had to be manually compiled and set up on Linux, while premade packages worked out of the box on Windows and MacOS X. I found this annoying, as the software involved, Cura, is free software and should be trivial to get up and running on Linux if someone took the time to package it for the relevant distributions. I even found a request for adding it to Debian from 2013, which had seen some activity over the years but never resulted in the software showing up in Debian. So a few days ago I offered my help to try to improve the situation.

- -

Now I am very happy to see that all the packages required by a working Cura in Debian are uploaded into Debian and waiting in the NEW queue for the ftpmasters to have a look. You can track the progress on the status page for the 3D printer team.

- -

The uploaded packages are a bit behind upstream, and were uploaded now to get slots in the NEW queue while we work on updating the packages to the latest upstream version.

- -

On a related note, two competitors of Cura, which I found harder to use and was unable to configure correctly for the Ultimaker 2+ in the short time I spent on them, are already in Debian. If you are looking for 3D printer "slicers" and want something already available in Debian, check out slic3r and slic3r-prusa. The latter is a fork of the former.

+ +
17th December 2017
+

After several months of working and waiting, I am happy to report that the nice and user friendly 3D printer slicer software Cura just entered Debian Unstable. It consists of six packages: cura, cura-engine, libarcus, fdm-materials, libsavitar and uranium. The last two, uranium and cura, entered Unstable yesterday. This should make it easier for Debian users to print on at least the Ultimaker class of 3D printers. My nearest 3D printer is an Ultimaker 2+, so it will make life easier for at least me. :)

+ +

The work to make this happen was done by Gregor Riepl, and I was happy to assist him in sponsoring the packages. With the introduction of Cura, Debian is up to three 3D printer slicers at your service: Cura, Slic3r and Slic3r Prusa. If you own or have access to a 3D printer, give it a go. :)

+ +

The 3D printer software is maintained by the 3D printer Debian team, flocking together on the 3dprinter-general mailing list and the #debian-3dprinting IRC channel.

+ +

The next step for Cura in Debian is to update the cura package to version 3.0.3, and then update the entire set of packages to version 3.1.0, which showed up in the last few days.

@@ -516,30 +689,85 @@ The latter is a fork of the former.

- -
4th October 2017
-
Whenever I work on various projects, I constantly need various screws. The latest project I am working on is building a case for an HDMI touch screen to be used with a Raspberry Pi. The case is assembled with screws and bolts, and I have been unsure where to get hold of the right screws. The nearby Clas Ohlson and Jernia have rarely had what I need. But the other day I got a fantastic tip for those of us living in Oslo. Zachariassen Jernvare AS in Hegermannsgate 23A at Torshov has a fantastic selection, and is open between 09:00 and 17:00. They sell screws, nuts, bolts, washers etc. individually, and so far I have found everything I have looked for. They also carry most other hardware, such as tools, lamps, wires, etc. I hope they have enough customers to keep going for a long time, as this is a shop I will visit often. The shop is a real find to have in the neighbourhood for those of us who like to build things ourselves. :)

+ +
13th December 2017
+

While looking at the scanned copies of the copyright renewal entries for movies published in the USA, an idea occurred to me. The number of renewals is so small per year that it should be fairly quick to transcribe them all and add references to the corresponding IMDB title IDs. This would give the (presumably) complete list of movies published 28 years earlier that did _not_ enter the public domain for the transcribed year. By fetching the list of USA movies published 28 years earlier and subtracting the movies with renewals, we should be left with movies registered in IMDB that are now in the public domain. For the year 1955 (which is the one I have looked at the most), the total number of pages to transcribe is 21. For the 28 years from 1950 to 1978, it should be in the range of 500-600 pages. It is just a few days of work, and spread among a small group of people it should be doable in a few weeks of spare time.

+ +

A typical copyright renewal entry looks like this (the first one listed for 1955):

+ +

+ ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer + Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); + 10Jun55; R151558. +

+ +

The movie title as well as the registration and renewal dates are easy enough to locate with a program (split on the first comma and look for DDmmmYY). The rest of the text is not required to find the movie in IMDB, but is useful to confirm the correct movie is found. I am not quite sure what the L and R numbers mean, but suspect they are reference numbers into the archive of the US Copyright Office.
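A small sketch of that parsing idea, using the example entry above:

# Sketch: split the renewal entry on the first comma to get the title, and
# pick out the DDmmmYY dates for registration and renewal.
import re

entry = ("ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer "
         "Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); "
         "10Jun55; R151558.")

title, rest = entry.split(",", 1)
dates = re.findall(r"\b\d{1,2}[A-Z][a-z]{2}\d{2}\b", rest)
print(title)   # ADAM AND EVIL
print(dates)   # ['17Aug27', '10Jun55']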

+ +

Tracking down the equivalent IMDB title ID is probably going to be a manual task, but given the year it is fairly easy to search for the movie title using for example http://www.imdb.com/find?q=adam+and+evil+1927&s=all. Using this search, I find that the equivalent IMDB title ID for the first renewal entry from 1955 is http://www.imdb.com/title/tt0017588/.

+ +

I suspect the best way to do this would be to make a specialised web service to make it easy for contributors to transcribe and track down IMDB title IDs. In the web service, once an entry is transcribed, the title and year could be extracted from the text and a search in IMDB conducted for the user to pick the equivalent IMDB title ID right away. By spreading out the work among volunteers, it would also be possible to have at least two persons transcribe the same entries, to be able to discover any typos introduced. But I will need help to make this happen, as I lack the spare time to do all of this on my own. If you would like to help, please get in touch. Perhaps you can draft a web service for crowd sourcing the task?

+ +

Note, Project Gutenberg already has some transcribed copies of the US Copyright Office renewal protocols, but I have not been able to find any film renewals there, so I suspect they only have copies of renewals for written works. I have not been able to find any transcribed versions of movie renewals so far. Perhaps they exist somewhere?

+ +

I would love to figure out methods for finding all the public domain works in other countries too, but it is a lot harder. At least for Norway and Great Britain, such work involves tracking down the people involved in making the movie and figuring out when they died. It is hard enough to figure out who was part of making a movie, and I do not know how to automate such a procedure without a registry of every person involved in making movies and their year of death.

+ +

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

- Tags: norsk. + Tags: english, opphavsrett, verkidetfri.
@@ -547,64 +775,49 @@ nabolaget for oss som liker å bygge litt selv. :)

- -
29th September 2017
-

Every mobile phone announces its existence over radio to the nearby mobile cell towers. And this radio chatter is available to anyone with a radio receiver capable of receiving it. Details about the mobile phones, with very good accuracy, are of course collected by the phone companies, but this is not the topic of this blog post. The mobile phone radio chatter makes it possible to figure out when a cell phone is nearby, as it includes the SIM card ID (IMSI). By paying attention over time, one can see when a phone arrives and when it leaves an area. I believe it would be nice to make this information more available to the general public, to make more people aware of how their phones are announcing their whereabouts to anyone who cares to listen.

- -

I am very happy to report that we managed to get something visualizing this information up and running for Oslo Skaperfestival 2017 (Oslo Makers Festival), taking place today and tomorrow at the Deichmanske library. The solution is based on the simple recipe for listening to GSM chatter I posted a few days ago, and will show up at the stand of Åpen Sone from the Computer Science department of the University of Oslo. The presentation shows the nearby mobile phones (aka IMSIs) as dots in a web browser graph, with lines to the dot representing the mobile base station it is talking to. It was working in the lab yesterday, and was moved into place this morning.

- -

We set up a fairly powerful desktop machine using Debian Buster/Testing with several (five, I believe) RTL2838 DVB-T receivers connected, and visualize the visible cell phone towers using an English version of Hopglass. A fairly powerful machine is needed, as the grgsm_livemon_headless processes from gr-gsm converting the radio signal to data packages are quite CPU intensive.

- -

The frequencies to listen to are identified using a slightly patched scan-and-livemon (to set the --args values for each receiver), and the Hopglass data is generated using the patches in my meshviewer-output branch. For some reason we could not get more than four SDRs working. There is also a geographical map trying to show the location of the base stations, but I believe their coordinates are hardcoded to some random location in Germany. The code should be replaced with code to look up the location in a text file, a sqlite database or one of the online databases mentioned in the github issue for the topic.

If this sounds interesting, visit the stand at the festival!

+ +
5th December 2017
+

Three years ago, a presumed lost animation film, Empty Socks from 1927, was discovered in the Norwegian National Library. At the time it was discovered, it was generally assumed to be copyrighted by The Walt Disney Company, and I blogged about my reasoning to conclude that it would enter the Norwegian equivalent of the public domain in 2053, based on my understanding of Norwegian Copyright Law. But a few days ago, I came across a blog post claiming the movie was already in the public domain, at least in the USA. The reasoning is as follows: The film was released in November or December 1927 (sources disagree), and presumably registered its copyright that year. At that time, right holders of movies registered by the copyright office received government protection for their work for 28 years. After 28 years, the copyright had to be renewed if they wanted the government to protect it further. The blog post I found claims such a renewal did not happen for this movie, and thus it entered the public domain in 1956. Yet someone claims the copyright was renewed and the movie is still copyright protected. Can anyone help me figure out which claim is correct? I have not been able to find Empty Socks in Catalog of copyright entries. Ser.3 pt.12-13 v.9-12 1955-1958 Motion Pictures available from the University of Pennsylvania, neither on page 45 for the first half of 1955, nor on page 119 for the second half of 1955. It is of course possible that the renewal entry was left out of the printed catalog by mistake. Is there some way to rule out this possibility? Please help, and update the Wikipedia page with your findings.

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -612,83 +825,124 @@ issue for the topic.
- -
24th September 2017
-

A little more than a month ago I wrote how to observe the SIM card ID (aka IMSI number) of mobile phones talking to nearby mobile phone base stations using Debian GNU/Linux and a cheap USB software defined radio, and thus being able to pinpoint the location of people and equipment (like cars and trains) with an accuracy of a few kilometers. Since then we have worked to make the procedure even simpler, and it is now possible to do this without any manual frequency tuning and without building your own packages.

- -

The gr-gsm package is now included in Debian testing and unstable, and the IMSI-catcher code no longer requires root access to fetch and decode the GSM data collected using gr-gsm.

- -

Here is an updated recipe, using packages built by Debian and a git clone of two python scripts:

- -
    - -
  1. Start with a Debian machine running the Buster version (aka testing).

  2. Run 'apt install gr-gsm python-numpy python-scipy python-scapy' as root to install required packages.
    28th November 2017
    +

It would be easier to locate the movie you want to watch in the Internet Archive if the metadata about each movie were more complete and accurate. In the archiving community, a well known saying states that good metadata is a love letter to the future. The metadata in the Internet Archive could use a face lift for the future to love us back. Here is a proposal for a small improvement that would make the metadata more useful today. I've been unable to find any document describing the various standard fields available when uploading videos to the archive, so this proposal is based on my best guess and searching through several of the existing movies.

    + +

I have a few use cases in mind. First of all, I would like to be able to count the number of distinct movies in the Internet Archive, without duplicates. I would further like to identify the IMDB title ID of the movies in the Internet Archive, to be able to look up an IMDB title ID and know if I can fetch the video from there and share it with my friends.

    + +

Second, I would like the Butter data provider for The Internet Archive (available from github) to list as many of the good movies as possible. The plugin currently does a search in the archive with the following parameters:

    -
  3. Fetch the code decoding GSM packages using 'git clone github.com/Oros42/IMSI-catcher.git'.

  4. Insert USB software defined radio supported by GNU Radio.

  5. Enter the IMSI-catcher directory and run 'python scan-and-livemon' to locate the frequency of nearby base stations and start listening for GSM packages on one of them.

  6. Enter the IMSI-catcher directory and run 'python simple_IMSI-catcher.py' to display the collected information.
+

collection:moviesandfilms
AND NOT collection:movie_trailers
AND -mediatype:collection
AND format:"Archive BitTorrent"
AND year
+

-

Note, due to a bug somewhere, the scan-and-livemon program (actually its underlying program grgsm_scanner) does not work with the HackRF radio. It does work with RTL 8232 and other similar USB radio receivers you can get very cheaply (for example from ebay), so for now the solution is to scan using the RTL radio and only use HackRF for fetching GSM data.

- -

As far as I can tell, a cell phone only shows up on one of the frequencies at a time, so if you are going to track and count every cell phone around you, you need to listen to all the frequencies used. To listen to several frequencies, use the --numrecv argument to scan-and-livemon to use several receivers. Further, I am not sure if phones using 3G or 4G will show up as talking GSM to base stations, so this approach might not see all phones around you. I typically see 0-400 IMSI numbers an hour when looking around where I live.
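If you want to do a similar count, a small sketch like the following could tally distinct IMSIs from a capture log. It assumes a hypothetical log format with one 15 digit IMSI somewhere on each line, which may not match the actual output of simple_IMSI-catcher.py, so adjust the parsing to what you actually see:

# Sketch: count distinct IMSIs in a capture log (assumed format, see above).
import re
import sys

imsis = set()
with open(sys.argv[1]) as log:
    for line in log:
        match = re.search(r"\b\d{15}\b", line)
        if match:
            imsis.add(match.group(0))
print("distinct IMSIs seen:", len(imsis))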

- -

I've tried to run the scanner on a Raspberry Pi 2 and 3 running Debian Buster, but the grgsm_livemon_headless process seems to be too CPU intensive to keep up. When GNU Radio prints 'O' to stdout, I am told it is caused by a buffer overflow between the radio and GNU Radio, caused by the program being unable to read the GSM data fast enough. If you see a stream of 'O's from the terminal where you started scan-and-livemon, you need to give the process more CPU power. Perhaps someone is able to optimize the code to a point where it becomes possible to set up RPi3 based GSM sniffers? I tried using Raspbian instead of Debian, but there seems to be something wrong with GNU Radio on Raspbian, causing glibc to abort().

+

Most of the cool movies that fail to show up in Butter do so because the 'year' field is missing. The 'year' field is populated from the year part of the 'date' field, and should be when the movie was released (date or year). Two such examples are Ben Hur from 1905 and Caminandes 2: Gran Dillama from 2013, where the year metadata field is missing.
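For reference, roughly the same search as the plugin's can be reproduced outside Butter with the public advancedsearch API; the field list below is only a guess at what is useful, not something taken from the plugin:

# Sketch: run roughly the same search as the Butter plugin against the
# Internet Archive advancedsearch API and print identifier, year and title.
import json
import urllib.parse
import urllib.request

query = ('collection:moviesandfilms AND NOT collection:movie_trailers '
         'AND -mediatype:collection AND format:"Archive BitTorrent" AND year')
params = urllib.parse.urlencode([
    ("q", query),
    ("fl[]", "identifier"), ("fl[]", "title"), ("fl[]", "year"),
    ("rows", "50"), ("page", "1"), ("output", "json"),
])
url = "https://archive.org/advancedsearch.php?" + params
with urllib.request.urlopen(url) as response:
    docs = json.load(response)["response"]["docs"]
for doc in docs:
    print(doc.get("identifier"), doc.get("year"), doc.get("title"))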

So, my proposal is simply: for every movie in The Internet Archive where an IMDB title ID exists, please fill in these metadata fields (note, they can be updated also long after the video was uploaded, but as far as I can tell, only by the uploader):

mediatype
  Should be 'movie' for movies.

collection
  Should contain 'moviesandfilms'.

title
  The title of the movie, without the publication year.

date
  The date or year the movie was released. This makes the movie show up in Butter, as well as making it possible to know the age of the movie, and is useful for figuring out the copyright status.

director
  The director of the movie. This makes it easier to know if the correct movie is found in movie databases.

publisher
  The production company making the movie. Also useful for identifying the correct movie.

links
  Add a link to the IMDB title page, for example like this: <a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>. This makes it easier to find duplicates and allows counting the number of unique movies in the Archive. Other external references, like to TMDB, could be added like this too.
+ +
+ +

I did consider proposing a Custom field for the IMDB title ID (for example 'imdb_title_url', 'imdb_code' or simply 'imdb'), but suspect it will be easier to simply place it in the links free text field.
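For uploaders wanting to follow the proposal, something along these lines should work with the internetarchive Python package, assuming modify_metadata accepts these fields for your item; the item identifier below is made up, and only the uploader (or an admin) can change an item's metadata:

# Sketch: fill in some of the proposed metadata fields on your own item.
from internetarchive import modify_metadata

metadata = {
    "title": "The Fighting Lady",
    "date": "1944",
    "description": 'See <a href="http://www.imdb.com/title/tt0036823/">'
                   "the movie in IMDB</a>.",
}
# 'some-public-domain-movie' is a placeholder identifier, not a real item.
response = modify_metadata("some-public-domain-movie", metadata=metadata)
print(response.status_code)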

+ +

I created a list of IMDB title IDs for several thousand movies in the Internet Archive, but I also got a list of several thousand movies without such an IMDB title ID (and quite a few duplicates). It would be great if this data set could be integrated into the Internet Archive metadata to be available for everyone in the future, but with the current policy of leaving metadata editing to the uploaders, it will take a while before this happens. If you have uploaded movies to the Internet Archive, you can help. Please consider following my proposal above for your movies, to ensure that the movies are properly counted. :)

+ +

The list is mostly generated using wikidata, which, based on +Wikipedia articles, makes it possible to link IMDB entries to movies in +the Internet Archive. But there are lots of movies without a +Wikipedia article, and some movies where only a collection page exists +(like the +Caminandes example above, where there are three movies but only +one Wikidata entry).
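The wikidata lookup itself is conceptually simple. The sketch below is not the script from my git repository, just an illustration of the idea: ask the Wikidata SPARQL endpoint for every film that has both an IMDb ID (property P345) and an Internet Archive identifier (property P724). The property and class IDs are quoted from memory, so double check them before relying on the output.

import requests

# Sketch only: list films known to Wikidata with both an IMDb ID (P345)
# and an Internet Archive identifier (P724).
QUERY = '''
SELECT ?item ?imdb ?iaid WHERE {
  ?item wdt:P31 wd:Q11424 .   # instance of: film
  ?item wdt:P345 ?imdb .      # IMDb ID
  ?item wdt:P724 ?iaid .      # Internet Archive ID
}
'''

response = requests.get(
    'https://query.wikidata.org/sparql',
    params={'query': QUERY, 'format': 'json'},
    headers={'User-Agent': 'free-movies-sketch/0.1 (example)'},
)
for row in response.json()['results']['bindings']:
    print(row['imdb']['value'], row['iaid']['value'])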

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -696,54 +950,76 @@ with GNU Radio on raspbian, causing glibc to abort().

- -
7th September 2017
-

A few days ago, Jon Wessel-Aas published a blog post about -«The conclusion on data retention that the EU Commission did not want -us to see». It is an interesting review of the EU Court of Justice's view -on dragnet surveillance of the population, which makes it clear that such -surveillance is in breach of EU law.

- -

The election campaign is in full swing in Norway, and in a few days the -deadline for casting a vote expires. One thing is certain: Høyre and -Arbeiderpartiet will not get my vote -this -time either. I have not forgotten that they forced through the law that -was to require every data and telecom service provider to monitor all -of their customers. A law that was passed, and never repealed.

- -

It is clear from the debate around borderless digital surveillance -(or "Digital Grenseforsvar", as it is called in Orwellian newspeak) that -neither Høyre nor Arbeiderpartiet have any principled objections to -surveilling the entire population, and the debate so far suggests that several -of the other parties do not have any either. Many of -those who voted -for the Data Retention Directive in the Storting (64 from Arbeiderpartiet, -25 from Høyre) are still active and still arguing for erasing -even more of the citizens' private sphere.

- -

When the authorities demonstrate their distrust of the people, I believe -the people themselves should put some effort into protecting their private -lives, by adopting end-to-end encrypted communication with their nearest and -dearest, and by limiting how much private information is shared with outsiders. -After all, nothing suggests that the authorities are going to protect our -privacy for us. -There -are many options. Personally I am rather fond of -Ring, which is based on p2p technology -without central control, is free software, and supports messaging, voice -and video. The system is available out of the box on -Debian and -Ubuntu, and there -are packages for Android, MacOSX and Windows. So far Ring has few -users, so I also use -Signal as a browser extension.

+ +
18th November 2017
+

A month ago, I blogged about my work to +automatically +check the copyright status of IMDB entries, and to try to count the +number of movies listed in IMDB that are legal to distribute on the +Internet. I have continued to look for good data sources, and have +identified a few more. The code used to extract information from +various data sources is available in +a +git repository, currently hosted on github.

+ +

So far I have identified 3186 unique IMDB title IDs. To gain a +better understanding of the structure of the data set, I created a +histogram of the year associated with each movie (typically the release +year). It is interesting to notice where the peaks and dips in the +graph are located. I wonder why they are placed there. I suspect +World War II caused the dip around 1940, but what caused the peak +around 2010?
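The histogram itself is nothing fancy. Assuming the collected entries are merged into one JSON list where each movie carries an optional 'year' value (a hypothetical layout, the files in the repository may be organised differently), a matplotlib sketch like this is enough to reproduce the graph:

import json
import matplotlib.pyplot as plt

# Hypothetical merged data file; the repository stores one JSON file
# per source, so some merging is needed first.
with open('free-movies-merged.json') as f:
    movies = json.load(f)

years = [int(movie['year']) for movie in movies if movie.get('year')]
plt.hist(years, bins=range(min(years), max(years) + 2))
plt.xlabel('Release year')
plt.ylabel('Number of movies')
plt.savefig('free-movies-years.png')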

+ +

+ +

I've so far identified ten sources for IMDB title IDs for movies in +the public domain or with a free license. These are the statistics +reported when running 'make stats' in the git repository:

+ +
+  249 entries (    6 unique) with and   288 without IMDB title ID in free-movies-archive-org-butter.json
+ 2301 entries (  540 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
+  830 entries (   29 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
+ 2109 entries (  377 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
+  291 entries (  122 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
+  144 entries (  135 unique) with and     0 without IMDB title ID in free-movies-manual.json
+  350 entries (    1 unique) with and   801 without IMDB title ID in free-movies-publicdomainmovies.json
+    4 entries (    0 unique) with and   124 without IMDB title ID in free-movies-publicdomainreview.json
+  698 entries (  119 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
+    8 entries (    8 unique) with and   196 without IMDB title ID in free-movies-vodo.json
+ 3186 unique IMDB title IDs in total
+
+ +

The entries without IMDB title ID are candidates to increase the +data set, but might equally well be duplicates of entries already +listed with IMDB title ID in one of the other sources, or represent +movies that lack an IMDB title ID. I've seen examples of all these +situations when peeking at the entries without IMDB title ID. Based +on these data sources, the lower bound for movies listed in IMDB that +are legal to distribute on the Internet is between 3186 and 4713. +
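The counting behind 'make stats' is straightforward. Here is a rough sketch of the idea, assuming each free-movies-*.json file is a JSON list of entries with an optional field holding the IMDB title ID (I call it 'imdb' below, but the actual field name in the repository may differ):

import glob
import json

unique_ids = set()
without_id = 0

for path in sorted(glob.glob('free-movies-*.json')):
    with open(path) as f:
        entries = json.load(f)
    # 'imdb' is an assumed field name; adjust to match the real files.
    with_id = [entry for entry in entries if entry.get('imdb')]
    unique_ids.update(entry['imdb'] for entry in with_id)
    without_id += len(entries) - len(with_id)
    print('%5d entries with and %5d without IMDB title ID in %s'
          % (len(with_id), len(entries) - len(with_id), path))

print(len(unique_ids), 'unique IMDB title IDs in total,')
print(without_id, 'entries still lacking one')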

It would be great for the accuracy of this measurement +if the various sources added IMDB title IDs to their metadata. I have +tried to reach the people behind the various sources to ask if they +are interested in doing this, without any replies so far. Perhaps you +can help me get in touch with the people behind VODO, Public Domain +Torrents, Public Domain Movies and Public Domain Review to try to +convince them to add more metadata to their movie entries?

+ +

Another way you could help is by adding pages to Wikipedia about +movies that are legal to distribute on the Internet. If such a page +exists and includes a link to both IMDB and The Internet Archive, the +script used to generate free-movies-archive-org-wikidata.json should +pick up the mapping as soon as wikidata is updated.

+ +

As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

@@ -751,96 +1027,87 @@ brukere med Ring, slik at jeg også bruker
- -
9th August 2017
-

On Friday, I came across an interesting article in the Norwegian -web-based ICT news magazine digi.no on -how -to collect the IMSI numbers of nearby cell phones using cheap -DVB-T software defined radios. The article referred to instructions -and a recipe by -Keld Norman on Youtube on how to make a simple $7 IMSI Catcher, and I decided to test them out.

- -

The instructions said to use Ubuntu, install pip using apt (to -bypass apt), use pip to install pybombs (to bypass both apt and pip), -and then ask pybombs to fetch and build everything you need from -scratch. I wanted to see if I could do the same with the most recent -Debian packages, but this did not work because pybombs tried to build -stuff that no longer builds with the most recent openssl library, or -hit some other version skew problem. While trying to get this recipe -working, I learned that the apt->pip->pybombs route was a long detour, -and the only software dependency missing in Debian was the -gr-gsm package. I also found out that the lead upstream developer of the -gr-gsm (the name stands for GNU Radio GSM) project already had a set of -Debian packages provided in an Ubuntu PPA repository. All I needed to -do was to dget the Debian source package and build it.

- -

The IMSI collector is a python script listening for packets on the -loopback network device and printing to the terminal some specific GSM -packets with IMSI numbers in them. The code is fairly short and easy -to understand. The reason this works is that gr-gsm includes a tool -to read GSM data from a software defined radio like a DVB-T USB stick -or other software defined radios, decode it and inject it into a -network device on your Linux machine (using the loopback device by -default). This proved to work just fine, and I've been testing the -collector for a few days now.
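To give an idea of what is going on under the hood, here is a small sketch (not the simple_IMSI-catcher script itself) that captures the GSMTAP packets gr-gsm injects on the loopback device, using scapy. gr-gsm sends the decoded GSM frames as GSMTAP over UDP, by default to port 4729 on localhost; extracting IMSI numbers additionally requires decoding the GSM layer 3 payload, which the real script does and this sketch leaves out.

from scapy.all import UDP, sniff

GSMTAP_PORT = 4729  # default UDP port used by gr-gsm for GSMTAP output

def show_frame(pkt):
    # Each UDP payload is one GSMTAP encapsulated GSM frame.  The real
    # simple_IMSI-catcher script parses the layer 3 payload to find and
    # print IMSI numbers; here we only dump the raw frames.
    payload = bytes(pkt[UDP].payload)
    print(len(payload), payload.hex())

# Needs root privileges; gr-gsm must be feeding frames to the loopback device.
sniff(iface='lo', filter='udp port %d' % GSMTAP_PORT, prn=show_frame)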

- -

The updated and simpler recipe is thus to

+ +
1st November 2017
+

If you care about how fault tolerant your storage is, you might +find these articles and papers interesting. They have shaped how I +think when designing a storage system.

-
    +
+
  • USENIX FAST'08 +An +Analysis of Data Corruption in the Storage Stack by +L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. +Arpaci-Dusseau, and R. H. Arpaci-Dusseau
  • + +
  • USENIX FAST'07 Disk +failures in the real world: what does an MTTF of 1,000,000 hours mean +to you? by B. Schroeder and G. A. Gibson.
  • + +
  • USENIX ;login: Are +Disks the Dominant Contributor for Storage Failures? A Comprehensive +Study of Storage Subsystem Failure Characteristics by Weihang +Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky
  • + +
  • SIGMETRICS 2007 +An +analysis of latent sector errors in disk drives by +L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler
  • + + + +

Several of these research papers are based on data collected from +hundreds of thousands or millions of disks, and their findings are eye +opening. The short story is simple: do not implicitly trust RAID or +redundant storage systems. Details matter. And unfortunately there +are few options on Linux addressing all the identified issues. Both +ZFS and Btrfs are doing a fairly good job, but have legal and +practical issues of their own. I wonder how cluster file systems like +Ceph do in this regard. After all, there is an old saying: you know +you have a distributed system when the crash of a computer you have +never heard of stops you from getting any work done. The same holds +true if fault tolerance does not work.

    + +

Just remember: in the end, it does not matter how redundant or how +fault tolerant your storage is if you do not continuously monitor its +status to detect and replace failed disks.
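To make that concrete, here is a minimal monitoring sketch using smartctl from smartmontools; in real life smartd or a proper monitoring system is a better choice, and the device naming below (/dev/sd?) is just an assumption about a typical Linux setup.

import glob
import subprocess

# Minimal sketch: ask smartctl for the overall health verdict of each
# disk and warn when a drive stops reporting PASSED.  A real setup
# should use smartd or a monitoring system instead of a script like this.
for disk in sorted(glob.glob('/dev/sd?')):
    result = subprocess.run(['smartctl', '-H', disk],
                            capture_output=True, text=True)
    if 'PASSED' not in result.stdout:
        print('WARNING: %s did not report a passing SMART health status' % disk)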

    -

    To make it even easier in the future to get this sniffer up and -running, I decided to package -the gr-gsm project -for Debian (WNPP -#871055), and the package was uploaded into the NEW queue today. -Luckily the gnuradio maintainer has promised to help me, as I do not -know much about gnuradio stuff yet.

    - -

I doubt this "IMSI catcher" is anywhere near as powerful as -commercial tools like -The -Spy Phone Portable IMSI / IMEI Catcher or the -Harris -Stingray, but I hope the existence of cheap alternatives can make -more people realise how easily their whereabouts are tracked when carrying -a cell phone. Seeing the data flow on the screen, realizing that -I live close to a police station and knowing that police officers also -carry cell phones, I wonder how hard it would be for criminals to -track the position of the police officers to discover when there are -police nearby, or for foreign military forces to track the location -of the Norwegian military forces, or for anyone to track the location -of government officials...

    - -

It is worth noting that the data reported by the IMSI-catcher -script mentioned above is only a fraction of the data broadcast on -the GSM network. It will only collect from one frequency at a time, -while a typical phone will be using several frequencies, and not all -phones will be using the frequencies tracked by the grgsm_livemod -program. Also, there is a lot of radio chatter being ignored by the -simple_IMSI-catcher script, which could be collected by extending the -parser code. I wonder if gr-gsm can be set up to listen to more than -one frequency?

    +

    As usual, if you use Bitcoin and want to show your support of my +activities, please send Bitcoin donations to my address +15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

    @@ -855,6 +1122,15 @@ one frequency?

    Archive