+
Recently, I needed to automatically check the copyright status of a
+set of The Internet Movie Database
+(IMDB) entries, to figure out which of the movies they refer
+to can be freely distributed on the Internet. This proved to be
+harder than it sounds. IMDB certainly lists movies without any
+copyright protection, where the copyright protection has expired or
+where the movie is licensed using a permissive license like one from
+Creative Commons. These are mixed with copyright protected movies,
+and there seems to be no way to separate these classes of movies using
+the information in IMDB.
+
+
First I tried to look up entries manually in IMDB,
+Wikipedia and
+The Internet Archive, to get a
+feel for how to do this. It is hard to know for sure using these sources,
+but it should be possible to be reasonably confident a movie is "out
+of copyright" with a few hours of work per movie. As I needed to check
+almost 20,000 entries, this approach was not sustainable. I simply
+cannot work around the clock for about 6 years to check this data
+set.
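As a rough sanity check of that estimate, here is a minimal sketch of the arithmetic. The 2.5 hours per movie is only an assumed reading of "a few hours", and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the workload estimate.
ENTRIES = 20_000            # "almost 20,000 entries" from the text
HOURS_PER_YEAR = 24 * 365   # working around the clock, no breaks


def years_needed(hours_per_movie, entries=ENTRIES):
    """Years of non-stop work needed at a given pace per movie."""
    return entries * hours_per_movie / HOURS_PER_YEAR


def hours_per_movie_for(years, entries=ENTRIES):
    """Pace per movie implied by a total number of non-stop years."""
    return years * HOURS_PER_YEAR / entries


# Pace implied by the "about 6 years" figure.
print(round(hours_per_movie_for(6), 2))
# Years needed at an assumed 2.5 hours per movie.
print(round(years_needed(2.5), 1))
```

In other words, six years of non-stop work corresponds to a bit over two and a half hours per entry, which fits "a few hours".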
+
+
I asked the people behind The Internet Archive if they could
+introduce a new metadata field in their metadata XML for IMDB ID, but
+was told that they leave it completely to the uploaders to update the
+metadata. Some of the metadata entries had IMDB links in the
+description, but I found no way to download all metadata files in bulk
+to locate those, so I put that approach aside.
+
+
In the process I noticed several Wikipedia articles about movies
+had links to both IMDB and The Internet Archive, and it occurred to me
+that I could use the Wikipedia RDF data set to locate entries with
+both, to at least get a lower bound on the number of movies on The
+Internet Archive with an IMDB ID. This is useful based on the
+assumption that movies distributed by The Internet Archive can be
+legally distributed on the Internet. With some help from the RDF
+community (thank you DanC), I was able to come up with this query to
+pass to the SPARQL interface on
+Wikidata:
+
+
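+SELECT ?work ?imdb ?ia ?when ?label
+WHERE
+{
+  # Instance of (P31) film (Q11424), or of any subclass (P279) of film
+  ?work wdt:P31/wdt:P279* wd:Q11424.
+  # IMDb ID (P345) and Internet Archive ID (P724) are both required
+  ?work wdt:P345 ?imdb.
+  ?work wdt:P724 ?ia.
+  # Publication date (P577) and English label, when available
+  OPTIONAL {
+    ?work wdt:P577 ?when.
+    ?work rdfs:label ?label.
+    FILTER(LANG(?label) = "en").
+  }
+}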
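A query like this can also be sent from a script instead of the web form. The following is a minimal sketch, not the script I used. It assumes the public endpoint at query.wikidata.org and the standard SPARQL JSON result format; the function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint (assumed to be the one behind the web form).
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request asking for SPARQL results as JSON."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; use one describing your own script.
            "User-Agent": "free-movie-check/0.1 (example script)",
        },
    )


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of plain dicts.

    Variables that are unbound in a row (for example ?when and ?label
    from the OPTIONAL block) are simply absent from that row's dict.
    """
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query):
    """Send the query and return the parsed rows (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=300) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query()` with the query text above should give one dict per result row, with the keys `work`, `imdb`, `ia` and, when present, `when` and `label`.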
+
+
+
If I understand the query right, for every film entry anywhere in
+Wikipedia, it will return the IMDB ID and The Internet Archive ID, and
+when the movie was released and its English title, if both
+of the latter two are available (they share one OPTIONAL block, so they
+are returned together or not at all). At the moment the result set contains
+2338 entries. Of course, it depends on volunteers including both
+correct IMDB and The Internet Archive IDs in the Wikipedia articles
+for the movie. It should be noted that the result will include
+duplicates if the movie has entries in several languages. There are
+some bogus entries, either because The Internet Archive ID contains a
+typo or because the movie is not available from The Internet Archive.
+I did not verify the IMDB IDs, as I am unsure how to do that
+automatically.
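Duplicate rows can be collapsed before any further checking. Here is a minimal sketch, assuming each row is a dict keyed by the variable names from the query, and assuming a row counts as a duplicate when both its IMDB ID and its Internet Archive ID have been seen before. That definition is my guess, and other keys (for example the Wikidata entity alone) might be just as reasonable.

```python
def unique_pairs(rows):
    """Keep the first row seen for every (IMDB ID, Internet Archive ID) pair."""
    seen = {}
    for row in rows:
        key = (row["imdb"], row["ia"])
        # setdefault() only stores the row if the key is new.
        seen.setdefault(key, row)
    return list(seen.values())


# Hypothetical example rows; the second differs only in release date.
example = [
    {"imdb": "tt0036823", "ia": "FightingLady", "when": "1944-01-01T00:00:00Z"},
    {"imdb": "tt0036823", "ia": "FightingLady", "when": "1945-01-01T00:00:00Z"},
]
print(len(unique_pairs(example)))
```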
+
+
I wrote a small Python script to extract the data set from Wikidata
+and check if the XML metadata for the movie is available from The
+Internet Archive, and after around 1.5 hours it produced a list of 2097
+free movies and their IMDB ID. In total, 171 entries in Wikidata lack
+the referred Internet Archive entry. I assume the 70 "disappearing"
+entries (i.e. 2338-2097-171) are duplicate entries.
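The availability check could look roughly like the sketch below. It is not the original script. It assumes the metadata XML for an item lives at `https://archive.org/download/<id>/<id>_meta.xml`, and it treats any HTTP error as "entry missing"; both are assumptions, and the function names are made up.

```python
import urllib.error
import urllib.parse
import urllib.request


def metadata_url(ia_id):
    """Assumed location of the metadata XML for an Internet Archive item."""
    safe_id = urllib.parse.quote(ia_id, safe="")
    return "https://archive.org/download/{0}/{0}_meta.xml".format(safe_id)


def has_metadata(ia_id, opener=urllib.request.urlopen):
    """Return True if the metadata file can be fetched (needs network)."""
    try:
        with opener(metadata_url(ia_id), timeout=30) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 404 and friends: treat the referred entry as missing.
        return False


def split_by_availability(pairs, check=has_metadata):
    """Split (imdb_id, ia_id) pairs into available and missing lists."""
    free, missing = [], []
    for imdb_id, ia_id in pairs:
        (free if check(ia_id) else missing).append((imdb_id, ia_id))
    return free, missing


# The bookkeeping from the text: rows not accounted for by either list.
print(2338 - 2097 - 171)
```

Passing the check function as a parameter makes it easy to try the splitting logic with a fake check before spending an hour and a half on real requests.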
+
+
This is not too bad, given that The Internet Archive reports that it
+contains 5331
+feature films at the moment, but it also means more than 3000
+movies are missing on Wikipedia or are missing the pair of references
+on Wikipedia.
+
+
I was curious about the distribution by release year, and made a
+little graph to show how the number of free movies is spread over the
+years:
+
+

+
+
I expect the relative distribution of the remaining 3000 movies to
+be similar.
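Counting movies per release year is simple once the rows are available. This is a minimal sketch, assuming each row is a dict whose optional `when` value is an ISO-style timestamp starting with a four-digit year (as in `1944-01-01T00:00:00Z`); the text histogram is only a stand-in for a proper graph.

```python
from collections import Counter


def release_year(when):
    """Extract the year from a timestamp like '1944-01-01T00:00:00Z'."""
    return int(when[:4])


def movies_per_year(rows):
    """Count rows per release year, skipping rows without a date."""
    return Counter(release_year(r["when"]) for r in rows if r.get("when"))


def text_histogram(counts, width=40):
    """Render the counts as one line of '#' characters per year."""
    if not counts:
        return ""
    peak = max(counts.values())
    lines = []
    for year in sorted(counts):
        bar = "#" * max(1, round(counts[year] * width / peak))
        lines.append("{} {} {}".format(year, bar, counts[year]))
    return "\n".join(lines)
```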
+
+
If you want to help, and want to ensure Wikipedia can be used to
+cross reference The Internet Archive and The Internet Movie Database,
+please make sure entries like this are listed under the "External
+links" heading on the Wikipedia article for the movie:
+
+
+* {{Internet Archive film|id=FightingLady}}
+* {{IMDb title|id=0036823|title=The Fighting Lady}}
+
+
+
Please verify the links on the final page, to make sure you did not
+introduce a typo.
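If you have several articles to fix, the two lines can be generated from the IDs. A minimal sketch, assuming the IMDB ID may or may not carry the usual `tt` prefix; the function name is made up, and the templates are the two shown above.

```python
def external_link_lines(ia_id, imdb_id, title):
    """Return the two wiki markup lines for the "External links" section."""
    # The IMDb title template in the example takes the number without "tt".
    imdb_number = imdb_id[2:] if imdb_id.startswith("tt") else imdb_id
    return [
        "* {{Internet Archive film|id=%s}}" % ia_id,
        "* {{IMDb title|id=%s|title=%s}}" % (imdb_number, title),
    ]


print("\n".join(external_link_lines("FightingLady", "tt0036823", "The Fighting Lady")))
```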
+
+
Here is the complete list, if you want to correct the 171
+identified Wikipedia entries with broken links to The Internet
+Archive: Q1140317,
+Q458656,
+Q458656,
+Q470560,
+Q743340,
+Q822580,
+Q480696,
+Q128761,
+Q1307059,
+Q1335091,
+Q1537166,
+Q1438334,
+Q1479751,
+Q1497200,
+Q1498122,
+Q865973,
+Q834269,
+Q841781,
+Q841781,
+Q1548193,
+Q499031,
+Q1564769,
+Q1585239,
+Q1585569,
+Q1624236,
+Q4796595,
+Q4853469,
+Q4873046,
+Q915016,
+Q4660396,
+Q4677708,
+Q4738449,
+Q4756096,
+Q4766785,
+Q880357,
+Q882066,
+Q882066,
+Q204191,
+Q204191,
+Q1194170,
+Q940014,
+Q946863,
+Q172837,
+Q573077,
+Q1219005,
+Q1219599,
+Q1643798,
+Q1656352,
+Q1659549,
+Q1660007,
+Q1698154,
+Q1737980,
+Q1877284,
+Q1199354,
+Q1199354,
+Q1199451,
+Q1211871,
+Q1212179,
+Q1238382,
+Q4906454,
+Q320219,
+Q1148649,
+Q645094,
+Q5050350,
+Q5166548,
+Q2677926,
+Q2698139,
+Q2707305,
+Q2740725,
+Q2024780,
+Q2117418,
+Q2138984,
+Q1127992,
+Q1058087,
+Q1070484,
+Q1080080,
+Q1090813,
+Q1251918,
+Q1254110,
+Q1257070,
+Q1257079,
+Q1197410,
+Q1198423,
+Q706951,
+Q723239,
+Q2079261,
+Q1171364,
+Q617858,
+Q5166611,
+Q5166611,
+Q324513,
+Q374172,
+Q7533269,
+Q970386,
+Q976849,
+Q7458614,
+Q5347416,
+Q5460005,
+Q5463392,
+Q3038555,
+Q5288458,
+Q2346516,
+Q5183645,
+Q5185497,
+Q5216127,
+Q5223127,
+Q5261159,
+Q1300759,
+Q5521241,
+Q7733434,
+Q7736264,
+Q7737032,
+Q7882671,
+Q7719427,
+Q7719444,
+Q7722575,
+Q2629763,
+Q2640346,
+Q2649671,
+Q7703851,
+Q7747041,
+Q6544949,
+Q6672759,
+Q2445896,
+Q12124891,
+Q3127044,
+Q2511262,
+Q2517672,
+Q2543165,
+Q426628,
+Q426628,
+Q12126890,
+Q13359969,
+Q13359969,
+Q2294295,
+Q2294295,
+Q2559509,
+Q2559912,
+Q7760469,
+Q6703974,
+Q4744,
+Q7766962,
+Q7768516,
+Q7769205,
+Q7769988,
+Q2946945,
+Q3212086,
+Q3212086,
+Q18218448,
+Q18218448,
+Q18218448,
+Q6909175,
+Q7405709,
+Q7416149,
+Q7239952,
+Q7317332,
+Q7783674,
+Q7783704,
+Q7857590,
+Q3372526,
+Q3372642,
+Q3372816,
+Q3372909,
+Q7959649,
+Q7977485,
+Q7992684,
+Q3817966,
+Q3821852,
+Q3420907,
+Q3429733,
+Q774474
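To work through the list, it may help to turn it into links and drop the repeated IDs. A minimal sketch, assuming the list is pasted in as plain text and that entity pages live at `https://www.wikidata.org/wiki/<ID>`; the function name is made up.

```python
import re


def wikidata_urls(qids_text):
    """Return one Wikidata URL per distinct Q-ID, in order of first use."""
    qids = re.findall(r"Q\d+", qids_text)
    # dict.fromkeys() removes repeats while keeping the original order.
    return ["https://www.wikidata.org/wiki/" + q for q in dict.fromkeys(qids)]


print(wikidata_urls("Q1140317, Q458656, Q458656, Q470560"))
```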
+
+
As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
+
+