Recently, I needed to automatically check the copyright status of a
-set of The Internet Movie database
-(IMDB) entries, to figure out which one of the movies they refer
-to can be freely distributed on the Internet. This proved to be
-harder than it sounds. IMDB for sure list movies without any
-copyright protection, where the copyright protection has expired or
-where the movie is lisenced using a permissive license like one from
-Creative Commons. These are mixed with copyright protected movies,
-and there seem to be no way to separate these classes of movies using
-the information in IMDB.
-
-
First I tried to look up entries manually in IMDB,
-Wikipedia and
-The Internet Archive, to get a
-feel how to do this. It is hard to know for sure using these sources,
-but it should be possible to be reasonable confident a movie is "out
-of copyright" with a few hours work per movie. As I needed to check
-almost 20,000 entries, this approach was not sustainable. I simply
-can not work around the clock for about 6 years to check this data
-set.
-
-
I asked the people behind The Internet Archive if they could
-introduce a new metadata field in their metadata XML for IMDB ID, but
-was told that they leave it completely to the uploaders to update the
-metadata. Some of the metadata entries had IMDB links in the
-description, but I found no way to download all metadata files in bulk
-to locate those ones and put that approach aside.
-
-
In the process I noticed several Wikipedia articles about movies
-had links to both IMDB and The Internet Archive, and it occured to me
-that I could use the Wikipedia RDF data set to locate entries with
-both, to at least get a lower bound on the number of movies on The
-Internet Archive with a IMDB ID. This is useful based on the
-assumption that movies distributed by The Internet Archive can be
-legally distributed on the Internet. With some help from the RDF
-community (thank you DanC), I was able to come up with this query to
-pass to the SPARQL interface on
-Wikidata:
-
-
-SELECT ?work ?imdb ?ia ?when ?label
-WHERE
-{
- ?work wdt:P31/wdt:P279* wd:Q11424.
- ?work wdt:P345 ?imdb.
- ?work wdt:P724 ?ia.
- OPTIONAL {
- ?work wdt:P577 ?when.
- ?work rdfs:label ?label.
- FILTER(LANG(?label) = "en").
- }
-}
-
-
-
If I understand the query right, for every film entry anywhere in
-Wikpedia, it will return the IMDB ID and The Internet Archive ID, and
-when the movie was released and its English title, if either or both
-of the latter two are available. At the moment the result set contain
-2338 entries. Of course, it depend on volunteers including both
-correct IMDB and The Internet Archive IDs in the wikipedia articles
-for the movie. It should be noted that the result will include
-duplicates if the movie have entries in several languages. There are
-some bogus entries, either because The Internet Archive ID contain a
-typo or because the movie is not available from The Internet Archive.
-I did not verify the IMDB IDs, as I am unsure how to do that
-automatically.
-
-
I wrote a small python script to extract the data set from Wikidata
-and check if the XML metadata for the movie is available from The
-Internet Archive, and after around 1.5 hour it produced a list of 2097
-free movies and their IMDB ID. In total, 171 entries in Wikidata lack
-the refered Internet Archive entry. I assume the 70 "disappearing"
-entries (ie 2338-2097-171) are duplicate entries.
-
-
This is not too bad, given that The Internet Archive report to
-contain 5331
-feature films at the moment, but it also mean more than 3000
-movies are missing on Wikipedia or are missing the pair of references
-on Wikipedia.
-
-
I was curious about the distribution by release year, and made a
-little graph to show how the amount of free movies is spread over the
-years:
-
-

-
-
I expect the relative distribution of the remaining 3000 movies to
-be similar.
-
-
If you want to help, and want to ensure Wikipedia can be used to
-cross reference The Internet Archive and The Internet Movie Database,
-please make sure entries like this are listed under the "External
-links" heading on the Wikipedia article for the movie:
-
-
-* {{Internet Archive film|id=FightingLady}}
-* {{IMDb title|id=0036823|title=The Fighting Lady}}
-
-
-
Please verify the links on the final page, to make sure you did not
-introduce a typo.
-
-
Here is the complete list, if you want to correct the 171
-identified Wikipedia entries with broken links to The Internet
-Archive: Q1140317,
-Q458656,
-Q458656,
-Q470560,
-Q743340,
-Q822580,
-Q480696,
-Q128761,
-Q1307059,
-Q1335091,
-Q1537166,
-Q1438334,
-Q1479751,
-Q1497200,
-Q1498122,
-Q865973,
-Q834269,
-Q841781,
-Q841781,
-Q1548193,
-Q499031,
-Q1564769,
-Q1585239,
-Q1585569,
-Q1624236,
-Q4796595,
-Q4853469,
-Q4873046,
-Q915016,
-Q4660396,
-Q4677708,
-Q4738449,
-Q4756096,
-Q4766785,
-Q880357,
-Q882066,
-Q882066,
-Q204191,
-Q204191,
-Q1194170,
-Q940014,
-Q946863,
-Q172837,
-Q573077,
-Q1219005,
-Q1219599,
-Q1643798,
-Q1656352,
-Q1659549,
-Q1660007,
-Q1698154,
-Q1737980,
-Q1877284,
-Q1199354,
-Q1199354,
-Q1199451,
-Q1211871,
-Q1212179,
-Q1238382,
-Q4906454,
-Q320219,
-Q1148649,
-Q645094,
-Q5050350,
-Q5166548,
-Q2677926,
-Q2698139,
-Q2707305,
-Q2740725,
-Q2024780,
-Q2117418,
-Q2138984,
-Q1127992,
-Q1058087,
-Q1070484,
-Q1080080,
-Q1090813,
-Q1251918,
-Q1254110,
-Q1257070,
-Q1257079,
-Q1197410,
-Q1198423,
-Q706951,
-Q723239,
-Q2079261,
-Q1171364,
-Q617858,
-Q5166611,
-Q5166611,
-Q324513,
-Q374172,
-Q7533269,
-Q970386,
-Q976849,
-Q7458614,
-Q5347416,
-Q5460005,
-Q5463392,
-Q3038555,
-Q5288458,
-Q2346516,
-Q5183645,
-Q5185497,
-Q5216127,
-Q5223127,
-Q5261159,
-Q1300759,
-Q5521241,
-Q7733434,
-Q7736264,
-Q7737032,
-Q7882671,
-Q7719427,
-Q7719444,
-Q7722575,
-Q2629763,
-Q2640346,
-Q2649671,
-Q7703851,
-Q7747041,
-Q6544949,
-Q6672759,
-Q2445896,
-Q12124891,
-Q3127044,
-Q2511262,
-Q2517672,
-Q2543165,
-Q426628,
-Q426628,
-Q12126890,
-Q13359969,
-Q13359969,
-Q2294295,
-Q2294295,
-Q2559509,
-Q2559912,
-Q7760469,
-Q6703974,
-Q4744,
-Q7766962,
-Q7768516,
-Q7769205,
-Q7769988,
-Q2946945,
-Q3212086,
-Q3212086,
-Q18218448,
-Q18218448,
-Q18218448,
-Q6909175,
-Q7405709,
-Q7416149,
-Q7239952,
-Q7317332,
-Q7783674,
-Q7783704,
-Q7857590,
-Q3372526,
-Q3372642,
-Q3372816,
-Q3372909,
-Q7959649,
-Q7977485,
-Q7992684,
-Q3817966,
-Q3821852,
-Q3420907,
-Q3429733,
-Q774474
-
-
As usual, if you use Bitcoin and want to show your support of my
-activities, please send Bitcoin donations to my address
-15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
-