A month ago, I blogged about my work to automatically check the
copyright status of IMDB entries, and try to count the number of
movies listed in IMDB where it is legal to distribute it the Internet.
I have continued to look for good data sources, and identified a few
more. The code used to extract information from various data sources
is available in
So far I have identified 3186 unique IMDB title IDs. To gain better understanding of the structure of the data set, I created a histogram of the year associated with each movie (typically release year). It is interesting to notice where the peaks and dips in the graph are located. I wonder why they are placed there. I suspect World Word II caused the dip around 1940, but what caused the peak around 2010?
I've so far identified ten sources for IMDB title IDs for movies in the public domain or with a free license. This is the statistics reported when running 'make stats' in the git repository:
249 entries ( 6 unique) with and 288 without IMDB title ID in free-movies-archive-org-butter.json 2301 entries ( 540 unique) with and 0 without IMDB title ID in free-movies-archive-org-wikidata.json 830 entries ( 29 unique) with and 0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json 2109 entries ( 377 unique) with and 0 without IMDB title ID in free-movies-imdb-pd.json 291 entries ( 122 unique) with and 0 without IMDB title ID in free-movies-letterboxd-pd.json 144 entries ( 135 unique) with and 0 without IMDB title ID in free-movies-manual.json 350 entries ( 1 unique) with and 801 without IMDB title ID in free-movies-publicdomainmovies.json 4 entries ( 0 unique) with and 124 without IMDB title ID in free-movies-publicdomainreview.json 698 entries ( 119 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json 8 entries ( 8 unique) with and 196 without IMDB title ID in free-movies-vodo.json 3186 unique IMDB title IDs in total
The entries without IMDB title ID are candidates to increase the data set, but might equally well be duplicates of entries already listed with IMDB title ID in one of the other sources, or represent movies that lack a IMDB title ID. I've seen examples of all these situations when peeking at the entries without IMDB title ID. Based on these data sources, the lower bound for movies listed in IMDB that are legal to distribute on the Internet is between 3186 and 4713.
It would be great for improving the accuracy of this measurement, if the various sources added IMDB title ID to their metadata. I have tried to reach the people behind the various sources to ask if they are interested in doing this, without any positive replies so far. Perhaps you can help me get in touch with the people behind VODO, Public Domain Torrents, Public Domain Movies and Public Domain Review to try to convince them to add more metadata to their movie entries?
Another way you could help is by adding pages to Wikipedia about movies that are legal to distribute on the Internet. If such page exist and include a link to both IMDB and The Internet Archive, the script used to generate free-movies-archive-org-wikidata.json should pick up the mapping as soon as wikidata is updates.