I've continued to track down list of movies that are legal to +distribute on the Internet, and identified more than 11,000 title IDs +in The Internet Movie Database so far. Most of them (57%) are feature +films from USA published before 1923. I've also tracked down more +than 24,000 movies I have not yet been able to map to IMDB title ID, +so the real number could be a lot higher. According to the front web +page for Retro Film Vault, +there are 44,000 public domain films, so I guess there are still some +left to identify.
+ +The complete data set is available from +a +public git repository, including the scripts used to create it. +Most of the data is collected using web scraping, for example from the +"product catalog" of companies selling copies of public domain movies, +but any source I find believable is used. I've so far had to throw +out three sources because I did not trust the public domain status of +the movies listed.
+ +Anyway, this is the summary of the 28 collected data sources so +far:
+ ++ 2352 entries ( 66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json + 2302 entries ( 120 unique) with and 0 without IMDB title ID in free-movies-archive-org-wikidata.json + 195 entries ( 63 unique) with and 200 without IMDB title ID in free-movies-cinemovies.json + 89 entries ( 52 unique) with and 38 without IMDB title ID in free-movies-creative-commons.json + 344 entries ( 28 unique) with and 655 without IMDB title ID in free-movies-fesfilm.json + 668 entries ( 209 unique) with and 1064 without IMDB title ID in free-movies-filmchest-com.json + 830 entries ( 21 unique) with and 0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json + 19 entries ( 19 unique) with and 0 without IMDB title ID in free-movies-imdb-c-expired-gb.json + 6822 entries ( 6669 unique) with and 0 without IMDB title ID in free-movies-imdb-c-expired-us.json + 137 entries ( 0 unique) with and 0 without IMDB title ID in free-movies-imdb-externlist.json + 1205 entries ( 57 unique) with and 0 without IMDB title ID in free-movies-imdb-pd.json + 84 entries ( 20 unique) with and 167 without IMDB title ID in free-movies-infodigi-pd.json + 158 entries ( 135 unique) with and 0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json + 113 entries ( 4 unique) with and 0 without IMDB title ID in free-movies-letterboxd-pd.json + 182 entries ( 100 unique) with and 0 without IMDB title ID in free-movies-letterboxd-silent.json + 229 entries ( 87 unique) with and 1 without IMDB title ID in free-movies-manual.json + 44 entries ( 2 unique) with and 64 without IMDB title ID in free-movies-openflix.json + 291 entries ( 33 unique) with and 474 without IMDB title ID in free-movies-profilms-pd.json + 211 entries ( 7 unique) with and 0 without IMDB title ID in free-movies-publicdomainmovies-info.json + 1232 entries ( 57 unique) with and 1875 without IMDB title ID in free-movies-publicdomainmovies-net.json + 46 entries ( 13 unique) with and 81 without IMDB title ID in free-movies-publicdomainreview.json + 698 entries ( 64 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json + 1758 entries ( 882 unique) with and 3786 without IMDB title ID in free-movies-retrofilmvault.json + 16 entries ( 0 unique) with and 0 without IMDB title ID in free-movies-thehillproductions.json + 63 entries ( 16 unique) with and 141 without IMDB title ID in free-movies-vodo.json +11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID ++ +
I keep finding more data sources. I found the cinemovies source +just a few days ago, and as you can see from the summary, it extended +my list with 63 movies. Check out the mklist-* scripts in the git +repository if you are curious how the lists are created. Many of the +titles are extracted using searches on IMDB, where I look for the +title and year, and accept search results with only one movie listed +if the year matches. This allow me to automatically use many lists of +movies without IMDB title ID references at the cost of increasing the +risk of wrongly identify a IMDB title ID as public domain. So far my +random manual checks have indicated that the method is solid, but I +really wish all lists of public domain movies would include unique +movie identifier like the IMDB title ID. It would make the job of +counting movies in the public domain a lot easier.
+