I've continued to track down lists of movies that are legal to
distribute on the Internet, and have identified more than 11,000 title
IDs in The Internet Movie Database (IMDB) so far. Most of them (57%)
are feature films from the USA published before 1923. I've also
tracked down more than 24,000 movies I have not yet been able to map
to an IMDB title ID, so the real number could be a lot higher.
According to the front web page for Retro Film Vault, there are 44,000
public domain films, so I guess there are still some left to identify.

The complete data set is available from a public git repository,
including the scripts used to create it. Most of the data is collected
using web scraping, for example from the "product catalog" of
companies selling copies of public domain movies, but any source I
find believable is used. I've so far had to throw out three sources
because I did not trust the public domain status of the movies listed.

Anyway, this is the summary of the 28 collected data sources so far:

 2352 entries (  66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
 2302 entries ( 120 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
  195 entries (  63 unique) with and   200 without IMDB title ID in free-movies-cinemovies.json
   89 entries (  52 unique) with and    38 without IMDB title ID in free-movies-creative-commons.json
  344 entries (  28 unique) with and   655 without IMDB title ID in free-movies-fesfilm.json
  668 entries ( 209 unique) with and  1064 without IMDB title ID in free-movies-filmchest-com.json
  830 entries (  21 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
   19 entries (  19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
 6822 entries (6669 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
  137 entries (   0 unique) with and     0 without IMDB title ID in free-movies-imdb-externlist.json
 1205 entries (  57 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
   84 entries (  20 unique) with and   167 without IMDB title ID in free-movies-infodigi-pd.json
  158 entries ( 135 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
  113 entries (   4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
  182 entries ( 100 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
  229 entries (  87 unique) with and     1 without IMDB title ID in free-movies-manual.json
   44 entries (   2 unique) with and    64 without IMDB title ID in free-movies-openflix.json
  291 entries (  33 unique) with and   474 without IMDB title ID in free-movies-profilms-pd.json
  211 entries (   7 unique) with and     0 without IMDB title ID in free-movies-publicdomainmovies-info.json
 1232 entries (  57 unique) with and  1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
   46 entries (  13 unique) with and    81 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries (  64 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
 1758 entries ( 882 unique) with and  3786 without IMDB title ID in free-movies-retrofilmvault.json
   16 entries (   0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
   63 entries (  16 unique) with and   141 without IMDB title ID in free-movies-vodo.json
11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID

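For those curious how a summary like this can be put together, here is
a minimal Python sketch. It is not the script from the git repository.
The function name summarize() and the input layout (a dictionary
mapping each source name to a set of IMDB title IDs and a count of
entries without an ID) are assumptions made for illustration, as the
layout of the JSON files is not described here. The sketch treats
"unique" as a title ID found in no other source, which fits the totals
above, where the per-source unique counts add up to 8724. It also
assumes "entries" counts distinct title IDs per source.

```python
from collections import Counter


def summarize(sources):
    """Return summary lines for a dict of data sources.

    `sources` maps a source name to a pair (ids, unmapped), where `ids`
    is the set of IMDB title IDs in that source and `unmapped` is the
    number of its entries without an IMDB title ID.  This input layout
    is hypothetical, chosen for the example.
    """
    # Count in how many sources each title ID shows up.
    seen = Counter()
    for ids, _ in sources.values():
        seen.update(set(ids))

    lines = []
    for name in sorted(sources):
        ids, unmapped = sources[name]
        ids = set(ids)
        # A title ID is "unique" when no other source lists it.
        only_here = sum(1 for title_id in ids if seen[title_id] == 1)
        lines.append(
            "%5d entries (%4d unique) with and %5d without IMDB title ID in %s"
            % (len(ids), only_here, unmapped, name)
        )

    only_one = sum(1 for count in seen.values() if count == 1)
    total_unmapped = sum(unmapped for _, unmapped in sources.values())
    lines.append(
        "%5d unique IMDB title IDs in total, %d only in one list, "
        "%d without IMDB title ID" % (len(seen), only_one, total_unmapped)
    )
    return lines
```

Calling print("\n".join(summarize(sources))) gives one line per source
followed by the totals line.
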
I keep finding more data sources. I found the cinemovies source
just a few days ago, and as you can see from the summary, it extended
my list with 63 movies. Check out the mklist-* scripts in the git
repository if you are curious how the lists are created. Many of the
titles are extracted using searches on IMDB, where I look for the
title and year, and accept search results with only one movie listed
if the year matches. This allows me to automatically use many lists
of movies without IMDB title ID references, at the cost of increasing
the risk of wrongly identifying an IMDB title ID as public domain. So
far my random manual checks have indicated that the method is solid,
but I really wish all lists of public domain movies would include a
unique movie identifier like the IMDB title ID. It would make the job
of counting movies in the public domain a lot easier.

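The acceptance rule for search results can be sketched roughly like
this in Python. It is only an illustration of the rule, not code from
the mklist-* scripts. The function name pick_title_id() and the shape
of the search results (a list of (title ID, year) pairs) are
assumptions, and the search itself is left out.

```python
def pick_title_id(wanted_year, results):
    """Return an IMDB title ID, or None if the match is not clear-cut.

    `results` is a list of (title_id, year) pairs, standing in for the
    movies listed by a title search.  This shape is hypothetical.
    """
    # Accept only searches that listed exactly one movie.
    if len(results) != 1:
        return None
    title_id, year = results[0]
    # Accept that movie only if its year matches the source list.
    if year != wanted_year:
        return None
    return title_id
```

With made-up placeholder IDs, pick_title_id(1922, [("tt-a", 1922)])
returns "tt-a", while a search listing two movies, or one movie from
the wrong year, returns None and leaves the entry without an IMDB
title ID.
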
As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.