+ <div class="entry">
+ <div class="title">
+ <a href="http://people.skolelinux.org/pere/blog/Metadata_proposal_for_movies_on_the_Internet_Archive.html">Metadata proposal for movies on the Internet Archive</a>
+ </div>
+ <div class="date">
+ 28th November 2017
+ </div>
+ <div class="body">
+ <p>It would be easier to locate the movie you want to watch in
+<a href="https://www.archive.org/">the Internet Archive</a>, if the
+metadata about each movie was more complete and accurate. In the
+archiving community, a well-known saying states that good metadata is a
+love letter to the future. The metadata in the Internet Archive could
+use a face lift for the future to love us back. Here is a proposal
+for a small improvement that would make the metadata more useful
+today. I've been unable to find any document describing the various
+standard fields available when uploading videos to the archive, so
+this proposal is based on my best guess and searching through several
+of the existing movies.</p>
+
+<p>I have a few use cases in mind. First of all, I would like to be
+able to count the number of distinct movies in the Internet Archive,
+without duplicates. I would further like to identify the IMDB title
+ID of the movies in the Internet Archive, to be able to look up an IMDB
+title ID and know if I can fetch the video from there and share it
+with my friends.</p>
+
+<p>Second, I would like the Butter data provider for the Internet
+Archive
+(<a href="https://github.com/butterproviders/butter-provider-archive">available
+from GitHub</a>), to list as many of the good movies as possible. The
+plugin currently does a search in the archive with the following
+parameters:</p>
+
+<p><pre>
+collection:moviesandfilms
+AND NOT collection:movie_trailers
+AND -mediatype:collection
+AND format:"Archive BitTorrent"
+AND year
+</pre></p>
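+<p>To try the same search outside Butter, the parameters above can be
+turned into a request URL. This is only a minimal sketch, assuming the
+public advancedsearch.php endpoint on archive.org is used. The function
+name, the returned fields and the page size are my own illustrative
+choices and are not taken from the Butter plugin.</p>

```python
from urllib.parse import urlencode

# Query terms copied from the search parameters listed above.
QUERY_TERMS = [
    "collection:moviesandfilms",
    "NOT collection:movie_trailers",
    "-mediatype:collection",
    'format:"Archive BitTorrent"',
    "year",
]


def build_search_url(rows=50, page=1):
    """Return an advancedsearch.php URL for the movie query (hypothetical helper)."""
    params = [
        ("q", " AND ".join(QUERY_TERMS)),
        # Fields to return for each hit; an illustrative selection.
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("fl[]", "year"),
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)
```

+<p>Calling build_search_url(rows=10) returns a URL that can be fetched
+with any HTTP client. No network request is made by the sketch itself.</p>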
+
+<p>Most of the cool movies that fail to show up in Butter do so
+because the 'year' field is missing. The 'year' field is populated by
+the year part from the 'date' field, and should be when the movie was
+released (date or year). Two such examples are
+<a href="https://archive.org/details/SidneyOlcottsBen-hur1905">Ben Hur
+from 1905</a> and
+<a href="https://archive.org/details/Caminandes2GranDillama">Caminandes
+2: Gran Dillama from 2013</a>, where the year metadata field is
+missing.</p>
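+<p>The relationship between 'date' and 'year' can be illustrated with
+a small sketch. It assumes the date value starts with a four digit
+year (either a bare year or an ISO style date). The exact rule used by
+the Internet Archive is not documented here, so the function below is
+a hypothetical stand-in, not the archive's implementation.</p>

```python
import re


def year_from_date(date):
    """Return the leading four digit year of a date string, or None.

    Hypothetical helper: accepts values like "1905" or "2013-11-22".
    """
    if not date:
        # A missing or empty 'date' gives no 'year', which is the case
        # that hides a movie from a search requiring 'year'.
        return None
    match = re.match(r"\s*(\d{4})(?:\D|$)", date)
    return int(match.group(1)) if match else None
```

+<p>An entry with no 'date' yields no 'year', and would therefore be
+dropped by a query that requires the 'year' field.</p>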
+
+<p>So, my proposal is simply this: for every movie in The Internet Archive
+where an IMDB title ID exists, please fill in these metadata fields
+(note that they can also be updated long after the video was uploaded, but
+as far as I can tell, only by the uploader):</p>
+
+<dl>
+
+<dt>mediatype</dt>
+<dd>Should be 'movie' for movies.</dd>
+
+<dt>collection</dt>
+<dd>Should contain 'moviesandfilms'.</dd>
+
+<dt>title</dt>
+<dd>The title of the movie, without the publication year.</dd>
+
+<dt>date</dt>
+<dd>The date or year the movie was released. This makes the movie show
+up in Butter, makes it possible to know the age of the
+movie, and is useful to figure out copyright status.</dd>
+
+<dt>director</dt>
+<dd>The director of the movie. This makes it easier to know if the
+correct movie is found in movie databases.</dd>
+
+<dt>publisher</dt>
+<dd>The production company making the movie. Also useful for
+identifying the correct movie.</dd>
+
+<dt>links</dt>
+
+<dd>Add a link to the IMDB title page, for example like this: <a
+href="http://www.imdb.com/title/tt0028496/">Movie in
+IMDB</a>. This makes it easier to find duplicates and allows for
+counting the number of unique movies in the Archive. Other external
+references, like to TMDB, could be added like this too.</dd>
+
+</dl>
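+<p>A minimal sketch of how the fields above could be checked, and how
+the IMDB title ID could be pulled out of the 'links' text to count
+unique movies. It assumes each item's metadata is available as a plain
+dictionary of strings. The helper names and the sample values are
+hypothetical and only there for illustration.</p>

```python
import re

# Fields from the proposal above that should be present and non-empty.
PROPOSED_FIELDS = ("mediatype", "collection", "title", "date",
                   "director", "publisher", "links")

# Matches the title ID in a link such as http://www.imdb.com/title/tt0028496/
IMDB_TITLE_RE = re.compile(r"imdb\.com/title/(tt\d+)")


def missing_fields(metadata):
    """Return the proposed fields that are absent or empty."""
    return [name for name in PROPOSED_FIELDS if not metadata.get(name)]


def imdb_title_id(metadata):
    """Return the IMDB title ID found in the 'links' text, or None."""
    match = IMDB_TITLE_RE.search(metadata.get("links") or "")
    return match.group(1) if match else None


def count_unique_movies(items):
    """Count distinct IMDB title IDs across a list of metadata dicts."""
    ids = {imdb_title_id(item) for item in items}
    ids.discard(None)  # items without an IMDB link are not counted
    return len(ids)


# Made-up sample item; only the IMDB URL is taken from the text above.
example = {
    "mediatype": "movie",
    "collection": "moviesandfilms",
    "title": "Example title",
    "links": '<a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>',
}
```

+<p>For the sample item, missing_fields(example) reports 'date',
+'director' and 'publisher', while imdb_title_id(example) finds
+tt0028496. Two uploads linking to the same title page are counted
+once by count_unique_movies.</p>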
+
+<p>I did consider proposing a custom field for the IMDB title ID (for
+example 'imdb_title_url', 'imdb_code' or simply 'imdb'), but suspect it
+will be easier to simply place it in the links free text field.</p>
+
+<p>I created
+<a href="https://github.com/petterreinholdtsen/public-domain-free-imdb">a
+list of IMDB title IDs for several thousand movies in the Internet
+Archive</a>, but I also got a list of several thousand movies without
+such IMDB title ID (and quite a few duplicates). It would be great if
+this data set could be integrated into the Internet Archive metadata
+to be available for everyone in the future, but with the current
+policy of leaving metadata editing to the uploaders, it will take a
+while before this happens. If you have uploaded movies into the
+Internet Archive, you can help. Please consider following my proposal
+above for your movies, to ensure that each movie is properly
+counted. :)</p>
+
+<p>The list is mostly generated using Wikidata, which, based on
+Wikipedia articles, makes it possible to link between IMDB and movies in
+the Internet Archive. But there are lots of movies without a
+Wikipedia article, and some movies where only a collection page exists
+(like for <a href="https://en.wikipedia.org/wiki/Caminandes">the
+Caminandes example above</a>, where there are three movies but only
+one Wikidata entry).</p>
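+<p>The kind of Wikidata lookup described above can be sketched as a
+SPARQL query asking for entries that carry both identifiers. This is
+an assumption about how such a link could be made, not necessarily the
+query used to build the list. To my knowledge P345 is the Wikidata
+property for the IMDb ID and P724 the one for the Internet Archive ID.
+The function name is hypothetical.</p>

```python
from urllib.parse import urlencode

# Entries having both an IMDb ID and an Internet Archive ID.
SPARQL_QUERY = """
SELECT ?item ?imdb ?archive WHERE {
  ?item wdt:P345 ?imdb .      # P345: IMDb ID (assumed property)
  ?item wdt:P724 ?archive .   # P724: Internet Archive ID (assumed property)
}
"""


def build_wikidata_url(query=SPARQL_QUERY):
    """Return a Wikidata Query Service URL for the query (hypothetical helper)."""
    params = {"query": query, "format": "json"}
    return "https://query.wikidata.org/sparql?" + urlencode(params)
```

+<p>A collection page with a single Wikidata entry, like the Caminandes
+one, would give at most one row per identifier pair, which is why such
+movies are hard to map individually this way.</p>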
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+
+ </div>
+ <div class="tags">
+
+
+ Tags: <a href="http://people.skolelinux.org/pere/blog/tags/english">english</a>, <a href="http://people.skolelinux.org/pere/blog/tags/opphavsrett">opphavsrett</a>, <a href="http://people.skolelinux.org/pere/blog/tags/verkidetfri">verkidetfri</a>.
+
+
+ </div>
+ </div>
+ <div class="padding"></div>
+
+ <div class="entry">
+ <div class="title">
+ <a href="http://people.skolelinux.org/pere/blog/Legal_to_share_more_than_3000_movies_listed_on_IMDB_.html">Legal to share more than 3000 movies listed on IMDB?</a>
+ </div>
+ <div class="date">
+ 18th November 2017
+ </div>
+ <div class="body">
+ <p>A month ago, I blogged about my work to
+<a href="http://people.skolelinux.org/pere/blog/Locating_IMDB_IDs_of_movies_in_the_Internet_Archive_using_Wikidata.html">automatically
+check the copyright status of IMDB entries</a>, and try to count the
+number of movies listed in IMDB that are legal to distribute on the
+Internet. I have continued to look for good data sources, and
+identified a few more. The code used to extract information from
+various data sources is available in
+<a href="https://github.com/petterreinholdtsen/public-domain-free-imdb">a
+git repository</a>, currently available from GitHub.</p>
+
+<p>So far I have identified 3186 unique IMDB title IDs. To gain
+better understanding of the structure of the data set, I created a
+histogram of the year associated with each movie (typically release
+year). It is interesting to notice where the peaks and dips in the
+graph are located. I wonder why they are placed there. I suspect
+World War II caused the dip around 1940, but what caused the peak
+around 2010?</p>
+
+<p align="center"><img src="http://people.skolelinux.org/pere/blog/images/2017-11-18-verk-i-det-fri-filmer.png" /></p>
+
+<p>I've so far identified ten sources for IMDB title IDs for movies in
+the public domain or with a free license. These are the statistics
+reported when running 'make stats' in the git repository:</p>
+
+<pre>
+ 249 entries ( 6 unique) with and 288 without IMDB title ID in free-movies-archive-org-butter.json
+ 2301 entries ( 540 unique) with and 0 without IMDB title ID in free-movies-archive-org-wikidata.json
+ 830 entries ( 29 unique) with and 0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
+ 2109 entries ( 377 unique) with and 0 without IMDB title ID in free-movies-imdb-pd.json
+ 291 entries ( 122 unique) with and 0 without IMDB title ID in free-movies-letterboxd-pd.json
+ 144 entries ( 135 unique) with and 0 without IMDB title ID in free-movies-manual.json
+ 350 entries ( 1 unique) with and 801 without IMDB title ID in free-movies-publicdomainmovies.json
+ 4 entries ( 0 unique) with and 124 without IMDB title ID in free-movies-publicdomainreview.json
+ 698 entries ( 119 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json
+ 8 entries ( 8 unique) with and 196 without IMDB title ID in free-movies-vodo.json
+ 3186 unique IMDB title IDs in total
+</pre>
+
+<p>The entries without IMDB title ID are candidates to increase the
+data set, but might equally well be duplicates of entries already
+listed with IMDB title ID in one of the other sources, or represent
+movies that lack an IMDB title ID. I've seen examples of all these
+situations when peeking at the entries without IMDB title ID. Based
+on these data sources, the lower bound for movies listed in IMDB that
+are legal to distribute on the Internet is somewhere between 3186 and 4713.</p>
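+<p>The two numbers follow from the 'make stats' output above: 3186 is
+the count of unique IMDB title IDs, and 4713 is what you get if every
+entry without an IMDB title ID turns out to be a distinct movie. A
+small sketch of that arithmetic, with the per-source counts copied
+from the listing and a hypothetical function name:</p>

```python
# Unique IMDB title IDs reported by 'make stats'.
UNIQUE_WITH_ID = 3186

# Entries without IMDB title ID, per source, copied from the listing above.
# Sources reporting 0 such entries are left out.
WITHOUT_ID = {
    "free-movies-archive-org-butter.json": 288,
    "free-movies-publicdomainmovies.json": 801,
    "free-movies-publicdomainreview.json": 124,
    "free-movies-publicdomaintorrents.json": 118,
    "free-movies-vodo.json": 196,
}


def estimate_range(unique_with_id=UNIQUE_WITH_ID, without_id=WITHOUT_ID):
    """Return (low, high): low counts only known IDs, high adds every
    entry without an ID as if it were a new, distinct movie."""
    low = unique_with_id
    high = unique_with_id + sum(without_id.values())
    return low, high
```

+<p>The 1527 entries without IMDB title ID account for the whole gap
+between the two figures.</p>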
+
+<p>It would greatly improve the accuracy of this measurement
+if the various sources added IMDB title IDs to their metadata. I have
+tried to reach the people behind the various sources to ask if they
+are interested in doing this, without any replies so far. Perhaps you
+can help me get in touch with the people behind VODO, Public Domain
+Torrents, Public Domain Movies and Public Domain Review to try to
+convince them to add more metadata to their movie entries?</p>
+
+<p>Another way you could help is by adding pages to Wikipedia about
+movies that are legal to distribute on the Internet. If such a page
+exists and includes a link to both IMDB and The Internet Archive, the
+script used to generate free-movies-archive-org-wikidata.json should
+pick up the mapping as soon as Wikidata is updated.</p>
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+
+ </div>
+ <div class="tags">
+
+
+ Tags: <a href="http://people.skolelinux.org/pere/blog/tags/english">english</a>, <a href="http://people.skolelinux.org/pere/blog/tags/opphavsrett">opphavsrett</a>, <a href="http://people.skolelinux.org/pere/blog/tags/verkidetfri">verkidetfri</a>.
+
+
+ </div>
+ </div>
+ <div class="padding"></div>
+