<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<title>Petter Reinholdtsen - Entries from November 2017</title>
<description>Entries from November 2017</description>
<link>http://people.skolelinux.org/pere/blog/</link>
<title>Legal to share more than 3000 movies listed on IMDB?</title>
<link>http://people.skolelinux.org/pere/blog/Legal_to_share_more_than_3000_movies_listed_on_IMDB_.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Legal_to_share_more_than_3000_movies_listed_on_IMDB_.html</guid>
<pubDate>Sat, 18 Nov 2017 21:20:00 +0100</pubDate>
<description><p>A month ago, I blogged about my work to
<a href="http://people.skolelinux.org/pere/blog/Locating_IMDB_IDs_of_movies_in_the_Internet_Archive_using_Wikidata.html">automatically
check the copyright status of IMDB entries</a>, and try to count the
number of movies listed in IMDB that are legal to distribute on the
Internet. I have continued to look for good data sources, and
identified a few more. The code used to extract information from
various data sources is available in
<a href="https://github.com/petterreinholdtsen/public-domain-free-imdb">a
git repository</a>, currently hosted on GitHub.</p>
<p>So far I have identified 3186 unique IMDB title IDs. To gain a
better understanding of the structure of the data set, I created a
histogram of the year associated with each movie (typically the release
year). It is interesting to notice where the peaks and dips in the
graph are located. I wonder why they are placed there. I suspect
World War II caused the dip around 1940, but what caused the peak
around 2010?</p>
<p align="center"><img src="http://people.skolelinux.org/pere/blog/images/2017-11-18-verk-i-det-fri-filmer.png" /></p>
<p>I've so far identified ten sources for IMDB title IDs for movies in
the public domain or with a free license. These are the statistics
reported when running 'make stats' in the git repository:</p>
<pre>
 249 entries (  6 unique) with and 288 without IMDB title ID in free-movies-archive-org-butter.json
2301 entries (540 unique) with and   0 without IMDB title ID in free-movies-archive-org-wikidata.json
 830 entries ( 29 unique) with and   0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
2109 entries (377 unique) with and   0 without IMDB title ID in free-movies-imdb-pd.json
 291 entries (122 unique) with and   0 without IMDB title ID in free-movies-letterboxd-pd.json
 144 entries (135 unique) with and   0 without IMDB title ID in free-movies-manual.json
 350 entries (  1 unique) with and 801 without IMDB title ID in free-movies-publicdomainmovies.json
   4 entries (  0 unique) with and 124 without IMDB title ID in free-movies-publicdomainreview.json
 698 entries (119 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json
   8 entries (  8 unique) with and 196 without IMDB title ID in free-movies-vodo.json
3186 unique IMDB title IDs in total
</pre>
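<p>As a rough illustration of how per-source figures like these could be
computed, here is a minimal Python sketch. It is not the code from the git
repository. It assumes that "unique" means an IMDB title ID found in only
one source, and the function name, the source names and the title IDs in
the sample are all made up for the example.</p>

```python
from collections import Counter


def summarize_sources(sources):
    """Return (per-source stats, number of distinct IDs overall).

    `sources` maps a source name to the list of IMDB title IDs it
    provides. An ID counts as unique to a source when no other source
    lists it (an assumption about what 'unique' means in the report).
    """
    # Count in how many distinct sources each title ID appears.
    seen_in = Counter()
    for ids in sources.values():
        seen_in.update(set(ids))

    stats = {}
    for name, ids in sources.items():
        distinct = set(ids)
        only_here = sum(1 for title_id in distinct if seen_in[title_id] == 1)
        stats[name] = (len(distinct), only_here)
    return stats, len(seen_in)


# Hypothetical sample data, not taken from the real JSON files.
example = {
    "source-a.json": ["tt0000001", "tt0000002", "tt0000003"],
    "source-b.json": ["tt0000002", "tt0000004"],
    "source-c.json": ["tt0000003"],
}

stats, total = summarize_sources(example)
for name, (entries, unique) in sorted(stats.items()):
    print(f"{entries} entries ({unique} unique) in {name}")
print(f"{total} unique IMDB title IDs in total")
```

<p>Under this reading, the per-source unique counts do not need to add up to
the total, because a title ID shared by several sources is counted in the
total but is unique to none of them.</p>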
<p>The entries without IMDB title ID are candidates to increase the
data set, but might equally well be duplicates of entries already
listed with IMDB title ID in one of the other sources, or represent
movies that lack an IMDB title ID. I've seen examples of all these
situations when peeking at the entries without IMDB title ID. Based
on these data sources, the lower bound for the number of movies listed
in IMDB that are legal to distribute on the Internet is somewhere
between 3186 and 4713.</p>
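<p>One way to arrive at these two figures, sketched in Python: the lower
figure is the number of unique IMDB title IDs already found, and the upper
figure adds every entry without an IMDB title ID, as if none of them were
duplicates and all of them were listed in IMDB. This reading is inferred
from the numbers in the 'make stats' output, and the variable names are
only illustrative.</p>

```python
# Entries without IMDB title ID per source, copied from the
# 'make stats' output quoted earlier in this entry.
without_id = {
    "free-movies-archive-org-butter.json": 288,
    "free-movies-archive-org-wikidata.json": 0,
    "free-movies-icheckmovies-archive-mochard.json": 0,
    "free-movies-imdb-pd.json": 0,
    "free-movies-letterboxd-pd.json": 0,
    "free-movies-manual.json": 0,
    "free-movies-publicdomainmovies.json": 801,
    "free-movies-publicdomainreview.json": 124,
    "free-movies-publicdomaintorrents.json": 118,
    "free-movies-vodo.json": 196,
}
unique_with_id = 3186

# Lower figure: only the title IDs already identified.
lower = unique_with_id
# Upper figure: assume every entry without an ID is a new IMDB movie.
upper = unique_with_id + sum(without_id.values())

print(lower, upper)  # prints "3186 4713"
```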
<p>It would greatly improve the accuracy of this measurement if the
various sources added IMDB title IDs to their metadata. I have
tried to reach the people behind the various sources to ask if they
are interested in doing this, without any replies so far. Perhaps you
can help me get in touch with the people behind VODO, Public Domain
Torrents, Public Domain Movies and Public Domain Review to try to
convince them to add more metadata to their movie entries?</p>
<p>Another way you could help is by adding pages to Wikipedia about
movies that are legal to distribute on the Internet. If such a page
exists and includes links to both IMDB and The Internet Archive, the
script used to generate free-movies-archive-org-wikidata.json should
pick up the mapping as soon as Wikidata is updated.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
<title>Some notes on fault tolerant storage systems</title>
<link>http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</guid>
<pubDate>Wed, 1 Nov 2017 15:35:00 +0100</pubDate>
<description><p>If you care about how fault tolerant your storage is, you might
find these articles and papers interesting. They have shaped how I
think when designing a storage system.</p>
<li>USENIX ;login: <a
href="https://www.usenix.org/publications/login/summer2017/ganesan">Redundancy
Does Not Imply Fault Tolerance. Analysis of Distributed Storage
Reactions to Single Errors and Corruptions</a> by Aishwarya Ganesan,
Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi
H. Arpaci-Dusseau</li>
<li><a href="http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/">Why
RAID 5 stops working in 2009</a> by Robin Harris</li>
<li><a href="http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/">Why
RAID 6 stops working in 2019</a> by Robin Harris</li>
<li>USENIX FAST'07
<a href="http://research.google.com/archive/disk_failures.pdf">Failure
Trends in a Large Disk Drive Population</a> by Eduardo Pinheiro,
Wolf-Dietrich Weber and Luiz André Barroso</li>
<li>USENIX ;login: <a
href="https://www.usenix.org/system/files/login/articles/hughes12-04.pdf">Data
Integrity. Finding Truth in a World of Guesses and Lies</a> by Doug</li>
<li>USENIX FAST'08
<a href="https://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/">An
Analysis of Data Corruption in the Storage Stack</a> by
L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C.
Arpaci-Dusseau, and R. H. Arpaci-Dusseau</li>
<li>USENIX FAST'07 <a
href="https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/">Disk
failures in the real world: what does an MTTF of 1,000,000 hours mean
to you?</a> by B. Schroeder and G. A. Gibson.</li>
<li>USENIX ;login: <a
href="https://www.usenix.org/events/fast08/tech/full_papers/jiang/jiang_html/">Are
Disks the Dominant Contributor for Storage Failures? A Comprehensive
Study of Storage Subsystem Failure Characteristics</a> by Weihang
Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky</li>
<li>SIGMETRICS 2007
<a href="http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf">An
analysis of latent sector errors in disk drives</a> by
L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler</li>
<p>Several of these research papers are based on data collected from
hundreds of thousands or millions of disks, and their findings are
eye-opening. The short story is simply: do not implicitly trust RAID or
redundant storage systems. Details matter. And unfortunately there
are few options on Linux addressing all the identified issues. Both
ZFS and Btrfs are doing a fairly good job, but have legal and
practical issues of their own. I wonder how cluster file systems like
Ceph do in this regard. After all, there is an old saying, you know
you have a distributed system when the crash of a computer you have
never heard of stops you from getting any work done. The same holds
true if fault tolerance does not work.</p>
<p>Just remember, in the end, it does not matter how redundant, or how
fault tolerant your storage is, if you do not continuously monitor its
status to detect and replace failed disks.</p>
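<p>As one small example of what such monitoring could look like, here is a
minimal Python sketch that looks for degraded arrays in the text of
/proc/mdstat. It assumes Linux software RAID (md), which is only one of
many possible setups and is not something the articles above prescribe.
The function name and the sample text are made up for the illustration,
and a real setup would also want to alert someone and to cover ZFS, Btrfs
or hardware RAID with their own tools.</p>

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member.

    In /proc/mdstat, each array has a status like [UU] where 'U' is a
    working member and '_' is a failed or missing one.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            # Start of a new array section, for example "md0 : active ...".
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
            current = None
    return degraded


# Hypothetical sample in the format of /proc/mdstat.
sample = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      1953382400 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(sample))  # prints "['md1']"
```

<p>On a machine with md arrays, the same function could be fed the contents
of /proc/mdstat from a periodic job, with a non-empty result treated as a
reason to go and look at the disks.</p>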
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>