1 <?xml version="1.0" encoding="ISO-8859-1"?>
2 <rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
3 <channel>
4 <title>Petter Reinholdtsen - Entries from November 2017</title>
5 <description>Entries from November 2017</description>
6 <link>http://people.skolelinux.org/pere/blog/</link>
7
8
9 <item>
10 <title>Legal to share more than 3000 movies listed on IMDB?</title>
11 <link>http://people.skolelinux.org/pere/blog/Legal_to_share_more_than_3000_movies_listed_on_IMDB_.html</link>
12 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Legal_to_share_more_than_3000_movies_listed_on_IMDB_.html</guid>
13 <pubDate>Sat, 18 Nov 2017 21:20:00 +0100</pubDate>
14 <description>&lt;p&gt;A month ago, I blogged about my work to automatically check the
15 copyright status of IMDB entries, and try to count the number of
16 movies listed in IMDB that are legal to distribute on the Internet.
17 I have continued to look for good data sources, and identified a few
18 more. The code used to extract information from various data sources
19 is available in
20 &lt;a href=&quot;https://github.com/petterreinholdtsen/public-domain-free-imdb&quot;&gt;a
21 git repository&lt;/a&gt;, currently available from GitHub.&lt;/p&gt;
22
23 &lt;p&gt;So far I have identified 3186 unique IMDB title IDs. To gain
24 better understanding of the structure of the data set, I created a
25 histogram of the year associated with each movie (typically release
26 year). It is interesting to notice where the peaks and dips in the
27 graph are located. I wonder why they are placed there. I suspect
28 World War II caused the dip around 1940, but what caused the peak
29 around 2010?&lt;/p&gt;
30
31 &lt;p&gt;&lt;img src=&quot;http://people.skolelinux.org/pere/blog/images/2017-11-18-verk-i-det-fri-filmer.png&quot; /&gt;&lt;/p&gt;
32
33 &lt;p&gt;I&#39;ve so far identified ten sources for IMDB title IDs for movies in
34 the public domain or with a free license. These are the statistics
35 reported when running &#39;make stats&#39; in the git repository:&lt;/p&gt;
36
37 &lt;pre&gt;
38 249 entries ( 6 unique) with and 288 without IMDB title ID in free-movies-archive-org-butter.json
39 2301 entries ( 540 unique) with and 0 without IMDB title ID in free-movies-archive-org-wikidata.json
40 830 entries ( 29 unique) with and 0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
41 2109 entries ( 377 unique) with and 0 without IMDB title ID in free-movies-imdb-pd.json
42 291 entries ( 122 unique) with and 0 without IMDB title ID in free-movies-letterboxd-pd.json
43 144 entries ( 135 unique) with and 0 without IMDB title ID in free-movies-manual.json
44 350 entries ( 1 unique) with and 801 without IMDB title ID in free-movies-publicdomainmovies.json
45 4 entries ( 0 unique) with and 124 without IMDB title ID in free-movies-publicdomainreview.json
46 698 entries ( 119 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json
47 8 entries ( 8 unique) with and 196 without IMDB title ID in free-movies-vodo.json
48 3186 unique IMDB title IDs in total
49 &lt;/pre&gt;
50
51 &lt;p&gt;The entries without IMDB title ID are candidates to increase the
52 data set, but might equally well be duplicates of entries already
53 listed with IMDB title ID in one of the other sources, or represent
54 movies that lack an IMDB title ID. I&#39;ve seen examples of all these
55 situations when peeking at the entries without IMDB title ID. Based
56 on these data sources, the lower bound for movies listed in IMDB that
57 are legal to distribute on the Internet is between 3186 and 4713.&lt;/p&gt;
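
&lt;p&gt;The range can be reproduced from the statistics listed earlier. Here
is a minimal Python sketch, assuming every entry without an IMDB title ID
is either a duplicate (the low end) or a movie not yet counted (the high
end). The names unique_with_id, without_id and bounds are made up for
this illustration and are not part of the git repository.&lt;/p&gt;

```python
# Total number of unique IMDB title IDs reported by 'make stats'.
unique_with_id = 3186

# Entries lacking an IMDB title ID, per source file (from 'make stats').
without_id = {
    "free-movies-archive-org-butter.json": 288,
    "free-movies-archive-org-wikidata.json": 0,
    "free-movies-icheckmovies-archive-mochard.json": 0,
    "free-movies-imdb-pd.json": 0,
    "free-movies-letterboxd-pd.json": 0,
    "free-movies-manual.json": 0,
    "free-movies-publicdomainmovies.json": 801,
    "free-movies-publicdomainreview.json": 124,
    "free-movies-publicdomaintorrents.json": 118,
    "free-movies-vodo.json": 196,
}


def bounds(unique_ids, missing):
    """Return (low, high) for the estimate.

    low:  every entry without an ID duplicates an already counted movie.
    high: every entry without an ID is a distinct, uncounted movie
          (an assumption; some may lack an IMDB entry altogether).
    """
    return unique_ids, unique_ids + sum(missing.values())


low, high = bounds(unique_with_id, without_id)
print(low, high)
```

&lt;p&gt;The high end is an optimistic figure, as the same movie may appear
without an ID in several of the sources.&lt;/p&gt;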
58
59 &lt;p&gt;The accuracy of this measurement would improve if the various
60 sources added IMDB title IDs to their metadata. I have
61 tried to reach the people behind the various sources to ask if they
62 are interested in doing this, without any positive replies so far.
63 Perhaps you can help me get in touch with the people behind VODO,
64 Public Domain Torrents, Public Domain Movies and Public Domain Review
65 to try to convince them to add more metadata to their movie entries?&lt;/p&gt;
66
67 &lt;p&gt;Another way you could help is by adding pages to Wikipedia about
68 movies that are legal to distribute on the Internet. If such a page
69 exists and includes links to both IMDB and The Internet Archive, the
70 script used to generate free-movies-archive-org-wikidata.json should
71 pick up the mapping as soon as Wikidata is updated.&lt;/p&gt;
72 </description>
73 </item>
74
75 <item>
76 <title>Some notes on fault tolerant storage systems</title>
77 <link>http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</link>
78 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</guid>
79 <pubDate>Wed, 1 Nov 2017 15:35:00 +0100</pubDate>
80 <description>&lt;p&gt;If you care about how fault tolerant your storage is, you might
81 find these articles and papers interesting. They have shaped how I
82 think when designing a storage system.&lt;/p&gt;
83
84 &lt;ul&gt;
85
86 &lt;li&gt;USENIX ;login: &lt;a
87 href=&quot;https://www.usenix.org/publications/login/summer2017/ganesan&quot;&gt;Redundancy
88 Does Not Imply Fault Tolerance. Analysis of Distributed Storage
89 Reactions to Single Errors and Corruptions&lt;/a&gt; by Aishwarya Ganesan,
90 Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi
91 H. Arpaci-Dusseau&lt;/li&gt;
92
93 &lt;li&gt;ZDNet
94 &lt;a href=&quot;http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/&quot;&gt;Why
95 RAID 5 stops working in 2009&lt;/a&gt; by Robin Harris&lt;/li&gt;
96
97 &lt;li&gt;ZDNet
98 &lt;a href=&quot;http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/&quot;&gt;Why
99 RAID 6 stops working in 2019&lt;/a&gt; by Robin Harris&lt;/li&gt;
100
101 &lt;li&gt;USENIX FAST&#39;07
102 &lt;a href=&quot;http://research.google.com/archive/disk_failures.pdf&quot;&gt;Failure
103 Trends in a Large Disk Drive Population&lt;/a&gt; by Eduardo Pinheiro,
104 Wolf-Dietrich Weber and Luiz André Barroso&lt;/li&gt;
105
106 &lt;li&gt;USENIX ;login: &lt;a
107 href=&quot;https://www.usenix.org/system/files/login/articles/hughes12-04.pdf&quot;&gt;Data
108 Integrity. Finding Truth in a World of Guesses and Lies&lt;/a&gt; by Doug
109 Hughes&lt;/li&gt;
110
111 &lt;li&gt;USENIX FAST&#39;08
112 &lt;a href=&quot;https://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/&quot;&gt;An
113 Analysis of Data Corruption in the Storage Stack&lt;/a&gt; by
114 L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C.
115 Arpaci-Dusseau, and R. H. Arpaci-Dusseau&lt;/li&gt;
116
117 &lt;li&gt;USENIX FAST&#39;07 &lt;a
118 href=&quot;https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/&quot;&gt;Disk
119 failures in the real world: what does an MTTF of 1,000,000 hours mean
120 to you?&lt;/a&gt; by B. Schroeder and G. A. Gibson.&lt;/li&gt;
121
122 &lt;li&gt;USENIX FAST&#39;08 &lt;a
123 href=&quot;https://www.usenix.org/events/fast08/tech/full_papers/jiang/jiang_html/&quot;&gt;Are
124 Disks the Dominant Contributor for Storage Failures? A Comprehensive
125 Study of Storage Subsystem Failure Characteristics&lt;/a&gt; by Weihang
126 Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky&lt;/li&gt;
127
128 &lt;li&gt;SIGMETRICS 2007
129 &lt;a href=&quot;http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf&quot;&gt;An
130 analysis of latent sector errors in disk drives&lt;/a&gt; by
131 L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler&lt;/li&gt;
132
133 &lt;/ul&gt;
134
135 &lt;p&gt;Several of these research papers are based on data collected from
136 hundreds of thousands or millions of disks, and their findings are
137 eye-opening. The short story is simply: do not implicitly trust RAID or
138 redundant storage systems. Details matter. And unfortunately there
139 are few options on Linux addressing all the identified issues. Both
140 ZFS and Btrfs are doing a fairly good job, but have legal and
141 practical issues of their own. I wonder how cluster file systems like
142 Ceph do in this regard. After all, there is an old saying: you know
143 you have a distributed system when the crash of a computer you have
144 never heard of stops you from getting any work done. The same holds
145 true if fault tolerance does not work.&lt;/p&gt;
146
147 &lt;p&gt;Just remember, in the end, it does not matter how redundant, or how
148 fault tolerant your storage is, if you do not continuously monitor its
149 status to detect and replace failed disks.&lt;/p&gt;
150 </description>
151 </item>
152
153 </channel>
154 </rss>