A month ago, I blogged about my work to automatically check the copyright status of IMDB entries, and try to count the number of movies listed in IMDB that are legal to distribute on the Internet. I have continued to look for good data sources, and identified a few more. The code used to extract information from various data sources is available in a git repository, currently available from GitHub.
So far I have identified 3186 unique IMDB title IDs. To gain better understanding of the structure of the data set, I created a histogram of the year associated with each movie (typically release year). It is interesting to notice where the peaks and dips in the graph are located. I wonder why they are placed there. I suspect World War II caused the dip around 1940, but what caused the peak around 2010?
I've so far identified ten sources for IMDB title IDs for movies in the public domain or with a free license. These are the statistics reported when running 'make stats' in the git repository:
 249 entries (   6 unique) with and 288 without IMDB title ID in free-movies-archive-org-butter.json
2301 entries ( 540 unique) with and   0 without IMDB title ID in free-movies-archive-org-wikidata.json
 830 entries (  29 unique) with and   0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
2109 entries ( 377 unique) with and   0 without IMDB title ID in free-movies-imdb-pd.json
 291 entries ( 122 unique) with and   0 without IMDB title ID in free-movies-letterboxd-pd.json
 144 entries ( 135 unique) with and   0 without IMDB title ID in free-movies-manual.json
 350 entries (   1 unique) with and 801 without IMDB title ID in free-movies-publicdomainmovies.json
   4 entries (   0 unique) with and 124 without IMDB title ID in free-movies-publicdomainreview.json
 698 entries ( 119 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json
   8 entries (   8 unique) with and 196 without IMDB title ID in free-movies-vodo.json
3186 unique IMDB title IDs in total
The entries without IMDB title ID are candidates to increase the data set, but might equally well be duplicates of entries already listed with IMDB title ID in one of the other sources, or represent movies that lack an IMDB title ID. I've seen examples of all these situations when peeking at the entries without IMDB title ID. Based on these data sources, the lower bound for movies listed in IMDB that are legal to distribute on the Internet is somewhere between 3186 and 4713.
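The upper figure appears to be the 3186 unique IDs plus every entry that lacks an IMDB title ID, under the optimistic assumption that none of those entries are duplicates and all of them exist in IMDB. A small sketch of that arithmetic, with the counts copied from the 'make stats' output (the function name is only for illustration):

```python
# Entries without an IMDB title ID, per source, copied from 'make stats'.
# Sources reporting 0 such entries are left out.
without_imdb_id = {
    "free-movies-archive-org-butter.json": 288,
    "free-movies-publicdomainmovies.json": 801,
    "free-movies-publicdomainreview.json": 124,
    "free-movies-publicdomaintorrents.json": 118,
    "free-movies-vodo.json": 196,
}
unique_with_id = 3186


def estimate_range(unique_ids, missing_counts):
    """Return (low, high) for the number of free movies listed in IMDB.

    low counts only known IMDB title IDs; high assumes every entry
    without an ID is a distinct movie that does exist in IMDB.
    """
    return unique_ids, unique_ids + sum(missing_counts.values())


low, high = estimate_range(unique_with_id, without_imdb_id)
print(low, high)
```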
It would be great for improving the accuracy of this measurement, if the various sources added IMDB title ID to their metadata. I have tried to reach the people behind the various sources to ask if they are interested in doing this, without any replies so far. Perhaps you can help me get in touch with the people behind VODO, Public Domain Torrents, Public Domain Movies and Public Domain Review to try to convince them to add more metadata to their movie entries?
Another way you could help is by adding pages to Wikipedia about movies that are legal to distribute on the Internet. If such a page exists and includes a link to both IMDB and The Internet Archive, the script used to generate free-movies-archive-org-wikidata.json should pick up the mapping as soon as Wikidata is updated.
My movie playing setup involves Kodi, OpenELEC (probably soon to be replaced with LibreELEC) and an InFocus IN76 video projector. My projector can be controlled via both an infrared remote controller and an RS-232 serial line. The vendor of my projector, InFocus, was sensible enough to document the serial protocol in its user manual, so it is easily available, and I used it some years ago to write a small script to control the projector. For a while now, I have longed for a setup where the projector is controlled by Kodi, for example in such a way that when the screen saver goes on, the projector is turned off, and when the screen saver exits, the projector is turned on again.
A few days ago, with very good help from parts of my family, I managed to find a Kodi add-on for controlling an Epson projector, and got in touch with its author to see if we could join forces and make an add-on with support for several projectors. To my pleasure, he was positive about the idea, and we set out to add InFocus support to his add-on, and make the add-on suitable for the official Kodi add-on repository.
The add-on is now working (for me, at least), with a few minor adjustments. The most important change I made relative to the master branch in the GitHub repository is embedding the pyserial module in the add-on. The long term solution is to make a "script" type pyserial module for Kodi, that can be pulled in as a dependency in Kodi. But until that is in place, I embed it.
The add-on can be configured to turn the projector on when Kodi starts and off when Kodi stops, as well as to turn the projector off when the screensaver starts and on when the screensaver stops. It can also be told to set the projector source when turning on the projector.
If this sounds interesting to you, check out the project GitHub repository. Perhaps you can send patches to support your projector too? As soon as we find time to wrap up the latest changes, it should be available for easy installation using any Kodi instance.
For future improvements, I would like to add projector model detection and the ability to adjust the brightness level of the projector from within Kodi. We also need to figure out how to handle the cooling period of the projector. My projector refuses to turn on for 60 seconds after it was turned off. This is not handled well by the add-on at the moment.
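One possible way to handle the cooling period is to remember when the power-off command was sent and delay the next power-on until 60 seconds have passed. The following is only a sketch and not code from the add-on. The class name, the function names and the send_power_on callback are all hypothetical, and the 60 second default is simply what my projector needs.

```python
import time


class CooldownGuard:
    """Track when the projector was turned off and how long to wait."""

    def __init__(self, cooldown_seconds=60.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock          # injectable to make testing easy
        self._turned_off_at = None   # None means "never turned off by us"

    def mark_off(self):
        """Record the moment the power-off command was sent."""
        self._turned_off_at = self._clock()

    def seconds_until_ready(self):
        """Return how long to wait before power-on will be accepted."""
        if self._turned_off_at is None:
            return 0.0
        elapsed = self._clock() - self._turned_off_at
        return max(0.0, self.cooldown_seconds - elapsed)


def power_on_when_ready(guard, send_power_on, sleep=time.sleep):
    """Wait out any remaining cool-down, then call send_power_on().

    send_power_on is a hypothetical callback that writes the actual
    power-on command to the serial line.
    """
    remaining = guard.seconds_until_ready()
    if remaining > 0:
        sleep(remaining)
    send_power_on()
```

Inside Kodi, a blocking sleep in a screensaver callback is probably not a good idea, so a real implementation would more likely schedule the delayed power-on in a separate thread or timer.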
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
If you care about how fault tolerant your storage is, you might find these articles and papers interesting. They have shaped how I think when designing a storage system.
In the days of the VHS cassette, it was straightforward to keep a TV programme you wanted to watch later, without depending on the programme being broadcast again. Perhaps you wanted to watch the programme at the cabin where there was no TV signal, or for other reasons have it available for future enjoyment. This has become harder with the introduction of digital TV and web streaming, where recording to hard disk is outside the control of most people if they use non-free software and boxes controlled by others. But for NRK here in Norway, there are fortunately several free software alternatives, which I have written about before. As long as the source of the download is legally published on the web (which I assume NRK does), such storage for private use is also legal in Norway.
The last time I looked at this, in 2016, I mentioned that youtube-dl could not embed subtitles from NRK in the video files, and that I therefore preferred other alternatives. Recently I discovered that this has changed. The advantage of youtube-dl is that it is available directly from Linux distributions like Debian and Ubuntu, so you do not have to figure out yourself how to get it working.
To download an NRK programme with subtitles, and get the Norwegian subtitles packed into the video file, the following command can be used:
- USENIX ;login: Redundancy Does Not Imply Fault Tolerance. Analysis of Distributed Storage Reactions to Single Errors and Corruptions by Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
- ZDNet Why RAID 5 stops working in 2009 by Robin Harris
- ZDNet Why RAID 6 stops working in 2019 by Robin Harris
- USENIX FAST'07 Failure Trends in a Large Disk Drive Population by Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso
- USENIX ;login: Data Integrity. Finding Truth in a World of Guesses and Lies by Doug Hughes
- USENIX FAST'08 An Analysis of Data Corruption in the Storage Stack by L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau
- USENIX FAST'07 Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? by B. Schroeder and G. A. Gibson
- USENIX ;login: Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics by Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky
- SIGMETRICS 2007 An analysis of latent sector errors in disk drives by L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler
youtube-dl --write-sub --sub-format ttml \
    --convert-subtitles srt --embed-subs \
    https://tv.nrk.no/serie/ramm-ferdig-gaa/MUHU11000316/27-04-2018
Several of these research papers are based on data collected from hundreds of thousands or millions of disks, and their findings are eye opening. The short story is simply: do not implicitly trust RAID or redundant storage systems. Details matter. And unfortunately there are few options on Linux addressing all the identified issues. Both ZFS and Btrfs are doing a fairly good job, but have legal and practical issues of their own. I wonder how cluster file systems like Ceph do in this regard. After all, there is an old saying, you know you have a distributed system when the crash of a computer you have never heard of stops you from getting any work done. The same holds true if fault tolerance does not work.
Just remember, in the end, it does not matter how redundant, or how fault tolerant your storage is, if you do not continuously monitor its status to detect and replace failed disks.
The example URL is today's top story on tv.nrk.no. The result is an MP4 file with the film and subtitles that can be played with VLC. Note that VLC does not show subtitles until you enable them. To do so, right-click in the viewer window, choose the subtitle menu entry and then Norwegian. I also tested '--write-auto-sub', but that command line argument does not seem to work, so I ended up with the set of arguments above, which I found in a bug report in the youtube-dl project's issue tracker.
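If you want to save several programmes, the command can be wrapped in a small script. This is only a sketch, assuming youtube-dl is installed and in the PATH. The function names are made up for the example, and only the arguments from the command above are used.

```python
import subprocess


def build_command(url):
    """Return the youtube-dl argument list for one NRK URL."""
    return [
        "youtube-dl",
        "--write-sub", "--sub-format", "ttml",
        "--convert-subtitles", "srt", "--embed-subs",
        url,
    ]


def download(url):
    """Run youtube-dl; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(url), check=True)
```

Calling download() with the URL from the example above should give the same result as running the command by hand.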
This support in youtube-dl makes it very easy to save NRK programmes, be it news, films, series or documentaries, to have them available for future reference and use, regardless of how long the programmes remain available from NRK. It cannot be helped that NRK's lawyers claim that making something available for download is fundamentally different from making it available for streaming, when technically it is the same thing.
The youtube-dl program also supports a number of other web sites; see the project overview for a complete list.
I was surprised today to learn that a friend in academia did not know there are easily available web services for writing LaTeX documents as a team. I thought it was common knowledge, but to make sure at least my readers are aware of it, I would like to mention these useful services for writing LaTeX documents. Some of them even provide a WYSIWYG editor to ease writing even further.
There are two commercial services available, ShareLaTeX and Overleaf. They are very easy to use. Just start a new document, select which publisher to write for (i.e. which LaTeX style to use), and start writing. Note, these two have announced their intention to join forces, so soon there will be only one joint service. I've used both for different documents, and they work just fine. ShareLaTeX is free software, while Overleaf is not. According to an announcement from Overleaf, they plan to keep the ShareLaTeX code base maintained as free software.
But these two are not the only alternatives. Fidus Writer is another free software solution with the source available on GitHub. I have not used it myself. Several others can be found on the nice AlternativeTo web service.

If you like Google Docs or Etherpad, but would like to write documents in LaTeX, you should check out these services. You can even host your own, if you want to. :)
VG, Dagbladet and NRK report today that the majority of the Standing Committee on Family and Cultural Affairs in the Storting has decided to introduce a new censorship infrastructure in Norway. Norway already has a "voluntary" censorship infrastructure based on DNS names, where the largest ISPs, based on a list of DNS names, poison DNS replies and redirect to a different IP address than the one found in DNS. Now IP-based redirection is coming in addition. Once the infrastructure is in place, censorship of IP addresses is reduced to a question of which IP addresses to block. The list of IP addresses will naturally change as the authorities change. That is not a reassuring thought.
Recently, I needed to automatically check the copyright status of a set of The Internet Movie Database (IMDB) entries, to figure out which of the movies they refer to can be freely distributed on the Internet. This proved to be harder than it sounds. IMDB for sure lists movies without any copyright protection, where the copyright protection has expired or where the movie is licensed using a permissive license like one from Creative Commons. These are mixed with copyright protected movies, and there seems to be no way to separate these classes of movies using the information in IMDB.
First I tried to look up entries manually in IMDB, Wikipedia and The Internet Archive, to get a feel for how to do this. It is hard to know for sure using these sources, but it should be possible to be reasonably confident a movie is "out of copyright" with a few hours' work per movie. As I needed to check almost 20,000 entries, this approach was not sustainable. I simply cannot work around the clock for about 6 years to check this data set.
I asked the people behind The Internet Archive if they could introduce a new metadata field in their metadata XML for IMDB ID, but was told that they leave it completely to the uploaders to update the metadata. Some of the metadata entries had IMDB links in the description, but I found no way to download all metadata files in bulk to locate those, and put that approach aside.
In the process I noticed several Wikipedia articles about movies had links to both IMDB and The Internet Archive, and it occurred to me that I could use the Wikipedia RDF data set to locate entries with both, to at least get a lower bound on the number of movies on The Internet Archive with an IMDB ID. This is useful based on the assumption that movies distributed by The Internet Archive can be legally distributed on the Internet. With some help from the RDF community (thank you DanC), I was able to come up with this query to pass to the SPARQL interface on Wikidata:
SELECT ?work ?imdb ?ia ?when ?label
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q11424.
  ?work wdt:P345 ?imdb.
  ?work wdt:P724 ?ia.
  OPTIONAL {
        ?work wdt:P577 ?when.
        ?work rdfs:label ?label.
        FILTER(LANG(?label) = "en").
  }
}
If I understand the query right, for every film entry anywhere in Wikipedia, it will return the IMDB ID and The Internet Archive ID, and when the movie was released and its English title, if either or both of the latter two are available. At the moment the result set contains 2338 entries. Of course, it depends on volunteers including both correct IMDB and The Internet Archive IDs in the Wikipedia articles for the movie. It should be noted that the result will include duplicates if the movie has entries in several languages. There are some bogus entries, either because The Internet Archive ID contains a typo or because the movie is not available from The Internet Archive. I did not verify the IMDB IDs, as I am unsure how to do that automatically.
I wrote a small Python script to extract the data set from Wikidata and check if the XML metadata for the movie is available from The Internet Archive, and after around 1.5 hours it produced a list of 2097 free movies and their IMDB ID. In total, 171 entries in Wikidata lack the referred Internet Archive entry. I assume the 70 "disappearing" entries (i.e. 2338-2097-171) are duplicate entries.
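To illustrate how duplicate rows can "disappear", here is a minimal sketch of collapsing a SPARQL result to one entry per IMDB ID. It is not the script mentioned above. It assumes the result was fetched in the standard SPARQL JSON results format, the function name is made up, and the sample rows (including the Q number) are invented for the example.

```python
def unique_movies(sparql_json):
    """Map each IMDB ID to its Internet Archive ID, dropping repeated rows."""
    movies = {}
    for row in sparql_json["results"]["bindings"]:
        imdb_id = row["imdb"]["value"]
        ia_id = row["ia"]["value"]
        # Keep the first Internet Archive ID seen for each IMDB ID.
        movies.setdefault(imdb_id, ia_id)
    return movies


# Invented sample: the same movie returned twice, e.g. with two release dates.
sample = {
    "head": {"vars": ["work", "imdb", "ia", "when", "label"]},
    "results": {"bindings": [
        {"work": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000001"},
         "imdb": {"type": "literal", "value": "tt0036823"},
         "ia": {"type": "literal", "value": "FightingLady"}},
        {"work": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000001"},
         "imdb": {"type": "literal", "value": "tt0036823"},
         "ia": {"type": "literal", "value": "FightingLady"}},
    ]},
}

print(len(sample["results"]["bindings"]), "rows,",
      len(unique_movies(sample)), "unique movie(s)")
```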
This is not too bad, given that The Internet Archive reports that it contains 5331 feature films at the moment, but it also means more than 3000 movies are missing on Wikipedia or are missing the pair of references on Wikipedia.
I was curious about the distribution by release year, and made a little graph to show how the number of free movies is spread over the years:

[Graph: number of free movies by release year]
I expect the relative distribution of the remaining 3000 movies to be similar.
If you want to help, and want to ensure Wikipedia can be used to cross reference The Internet Archive and The Internet Movie Database, please make sure entries like this are listed under the "External links" heading on the Wikipedia article for the movie:
* {{Internet Archive film|id=FightingLady}}
* {{IMDb title|id=0036823|title=The Fighting Lady}}
Please verify the links on the final page, to make sure you did not introduce a typo.
Here is the complete list, if you want to correct the 171 identified Wikipedia entries with broken links to The Internet Archive: Q1140317, Q458656, Q458656, Q470560, Q743340, Q822580, Q480696, Q128761, Q1307059, Q1335091, Q1537166, Q1438334, Q1479751, Q1497200, Q1498122, Q865973, Q834269, Q841781, Q841781, Q1548193, Q499031, Q1564769, Q1585239, Q1585569, Q1624236, Q4796595, Q4853469, Q4873046, Q915016, Q4660396, Q4677708, Q4738449, Q4756096, Q4766785, Q880357, Q882066, Q882066, Q204191, Q204191, Q1194170, Q940014, Q946863, Q172837, Q573077, Q1219005, Q1219599, Q1643798, Q1656352, Q1659549, Q1660007, Q1698154, Q1737980, Q1877284, Q1199354, Q1199354, Q1199451, Q1211871, Q1212179, Q1238382, Q4906454, Q320219, Q1148649, Q645094, Q5050350, Q5166548, Q2677926, Q2698139, Q2707305, Q2740725, Q2024780, Q2117418, Q2138984, Q1127992, Q1058087, Q1070484, Q1080080, Q1090813, Q1251918, Q1254110, Q1257070, Q1257079, Q1197410, Q1198423, Q706951, Q723239, Q2079261, Q1171364, Q617858, Q5166611, Q5166611, Q324513, Q374172, Q7533269, Q970386, Q976849, Q7458614, Q5347416, Q5460005, Q5463392, Q3038555, Q5288458, Q2346516, Q5183645, Q5185497, Q5216127, Q5223127, Q5261159, Q1300759, Q5521241, Q7733434, Q7736264, Q7737032, Q7882671, Q7719427, Q7719444, Q7722575, Q2629763, Q2640346, Q2649671, Q7703851, Q7747041, Q6544949, Q6672759, Q2445896, Q12124891, Q3127044, Q2511262, Q2517672, Q2543165, Q426628, Q426628, Q12126890, Q13359969, Q13359969, Q2294295, Q2294295, Q2559509, Q2559912, Q7760469, Q6703974, Q4744, Q7766962, Q7768516, Q7769205, Q7769988, Q2946945, Q3212086, Q3212086, Q18218448, Q18218448, Q18218448, Q6909175, Q7405709, Q7416149, Q7239952, Q7317332, Q7783674, Q7783704, Q7857590, Q3372526, Q3372642, Q3372816, Q3372909, Q7959649, Q7977485, Q7992684, Q3817966, Q3821852, Q3420907, Q3429733, Q774474
Letter post is protected by the provision in the penal code that makes it a crime to open other people's letters. This follows from the (new) penal code's § 205 (Violation of the right to private communication), which states that "A fine or imprisonment of up to 2 years shall be imposed on anyone who without authorisation ... c) opens a letter or other closed written message addressed to another, or otherwise gains unauthorised access to the contents." This applies to postal carriers as well as anyone else who handles the letter after the sender has handled a sealed letter. Earlier versions of the Norwegian penal code say the same.
When you register with the insecure digital mailbox solutions, such as Digipost and e-Boks, and thus start using them, you give those behind the solutions permission to open your letters. This is necessary for the content of digital mail to be shown to the recipient via the service's web pages. Thus the penal code's prohibition against opening letters does not apply, as the access is no longer unauthorised. You are in other words giving strangers access to read your correspondence. In addition, use of such insecure digital mailboxes leads to the recording of when you read the letters, where you are (via the IP address of the connection), which equipment you use and a range of other personal information that is not available when paper mail is used. I prefer legal protection of my correspondence, which after all contains private and personal information. It contributes to somewhat better protection of personal integrity in today's Norwegian society.
I find it fascinating how many of the people being locked inside the proposed border wall between USA and Mexico support the idea. The proposal to keep Mexicans out reminds me of the propaganda twist from the East German government calling the wall the “Antifascist Bulwark” after erecting the Berlin Wall, claiming that the wall was erected to keep enemies from creeping into East Germany, while it was obvious to the people locked inside it that it was erected to keep the people from escaping.
Do the people in USA supporting this wall really believe it is a one way wall, only keeping people on the outside from getting in, while not keeping people on the inside from getting out?
The leaders of the world have started to congratulate the re-elected Russian head of state, and this has caused some criticism. I am, though, a little fascinated by a comment from US senator John McCain, cited by The Hill and others:
"An American president does not lead the Free World by congratulating dictators on winning sham elections."
While I totally agree with the senator here, the way the quote is phrased makes me suspect that he is unaware of the simple fact that the USA has not led the Free World since at least before its government kidnapped a completely innocent Canadian citizen in transit on his way home to Canada via John F. Kennedy International Airport in September 2002 and sent him to be tortured in Syria for a year.
The USA might be running ahead, but the path it is taking is not the one taken by any Free World.