Over the years of administrating thousands of NFS-mounting Linux computers, I have often needed a way to detect whether a machine is experiencing an NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too. So you want to be able to detect this without risking the detection process getting stuck as well. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:
nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK
It is hard to know if the hang is still going on, and it is hard to be sure looking in dmesg is going to work. If there are lots of other messages in dmesg, the lines might have rotated out of sight before they are noticed.
While reading through the NFS client implementation in the Linux kernel code, I came across some statistics that seem to offer a way to detect it. The om_timeouts sunrpc value in the kernel will increase every time the above log entry is inserted into dmesg. And after digging a bit further, I discovered that this value shows up in /proc/self/mountstats on Linux.
The mountstats content seems to be shared between files using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount points for the machine. I assume this will not show lazily umounted NFS points, nor NFS mount points in a different process context (i.e. with a different filesystem view), but that does not worry me.
The content for an NFS mount point looks similar to this:
[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
 opts: rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
 age: 7863311
 caps: caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
 sec: flavor=1,pseudoflavor=1
 events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0
 bytes: 166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809
 RPC iostats version: 1.0  p/v: 100003/3 (nfs)
 xprt: tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
 per-op statistics
         NULL: 0 0 0 0 0 0 0 0
      GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
      SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
       LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
       ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
     READLINK: 125 125 0 20472 18620 0 1112 1118
         READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
        WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
       CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
        MKDIR: 3680 3680 0 773980 993920 26 23990 24245
      SYMLINK: 903 903 0 233428 245488 6 5865 5917
        MKNOD: 80 80 0 20148 21760 0 299 304
       REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
        RMDIR: 3367 3367 0 645112 484848 22 5782 6002
       RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
         LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
      READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
  READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
       FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
       FSINFO: 2 2 0 232 328 0 1 1
     PATHCONF: 1 1 0 116 140 0 0 0
       COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]
The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experienced per file system operation. Here there are 22 write timeouts and 5 access timeouts. If these numbers are increasing, I believe the machine is experiencing an NFS hang. Unfortunately the timeout value does not start to increase right away. The NFS operations need to time out first, and this can take a while. The exact timeout value depends on the setup. For example, the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.
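To show how this could be automated, here is a minimal sketch (my illustration, not an existing tool) that extracts the per-operation timeout counters from /proc/self/mountstats, assuming the layout shown above where the third counter after the operation name is the timeout count:

#!/usr/bin/env python3
# Sketch: report per-operation NFS timeout counters from
# /proc/self/mountstats.  Assumes the per-op layout shown above,
# where the third counter after the operation name is the number
# of timeouts for that operation.

def nfs_timeouts(path='/proc/self/mountstats'):
    """Return {mount point: {operation: timeout count}} for NFS mounts."""
    result = {}
    mount = None
    in_per_op = False
    with open(path) as statsfile:
        for line in statsfile:
            words = line.split()
            if not words:
                continue
            if words[0] == 'device':
                # "device server:/export mounted on /mnt with fstype nfs ..."
                # (an nfs4 mount would need the same treatment for 'nfs4')
                mount = words[4] if 'nfs' in words[5:] else None
                in_per_op = False
                if mount is not None:
                    result[mount] = {}
            elif mount is not None and words[:2] == ['per-op', 'statistics']:
                in_per_op = True
            elif (mount is not None and in_per_op
                  and words[0].endswith(':') and len(words) > 3):
                # third counter after the operation name = timeouts
                result[mount][words[0].rstrip(':')] = int(words[3])
    return result

if __name__ == '__main__':
    for mount, ops in nfs_timeouts().items():
        nonzero = {op: n for op, n in ops.items() if n > 0}
        print(mount, 'total timeouts:', sum(ops.values()), nonzero)

Run periodically, an increase in any counter between two runs indicates an NFS hang in progress, and the check itself only touches /proc, never the hung mount point.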
The only way I have been able to get the timeout count working on Debian and Red Hat Enterprise Linux is to peek in /proc/.
Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.
I was fascinated by an article in Dagbladet about China's handling of Xinjiang, in particular the following excerpt:

«In the southwestern city of Kashgar, closer to the border with Central Asia, it is now reported that 120,000 Uyghurs are interned in so-called re-education camps. At the same time, a comprehensive health check program has been introduced, collecting and storing DNA samples from absolutely every inhabitant. The most advanced surveillance methods are being tested out here. Programs for recognising faces and voices are in place in the region. There, the local authorities have started to install GPS systems in all vehicles and dedicated tracking apps on mobile phones.

The police methods intrude so deeply into people's daily lives that resistance to the Beijing regime is growing.»

Sadly, the description does not deviate all that much from the state of affairs here in Norway:
| Data collection | China | Norway |
|---|---|---|
| Collection and storage of DNA samples from the population | Yes | Partially; planned for all newborns |
| Face recognition | Yes | Yes |
| Voice recognition | Yes | No |
| Location tracking of mobile phones | Yes | Yes |
| Location tracking of cars | Yes | Yes |
In Norway, the situation around the National Institute of Public Health's storage of DNA information on behalf of the police, where they refused to delete information the police were not allowed to keep, has made it clear that DNA is kept for quite a long time. In addition, there are countless biobanks stored in perpetuity, and there are plans to introduce perpetual storage of DNA material from every newborn child (with the option to request deletion).
In Norway, systems for face recognition are in place; an NRK article from 2015 reports one active at Gardermoen, and such systems are also used to analyse images collected by the authorities. Are they used in more places? Surveillance cameras controlled by the police and other authorities are densely placed in, for example, central Oslo.
I am not aware of Norway having any system for identifying persons by means of voice recognition.
Location tracking of mobile phones is routinely available to, among others, the police, NAV and the Financial Supervisory Authority of Norway, in line with requirements in the telephone companies' licenses. In addition, smartphones report their position to the developers of countless mobile apps, from whom governments and others can extract the information when needed. There is no need for a dedicated app for this.
Location tracking of cars is routinely available via a dense network of measuring points on the roads (automatic toll stations, toll tag registration, automatic speed cameras and other road cameras). It has also been decided that all new cars must be sold with equipment for GPS tracking (eCall).
It sure is good that we live in a liberal democracy, and not a surveillance state. Or do we?
So the new president of the United States of America claims to be surprised to discover that he was wiretapped during the election, before he was elected president. He even claims this must be illegal. Well, duh: if there is one thing the confirmations from Snowden documented, it is that the entire population of the USA is wiretapped, one way or another. Of course the presidential candidates were wiretapped, alongside the senators, judges and the rest of the people in the USA.
Next, the Federal Bureau of Investigation asks the Department of Justice to go public rejecting the claims that Donald Trump was wiretapped illegally. I fail to see the relevance, given that I am sure the surveillance industry in the USA believes it has all the legal backing it needs to conduct mass surveillance on the entire world.
There is even the director of the FBI stating that he never saw an order requesting the wiretapping of Donald Trump. That is not very surprising, given how the FISA court works, with all its activity being secret. Perhaps he only heard about it?
What I find saddest in this story is how Norwegian journalists present it. In a news report on the radio the other day from the Norwegian national broadcasting company (NRK), I heard the journalist claim that 'the FBI denies any wiretapping', while the reality is that 'the FBI denies any illegal wiretapping'. There is a fundamental and important difference, and it makes me sad that the journalists are unable to grasp it.
Update 2017-03-13: It looks like The Intercept reports that US Senator Rand Paul confirms what I state above.
We write 2018, and it is 30 years since Unicode was introduced. Most of us in Norway have come to expect the use of our alphabet to just work with any computer system. But it is apparently beyond the reach of the computers printing receipts at a restaurant. Recently I visited a Peppes Pizza restaurant, and noticed a few details on the receipt. Notice how 'ø' and 'å' are replaced with strange symbols in 'Servitør', 'à BETALE', 'Beløp pr. gjest', 'Takk for besøket.' and 'Vi gleder oss til å se deg igjen'.
I would say that this state of affairs is past sad and well into embarrassing.

I removed personal and private information from the receipt to be nice.
For almost a year now, we have been working on making a Norwegian Bokmål edition of The Debian Administrator's Handbook. Now, thanks to the tireless effort of Ole-Erik, Ingrid and Andreas, the initial translation is complete, and we are working on the proofreading to ensure consistent language and use of correct computer science terms. The plan is to make the book available on paper, as well as in electronic form. For that to happen, the proofreading must be completed and all the figures need to be translated. If you want to help out, get in touch.
A fresh PDF edition in A4 format (the final book will have smaller pages), created every morning, is available for proofreading. If you find any errors, please visit Weblate and correct the error. The state of the translation, including figures, is a useful source for those providing Norwegian Bokmål screenshots and figures.
I've continued to track down lists of movies that are legal to distribute on the Internet, and have identified more than 11,000 title IDs in The Internet Movie Database (IMDB) so far. Most of them (57%) are feature films from the USA published before 1923. I've also tracked down more than 24,000 movies I have not yet been able to map to an IMDB title ID, so the real number could be a lot higher. According to the front web page of Retro Film Vault, there are 44,000 public domain films, so I guess there are still some left to identify.
The complete data set is available from a public git repository, including the scripts used to create it. Most of the data is collected using web scraping, for example from the "product catalog" of companies selling copies of public domain movies, but any source I find believable is used. I've so far had to throw out three sources because I did not trust the public domain status of the movies listed.
Anyway, this is the summary of the 28 collected data sources so far:
 2352 entries (  66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
 2302 entries ( 120 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
  195 entries (  63 unique) with and   200 without IMDB title ID in free-movies-cinemovies.json
   89 entries (  52 unique) with and    38 without IMDB title ID in free-movies-creative-commons.json
  344 entries (  28 unique) with and   655 without IMDB title ID in free-movies-fesfilm.json
  668 entries ( 209 unique) with and  1064 without IMDB title ID in free-movies-filmchest-com.json
  830 entries (  21 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
   19 entries (  19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
 6822 entries (6669 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
  137 entries (   0 unique) with and     0 without IMDB title ID in free-movies-imdb-externlist.json
 1205 entries (  57 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
   84 entries (  20 unique) with and   167 without IMDB title ID in free-movies-infodigi-pd.json
  158 entries ( 135 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
  113 entries (   4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
  182 entries ( 100 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
  229 entries (  87 unique) with and     1 without IMDB title ID in free-movies-manual.json
   44 entries (   2 unique) with and    64 without IMDB title ID in free-movies-openflix.json
  291 entries (  33 unique) with and   474 without IMDB title ID in free-movies-profilms-pd.json
  211 entries (   7 unique) with and     0 without IMDB title ID in free-movies-publicdomainmovies-info.json
 1232 entries (  57 unique) with and  1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
   46 entries (  13 unique) with and    81 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries (  64 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
 1758 entries ( 882 unique) with and  3786 without IMDB title ID in free-movies-retrofilmvault.json
   16 entries (   0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
   63 entries (  16 unique) with and   141 without IMDB title ID in free-movies-vodo.json
11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID
I keep finding more data sources. I found the cinemovies source just a few days ago, and as you can see from the summary, it extended my list with 63 movies. Check out the mklist-* scripts in the git repository if you are curious how the lists are created. Many of the titles are extracted using searches on IMDB, where I look for the title and year, and accept search results with only one movie listed if the year matches. This allows me to automatically use many lists of movies without IMDB title ID references, at the cost of increasing the risk of wrongly identifying an IMDB title ID as public domain. So far my random manual checks have indicated that the method is solid, but I really wish all lists of public domain movies would include a unique movie identifier like the IMDB title ID. It would make the job of counting movies in the public domain a lot easier.
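The acceptance rule can be summarised in a few lines. This is a simplified sketch of the idea, not the actual mklist-* code, and imdb_search() is a hypothetical helper returning a list of (title ID, title, year) tuples for a query, however the IMDB search is actually done:

# Sketch of the matching heuristic described above: accept an IMDB
# title ID only when the search returned a single movie and its year
# matches the year given in the source list.

def match_title_id(title, year, imdb_search):
    results = imdb_search(title)
    if len(results) == 1 and results[0][2] == year:
        return results[0][0]   # unambiguous match, accept the title ID
    return None                # ambiguous or missing, needs manual review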
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
A few days ago I ordered a small batch of the ChaosKey, a small USB dongle for generating entropy, created by Bdale Garbee and Keith Packard. Yesterday it arrived, and I am very happy to report that it works great! According to its designers, to get it to work out of the box you need Linux kernel version 4.1 or later. I tested it on a Debian Stretch machine (kernel version 4.9), and there it worked just fine, increasing the available entropy very quickly. I wrote a small one-liner to test it. It first prints the current entropy level, drains /dev/random, and then prints the entropy level once a second for five seconds. Here is the situation without the ChaosKey inserted:
% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
    cat /proc/sys/kernel/random/entropy_avail; \
    sleep 1; \
  done
300
0+1 oppføringer inn
0+1 oppføringer ut
28 byte kopiert, 0,000264565 s, 106 kB/s
4
8
12
17
21
%
The entropy level increases by 3-4 every second. In such a case, any application requiring random bits (like an HTTPS enabled web server) will halt and wait for more entropy. And here is the situation with the ChaosKey inserted:
% cat /proc/sys/kernel/random/entropy_avail; \
  dd bs=1M if=/dev/random of=/dev/null count=1; \
  for n in $(seq 1 5); do \
    cat /proc/sys/kernel/random/entropy_avail; \
    sleep 1; \
  done
1079
0+1 oppføringer inn
0+1 oppføringer ut
104 byte kopiert, 0,000487647 s, 213 kB/s
433
1028
1031
1035
1038
%
Quite the difference. :) I bought a few more than I need, in case someone wants to buy one here in Norway. :)
Update: The dongle was presented at Debconf last year. You might find the talk recording illuminating. It explains exactly what the source of randomness is, if you are unable to spot it from the schematic drawing available from the ChaosKey web site linked at the start of this blog post.
Yesterday I was in Follo district court as an expert witness, presenting my investigation into counting movies in the public domain, related to the association NUUG's involvement in the case about Økokrim's seizure, and later confiscation, of the DNS domain popcorn-time.no. I talked about several things, but mostly about my assessment of how the movie industry has measured how illegal Popcorn Time is. As far as I can tell, the movie industry's measurement has been passed on unchanged by the Norwegian police, and the courts have relied on the measurement when assessing Popcorn Time both in Norway and abroad (the 99% figure is referenced in foreign court decisions too).
Ahead of my testimony I wrote a note, mostly for myself, with the points I wanted to get across. Here is a copy of the note I wrote and handed to the prosecution. Oddly enough, the judges did not want the note, so if I understood the court process correctly, only the histogram graph was entered into the documentation in the case. The judges were apparently only interested in relating to what I said in court, not what I had written beforehand. In any case, I assume more people than me may enjoy the text, and therefore publish it here. I attach a transcript of document 09,13, which is the central document I am commenting on.
Comments on «Evaluation of (il)legality» for Popcorn Time

Summary
The measurement method Økokrim has relied on when claiming that 99% of the movies available from Popcorn Time are shared illegally has weaknesses.
Whoever assessed whether movies can be legally shared has not succeeded in identifying movies that can be shared legally, and has apparently assumed that only very old movies can be shared legally. Økokrim assumes there is only one movie, the Charlie Chaplin movie «The Circus» from 1928, that can be freely shared among those observed available via various Popcorn Time variants. I find three more among the observed movies: «The Brain That Wouldn't Die» from 1962, «God's Little Acre» from 1958 and «She Wore a Yellow Ribbon» from 1949. There may well be more. There are thus at least four times as many movies that can be legally shared on the Internet in the data set Økokrim relies on when claiming that less than 1% can be shared legally.
Second, the selection made by searching for random words taken from the Dale-Chall word list deviates from the year distribution of the movie catalogues used as a whole, which affects the ratio between movies that can be legally shared and movies that cannot. In addition, picking the top part (the first five) of the search results gives a deviation from the correct year distribution, which affects the share of public domain works in the search result.
What is measured is not the (il)legality of using Popcorn Time, but the (il)legality of the contents of bittorrent movie catalogues that are maintained independently of Popcorn Time.
Documents discussed: 09,12; 09,13; 09,14; 09,18; 09,19; 09,20.
Detailed comments
Økokrim has explained to the courts that at least 99% of everything available from various Popcorn Time variants is shared illegally on the Internet. I became curious how they arrived at this figure, and this note is a collection of comments on the measurement Økokrim refers to. Part of the background for why I chose to look at the case is that I am interested in identifying and counting how many artistic works have entered the public domain or for other reasons can be legally shared on the Internet, and I was thus interested in how the one percent that can perhaps be shared legally had been found.
The 99% share comes from an uncredited and undated note that sets out to document a method for measuring how (il)legal various Popcorn Time variants are.
Briefly summarised, the method document explains that because it is not possible to obtain a complete list of all movie titles available via Popcorn Time, something meant to be a representative sample is created by selecting 50 search terms longer than three characters from the word list known as Dale-Chall. For each search term a search is conducted, and the first five movies in the search result are collected until 100 unique movie titles have been found. If 50 search terms were not sufficient to reach 100 unique movie titles, more movies from each search result were added. If this was still not sufficient, additional randomly chosen search terms were picked and searched for until 100 unique movie titles had been identified.
Then, for each of the movie titles, it was «verified whether or not there is a reasonable expectation that the work is copyrighted», by checking whether the movie was available in IMDB and looking at the director, the release year, when it was released for specific market areas, and which production and distribution companies were registered (quoting the method document).
The method is reproduced in both of the uncredited documents 09,13 and 09,19, and described from page 47 of document 09,20, slides dated 2017-02-01. The latter is credited to Geerart Bourlon from Motion Picture Association EMEA. The method appears to have several weaknesses that bias the results. It starts by stating that it is not possible to extract a complete list of all available movie titles, and that this is the background for the choice of method. This assumption is not consistent with what is stated in document 09,12, which also lacks author and date. Document 09,12 explains how the entire catalogue content was downloaded and counted. Document 09,12 is possibly the same report that was referred to in the ruling from Oslo District Court 2017-11-03 (case 17-093347TVI-OTIR/05) as the report of 1 June 2017 by Alexander Kind Petersen, but I have not compared the documents word for word to verify this.
IMDB is short for The Internet Movie Database, a well-regarded commercial web service actively used by both the movie industry and others to keep track of which feature films (and some other films) exist or are in production, and information about these films. The data quality is high, with few errors and few missing movies. IMDB does not show information about the copyright status of movies on each movie's info page. As part of the IMDB service there are lists of movies, created by volunteers, enumerating what is assumed to be works in the public domain.
There are several sources that can be used to find movies that are public domain or have terms of use making it legal for everyone to share them on the Internet. Over the last few weeks I have tried to collect and cross-reference these lists in an attempt to count the number of movies in the public domain. Starting from such lists (and published movies in the case of the Internet Archive), I have so far managed to identify more than 11,000 movies, mainly feature films.
The vast majority of the entries are taken from IMDB itself, based on the fact that all movies made in the USA before 1923 have entered the public domain. The corresponding cut-off date for the United Kingdom is 1912-07-01, but this makes up only a very small share of the feature films in IMDB (19 in total). Another large share comes from the Internet Archive, where I have identified movies with a reference to IMDB. The Internet Archive, based in the USA, has a policy of only publishing movies that are legal to distribute. During this work I have come across several movies that have been removed from the Internet Archive, which leads me to conclude that the people controlling the Internet Archive take an active stance on only having legal content there, even though it is largely run by volunteers. Another large list of movies comes from the commercial company Retro Film Vault, which sells public domain movies to the TV and movie industry. I have also made use of lists of movies claimed to be public domain, namely Public Domain Review, Public Domain Torrents and Public Domain Movies (.net and .info), as well as lists of movies with Creative Commons licensing from Wikipedia, VODO and The Hill Productions. I have done some spot checks by assessing movies that are only mentioned on a single list. Where I found errors that made me doubt the judgment of those who created the list, I discarded the list completely (this applies to one list from IMDB).
Starting from works that can be assumed to be legally shared on the Internet (from, among others, the Internet Archive, Public Domain Torrents, Public Domain Review and Public Domain Movies), and linking them to entries in IMDB, I have so far managed to identify more than 11,000 movies (mainly feature films) that there is reason to believe can be legally distributed by everyone on the Internet. As additional sources, lists of movies assumed or claimed to be public domain have been used. These sources come from communities that work to make available to the general public all works that have entered the public domain or have terms of use permitting sharing.
In addition to the more than 11,000 movies where the IMDB title ID has been identified, I have found more than 20,000 entries where I have not yet had the capacity to track down the IMDB title ID. Some of these are probably duplicates of the IMDB entries identified so far, but hardly all of them. Retro Film Vault claims to have 44,000 public domain movie works in its catalogue, so the real number may be considerably higher than what I have managed to identify so far. The conclusion is that 11,000 is a lower bound for how many movies in IMDB can be legally shared on the Internet. According to statistics from IMDB, there are 4.6 million titles registered, of which 3 million are TV series episodes. I have not worked out how they are distributed per year.
If one distributes by year all the title IDs in IMDB that are claimed to be legally shareable on the Internet, the following histogram results:
One can see in the histogram that the effect of missing registration, or missing renewal of registration, is that many movies released in the USA before 1978 are public domain today. One can also see that there are several movies released in recent years with terms of use that permit sharing, possibly due to the rise of the Creative Commons movement.
For machine analysis of the catalogues I have written a small program that connects to the bittorrent catalogues used by various Popcorn Time variants and downloads the complete list of movies in the catalogues, which confirms that it is possible to retrieve a complete list of all available movie titles. I have looked at four bittorrent catalogues. The first is used by the client available from www.popcorntime.sh and is named 'sh' in this document. The second is, according to document 09,12, used by the client available from popcorntime.ag and popcorntime.sh, and is named 'yts' in this document. The third is used by the web pages available from popcorntime-online.tv and is named 'apidomain' in this document. The fourth is used by the client available from popcorn-time.to according to document 09,12, and is named 'ukrfnlge' in this document.
The method Økokrim relies on states in its point four that judgment is a suitable way to find out whether a movie can be legally shared on the Internet or not, saying it was «verified whether or not there is a reasonable expectation that the work is copyrighted». First of all, establishing whether a movie is «copyrighted» is not enough to know whether it is legal to share it on the Internet, as there are several movies with copyright terms of use that permit sharing on the Internet. Examples of this are Creative Commons licensed movies such as Citizenfour from 2014 and Sintel from 2010. In addition, there are several movies that are now public domain due to missing registration, or missing renewal of registration, even though the director, production company and distributor all want protection. Examples of this are Plan 9 from Outer Space from 1959 and Night of the Living Dead from 1968. All movies from the USA that were public domain before 1989-03-01 remained in the public domain when the Berne Convention, which took effect in the USA at that time, was not given retroactive force. If there is one thing the story of the song «Happy Birthday» tells us, where payment for use was collected for decades even though the song was not actually protected by copyright law, it is that each individual work must be assessed carefully and in detail before one can determine whether the work is public domain or not; it is not enough to believe self-declared right holders. More examples of public domain works misclassified as protected are found in document 09,18, which lists search results for the client referred to as popcorntime.sh and which, according to the note, contains only one movie (The Circus from 1928) that can, with some doubt, be assumed to be public domain.
On a quick read-through of document 09,18, which contains screenshots from the use of a Popcorn Time variant, I found mentioned both the movie «The Brain That Wouldn't Die» from 1962, which is available from the Internet Archive and which according to Wikipedia is public domain in the USA because it was released in 1962 without 'copyright' marking, and the movie «God's Little Acre» from 1958, which has been posted on Wikipedia, where it is stated that the black-and-white version is public domain. It is not clear from document 09,18 whether the movie mentioned there is the black-and-white version. For capacity reasons, and because the movie listing in document 09,18 is not machine readable, I have not attempted to check all the movies listed there against the list of movies assumed legal to distribute on the Internet.
In a machine pass over the list of IMDB references under the spreadsheet tab 'Unique titles' in document 09,14, I additionally found the movie «She Wore a Yellow Ribbon» from 1949, which is probably also misclassified. «She Wore a Yellow Ribbon» is available from the Internet Archive and marked as public domain there. There thus appear to be at least four times as many movies that can be legally shared on the Internet than what is assumed when claiming that at least 99% of the content is illegal. I do not rule out that closer investigation may uncover more. The point is in any case that the method's criterion of «reasonable expectation that the work is copyrighted» makes the method unreliable.
The measurement method in question selects random search terms from the Dale-Chall word list. That word list contains 3000 simple English words that fourth graders in the USA are expected to understand. It is not stated why this particular word list was chosen, and it is unclear to me whether it is suitable for obtaining a representative sample of movies. Many of the words give an empty search result. By simulating similar searches, I see large deviations from the catalogue distribution for individual measurements. This suggests that individual measurements of 100 movies, as described in the measurement method, are not well suited to finding the share of illegal content in the bittorrent catalogues.
One can counteract this large deviation for individual measurements by doing many searches and merging the results. I have tested this by carrying out 100 individual measurements (i.e. measuring (100x100=) 10,000 randomly chosen movies), which gives a smaller, but still significant, deviation compared to counting movies per year in the entire catalogue.
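The kind of simulation involved can be illustrated with a small sketch. The catalogue here is synthetic; the real test ran against the actual downloaded catalogue contents:

# Sketch: compare the per-year distribution of single and merged
# random samples with the full catalogue.  Synthetic catalogue data,
# for illustration only.
import random
from collections import Counter

catalogue = [random.randint(1920, 2016) for _ in range(50000)]
full = Counter(catalogue)

single = Counter(random.sample(catalogue, 100))   # one measurement
merged = Counter()
for _ in range(100):                              # 100 merged measurements
    merged.update(random.sample(catalogue, 100))

for year in (1930, 1970, 2010):
    print(year,
          round(full[year] / len(catalogue), 4),  # share in catalogue
          round(single[year] / 100, 4),           # share in one measurement
          round(merged[year] / 10000, 4))         # share in merged samples

A single measurement scatters widely around the catalogue's per-year shares, while the merged measurements track them much more closely.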
The measurement method extracts the top five of the search results. The search results are sorted by the number of bittorrent clients registered as sharers in the catalogues, which may bias the selection towards movies that are popular among those using the bittorrent catalogues, without saying anything about what content is available or what content is shared with Popcorn Time clients. I have tried to measure how large such a bias might be by comparing the distribution when taking the bottom five of the search results instead. The deviation between these two methods is clearly visible in the histograms for several of the catalogues. Here are histograms of movies found in the complete catalogue (green line), and movies found by searching for Dale-Chall words. Graphs marked 'top' take the first five of the search results, while those marked 'bottom' take the last five. One can see here that the results are significantly affected by whether one looks at the first or the last movies in a search hit.
[Histograms omitted here: one per catalogue, comparing the complete catalogue (green line) with the 'top' and 'bottom' search samples.]
It is worth noting that the bittorrent catalogues in question were not created for use with Popcorn Time. For example, the YTS catalogue, used by the client that was downloaded from popcorntime.sh, belongs to an independent file-sharing-related web site, YTS.AG, with a separate user community. The measurement method proposed by Økokrim thus does not measure the (il)legality of the use of Popcorn Time, but the (il)legality of the contents of these catalogues.
The method from Økokrim's document 09,13 in the criminal case about the DNS seizure:
1. Evaluation of (il)legality
1.1. Methodology
Due to its technical configuration, Popcorn Time applications don't allow to make a full list of all titles made available. In order to evaluate the level of illegal operation of PCT, the following methodology was applied:

1. A random selection of 50 keywords, greater than 3 letters, was made from the Dale-Chall list that contains 3000 simple English words [1]. The selection was made by using a Random Number Generator [2].

2. For each keyword, starting with the first randomly selected keyword, a search query was conducted in the movie section of the respective Popcorn Time application. For each keyword, the first five results were added to the title list until the number of 100 unique titles was reached (duplicates were removed).

3. For one fork, .CH, insufficient titles were generated via this approach to reach 100 titles. This was solved by adding any additional query results above five for each of the 50 keywords. Since this still was not enough, another 42 random keywords were selected to finally reach 100 titles.

4. It was verified whether or not there is a reasonable expectation that the work is copyrighted by checking if they are available on IMDb, also verifying the director, the year when the title was released, the release date for a certain market, the production company/ies of the title and the distribution company/ies.
1.2. Results
Between 6 and 9 June 2016, four forks of Popcorn Time were investigated: popcorn-time.to, popcorntime.ag, popcorntime.sh and popcorntime.ch. An excel sheet with the results is included in Appendix 1. Screenshots were secured in separate Appendixes for each respective fork, see Appendix 2-5.
For each fork, out of 100 de-duplicated titles, it was possible to retrieve data according to the parameters set out above that indicate that the title is commercially available. Per fork, there was 1 title that presumably falls within the public domain, i.e. the 1928 movie "The Circus" by and with Charles Chaplin.
Based on the above it is reasonable to assume that 99% of the movie content of each fork is copyright protected and is made available illegally.
This exercise was not repeated for TV series, but considering that besides production companies and distribution companies also broadcasters may have relevant rights, it is reasonable to assume that at least a similar level of infringement will be established.
Based on the above it is reasonable to assume that 99% of all the content of each fork is copyright protected and is made available illegally.
I just noticed that the new Norwegian proposal for archiving rules in the government lists ECMA-376 / ISO/IEC 29500 (aka OOXML) as a valid format to put in long term storage. Luckily such files will only be accepted based on pre-approval from the National Archive. Allowing OOXML files to be used for long term storage might seem like a good idea, as long as we forget that there are plenty of ways for a "valid" OOXML document to have content with no defined interpretation in the standard, which leads to a question and an idea.
Is there any tool to detect whether an OOXML document depends on such undefined behaviour? It would be useful for the National Archive (and anyone else interested in verifying that a document is well defined) to have such a tool available when considering whether to approve the use of OOXML. I'm aware of the officeotron OOXML validator, but do not know how complete it is, nor whether it will report use of undefined behaviour. Are there other similar tools available? Please send me an email if you know of any.
After several months of working and waiting, I am happy to report that the nice and user friendly 3D printer slicer software Cura just entered Debian Unstable. It consists of six packages: cura, cura-engine, libarcus, fdm-materials, libsavitar and uranium. The last two, uranium and cura, entered Unstable yesterday. This should make it easier for Debian users to print on at least the Ultimaker class of 3D printers. My nearest 3D printer is an Ultimaker 2+, so this will make life easier for at least me. :)
The work to make this happen was done by Gregor Riepl, and I was happy to assist him by sponsoring the packages. With the introduction of Cura, Debian has three 3D printer slicers at your service: Cura, Slic3r and Slic3r Prusa. If you own or have access to a 3D printer, give them a go. :)
The 3D printer software is maintained by the 3D printing Debian team, flocking together on the 3dprinter-general mailing list and the #debian-3dprinting IRC channel.
The next step for Cura in Debian is to update the cura package to version 3.0.3, and then update the entire set of packages to version 3.1.0, which showed up in the last few days.
A few days ago, we received the ruling from my day in court. The case in question is a challenge of the seizure of the DNS domain popcorn-time.no. The ruling simply did not mention most of our arguments, and seemed to take everything ØKOKRIM said at face value, ignoring our demonstration and explanations. But it is hard to tell for sure, as we still have not seen most of the documents in the case and thus were unprepared for, and unable to contradict, several of the claims made in court by the opposition. We are considering an appeal, but it is partly a question of funding, as it is costing us quite a bit to pay for our lawyer. If you want to help, please donate to the NUUG defense fund.
The details of the case, as far as we know them, are available in Norwegian from the NUUG blog. This also includes the ruling itself.
While looking at the scanned copies of the copyright renewal entries for movies published in the USA, an idea occurred to me. The number of renewals is so small per year that it should be fairly quick to transcribe them all and add references to the corresponding IMDB title IDs. This would give the (presumably) complete list of movies published 28 years earlier that did _not_ enter the public domain in the transcribed year. By fetching the list of USA movies published 28 years earlier and subtracting the movies with renewals, we should be left with movies registered in IMDB that are now in the public domain. For the year 1955 (which is the one I have looked at the most), the total number of pages to transcribe is 21. For the 28 years from 1950 to 1978, it should be in the range of 500-600 pages. It is just a few days of work, and spread among a small group of people it should be doable in a few weeks of spare time.
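The subtraction itself is trivial once the two lists exist. A toy sketch (tt0017588 is the Adam and Evil entry shown below; the other IDs are made up):

# Toy sketch of the subtraction idea, with illustrative title IDs.
published_1927 = {'tt0017588', 'tt1111111', 'tt2222222'}  # US movies from 1927
renewed_1955 = {'tt0017588'}     # renewals transcribed from the 1955 catalog
candidates = published_1927 - renewed_1955
print(sorted(candidates))
# These are candidates only: each should be manually verified before
# relying on its public domain status.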
A typical copyright renewal entry looks like this (the first one listed for 1955):
ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer
Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH);
10Jun55; R151558.
The movie title as well as the registration and renewal dates are easy enough to locate by a program (split on the first comma and look for DDmmmYY). The rest of the text is not required to find the movie in IMDB, but is useful to confirm that the correct movie is found. I am not quite sure what the L and R numbers mean, but suspect they are reference numbers into the archive of the US Copyright Office.
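Here is a rough sketch of such a parser, using the entry above as test input (my illustration; the meaning of the L and R numbers is guessed at, as noted):

# Sketch of the parsing idea: split on the first comma to get the
# title, then pick out the DDmmmYY dates (registration and renewal)
# and the L/R reference numbers with regular expressions.
import re

ENTRY = ("ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer "
         "Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); "
         "10Jun55; R151558.")

def parse_renewal(entry):
    title, rest = entry.split(',', 1)
    dates = re.findall(r'\b\d{1,2}[A-Z][a-z]{2}\d{2}\b', rest)
    numbers = re.findall(r'\b[LR]\d+\b', rest)
    return {'title': title.strip(), 'dates': dates, 'numbers': numbers}

print(parse_renewal(ENTRY))
# {'title': 'ADAM AND EVIL',
#  'dates': ['17Aug27', '10Jun55'],
#  'numbers': ['L24293', 'R151558']}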
Tracking down the equivalent IMDB title ID is probably going to be a manual task, but given the year it is fairly easy to search for the movie title using for example http://www.imdb.com/find?q=adam+and+evil+1927&s=all. Using this search, I find that the equivalent IMDB title ID for the first renewal entry from 1955 is http://www.imdb.com/title/tt0017588/.
I suspect the best way to do this would be to make a specialised web service making it easy for contributors to transcribe entries and track down IMDB title IDs. In the web service, once an entry is transcribed, the title and year could be extracted from the text, and a search conducted in IMDB for the user to pick the equivalent IMDB title ID right away. By spreading the work among volunteers, it would also be possible to have at least two persons transcribe the same entries, to be able to discover any typos introduced. But I will need help to make this happen, as I lack the spare time to do all of this on my own. If you would like to help, please get in touch. Perhaps you can draft a web service for crowdsourcing the task?
Note, Project Gutenberg already has some transcribed copies of the US Copyright Office renewal protocols, but I have not been able to find any film renewals there, so I suspect they only have copies of renewals for written works. I have not been able to find any transcribed versions of movie renewals so far. Perhaps they exist somewhere?
I would love to figure out methods for finding all the public domain works in other countries too, but it is a lot harder. At least for Norway and Great Britain, such work involves tracking down the people involved in making the movie and figuring out when they died. It is hard enough to figure out who was part of making a movie, but I do not know how to automate such a procedure without a registry of every person involved in making movies and their year of death.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
On Wednesday, I spent the entire day in court in Follo tingrett, representing the member association NUUG, alongside the member association EFN and the DNS registrar IMC, challenging the seizure of the DNS name popcorn-time.no. It was interesting to sit in a court of law for the first time in my life. Our team can be seen in the picture above: attorney Ola Tellesbø, EFN board member Tom Fredrik Blenning, IMC CEO Morten Emil Eriksen and NUUG board member Petter Reinholdtsen.
The case at hand is that the Norwegian National Authority for Investigation and Prosecution of Economic and Environmental Crime (aka Økokrim) decided on its own to seize a DNS domain early last year, without following the official policy of the Norwegian DNS authority, which requires a court decision. The web site in question was a site covering Popcorn Time. And Popcorn Time is the name of a technology with both legal and illegal applications. Popcorn Time is a client combining searching a Bittorrent directory available on the Internet with downloading/distributing content via Bittorrent and playing the downloaded content on screen. It can be used illegally if it is used to distribute content against the will of the right holder, but it can also be used legally to play a lot of content, for example the millions of movies available from the Internet Archive or the collection available from Vodo. We created a video demonstrating legal use of Popcorn Time and played it in court. It can of course be downloaded using Bittorrent.
I did not quite know what to expect from a day in court. The government held on to their version of the story and we held on to ours, and I hope the judge is able to make sense of it all. We will know in two weeks' time. Unfortunately I do not have high hopes, as the government has the upper hand here, with more knowledge about the case, better training in handling criminal law, and in general higher standing in the courts than a fairly unknown DNS registrar and member associations. It is expensive to be right, also in Norway. So far the case has cost more than NOK 70,000. To help fund the case, NUUG and EFN have asked for donations, and have managed to collect around NOK 25,000 so far. Given the presentation from the government, I expect the government to appeal if the case goes our way. And if the case does not go our way, I hope we have enough funding to appeal.
From the other side came two people from Økokrim. On the benches, appearing to be part of the group from the government, were two people from the law firm Simonsen Vogt Wiig, and three others I am not quite sure about. Økokrim had proposed to present two witnesses from The Motion Picture Association, but this was rejected because they did not speak Norwegian and it was a bit late to bring in a translator, but perhaps the two from MPA were present anyway. All seven appeared to know each other. Good to see the case is taken seriously.
If you, like me, believe the courts should be involved before a DNS domain is hijacked by the government, or you believe the Popcorn Time technology has a lot of useful and legal applications, I suggest you too donate to the NUUG defense fund. Both Bitcoin and bank transfer are available. If NUUG gets more than we need for the legal action (very unlikely), the rest will be spent promoting free software, open standards and unix-like operating systems in Norway, so no matter what happens the money will be put to good use.
If you want to learn more about the case, I recommend you check out the blog posts from NUUG covering the case. They cover the legal arguments on both sides.
Three years ago, a presumed lost animation film, Empty Socks from 1927, was discovered in the Norwegian National Library. At the time it was discovered, it was generally assumed to be copyrighted by The Walt Disney Company, and I blogged about my reasoning to conclude that it would enter the Norwegian equivalent of the public domain in 2053, based on my understanding of Norwegian copyright law. But a few days ago, I came across a blog post claiming the movie was already in the public domain, at least in the USA. The reasoning is as follows: The film was released in November or December 1927 (sources disagree), and presumably registered its copyright that year. At that time, right holders of movies registered by the copyright office received government protection for their work for 28 years. After 28 years, the copyright had to be renewed if they wanted the government to protect it further. The blog post I found claims such renewal did not happen for this movie, and thus it entered the public domain in 1956. Yet someone claims the copyright was renewed and the movie is still copyright protected. Can anyone help me figure out which claim is correct? I have not been able to find Empty Socks in the Catalog of copyright entries. Ser.3 pt.12-13 v.9-12 1955-1958 Motion Pictures, available from the University of Pennsylvania, neither on page 45 for the first half of 1955, nor on page 119 for the second half of 1955. It is of course possible that the renewal entry was left out of the printed catalog by mistake. Is there some way to rule out this possibility? Please help, and update the wikipedia page with your findings.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
Today I received a really happy piece of news. The background is that before Christmas, the National Library of Norway arranged a seminar about its excellent «verksregister» (works register) initiative. The only way to sign up for this seminar was to send personal information to Google via Google Forms. I found this a questionable practice, as it should be possible to attend seminars arranged by the public sector without having to share one's interests, position and other personal information with Google. I therefore requested access, via Mimes brønn, to the agreements and assessments the National Library had around this. The Personal Data Act sets clear limits for what must be in place before one can ask third parties, especially abroad, to process personal information on one's behalf, so thorough documentation should exist before something like this can be legal. Two lawyers at the National Library first believed this was perfectly fine, and that Google's standard agreement could be used as a data processing agreement. I found that strange, but did not have the capacity to follow up the case until two days ago.
The happy news today, which came after I tipped off the National Library that the Data Protection Authority rejected Google's standard agreements as data processing agreements in 2011, is that the National Library has decided to end its use of Google Forms/Apps and start a dialogue with DIFI to find better ways to handle sign-ups in line with the Personal Data Act. It is fantastic to see that asking what on earth the public sector is up to sometimes helps.
It would be easier to locate the movie you want to watch in the Internet Archive if the metadata about each movie were more complete and accurate. In the archiving community, a well known saying states that good metadata is a love letter to the future. The metadata in the Internet Archive could use a face lift for the future to love us back. Here is a proposal for a small improvement that would make the metadata more useful today. I've been unable to find any document describing the various standard fields available when uploading videos to the archive, so this proposal is based on my best guess and on searching through several of the existing movies.
I have a few use cases in mind. First of all, I would like to be able to count the number of distinct movies in the Internet Archive, without duplicates. I would further like to identify the IMDB title ID of each movie in the Internet Archive, to be able to look up an IMDB title ID and know if I can fetch the video from there and share it with my friends.
Second, I would like the Butter data provider for The Internet Archive (available from github) to list as many of the good movies as possible. The plugin currently does a search in the archive with the following parameters:
collection:moviesandfilms
AND NOT collection:movie_trailers
AND -mediatype:collection
AND format:"Archive BitTorrent"
AND year
Most of the cool movies that fail to show up in Butter do so because the 'year' field is missing. The 'year' field is populated from the year part of the 'date' field, and should hold when the movie was released (date or year). Two such examples are Ben Hur from 1905 and Caminandes 2: Gran Dillama from 2013, where the year metadata field is missing.
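Out of curiosity, the same query can be replayed against the archive's public search endpoint. This sketch assumes the JSON interface of advancedsearch.php behaves as I remember it; treat it as an illustration rather than a tested program:

# Sketch: run the Butter provider's query against the Internet
# Archive's advancedsearch endpoint and print identifier, year and
# title.  Endpoint and parameter names are my assumptions.
import json
import urllib.parse
import urllib.request

query = ('collection:moviesandfilms AND NOT collection:movie_trailers '
         'AND -mediatype:collection AND format:"Archive BitTorrent" AND year')
params = urllib.parse.urlencode({
    'q': query,
    'fl[]': 'identifier,title,year',
    'rows': '10',
    'output': 'json',
})
url = 'https://archive.org/advancedsearch.php?' + params
with urllib.request.urlopen(url) as f:
    for doc in json.load(f)['response']['docs']:
        print(doc.get('identifier'), doc.get('year'), doc.get('title'))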
So, my proposal is simply, for every movie in The Internet Archive where an IMDB title ID exists, please fill in these metadata fields (note, they can be updated also long after the video was uploaded, but as far as I can tell, only by the uploader; a sketch of how to apply them programmatically follows after the list):

- mediatype: Should be 'movies' for movies.

- collection: Should contain 'moviesandfilms'.

- title: The title of the movie, without the publication year.

- date: The date or year the movie was released. This makes the movie show up in Butter, as well as making it possible to know the age of the movie, and is useful for figuring out copyright status.

- director: The director of the movie. This makes it easier to know if the correct movie is found in movie databases.

- publisher: The production company making the movie. Also useful for identifying the correct movie.

- links: Add a link to the IMDB title page, for example like this: <a href="http://www.imdb.com/title/tt0028496/">Movie in IMDB</a>. This makes it easier to find duplicates and allows counting the number of unique movies in the Archive. Other external references, like to TMDB, could be added like this too.
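If the uploader has the internetarchive Python library installed and credentials configured (ia configure), filling in the fields could look like this sketch. The identifier and field values are made-up examples, and only the item's uploader (or archive staff) can actually apply the change:

# Sketch: fill in the proposed metadata fields on an item using the
# internetarchive library.  Identifier and values are made-up examples.
from internetarchive import modify_metadata

response = modify_metadata(
    'Some_Public_Domain_Movie_1936',   # hypothetical item identifier
    metadata={
        'mediatype': 'movies',
        'collection': 'moviesandfilms',
        'title': 'Some Public Domain Movie',
        'date': '1936',
        'director': 'Jane Doe',
        'publisher': 'Acme Pictures',
        'links': '<a href="http://www.imdb.com/title/tt0028496/">'
                 'Movie in IMDB</a>',
    },
)
print(response.status_code)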
I did consider proposing a custom field for the IMDB title ID (for example 'imdb_title_url', 'imdb_code' or simply 'imdb'), but suspect it will be easier to simply place it in the links free text field.
I created a list of IMDB title IDs for several thousand movies in the Internet Archive, but I also got a list of several thousand movies without such an IMDB title ID (and quite a few duplicates). It would be great if this data set could be integrated into the Internet Archive metadata, to be available for everyone in the future, but with the current policy of leaving metadata editing to the uploaders, it will take a while before this happens. If you have uploaded movies to the Internet Archive, you can help. Please consider following my proposal above for your movies, to ensure the movies are properly counted. :)
The list is mostly generated using wikidata, which, based on Wikipedia articles, makes it possible to link between IMDB and movies in the Internet Archive. But there are lots of movies without a Wikipedia article, and some movies where only a collection page exists (like the Caminandes example above, where there are three movies but only one Wikidata entry).
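For reference, the Wikidata part of that linking can be sketched as a query against their public SPARQL endpoint. I assume here that P345 (IMDb ID) and P724 (Internet Archive ID) are the relevant properties; treat this as an illustration, not a tested part of the pipeline:

# Sketch: list Wikidata items that have both an IMDb ID (assumed to
# be property P345) and an Internet Archive ID (assumed to be P724).
import json
import urllib.parse
import urllib.request

sparql = """
SELECT ?item ?imdb ?iaid WHERE {
  ?item wdt:P345 ?imdb ;
        wdt:P724 ?iaid .
} LIMIT 10
"""
url = ('https://query.wikidata.org/sparql?'
       + urllib.parse.urlencode({'query': sparql, 'format': 'json'}))
request = urllib.request.Request(
    url, headers={'User-Agent': 'free-movies-sketch/0.1'})
with urllib.request.urlopen(request) as f:
    for row in json.load(f)['results']['bindings']:
        print(row['imdb']['value'], row['iaid']['value'])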
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
I read with interest a news story at digi.no and NRK about how it is not just me, but also NAV, that geolocates IP addresses, and how the IP addresses of those submitting benefit claim forms (meldekort) are analysed to see whether the forms are submitted from foreign IP addresses. Police attorney Hans Lyder Haare in Drammen is quoted by NRK as saying «The two were exposed by, among other things, IP addresses. One can see that the claim form comes from abroad.»
I think it is good that it becomes better known that IP addresses are linked to individuals, and that collected information is used to geolocate people, also by actors here in Norway. I see it as yet another argument for using Tor as much as possible to make IP geolocation harder, so one can protect one's privacy and avoid sharing one's physical location with unauthorised parties.
But there is one thing about this news that worries me. I was tipped off (thanks, #nuug) about NAV's privacy statement, which under the item «Privacy and statistics» reads:
«When you visit nav.no, you leave electronic traces behind. The traces are formed because your browser automatically sends a series of data to NAV's server every time you ask to view a page. This is, for example, information about which browser and version you use, and your Internet address (IP address). For each page viewed, the following information is stored:
- which page you are viewing
- date and time
- which browser you use
- your IP address

None of this information will be used to identify individuals. NAV uses the information to generate aggregate statistics showing, among other things, which pages are the most popular. The statistics are a tool for improving our services.»

I cannot quite see how analysing visitors' IP addresses, to spot who submits their claim form via the web from an IP address abroad, can be done without conflicting with the claim that «none of this information will be used to identify individuals». It thus looks to me like NAV is breaking its own privacy statement, which, as the Data Protection Authority told me at the start of December, is probably a violation of the Personal Data Act.
So far I have identified 3186 unique IMDB title IDs. To gain a better understanding of the structure of the data set, I created a histogram of the year associated with each movie (typically the release year). It is interesting to notice where the peaks and dips in the graph are located. I wonder why they are placed there. I suspect World War II caused the dip around 1940, but what caused the peak around 2010?
[Histogram: number of movies per year]
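To reproduce this kind of overview from one of the data files, something like the following sketch will do. The file name and the "year" field are assumptions for illustration; the real JSON layout in the repository may differ:

# Count movies per year in one of the JSON data files and print a
# crude text histogram, one row per year.
import json
from collections import Counter

with open("free-movies-imdb-pd.json") as f:
    entries = json.load(f)

# "year" is a hypothetical field name; adjust to the actual structure.
years = Counter(e["year"] for e in entries if e.get("year"))

for year in sorted(years):
    print("%4s %s" % (year, "*" * years[year]))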
I have so far identified ten sources of IMDB title IDs for movies in the public domain or with a free license. These are the statistics reported when running 'make stats' in the git repository:
  249 entries (   6 unique) with and  288 without IMDB title ID in free-movies-archive-org-butter.json
 2301 entries ( 540 unique) with and    0 without IMDB title ID in free-movies-archive-org-wikidata.json
  830 entries (  29 unique) with and    0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
 2109 entries ( 377 unique) with and    0 without IMDB title ID in free-movies-imdb-pd.json
  291 entries ( 122 unique) with and    0 without IMDB title ID in free-movies-letterboxd-pd.json
  144 entries ( 135 unique) with and    0 without IMDB title ID in free-movies-manual.json
  350 entries (   1 unique) with and  801 without IMDB title ID in free-movies-publicdomainmovies.json
    4 entries (   0 unique) with and  124 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries ( 119 unique) with and  118 without IMDB title ID in free-movies-publicdomaintorrents.json
    8 entries (   8 unique) with and  196 without IMDB title ID in free-movies-vodo.json
 3186 unique IMDB title IDs in total

The entries without an IMDB title ID are candidates for growing the data set, but might equally well be duplicates of entries already listed with an IMDB title ID in one of the other sources, or represent movies that lack an IMDB title ID. I have seen examples of all these situations when peeking at the entries without an IMDB title ID. Based on these data sources, the number of movies listed in IMDB that are legal to distribute on the Internet is somewhere between 3186 (the known lower bound) and 4713.
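These bounds can be recomputed mechanically from the data files. A sketch of the idea, with the file list shortened and an assumed 'imdb' field holding the title ID:

# Unique IMDB title IDs across all sources give the lower bound;
# adding the entries without any IMDB title ID gives the upper bound.
import json

SOURCES = [
    "free-movies-archive-org-wikidata.json",
    "free-movies-imdb-pd.json",
    # ... and the other eight data files
]

unique_ids = set()
without_id = 0
for name in SOURCES:
    with open(name) as f:
        for entry in json.load(f):
            if entry.get("imdb"):
                unique_ids.add(entry["imdb"])
            else:
                without_id += 1

print("lower bound:", len(unique_ids))
print("upper bound:", len(unique_ids) + without_id)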
It would improve the accuracy of this measurement if the various sources added IMDB title IDs to their metadata. I have tried to reach the people behind the various sources to ask if they are interested in doing this, without any replies so far. Perhaps you can help me get in touch with the people behind VODO, Public Domain Torrents, Public Domain Movies and Public Domain Review, to try to convince them to add more metadata to their movie entries?
Another way you could help is by adding pages to Wikipedia about movies that are legal to distribute on the Internet. If such a page exists and includes a link to both IMDB and the Internet Archive, the script used to generate free-movies-archive-org-wikidata.json should pick up the mapping as soon as Wikidata is updated.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
9th January 2017

Did you ever wonder where the web traffic really flows to reach the web servers, and who owns the network equipment it is flowing through? It is possible to get a glimpse of this using traceroute, but it is hard to find all the details. Many years ago, I wrote a system to map the Norwegian Internet, trying to figure out if our plans for a network game service would get low enough latency, and who we needed to talk to about setting up game servers close to the users. Back then I used traceroute output from many locations (I asked my friends to run a script and send me their traceroute output) to create the graph and the map. The output from traceroute typically looks like this:
traceroute to www.stortinget.no (85.88.67.10), 30 hops max, 60 byte packets
 1  uio-gw10.uio.no (129.240.202.1)  0.447 ms  0.486 ms  0.621 ms
 2  uio-gw8.uio.no (129.240.24.229)  0.467 ms  0.578 ms  0.675 ms
 3  oslo-gw1.uninett.no (128.39.65.17)  0.385 ms  0.373 ms  0.358 ms
 4  te3-1-2.br1.fn3.as2116.net (193.156.90.3)  1.174 ms  1.172 ms  1.153 ms
 5  he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.627 ms he16-1-1.cr2.oslosda310.as2116.net (195.0.244.48)  3.172 ms he16-1-1.cr1.san110.as2116.net (195.0.244.234)  2.857 ms
 6  ae1.ar8.oslosda310.as2116.net (195.0.242.39)  0.662 ms  0.637 ms ae0.ar8.oslosda310.as2116.net (195.0.242.23)  0.622 ms
 7  89.191.10.146 (89.191.10.146)  0.931 ms  0.917 ms  0.955 ms
 8  * * *
 9  * * *
[...]
This shows the DNS names and IP addresses of (at least some of) the network equipment involved in getting the data traffic from me to the www.stortinget.no server, and how long it took in milliseconds for a packet to reach the equipment and return to me. Three packets are sent, and sometimes the packets do not follow the same path. This is shown for hop 5, where three different IP addresses replied to the traceroute request.
There are many ways to measure trace routes. Other good traceroute implementations I use are traceroute (using ICMP packets), mtr (which can do ICMP, UDP and TCP) and scapy (a Python library with ICMP, UDP and TCP traceroute and a lot of other capabilities). All of them are easily available in Debian.
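As a taste of the scapy variant, here is a minimal sketch of a TCP traceroute; it needs root privileges to send raw packets, and the target is just the example used in this post:

# Minimal scapy TCP traceroute example; run as root.
from scapy.all import traceroute

# Send TCP SYN probes to port 80 with increasing TTL values.
res, unanswered = traceroute("www.stortinget.no", dport=80, maxttl=20)

# Print the hop table; res.graph() can draw the AS ownership graph
# discussed below (requires graphviz).
res.show()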
This time around, I wanted to know the geographic location of the different route points, to visualize how visiting a web page spreads information about the visit to a lot of servers around the globe. The background is that a web site today often asks the browser to fetch the parts required to display the content (for example HTML, JSON, fonts, JavaScript, CSS and video) from many servers. This leaks information about the visit to those controlling these servers and to anyone able to peek at the data traffic passing by (like your ISP, the ISP's backbone provider, FRA, GCHQ, NSA and others).
Let's pick an example, the Norwegian parliament web site www.stortinget.no. It is read daily by all members of parliament and their staff, as well as political journalists, activists and many other citizens of Norway. A visit to the www.stortinget.no web site will ask your browser to contact 8 other servers: ajax.googleapis.com, insights.hotjar.com, script.hotjar.com, static.hotjar.com, stats.g.doubleclick.net, www.google-analytics.com, www.googletagmanager.com and www.netigate.se. I extracted this by asking PhantomJS to visit the Stortinget web page and tell me all the URLs PhantomJS downloaded to render the page (in HAR format, using their netsniff example; I am very grateful to Gorm for showing me how to do this). My goal is to visualize network traces to all IP addresses behind these DNS names, to show where visitors' personal information is spread when visiting the page.
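Given the HAR dump from PhantomJS, extracting the set of contacted hosts is a few lines of Python. A minimal sketch, assuming the HAR file name; the log.entries[].request.url structure is standard HAR:

# Collect the DNS names contacted while rendering the page.
import json
from urllib.parse import urlparse

with open("www.stortinget.no.har") as f:
    har = json.load(f)

hosts = sorted({urlparse(e["request"]["url"]).netloc
                for e in har["log"]["entries"]})
for host in hosts:
    print(host)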
When I had a look around for options, I could not find any good free software tools to do this, and decided I needed my own traceroute wrapper outputting KML based on locations looked up using GeoIP. KML is easy to work with and easy to generate, and understood by several of the GIS tools I have available. I got good help from my NUUG colleague Anders Einar with this, and the result can be seen in my kmltraceroute git repository. Unfortunately, the quality of the free GeoIP databases I could find (and the for-pay databases my friends had access to) is not up to the task. The IP addresses of central Internet infrastructure would typically be placed near the controlling company's main office, and not where the router is really located, as you can see from the KML file I created using the GeoLite City dataset from MaxMind.
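The core idea of such a wrapper is small enough to sketch here. This illustration uses the geoip2 Python module and the current GeoLite2 City database rather than the legacy GeoLite City files, and a hard-coded hop list; kmltraceroute itself works differently:

# Look up each traceroute hop with GeoIP and emit KML placemarks.
import geoip2.database

hops = ["129.240.202.1", "128.39.65.17", "193.156.90.3"]  # example hops

reader = geoip2.database.Reader("GeoLite2-City.mmdb")
print('<?xml version="1.0" encoding="UTF-8"?>')
print('<kml xmlns="http://www.opengis.net/kml/2.2"><Document>')
for ip in hops:
    location = reader.city(ip).location
    if location.latitude is None:
        continue  # no coordinates known for this address
    print('<Placemark><name>%s</name>' % ip)
    # KML wants longitude first, then latitude.
    print('<Point><coordinates>%f,%f</coordinates></Point>'
          % (location.longitude, location.latitude))
    print('</Placemark>')
print('</Document></kml>')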
I also had a look at the visual traceroute graph created by the scapy project, showing IP network ownership (aka AS owner) for the IP addresses in question. The graph displays a lot of useful information about the traceroute in SVG format, and gives a good indication of who controls the network equipment involved, but it does not include geolocation. The graph makes it possible to see that the information is made available to at least UNINETT, Catchcom, Stortinget, Nordunet, Google, Amazon, Telia, Level 3 Communications and NetDNA.
In the process, I came across the web service GeoTraceroute by Salim Gasmi. Its methodology of combining guesses based on DNS names, various location databases, and finally using latency times to rule out candidate locations seemed to do a very good job of guessing the correct geolocation. But it could only do one trace at a time, did not have a sensor in Norway, and did not make the geolocations easily available for postprocessing. So I contacted the developer and asked if he would be willing to share the code (he declined until he has had time to clean it up), but he was interested in providing the geolocations in a machine readable format, and willing to set up a sensor in Norway. So since yesterday, it is possible to run traces from Norway in this service thanks to a sensor node set up by the NUUG association, and to get the trace in KML format for further processing.
In the resulting KML trace we can see that a lot of the traffic passes Sweden on its way to Denmark, Germany, Holland and Ireland. Plenty of places where the Snowden revelations confirmed the traffic is read by various actors without your best interest as their top priority.
Combining KML files is trivial using a text editor, so I could loop over all the hosts behind the URLs imported by www.stortinget.no, ask for the KML file from GeoTraceroute, and create a combined KML file with all the traces (unfortunately, only one of the IP addresses behind each DNS name is traced this time; to get them all, one would have to request traces using IP numbers instead of DNS names from GeoTraceroute). That might be the next step in this project.
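Scripting the merge is just as easy as doing it in a text editor. A sketch that moves every Placemark from a set of per-host traces (file names assumed for illustration) into one Document:

# Merge the Placemark elements from several KML files into one document.
import xml.etree.ElementTree as ET

NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", NS)

kml = ET.Element("{%s}kml" % NS)
document = ET.SubElement(kml, "{%s}Document" % NS)

for name in ["trace-ajax.googleapis.com.kml", "trace-www.netigate.se.kml"]:
    for placemark in ET.parse(name).iter("{%s}Placemark" % NS):
        document.append(placemark)

ET.ElementTree(kml).write("combined.kml", encoding="utf-8",
                          xml_declaration=True)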
Armed with these tools, I find it a lot easier to figure out where the IP traffic moves and who controls the boxes involved in moving it. And every time the link crosses, for example, the Swedish border, we can be sure Swedish Signals Intelligence (FRA) is listening, as GCHQ does in Britain and the NSA in the USA and on cables around the globe. (Hm, what should we tell them? :) Keep that in mind if you ever send anything unencrypted over the Internet.
PS: The KML files are drawn using the KML viewer from Ivan Rublev, as it was less cluttered than the local Linux application Marble. There are heaps of other options too.
1st November 2017

If you care about how fault tolerant your storage is, you might find these articles and papers interesting. They have shaped how I think when designing a storage system.

- USENIX ;login: Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions by Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
- ZDNet: Why RAID 5 stops working in 2009 by Robin Harris

- ZDNet: Why RAID 6 stops working in 2019 by Robin Harris

- USENIX FAST'07: Failure Trends in a Large Disk Drive Population by Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso

- USENIX ;login: Data Integrity: Finding Truth in a World of Guesses and Lies by Doug Hughes

- USENIX FAST'08: An Analysis of Data Corruption in the Storage Stack by L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau

- USENIX FAST'07: Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? by B. Schroeder and G. A. Gibson

- USENIX ;login: Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics by Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky

- SIGMETRICS 2007: An analysis of latent sector errors in disk drives by L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler
Several of these research papers are based on data collected from hundreds of thousands or millions of disks, and their findings are eye-opening. The short story is: simply do not implicitly trust RAID or redundant storage systems. Details matter. And unfortunately there are few options on Linux addressing all the identified issues. Both ZFS and Btrfs do a fairly good job, but have legal and practical issues of their own. I wonder how cluster file systems like Ceph do in this regard. After all, there is an old saying: you know you have a distributed system when the crash of a computer you have never heard of stops you from getting any work done. The same holds true if fault tolerance does not work.
Just remember, in the end, it does not matter how redundant or how fault tolerant your storage is, if you do not continuously monitor its status to detect and replace failed disks.
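What such monitoring can look like in its simplest form: a sketch that reads /proc/mdstat on a machine with Linux software RAID and flags arrays with missing members (shown as '_' in the [UU] status field). Real setups would rather use mdadm --monitor, smartd or a monitoring system, but the principle is the same:

# Flag degraded Linux software RAID (md) arrays via /proc/mdstat.
import re

with open("/proc/mdstat") as f:
    mdstat = f.read()

degraded = []
for array, status in re.findall(r"^(md\d+)\b.*?\[([U_]+)\]",
                                mdstat, re.M | re.S):
    if "_" in status:  # "_" marks a missing or failed member
        degraded.append(array)

if degraded:
    print("degraded arrays:", ", ".join(degraded))
else:
    print("all md arrays healthy")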
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.