- <title>Norwegian movies that might be legal to share on the Internet</title>
- <link>http://people.skolelinux.org/pere/blog/Norwegian_movies_that_might_be_legal_to_share_on_the_Internet.html</link>
- <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Norwegian_movies_that_might_be_legal_to_share_on_the_Internet.html</guid>
- <pubDate>Sun, 1 Sep 2019 11:10:00 +0200</pubDate>
- <description><p>While working on identifying and counting movies that can be
-legally shared on the Internet, I also looked at the Norwegian movies
-listed in IMDb. So far I have identified 54 candidates published
-before 1940 that might no longer be protected by Norwegian copyright
-law. Of these, only 29 are available at least in part from the
-Norwegian National Library. It can be assumed that the remaining 25
-movies are lost. It seems most useful to identify the copyright status
-of movies that are not lost. To verify that a movie is really no
-longer protected, one needs to verify the list of copyright holders and
-figure out if and when they died. I've been able to identify some of
-them, but for some it is hard to figure out when they died.</p>
-
-<p>This is the list of 29 movies both available from the library and
-possibly no longer protected by copyright law. The year range
-(1909-1979 on the first line) is year of publication and last year
-with copyright protection.</p>
-
-<pre>
-1909-1979 ( 70 year) NSB Bergensbanen 1909 - http://www.imdb.com/title/tt0347601/
-1910-1980 ( 70 year) Bjørnstjerne Bjørnsons likfærd - http://www.imdb.com/title/tt9299304/
-1910-1980 ( 70 year) Bjørnstjerne Bjørnsons begravelse - http://www.imdb.com/title/tt9299300/
-1912-1998 ( 86 year) Roald Amundsens Sydpolsferd (1910-1912) - http://www.imdb.com/title/tt9237500/
-1913-2006 ( 93 year) Roald Amundsen på sydpolen - http://www.imdb.com/title/tt0347886/
-1917-1987 ( 70 year) Fanden i nøtten - http://www.imdb.com/title/tt0346964/
-1919-2018 ( 99 year) Historien om en gut - http://www.imdb.com/title/tt0010259/
-1920-1990 ( 70 year) Kaksen på Øverland - http://www.imdb.com/title/tt0011361/
-1923-1993 ( 70 year) Norge - en skildring i 6 akter - http://www.imdb.com/title/tt0014319/
-1925-1997 ( 72 year) Roald Amundsen - Ellsworths flyveekspedition 1925 - http://www.imdb.com/title/tt0016295/
-1925-1995 ( 70 year) En verdensreise, eller Da knold og tott vaskede negrene hvite med 13 sæpen - http://www.imdb.com/title/tt1018948/
-1926-1996 ( 70 year) Luftskibet 'Norge's flugt over polhavet - http://www.imdb.com/title/tt0017090/
-1926-1996 ( 70 year) Med 'Maud' over Polhavet - http://www.imdb.com/title/tt0017129/
-1927-1997 ( 70 year) Den store sultan - http://www.imdb.com/title/tt1017997/
-1928-1998 ( 70 year) Noahs ark - http://www.imdb.com/title/tt1018917/
-1928-1998 ( 70 year) Skjæbnen - http://www.imdb.com/title/tt1002652/
-1928-1998 ( 70 year) Chefens cigarett - http://www.imdb.com/title/tt1019896/
-1929-1999 ( 70 year) Se Norge - http://www.imdb.com/title/tt0020378/
-1929-1999 ( 70 year) Fra Chr. Michelsen til Kronprins Olav og Prinsesse Martha - http://www.imdb.com/title/tt0019899/
-1930-2000 ( 70 year) Mot ukjent land - http://www.imdb.com/title/tt0021158/
-1930-2000 ( 70 year) Det er natt - http://www.imdb.com/title/tt1017904/
-1930-2000 ( 70 year) Over Besseggen på motorcykel - http://www.imdb.com/title/tt0347721/
-1931-2001 ( 70 year) Glimt fra New York og den Norske koloni - http://www.imdb.com/title/tt0021913/
-1932-2007 ( 75 year) En glad gutt - http://www.imdb.com/title/tt0022946/
-1934-2004 ( 70 year) Den lystige radio-trio - http://www.imdb.com/title/tt1002628/
-1935-2005 ( 70 year) Kronprinsparets reise i Nord Norge - http://www.imdb.com/title/tt0268411/
-1935-2005 ( 70 year) Stormangrep - http://www.imdb.com/title/tt1017998/
-1936-2006 ( 70 year) En fargesymfoni i blått - http://www.imdb.com/title/tt1002762/
-1939-2009 ( 70 year) Til Vesterheimen - http://www.imdb.com/title/tt0032036/
-</pre>
-
-<p>To be sure which of these can be legally shared on the Internet,
-in addition to verifying that the right holders list is complete, one
-needs to verify the death year of these persons:</p>
-
-<pre>
-Bjørnstjerne Bjørnson (dead 1910) - http://www.imdb.com/name/nm0085085/
-Gustav Adolf Olsen (missing death year) - http://www.imdb.com/name/nm0647652/
-Gustav Lund (missing death year) - http://www.imdb.com/name/nm0526168/
-John W. Brunius (dead 1937) - http://www.imdb.com/name/nm0116307/
-Ola Cornelius (missing death year) - http://www.imdb.com/name/nm1227236/
-Oskar Omdal (dead 1927) - http://www.imdb.com/name/nm3116241/
-Paul Berge (missing death year) - http://www.imdb.com/name/nm0074006/
-Peter Lykke-Seest (dead 1948) - http://www.imdb.com/name/nm0528064/
-Roald Amundsen (dead 1928) - https://www.imdb.com/name/nm0025468/
-Sverre Halvorsen (dead 1936) - http://www.imdb.com/name/nm1299757/
-Thomas W. Schwartz (missing death year) - http://www.imdb.com/name/nm2616250/
-</pre>
-
-<p>Perhaps you can help me figure out the death year of those missing it, or
-right holders if some are missing in IMDb? It would be nice to have a
-definite list of Norwegian movies that are legal to share on the
-Internet.</p>
-
-<p>This is the list of 25 movies not available from the library and
-possibly no longer protected by copyright law:</p>
-
-<pre>
-1907-2009 (102 year) Fiskerlivets farer - http://www.imdb.com/title/tt0121288/
-1912-2018 (106 year) Historien omen moder - http://www.imdb.com/title/tt0382852/
-1912-2002 ( 90 year) Anny - en gatepiges roman - http://www.imdb.com/title/tt0002026/
-1916-1986 ( 70 year) The Mother Who Paid - http://www.imdb.com/title/tt3619226/
-1917-2018 (101 year) En vinternat - http://www.imdb.com/title/tt0008740/
-1917-2018 (101 year) Unge hjerter - http://www.imdb.com/title/tt0008719/
-1917-2018 (101 year) De forældreløse - http://www.imdb.com/title/tt0007972/
-1918-2018 (100 year) Vor tids helte - http://www.imdb.com/title/tt0009769/
-1918-2018 (100 year) Lodsens datter - http://www.imdb.com/title/tt0009314/
-1919-2018 ( 99 year) Æresgjesten - http://www.imdb.com/title/tt0010939/
-1921-2006 ( 85 year) Det nye year? - http://www.imdb.com/title/tt0347686/
-1921-1991 ( 70 year) Under Polarkredsens himmel - http://www.imdb.com/title/tt0012789/
-1923-1993 ( 70 year) Nordenfor polarcirkelen - http://www.imdb.com/title/tt0014318/
-1925-1995 ( 70 year) Med 'Stavangerfjord' til Nordkap - http://www.imdb.com/title/tt0016098/
-1926-1996 ( 70 year) Over Atlanterhavet og gjennem Amerika - http://www.imdb.com/title/tt0017241/
-1926-1996 ( 70 year) Hallo! Amerika! - http://www.imdb.com/title/tt0016945/
-1926-1996 ( 70 year) Tigeren Teodors triumf - http://www.imdb.com/title/tt1008052/
-1927-1997 ( 70 year) Rød sultan - http://www.imdb.com/title/tt1017979/
-1927-1997 ( 70 year) Søndagsfiskeren Flag - http://www.imdb.com/title/tt1018002/
-1930-2000 ( 70 year) Ro-ro til fiskeskjær - http://www.imdb.com/title/tt1017973/
-1933-2003 ( 70 year) I kongens klær - http://www.imdb.com/title/tt0024164/
-1934-2004 ( 70 year) Eventyret om de tre bukkene bruse - http://www.imdb.com/title/tt1007963/
-1934-2004 ( 70 year) Pål sine høner - http://www.imdb.com/title/tt1017966/
-1937-2007 ( 70 year) Et mesterverk - http://www.imdb.com/title/tt1019937/
-1938-2008 ( 70 year) En Harmony - http://www.imdb.com/title/tt1007975/
-</pre>
+ <title>Speech to text, she APTly whispered, how hard can it be?</title>
+ <link>https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</link>
+ <guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</guid>
+ <pubDate>Sun, 23 Apr 2023 09:40:00 +0200</pubDate>
+ <description><p>While visiting a convention during Easter, it occurred to me that
+it would be great if I could have a digital Dictaphone with
+transcribing capabilities, providing me with texts to cut-n-paste into
+stuff I need to write. The background is that long drives often bring
+up the urge to write on texts I am working on, which of course is out
+of the question while driving. With the release of
+<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
+seems to be within reach with Free Software, so I decided to give it a
+go. OpenAI Whisper is a Linux-based neural network system that reads
+audio files and provides a text representation of the speech in the
+recording. It handles multiple languages and, according to its
+creators, can even translate into a different language than the spoken
+one. I have not tested the latter feature. It can use either the CPU
+or a GPU with CUDA support. As far as I can tell, CUDA in practice
+limits that feature to NVidia graphics cards. I have few of those, as
+they do not work great with free software drivers, and have not tested
+the GPU option. While looking into the matter, I did discover some
+work to provide CUDA support on non-NVidia GPUs, and some work with
+the library used by Whisper to port it to other GPUs, but have not
+spent much time looking into GPU support yet. I've so far used an old
+X220 laptop as my test machine, and only transcribed using its
+CPU.</p>
+
+<p>As it is unthinkable from a privacy standpoint to use computers
+under the control of someone else (aka a "cloud" service) to transcribe
+one's thoughts and personal notes, I want to run the transcribing
+system locally on my own computers. The only sensible approach to me
+is to make the effort I put into this available for any Linux user and
+to upload the needed packages into Debian. Looking at Debian Bookworm, I
+discovered that only three packages were missing,
+<a href="https://bugs.debian.org/1034307">tiktoken</a>,
+<a href="https://bugs.debian.org/1034144">triton</a>, and
+<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a while
+I also believed
+<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
+needed, but as its
+<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
+seems to have vanished</a>, I found it safer
+<a href="https://github.com/openai/whisper/pull/1242">to rewrite
+whisper</a> to stop depending on it than to introduce ffmpeg-python
+into Debian. I decided to place these packages under the umbrella of
+<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
+Learning Team</a>, which seems like the best team to look after such
+packages. Discussing the topic within the group also made me aware
+that the triton package was already a future dependency of newer
+versions of the torch package being planned, and would be needed after
+Bookworm is released.</p>
+
+<p>All required code packages have now been waiting in
+<a href="https://ftp-master.debian.org/new.html">the Debian NEW
+queue</a> since Wednesday, heading for Debian Experimental until
+Bookworm is released. An unsolved issue is how to handle the neural
+network models used by Whisper. The default behaviour of Whisper is
+to require Internet connectivity and download the model requested to
+<tt>~/.cache/whisper/</tt> on first invocation. This obviously would
+fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
+deserted island test of free software</a> as the Debian packages would
+be unusable for someone stranded with only the Debian archive and a
+solar-powered computer on a deserted island.</p>
+
+<p>Because of this, I would love to include the models in the Debian
+mirror system. This is problematic, as the models are very large
+files, which would put a heavy strain on the Debian mirror
+infrastructure around the globe. The strain would be even higher if
+the models change often, which luckily as far as I can tell they do
+not. The small model, which according to its creator is most useful
+for English and in my experience is not doing a great job there
+either, is 462 MiB (deb is 414 MiB). The medium model, which to me
+seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB)
+and the large model is 2.9 GiB (deb is 2.6 GiB). I would assume
+everyone with enough resources would prefer to use the large model for
+highest quality. I believe the models themselves would have to go
+into the non-free part of the Debian archive, as they do not really
+include any useful source code for updating the models. The
+"source", aka the model training set, according to the creators
+consists of "680,000 hours of multilingual and multitask supervised
+data collected from the web", which to me reads as material that both
+has unknown copyright terms and is unavailable to the general public. In other
+words, the source is not available according to the Debian Free
+Software Guidelines and the model should be considered non-free.</p>
+
+<p>I asked the Debian FTP masters for advice regarding uploading a
+model package on their IRC channel, and based on the feedback there it
+is still unclear to me if such a package would be accepted into the
+archive. In any case I wrote build rules for an
+<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
+Whisper model package</a> and
+<a href="https://github.com/openai/whisper/pull/1257">modified the
+Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
+<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>
+to be able to use these model packages, to prepare for such a
+possibility. One solution might be to include only one of the models
+(small or medium, I guess) in the Debian archive, and ask people to
+download the others from the Internet. Not quite sure what to do
+here, and advice is most welcome (use the debian-ai mailing list).</p>
+
+<p>To make it easier to test the new packages while I wait for them to
+clear the NEW queue, I created an APT source targeting Bookworm. The
+reason I selected Bookworm instead of Bullseye, even though I know the
+latter would reach more users, is that some of the required
+dependencies are missing from Bullseye, and during this phase of
+testing I did not want to backport a lot of packages just to get up
+and running.</p>
+
+<p>Here is a recipe to run as user root if you want to test OpenAI
+Whisper using Debian packages on your Debian Bookworm installation,
+first adding the APT repository GPG key to the list of trusted keys,
+then setting up the APT repository and finally installing the packages
+and one of the models:</p>