From: Petter Reinholdtsen <pere@hungry.com>
Date: Sun, 23 Apr 2023 07:16:53 +0000 (+0200)
Subject: New post on OpenAI Whisper.
X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/commitdiff_plain/56b07ae0314a05f01cb791d819f2a06feb28f71d

New post on OpenAI Whisper.
---

diff --git a/blog/data/2023-04-23-whisper-apt-debian.txt b/blog/data/2023-04-23-whisper-apt-debian.txt
new file mode 100644
index 0000000000..44163f4f81
--- /dev/null
+++ b/blog/data/2023-04-23-whisper-apt-debian.txt
@@ -0,0 +1,140 @@
+Title: Speech to text, she APTly whispered, how hard can it be?
+Tags: english, debian, multimedia, video
+Date: 2023-04-23 09:40
+
+<p>While visiting a convention during Easter, it occurred to me that
+it would be great if I could have a digital Dictaphone with
+transcribing capabilities, providing me with texts to cut-n-paste into
+stuff I need to write.  The background is that long drives often bring
+up the urge to write on texts I am working on, which of course is out
+of the question while driving.  With the release of
+<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
+seems to be within reach with Free Software, so I decided to give it a
+go.  OpenAI Whisper is a Linux based neural network system to read in
+audio files and provide a text representation of the speech in that
+audio recording.  It handles multiple languages and according to its
+creators can even translate into a different language than the spoken
+one.  I have not tested the latter feature.  It can either use the CPU
+or a GPU with CUDA support.  As far as I can tell, CUDA in practice
+limits that feature to NVidia graphics cards.  I have few of those, as
+they do not work great with free software drivers, and have not tested
+the GPU option.  While looking into the matter, I did discover some
+work to provide CUDA support on non-NVidia GPUs, and some work with
+the library used by Whisper to port it to other GPUs, but have not
+spent much time looking into GPU support yet.  I've so far used an old
+X220 laptop as my test machine, and only transcribed using its
+CPU.</p>
+
+<p>As it is unthinkable from a privacy standpoint to use computers
+under control of someone else (aka a "cloud" service) to transcribe
+one's thoughts and personal notes, I want to run the transcribing
+system locally on my own computers.  The only sensible approach to me
+is to make the effort I put into this available for any Linux user and
+to upload the needed packages into Debian.  Looking at Debian Bookworm, I
+discovered that only three packages were missing,
+<a href="https://bugs.debian.org/1034307">tiktoken</a>,
+<a href="https://bugs.debian.org/1034144">triton</a>, and
+<a href="https://bugs.debian.org/1034091">openai-whisper</a>.  For a while
+I also believed
+<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
+needed, but as its
+<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
+seems to have vanished</a> I found it safer
+<a href="https://github.com/openai/whisper/pull/1242">to rewrite
+whisper</a> to stop depending on it than to introduce ffmpeg-python
+into Debian.  I decided to place these packages under the umbrella of
+<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
+Learning Team</a>, which seems like the best team to look after such
+packages.  Discussing the topic within the group also made me aware
+that the triton package was already a future dependency of newer
+versions of the torch package being planned, and would be needed after
+Bookworm is released.</p>
+
+<p>All required code packages have now been waiting in
+<a href="https://ftp-master.debian.org/new.html">the Debian NEW
+queue</a> since Wednesday, heading for Debian Experimental until
+Bookworm is released.  An unsolved issue is how to handle the neural
+network models used by Whisper.  The default behaviour of Whisper is
+to require Internet connectivity and download the model requested to
+<tt>~/.cache/whisper/</tt> on first invocation.  This obviously would
+fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
+deserted island test of free software</a> as the Debian packages would
+be unusable for someone stranded with only the Debian archive and a solar
+powered computer on a deserted island.</p>
+
+<p>Because of this, I would love to include the models in the Debian
+mirror system.  This is problematic, as the models are very large
+files, which would put a heavy strain on the Debian mirror
+infrastructure around the globe.  The strain would be even higher if
+the models change often, which luckily as far as I can tell they do
+not.  The small model, which according to its creator is most useful
+for English and in my experience is not doing a great job there
+either, is 462 MiB (deb is 414 MiB).  The medium model, which to me
+seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB)
+and the large model is 2.9 GiB (deb is 2.6 GiB).  I would assume
+everyone with enough resources would prefer to use the large model for
+highest quality.  I believe the models themselves would have to go
+into the non-free part of the Debian archive, as they do not really
+include any useful source code for updating the models.  The
+"source", aka the model training set, according to the creators
+consists of "680,000 hours of multilingual and multitask supervised
+data collected from the web", which to me reads as material both with
+unknown copyright terms and unavailable to the general public.  In other
+words, the source is not available according to the Debian Free
+Software Guidelines and the model should be considered non-free.</p>
+
+<p>I asked the Debian FTP masters for advice regarding uploading a
+model package on their IRC channel, and based on the feedback there it
+is still unclear to me if such a package would be accepted into the
+archive.  In any case I wrote build rules for an
+<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
+Whisper model package</a> and
+<a href="https://github.com/openai/whisper/pull/1257">modified the
+Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
+<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>
+to be able to use these model packages, to prepare for such a
+possibility.  One solution might be to include only one of the models
+(small or medium, I guess) in the Debian archive, and ask people to
+download the others from the Internet.  Not quite sure what to do
+here, and advice is most welcome (use the debian-ai mailing list).</p>
+
+<p>To make it easier to test the new packages while I wait for them to
+clear the NEW queue, I created an APT source targeting Bookworm.  The
+reason I selected Bookworm instead of Bullseye, even though I know the
+latter would reach more users, is that some of the required dependencies
+are missing from Bullseye and I did not want to backport a lot of
+packages during this phase of testing just to get up and running.</p>
+
+<p>Here is a recipe to run as user root if you want to test OpenAI
+Whisper using Debian packages on your Debian Bookworm installation,
+first adding the APT repository GPG key to the list of trusted keys,
+then setting up the APT repository and finally installing the packages
+and one of the models:</p>
+
+<p><pre>
+curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
+  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
+mkdir -p /etc/apt/sources.list.d
+cat > /etc/apt/sources.list.d/pere-whisper.list &lt;&lt;EOF
+deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
+deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
+EOF
+apt update
+apt install openai-whisper
+</pre></p>
+
+<p>The package works for me, but has not yet been tested on any other
+computer than my own.  With it, I have been able to (badly) transcribe
+a 2 minute 40 second Norwegian audio clip as a test, using the small
+model.  This took 11 minutes and around 2.2 GiB of RAM.  Transcribing
+the same file with the medium model gave an accurate text in 77 minutes
+using around 5.2 GiB of RAM.  My test machine had too little memory to
+test the large model, which I believe requires 11 GiB of RAM.  In
+short, this now works for me using Debian packages, and I hope it will
+for you and everyone else once the packages enter Debian.</p>
+
+<p>Now I need to start on the audio recording part of this project.</p>
+ 
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>