Title: Speech to text, she APTly whispered, how hard can it be?
Tags: english, debian, multimedia, video
<p>While visiting a convention during Easter, it occurred to me that
it would be great if I could have a digital Dictaphone with
transcribing capabilities, providing me with texts to cut-n-paste into
stuff I need to write. The background is that long drives often bring
up the urge to work on texts I am writing, which of course is out
of the question while driving. With the release of
<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
seems to be within reach with Free Software, so I decided to give it a
go. OpenAI Whisper is a Linux based neural network system that reads in
audio files and provides a text representation of the speech in that
audio recording. It handles multiple languages and, according to its
creators, can even translate into a different language than the spoken
one. I have not tested the latter feature. It can use either the CPU
or a GPU with CUDA support. As far as I can tell, CUDA in practice
limits that feature to NVidia graphics cards. I have few of those, as
they do not work great with free software drivers, and have not tested
the GPU option. While looking into the matter, I did discover some
work to provide CUDA support on non-NVidia GPUs, and some work on
porting the library used by Whisper to other GPUs, but have not
spent much time looking into GPU support yet. I have so far used an old
X220 laptop as my test machine, and only transcribed using its CPU.</p>
<p>As it is unthinkable from a privacy standpoint to use computers
under the control of someone else (aka a "cloud" service) to transcribe
one's thoughts and personal notes, I want to run the transcribing
system locally on my own computers. The only sensible approach to me
is to make the effort I put into this available for any Linux user and
to upload the needed packages into Debian. Looking at Debian Bookworm, I
discovered that only three packages were missing:
<a href="https://bugs.debian.org/1034307">tiktoken</a>,
<a href="https://bugs.debian.org/1034144">triton</a>, and
<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a while
<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
also needed, but as its
<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
seems to have vanished</a>, I found it safer
<a href="https://github.com/openai/whisper/pull/1242">to rewrite
whisper</a> to stop depending on it than to introduce ffmpeg-python
into Debian. I decided to place these packages under the umbrella of
<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
Learning Team</a>, which seems like the best team to look after such
packages. Discussing the topic within the group also made me aware
that the triton package was already planned as a dependency of newer
versions of the torch package, and would be needed after
Bookworm is released.</p>
<p>All required code packages have now been waiting in
<a href="https://ftp-master.debian.org/new.html">the Debian NEW
queue</a> since Wednesday, heading for Debian Experimental until
Bookworm is released. An unsolved issue is how to handle the neural
network models used by Whisper. The default behaviour of Whisper is
to require Internet connectivity and download the requested model to
<tt>~/.cache/whisper/</tt> on first invocation. This obviously would
fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
deserted island test of free software</a>, as the Debian packages would
be unusable for someone stranded with only the Debian archive and a
solar powered computer on a deserted island.</p>
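<p>The default behaviour is easy to observe on a machine with network
access and no cached model; in this sketch, <tt>sample.ogg</tt> is a
placeholder for any audio file you have at hand:</p>

```shell
# First invocation downloads the requested model over the network
# into ~/.cache/whisper/ before transcribing (sample.ogg is just a
# placeholder file name).
whisper --model small sample.ogg

# The model file stays cached for later, offline runs.
ls -lh ~/.cache/whisper/
```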
<p>Because of this, I would love to include the models in the Debian
mirror system. This is problematic, as the models are very large
files, which would put a heavy strain on the Debian mirror
infrastructure around the globe. The strain would be even higher if
the models changed often, which luckily, as far as I can tell, they do
not. The small model, which according to its creator is most useful
for English, and in my experience does not do a great job even there,
is 462 MiB (deb is 414 MiB). The medium model, which to me seems to
handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB),
and the large model is 2.9 GiB (deb is 2.6 GiB). I would assume
everyone with enough resources would prefer to use the large model for
the highest quality. I believe the models themselves would have to go
into the non-free part of the Debian archive, as they do not really
include any useful source code for updating the models. The
"source", aka the model training set, according to the creators
consists of "680,000 hours of multilingual and multitask supervised
data collected from the web", which to me reads as material with
unknown copyright terms, unavailable to the general public. In other
words, the source is not available according to the Debian Free
Software Guidelines and the models should be considered non-free.</p>
<p>I asked the Debian FTP masters for advice regarding uploading a
model package on their IRC channel, and based on the feedback there it
is still unclear to me if such a package would be accepted into the
archive. In any case, I wrote build rules for an
<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
Whisper model package</a> and
<a href="https://github.com/openai/whisper/pull/1257">modified the
Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
to be able to use these model packages and prepare for such a
possibility. One solution might be to include only one of the models
(small or medium, I guess) in the Debian archive, and ask people to
download the others from the Internet. I am not quite sure what to do
here, and advice is most welcome (use the debian-ai mailing list).</p>
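<p>The idea behind that change can be sketched in shell as a simple
lookup order preferring shared system locations over the per-user
cache; note that the directory names below are my illustrative
assumptions, not necessarily the ones used by the actual patch:</p>

```shell
#!/bin/sh
# Sketch of a model lookup preferring shared system files over the
# per-user cache.  The first two directory names are assumptions
# made for illustration.
model=small.pt
for dir in /usr/share/openai-whisper /var/cache/openai-whisper \
        "$HOME/.cache/whisper"; do
    if [ -e "$dir/$model" ]; then
        echo "Found model at $dir/$model"
        break
    fi
done
```

<p>With this order, a model installed from a Debian package would be
picked up first, and the network download would only be needed when no
packaged model is present.</p>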
<p>To make it easier to test the new packages while I wait for them to
clear the NEW queue, I created an APT source targeting Bookworm. I
selected Bookworm instead of Bullseye, even though I know the latter
would reach more users, because some of the required dependencies are
missing from Bullseye and during this phase of testing I did not want
to backport a lot of packages just to get up and running.</p>
108 <p>Here is a recipe to run as user root if you want to test OpenAI
109 Whisper using Debian packages on your Debian Bookworm installation,
110 first adding the APT repository GPG key to the list of trusted keys,
111 then setting up the APT repository and finally installing the packages
112 and one of the models:</p>
<pre>
curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
mkdir -p /etc/apt/sources.list.d
cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
EOF
apt update
apt install openai-whisper
</pre>
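<p>Once installed, transcription can be tested directly from the
command line. A minimal sketch, where the file name is a placeholder
for your own recording:</p>

```shell
# Transcribe a Norwegian recording with the small model; text output
# files (txt, srt, vtt and more) are written to the current directory.
whisper --model small --language Norwegian recording.ogg
```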
<p>The packages work for me, but have not yet been tested on any
computer other than my own. With them, I have been able to (badly)
transcribe a 2 minute 40 second Norwegian audio clip to test, using
the small model. This took 11 minutes and around 2.2 GiB of RAM.
Transcribing the same file with the medium model gave an accurate text
in 77 minutes using around 5.2 GiB of RAM. My test machine had too
little memory to test the large model, which I believe requires 11 GiB
of RAM. In short, this now works for me using Debian packages, and I
hope it will for you and everyone else once the packages enter
Debian.</p>
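<p>If you want to collect similar run time and memory numbers on your
own machine, one way is GNU time; again, the file name is a
placeholder:</p>

```shell
# GNU time's -v flag reports elapsed wall clock time and "Maximum
# resident set size", giving a rough idea of the RAM needed by a
# transcription run.
/usr/bin/time -v whisper --model medium --language Norwegian recording.ogg
```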
<p>Now I can start on the audio recording part of this project.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>