<link>https://people.skolelinux.org/pere/blog/</link>
+ <item>
+ <title>Speech to text, she APTly whispered, how hard can it be?</title>
+ <link>https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</link>
+ <guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</guid>
+ <pubDate>Sun, 23 Apr 2023 09:40:00 +0200</pubDate>
+ <description><p>While visiting a convention during Easter, it
+occurred to me that it would be great if I could have a digital
+Dictaphone with transcribing capabilities, providing me with texts to
+cut-n-paste into stuff I need to write. The background is that long
+drives often bring up the urge to work on texts I am writing, which of
+course is out of the question while driving. With the release of
+<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
+seems to be within reach with Free Software, so I decided to give it a
+go. OpenAI Whisper is a Linux based neural network system that reads
+in audio files and provides a text representation of the speech in the
+audio recording. It handles multiple languages, and according to its
+creators it can even translate into a different language than the
+spoken one. I have not tested the latter feature. It can use either
+the CPU or a GPU with CUDA support. As far as I can tell, CUDA in
+practice limits that feature to NVidia graphics cards. I have few of
+those, as they do not work well with free software drivers, and have
+not tested the GPU option. While looking into the matter, I did
+discover some work to provide CUDA support on non-NVidia GPUs, and
+some work to port the library used by Whisper to other GPUs, but have
+not spent much time looking into GPU support yet. I have so far used
+an old X220 laptop as my test machine, and only transcribed using its
+CPU.</p>
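+
+<p>To give an idea of the usage, here is a minimal sketch of a
+transcription run on the CPU, using the command line tool shipped with
+the package. The file name is just an example:</p>
+
+<p><pre>
+# Transcribe a Norwegian recording with the small model on the CPU.
+# Drop --language to have Whisper guess the spoken language.
+whisper --model small --language no --device cpu recording.ogg
+</pre></p>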
+
+<p>As it is unthinkable from a privacy standpoint to use computers
+under the control of someone else (aka a "cloud" service) to
+transcribe one's thoughts and personal notes, I want to run the
+transcribing system locally on my own computers. The only sensible
+approach to me
+is to make the effort I put into this available for any Linux user and
+to upload the needed packages into Debian. Looking at Debian Bookworm, I
+discovered that only three packages were missing,
+<a href="https://bugs.debian.org/1034307">tiktoken</a>,
+<a href="https://bugs.debian.org/1034144">triton</a>, and
+<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a while
+I also believed
+<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
+needed, but as its
+<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
+seems to have vanished</a>, I found it safer
+<a href="https://github.com/openai/whisper/pull/1242">to rewrite
+whisper</a> to stop depending on it than to introduce ffmpeg-python
+into Debian. I decided to place these packages under the umbrella of
+<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
+Learning Team</a>, which seems like the best team to look after such
+packages. Discussing the topic within the group also made me aware
+that the triton package was already planned as a dependency of newer
+versions of the torch package, and would be needed after Bookworm is
+released.</p>
+
+<p>All the required code packages have now been waiting in
+<a href="https://ftp-master.debian.org/new.html">the Debian NEW
+queue</a> since Wednesday, heading for Debian Experimental until
+Bookworm is released. An unsolved issue is how to handle the neural
+network models used by Whisper. The default behaviour of Whisper is
+to require Internet connectivity and download the model requested to
+<tt>~/.cache/whisper/</tt> on first invocation. This obviously would
+fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
+deserted island test of free software</a> as the Debian packages would
+be unusable for someone stranded on a deserted island with only the
+Debian archive and a solar powered computer.</p>
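+
+<p>A workaround while still connected is to let Whisper populate the
+cache once, so later runs can work offline. A sketch, using the
+default cache location mentioned above and a placeholder file name:</p>
+
+<p><pre>
+# The first invocation downloads the requested model (and transcribes
+# the clip); later runs reuse the copy in the per user cache.
+whisper --model small some-clip.ogg
+ls -lh ~/.cache/whisper/
+</pre></p>
+
+<p>This is of course only a stopgap, and does not make the packages
+pass the test out of the box.</p>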
+
+<p>Because of this, I would love to include the models in the Debian
+mirror system. This is problematic, as the models are very large
+files, which would put a heavy strain on the Debian mirror
+infrastructure around the globe. The strain would be even higher if
+the models change often, which luckily, as far as I can tell, they do
+not. The small model, which according to its creator is most useful
+for English, and which in my experience does not do a great job even
+there, is 462 MiB (deb is 414 MiB). The medium model, which to me
+seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3
+GiB), and the large model is 2.9 GiB (deb is 2.6 GiB). I would assume
+everyone with enough resources would prefer to use the large model for
+the highest quality. I believe the models themselves would have to go
+into the non-free part of the Debian archive, as they do not really
+include any useful source code for updating the models. The
+"source", aka the model training set, according to the creators
+consists of "680,000 hours of multilingual and multitask supervised
+data collected from the web", which to me reads as material with
+unknown copyright terms, unavailable to the general public. In other
+words, the source is not available according to the Debian Free
+Software Guidelines, and the models should be considered non-free.</p>
+
+<p>I asked the Debian FTP masters on their IRC channel for advice
+regarding uploading a model package, and based on the feedback there
+it is still unclear to me if such a package would be accepted into the
+archive. In any case, I wrote build rules for an
+<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
+Whisper model package</a> and
+<a href="https://github.com/openai/whisper/pull/1257">modified the
+Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
+<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
+making it able to use these model packages and preparing for such a
+possibility. One solution might be to include only one of the models
+(small or medium, I guess) in the Debian archive, and ask people to
+download the others from the Internet. I am not quite sure what to do
+here, and advice is most welcome (use the debian-ai mailing list).</p>
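+
+<p>With the modified code base, one can check which copy of a model a
+run would pick up by listing the candidate locations. Note that the
+shared paths below are illustrative only, as the packaged locations
+are not settled yet:</p>
+
+<p><pre>
+# Possible shared locations provided by a model package (illustrative).
+ls -lh /usr/share/whisper/ /var/lib/whisper/ 2>/dev/null
+# The per user cache, used when no shared copy is found.
+ls -lh ~/.cache/whisper/ 2>/dev/null
+</pre></p>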
+
+<p>To make it easier to test the new packages while I wait for them to
+clear the NEW queue, I created an APT source targeting Bookworm. The
+reason I selected Bookworm instead of Bullseye, even though I know the
+latter would reach more users, is that some of the required
+dependencies are missing from Bullseye, and during this phase of
+testing I did not want to backport a lot of packages just to get up
+and running.</p>
+
+<p>Here is a recipe to run as user root if you want to test OpenAI
+Whisper using Debian packages on your Debian Bookworm installation,
+first adding the APT repository GPG key to the list of trusted keys,
+then setting up the APT repository, and finally installing the
+package (a model is fetched on first use, as described above):</p>
+
+<p><pre>
+curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
+ -o /etc/apt/trusted.gpg.d/pere-whisper.asc
+mkdir -p /etc/apt/sources.list.d
+cat > /etc/apt/sources.list.d/pere-whisper.list &lt;&lt;EOF
+deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
+deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
+EOF
+apt update
+apt install openai-whisper
+</pre></p>
+
+<p>The package works for me, but has not yet been tested on any
+computer other than my own. With it, I have been able to (badly)
+transcribe a 2 minute 40 second Norwegian audio clip to test using the
+small model. This took 11 minutes and around 2.2 GiB of RAM.
+Transcribing the same file with the medium model gave an accurate text
+in 77 minutes using around 5.2 GiB of RAM. My test machine had too
+little memory to test the large model, which I believe requires 11 GiB
+of RAM. In short, this now works for me using Debian packages, and I
+hope it will for you and everyone else once the packages enter
+Debian.</p>
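+
+<p>For those who want to repeat these measurements, something along
+the following lines should do. GNU time reports the peak memory use
+as "Maximum resident set size", and the clip name is a placeholder:</p>
+
+<p><pre>
+# Wall clock time and peak memory for a CPU transcription run.
+/usr/bin/time -v whisper --model medium --language no --device cpu clip.ogg
+</pre></p>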
+
+<p>Now I can start on the audio recording part of this project.</p>
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+</description>
+ </item>
+
<item>
<title>rtlsdr-scanner, software defined radio frequency scanner for Linux - nice free software</title>
<link>https://people.skolelinux.org/pere/blog/rtlsdr_scanner__software_defined_radio_frequency_scanner_for_Linux____nice_free_software.html</link>