<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<title>Petter Reinholdtsen - Entries from April 2023</title>
<description>Entries from April 2023</description>
<link>https://www.hungry.com/~pere/blog/</link>
<title>Speech to text, she APTly whispered, how hard can it be?</title>
<link>https://www.hungry.com/~pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</link>
<guid isPermaLink="true">https://www.hungry.com/~pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</guid>
<pubDate>Sun, 23 Apr 2023 09:40:00 +0200</pubDate>
<description><p>While visiting a convention during Easter, it occurred to me that
it would be great if I could have a digital Dictaphone with
transcribing capabilities, providing me with texts to cut-n-paste into
stuff I need to write.  The background is that long drives often bring
up the urge to work on texts I am writing, which of course is out
of the question while driving.  With the release of
<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
seems to be within reach with Free Software, so I decided to give it a
go.  OpenAI Whisper is a Linux based neural network system that reads in
audio files and provides a text representation of the speech in the
audio recording.  It handles multiple languages and, according to its
creators, can even translate into a language different from the spoken
one.  I have not tested the latter feature.  It can use either the CPU
or a GPU with CUDA support.  As far as I can tell, CUDA in practice
limits that feature to NVidia graphics cards.  I have few of those, as
they do not work great with free software drivers, and have not tested
the GPU option.  While looking into the matter, I did discover some
work to provide CUDA support on non-NVidia GPUs, and some work on
porting the library used by Whisper to other GPUs, but have not
spent much time looking into GPU support yet.  I've so far used an old
X220 laptop as my test machine, and only transcribed using its CPU.</p>
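<p>For readers who want to try the Python API directly, here is a minimal
sketch; <tt>load_model</tt> and <tt>transcribe</tt> are the
upstream-documented entry points, while the file name, model name, and
helper function are placeholders of my own:</p>

```python
# Sketch of transcribing a recording with the openai-whisper Python API.
# Nothing runs until the function is called, so the openai-whisper
# package is only needed at call time.
def transcribe_file(path, model_name="small"):
    import whisper  # provided by the openai-whisper package
    # Loads the named model, fetching it to ~/.cache/whisper/ unless
    # it is already available locally.
    model = whisper.load_model(model_name)
    # transcribe() returns a dict; the "text" key holds the transcript.
    return model.transcribe(path)["text"]
```

<p>Calling <tt>transcribe_file("recording.ogg")</tt> would then return
the transcript as one string.</p>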
<p>As it is unthinkable from a privacy standpoint to use computers
under the control of someone else (aka a "cloud" service) to transcribe
one's thoughts and personal notes, I want to run the transcribing
system locally on my own computers.  The only sensible approach to me
is to make the effort I put into this available for any Linux user and
to upload the needed packages into Debian.  Looking at Debian Bookworm, I
discovered that only three packages were missing,
<a href="https://bugs.debian.org/1034307">tiktoken</a>,
<a href="https://bugs.debian.org/1034144">triton</a>, and
<a href="https://bugs.debian.org/1034091">openai-whisper</a>.  For a while
<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
also on the list, but as its
<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
seems to have vanished</a> I found it safer
<a href="https://github.com/openai/whisper/pull/1242">to rewrite
whisper</a> to stop depending on it than to introduce ffmpeg-python
into Debian.  I decided to place these packages under the umbrella of
<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
Learning Team</a>, which seems like the best team to look after such
packages.  Discussing the topic within the group also made me aware
that the triton package was already planned as a dependency of newer
versions of the torch package, and would be needed after Bookworm is
released.</p>
<p>All the required code packages have now been waiting in
<a href="https://ftp-master.debian.org/new.html">the Debian NEW
queue</a> since Wednesday, heading for Debian Experimental until
Bookworm is released.  An unsolved issue is how to handle the neural
network models used by Whisper.  The default behaviour of Whisper is
to require Internet connectivity and download the requested model to
<tt>~/.cache/whisper/</tt> on first invocation.  This would obviously
fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
deserted island test of free software</a>, as the Debian packages would
be unusable for someone stranded with only the Debian archive and a solar
powered computer on a deserted island.</p>
<p>Because of this, I would love to include the models in the Debian
mirror system.  This is problematic, as the models are very large
files, which would put a heavy strain on the Debian mirror
infrastructure around the globe.  The strain would be even higher if
the models changed often, which luckily, as far as I can tell, they do
not.  The small model, which according to its creator is most useful
for English and in my experience is not doing a great job there
either, is 462 MiB (deb is 414 MiB).  The medium model, which to me
seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB)
and the large model is 2.9 GiB (deb is 2.6 GiB).  I would assume
everyone with enough resources would prefer to use the large model for
the highest quality.  I believe the models themselves would have to go
into the non-free part of the Debian archive, as they do not really
include any useful source code for updating the models.  The
"source", aka the model training set, according to the creators
consists of "680,000 hours of multilingual and multitask supervised
data collected from the web", which to me reads as material with
unknown copyright terms that is unavailable to the general public.  In
other words, the source is not available according to the Debian Free
Software Guidelines and the models should be considered non-free.</p>
<p>I asked the Debian FTP masters for advice regarding uploading a
model package on their IRC channel, and based on the feedback there it
is still unclear to me if such a package would be accepted into the
archive.  In any case I wrote build rules for an
<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
Whisper model package</a> and
<a href="https://github.com/openai/whisper/pull/1257">modified the
Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
to be able to use these model packages and prepare for such a
possibility.  One solution might be to include only one of the models
(small or medium, I guess) in the Debian archive, and ask people to
download the others from the Internet.  Not quite sure what to do
here; advice is most welcome (use the debian-ai mailing list).</p>
<p>To make it easier to test the new packages while I wait for them to
clear the NEW queue, I created an APT source targeting Bookworm.  I
selected Bookworm instead of Bullseye, even though I know the latter
would reach more users, because some of the required dependencies are
missing from Bullseye and during this phase of testing I did not want
to backport a lot of packages just to get up and running.</p>
<p>Here is a recipe to run as user root if you want to test OpenAI
Whisper using Debian packages on your Debian Bookworm installation,
first adding the APT repository GPG key to the list of trusted keys,
then setting up the APT repository and finally installing the packages
and one of the models:</p>
<p><pre>
curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
mkdir -p /etc/apt/sources.list.d
cat > /etc/apt/sources.list.d/pere-whisper.list &lt;&lt;EOF
deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
EOF
apt update
apt install openai-whisper
</pre></p>
<p>The packages work for me, but have not yet been tested on any other
computer than my own.  With them, I have been able to (badly) transcribe
a 2 minute 40 second Norwegian audio clip to test using the small
model.  This took 11 minutes and around 2.2 GiB of RAM.  Transcribing
the same file with the medium model gave an accurate text in 77 minutes
using around 5.2 GiB of RAM.  My test machine had too little memory to
test the large model, which I believe requires 11 GiB of RAM.  In
short, this now works for me using Debian packages, and I hope it will
for you and everyone else once the packages enter Debian.</p>
<p>Now I can start on the audio recording part of this project.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
<title>rtlsdr-scanner, software defined radio frequency scanner for Linux - nice free software</title>
<link>https://www.hungry.com/~pere/blog/rtlsdr_scanner__software_defined_radio_frequency_scanner_for_Linux____nice_free_software.html</link>
<guid isPermaLink="true">https://www.hungry.com/~pere/blog/rtlsdr_scanner__software_defined_radio_frequency_scanner_for_Linux____nice_free_software.html</guid>
<pubDate>Fri, 7 Apr 2023 23:10:00 +0200</pubDate>
<description><p>Today I finally found time to track down a useful radio frequency
scanner for my software defined radio.  Just for fun I tried to locate
the radios used in the area, and a good start would be to scan all
the frequencies to see what is in use.  I've tried to find a useful
program earlier, but ran out of time before I managed to find one.
This time I was more successful, and after a few false leads I
found a description of
<a href="https://www.kali.org/tools/rtlsdr-scanner/">rtlsdr-scanner
over at the Kali site</a>, and was able to track down
<a href="https://gitlab.com/kalilinux/packages/rtlsdr-scanner.git">the
Kali package git repository</a> to build a deb package for the
scanner.  Sadly the package is missing from the Debian project itself,
at least in Debian Bullseye.  Two runtime dependencies,
<a href="https://gitlab.com/kalilinux/packages/python-visvis.git">python-visvis</a>
and
<a href="https://gitlab.com/kalilinux/packages/python-rtlsdr.git">python-rtlsdr</a>,
had to be built and installed separately.  Luckily '<tt>gbp
buildpackage</tt>' handled them just fine and no further packages had
to be manually built.  The end result worked out of the box after
installation.</p>
<p>My initial scans for FM channels worked just fine, so I knew the
scanner was functioning.  But when I tried to scan every frequency
from 100 to 1000 MHz, the program stopped unexpectedly near
completion.  After some debugging I discovered that the USB software
radio I used rejected frequencies above 948 MHz, triggering an
unreported exception that broke the scan.  Changing the scan to end at
957 worked better.  I similarly found the lower limit to be around 15,
and ended up with the following full scan:</p>
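<p>One way to avoid that kind of mid-scan crash is to clamp the
requested range to what the tuner hardware accepts before starting.  A
small sketch using the limits observed above; the function itself is
hypothetical and not part of rtlsdr-scanner:</p>

```python
def clamp_scan_range(start_mhz, stop_mhz, tuner_min=15, tuner_max=948):
    """Clamp a requested scan range (in MHz) to the tuner's limits
    instead of letting an out-of-range tune crash the scan."""
    if start_mhz > stop_mhz:
        raise ValueError("scan start must not exceed scan stop")
    start = max(start_mhz, tuner_min)
    stop = min(stop_mhz, tuner_max)
    if start > stop:
        raise ValueError("requested range is entirely outside tuner limits")
    return start, stop
```

<p>A 100-1000 MHz request would then quietly become 100-948 MHz
rather than dying near the end.</p>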
<p><a href="https://people.skolelinux.org/pere/blog/images/2023-04-07-radio-freq-scanning.png"><img src="https://people.skolelinux.org/pere/blog/images/2023-04-07-radio-freq-scanning.png" width="100%"></a></p>
<p>Saving the scan did not work, but exporting it as a CSV file worked
just fine.  I ended up with around 477k CSV lines with the dB level for
the given frequency.</p>
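<p>Such an export is easy to post-process.  A sketch assuming a
two-column <tt>frequency,dB</tt> layout; the exact column layout of
rtlsdr-scanner's CSV export is an assumption here, as is the helper
function:</p>

```python
import csv
import io

def strongest_frequency(csv_text):
    """Return the (frequency, dB) pair with the highest level from a
    two-column frequency,dB CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [(float(freq), float(db)) for freq, db in reader]
    return max(rows, key=lambda row: row[1])
```

<p>Feeding it a few lines of the export would pick out the loudest
signal, e.g. a nearby FM channel.</p>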
<p>The save failure seems to be a missing UTF-8 encoding issue in the
python code.  Will see if I can find time to send a patch
<a href="https://github.com/CdeMills/RTLSDR-Scanner/">upstream</a>
later to fix this exception:</p>
<p><pre>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rtlsdr_scanner/main_window.py", line 485, in __on_save
    save_plot(fullName, self.scanInfo, self.spectrum, self.locations)
  File "/usr/lib/python3/dist-packages/rtlsdr_scanner/file.py", line 408, in save_plot
    handle.write(json.dumps(data, indent=4))
TypeError: a bytes-like object is required, not 'str'
</pre></p>
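<p>The TypeError indicates the file handle was opened in binary mode
while <tt>json.dumps()</tt> returns a <tt>str</tt>.  A minimal
reproduction and one possible fix, encoding explicitly before writing;
this is a sketch of the failure mode, not the actual upstream patch:</p>

```python
import io
import json

data = {"frequency": 98.105, "level_db": -12.3}

# Reproduce the failure: writing a str to a binary handle raises
# TypeError: a bytes-like object is required, not 'str'
failed = False
try:
    io.BytesIO().write(json.dumps(data, indent=4))
except TypeError:
    failed = True

# One possible fix: encode the JSON text as UTF-8 before writing.
buf = io.BytesIO()
buf.write(json.dumps(data, indent=4).encode("utf-8"))
fixed = json.loads(buf.getvalue().decode("utf-8"))
```

<p>The same effect could be had by opening the file in text mode with
an explicit <tt>encoding="utf-8"</tt> instead of binary mode.</p>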
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>