<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<title>Petter Reinholdtsen - Entries from April 2023</title>
<description>Entries from April 2023</description>
<link>https://www.hungry.com/~pere/blog/</link>
<title>Speech to text, she APTly whispered, how hard can it be?</title>
<link>https://www.hungry.com/~pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</link>
<guid isPermaLink="true">https://www.hungry.com/~pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html</guid>
<pubDate>Sun, 23 Apr 2023 09:40:00 +0200</pubDate>
<description><p>While visiting a convention during Easter, it occurred to me that
it would be great if I could have a digital Dictaphone with
transcribing capabilities, providing me with texts to cut-n-paste into
stuff I need to write.  The background is that long drives often bring
up the urge to work on texts I am writing, which of course is out
of the question while driving.  With the release of
<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
seems to be within reach with Free Software, so I decided to give it a
go.  OpenAI Whisper is a Linux based neural network system that reads in
audio files and provides a text representation of the speech in the
audio recording.  It handles multiple languages and, according to its
creators, can even translate into a language different from the spoken
one.  I have not tested the latter feature.  It can use either the CPU
or a GPU with CUDA support.  As far as I can tell, CUDA in practice
limits that feature to NVidia graphics cards.  I have few of those, as
they do not work great with free software drivers, and have not tested
the GPU option.  While looking into the matter, I did discover some
work to provide CUDA support on non-NVidia GPUs, and some work on
porting the library used by Whisper to other GPUs, but have not
spent much time looking into GPU support yet.  I've so far used an old
X220 laptop as my test machine, and only transcribed using its CPU.</p>
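<p>For readers who want to try the Python API directly, here is a minimal
sketch; <tt>load_model</tt> and <tt>transcribe</tt> are the
upstream-documented entry points, while the file name, model name, and
helper function are placeholders of my own:</p>

```python
# Sketch of transcribing a recording with the openai-whisper Python API.
# Nothing runs until the function is called, so the openai-whisper
# package is only needed at call time.
def transcribe_file(path, model_name="small"):
    import whisper  # provided by the openai-whisper package
    # Loads the named model, fetching it to ~/.cache/whisper/ unless
    # it is already available locally.
    model = whisper.load_model(model_name)
    # transcribe() returns a dict; the "text" key holds the transcript.
    return model.transcribe(path)["text"]
```

<p>Calling <tt>transcribe_file("recording.ogg")</tt> would then return
the transcript as one string.</p>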
<p>As it is unthinkable from a privacy standpoint to use computers
under the control of someone else (aka a "cloud" service) to transcribe
one's thoughts and personal notes, I want to run the transcribing
system locally on my own computers.  The only sensible approach to me
is to make the effort I put into this available for any Linux user and
to upload the needed packages into Debian.  Looking at Debian Bookworm, I
discovered that only three packages were missing,
<a href="https://bugs.debian.org/1034307">tiktoken</a>,
<a href="https://bugs.debian.org/1034144">triton</a>, and
<a href="https://bugs.debian.org/1034091">openai-whisper</a>.  For a while
<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
also on the list, but as its
<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
seems to have vanished</a> I found it safer
<a href="https://github.com/openai/whisper/pull/1242">to rewrite
whisper</a> to stop depending on it than to introduce ffmpeg-python
into Debian.  I decided to place these packages under the umbrella of
<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
Learning Team</a>, which seems like the best team to look after such
packages.  Discussing the topic within the group also made me aware
that the triton package was already planned as a dependency of newer
versions of the torch package, and would be needed after Bookworm is
released.</p>
<p>All the required code packages have now been waiting in
<a href="https://ftp-master.debian.org/new.html">the Debian NEW
queue</a> since Wednesday, heading for Debian Experimental until
Bookworm is released.  An unsolved issue is how to handle the neural
network models used by Whisper.  The default behaviour of Whisper is
to require Internet connectivity and download the requested model to
<tt>~/.cache/whisper/</tt> on first invocation.  This would obviously
fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
deserted island test of free software</a>, as the Debian packages would
be unusable for someone stranded with only the Debian archive and a solar
powered computer on a deserted island.</p>
<p>Because of this, I would love to include the models in the Debian
mirror system.  This is problematic, as the models are very large
files, which would put a heavy strain on the Debian mirror
infrastructure around the globe.  The strain would be even higher if
the models changed often, which luckily, as far as I can tell, they do
not.  The small model, which according to its creator is most useful
for English and in my experience is not doing a great job there
either, is 462 MiB (deb is 414 MiB).  The medium model, which to me
seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB)
and the large model is 2.9 GiB (deb is 2.6 GiB).  I would assume
everyone with enough resources would prefer to use the large model for
the highest quality.  I believe the models themselves would have to go
into the non-free part of the Debian archive, as they do not really
include any useful source code for updating the models.  The
"source", aka the model training set, according to the creators
consists of "680,000 hours of multilingual and multitask supervised
data collected from the web", which to me reads as material with
unknown copyright terms that is unavailable to the general public.  In
other words, the source is not available according to the Debian Free
Software Guidelines and the models should be considered non-free.</p>
<p>I asked the Debian FTP masters for advice regarding uploading a
model package on their IRC channel, and based on the feedback there it
is still unclear to me if such a package would be accepted into the
archive.  In any case I wrote build rules for an
<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
Whisper model package</a> and
<a href="https://github.com/openai/whisper/pull/1257">modified the
Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
to be able to use these model packages and prepare for such a
possibility.  One solution might be to include only one of the models
(small or medium, I guess) in the Debian archive, and ask people to
download the others from the Internet.  Not quite sure what to do
here; advice is most welcome (use the debian-ai mailing list).</p>
<p>To make it easier to test the new packages while I wait for them to
clear the NEW queue, I created an APT source targeting Bookworm.  I
selected Bookworm instead of Bullseye, even though I know the latter
would reach more users, because some of the required dependencies are
missing from Bullseye and during this phase of testing I did not want
to backport a lot of packages just to get up and running.</p>
<p>Here is a recipe to run as user root if you want to test OpenAI
Whisper using Debian packages on your Debian Bookworm installation,
first adding the APT repository GPG key to the list of trusted keys,
then setting up the APT repository and finally installing the packages
and one of the models:</p>
<p><pre>
curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
mkdir -p /etc/apt/sources.list.d
cat > /etc/apt/sources.list.d/pere-whisper.list &lt;&lt;EOF
deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
EOF
apt update
apt install openai-whisper
</pre></p>
<p>The packages work for me, but have not yet been tested on any other
computer than my own.  With them, I have been able to (badly) transcribe
a 2 minute 40 second Norwegian audio clip to test using the small
model.  This took 11 minutes and around 2.2 GiB of RAM.  Transcribing
the same file with the medium model gave an accurate text in 77 minutes
using around 5.2 GiB of RAM.  My test machine had too little memory to
test the large model, which I believe requires 11 GiB of RAM.  In
short, this now works for me using Debian packages, and I hope it will
for you and everyone else once the packages enter Debian.</p>
<p>Now I can start on the audio recording part of this project.</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
<title>rtlsdr-scanner, software defined radio frequency scanner for Linux - nice free software</title>
<link>https://www.hungry.com/~pere/blog/rtlsdr_scanner__software_defined_radio_frequency_scanner_for_Linux____nice_free_software.html</link>
<guid isPermaLink="true">https://www.hungry.com/~pere/blog/rtlsdr_scanner__software_defined_radio_frequency_scanner_for_Linux____nice_free_software.html</guid>
<pubDate>Fri, 7 Apr 2023 23:10:00 +0200</pubDate>
<description><p>Today I finally found time to track down a useful radio frequency
scanner for my software defined radio.  Just for fun I tried to locate
the radios used in the area, and a good start would be to scan all
the frequencies to see what is in use.  I've tried to find a useful
program earlier, but ran out of time before I managed to find one.
This time I was more successful, and after a few false leads I
found a description of
<a href="https://www.kali.org/tools/rtlsdr-scanner/">rtlsdr-scanner
over at the Kali site</a>, and was able to track down
<a href="https://gitlab.com/kalilinux/packages/rtlsdr-scanner.git">the
Kali package git repository</a> to build a deb package for the
scanner.  Sadly the package is missing from the Debian project itself,
at least in Debian Bullseye.  Two runtime dependencies,
<a href="https://gitlab.com/kalilinux/packages/python-visvis.git">python-visvis</a>
and
<a href="https://gitlab.com/kalilinux/packages/python-rtlsdr.git">python-rtlsdr</a>,
had to be built and installed separately.  Luckily '<tt>gbp
buildpackage</tt>' handled them just fine and no further packages had
to be manually built.  The end result worked out of the box after
installation.</p>
<p>My initial scans for FM channels worked just fine, so I knew the
scanner was functioning.  But when I tried to scan every frequency
from 100 to 1000 MHz, the program stopped unexpectedly near
completion.  After some debugging I discovered that the USB software
radio I used rejected frequencies above 948 MHz, triggering an
unreported exception that broke the scan.  Changing the scan to end at
957 worked better.  I similarly found the lower limit to be around 15,
and ended up with the following full scan:</p>
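<p>One way to avoid that kind of mid-scan crash is to clamp the
requested range to what the tuner hardware accepts before starting.  A
small sketch using the limits observed above; the function itself is
hypothetical and not part of rtlsdr-scanner:</p>

```python
def clamp_scan_range(start_mhz, stop_mhz, tuner_min=15, tuner_max=948):
    """Clamp a requested scan range (in MHz) to the tuner's limits
    instead of letting an out-of-range tune crash the scan."""
    if start_mhz > stop_mhz:
        raise ValueError("scan start must not exceed scan stop")
    start = max(start_mhz, tuner_min)
    stop = min(stop_mhz, tuner_max)
    if start > stop:
        raise ValueError("requested range is entirely outside tuner limits")
    return start, stop
```

<p>A 100-1000 MHz request would then quietly become 100-948 MHz
rather than dying near the end.</p>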
<p><a href="https://people.skolelinux.org/pere/blog/images/2023-04-07-radio-freq-scanning.png"><img src="https://people.skolelinux.org/pere/blog/images/2023-04-07-radio-freq-scanning.png" width="100%"></a></p>
<p>Saving the scan did not work, but exporting it as a CSV file worked
just fine.  I ended up with around 477k CSV lines with the dB level for
the given frequency.</p>
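<p>Such an export is easy to post-process.  A sketch assuming a
two-column <tt>frequency,dB</tt> layout; the exact column layout of
rtlsdr-scanner's CSV export is an assumption here, as is the helper
function:</p>

```python
import csv
import io

def strongest_frequency(csv_text):
    """Return the (frequency, dB) pair with the highest level from a
    two-column frequency,dB CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [(float(freq), float(db)) for freq, db in reader]
    return max(rows, key=lambda row: row[1])
```

<p>Feeding it a few lines of the export would pick out the loudest
signal, e.g. a nearby FM channel.</p>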
<p>The save failure seems to be a missing UTF-8 encoding issue in the
python code.  Will see if I can find time to send a patch
<a href="https://github.com/CdeMills/RTLSDR-Scanner/">upstream</a>
later to fix this exception:</p>
<p><pre>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rtlsdr_scanner/main_window.py", line 485, in __on_save
    save_plot(fullName, self.scanInfo, self.spectrum, self.locations)
  File "/usr/lib/python3/dist-packages/rtlsdr_scanner/file.py", line 408, in save_plot
    handle.write(json.dumps(data, indent=4))
TypeError: a bytes-like object is required, not 'str'
</pre></p>
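<p>The TypeError indicates the file handle was opened in binary mode
while <tt>json.dumps()</tt> returns a <tt>str</tt>.  A minimal
reproduction and one possible fix, encoding explicitly before writing;
this is a sketch of the failure mode, not the actual upstream patch:</p>

```python
import io
import json

data = {"frequency": 98.105, "level_db": -12.3}

# Reproduce the failure: writing a str to a binary handle raises
# TypeError: a bytes-like object is required, not 'str'
failed = False
try:
    io.BytesIO().write(json.dumps(data, indent=4))
except TypeError:
    failed = True

# One possible fix: encode the JSON text as UTF-8 before writing.
buf = io.BytesIO()
buf.write(json.dumps(data, indent=4).encode("utf-8"))
fixed = json.loads(buf.getvalue().decode("utf-8"))
```

<p>The same effect could be had by opening the file in text mode with
an explicit <tt>encoding="utf-8"</tt> instead of binary mode.</p>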
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>