X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/a5529e559e1b4ee786d97c9f7c918ef22628b83a..11a4f982e837f227d9e5a04786c536fa31d55998:/blog/index.rss?ds=inline
diff --git a/blog/index.rss b/blog/index.rss
index 4f013d5afd..434ba4114e 100644
--- a/blog/index.rss
+++ b/blog/index.rss
@@ -6,6 +6,480 @@
https://people.skolelinux.org/pere/blog/
+
+ New and improved sqlcipher in Debian for accessing Signal database
+ https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+ https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+ Sun, 12 Nov 2023 12:00:00 +0100
+ <p>For a while now I have wanted direct access to the
+<a href="https://signal.org/">Signal</a> database of messages and
+channels in the Desktop edition of Signal. These days I prefer the
+enforced end to end encryption of Signal for my communication with
+friends and family, both to increase the level of safety and privacy
+and to raise the cost of the mass surveillance practiced by
+government and non-government entities. In August I came across a
+nice
+<a href="https://www.yoranbrondsema.com/post/the-guide-to-extracting-statistics-from-your-signal-conversations/">recipe
+on how to use sqlcipher to extract statistics from the Signal
+database</a>. Unfortunately it did not
+work with the version of sqlcipher in Debian. The
+<a href="http://tracker.debian.org/sqlcipher/">sqlcipher</a>
+package is a "fork" of the sqlite package with added support for
+encrypted databases. Sadly the current Debian maintainer
+<a href="https://bugs.debian.org/961598">announced more than three
+years ago that he did not have time to maintain sqlcipher</a>, so it
+seemed unlikely to be upgraded by the maintainer. I was reluctant to
+take on the job myself, as I have very limited experience maintaining
+shared libraries in Debian. After waiting and hoping for a few
+months, I gave up last week and set out to update the package. In
+the process I orphaned it to make it more obvious to the next person
+looking at it that the package needs proper maintenance.</p>
+
+<p>The version in Debian was around five years old, and quite a lot
+of upstream changes needed importing into the Debian maintenance git
+repository. I spent a few days importing the new upstream versions,
+and realised that upstream did not care much for SONAME versioning,
+as I saw library symbols being both added and removed between minor
+versions of the project. I concluded that I had to do a SONAME bump
+of the library package to avoid surprising the reverse dependencies.
+I even added a simple autopkgtest script to ensure the package works
+as intended. Having dug deep into the hole of learning shared
+library maintenance, I set out a few days ago to upload the new
+version to Debian experimental to see what the quality assurance
+framework in Debian had to say about the result. The feedback told
+me the package was not too shabby, and yesterday I uploaded the
+latest version to Debian unstable. It should enter testing today or
+tomorrow, perhaps delayed by
+<a href="https://bugs.debian.org/1055812">a small library
+transition</a>.</p>
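+
+<p>For the curious, the idea behind such a test can be illustrated
+with a small shell sketch like the one below. This is not the actual
+script in the Debian package, just a minimal example creating an
+encrypted database with a test key and verifying it can be read back
+with the same key:</p>
+
+<pre>
+#!/bin/sh
+set -e
+db=$(mktemp)
+trap 'rm -f "$db"' EXIT
+# Create an encrypted database and store a single row in it.
+sqlcipher "$db" "PRAGMA key = 'testkey'; CREATE TABLE t(x); INSERT INTO t VALUES(42);"
+# Reopen with the same key and check the row is readable.
+sqlcipher "$db" "PRAGMA key = 'testkey'; SELECT x FROM t;" | grep -q 42
+echo OK
+</pre>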
+
+<p>Armed with a new version of sqlcipher, I can now have a look at
+the SQL database in ~/.config/Signal/sql/db.sqlite. First, one needs
+to fetch the encryption key from the Signal configuration using this
+simple JSON extraction command:</p>
+
+<pre>/usr/bin/jq -r '."key"' ~/.config/Signal/config.json</pre>
+
+<p>Assume the result from that command is 'secretkey', a hexadecimal
+string representing the key used to encrypt the database. One can
+then connect to the database and inject the encryption key to gain
+SQL access to the information in it. Here
+is an example dumping the database structure:</p>
+
+<pre>
+% sqlcipher ~/.config/Signal/sql/db.sqlite
+sqlite> PRAGMA key = "x'secretkey'";
+sqlite> .schema
+CREATE TABLE sqlite_stat1(tbl,idx,stat);
+CREATE TABLE conversations(
+ id STRING PRIMARY KEY ASC,
+ json TEXT,
+
+ active_at INTEGER,
+ type STRING,
+ members TEXT,
+ name TEXT,
+ profileName TEXT
+ , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
+CREATE TABLE identityKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ );
+CREATE TABLE items(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ );
+CREATE TABLE sessions(
+ id TEXT PRIMARY KEY,
+ conversationId TEXT,
+ json TEXT
+ , ourServiceId STRING, serviceId STRING);
+CREATE TABLE attachment_downloads(
+ id STRING primary key,
+ timestamp INTEGER,
+ pending INTEGER,
+ json TEXT
+ );
+CREATE TABLE sticker_packs(
+ id TEXT PRIMARY KEY,
+ key TEXT NOT NULL,
+
+ author STRING,
+ coverStickerId INTEGER,
+ createdAt INTEGER,
+ downloadAttempts INTEGER,
+ installedAt INTEGER,
+ lastUsed INTEGER,
+ status STRING,
+ stickerCount INTEGER,
+ title STRING
+ , attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
+ INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE stickers(
+ id INTEGER NOT NULL,
+ packId TEXT NOT NULL,
+
+ emoji STRING,
+ height INTEGER,
+ isCoverOnly INTEGER,
+ lastUsed INTEGER,
+ path STRING,
+ width INTEGER,
+
+ PRIMARY KEY (id, packId),
+ CONSTRAINT stickers_fk
+ FOREIGN KEY (packId)
+ REFERENCES sticker_packs(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE sticker_references(
+ messageId STRING,
+ packId TEXT,
+ CONSTRAINT sticker_references_fk
+ FOREIGN KEY(packId)
+ REFERENCES sticker_packs(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE emojis(
+ shortName TEXT PRIMARY KEY,
+ lastUsage INTEGER
+ );
+CREATE TABLE messages(
+ rowid INTEGER PRIMARY KEY ASC,
+ id STRING UNIQUE,
+ json TEXT,
+ readStatus INTEGER,
+ expires_at INTEGER,
+ sent_at INTEGER,
+ schemaVersion INTEGER,
+ conversationId STRING,
+ received_at INTEGER,
+ source STRING,
+ hasAttachments INTEGER,
+ hasFileAttachments INTEGER,
+ hasVisualMediaAttachments INTEGER,
+ expireTimer INTEGER,
+ expirationStartTimestamp INTEGER,
+ type STRING,
+ body TEXT,
+ messageTimer INTEGER,
+ messageTimerStart INTEGER,
+ messageTimerExpiresAt INTEGER,
+ isErased INTEGER,
+ isViewOnce INTEGER,
+ sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
+ GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
+ GENERATED ALWAYS AS (
+ json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
+ ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
+ GENERATED ALWAYS
+ AS (ifnull(
+ expirationStartTimestamp + (expireTimer * 1000),
+ 9007199254740991
+ )), shouldAffectActivity INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), shouldAffectPreview INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), isUserInitiatedMessage INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'group-v2-change',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
+ GENERATED ALWAYS AS (
+ type IS 'group-v2-change' AND
+ json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
+ json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
+ json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
+ json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
+ ), isGroupLeaveEventFromOther INTEGER
+ GENERATED ALWAYS AS (
+ isGroupLeaveEvent IS 1
+ AND
+ isChangeCreatedByUs IS 0
+ ), callId TEXT
+ GENERATED ALWAYS AS (
+ json_extract(json, '$.callId')
+ ));
+CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
+CREATE TABLE jobs(
+ id TEXT PRIMARY KEY,
+ queueType TEXT STRING NOT NULL,
+ timestamp INTEGER NOT NULL,
+ data STRING TEXT
+ );
+CREATE TABLE reactions(
+ conversationId STRING,
+ emoji STRING,
+ fromId STRING,
+ messageReceivedAt INTEGER,
+ targetAuthorAci STRING,
+ targetTimestamp INTEGER,
+ unread INTEGER
+ , messageId STRING);
+CREATE TABLE senderKeys(
+ id TEXT PRIMARY KEY NOT NULL,
+ senderId TEXT NOT NULL,
+ distributionId TEXT NOT NULL,
+ data BLOB NOT NULL,
+ lastUpdatedDate NUMBER NOT NULL
+ );
+CREATE TABLE unprocessed(
+ id STRING PRIMARY KEY ASC,
+ timestamp INTEGER,
+ version INTEGER,
+ attempts INTEGER,
+ envelope TEXT,
+ decrypted TEXT,
+ source TEXT,
+ serverTimestamp INTEGER,
+ sourceServiceId STRING
+ , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
+CREATE TABLE sendLogPayloads(
+ id INTEGER PRIMARY KEY ASC,
+
+ timestamp INTEGER NOT NULL,
+ contentHint INTEGER NOT NULL,
+ proto BLOB NOT NULL
+ , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE sendLogRecipients(
+ payloadId INTEGER NOT NULL,
+
+ recipientServiceId STRING NOT NULL,
+ deviceId INTEGER NOT NULL,
+
+ PRIMARY KEY (payloadId, recipientServiceId, deviceId),
+
+ CONSTRAINT sendLogRecipientsForeignKey
+ FOREIGN KEY (payloadId)
+ REFERENCES sendLogPayloads(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE sendLogMessageIds(
+ payloadId INTEGER NOT NULL,
+
+ messageId STRING NOT NULL,
+
+ PRIMARY KEY (payloadId, messageId),
+
+ CONSTRAINT sendLogMessageIdsForeignKey
+ FOREIGN KEY (payloadId)
+ REFERENCES sendLogPayloads(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE preKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ , ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE signedPreKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ , ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE badges(
+ id TEXT PRIMARY KEY,
+ category TEXT NOT NULL,
+ name TEXT NOT NULL,
+ descriptionTemplate TEXT NOT NULL
+ );
+CREATE TABLE badgeImageFiles(
+ badgeId TEXT REFERENCES badges(id)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ 'order' INTEGER NOT NULL,
+ url TEXT NOT NULL,
+ localPath TEXT,
+ theme TEXT NOT NULL
+ );
+CREATE TABLE storyReads (
+ authorId STRING NOT NULL,
+ conversationId STRING NOT NULL,
+ storyId STRING NOT NULL,
+ storyReadDate NUMBER NOT NULL,
+
+ PRIMARY KEY (authorId, storyId)
+ );
+CREATE TABLE storyDistributions(
+ id STRING PRIMARY KEY NOT NULL,
+ name TEXT,
+
+ senderKeyInfoJson STRING
+ , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
+CREATE TABLE storyDistributionMembers(
+ listId STRING NOT NULL REFERENCES storyDistributions(id)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ serviceId STRING NOT NULL,
+
+ PRIMARY KEY (listId, serviceId)
+ );
+CREATE TABLE uninstalled_sticker_packs (
+ id STRING NOT NULL PRIMARY KEY,
+ uninstalledAt NUMBER NOT NULL,
+ storageID STRING,
+ storageVersion NUMBER,
+ storageUnknownFields BLOB,
+ storageNeedsSync INTEGER NOT NULL
+ );
+CREATE TABLE groupCallRingCancellations(
+ ringId INTEGER PRIMARY KEY,
+ createdAt INTEGER NOT NULL
+ );
+CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
+CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
+CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
+CREATE TABLE edited_messages(
+ messageId STRING REFERENCES messages(id)
+ ON DELETE CASCADE,
+ sentAt INTEGER,
+ readStatus INTEGER
+ , conversationId STRING);
+CREATE TABLE mentions (
+ messageId REFERENCES messages(id) ON DELETE CASCADE,
+ mentionAci STRING,
+ start INTEGER,
+ length INTEGER
+ );
+CREATE TABLE kyberPreKeys(
+ id STRING PRIMARY KEY NOT NULL,
+ json TEXT NOT NULL, ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE callsHistory (
+ callId TEXT PRIMARY KEY,
+ peerId TEXT NOT NULL, -- conversation id (legacy) | uuid | groupId | roomId
+ ringerId TEXT DEFAULT NULL, -- ringer uuid
+ mode TEXT NOT NULL, -- enum "Direct" | "Group"
+ type TEXT NOT NULL, -- enum "Audio" | "Video" | "Group"
+ direction TEXT NOT NULL, -- enum "Incoming" | "Outgoing
+ -- Direct: enum "Pending" | "Missed" | "Accepted" | "Deleted"
+ -- Group: enum "GenericGroupCall" | "OutgoingRing" | "Ringing" | "Joined" | "Missed" | "Declined" | "Accepted" | "Deleted"
+ status TEXT NOT NULL,
+ timestamp INTEGER NOT NULL,
+ UNIQUE (callId, peerId) ON CONFLICT FAIL
+ );
+[ dropped all indexes to save space in this blog post ]
+CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
+ WHEN
+ new.body IS NOT NULL AND new.isViewOnce = 1
+ BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ END;
+CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
+ WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+ BEGIN
+ INSERT INTO messages_fts
+ (rowid, body)
+ VALUES
+ (new.rowid, new.body);
+ END;
+CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ DELETE FROM sendLogPayloads WHERE id IN (
+ SELECT payloadId FROM sendLogMessageIds
+ WHERE messageId = old.id
+ );
+ DELETE FROM reactions WHERE rowid IN (
+ SELECT rowid FROM reactions
+ WHERE messageId = old.id
+ );
+ DELETE FROM storyReads WHERE storyId = old.storyId;
+ END;
+CREATE VIRTUAL TABLE messages_fts USING fts5(
+ body,
+ tokenize = 'signal_tokenizer'
+ );
+CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
+ WHEN
+ (new.body IS NULL OR old.body IS NOT new.body) AND
+ new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+ BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ INSERT INTO messages_fts
+ (rowid, body)
+ VALUES
+ (new.rowid, new.body);
+ END;
+CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
+ BEGIN
+ INSERT INTO mentions (messageId, mentionAci, start, length)
+
+ SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+ bodyRanges.value ->> 'start' as start,
+ bodyRanges.value ->> 'length' as length
+ FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+ WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+ AND messages.id = new.id;
+ END;
+CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
+ BEGIN
+ DELETE FROM mentions WHERE messageId = new.id;
+ INSERT INTO mentions (messageId, mentionAci, start, length)
+
+ SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+ bodyRanges.value ->> 'start' as start,
+ bodyRanges.value ->> 'length' as length
+ FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+ WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+ AND messages.id = new.id;
+ END;
+sqlite>
+</pre>
+
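+<p>For scripted use, the key extraction and the key injection can be
+combined into a single shell snippet along these lines (a sketch,
+quoting may need adjusting to taste):</p>
+
+<pre>
+#!/bin/sh
+# Fetch the hexadecimal key from the Signal configuration and use it
+# to unlock the database before running a query.  Note the key will
+# be visible in the process list while sqlcipher is running.
+key=$(jq -r '."key"' ~/.config/Signal/config.json)
+sqlcipher ~/.config/Signal/sql/db.sqlite \
+  "PRAGMA key = \"x'$key'\"; SELECT count(*) FROM messages;"
+</pre>
+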
+<p>Finally I have the tool I need to inspect and process Signal
+messages without using the vendor provided client. Now on to
+transforming the data into a more useful format.</p>
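+
+<p>As a first step in that direction, here is an example query
+listing the most recent messages together with a human readable
+conversation name. The join and the interpretation of sent_at as
+milliseconds since epoch are my assumptions based on the schema
+above:</p>
+
+<pre>
+sqlite> PRAGMA key = "x'secretkey'";
+sqlite> SELECT datetime(m.sent_at / 1000, 'unixepoch') AS sent,
+   ...>        coalesce(c.name, c.profileFullName, c.e164) AS who,
+   ...>        m.type, m.body
+   ...>   FROM messages m JOIN conversations c ON m.conversationId = c.id
+   ...>  WHERE m.body IS NOT NULL
+   ...>  ORDER BY m.sent_at DESC LIMIT 10;
+</pre>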
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+
+
+
New chrpath release 0.17
https://people.skolelinux.org/pere/blog/New_chrpath_release_0_17.html
@@ -472,150 +946,6 @@ Debian. Not sure how much work it would be to get it working, but
suspect some kernel related packages need to be extended with more
header files.</p>
-<p>As usual, if you use Bitcoin and want to show your support of my
-activities, please send Bitcoin donations to my address
-<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
-
-
-
-
- Speech to text, she APTly whispered, how hard can it be?
- https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
- https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
- Sun, 23 Apr 2023 09:40:00 +0200
- <p>While visiting a convention during Easter, it occurred to me that
-it would be great if I could have a digital Dictaphone with
-transcribing capabilities, providing me with texts to cut-n-paste into
-stuff I need to write. The background is that long drives often bring
-up the urge to work on texts I am writing, which of course is out
-of the question while driving. With the release of
-<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
-seems to be within reach with Free Software, so I decided to give it a
-go. OpenAI Whisper is a Linux based neural network system that reads
-audio files and provides a text representation of the speech in the
-recording. It handles multiple languages and, according to its
-creators, can even translate into a different language than the spoken
-one. I have not tested the latter feature. It can use either the CPU
-or a GPU with CUDA support. As far as I can tell, CUDA in practice
-limits that feature to NVidia graphics cards. I have few of those, as
-they do not work great with free software drivers, and have not tested
-the GPU option. While looking into the matter, I did discover some
-work to provide CUDA support on non-NVidia GPUs, and some work to
-port the library used by Whisper to other GPUs, but have not
-spent much time looking into GPU support yet. I have so far used an old
-X220 laptop as my test machine, and only transcribed using its
-CPU.</p>
-
-<p>As it is unthinkable from a privacy standpoint to use computers
-under the control of someone else (aka a "cloud" service) to transcribe
-one's thoughts and personal notes, I want to run the transcribing
-system locally on my own computers. The only sensible approach to me
-is to make the effort I put into this available to any Linux user and
-to upload the needed packages into Debian. Looking at Debian Bookworm, I
-discovered that only three packages were missing,
-<a href="https://bugs.debian.org/1034307">tiktoken</a>,
-<a href="https://bugs.debian.org/1034144">triton</a>, and
-<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a while
-I also believed
-<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
-needed, but as its
-<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
-seems to have vanished</a> I found it safer
-<a href="https://github.com/openai/whisper/pull/1242">to rewrite
-whisper</a> to stop depending on it than to introduce ffmpeg-python
-into Debian. I decided to place these packages under the umbrella of
-<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
-Learning Team</a>, which seems like the best team to look after such
-packages. Discussing the topic within the group also made me aware
-that the triton package was already planned as a dependency of newer
-versions of the torch package, and would be needed after
-Bookworm is released.</p>
-
-<p>All required code packages have now been waiting in
-<a href="https://ftp-master.debian.org/new.html">the Debian NEW
-queue</a> since Wednesday, heading for Debian Experimental until
-Bookworm is released. An unsolved issue is how to handle the neural
-network models used by Whisper. The default behaviour of Whisper is
-to require Internet connectivity and download the requested model to
-<tt>~/.cache/whisper/</tt> on first invocation. This obviously would
-fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
-deserted island test of free software</a>, as the Debian packages would
-be unusable for someone stranded on a deserted island with only the
-Debian archive and a solar powered computer.</p>
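-
-<p>To illustrate the default behaviour, a first run along these
-lines will download the requested model before transcribing (the
-file name is made up for the example):</p>
-
-<pre>
-# Downloads the small model to ~/.cache/whisper/ on first invocation
-# unless it is already present, then transcribes the audio file.
-whisper --model small --language no recording.ogg
-</pre>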
-
-<p>Because of this, I would love to include the models in the Debian
-mirror system. This is problematic, as the models are very large
-files, which would put a heavy strain on the Debian mirror
-infrastructure around the globe. The strain would be even higher if
-the models changed often, which luckily, as far as I can tell, they
-do not. The small model, which according to its creator is most
-useful for English, and in my experience is not doing a great job
-there either, is 462 MiB (deb is 414 MiB). The medium model, which
-to me seems to handle English speech fairly well, is 1.5 GiB (deb is
-1.3 GiB), and the large model is 2.9 GiB (deb is 2.6 GiB). I would
-assume everyone with enough resources would prefer to use the large
-model for the highest quality. I believe the models themselves would
-have to go into the non-free part of the Debian archive, as they do
-not really include any useful source code for updating the models.
-The "source", aka the model training set, according to the creators
-consists of "680,000 hours of multilingual and multitask supervised
-data collected from the web", which to me reads as material with
-unknown copyright terms, unavailable to the general public. In other
-words, the source is not available according to the Debian Free
-Software Guidelines and the models should be considered non-free.</p>
-
-<p>I asked the Debian FTP masters for advice regarding uploading a
-model package on their IRC channel, and based on the feedback there it
-is still unclear to me if such a package would be accepted into the
-archive. In any case I wrote build rules for an
-<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
-Whisper model package</a> and
-<a href="https://github.com/openai/whisper/pull/1257">modified the
-Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
-<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
-to be able to use these model packages and prepare for such a
-possibility. One solution might be to include only one of the models
-(small or medium, I guess) in the Debian archive, and ask people to
-download the others from the Internet. Not quite sure what to do
-here, and advice is most welcome (use the debian-ai mailing list).</p>
-
-<p>To make it easier to test the new packages while I wait for them to
-clear the NEW queue, I created an APT source targeting Bookworm. I
-selected Bookworm instead of Bullseye, even though I know the latter
-would reach more users, because some of the required dependencies are
-missing from Bullseye, and during this phase of testing I did not
-want to backport a lot of packages just to get up and running.</p>
-
-<p>Here is a recipe to run as user root if you want to test OpenAI
-Whisper using Debian packages on your Debian Bookworm installation,
-first adding the APT repository GPG key to the list of trusted keys,
-then setting up the APT repository and finally installing the packages
-and one of the models:</p>
-
-<pre>
-curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
- -o /etc/apt/trusted.gpg.d/pere-whisper.asc
-mkdir -p /etc/apt/sources.list.d
-cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
-deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-EOF
-apt update
-apt install openai-whisper
-</pre>
-
-<p>The packages work for me, but have not yet been tested on any
-computer other than my own. With them, I have been able to (badly)
-transcribe a 2 minute 40 second Norwegian audio clip to test, using
-the small model. This took 11 minutes and around 2.2 GiB of RAM.
-Transcribing the same file with the medium model gave an accurate
-text in 77 minutes using around 5.2 GiB of RAM. My test machine had
-too little memory to test the large model, which I believe requires
-11 GiB of RAM. In short, this now works for me using Debian
-packages, and I hope it will for you and everyone else once the
-packages enter Debian.</p>
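-
-<p>If you want to reproduce such measurements, something along these
-lines should do (a sketch, with a made up file name):</p>
-
-<pre>
-# Transcribe with the medium model while measuring run time and
-# peak memory use with GNU time.
-/usr/bin/time -v whisper --model medium --language no testclip.mp3
-</pre>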
-
-<p>Now I can start on the audio recording part of this project.</p>
-
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>