X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/a5529e559e1b4ee786d97c9f7c918ef22628b83a..11a4f982e837f227d9e5a04786c536fa31d55998:/blog/index.rss?ds=inline
diff --git a/blog/index.rss b/blog/index.rss
index 4f013d5afd..434ba4114e 100644
--- a/blog/index.rss
+++ b/blog/index.rss
@@ -6,6 +6,480 @@
https://people.skolelinux.org/pere/blog/
+
+ New and improved sqlcipher in Debian for accessing Signal database
+ https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+ https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+ Sun, 12 Nov 2023 12:00:00 +0100
+ <p>For a while now I have wanted direct access to the
+<a href="https://signal.org/">Signal</a> database of messages and
+channels in the Desktop edition of Signal. These days I prefer the
+enforced end to end encryption of Signal for my communication with
+friends and family, both to increase the level of safety and privacy
+and to raise the cost of the mass surveillance practiced by
+government and non-government entities. In August I came across a
+nice
+<a href="https://www.yoranbrondsema.com/post/the-guide-to-extracting-statistics-from-your-signal-conversations/">recipe
+on how to use sqlcipher to extract statistics from the Signal
+database</a>. Unfortunately it did not
+work with the version of sqlcipher in Debian. The
+<a href="http://tracker.debian.org/sqlcipher/">sqlcipher</a>
+package is a "fork" of the sqlite package with added support for
+encrypted databases. Sadly the current Debian maintainer
+<a href="https://bugs.debian.org/961598">announced more than three
+years ago that he did not have time to maintain sqlcipher</a>, so it
+seemed unlikely to be upgraded by the maintainer. I was reluctant to
+take on the job myself, as I have very limited experience maintaining
+shared libraries in Debian. After waiting and hoping for a few
+months, I gave up last week and set out to update the package. In
+the process I orphaned it to make it more obvious to the next person
+looking at it that the package needs proper maintenance.</p>
+
+<p>The version in Debian was around five years old, and quite a lot
+of upstream changes needed importing into the Debian maintenance git
+repository. I spent a few days importing the new upstream versions,
+and realised that upstream did not care much for SONAME versioning,
+as I saw library symbols being both added and removed between minor
+versions of the project. I concluded that I had to do a SONAME bump
+of the library package to avoid surprising the reverse dependencies.
+I even added a simple autopkgtest script to ensure the package works
+as intended. Having dug deep into the hole of learning shared
+library maintenance, I set out a few days ago to upload the new
+version to Debian experimental to see what the quality assurance
+framework in Debian had to say about the result. The feedback told
+me the package was not too shabby, and yesterday I uploaded the
+latest version to Debian unstable. It should enter testing today or
+tomorrow, perhaps delayed by
+<a href="https://bugs.debian.org/1055812">a small library
+transition</a>.</p>
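+
+<p>For the curious, the idea behind such a test can be illustrated
+with a small shell sketch like the one below. This is not the actual
+script in the Debian package, just a minimal example creating an
+encrypted database with a test key and verifying it can be read back
+with the same key:</p>
+
+<pre>
+#!/bin/sh
+set -e
+db=$(mktemp)
+trap 'rm -f "$db"' EXIT
+# Create an encrypted database and store a single row in it.
+sqlcipher "$db" "PRAGMA key = 'testkey'; CREATE TABLE t(x); INSERT INTO t VALUES(42);"
+# Reopen with the same key and check the row is readable.
+sqlcipher "$db" "PRAGMA key = 'testkey'; SELECT x FROM t;" | grep -q 42
+echo OK
+</pre>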
+
+<p>Armed with a new version of sqlcipher, I can now have a look at
+the SQL database in ~/.config/Signal/sql/db.sqlite. First, one needs
+to fetch the encryption key from the Signal configuration using this
+simple JSON extraction command:</p>
+
+<pre>/usr/bin/jq -r '."key"' ~/.config/Signal/config.json</pre>
+
+<p>Assume the result from that command is 'secretkey', a hexadecimal
+string representing the key used to encrypt the database. One can
+then connect to the database and inject the encryption key to gain
+SQL access to the information in it. Here
+is an example dumping the database structure:</p>
+
+<pre>
+% sqlcipher ~/.config/Signal/sql/db.sqlite
+sqlite> PRAGMA key = "x'secretkey'";
+sqlite> .schema
+CREATE TABLE sqlite_stat1(tbl,idx,stat);
+CREATE TABLE conversations(
+ id STRING PRIMARY KEY ASC,
+ json TEXT,
+
+ active_at INTEGER,
+ type STRING,
+ members TEXT,
+ name TEXT,
+ profileName TEXT
+ , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
+CREATE TABLE identityKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ );
+CREATE TABLE items(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ );
+CREATE TABLE sessions(
+ id TEXT PRIMARY KEY,
+ conversationId TEXT,
+ json TEXT
+ , ourServiceId STRING, serviceId STRING);
+CREATE TABLE attachment_downloads(
+ id STRING primary key,
+ timestamp INTEGER,
+ pending INTEGER,
+ json TEXT
+ );
+CREATE TABLE sticker_packs(
+ id TEXT PRIMARY KEY,
+ key TEXT NOT NULL,
+
+ author STRING,
+ coverStickerId INTEGER,
+ createdAt INTEGER,
+ downloadAttempts INTEGER,
+ installedAt INTEGER,
+ lastUsed INTEGER,
+ status STRING,
+ stickerCount INTEGER,
+ title STRING
+ , attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
+ INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE stickers(
+ id INTEGER NOT NULL,
+ packId TEXT NOT NULL,
+
+ emoji STRING,
+ height INTEGER,
+ isCoverOnly INTEGER,
+ lastUsed INTEGER,
+ path STRING,
+ width INTEGER,
+
+ PRIMARY KEY (id, packId),
+ CONSTRAINT stickers_fk
+ FOREIGN KEY (packId)
+ REFERENCES sticker_packs(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE sticker_references(
+ messageId STRING,
+ packId TEXT,
+ CONSTRAINT sticker_references_fk
+ FOREIGN KEY(packId)
+ REFERENCES sticker_packs(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE emojis(
+ shortName TEXT PRIMARY KEY,
+ lastUsage INTEGER
+ );
+CREATE TABLE messages(
+ rowid INTEGER PRIMARY KEY ASC,
+ id STRING UNIQUE,
+ json TEXT,
+ readStatus INTEGER,
+ expires_at INTEGER,
+ sent_at INTEGER,
+ schemaVersion INTEGER,
+ conversationId STRING,
+ received_at INTEGER,
+ source STRING,
+ hasAttachments INTEGER,
+ hasFileAttachments INTEGER,
+ hasVisualMediaAttachments INTEGER,
+ expireTimer INTEGER,
+ expirationStartTimestamp INTEGER,
+ type STRING,
+ body TEXT,
+ messageTimer INTEGER,
+ messageTimerStart INTEGER,
+ messageTimerExpiresAt INTEGER,
+ isErased INTEGER,
+ isViewOnce INTEGER,
+ sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
+ GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
+ GENERATED ALWAYS AS (
+ json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
+ ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
+ GENERATED ALWAYS
+ AS (ifnull(
+ expirationStartTimestamp + (expireTimer * 1000),
+ 9007199254740991
+ )), shouldAffectActivity INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), shouldAffectPreview INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), isUserInitiatedMessage INTEGER
+ GENERATED ALWAYS AS (
+ type IS NULL
+ OR
+ type NOT IN (
+ 'change-number-notification',
+ 'contact-removed-notification',
+ 'conversation-merge',
+ 'group-v1-migration',
+ 'group-v2-change',
+ 'keychange',
+ 'message-history-unsynced',
+ 'profile-change',
+ 'story',
+ 'universal-timer-notification',
+ 'verified-change'
+ )
+ ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
+ GENERATED ALWAYS AS (
+ type IS 'group-v2-change' AND
+ json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
+ json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
+ json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
+ json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
+ ), isGroupLeaveEventFromOther INTEGER
+ GENERATED ALWAYS AS (
+ isGroupLeaveEvent IS 1
+ AND
+ isChangeCreatedByUs IS 0
+ ), callId TEXT
+ GENERATED ALWAYS AS (
+ json_extract(json, '$.callId')
+ ));
+CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
+CREATE TABLE jobs(
+ id TEXT PRIMARY KEY,
+ queueType TEXT STRING NOT NULL,
+ timestamp INTEGER NOT NULL,
+ data STRING TEXT
+ );
+CREATE TABLE reactions(
+ conversationId STRING,
+ emoji STRING,
+ fromId STRING,
+ messageReceivedAt INTEGER,
+ targetAuthorAci STRING,
+ targetTimestamp INTEGER,
+ unread INTEGER
+ , messageId STRING);
+CREATE TABLE senderKeys(
+ id TEXT PRIMARY KEY NOT NULL,
+ senderId TEXT NOT NULL,
+ distributionId TEXT NOT NULL,
+ data BLOB NOT NULL,
+ lastUpdatedDate NUMBER NOT NULL
+ );
+CREATE TABLE unprocessed(
+ id STRING PRIMARY KEY ASC,
+ timestamp INTEGER,
+ version INTEGER,
+ attempts INTEGER,
+ envelope TEXT,
+ decrypted TEXT,
+ source TEXT,
+ serverTimestamp INTEGER,
+ sourceServiceId STRING
+ , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
+CREATE TABLE sendLogPayloads(
+ id INTEGER PRIMARY KEY ASC,
+
+ timestamp INTEGER NOT NULL,
+ contentHint INTEGER NOT NULL,
+ proto BLOB NOT NULL
+ , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE sendLogRecipients(
+ payloadId INTEGER NOT NULL,
+
+ recipientServiceId STRING NOT NULL,
+ deviceId INTEGER NOT NULL,
+
+ PRIMARY KEY (payloadId, recipientServiceId, deviceId),
+
+ CONSTRAINT sendLogRecipientsForeignKey
+ FOREIGN KEY (payloadId)
+ REFERENCES sendLogPayloads(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE sendLogMessageIds(
+ payloadId INTEGER NOT NULL,
+
+ messageId STRING NOT NULL,
+
+ PRIMARY KEY (payloadId, messageId),
+
+ CONSTRAINT sendLogMessageIdsForeignKey
+ FOREIGN KEY (payloadId)
+ REFERENCES sendLogPayloads(id)
+ ON DELETE CASCADE
+ );
+CREATE TABLE preKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ , ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE signedPreKeys(
+ id STRING PRIMARY KEY ASC,
+ json TEXT
+ , ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE badges(
+ id TEXT PRIMARY KEY,
+ category TEXT NOT NULL,
+ name TEXT NOT NULL,
+ descriptionTemplate TEXT NOT NULL
+ );
+CREATE TABLE badgeImageFiles(
+ badgeId TEXT REFERENCES badges(id)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ 'order' INTEGER NOT NULL,
+ url TEXT NOT NULL,
+ localPath TEXT,
+ theme TEXT NOT NULL
+ );
+CREATE TABLE storyReads (
+ authorId STRING NOT NULL,
+ conversationId STRING NOT NULL,
+ storyId STRING NOT NULL,
+ storyReadDate NUMBER NOT NULL,
+
+ PRIMARY KEY (authorId, storyId)
+ );
+CREATE TABLE storyDistributions(
+ id STRING PRIMARY KEY NOT NULL,
+ name TEXT,
+
+ senderKeyInfoJson STRING
+ , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
+CREATE TABLE storyDistributionMembers(
+ listId STRING NOT NULL REFERENCES storyDistributions(id)
+ ON DELETE CASCADE
+ ON UPDATE CASCADE,
+ serviceId STRING NOT NULL,
+
+ PRIMARY KEY (listId, serviceId)
+ );
+CREATE TABLE uninstalled_sticker_packs (
+ id STRING NOT NULL PRIMARY KEY,
+ uninstalledAt NUMBER NOT NULL,
+ storageID STRING,
+ storageVersion NUMBER,
+ storageUnknownFields BLOB,
+ storageNeedsSync INTEGER NOT NULL
+ );
+CREATE TABLE groupCallRingCancellations(
+ ringId INTEGER PRIMARY KEY,
+ createdAt INTEGER NOT NULL
+ );
+CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
+CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
+CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
+CREATE TABLE edited_messages(
+ messageId STRING REFERENCES messages(id)
+ ON DELETE CASCADE,
+ sentAt INTEGER,
+ readStatus INTEGER
+ , conversationId STRING);
+CREATE TABLE mentions (
+ messageId REFERENCES messages(id) ON DELETE CASCADE,
+ mentionAci STRING,
+ start INTEGER,
+ length INTEGER
+ );
+CREATE TABLE kyberPreKeys(
+ id STRING PRIMARY KEY NOT NULL,
+ json TEXT NOT NULL, ourServiceId NUMBER
+ GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE callsHistory (
+ callId TEXT PRIMARY KEY,
+ peerId TEXT NOT NULL, -- conversation id (legacy) | uuid | groupId | roomId
+ ringerId TEXT DEFAULT NULL, -- ringer uuid
+ mode TEXT NOT NULL, -- enum "Direct" | "Group"
+ type TEXT NOT NULL, -- enum "Audio" | "Video" | "Group"
+ direction TEXT NOT NULL, -- enum "Incoming" | "Outgoing
+ -- Direct: enum "Pending" | "Missed" | "Accepted" | "Deleted"
+ -- Group: enum "GenericGroupCall" | "OutgoingRing" | "Ringing" | "Joined" | "Missed" | "Declined" | "Accepted" | "Deleted"
+ status TEXT NOT NULL,
+ timestamp INTEGER NOT NULL,
+ UNIQUE (callId, peerId) ON CONFLICT FAIL
+ );
+[ dropped all indexes to save space in this blog post ]
+CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
+ WHEN
+ new.body IS NOT NULL AND new.isViewOnce = 1
+ BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ END;
+CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
+ WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+ BEGIN
+ INSERT INTO messages_fts
+ (rowid, body)
+ VALUES
+ (new.rowid, new.body);
+ END;
+CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ DELETE FROM sendLogPayloads WHERE id IN (
+ SELECT payloadId FROM sendLogMessageIds
+ WHERE messageId = old.id
+ );
+ DELETE FROM reactions WHERE rowid IN (
+ SELECT rowid FROM reactions
+ WHERE messageId = old.id
+ );
+ DELETE FROM storyReads WHERE storyId = old.storyId;
+ END;
+CREATE VIRTUAL TABLE messages_fts USING fts5(
+ body,
+ tokenize = 'signal_tokenizer'
+ );
+CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
+ WHEN
+ (new.body IS NULL OR old.body IS NOT new.body) AND
+ new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+ BEGIN
+ DELETE FROM messages_fts WHERE rowid = old.rowid;
+ INSERT INTO messages_fts
+ (rowid, body)
+ VALUES
+ (new.rowid, new.body);
+ END;
+CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
+ BEGIN
+ INSERT INTO mentions (messageId, mentionAci, start, length)
+
+ SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+ bodyRanges.value ->> 'start' as start,
+ bodyRanges.value ->> 'length' as length
+ FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+ WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+ AND messages.id = new.id;
+ END;
+CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
+ BEGIN
+ DELETE FROM mentions WHERE messageId = new.id;
+ INSERT INTO mentions (messageId, mentionAci, start, length)
+
+ SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+ bodyRanges.value ->> 'start' as start,
+ bodyRanges.value ->> 'length' as length
+ FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+ WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+ AND messages.id = new.id;
+ END;
+sqlite>
+</pre>
+
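+<p>For scripted use, the key extraction and the key injection can be
+combined into a single shell snippet along these lines (a sketch,
+quoting may need adjusting to taste):</p>
+
+<pre>
+#!/bin/sh
+# Fetch the hexadecimal key from the Signal configuration and use it
+# to unlock the database before running a query.  Note the key will
+# be visible in the process list while sqlcipher is running.
+key=$(jq -r '."key"' ~/.config/Signal/config.json)
+sqlcipher ~/.config/Signal/sql/db.sqlite \
+  "PRAGMA key = \"x'$key'\"; SELECT count(*) FROM messages;"
+</pre>
+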
+<p>Finally I have the tool I need to inspect and process Signal
+messages without using the vendor provided client. Now on to
+transforming the data into a more useful format.</p>
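+
+<p>As a first step in that direction, here is an example query
+listing the most recent messages together with a human readable
+conversation name. The join and the interpretation of sent_at as
+milliseconds since epoch are my assumptions based on the schema
+above:</p>
+
+<pre>
+sqlite> PRAGMA key = "x'secretkey'";
+sqlite> SELECT datetime(m.sent_at / 1000, 'unixepoch') AS sent,
+   ...>        coalesce(c.name, c.profileFullName, c.e164) AS who,
+   ...>        m.type, m.body
+   ...>   FROM messages m JOIN conversations c ON m.conversationId = c.id
+   ...>  WHERE m.body IS NOT NULL
+   ...>  ORDER BY m.sent_at DESC LIMIT 10;
+</pre>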
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+
+
+
New chrpath release 0.17
https://people.skolelinux.org/pere/blog/New_chrpath_release_0_17.html
@@ -472,150 +946,6 @@ Debian. Not sure how much work it would be to get it working, but
suspect some kernel related packages need to be extended with more
header files.</p>
-<p>As usual, if you use Bitcoin and want to show your support of my
-activities, please send Bitcoin donations to my address
-<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
-
-
-
-
- Speech to text, she APTly whispered, how hard can it be?
- https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
- https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
- Sun, 23 Apr 2023 09:40:00 +0200
- <p>While visiting a convention during Easter, it occurred to me that
-it would be great if I could have a digital Dictaphone with
-transcribing capabilities, providing me with texts to cut-n-paste into
-stuff I need to write. The background is that long drives often bring
-up the urge to work on texts I am writing, which of course is out
-of the question while driving. With the release of
-<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
-seems to be within reach with Free Software, so I decided to give it a
-go. OpenAI Whisper is a Linux based neural network system that reads
-audio files and provides a text representation of the speech in the
-recording. It handles multiple languages and, according to its
-creators, can even translate into a different language than the spoken
-one. I have not tested the latter feature. It can use either the CPU
-or a GPU with CUDA support. As far as I can tell, CUDA in practice
-limits that feature to NVidia graphics cards. I have few of those, as
-they do not work great with free software drivers, and have not tested
-the GPU option. While looking into the matter, I did discover some
-work to provide CUDA support on non-NVidia GPUs, and some work to
-port the library used by Whisper to other GPUs, but have not
-spent much time looking into GPU support yet. I have so far used an old
-X220 laptop as my test machine, and only transcribed using its
-CPU.</p>
-
-<p>As it is unthinkable from a privacy standpoint to use computers
-under the control of someone else (aka a "cloud" service) to transcribe
-one's thoughts and personal notes, I want to run the transcribing
-system locally on my own computers. The only sensible approach to me
-is to make the effort I put into this available to any Linux user and
-to upload the needed packages into Debian. Looking at Debian Bookworm, I
-discovered that only three packages were missing,
-<a href="https://bugs.debian.org/1034307">tiktoken</a>,
-<a href="https://bugs.debian.org/1034144">triton</a>, and
-<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a while
-I also believed
-<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
-needed, but as its
-<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
-seems to have vanished</a> I found it safer
-<a href="https://github.com/openai/whisper/pull/1242">to rewrite
-whisper</a> to stop depending on it than to introduce ffmpeg-python
-into Debian. I decided to place these packages under the umbrella of
-<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
-Learning Team</a>, which seems like the best team to look after such
-packages. Discussing the topic within the group also made me aware
-that the triton package was already planned as a dependency of newer
-versions of the torch package, and would be needed after
-Bookworm is released.</p>
-
-<p>All required code packages have now been waiting in
-<a href="https://ftp-master.debian.org/new.html">the Debian NEW
-queue</a> since Wednesday, heading for Debian Experimental until
-Bookworm is released. An unsolved issue is how to handle the neural
-network models used by Whisper. The default behaviour of Whisper is
-to require Internet connectivity and download the requested model to
-<tt>~/.cache/whisper/</tt> on first invocation. This obviously would
-fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
-deserted island test of free software</a>, as the Debian packages would
-be unusable for someone stranded on a deserted island with only the
-Debian archive and a solar powered computer.</p>
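-
-<p>To illustrate the default behaviour, a first run along these
-lines will download the requested model before transcribing (the
-file name is made up for the example):</p>
-
-<pre>
-# Downloads the small model to ~/.cache/whisper/ on first invocation
-# unless it is already present, then transcribes the audio file.
-whisper --model small --language no recording.ogg
-</pre>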
-
-<p>Because of this, I would love to include the models in the Debian
-mirror system. This is problematic, as the models are very large
-files, which would put a heavy strain on the Debian mirror
-infrastructure around the globe. The strain would be even higher if
-the models changed often, which luckily, as far as I can tell, they
-do not. The small model, which according to its creator is most
-useful for English, and in my experience is not doing a great job
-there either, is 462 MiB (deb is 414 MiB). The medium model, which
-to me seems to handle English speech fairly well, is 1.5 GiB (deb is
-1.3 GiB), and the large model is 2.9 GiB (deb is 2.6 GiB). I would
-assume everyone with enough resources would prefer to use the large
-model for the highest quality. I believe the models themselves would
-have to go into the non-free part of the Debian archive, as they do
-not really include any useful source code for updating the models.
-The "source", aka the model training set, according to the creators
-consists of "680,000 hours of multilingual and multitask supervised
-data collected from the web", which to me reads as material with
-unknown copyright terms, unavailable to the general public. In other
-words, the source is not available according to the Debian Free
-Software Guidelines and the models should be considered non-free.</p>
-
-<p>I asked the Debian FTP masters for advice regarding uploading a
-model package on their IRC channel, and based on the feedback there it
-is still unclear to me if such a package would be accepted into the
-archive. In any case I wrote build rules for an
-<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
-Whisper model package</a> and
-<a href="https://github.com/openai/whisper/pull/1257">modified the
-Whisper code base</a> to prefer shared files under <tt>/usr/</tt> and
-<tt>/var/</tt> over user specific files in <tt>~/.cache/whisper/</tt>,
-to be able to use these model packages and prepare for such a
-possibility. One solution might be to include only one of the models
-(small or medium, I guess) in the Debian archive, and ask people to
-download the others from the Internet. Not quite sure what to do
-here, and advice is most welcome (use the debian-ai mailing list).</p>
-
-<p>To make it easier to test the new packages while I wait for them to
-clear the NEW queue, I created an APT source targeting Bookworm. I
-selected Bookworm instead of Bullseye, even though I know the latter
-would reach more users, because some of the required dependencies are
-missing from Bullseye, and during this phase of testing I did not
-want to backport a lot of packages just to get up and running.</p>
-
-<p>Here is a recipe to run as user root if you want to test OpenAI
-Whisper using Debian packages on your Debian Bookworm installation,
-first adding the APT repository GPG key to the list of trusted keys,
-then setting up the APT repository and finally installing the packages
-and one of the models:</p>
-
-<pre>
-curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
- -o /etc/apt/trusted.gpg.d/pere-whisper.asc
-mkdir -p /etc/apt/sources.list.d
-cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
-deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-EOF
-apt update
-apt install openai-whisper
-</pre>
-
-<p>The packages work for me, but have not yet been tested on any
-computer other than my own. With them, I have been able to (badly)
-transcribe a 2 minute 40 second Norwegian audio clip to test, using
-the small model. This took 11 minutes and around 2.2 GiB of RAM.
-Transcribing the same file with the medium model gave an accurate
-text in 77 minutes using around 5.2 GiB of RAM. My test machine had
-too little memory to test the large model, which I believe requires
-11 GiB of RAM. In short, this now works for me using Debian
-packages, and I hope it will for you and everyone else once the
-packages enter Debian.</p>
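-
-<p>If you want to reproduce such measurements, something along these
-lines should do (a sketch, with a made up file name):</p>
-
-<pre>
-# Transcribe with the medium model while measuring run time and
-# peak memory use with GNU time.
-/usr/bin/time -v whisper --model medium --language no testclip.mp3
-</pre>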
-
-<p>Now I can start on the audio recording part of this project.</p>
-
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>