X-Git-Url: http://pere.pagekite.me/gitweb/homepage.git/blobdiff_plain/a5529e559e1b4ee786d97c9f7c918ef22628b83a..11a4f982e837f227d9e5a04786c536fa31d55998:/blog/index.rss?ds=inline

diff --git a/blog/index.rss b/blog/index.rss
index 4f013d5afd..434ba4114e 100644
--- a/blog/index.rss
+++ b/blog/index.rss
@@ -6,6 +6,480 @@
 https://people.skolelinux.org/pere/blog/
 
+
+New and improved sqlcipher in Debian for accessing Signal database
+https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
+Sun, 12 Nov 2023 12:00:00 +0100
+<p>For a while now I have wanted direct access to the
+<a href="https://signal.org/">Signal</a> database of messages and
+channels in my desktop edition of Signal. These days I prefer the
+enforced end-to-end encryption of Signal for my communication with
+friends and family, both to increase the level of safety and privacy
+and to raise the cost of the mass surveillance that government and
+non-government entities practice these days. In August I came across
+a nice
+<a href="https://www.yoranbrondsema.com/post/the-guide-to-extracting-statistics-from-your-signal-conversations/">recipe
+on how to use sqlcipher to extract statistics from the Signal
+database</a>. Unfortunately it did not work with the version of
+sqlcipher in Debian. The
+<a href="http://tracker.debian.org/sqlcipher/">sqlcipher</a>
+package is a "fork" of the sqlite package with added support for
+encrypted databases. Sadly the current Debian maintainer
+<a href="https://bugs.debian.org/961598">announced more than three
+years ago that he did not have time to maintain sqlcipher</a>, so it
+seemed unlikely to be upgraded by the maintainer. I was reluctant to
+take on the job myself, as I have very limited experience maintaining
+shared libraries in Debian. After waiting and hoping for a few
+months, I gave up last week and set out to update the package. In
+the process I orphaned it, to make it more obvious to the next person
+looking at it that the package needs proper maintenance.</p>
+
+<p>The version in Debian was around five years old, and quite a lot
+of changes had taken place upstream since then. I spent a few days
+importing the new upstream versions into the Debian maintenance git
+repository, and realised that upstream did not care much for SONAME
+versioning, as I saw library symbols being both added and removed
+between minor versions of the project. I therefore concluded that I
+had to do a SONAME bump of the library package to avoid surprising
+the reverse dependencies. I even added a simple autopkgtest script
+to ensure the package works as intended. Having dug deep into the
+hole of learning shared library maintenance, I set out a few days ago
+to upload the new version to Debian experimental to see what the
+quality assurance framework in Debian had to say about the result.
+The feedback told me the package was not too shabby, and yesterday I
+uploaded the latest version to Debian unstable. It should enter
+testing today or tomorrow, perhaps delayed by
+<a href="https://bugs.debian.org/1055812">a small library
+transition</a>.</p>
+
+<p>Armed with a new version of sqlcipher, I can now have a look at
+the SQL database in ~/.config/Signal/sql/db.sqlite.</p>
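+<p>Note that Signal Desktop should probably not be running while
+poking around, and it is likely wise to experiment on a copy of the
+database file rather than on the live one. A minimal sketch, where
+the name of the copy is arbitrary:</p>
+
+<pre>
+# Stop Signal Desktop first, then work on a copy of the database
+cp ~/.config/Signal/sql/db.sqlite /tmp/signal-db-copy.sqlite
+</pre>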
+<p>First, one needs to fetch the encryption key from the Signal
+configuration, using this simple JSON extraction command:</p>
+
+<pre>/usr/bin/jq -r '."key"' ~/.config/Signal/config.json</pre>
+
+<p>Assume the result from that command is 'secretkey', a hexadecimal
+number representing the key used to encrypt the database. One can
+then connect to the database and inject the encryption key, to get
+access via SQL and fetch information from the database. Here is an
+example dumping the database structure:</p>
+
+<pre>
+% sqlcipher ~/.config/Signal/sql/db.sqlite
+sqlite> PRAGMA key = "x'secretkey'";
+sqlite> .schema
+CREATE TABLE sqlite_stat1(tbl,idx,stat);
+CREATE TABLE conversations(
+  id STRING PRIMARY KEY ASC,
+  json TEXT,
+
+  active_at INTEGER,
+  type STRING,
+  members TEXT,
+  name TEXT,
+  profileName TEXT
+, profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
+CREATE TABLE identityKeys(
+  id STRING PRIMARY KEY ASC,
+  json TEXT
+  );
+CREATE TABLE items(
+  id STRING PRIMARY KEY ASC,
+  json TEXT
+  );
+CREATE TABLE sessions(
+  id TEXT PRIMARY KEY,
+  conversationId TEXT,
+  json TEXT
+, ourServiceId STRING, serviceId STRING);
+CREATE TABLE attachment_downloads(
+  id STRING primary key,
+  timestamp INTEGER,
+  pending INTEGER,
+  json TEXT
+  );
+CREATE TABLE sticker_packs(
+  id TEXT PRIMARY KEY,
+  key TEXT NOT NULL,
+
+  author STRING,
+  coverStickerId INTEGER,
+  createdAt INTEGER,
+  downloadAttempts INTEGER,
+  installedAt INTEGER,
+  lastUsed INTEGER,
+  status STRING,
+  stickerCount INTEGER,
+  title STRING
+, attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
+      INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE stickers(
+  id INTEGER NOT NULL,
+  packId TEXT NOT NULL,
+
+  emoji STRING,
+  height INTEGER,
+  isCoverOnly INTEGER,
+  lastUsed INTEGER,
+  path STRING,
+  width INTEGER,
+
+  PRIMARY KEY (id, packId),
+  CONSTRAINT stickers_fk
+    FOREIGN KEY (packId)
+    REFERENCES sticker_packs(id)
+    ON DELETE CASCADE
+  );
+CREATE TABLE sticker_references(
+  messageId STRING,
+  packId TEXT,
+  CONSTRAINT sticker_references_fk
+    FOREIGN KEY(packId)
+    REFERENCES sticker_packs(id)
+    ON DELETE CASCADE
+  );
+CREATE TABLE emojis(
+  shortName TEXT PRIMARY KEY,
+  lastUsage INTEGER
+  );
+CREATE TABLE messages(
+  rowid INTEGER PRIMARY KEY ASC,
+  id STRING UNIQUE,
+  json TEXT,
+  readStatus INTEGER,
+  expires_at INTEGER,
+  sent_at INTEGER,
+  schemaVersion INTEGER,
+  conversationId STRING,
+  received_at INTEGER,
+  source STRING,
+  hasAttachments INTEGER,
+  hasFileAttachments INTEGER,
+  hasVisualMediaAttachments INTEGER,
+  expireTimer INTEGER,
+  expirationStartTimestamp INTEGER,
+  type STRING,
+  body TEXT,
+  messageTimer INTEGER,
+  messageTimerStart INTEGER,
+  messageTimerExpiresAt INTEGER,
+  isErased INTEGER,
+  isViewOnce INTEGER,
+  sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
+      GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
+      GENERATED ALWAYS AS (
+        json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
+      ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
+      GENERATED ALWAYS
+      AS (ifnull(
+        expirationStartTimestamp + (expireTimer * 1000),
+        9007199254740991
+      )), shouldAffectActivity INTEGER
+      GENERATED ALWAYS AS (
+        type IS NULL
+        OR
+        type NOT IN (
+          'change-number-notification',
+          'contact-removed-notification',
+          'conversation-merge',
+          'group-v1-migration',
+          'keychange',
+          'message-history-unsynced',
+          'profile-change',
+          'story',
+          'universal-timer-notification',
+          'verified-change'
+        )
+      ), shouldAffectPreview INTEGER
+      GENERATED ALWAYS AS (
+        type IS NULL
+        OR
+        type NOT IN (
+          'change-number-notification',
+          'contact-removed-notification',
+          'conversation-merge',
+          'group-v1-migration',
+          'keychange',
+          'message-history-unsynced',
+          'profile-change',
+          'story',
+          'universal-timer-notification',
+          'verified-change'
+        )
+      ), isUserInitiatedMessage INTEGER
+      GENERATED ALWAYS AS (
+        type IS NULL
+        OR
+        type NOT IN (
+          'change-number-notification',
+          'contact-removed-notification',
+          'conversation-merge',
+          'group-v1-migration',
+          'group-v2-change',
+          'keychange',
+          'message-history-unsynced',
+          'profile-change',
+          'story',
+          'universal-timer-notification',
+          'verified-change'
+        )
+      ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
+      GENERATED ALWAYS AS (
+        type IS 'group-v2-change' AND
+        json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
+        json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
+        json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
+        json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
+      ), isGroupLeaveEventFromOther INTEGER
+      GENERATED ALWAYS AS (
+        isGroupLeaveEvent IS 1
+        AND
+        isChangeCreatedByUs IS 0
+      ), callId TEXT
+      GENERATED ALWAYS AS (
+        json_extract(json, '$.callId')
+      ));
+CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
+CREATE TABLE jobs(
+  id TEXT PRIMARY KEY,
+  queueType TEXT STRING NOT NULL,
+  timestamp INTEGER NOT NULL,
+  data STRING TEXT
+  );
+CREATE TABLE reactions(
+  conversationId STRING,
+  emoji STRING,
+  fromId STRING,
+  messageReceivedAt INTEGER,
+  targetAuthorAci STRING,
+  targetTimestamp INTEGER,
+  unread INTEGER
+, messageId STRING);
+CREATE TABLE senderKeys(
+  id TEXT PRIMARY KEY NOT NULL,
+  senderId TEXT NOT NULL,
+  distributionId TEXT NOT NULL,
+  data BLOB NOT NULL,
+  lastUpdatedDate NUMBER NOT NULL
+  );
+CREATE TABLE unprocessed(
+  id STRING PRIMARY KEY ASC,
+  timestamp INTEGER,
+  version INTEGER,
+  attempts INTEGER,
+  envelope TEXT,
+  decrypted TEXT,
+  source TEXT,
+  serverTimestamp INTEGER,
+  sourceServiceId STRING
+, serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
+CREATE TABLE sendLogPayloads(
+  id INTEGER PRIMARY KEY ASC,
+
+  timestamp INTEGER NOT NULL,
+  contentHint INTEGER NOT NULL,
+  proto BLOB NOT NULL
+, urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
+CREATE TABLE sendLogRecipients(
+  payloadId INTEGER NOT NULL,
+
+  recipientServiceId STRING NOT NULL,
+  deviceId INTEGER NOT NULL,
+
+  PRIMARY KEY (payloadId, recipientServiceId, deviceId),
+
+  CONSTRAINT sendLogRecipientsForeignKey
+    FOREIGN KEY (payloadId)
+    REFERENCES sendLogPayloads(id)
+    ON DELETE CASCADE
+  );
+CREATE TABLE sendLogMessageIds(
+  payloadId INTEGER NOT NULL,
+
+  messageId STRING NOT NULL,
+
+  PRIMARY KEY (payloadId, messageId),
+
+  CONSTRAINT sendLogMessageIdsForeignKey
+    FOREIGN KEY (payloadId)
+    REFERENCES sendLogPayloads(id)
+    ON DELETE CASCADE
+  );
+CREATE TABLE preKeys(
+  id STRING PRIMARY KEY ASC,
+  json TEXT
+, ourServiceId NUMBER
+      GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE signedPreKeys(
+  id STRING PRIMARY KEY ASC,
+  json TEXT
+, ourServiceId NUMBER
+      GENERATED ALWAYS AS (json_extract(json,
+      '$.ourServiceId')));
+CREATE TABLE badges(
+  id TEXT PRIMARY KEY,
+  category TEXT NOT NULL,
+  name TEXT NOT NULL,
+  descriptionTemplate TEXT NOT NULL
+  );
+CREATE TABLE badgeImageFiles(
+  badgeId TEXT REFERENCES badges(id)
+    ON DELETE CASCADE
+    ON UPDATE CASCADE,
+  'order' INTEGER NOT NULL,
+  url TEXT NOT NULL,
+  localPath TEXT,
+  theme TEXT NOT NULL
+  );
+CREATE TABLE storyReads (
+  authorId STRING NOT NULL,
+  conversationId STRING NOT NULL,
+  storyId STRING NOT NULL,
+  storyReadDate NUMBER NOT NULL,
+
+  PRIMARY KEY (authorId, storyId)
+  );
+CREATE TABLE storyDistributions(
+  id STRING PRIMARY KEY NOT NULL,
+  name TEXT,
+
+  senderKeyInfoJson STRING
+, deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
+CREATE TABLE storyDistributionMembers(
+  listId STRING NOT NULL REFERENCES storyDistributions(id)
+    ON DELETE CASCADE
+    ON UPDATE CASCADE,
+  serviceId STRING NOT NULL,
+
+  PRIMARY KEY (listId, serviceId)
+  );
+CREATE TABLE uninstalled_sticker_packs (
+  id STRING NOT NULL PRIMARY KEY,
+  uninstalledAt NUMBER NOT NULL,
+  storageID STRING,
+  storageVersion NUMBER,
+  storageUnknownFields BLOB,
+  storageNeedsSync INTEGER NOT NULL
+  );
+CREATE TABLE groupCallRingCancellations(
+  ringId INTEGER PRIMARY KEY,
+  createdAt INTEGER NOT NULL
+  );
+CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
+CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
+CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
+CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
+CREATE TABLE edited_messages(
+  messageId STRING REFERENCES messages(id)
+    ON DELETE CASCADE,
+  sentAt INTEGER,
+  readStatus INTEGER
+, conversationId STRING);
+CREATE TABLE mentions (
+  messageId REFERENCES messages(id) ON DELETE CASCADE,
+  mentionAci STRING,
+  start INTEGER,
+  length INTEGER
+  );
+CREATE TABLE kyberPreKeys(
+  id STRING PRIMARY KEY NOT NULL,
+  json TEXT NOT NULL, ourServiceId NUMBER
+      GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
+CREATE TABLE callsHistory (
+  callId TEXT PRIMARY KEY,
+  peerId TEXT NOT NULL, -- conversation id (legacy) | uuid | groupId | roomId
+  ringerId TEXT DEFAULT NULL, -- ringer uuid
+  mode TEXT NOT NULL, -- enum "Direct" | "Group"
+  type TEXT NOT NULL, -- enum "Audio" | "Video" | "Group"
+  direction TEXT NOT NULL, -- enum "Incoming" | "Outgoing
+  -- Direct: enum "Pending" | "Missed" | "Accepted" | "Deleted"
+  -- Group: enum "GenericGroupCall" | "OutgoingRing" | "Ringing" | "Joined" | "Missed" | "Declined" | "Accepted" | "Deleted"
+  status TEXT NOT NULL,
+  timestamp INTEGER NOT NULL,
+  UNIQUE (callId, peerId) ON CONFLICT FAIL
+  );
+[ dropped all indexes to save space in this blog post ]
+CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
+  WHEN
+    new.body IS NOT NULL AND new.isViewOnce = 1
+  BEGIN
+    DELETE FROM messages_fts WHERE rowid = old.rowid;
+  END;
+CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
+  WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+  BEGIN
+    INSERT INTO messages_fts
+      (rowid, body)
+    VALUES
+      (new.rowid, new.body);
+  END;
+CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
+  DELETE FROM messages_fts WHERE rowid = old.rowid;
+  DELETE FROM sendLogPayloads WHERE id IN (
+    SELECT payloadId FROM
+      sendLogMessageIds
+    WHERE messageId = old.id
+  );
+  DELETE FROM reactions WHERE rowid IN (
+    SELECT rowid FROM reactions
+    WHERE messageId = old.id
+  );
+  DELETE FROM storyReads WHERE storyId = old.storyId;
+END;
+CREATE VIRTUAL TABLE messages_fts USING fts5(
+  body,
+  tokenize = 'signal_tokenizer'
+  );
+CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
+  WHEN
+    (new.body IS NULL OR old.body IS NOT new.body) AND
+    new.isViewOnce IS NOT 1 AND new.storyId IS NULL
+  BEGIN
+    DELETE FROM messages_fts WHERE rowid = old.rowid;
+    INSERT INTO messages_fts
+      (rowid, body)
+    VALUES
+      (new.rowid, new.body);
+  END;
+CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
+  BEGIN
+    INSERT INTO mentions (messageId, mentionAci, start, length)
+
+    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+      bodyRanges.value ->> 'start' as start,
+      bodyRanges.value ->> 'length' as length
+    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+      AND messages.id = new.id;
+  END;
+CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
+  BEGIN
+    DELETE FROM mentions WHERE messageId = new.id;
+    INSERT INTO mentions (messageId, mentionAci, start, length)
+
+    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
+      bodyRanges.value ->> 'start' as start,
+      bodyRanges.value ->> 'length' as length
+    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
+    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
+
+      AND messages.id = new.id;
+  END;
+sqlite>
+</pre>
+
+<p>Finally I have the tool I need to inspect and process Signal
+messages, without using the vendor-provided client. Now on to
+transforming it into a more useful format.</p>
+
+<p>As usual, if you use Bitcoin and want to show your support of my
+activities, please send Bitcoin donations to my address
+<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+
+
+
 
 New chrpath release 0.17
 https://people.skolelinux.org/pere/blog/New_chrpath_release_0_17.html
@@ -472,150 +946,6 @@ Debian.
 Not sure how much work it would be to get it working, but
 suspect some kernel related packages need to be extended with more
 header files to get it working.</p>
 
-<p>As usual, if you use Bitcoin and want to show your support of my
-activities, please send Bitcoin donations to my address
-<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
-
-
-
-
-Speech to text, she APTly whispered, how hard can it be?
-https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
-https://people.skolelinux.org/pere/blog/Speech_to_text__she_APTly_whispered__how_hard_can_it_be_.html
-Sun, 23 Apr 2023 09:40:00 +0200
-<p>While visiting a convention during Easter, it occurred to me that
-it would be great if I could have a digital Dictaphone with
-transcribing capabilities, providing me with texts to cut-n-paste
-into stuff I need to write. The background is that long drives often
-bring up the urge to work on texts I am writing, which of course is
-out of the question while driving. With the release of
-<a href="https://github.com/openai/whisper/">OpenAI Whisper</a>, this
-seems to be within reach with Free Software, so I decided to give it
-a go.
-OpenAI Whisper is a Linux-based neural network system that reads
-audio files and provides a text representation of the speech in the
-audio recording. It handles multiple languages, and according to its
-creators it can even translate into a different language than the
-spoken one. I have not tested the latter feature. It can use either
-the CPU or a GPU with CUDA support. As far as I can tell, CUDA in
-practice limits that feature to NVidia graphics cards. I have few of
-those, as they do not work well with free software drivers, and I
-have not tested the GPU option. While looking into the matter, I did
-discover some work to provide CUDA support on non-NVidia GPUs, and
-some work with the library used by Whisper to port it to other GPUs,
-but I have not spent much time looking into GPU support yet. I've so
-far used an old X220 laptop as my test machine, and only transcribed
-using its CPU.</p>
-
-<p>As it is unthinkable from a privacy standpoint to use computers
-under the control of someone else (aka a "cloud" service) to
-transcribe one's thoughts and personal notes, I want to run the
-transcribing system locally on my own computers. The only sensible
-approach to me is to make the effort I put into this available for
-any Linux user, and to upload the needed packages into Debian.
-Looking at Debian Bookworm, I discovered that only three packages
-were missing,
-<a href="https://bugs.debian.org/1034307">tiktoken</a>,
-<a href="https://bugs.debian.org/1034144">triton</a>, and
-<a href="https://bugs.debian.org/1034091">openai-whisper</a>. For a
-while I also believed
-<a href="https://bugs.debian.org/1034286">ffmpeg-python</a> was
-needed, but as its
-<a href="https://github.com/kkroening/ffmpeg-python/issues/760">upstream
-seems to have vanished</a>, I found it safer
-<a href="https://github.com/openai/whisper/pull/1242">to rewrite
-whisper</a> to stop depending on it than to introduce ffmpeg-python
-into Debian. I decided to place these packages under the umbrella of
-<a href="https://salsa.debian.org/deeplearning-team">the Debian Deep
-Learning Team</a>, which seems like the best team to look after such
-packages. Discussing the topic within the group also made me aware
-that the triton package was already planned as a dependency of newer
-versions of the torch package, and would be needed after Bookworm is
-released.</p>
-
-<p>All the required code packages have now been waiting in
-<a href="https://ftp-master.debian.org/new.html">the Debian NEW
-queue</a> since Wednesday, heading for Debian Experimental until
-Bookworm is released. An unsolved issue is how to handle the neural
-network models used by Whisper. The default behaviour of Whisper is
-to require Internet connectivity and download the requested model to
-<tt>~/.cache/whisper/</tt> on first invocation. This would obviously
-fail <a href="https://people.debian.org/~bap/dfsg-faq.html">the
-deserted island test of free software</a>, as the Debian packages
-would be unusable for someone stranded on a deserted island with
-only the Debian archive and a solar powered computer.</p>
-
-<p>Because of this, I would love to include the models in the Debian
-mirror system. This is problematic, as the models are very large
-files, which would put a heavy strain on the Debian mirror
-infrastructure around the globe. The strain would be even higher if
-the models changed often, which luckily, as far as I can tell, they
-do not.</p>
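-<p>Until then, the models arrive over the network instead. The first
-invocation with a given model downloads it to the cache directory
-before transcribing, roughly like this (the audio file name is just
-an example):</p>
-
-<pre>
-# First run fetches the model over the network before transcribing
-whisper --model small example.ogg
-# Later runs reuse the cached copy
-ls ~/.cache/whisper/
-</pre>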
-<p>The small model, which according to its creator is most useful
-for English, and in my experience is not doing a great job there
-either, is 462 MiB (the deb is 414 MiB). The medium model, which to
-me seems to handle English speech fairly well, is 1.5 GiB (the deb
-is 1.3 GiB), and the large model is 2.9 GiB (the deb is 2.6 GiB). I
-would assume everyone with enough resources would prefer to use the
-large model for the highest quality. I believe the models themselves
-would have to go into the non-free part of the Debian archive, as
-they do not really include any useful source code for updating the
-models. The "source", aka the model training set, according to the
-creators consists of "680,000 hours of multilingual and multitask
-supervised data collected from the web", which to me reads as
-material with unknown copyright terms, unavailable to the general
-public. In other words, the source is not available according to the
-Debian Free Software Guidelines and the models should be considered
-non-free.</p>
-
-<p>I asked the Debian FTP masters for advice regarding uploading a
-model package on their IRC channel, and based on the feedback there
-it is still unclear to me if such a package would be accepted into
-the archive. In any case, I wrote build rules for an
-<a href="https://salsa.debian.org/deeplearning-team/openai-whisper-model">OpenAI
-Whisper model package</a> and
-<a href="https://github.com/openai/whisper/pull/1257">modified the
-Whisper code base</a> to prefer shared files under <tt>/usr/</tt>
-and <tt>/var/</tt> over user specific files in
-<tt>~/.cache/whisper/</tt>, to be able to use these model packages
-and to prepare for such a possibility. One solution might be to
-include only one of the models (small or medium, I guess) in the
-Debian archive, and ask people to download the others from the
-Internet. Not quite sure what to do here, and advice is most welcome
-(use the debian-ai mailing list).</p>
-
-<p>To make it easier to test the new packages while I wait for them
-to clear the NEW queue, I created an APT source targeting bookworm.
-The reason I selected Bookworm instead of Bullseye, even though I
-know the latter would reach more users, is that some of the required
-dependencies are missing from Bullseye, and during this phase of
-testing I did not want to backport a lot of packages just to get up
-and running.</p>
-
-<p>Here is a recipe to run as user root if you want to test OpenAI
-Whisper using Debian packages on your Debian Bookworm installation,
-first adding the APT repository GPG key to the list of trusted keys,
-then setting up the APT repository and finally installing the
-packages and one of the models:</p>
-
-<p><pre>
-curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
-  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
-mkdir -p /etc/apt/sources.list.d
-cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
-deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
-EOF
-apt update
-apt install openai-whisper
-</pre></p>
-
-<p>The packages work for me, but have not yet been tested on any
-computer other than my own. With them, I have been able to (badly)
-transcribe a 2 minute 40 second Norwegian audio clip as a test,
-using the small model. This took 11 minutes and around 2.2 GiB of
-RAM. Transcribing the same file with the medium model gave an
-accurate text in 77 minutes using around 5.2 GiB of RAM.</p>
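-<p>Numbers like these can be collected by running the transcription
-under GNU time, which reports both wall clock time and peak memory
-use. A sketch, with a made up clip name:</p>
-
-<pre>
-# "Maximum resident set size" in the output is the peak RAM use
-/usr/bin/time -v whisper --model medium --language Norwegian clip.ogg
-</pre>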
-<p>My test machine had too little memory to test the large model,
-which I believe requires 11 GiB of RAM. In short, this now works for
-me using Debian packages, and I hope it will for you and everyone
-else once the packages enter Debian.</p>
-
-<p>Now I can start on the audio recording part of this project.</p>
-
 
 <p>As usual, if you use Bitcoin and want to show your support of my
 activities, please send Bitcoin donations to my address
 <b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>