<?xml version="1.0" encoding="utf-8"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:atom="http://www.w3.org/2005/Atom">
<title>Petter Reinholdtsen</title>
<description></description>
<link>https://people.skolelinux.org/pere/blog/</link>
<atom:link href="https://people.skolelinux.org/pere/blog/index.rss" rel="self" type="application/rss+xml" />
<title>Time to move orphaned Debian packages to git</title>
<link>https://people.skolelinux.org/pere/blog/Time_to_move_orphaned_Debian_packages_to_git.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Time_to_move_orphaned_Debian_packages_to_git.html</guid>
<pubDate>Sun, 14 Apr 2024 09:30:00 +0200</pubDate>
<description><p>There are several packages in Debian without an associated git
repository with the packaging history. This is unfortunate, and it
would be nice if more of them had one. Quite a lot of these are
without a maintainer, i.e. listed as maintained by the
'<a href="https://qa.debian.org/developer.php?email=packages%40qa.debian.org">Debian
QA Group</a>' placeholder. In fact, 438 packages have this property
according to UDD (<tt>SELECT source FROM sources WHERE release = 'sid'
AND (vcs_url ilike '%anonscm.debian.org%' OR vcs_browser ilike
'%anonscm.debian.org%' or vcs_url IS NULL OR vcs_browser IS NULL) AND
maintainer ilike '%packages@qa.debian.org%';</tt>). Such packages can
be updated without much coordination by any Debian developer, as they
are considered orphaned.</p>
<p>To try to improve the situation and reduce the number of packages
without an associated git repository, I started a few days ago to search
out candidates and provide them with a git repository under the
'debian' collaborative Salsa project. I started with the packages
pointing to obsolete Alioth git repositories, and am now working my
way across the ones completely without git references. In addition to
updating the Vcs-* debian/control fields, I try to update
Standards-Version and the debhelper compat level, simplify d/rules, switch to
Rules-Requires-Root: no and fix the lintian issues reported. I only
implement those that are trivial to fix, to avoid spending too much
time on each orphaned package. So far my experience is that it takes
approximately 20 minutes to convert a package without any git
references, and a lot more for packages with existing git repositories
incompatible with git-buildpackage.</p>
<p>So far I have converted 10 packages, and I will keep going until I
run out of steam. As should be clear from the numbers, there are
enough packages remaining for more people to do the same without
stepping on each other's toes. I find it useful to start by searching
for a git repo already on salsa, as I find that sometimes a git repo
has already been created, but no new version has been uploaded to Debian
yet. In those cases I start with the existing git repository. I
convert to the git-buildpackage+pristine-tar workflow, and ensure a
debian/gbp.conf file with "pristine-tar=True" is added early, to avoid
uploading an orig.tar.gz with the wrong checksum by mistake. I did that
three times in the beginning before I remembered my mistake.</p>
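<p>A minimal sketch of how the "pristine-tar=True" setting could be added to
the content of a <tt>debian/gbp.conf</tt> file using the
<tt>configparser</tt> module in the Python standard library. The helper name
<tt>with_pristine_tar</tt> and the choice of the <tt>[DEFAULT]</tt>
section are assumptions made for illustration, not something taken from the
packages I converted.</p>

```python
import configparser
import io


def with_pristine_tar(conf_text: str) -> str:
    """Return gbp.conf content with pristine-tar enabled in [DEFAULT]."""
    # interpolation=None keeps any '%' characters in existing values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["DEFAULT"]["pristine-tar"] = "True"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Start from an empty file, as for a package without any gbp.conf.
    print(with_pristine_tar(""), end="")
```

<p>Existing settings in the file are kept, so the same helper should work
both for new and for already existing configuration files.</p>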
<p>So, if you are a Debian Developer and have some spare time, perhaps
consider migrating some orphaned packages to git?</p>
<p>As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
<title>Plain text accounting file from your bitcoin transactions</title>
<link>https://people.skolelinux.org/pere/blog/Plain_text_accounting_file_from_your_bitcoin_transactions.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Plain_text_accounting_file_from_your_bitcoin_transactions.html</guid>
<pubDate>Thu, 7 Mar 2024 18:00:00 +0100</pubDate>
<description><p>A while back I wrote a small script to extract the Bitcoin
transactions in a wallet in the
<a href="https://plaintextaccounting.org/">ledger plain text accounting
format</a>. The last few days I spent some time getting it to work
better with more special cases. In case it can be useful for others,
here is a copy:</p>
<p><blockquote><pre>
# -*- coding: utf-8 -*-
# Copyright (c) 2023-2024 Petter Reinholdtsen
#
# This copy lost its indentation and several lines. Lines marked
# "assumed" are reconstructed from context, not taken from the original.

from decimal import Decimal
import json        # assumed import
import subprocess  # assumed import
import time        # assumed import

import numpy       # assumed import

def format_float(num):
    return numpy.format_float_positional(num, trim='-')

# Transaction fields to report, and the ledger account each one maps to.
accounts = {       # assumed dict name, taken from its use below
    u'amount' : 'Assets:BTC:main',
}

# Bitcoin addresses and the ledger account on the other side of the transfer.
addresses = {
    '<some address>' : 'Assets:bankkonto',
    '<some address>' : 'Assets:bankkonto',
}

def exec_json(cmd):  # assumed signature, taken from the call below
    proc = subprocess.Popen(cmd,stdout=subprocess.PIPE)
    j = json.loads(proc.communicate()[0], parse_float=Decimal)
    return j  # assumed

def list_txs():  # assumed function name
    # get all transactions for all accounts / addresses
    c = 0          # assumed initial values
    txs = []
    txidfee = {}
    limit = 100000
    cmd = ['bitcoin-cli', 'listtransactions', '*', str(limit)]
    if True:  # assumed switch between the live wallet and the debugging file
        txs.extend(exec_json(cmd))
    else:
        # Useful for debugging
        with open('transactions.json') as f:
            txs.extend(json.load(f, parse_float=Decimal))
    for tx in sorted(txs, key=lambda a: a['time']):
        # print tx['category']
        if 'abandoned' in tx and tx['abandoned']:
            continue  # assumed
        if 'confirmations' in tx and 0 >= tx['confirmations']:
            continue  # assumed
        when = time.strftime('%Y-%m-%d %H:%M', time.localtime(tx['time']))
        if 'message' in tx:
            desc = tx['message']
        elif 'comment' in tx:
            desc = tx['comment']
        elif 'label' in tx:
            desc = tx['label']
        else:
            desc = 'n/a'  # assumed fallback description
        print("%s %s" % (when, desc))
        if 'address' in tx:
            print("  ; to bitcoin address %s" % tx['address'])
        else:
            print("  ; missing address in transaction, txid=%s" % tx['txid'])
        print(f"  ; amount={tx['amount']}")
        if 'fee' in tx:
            print(f"  ; fee={tx['fee']}")
        amount = Decimal(0)  # avoid an undefined name below if no field matched
        for f in accounts.keys():
            if f in tx and Decimal(0) != tx[f]:
                amount = tx[f]  # assumed
                print("  %-20s   %s BTC" % (accounts[f], format_float(amount)))
        if 'fee' in tx and Decimal(0) != tx['fee']:
            # Make sure to list fee used in several transactions only once.
            if 'fee' in tx and tx['txid'] in txidfee \
               and tx['fee'] == txidfee[tx['txid']]:
                pass  # assumed: this fee is already listed
            else:
                fee = tx['fee']
                print("  %-20s   %s BTC" % (accounts['amount'], format_float(fee)))
                print("  %-20s   %s BTC" % ('Expences:BTC-fee', format_float(-fee)))
                txidfee[tx['txid']] = tx['fee']
        if 'address' in tx and tx['address'] in addresses:
            print("  %s" % addresses[tx['address']])
        else:
            if 'generate' == tx['category']:
                print("  Income:BTC-mining")
            else:
                if amount < Decimal(0):
                    print(f"  Assets:unknown:sent:update-script-addr-{tx['address']}")
                else:
                    print(f"  Assets:unknown:received:update-script-addr-{tx['address']}")
        print()    # assumed
        c = c + 1  # assumed
    print("# Found %d transactions" % c)
    if limit == c:  # assumed condition
        print(f"# Warning: Limit {limit} reached, consider increasing limit.")

list_txs()  # assumed entry point
</pre></blockquote></p>
<p>It is more of a proof of concept, and I do not expect it to handle
all edge cases, but it worked for me, and perhaps you can find it
useful too.</p>
<p>To get a more interesting result, it is useful to map the addresses sent
to or received from to accounting accounts, using the
<tt>addresses</tt> hash. As these will be very context dependent, I
leave out my list to allow each user to fill out their own list of
accounts. Out of the box, 'ledger reg BTC:main' should be able to
show the amount of BTC present in the wallet at any given time in the
past. For other and more valuable analysis, an account plan needs to be
set up in the <tt>addresses</tt> hash. Here is an example
transaction:</p>
<p><blockquote><pre>
2024-03-07 17:00 Donated to good cause
    Assets:BTC:main                           -0.1 BTC
    Assets:BTC:main                       -0.00001 BTC
    Expences:BTC-fee                       0.00001 BTC
    Expences:donations                         0.1 BTC
</pre></blockquote></p>
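<p>A small sketch to illustrate why the fee shows up twice in the example
transaction: in double entry accounting all postings in a transaction
should sum to zero. The list <tt>postings</tt> and the function
<tt>balance</tt> are names made up for this illustration, and the amounts
are copied from the example above.</p>

```python
from decimal import Decimal

# Postings copied from the example transaction above.
postings = [
    ("Assets:BTC:main", Decimal("-0.1")),
    ("Assets:BTC:main", Decimal("-0.00001")),
    ("Expences:BTC-fee", Decimal("0.00001")),
    ("Expences:donations", Decimal("0.1")),
]


def balance(entries):
    """Sum the amounts of all postings in one transaction."""
    return sum((amount for _account, amount in entries), Decimal(0))


if __name__ == "__main__":
    # A balanced transaction gives a sum equal to zero.
    print(balance(postings) == 0)
```

<p>Using <tt>Decimal</tt> instead of float makes the sum come out as exactly
zero, which is also why the script parses the amounts as
<tt>Decimal</tt>.</p>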
<p>It needs a running Bitcoin Core daemon, as it connects to it
using <tt>bitcoin-cli listtransactions * 100000</tt> to extract the
transactions listed in the wallet.</p>
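<p>To try out the JSON handling without a running daemon, something like
the following sketch could be used. The <tt>SAMPLE</tt> string is a
hypothetical, heavily shortened stand-in for the output of
<tt>bitcoin-cli listtransactions</tt>, and <tt>parse_transactions</tt> is
a name made up for the illustration.</p>

```python
import json
from decimal import Decimal

# Hypothetical, shortened sample of a listtransactions result.
SAMPLE = '[{"category": "send", "amount": -0.1, "fee": -0.00001, "time": 1709827200}]'


def parse_transactions(text):
    """Parse wallet JSON, keeping amounts as exact Decimal values."""
    return json.loads(text, parse_float=Decimal)


if __name__ == "__main__":
    tx = parse_transactions(SAMPLE)[0]
    # Amount and fee are both negative for a sent transaction in this sample.
    print(tx["amount"] + tx["fee"])
```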
<title>RAID status from LSI Megaraid controllers using free software</title>
<link>https://people.skolelinux.org/pere/blog/RAID_status_from_LSI_Megaraid_controllers_using_free_software.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/RAID_status_from_LSI_Megaraid_controllers_using_free_software.html</guid>
<pubDate>Sun, 3 Mar 2024 22:40:00 +0100</pubDate>
<description><p>The last few days I have revisited RAID setup using the LSI
Megaraid controller. This is a family of controllers called PERC by
Dell, present in several old PowerEdge servers, and I recently
got my hands on one of these. I had forgotten how to handle this RAID
controller in Debian, so I had to take a peek at the
<a href="https://wiki.debian.org/LinuxRaidForAdmins">Debian wiki page
"Linux and Hardware RAID: an administrator's summary"</a> to remember
what kind of software is available to configure and monitor the disks
and controller. I prefer Free Software alternatives to proprietary
tools, as the latter tend to fall into disarray once the manufacturer
loses interest, and often do not work with newer Linux distributions.
Sadly there is no free software tool to configure the RAID setup, only
to monitor it. RAID can provide improved reliability and resilience in
a storage solution, but only if it is regularly checked and any
broken disks are replaced in time. I thus want to ensure some
automatic monitoring is available.</p>
<p>In the discovery process, I came across an old free software tool to
monitor PERC2, PERC3, PERC4 and PERC5 controllers, which to my
surprise is not present in Debian. To help change that I created a
<a href="https://bugs.debian.org/1065322">request for packaging of the
megactl package</a>, and tried to track down a usable version.
<a href="https://sourceforge.net/p/megactl/">The original project
site</a> is on Sourceforge, but as far as I can tell that project has
been dead for more than 15 years. I managed to find a
<a href="https://github.com/hmage/megactl">more recent fork on
github</a> from user hmage, but it is unclear to me if this is still
being maintained. It has not seen much improvement since 2016. A
<a href="https://github.com/namiltd/megactl">more up to date
edition</a> is a git fork of the original github fork by user
namiltd, and this newer fork seems a lot more promising. The owner of
this github repository has replied to change proposals within hours,
and had already added some improvements and support for more hardware.
Sadly he is reluctant to commit to maintaining the tool and stated in
<a href="https://github.com/namiltd/megactl/pull/1">my first pull
request</a> that he thinks a new release should be made based on the
git repository owned by hmage. I perfectly understand this
reluctance, as I feel the same about maintaining yet another package
in Debian when I barely have time to take care of the ones I already
maintain, but I do not really have high hopes that hmage will have time
to spend on it, and hope namiltd will change his mind.</p>
<p>In any case, I created
<a href="https://salsa.debian.org/debian/megactl">a draft package</a>
based on the namiltd edition and put it under the debian group on
salsa.debian.org. If you own a Dell PowerEdge server with one of the
PERC controllers, or any other RAID controller using the megaraid or
megaraid_sas Linux kernel modules, you might want to check it out. If
enough people are interested, perhaps the package will make it into
the Debian archive.</p>
<p>There are two tools provided, megactl for the megaraid Linux kernel
module, and megasasctl for the megaraid_sas Linux kernel module. The
simple output from the command on one of my machines looks like this
(yes, I know some of the disks have problems. :).</p>
a0       PERC H730 Mini           encl:1 ldrv:2  batt:good
a0d0      558GiB RAID 1   1x2  optimal
a0d1     3067GiB RAID 0   1x11 optimal
a0e32s0     558GiB  a0d0  online   errs: media:0  other:19
a0e32s1     279GiB  a0d1  online
a0e32s2     279GiB  a0d1  online
a0e32s3     279GiB  a0d1  online
a0e32s4     279GiB  a0d1  online
a0e32s5     279GiB  a0d1  online
a0e32s6     279GiB  a0d1  online
a0e32s8     558GiB  a0d0  online   errs: media:0  other:17
a0e32s9     279GiB  a0d1  online
a0e32s10    279GiB  a0d1  online
a0e32s11    279GiB  a0d1  online
a0e32s12    279GiB  a0d1  online
a0e32s13    279GiB  a0d1  online
<p>In addition to displaying a simple status report, it can also test
individual drives and print the various event logs. Perhaps you too
find it useful?</p>
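<p>For automatic monitoring, the status report could be scanned for disks
reporting errors. The following is only a sketch: the column layout is
assumed from the two disk lines copied from the report above, and the names
<tt>DISK_ERRS</tt> and <tt>disks_with_errors</tt> are made up for the
illustration. The real tool might format other controllers differently.</p>

```python
import re

# Two disk lines copied from the report above.
SAMPLE = """\
a0e32s0     558GiB  a0d0  online   errs: media:0  other:19
a0e32s1     279GiB  a0d1  online
"""

# disk id, size, logical drive, state, then the optional error counters.
DISK_ERRS = re.compile(
    r"^(\S+)\s+\S+\s+\S+\s+\S+\s+errs:\s+media:\s*(\d+)\s+other:\s*(\d+)"
)


def disks_with_errors(report):
    """Return (disk, media_errors, other_errors) for lines with an errs: field."""
    found = []
    for line in report.splitlines():
        match = DISK_ERRS.match(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found


if __name__ == "__main__":
    for disk, media, other in disks_with_errors(SAMPLE):
        print(f"{disk}: media errors {media}, other errors {other}")
```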
<p>In the packaging process I provided some patches upstream to
improve installation and ensure
<a href="https://github.com/namiltd/megactl/pull/2">an AppStream
metainfo file is provided</a> to list all supported HW, to allow
<a href="https://tracker.debian.org/isenkram">isenkram</a> to propose
the package on all servers with a relevant PCI card.</p>
<title>Breakfast seminar on Noark 5 in Oslo, Tuesday 2024-03-12</title>
<link>https://people.skolelinux.org/pere/blog/Frokostseminar_om_Noark_5_i_Oslo_tirsdag_2024_03_12.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Frokostseminar_om_Noark_5_i_Oslo_tirsdag_2024_03_12.html</guid>
<pubDate>Tue, 27 Feb 2024 15:15:00 +0100</pubDate>
<description><p>The Nikita project, in which I am involved, invites you, in cooperation with
Oslo Byarkiv, the research group METAINFO and the association NUUG, to a
breakfast seminar on Noark 5 and the Noark 5 Tjenestegrensesnitt (service interface) on Tuesday
2024-03-12. The seminar takes place at Oslo byarkiv. We hope to arrange
video streaming of the presentations and the panel discussion over the Internet.
An updated programme and links to the registration form are
<a href="https://noark.codeberg.page/noark5-seminars/2023-03-12-noark-workshop.html">available
from the Nikita project</a>. The event is free of charge.</p>
<p>As usual, if you use Bitcoin and want to show your support for
what I do, I would appreciate it if you send Bitcoin donations
to my address
<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>. Note,
payment with bitcoin is not anonymous. :)</p>
<title>Welcome out of prison, Mickey, hope you find some freedom!</title>
<link>https://people.skolelinux.org/pere/blog/Welcome_out_of_prison__Mickey__hope_you_find_some_freedom_.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/Welcome_out_of_prison__Mickey__hope_you_find_some_freedom_.html</guid>
<pubDate>Mon, 1 Jan 2024 21:00:00 +0100</pubDate>
<description><p align="center"><img src="https://people.skolelinux.org/pere/blog/images/2024-01-01-mikke-verk-i-det-fri.jpeg"/></p>
<p>Today, the animated figure Mickey Mouse was finally released from
the corporate copyright prison, as the 1928 movie
<a href="https://en.wikipedia.org/wiki/Steamboat_Willie">Steamboat
Willie</a> entered the public domain in the USA. This movie was the first
public appearance of Mickey Mouse. Sadly the figure is still on
probation, thanks to trademark laws and the Disney corporation's
powerful pack of lawyers, as described in the 2017 article
<a href="https://priceonomics.com/how-mickey-mouse-evades-the-public-domain/">"How
Mickey Mouse Evades the Public Domain"</a> from Priceonomics. On the
positive side, the primary driver for repeated extensions of the
duration of copyright has been Disney, thanks to Mickey Mouse and the
1928 movie, and as it is now in the public domain I hope it will cause
less urge to extend the already unreasonably long copyright.</p>
<p>The first book I published, the 2004 book <a
href="https://free-culture.cc/">"Free Culture" by Lawrence Lessig</a>,
available in
<a href="https://people.skolelinux.org/pere/publisher/#frikultur">English,
French and Norwegian Bokmål</a>, touches on the story of how Disney pushed
for extending the copyright duration in the USA. It is a great book
explaining problems with the current copyright regime and why we need
the Creative Commons movement, and I strongly recommend everyone to read
it.</p>
<p>This movie (with
<a href="https://www.imdb.com/title/tt0019422/">IMDB ID tt0019422</a>)
is now available from the Internet Archive. Two copies have been
uploaded so far, one uploaded
<a href="https://archive.org/details/SteamboatWillie">2015-11-04</a>
(<a href="https://archive.org/download/SteamboatWillie/SteamboatWillie_archive.torrent">torrent</a>)
and one uploaded
<a href="https://archive.org/details/steamboat-willie-mickey">2023-01-01</a>
(<a href="https://archive.org/download/steamboat-willie-mickey/steamboat-willie-mickey_archive.torrent">torrent</a>) - see
<a href="https://people.skolelinux.org/pere/blog/VLC_bittorrent_plugin_still_going_strong__new_upload_2_14_4.html">VLC
bittorrent plugin</a> for streaming the video using the torrent link.
I am very happy to see
<a href="https://people.skolelinux.org/pere/blog/Legal_to_share_more_than_16_000_movies_listed_on_IMDB_.html">the
number of public domain movies</a> increasing. I look forward to
when those are the majority. Perhaps it will reduce the urge of the
copyright industry to control its customers.</p>
<p>A
<a href="https://publicdomainreview.org/features/entering-the-public-domain/2024/">comprehensive
list of works entering the public domain in 2024</a> is available from
the Public Domain Review.</p>
<title>VLC bittorrent plugin still going strong, new upload 2.14-4</title>
<link>https://people.skolelinux.org/pere/blog/VLC_bittorrent_plugin_still_going_strong__new_upload_2_14_4.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/VLC_bittorrent_plugin_still_going_strong__new_upload_2_14_4.html</guid>
<pubDate>Sun, 31 Dec 2023 10:45:00 +0100</pubDate>
<description><p>The other day I uploaded a new version of
<a href="https://tracker.debian.org/pkg/vlc-plugin-bittorrent">the VLC
bittorrent plugin</a> to Debian, version 2.14-4, to fix a few
packaging issues. This plugin extends VLC, allowing it to stream videos
directly from a bittorrent source using both torrent files and magnet
links, as easily as using an HTTP or local file source. I believe such
protocol support is a vital feature in VLC, allowing efficient
streaming from sources such as the 11 million movies in
<a href="https://archive.org/">the Internet Archive</a>. Bittorrent is
one of the most efficient content distribution protocols on the
Internet, without centralised control, and should be used more.</p>
<p>The new version is now both in Debian Unstable and Testing, as well
as Ubuntu. While looking after the package, I decided to ask the VLC
upstream community if there was any hope of getting Bittorrent support
into the official VLC program, and was very happy to learn that
someone is already working on it. I hope we can see some fruits of
that labour next year, but I do not hold my breath. In the meantime we
can use the plugin, which is already
<a href="https://qa.debian.org/popcon.php?package=vlc-plugin-bittorrent">installed
by 0.23 percent of the Debian population</a> according to
popularity-contest. It could use a new upstream release, and I hope
the upstream developer soon finds time to polish it even more.</p>
<p>It is worth noting that the plugin stores the downloaded files in
<tt>~/Downloads/vlc-bittorrent/</tt>, which can quickly fill up the
user's home directory during use. Users of the plugin should keep an
eye on disk usage when streaming from a bittorrent source.</p>
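<p>One way to keep an eye on the disk usage could be a small script along
these lines. It is only a sketch using the Python standard library; the
function name <tt>directory_size</tt> is made up for the illustration, and
the only thing taken from the plugin is the directory path mentioned
above.</p>

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks to avoid counting files outside the directory.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    cache = os.path.expanduser("~/Downloads/vlc-bittorrent/")
    print(f"{cache}: {directory_size(cache) / 2**20:.1f} MiB")
```

<p>A directory that does not exist is reported as using zero bytes, as
<tt>os.walk</tt> silently yields nothing in that case.</p>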
<title>«When «på» becomes «pÃ¥»: A reservoir of characters seen from the repository» in the journal Aksess</title>
<link>https://people.skolelinux.org/pere/blog/_N_r__p___blir__p_____Et_reservoar_av_tegn_sett_fra_depotet__i_tidsskriftet_Aksess.html</link>
<guid isPermaLink="true">https://people.skolelinux.org/pere/blog/_N_r__p___blir__p_____Et_reservoar_av_tegn_sett_fra_depotet__i_tidsskriftet_Aksess.html</guid>
<pubDate>Wed, 15 Nov 2023 09:20:00 +0100</pubDate>
<description><p>A few weeks ago a friend and I wrote
<a href="https://www.aksess-tidsskrift.no/fordypning/175530">an
article about character sets</a> in
<a href="https://www.aksess-tidsskrift.no/">the archival journal
Aksess</a>, both on the web and in print issue no. 3 2023. Here is what
was just published.</p>
<p><strong>When «på» becomes «pÃ¥»: A reservoir of characters seen from
the repository</strong></p>
<p>by Thomas Sødring and Petter Reinholdtsen</p>
<p>Few of us think about what happens deeper inside the computer
while we sit there typing on the keyboard. When you press the
«Å» key, the letter Å is shown. But sometimes it goes
wrong. Why is that, and what is important to be aware of in an
archival context?</p>
<p>If letters are interpreted differently between systems, things quickly
get messy. Connoisseurs call this mojibake, after the Japanese
expression for character transformation. There is a long history here, at times
marked by mess. Some may remember a time when the
letters æ, ø and å were often broken in e-mails, a classic
example of a character set problem.</p>
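<p>The effect in the title can be reproduced with a few lines of Python. This
is a sketch for illustration, not part of the published article: the text is
written as UTF-8 and then read back as if it were ISO-8859-1. The function
name <tt>mojibake</tt> and its parameters are made up for the example.</p>

```python
def mojibake(text, written_as="utf-8", read_as="iso8859_1"):
    """Encode text with one character set and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    print(mojibake("på"))  # prints pÃ¥
```

<p>Running the conversion the other way around, from ISO-8859-1 back to
UTF-8, repairs the text in this particular case.</p>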
<p id="tegnsett_access_nå_og_før"><strong>«Now» and «before»</strong></p>
<p>Time is a hidden problem for archival repositories because we create documentation in
a context shaped by being «now». Our understanding of the world and
our use of technology are the starting point for this context. Just think
how the world has developed over the last 20 years, what society
cares about, and how we use technology in everyday life. Time is a
hidden problem because when we extract documentation from systems and
deposit it for long-term preservation, the context of the material is «now», but
the world moves on. As technology and the way we use it
evolve, «now» becomes «before», and the documentation soon finds itself
in a «before» context.</p>
<p>This matter of «before» and «now» in relation to the context of the documentation is
something we are very little aware of, but it is a problem
the repository archives own and manage. One of these challenges is why
«Ø» is not necessarily the same as «Ø», and why it makes sense
to say such a thing at all. We are talking here about something called
character sets, which are an agreed way of representing letters, digits and
other symbols so that we can exchange text between computer systems
without errors.</p>
<p>The character set problem is made up of four facets:
repertoire, representation, encoding and rendering.</p>
<p id="tegnsett_access_repertoarer"><strong>Repertoires</strong></p>
<p>A repertoire is a collection of characters and symbols that can be
represented. Think of the Norwegian alphabet or Japanese pictograms, but
also mathematical and electronic symbols. The letter «capital a» can be
an entry in such a repertoire. To be usable in a computer,
each entry in such a repertoire needs a representation, which
in a computer context means that it is assigned a number. The number can
be stored in various ways in one or more encoding formats. For example,
the number ten can be written as 10, X and A, in the
decimal, Roman and hexadecimal numeral systems respectively.</p>
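<p>The point about the number ten can be checked with a short Python sketch,
added here for illustration. The standard <tt>int</tt> function handles
decimal and hexadecimal notation, while the helper <tt>from_roman</tt> is
a small function made up for the example.</p>

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def from_roman(numeral):
    """Convert a Roman numeral to an integer (subtractive notation supported)."""
    total = 0
    for i, letter in enumerate(numeral):
        value = ROMAN[letter]
        # A smaller value in front of a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and value < ROMAN[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


if __name__ == "__main__":
    # Three different notations, one and the same number.
    print(int("10", 10), from_roman("X"), int("A", 16))
```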
<p>To read files and know which number, and which
representation and instance in a repertoire, is meant, one has to
know how the number is encoded. Last but not least, to use
the symbol for anything it must be known how it should look or
be drawn on paper. There are countless fonts with Norwegian letters,
all slightly different, and to draw a capital A on the screen
the computer must know what to draw. Fonts contain
information about how different numbers are to be drawn. They do not
always contain all the symbols used in a text, which means that not
every understood character can be shown on screen or paper.</p>
<p>Each of these facets must be settled to be able to preserve and
display text with a computer. The combination of repertoire, representation
and encoding is what is called a character set. The combination of
representation and rendering is called a font. Most
fonts also carry information about repertoire, but there are
fonts that only link number codes to rendering, without
saying anything about how the number codes are really to be interpreted.</p>
<p id="tegnsett_access_fra_ascii_til_iso_8859"><strong>From ASCII to ISO-8859</strong></p>
<p>We begin the story with ASCII (American Standard Code for
Information Interchange), whose history can be traced back to
1963. The starting point for ASCII was that it could encode up to 128
different symbols in common use in the USA. The visible symbols in
ASCII are the lower-case and upper-case letters (a to z and A to Z), digits (0 to
9) and punctuation symbols (for example semicolon, comma and
full stop). ASCII also has some invisible symbols that were used for,
among other things, communication. Before ASCII there were, for example, telex character sets
with room for only 32 characters and EBCDIC with room for 256 characters, all with
a completely different ordering of the symbols than ASCII, but they have been little
used over the last fifty years. A sample of selected symbols from
the ASCII repertoire is shown in table 1.</p>
<table align="center" width="50%">
<caption>Table 1. Examples of selected symbols from the
ASCII character set. The «Binary» column shows the value of the symbol in
the binary numeral system (ones and zeros), while the «Decimal» column shows the value of the symbol
in the decimal numeral system.</caption>
<tr><th>Graphic</th><th>Binary</th><th>Decimal</th></tr>
<tr><td>A</td><td>1000001</td><td align="right">65</td></tr>
<tr><td>M</td><td>1001101</td><td align="right">77</td></tr>
<tr><td>Z</td><td>1011010</td><td align="right">90</td></tr>
<tr><td>a</td><td>1100001</td><td align="right">97</td></tr>
<tr><td>m</td><td>1101101</td><td align="right">109</td></tr>
<tr><td>z</td><td>1111010</td><td align="right">122</td></tr>
<tr><td>0</td><td>0110000</td><td align="right">48</td></tr>
<tr><td>9</td><td>0111001</td><td align="right">57</td></tr>
<tr><td>;</td><td>0111011</td><td align="right">59</td></tr>
</table>
<p>The original ASCII character set was also referred to as ASCII-7 and
used 7 bits (0 and 1) to represent symbols. Computers are
often configured to work with units in which bits are grouped as 4
or 8 bits. There was an opportunity in putting bit eight to use. Such a
change would allow computers to increase the number of symbols
they could represent, from 128 different
symbols to 256 different symbols. This opened the way for including
the Nordic letters alongside ASCII, and this was eventually
standardised as ISO-8859-1. Table 2 shows the parts of ISO-8859-1 that
support the Norwegian letters.</p>
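<p>The values in table 1, and the step from 128 to 256 possible symbols, can
be recalculated with a short Python sketch. It is added for illustration
only, and the function name <tt>ascii_row</tt> is made up for the
example.</p>

```python
def ascii_row(char):
    """Return (char, 7-bit binary string, decimal value) for an ASCII character."""
    code = ord(char)
    if code >= 2**7:
        raise ValueError(f"{char!r} is outside 7-bit ASCII")
    return char, format(code, "07b"), code


if __name__ == "__main__":
    for char in "AMZamz09;":
        print(*ascii_row(char))
    # Number of different values available with 7 and with 8 bits.
    print(2**7, 2**8)
```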
<p>It goes without saying that the ability to represent up to 256 symbols
is not enough when we are talking about a global world, and a
standardisation effort was carried out that took ASCII-7 as its starting point, extended
to use the eighth bit for different language groups. This standard
is called ISO-8859 and is divided into up to 16 variants, that is, from
ISO-8859-1 to ISO-8859-16.</p>
<table align="center" width="50%">
<caption>Table 2. Encoding of the Norwegian symbols as defined in
the ISO-8859-1 character set.</caption>
<tr><th>Graphic</th><th>Binary</th><th>Decimal</th></tr>
<tr><td>Æ</td><td>11000110</td><td align="right">198</td></tr>
<tr><td>Ø</td><td>11011000</td><td align="right">216</td></tr>
<tr><td>Å</td><td>11000101</td><td align="right">197</td></tr>
<tr><td>æ</td><td>11100110</td><td align="right">230</td></tr>
<tr><td>ø</td><td>11111000</td><td align="right">248</td></tr>
<tr><td>å</td><td>11100101</td><td align="right">229</td></tr>
</table>
<p>Norwegian characters are defined in ISO-8859-1, also referred to as Latin 1;
most Sami characters are defined in ISO-8859-4 (Latin 4), while access
to the € symbol came with ISO-8859-15 (Latin 9). ISO-8859-15 is a
revision of ISO-8859-1 that removes some little-used symbols,
replaces them with letters that are more used, and introduces the € symbol. It
is important to note that all the ISO-8859 variants overlap with
ASCII-7, which gave interoperability with the English-speaking countries, who did not
need to do anything. It also means that the first 128 values in
the ISO-8859 variants represent the same symbols. It is only when
you get to the interpretation of the remaining 128 values, numbered 128
to 255, that interpretation challenges arose between
the ISO-8859 variants.</p>
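<p>How one and the same value is interpreted across the variants can be
explored with the codecs included in Python. The following sketch is added
for illustration; the names <tt>VARIANTS</tt> and
<tt>interpretations</tt> are made up for the example, and values a variant
leaves undefined are shown as the Unicode replacement character.</p>

```python
# ISO-8859-12 was never completed, so there is no codec for it.
VARIANTS = [n for n in range(1, 17) if n != 12]


def interpretations(value):
    """Map ISO-8859 variant number to the character a byte value decodes to."""
    result = {}
    for n in VARIANTS:
        # errors="replace" yields U+FFFD where a variant leaves the value undefined.
        result[n] = bytes([value]).decode(f"iso8859_{n}", errors="replace")
    return result


if __name__ == "__main__":
    # 0xC6 (decimal 198) is «Æ» in ISO-8859-1.
    for n, char in interpretations(0xC6).items():
        print(f"ISO-8859-{n}: {char}")
```

<p>Values below 128 decode to the same character in every variant, which is
the overlap with ASCII-7 described above.</p>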
<p>The ISO-8859 world worked well as long as the character set used when
content was created was also used when the content was rendered, and you did not
need to combine content from different character sets in the same
document. The challenge with the ISO-8859 variants quickly became
clear in a more globalised world with exchange of text across
national borders, where textual content in documents, e-mails and
web pages could be written with one character set and rendered with another.</p>
682 <table align=
"center
" width=
"60%
">
684 <caption
>Tabell
3. Viser tolkning av verdiene som er tilegnet de
685 norske symbolene i ISO-
8859-
1 i de andre ISO
8859-variatene. Merk
686 ISO-
8859-
12 ikke finnes da arbeidet ble avsluttet.
<sup
>[
<a id=
"tegnsett_access_footnoteref_1
" href=
"#tegnsett_access_footnotedef_1
" title=
"View footnote.
">1</a
>]
</sup
></caption
>
690 <th
>Binærverdi
</th
>
691 <th
>1</th
>
692 <th
>2</th
>
693 <th
>3</th
>
694 <th
>4</th
>
695 <th
>5</th
>
696 <th
>6</th
>
697 <th
>7</th
>
698 <th
>8</th
>
699 <th
>9</th
>
700 <th
>10</th
>
701 <th
>11</th
>
702 <th
>13</th
>
703 <th
>14</th
>
704 <th
>15</th
>
705 <th
>16</th
>
708 <td
>11000110</td
>
709 <td
>Æ
</td
>
710 <td
>Ć
</td
>
711 <td
>Ĉ
</td
>
712 <td
>Æ
</td
>
713 <td
>Ц
</td
>
714 <td
>ئ
</td
>
715 <td
>Ζ
</td
>
716 <td
></td
>
717 <td
>Æ
</td
>
718 <td
>Æ
</td
>
719 <td
>ฦ
</td
>
720 <td
>Ę
</td
>
721 <td
>Æ
</td
>
722 <td
>Æ
</td
>
723 <td
>Æ
</td
>
726 <td
>11011000</td
>
727 <td
>Ø
</td
>
728 <td
>Ř
</td
>
729 <td
>Ĝ
</td
>
730 <td
>Ø
</td
>
731 <td
>и
</td
>
732 <td
>ظ
</td
>
733 <td
>Ψ
</td
>
734 <td
></td
>
735 <td
>Ø
</td
>
736 <td
>Ø
</td
>
737 <td
>ุ
</td
>
738 <td
>Ų
</td
>
739 <td
>Ø
</td
>
740 <td
>Ø
</td
>
741 <td
>Ű
</td
>
744 <td
>11000101</td
>
745 <td
>Å
</td
>
746 <td
>Ĺ
</td
>
747 <td
>Ċ
</td
>
748 <td
>Å
</td
>
749 <td
>Х
</td
>
750 <td
>إ
</td
>
751 <td
>Ε
</td
>
752 <td
></td
>
753 <td
>Å
</td
>
754 <td
>Å
</td
>
755 <td
>ล
</td
>
756 <td
>Å
</td
>
757 <td
>Å
</td
>
758 <td
>Å
</td
>
759 <td
>Ć
</td
>
762 <td
>11100110</td
>
763 <td
>æ
</td
>
764 <td
>ć
</td
>
765 <td
>ĉ
</td
>
766 <td
>æ
</td
>
767 <td
>ц
</td
>
768 <td
>ن
</td
>
769 <td
>ζ
</td
>
770 <td
>ז
</td
>
771 <td
>æ
</td
>
772 <td
>æ
</td
>
773 <td
>ๆ
</td
>
774 <td
>ę
</td
>
775 <td
>æ
</td
>
776 <td
>æ
</td
>
777 <td
>æ
</td
>
780 <td
>11111000</td
>
781 <td
>ø
</td
>
782 <td
>ř
</td
>
783 <td
>ĝ
</td
>
784 <td
>ø
</td
>
785 <td
>ј
</td
>
786 <td
></td
>
787 <td
>ψ
</td
>
788 <td
>ר
</td
>
789 <td
>ø
</td
>
790 <td
>ø
</td
>
791 <td
>๘
</td
>
792 <td
>ų
</td
>
793 <td
>ø
</td
>
794 <td
>ø
</td
>
795 <td
>ű
</td
>
798 <td
>11100101</td
>
799 <td
>å
</td
>
800 <td
>ĺ
</td
>
801 <td
>ċ
</td
>
802 <td
>å
</td
>
803 <td
>х
</td
>
804 <td
>م
</td
>
805 <td
>ε
</td
>
806 <td
>ו
</td
>
807 <td
>å
</td
>
808 <td
>å
</td
>
809 <td
>ๅ
</td
>
810 <td
>å
</td
>
811 <td
>å
</td
>
812 <td
>å
</td
>
813 <td
>ć
</td
>
818 <p
>This problem is illustrated in table 3, where column "1" shows the values assigned to the Norwegian symbols in ISO-8859-1. The other columns show which symbol each value gets in the other ISO-8859 variants. Using table 3, we can see that the word lærlingspørsmål rendered with ISO-8859-2 (column 2) becomes lćrlingspřrsmĺl, while it becomes lζrlingspψrsmεl with ISO-8859-7 (column 7). With ISO-8859-2, "æ" becomes "ć", "ø" becomes "ř" and "å" becomes "ĺ". In ISO-8859-7, "æ" becomes "ζ", "ø" becomes "ψ", while "å" becomes "ε".
</p
>
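<p>The reinterpretation described above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming Python's built-in codecs for ISO-8859-1, ISO-8859-2 and ISO-8859-7 are available; the variable names are made up for the illustration.</p>

```python
# Encode a Norwegian word as ISO-8859-1, then read the same bytes
# back as if they were ISO-8859-2 or ISO-8859-7.
word = "lærlingspørsmål"
raw = word.encode("iso-8859-1")        # one byte per character

as_latin2 = raw.decode("iso-8859-2")   # column 2 in the table
as_greek = raw.decode("iso-8859-7")    # column 7 in the table

print(as_latin2)
print(as_greek)
```

<p>The bytes never change in this sketch. Only the table used to interpret them does.</p>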
827 <p
>This is not really a problem as long as you know which character set your content is represented in, and no conversions have taken place that you are unaware of. The latter is the problematic part, especially for the computer systems in use over the last 20 years that have no built-in functionality for managing character sets. A good example is the Microsoft character set Windows-1252, which was mistaken for being 100 % compatible with ISO-8859-1, but had replaced the positions from 128 to 159. Historically there will be some variation in which character set has been in use, and in how successful conversion between character sets has been
839 <p id=
"tegnsett_access_unicode_som_løsning
"><strong
>Unicode as the solution
</strong
></p
>
841 <p
>Character set confusion eventually became a source of irritation and an interoperability problem. One would often receive an email where æøå had been replaced by strange symbols because the email had passed through some computer system that did not use the same character set.
</p
>
846 <p
>To solve this interoperability problem for character sets, work was started and a new standard eventually saw the light of day. The standard was named Unicode (ISO/IEC 10646) and was meant to result in a character set everyone could agree on. Unicode is a repertoire and a representation, i.e. the naming of, and assignment of a numeric value to, every symbol in use in the world today. Entries in Unicode are usually written U+XXXX, where XXXX is the hexadecimal number the entry has in the Unicode catalogue. Here we find characters used by both living and dead languages, constructed languages, technical symbols, amusing drawings (so-called emojis) and characters nobody knows the meaning or purpose of. An amusing example is described in the web article U+237C ⍼ RIGHT ANGLE WITH DOWNWARDS ZIGZAG ARROW, by Jonathan Chan.
<sup
>[
<a id=
"tegnsett_access_footnoteref_2
" href=
"#tegnsett_access_footnotedef_2
" title=
"View footnote.
">2</a
>]
</sup
></p
>
859 <p
>Along with Unicode came three ways of encoding these numbers: UTF-8, UTF-16 and UTF-32. For technical reasons UTF-8 is widely used, especially for exchanging text over the Internet, while UTF-16 is used to some extent for text files stored on Windows. One challenge with Unicode and the UTF variants is that a combining mechanism gives several ways of encoding the same symbol. This can cause problems when searching: if you search for a word containing one or more symbols that can be written in different ways, the search system is not guaranteed to find every occurrence. For example, the letter U+00F8 "Latin Small Letter O with Stroke" can be encoded as the traditional Norwegian character ø, but also as o combined with the slash U+0338. Both are valid uses of Unicode, although the convention is to prefer "normalising" combinations into single characters where possible, precisely to simplify searching.
</p
>
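<p>The search problem and the normalisation remedy can be sketched with Python's standard unicodedata module. The example uses å rather than ø because, as far as I know, Unicode defines a canonical decomposition for å (a followed by U+030A) but not for ø, so normalisation alone would not merge o + U+0338 with ø. The helper name same_text is made up for this illustration.</p>

```python
import unicodedata

composed = "p\u00e5"       # "på" using the single code point U+00E5
decomposed = "pa\u030a"    # "p", "a" and U+030A COMBINING RING ABOVE

# The two strings look the same on screen but differ code point by code point,
# so a naive search for one will not find the other.
print(composed == decomposed)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(composed, decomposed))
```

<p>A search system that normalises both the indexed text and the query in this way would treat the two spellings as the same word.</p>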
874 <p id=
"tegnsett_access_bare_unicode_fremover
"><strong
>Only Unicode from now on
</strong
></p
>
876 <p
>The public administration's use of character sets is regulated in the Regulation on IT standards in the public administration (Forskrift om IT-standarder i offentlig forvaltning)
<sup
>[
<a id=
"tegnsett_access_footnoteref_3
" href=
"#tegnsett_access_footnotedef_3
" title=
"View footnote.
">3</a
>]
</sup
>. It states: "For all exchange of information between administrative bodies, and from administrative bodies to citizens and businesses, the character set standard ISO/IEC 10646 represented by UTF8 shall be used." UTF-8, UTF-16 and UTF-32 have different areas of use, but UTF-8 is the encoding we know best. There are several reasons why UTF-8 "won" the competition to become the chosen one. Perhaps the most important is that UTF-8 is fully interoperable with 7-bit ASCII, so the English-speaking part of the world could roll out UTF-8 without noticing any difference. A text file containing only ASCII text will be identical on disk whether it is stored as UTF-8 or as ASCII. UTF-16 and UTF-32 offer some optimisations that make them relevant for specific problem areas, but for the most part we will never encounter these standards up close in everyday life. In any case, only the use of UTF-8 is regulated by law in Norway.
</p
>
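<p>The ASCII compatibility of UTF-8 is easy to check. The following is a small sketch using only Python's built-in string encoding; the sample strings are arbitrary.</p>

```python
text = "Plain ASCII text"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")   # little endian, no byte order mark

# For pure ASCII content the UTF-8 bytes are identical to the ASCII bytes,
# while UTF-16 spends two bytes on every one of these characters.
print(ascii_bytes == utf8_bytes)
print(len(utf8_bytes), len(utf16_bytes))

# A non-ASCII letter such as å takes two bytes in UTF-8.
norwegian = "på"
print(norwegian.encode("utf-8").hex(" "))
```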
892 <p
>The whole world does not use ISO/IEC 10646 and UTF-8. China has its own character set standards, a widely used one being GB 18030, which is Unicode with a different encoding than UTF-8, while Taiwan and other Asian countries often use Big5 or other character sets.
</p
>
897 <p
>UTF-8 is dominant in Norway, but there were periods when different computer systems exchanged data according to ISO-8859-1, ISO-8859-15, Windows-1252, Codepage 865 and ISO-646-60 / Codepage 1016 while the transition to UTF-8 was under way. A computer system cannot simply be forced to use one character set, as there are several layers in a computer system that must be configured to use the right character set, and the character set problem quickly appears when something in the system uses the wrong one.
</p
>
906 <p
>A classic example of the problem is an exchange of text between two systems where the text is originally encoded in UTF-8 but passes through something that is ISO-8859-1 along the way. This can be shown by the word "på", which in such a scenario ends up as "pÃ¥". It is possible to trace this back to the values the symbols are assigned in the character sets. "på" becomes "pÃ¥" because "å" in UTF-8 is represented by the two bytes C3 A5, and if we look at what these values represent, we see that the hexadecimal value C3 is 1100 0011 in binary, and the symbol with this number in ISO-8859-1 is Ã.
</p
>
916 <p
>We see the same with the hexadecimal value A5, which is 1010 0101 in binary, and the corresponding symbol in ISO-8859-1 is ¥. Such mojibake can easily happen if "på" was originally represented in UTF-8 but was processed by a system that uses ISO-8859-1. Nothing automatically catches such corruption while textual content is exchanged between computer systems.
</p
>
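<p>The mojibake above can be reproduced, and in this simple case reversed, with Python's built-in codecs. This is a sketch that assumes exactly one wrong decoding step took place; real data may have been damaged several times or lost bytes along the way, in which case the repair shown here will not work.</p>

```python
original = "på"

utf8_bytes = original.encode("utf-8")          # the bytes 70 C3 A5
mojibake = utf8_bytes.decode("iso-8859-1")     # the same bytes misread as Latin 1
print(mojibake)

# Undo the single wrong step: turn the characters back into the bytes
# they came from, then decode those bytes as UTF-8.
repaired = mojibake.encode("iso-8859-1").decode("utf-8")
print(repaired)
```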
923 <p
>One challenge for the archival repositories is that the use of character sets has not always been regulated, and that there may be several collections of records created with varying character sets before the current regulation came into force, without it being possible to derive from the files which character set was used. One example is the € symbol, which only arrived after ISO-8859-1 had been taken into use. This can be a challenge for an archival repository, but as long as it is known which character set was in use, it should go well. The National Archivist's regulation (Riksarkivarens forskrift)
<sup
>[
<a id=
"tegnsett_access_footnoteref_4
" href=
"#tegnsett_access_footnotedef_4
" title=
"View footnote.
">4</a
>]
</sup
>
formalises this by requiring the following:
</p
>
935 <p
>§ 5-11. Character sets in archive extracts
</p
>
938 <li
>Archive extracts and the accompanying structure and content descriptions shall be transferred as plain text in unencrypted form, using an approved character set
942 <li
>Approved character sets are:
944 <li
>Unicode UTF-
8<br
>
945 (ISO/IEC
10646-
1:
2000 Annex D)
</li
>
946 <li
>ISO
8859-
1:
1998, Latin
1</li
>
947 <li
>ISO
8859-
4:
1998, Latin
4 for Sami characters.
</li
>
948 </ol
></li
>
950 <li
>Other character sets are accepted only by agreement with the National Archives of Norway (Arkivverket).
</li
>
954 <p id=
"tegnsett_access_ditt_ansvar
"><strong
>Your responsibility
</strong
></p
>
956 <p
>In many ways character sets should not be a problem in 2023, but that is probably not how things are. Countries that have upgraded to UTF-8 as the primary character set for exchanging textual content reduce the problem considerably, but globally the character set challenge is not solved, because not everyone agrees to use the same character set. There may be geopolitical or cultural considerations behind this.
</p
>
963 <p
>It is in any case worth noting that even if the use of UTF-8 were to become 100% widespread, there is a historical perspective (7-bit ASCII, the ISO-8859 variants, UTF-8) that makes character sets a problem area archivists must understand and handle. As a records manager (danningsarkivar) you have a responsibility to know which character set the systems and databases you manage conform to. It is something your IT department or software vendors should easily be able to answer, and the answer should be UTF-8 for all new systems.
</p
>
974 <p id=
"tegnsett_access_footnotedef_1
"><a href=
"#tegnsett_access_footnoteref_1
">1</a
>. Character set source
<a href=
"https://en.wikipedia.org/wiki/ISO/IEC_8859
">https://en.wikipedia.org/wiki/ISO/IEC_8859
</a
></p
>
976 <p id=
"tegnsett_access_footnotedef_2
"><a href=
"#tegnsett_access_footnoteref_2
">2</a
>.
<a href=
"https://ionathan.ch/
2022/
04/
09/angzarr.html
">https://ionathan.ch/
2022/
04/
09/angzarr.html
</a
></p
>
978 <p id=
"tegnsett_access_footnotedef_3
"><a href=
"#tegnsett_access_footnoteref_3
">3</a
>.
<a href=
"https://lovdata.no/dokument/SF/forskrift/
2013-
04-
05-
959/%C2%A78#%C2%A78
">https://lovdata.no/dokument/SF/forskrift/
2013-
04-
05-
959/%C2%A78#%C2%A78
</a
></p
>
980 <p id=
"tegnsett_access_footnotedef_4
"><a href=
"#tegnsett_access_footnoteref_4
">4</a
>.
<a href=
"https://lovdata.no/forskrift/
2017-
12-
19-
2286/§
5-
11">https://lovdata.no/forskrift/
2017-
12-
19-
2286/§
5-
11</a
></p
>
984 <p
>Incidentally, the whistleblower Edward Snowden should be granted political asylum in Norway.
</p
>
986 <p
>As usual, if you use Bitcoin and want to show your support of my activities, I would appreciate it if you send Bitcoin donations to my address
989 <b
><a href=
"bitcoin:
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
</a
></b
>. Note that payment with bitcoin is not anonymous. :)
</p
>
995 <title>New and improved sqlcipher in Debian for accessing Signal database
</title>
996 <link>https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
</link>
997 <guid isPermaLink=
"true">https://people.skolelinux.org/pere/blog/New_and_improved_sqlcipher_in_Debian_for_accessing_Signal_database.html
</guid>
998 <pubDate>Sun,
12 Nov
2023 12:
00:
00 +
0100</pubDate>
999 <description><p
>For a while now I have wanted direct access to the
1000 <a href=
"https://signal.org/
">Signal
</a
> database of messages and
1001 channels of my Desktop edition of Signal. I prefer the enforced end
1002 to end encryption of Signal these days for my communication with
1003 friends and family, to increase the level of safety and privacy as
1004 well as raising the cost of the mass surveillance government and
1005 non-government entities practice these days. In August I came across
1007 <a href=
"https://www.yoranbrondsema.com/post/the-guide-to-extracting-statistics-from-your-signal-conversations/
">recipe
1008 on how to use sqlcipher to extract statistics from the Signal
1009 database
</a
> explaining how to do this. Unfortunately this did not
1010 work with the version of sqlcipher in Debian. The
1011 <a href=
"http://tracker.debian.org/sqlcipher/
">sqlcipher
</a
>
1012 package is a
"fork
" of the sqlite package with added support for
1013 encrypted databases. Sadly the current Debian maintainer
1014 <a href=
"https://bugs.debian.org/
961598">announced more than three
1015 years ago that he did not have time to maintain sqlcipher
</a
>, so it
1016 seemed unlikely to be upgraded by the maintainer. I was reluctant to
1017 take on the job myself, as I have very limited experience maintaining
1018 shared libraries in Debian. After waiting and hoping for a few
months, I gave up last week and set out to update the package. In
the process I orphaned it to make it more obvious to the next person
looking at it that the package needs proper maintenance.
</p
>
1023 <p
>The version in Debian was around five years old, and quite a lot of
1024 changes had taken place upstream into the Debian maintenance git
1025 repository. After spending a few days importing the new upstream
1026 versions, realising that upstream did not care much for SONAME
1027 versioning as I saw library symbols being both added and removed with
1028 minor version number changes to the project, I concluded that I had to
1029 do a SONAME bump of the library package to avoid surprising the
1030 reverse dependencies. I even added a simple
autopkgtest script to ensure the package works as intended. Having dug deep
into the hole of learning shared library maintenance, I set out a few
days ago to upload the new version to Debian experimental to see what
the quality assurance framework in Debian had to say about the result.
The feedback told me the package was not too shabby, and yesterday I
1036 uploaded the latest version to Debian unstable. It should enter
1037 testing today or tomorrow, perhaps delayed by
1038 <a href=
"https://bugs.debian.org/
1055812">a small library
1039 transition
</a
>.
</p
>
1041 <p
>Armed with a new version of sqlcipher, I can now have a look at the
SQL database in ~/.config/Signal/sql/db.sqlite. First, one needs to
1043 fetch the encryption key from the Signal configuration using this
1044 simple JSON extraction command:
</p
>
1046 <pre
>/usr/bin/jq -r
'.
"key
"' ~/.config/Signal/config.json
</pre
>
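<p>For those who would rather script this step, the same lookup can be sketched in Python with the standard json module. It assumes, as the jq expression above implies, that config.json has a top-level "key" entry; the function name read_signal_key is made up for this illustration.</p>

```python
import json
from pathlib import Path


def read_signal_key(config_path: Path) -> str:
    """Return the value of the top-level "key" entry in a Signal config file."""
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh)["key"]


if __name__ == "__main__":
    # Path taken from the jq command above; only read it if it exists.
    default = Path.home() / ".config" / "Signal" / "config.json"
    if default.exists():
        print(read_signal_key(default))
```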
1048 <p
>Assume the result from that command is 'secretkey', a hexadecimal number representing the key used to encrypt the database. Next, one can connect to the database and inject the encryption key for access via SQL to fetch information from the database. Here is an example dumping the database structure:
</p
>
1055 % sqlcipher ~/.config/Signal/sql/db.sqlite
1056 sqlite
> PRAGMA key =
"x
'secretkey
'";
1058 CREATE TABLE sqlite_stat1(tbl,idx,stat);
1059 CREATE TABLE conversations(
1060 id STRING PRIMARY KEY ASC,
1068 , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
1069 CREATE TABLE identityKeys(
1070 id STRING PRIMARY KEY ASC,
1074 id STRING PRIMARY KEY ASC,
1077 CREATE TABLE sessions(
1078 id TEXT PRIMARY KEY,
1079 conversationId TEXT,
1081 , ourServiceId STRING, serviceId STRING);
1082 CREATE TABLE attachment_downloads(
1083 id STRING primary key,
1088 CREATE TABLE sticker_packs(
1089 id TEXT PRIMARY KEY,
1093 coverStickerId INTEGER,
1095 downloadAttempts INTEGER,
1096 installedAt INTEGER,
1099 stickerCount INTEGER,
1101 , attemptedStatus STRING, position INTEGER DEFAULT
0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
1102 INTEGER DEFAULT
0 NOT NULL);
1103 CREATE TABLE stickers(
1104 id INTEGER NOT NULL,
1105 packId TEXT NOT NULL,
1109 isCoverOnly INTEGER,
1114 PRIMARY KEY (id, packId),
1115 CONSTRAINT stickers_fk
1116 FOREIGN KEY (packId)
1117 REFERENCES sticker_packs(id)
1120 CREATE TABLE sticker_references(
1123 CONSTRAINT sticker_references_fk
1125 REFERENCES sticker_packs(id)
1128 CREATE TABLE emojis(
1129 shortName TEXT PRIMARY KEY,
1132 CREATE TABLE messages(
1133 rowid INTEGER PRIMARY KEY ASC,
1139 schemaVersion INTEGER,
1140 conversationId STRING,
1141 received_at INTEGER,
1143 hasAttachments INTEGER,
1144 hasFileAttachments INTEGER,
1145 hasVisualMediaAttachments INTEGER,
1146 expireTimer INTEGER,
1147 expirationStartTimestamp INTEGER,
1150 messageTimer INTEGER,
1151 messageTimerStart INTEGER,
1152 messageTimerExpiresAt INTEGER,
1155 sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
1156 GENERATED ALWAYS AS (type IS
'story
'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT
0, isTimerChangeFromSync INTEGER
1157 GENERATED ALWAYS AS (
1158 json_extract(json,
'$.expirationTimerUpdate.fromSync
') IS
1
1159 ), seenStatus NUMBER default
0, storyDistributionListId STRING, expiresAt INT
1162 expirationStartTimestamp + (expireTimer *
1000),
1164 )), shouldAffectActivity INTEGER
1165 GENERATED ALWAYS AS (
1169 'change-number-notification
',
1170 'contact-removed-notification
',
1171 'conversation-merge
',
1172 'group-v1-migration
',
1173 'keychange
',
1174 'message-history-unsynced
',
1175 'profile-change
',
1177 'universal-timer-notification
',
1178 'verified-change
'
1180 ), shouldAffectPreview INTEGER
1181 GENERATED ALWAYS AS (
1185 'change-number-notification
',
1186 'contact-removed-notification
',
1187 'conversation-merge
',
1188 'group-v1-migration
',
1189 'keychange
',
1190 'message-history-unsynced
',
1191 'profile-change
',
1193 'universal-timer-notification
',
1194 'verified-change
'
1196 ), isUserInitiatedMessage INTEGER
1197 GENERATED ALWAYS AS (
1201 'change-number-notification
',
1202 'contact-removed-notification
',
1203 'conversation-merge
',
1204 'group-v1-migration
',
1205 'group-v2-change
',
1206 'keychange
',
1207 'message-history-unsynced
',
1208 'profile-change
',
1210 'universal-timer-notification
',
1211 'verified-change
'
1213 ), mentionsMe INTEGER NOT NULL DEFAULT
0, isGroupLeaveEvent INTEGER
1214 GENERATED ALWAYS AS (
1215 type IS
'group-v2-change
' AND
1216 json_array_length(json_extract(json,
'$.groupV2Change.details
')) IS
1 AND
1217 json_extract(json,
'$.groupV2Change.details[
0].type
') IS
'member-remove
' AND
1218 json_extract(json,
'$.groupV2Change.from
') IS NOT NULL AND
1219 json_extract(json,
'$.groupV2Change.from
') IS json_extract(json,
'$.groupV2Change.details[
0].aci
')
1220 ), isGroupLeaveEventFromOther INTEGER
1221 GENERATED ALWAYS AS (
1222 isGroupLeaveEvent IS
1
1224 isChangeCreatedByUs IS
0
1226 GENERATED ALWAYS AS (
1227 json_extract(json,
'$.callId
')
1229 CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
1231 id TEXT PRIMARY KEY,
1232 queueType TEXT STRING NOT NULL,
1233 timestamp INTEGER NOT NULL,
1236 CREATE TABLE reactions(
1237 conversationId STRING,
1240 messageReceivedAt INTEGER,
1241 targetAuthorAci STRING,
1242 targetTimestamp INTEGER,
1244 , messageId STRING);
1245 CREATE TABLE senderKeys(
1246 id TEXT PRIMARY KEY NOT NULL,
1247 senderId TEXT NOT NULL,
1248 distributionId TEXT NOT NULL,
1250 lastUpdatedDate NUMBER NOT NULL
1252 CREATE TABLE unprocessed(
1253 id STRING PRIMARY KEY ASC,
1260 serverTimestamp INTEGER,
1261 sourceServiceId STRING
1262 , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
1263 CREATE TABLE sendLogPayloads(
1264 id INTEGER PRIMARY KEY ASC,
1266 timestamp INTEGER NOT NULL,
1267 contentHint INTEGER NOT NULL,
1269 , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT
0 NOT NULL);
1270 CREATE TABLE sendLogRecipients(
1271 payloadId INTEGER NOT NULL,
1273 recipientServiceId STRING NOT NULL,
1274 deviceId INTEGER NOT NULL,
1276 PRIMARY KEY (payloadId, recipientServiceId, deviceId),
1278 CONSTRAINT sendLogRecipientsForeignKey
1279 FOREIGN KEY (payloadId)
1280 REFERENCES sendLogPayloads(id)
1283 CREATE TABLE sendLogMessageIds(
1284 payloadId INTEGER NOT NULL,
1286 messageId STRING NOT NULL,
1288 PRIMARY KEY (payloadId, messageId),
1290 CONSTRAINT sendLogMessageIdsForeignKey
1291 FOREIGN KEY (payloadId)
1292 REFERENCES sendLogPayloads(id)
1295 CREATE TABLE preKeys(
1296 id STRING PRIMARY KEY ASC,
1298 , ourServiceId NUMBER
1299 GENERATED ALWAYS AS (json_extract(json,
'$.ourServiceId
')));
1300 CREATE TABLE signedPreKeys(
1301 id STRING PRIMARY KEY ASC,
1303 , ourServiceId NUMBER
1304 GENERATED ALWAYS AS (json_extract(json,
'$.ourServiceId
')));
1305 CREATE TABLE badges(
1306 id TEXT PRIMARY KEY,
1307 category TEXT NOT NULL,
1309 descriptionTemplate TEXT NOT NULL
1311 CREATE TABLE badgeImageFiles(
1312 badgeId TEXT REFERENCES badges(id)
1315 'order
' INTEGER NOT NULL,
1320 CREATE TABLE storyReads (
1321 authorId STRING NOT NULL,
1322 conversationId STRING NOT NULL,
1323 storyId STRING NOT NULL,
1324 storyReadDate NUMBER NOT NULL,
1326 PRIMARY KEY (authorId, storyId)
1328 CREATE TABLE storyDistributions(
1329 id STRING PRIMARY KEY NOT NULL,
1332 senderKeyInfoJson STRING
1333 , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
1334 CREATE TABLE storyDistributionMembers(
1335 listId STRING NOT NULL REFERENCES storyDistributions(id)
1338 serviceId STRING NOT NULL,
1340 PRIMARY KEY (listId, serviceId)
1342 CREATE TABLE uninstalled_sticker_packs (
1343 id STRING NOT NULL PRIMARY KEY,
1344 uninstalledAt NUMBER NOT NULL,
1346 storageVersion NUMBER,
1347 storageUnknownFields BLOB,
1348 storageNeedsSync INTEGER NOT NULL
1350 CREATE TABLE groupCallRingCancellations(
1351 ringId INTEGER PRIMARY KEY,
1352 createdAt INTEGER NOT NULL
1354 CREATE TABLE IF NOT EXISTS
'messages_fts_data
'(id INTEGER PRIMARY KEY, block BLOB);
1355 CREATE TABLE IF NOT EXISTS
'messages_fts_idx
'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
1356 CREATE TABLE IF NOT EXISTS
'messages_fts_content
'(id INTEGER PRIMARY KEY, c0);
1357 CREATE TABLE IF NOT EXISTS
'messages_fts_docsize
'(id INTEGER PRIMARY KEY, sz BLOB);
1358 CREATE TABLE IF NOT EXISTS
'messages_fts_config
'(k PRIMARY KEY, v) WITHOUT ROWID;
1359 CREATE TABLE edited_messages(
1360 messageId STRING REFERENCES messages(id)
1364 , conversationId STRING);
1365 CREATE TABLE mentions (
1366 messageId REFERENCES messages(id) ON DELETE CASCADE,
1371 CREATE TABLE kyberPreKeys(
1372 id STRING PRIMARY KEY NOT NULL,
1373 json TEXT NOT NULL, ourServiceId NUMBER
1374 GENERATED ALWAYS AS (json_extract(json,
'$.ourServiceId
')));
1375 CREATE TABLE callsHistory (
1376 callId TEXT PRIMARY KEY,
1377 peerId TEXT NOT NULL, -- conversation id (legacy) | uuid | groupId | roomId
1378 ringerId TEXT DEFAULT NULL, -- ringer uuid
1379 mode TEXT NOT NULL, -- enum
"Direct
" |
"Group
"
1380 type TEXT NOT NULL, -- enum
"Audio
" |
"Video
" |
"Group
"
1381 direction TEXT NOT NULL, -- enum
"Incoming
" |
"Outgoing
1382 -- Direct: enum
"Pending
" |
"Missed
" |
"Accepted
" |
"Deleted
"
1383 -- Group: enum
"GenericGroupCall
" |
"OutgoingRing
" |
"Ringing
" |
"Joined
" |
"Missed
" |
"Declined
" |
"Accepted
" |
"Deleted
"
1384 status TEXT NOT NULL,
1385 timestamp INTEGER NOT NULL,
1386 UNIQUE (callId, peerId) ON CONFLICT FAIL
1388 [ dropped all indexes to save space in this blog post ]
1389 CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
1391 new.body IS NOT NULL AND new.isViewOnce =
1
1393 DELETE FROM messages_fts WHERE rowid = old.rowid;
1395 CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
1396 WHEN new.isViewOnce IS NOT
1 AND new.storyId IS NULL
1398 INSERT INTO messages_fts
1401 (new.rowid, new.body);
1403 CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
1404 DELETE FROM messages_fts WHERE rowid = old.rowid;
1405 DELETE FROM sendLogPayloads WHERE id IN (
1406 SELECT payloadId FROM sendLogMessageIds
1407 WHERE messageId = old.id
1409 DELETE FROM reactions WHERE rowid IN (
1410 SELECT rowid FROM reactions
1411 WHERE messageId = old.id
1413 DELETE FROM storyReads WHERE storyId = old.storyId;
1415 CREATE VIRTUAL TABLE messages_fts USING fts5(
1417 tokenize =
'signal_tokenizer
'
1419 CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
1421 (new.body IS NULL OR old.body IS NOT new.body) AND
1422 new.isViewOnce IS NOT
1 AND new.storyId IS NULL
1424 DELETE FROM messages_fts WHERE rowid = old.rowid;
1425 INSERT INTO messages_fts
1428 (new.rowid, new.body);
1430 CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
1432 INSERT INTO mentions (messageId, mentionAci, start, length)
1434 SELECT messages.id, bodyRanges.value -
>> 'mentionAci
' as mentionAci,
1435 bodyRanges.value -
>> 'start
' as start,
1436 bodyRanges.value -
>> 'length
' as length
1437 FROM messages, json_each(messages.json -
>> 'bodyRanges
') as bodyRanges
1438 WHERE bodyRanges.value -
>> 'mentionAci
' IS NOT NULL
1440 AND messages.id = new.id;
1442 CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
1444 DELETE FROM mentions WHERE messageId = new.id;
1445 INSERT INTO mentions (messageId, mentionAci, start, length)
1447 SELECT messages.id, bodyRanges.value -
>> 'mentionAci
' as mentionAci,
1448 bodyRanges.value -
>> 'start
' as start,
1449 bodyRanges.value -
>> 'length
' as length
1450 FROM messages, json_each(messages.json -
>> 'bodyRanges
') as bodyRanges
1451 WHERE bodyRanges.value -
>> 'mentionAci
' IS NOT NULL
1453 AND messages.id = new.id;
1458 <p
>Finally I have the tool I need to inspect and process Signal
messages without using the vendor provided client. Now
on to transforming them into a more useful format.
</p
>
1462 <p
>As usual, if you use Bitcoin and want to show your support of my
1463 activities, please send Bitcoin donations to my address
1464 <b
><a href=
"bitcoin:
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
</a
></b
>.
</p
>
1469 <title>New chrpath release
0.17</title>
1470 <link>https://people.skolelinux.org/pere/blog/New_chrpath_release_0_17.html
</link>
1471 <guid isPermaLink=
"true">https://people.skolelinux.org/pere/blog/New_chrpath_release_0_17.html
</guid>
1472 <pubDate>Fri,
10 Nov
2023 07:
30:
00 +
0100</pubDate>
1473 <description><p
>The chrpath package provides a simple command line tool to remove or
modify the rpath or runpath of compiled ELF programs. It is almost
10
1475 years since I updated the code base, but I stumbled over the tool
1476 today, and decided it was time to move the code base from Subversion
1477 to git and find a new home for it, as the previous one (Debian Alioth)
1478 has been shut down. I decided to go with
1479 <a href=
"https://codeberg.org/
">Codeberg
</a
> this time, as it is my git
1480 service of choice these days, did a quick and dirty migration to git
1481 and updated the code with a few patches I found in the Debian bug
1482 tracker. These are the release notes:
</p
>
1484 <p
>New in
0.17 released
2023-
11-
10:
</p
>
1487 <li
>Moved project to Codeberg, as Alioth is shut down.
</li
>
1488 <li
>Add Solaris support (use
&lt;sys/byteorder.h
> instead of
&lt;byteswap.h
>).
1489 Patch from Rainer Orth.
</li
>
1490 <li
>Added missing newline from printf() line. Patch from Frank Dana.
</li
>
1491 <li
>Corrected handling of multiple ELF sections. Patch from Frank Dana.
</li
>
1492 <li
>Updated build rules for .deb. Partly based on patch from djcj.
</li
>
1495 <p
>The latest edition is tagged and available from
1496 <a href=
"https://codeberg.org/pere/chrpath
">https://codeberg.org/pere/chrpath
</a
>.
1498 <p
>As usual, if you use Bitcoin and want to show your support of my
1499 activities, please send Bitcoin donations to my address
1500 <b
><a href=
"bitcoin:
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
</a
></b
>.
</p
>
1505 <title>Test framework for DocBook processors / formatters
</title>
1506 <link>https://people.skolelinux.org/pere/blog/Test_framework_for_DocBook_processors___formatters.html
</link>
1507 <guid isPermaLink=
"true">https://people.skolelinux.org/pere/blog/Test_framework_for_DocBook_processors___formatters.html
</guid>
1508 <pubDate>Sun,
5 Nov
2023 13:
00:
00 +
0100</pubDate>
1509 <description><p
>All the books I have published so far have been using
1510 <a href=
"https://docbook.org/
">DocBook
</a
> somewhere in the process.
1511 For the first book, the source format was DocBook, while for every
1512 later book it was an intermediate format used as the stepping stone to
1513 be able to present the same manuscript in several formats, on paper,
1514 as ebook in ePub format, as a HTML page and as a PDF file either for
1515 paper production or for Internet consumption. This is made possible
1516 with a wide variety of free software tools with DocBook support in
Debian. The source format of later books has been docx via rst,
1518 Markdown, Filemaker and Asciidoc, and for all of these I was able to
1519 generate a suitable DocBook file for further processing using
1520 <a href=
"https://tracker.debian.org/pkg/pandoc
">pandoc
</a
>,
1521 <a href=
"https://tracker.debian.org/pkg/asciidoc
">a2x
</a
> and
1522 <a href=
"https://tracker.debian.org/pkg/asciidoctor
">asciidoctor
</a
>,
1523 as well as rendering using
1524 <a href=
"https://tracker.debian.org/pkg/xmlto
">xmlto
</a
>,
1525 <a href=
"https://tracker.debian.org/pkg/dbtoepub
">dbtoepub
</a
>,
1526 <a href=
"https://tracker.debian.org/pkg/dblatex
">dblatex
</a
>,
1527 <a href=
"https://tracker.debian.org/pkg/docbook-xsl
">docbook-xsl
</a
> and
1528 <a href=
"https://tracker.debian.org/pkg/fop
">fop
</a
>.
</p
>
1530 <p
>Most of the
<a href=
"http://www.hungry.com/~pere/publisher/
">books I
1531 have published
</a
> are translated books, with English as the source
1532 language. The use of
1533 <a href=
"https://tracker.debian.org/pkg/po4a
">po4a
</a
> to
1534 handle translations using the gettext PO format has been a blessing,
but publishing translated books has triggered the need to ensure the
DocBook tools handle relevant languages correctly. For every new
language I have published, I had to submit patches to dblatex, dbtoepub
and docbook-xsl fixing incorrect language and country specific issues
in the frameworks themselves. Typically this has been missing keywords
1540 like
'figure
' or sort ordering of index entries. After a while it
1541 became tiresome to only discover issues like this by accident, and I
1542 decided to write a DocBook
"test framework
" exercising various
1543 features of DocBook and allowing me to see all features exercised for
a given language. It consists of a set of DocBook files, a version
4
1545 book, a version
5 book, a v4 book set, a v4 selection of problematic
1546 tables, one v4 testing sidefloat and finally one v4 testing a book of
1547 articles. The DocBook files are accompanied with a set of build rules
1548 for building PDF using dblatex and docbook-xsl/fop, HTML using xmlto
1549 or docbook-xsl and epub using dbtoepub. The result is a set of files
1550 visualizing footnotes, indexes, table of content list, figures,
1551 formulas and other DocBook features, allowing for a quick review on
1552 the completeness of the given locale settings. To build with a
1553 different language setting, all one need to do is edit the lang= value
1554 in the .xml file to pick a different ISO
639 code value and run
1555 'make
'.
</p
>
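<p>As a rough illustration of that last step, the language switch can
also be scripted. The sketch below is not part of the test framework.
The helper name <tt>set_docbook_lang</tt> is hypothetical, and it
assumes the first <tt>lang=</tt> (or <tt>xml:lang=</tt>) attribute in
the file is the one to change.</p>

```python
import re

# Matches lang="..." or xml:lang='...' (the \b also matches after "xml:").
_LANG_ATTR = re.compile(r'''(\blang\s*=\s*)(["'])[^"']*\2''')


def set_docbook_lang(xml_text: str, lang_code: str) -> str:
    """Return xml_text with its first lang attribute set to lang_code.

    Assumption: the first lang attribute found is the one that selects
    the document language. Raises ValueError if none is present.
    """
    def _swap(match: re.Match) -> str:
        # Keep the attribute name and the original quote style.
        return f"{match.group(1)}{match.group(2)}{lang_code}{match.group(2)}"

    new_text, count = _LANG_ATTR.subn(_swap, xml_text, count=1)
    if count == 0:
        raise ValueError("no lang attribute found")
    return new_text
```

<p>After writing the changed text back to the .xml file, running
'make' as before rebuilds the output for the new language.</p>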
1557 <p
>The
<a href=
"https://codeberg.org/pere/docbook-example/
">test framework
1558 source code
</a
> is available from Codeberg, and a generated set of
presentations of the various examples is available as Codeberg static
pages at
1561 <a href=
"https://pere.codeberg.page/docbook-example/
">https://pere.codeberg.page/docbook-example/
</a
>.
Using this test framework I have been able to discover and report
several bugs and missing features in various tools, and got a lot of
them fixed. For example, I got Northern Sami keywords added to both
docbook-xsl and dblatex, several typos fixed in Norwegian Bokmål and
Norwegian Nynorsk, support for non-ASCII title IDs added to pandoc,
Norwegian index sorting support fixed in xindy and initial Norwegian
Bokmål support added to dblatex. Some issues still remain, though.
Default index sorting rules are still broken in several tools, so the
Norwegian letters æ, ø and å are more often than not sorted incorrectly
in the book index.
</p
>
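<p>To make the sorting problem concrete: in the Norwegian alphabet, æ,
ø and å come after z, in that order, while a plain code point sort
places å before æ. The sketch below is only an illustration and is not
taken from any of the tools mentioned. The names
<tt>norwegian_key</tt> and <tt>sort_index_entries</tt> are
hypothetical, and it ignores details such as 'aa' traditionally
sorting as å.</p>

```python
# Norwegian alphabet order: the three extra letters follow z.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
_RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_key(word):
    """Sort key ranking each letter by its place in the Norwegian alphabet.

    Simplification: characters outside the alphabet all get the same
    rank, placed after å.
    """
    return [_RANK.get(ch, len(_RANK)) for ch in word.lower()]


def sort_index_entries(entries):
    """Return index entries sorted in Norwegian alphabetical order."""
    return sorted(entries, key=norwegian_key)


if __name__ == "__main__":
    words = ["ål", "zebra", "ære", "øl"]
    print(sorted(words))              # plain code point order
    print(sort_index_entries(words))  # Norwegian alphabet order
```

<p>A real index processor would use proper locale collation rules
rather than a hand-written alphabet, but the expected ordering is the
same.</p>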
1573 <p
>The test framework recently received some more polish, as part of
1574 publishing my latest book. This book contained a lot of fairly
1575 complex tables, which exposed bugs in some of the tools. This made me
add a new test file with various tables, as well as spend some time
brushing up the build rules. My goal is for the test framework to
1578 exercise all DocBook features to make it easier to see which features
1579 work with different processors, and hopefully get them all to support
1580 the full set of DocBook features. Feel free to send patches to extend
1581 the test set, and test it with your favorite DocBook processor.
1582 Please visit these two URLs to learn more:
</p
>
<ul>
<li><a href="https://codeberg.org/pere/docbook-example/">https://codeberg.org/pere/docbook-example/</a></li>
<li><a href="https://pere.codeberg.page/docbook-example/">https://pere.codeberg.page/docbook-example/</a></li>
</ul>
1589 <p
>If you want to learn more about DocBook and translations, I recommend
having a look at
<a href="https://docbook.org/">the DocBook site</a>,
<a href="https://doccookbook.sourceforge.net/html/en/">the DoCookBook
site</a> and my earlier blog post on
1594 <a href=
"https://people.skolelinux.org/pere/blog/From_English_wiki_to_translated_PDF_and_epub_via_Docbook.html
">how
the Skolelinux project processes and translates documentation
</a
>, as well as a talk I gave earlier this year on
1596 <a href=
"https://www.nuug.no/aktiviteter/
20230314-oversetting-og-publisering-av-b%c3%b8ker-med-fri-programvare/
">how
1597 to translate and publish books using free software
</a
> (in Norwegian).
</p>
https://github.com/docbook/xslt10-stylesheets/issues/205 (docbook-xsl: sme support)
https://bugs.debian.org/968437 (xindy: index sorting rules for nb/nn)
https://bugs.debian.org/856123 (pandoc: markdown to docbook with non-english titles)
https://bugs.debian.org/864813 (dblatex: missing nb words)
https://bugs.debian.org/756386 (dblatex: index sorting rules for nb/nn)
https://bugs.debian.org/796871 (dbtoepub: index sorting rules for nb/nn)
https://bugs.debian.org/792616 (dblatex: PDF metadata)
https://bugs.debian.org/686908 (docbook-xsl: index sorting rules for nb/nn)
https://sourceforge.net/tracker/?func=detail&atid=373747&aid=3556630&group_id=21935 (docbook-xsl: nb/nn support)
https://bugs.debian.org/684391 (dblatex: initial nb support)
1615 <p
>As usual, if you use Bitcoin and want to show your support of my
1616 activities, please send Bitcoin donations to my address
1617 <b
><a href=
"bitcoin:
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b
</a
></b
>.
</p
>