4 Network Working Group F. Yergeau
5 Internet Draft G. Nicol
6 <draft-ietf-html-i18n-04.txt> G. Adams
7 Expires 2 December 1996 M. Duerst
11 Internationalization of the Hypertext Markup Language
16 This document is an Internet-Draft. Internet-Drafts are working doc-
17 uments of the Internet Engineering Task Force (IETF), its areas, and
18 its working groups. Note that other groups may also distribute work-
19 ing documents as Internet-Drafts.
21 Internet-Drafts are draft documents valid for a maximum of six
22 months. Internet-Drafts may be updated, replaced, or obsoleted by
23 other documents at any time. It is not appropriate to use Internet-
24 Drafts as reference material or to cite them other than as a "working
25 draft" or "work in progress".
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
33 Distribution of this document is unlimited. Please send comments to
34 the HTML working group (HTML-WG) of the Internet Engineering Task
35 Force (IETF) at <html-wg@w3.org>. Subscription address is <html-wg-
36 request@w3.org>. Discussions of the group are archived at
37 <URL:http://www.acl.lanl.gov/HTML_WG/archives.html>.
42 The Hypertext Markup Language (HTML) is a simple markup language used
43 to create hypertext documents that are platform independent. Ini-
44 tially, the application of HTML on the World Wide Web was seriously
45 restricted by its reliance on the ISO-8859-1 coded character set,
46 which is appropriate only for Western European languages. Despite
47 this restriction, HTML has been widely used with other languages,
48 using other coded character sets or character encodings, at the
49 expense of interoperability.
51 This document is meant to address the issue of the
55 Expires 2 December 1996 [Page 1]
57 Internet Draft HTML internationalization 27 May 1996
60 internationalization (i18n, i followed by 18 letters followed by n)
61 of HTML by extending the specification of HTML and giving additional
62 recommendations for proper internationalization support. A foremost
63 consideration is to make sure that HTML remains a valid application
64 of SGML, while enabling its use in all languages of the world.
Table of contents

1. Introduction
   1.1. Scope
   1.2. Conformance
2. The document character set
   2.1. Reference processing model
   2.2. The document character set
   2.3. Undisplayable characters
3. The LANG attribute
4. Additional entities, attributes and elements
   4.1. Full Latin-1 entity set
   4.2. Markup for language-dependent presentation
5. Forms
   5.1. DTD additions
   5.2. Form submission
6. Miscellaneous
7. HTML public text
   7.1. HTML DTD
   7.2. SGML declaration for HTML
   7.3. ISO Latin 1 character entity set
Bibliography
Authors' Addresses

1. Introduction

94 The Hypertext Markup Language (HTML) is a simple markup language used
95 to create hypertext documents that are platform independent. Ini-
96 tially, the application of HTML on the World Wide Web was seriously
97 restricted by its reliance on the ISO-8859-1 coded character set,
98 which is appropriate only for Western European languages. Despite
99 this restriction, HTML has been widely used with other languages,
100 using other coded character sets or character encodings, through var-
101 ious ad hoc extensions to the language [TAKADA].
103 This document is meant to address the issue of the internationaliza-
104 tion of HTML by extending the specification of HTML and giving addi-
105 tional recommendations for proper internationalization support. It
106 is in good part based on a paper by one of the authors on multilin-
107 gualism on the WWW [NICOL]. A foremost consideration is to make sure
116 that HTML remains a valid application of SGML, while enabling its use
117 in all languages of the world.
119 The specific issues addressed are the SGML document character set to
120 be used for HTML, the proper treatment of the charset parameter asso-
121 ciated with the "text/html" content type and the specification of
122 some additional elements and entities.

1.1. Scope

127 HTML has been in use by the World-Wide Web (WWW) global information
128 initiative since 1990. This specification extends the capabilities
129 of HTML 2.0 (RFC 1866), primarily by removing the restriction to the
130 ISO-8859-1 coded character set [ISO-8859-1].
132 HTML is an application of ISO Standard 8879:1986, Information Pro-
133 cessing Text and Office Systems -- Standard Generalized Markup Lan-
134 guage (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is a
135 formal definition of the HTML syntax in terms of SGML. This specifi-
136 cation amends the DTD of HTML in order to make it applicable to docu-
137 ments encompassing a character repertoire much larger than that of
138 ISO-8859-1, while still remaining SGML conformant.
140 Both formal and actual development of HTML are advancing very fast.
141 The features described in this document are designed so that they can
142 (and should) be added to other forms of HTML besides that described
143 in RFC 1866. Where indicated, attributes introduced here should be
144 extended to the appropriate elements.

1.2. Conformance

149 This specification changes slightly the conformance requirements of
150 HTML documents and HTML user agents.
All HTML 2.0 conforming documents remain conforming under this
specification. However, the extensions introduced here make valid
certain documents that would not be HTML 2.0 conforming, in particular
those containing characters or character references outside the
repertoire of ISO 8859-1, and those containing markup introduced
here.
174 In addition to the requirements of RFC 1866, the following require-
175 ments are placed on HTML user agents.
177 To ensure interoperability and proper support for at least
178 ISO-8859-1 in an environment where character encoding schemes
179 other than ISO-8859-1 are present, user agents must correctly
180 interpret the charset parameter accompanying an HTML document
181 received from the network.
183 Furthermore, conforming user-agents are required to at least parse
184 correctly all numeric character references within the range of ISO
187 Conforming user-agents are required to apply the BIDI presentation
188 algorithm if they display right-to-left characters. If there is
189 no displayable right-to-left character in a document, there is no
190 need to apply BIDI processing.
192 2. The document character set
194 2.1. Reference processing model
196 This overview explains a reference processing model used for HTML,
197 and in particular the SGML concept of a document character set. An
198 actual implementation may widely differ in its internal workings from
199 the model given below, but should behave as described to an outside
202 Because there are various widely differing encodings of text, SGML
203 does not directly address the question of how characters are encoded
204 e.g. in a file. SGML views the characters as a single set (called a
205 "character repertoire"), and a "code set" that assigns an integer
206 number (known as "character number") to each character in the reper-
207 toire. The document character set declaration defines what each of
208 the character numbers represents [GOLD90, p. 451]. In most cases, an
209 SGML DTD and all documents that refer to it have a single document
210 character set, and all markup and data characters are part of this
213 HTML, as an application of SGML, does not directly address the ques-
214 tion of how characters are encoded as octets in external representa-
215 tions such as files. This is deferred to mechanisms external to HTML,
216 such as MIME as used by the HTTP protocol or by electronic mail.
218 For the HTTP protocol [RFC1945], the way characters are encoded is
228 defined by the "charset" parameter[1] of the "Content-Type" field of
229 the header of an HTTP response. For example, to indicate that the
230 transmitted document is encoded in the "JIS" encoding of Japanese
231 [RFC1468], the header will contain the following line:
233 Content-Type: text/html; charset=ISO-2022-JP
235 The HTTP protocol also defines a mechanism for the client to specify
236 the character encodings it can accept. Clients and servers are
strongly requested to use these mechanisms to ensure correct
transmission and interpretation of any document. Provisions that can
be taken to help correct interpretation, even in cases where a server
or client does not yet use these mechanisms, are described in section 6.
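
As an illustration of the charset mechanism described above, the
following Python sketch extracts the "charset" parameter from a
Content-Type value and decodes the received octets accordingly. It is
not part of this specification: the function names are hypothetical,
Python's standard header parsing and codecs stand in for a user
agent's own machinery, and the ISO-8859-1 fallback is an assumption of
the sketch.

```python
from email.message import Message


def charset_from_content_type(value, default=None):
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    charset = msg.get_param("charset")
    # get_param() returns None when the parameter is absent.
    return charset.lower() if isinstance(charset, str) else default


def decode_body(octets, content_type, default="iso-8859-1"):
    """Decode octets with the declared charset (hypothetical helper)."""
    # The fallback encoding is an assumption of this sketch.
    return octets.decode(charset_from_content_type(content_type, default))
```

A caller would pass the raw Content-Type value, for instance
decode_body(octets, "text/html; charset=ISO-2022-JP").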
242 Similarly, if HTML documents are transferred by electronic mail, the
243 character encoding is defined by the "charset" parameter of the "Con-
244 tent-Type" MIME header line [RFC1521], and defaults to US-ASCII in
If any other ways of transferring and storing HTML documents are
defined or become popular, it is advised that similar provisions be
made to clearly identify the character encoding used and/or to use a
single default encoding capable of representing the widest range of
characters used in an international context.
Whatever the external character encoding may be, the reference
processing model translates it to a representation of the document
character set specified in Section 2.2 before processing specific to
SGML/HTML. The reference processing model can be depicted as follows:
259 [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
265 The decoder is responsible for decoding the external representation
266 of the resource to a representation using the document character set.
267 The entity manager, the parser, and the application deal only with
268 characters of the document character set. A display-oriented part of
269 the application or the display machinery itself may again convert
271 1 The term "charset" in MIME is used to designate a char-
272 acter encoding, rather than a coded character set as the
273 term may suggest. A character encoding is a mapping (possi-
274 bly many-to-one) of a sequence of octets to a sequence of
275 characters taken from one or more character repertoires.
284 characters represented in the document character set to some other
285 representation more suitable for their purpose. In any case, the
286 entity manager, the parser, and the application, as far as character
287 semantics are concerned, are using the HTML document character set
290 An actual implementation may choose, or not, to translate the docu-
291 ment into some encoding of the document character set as described
292 above; the behaviour described by this reference processing model can
293 be achieved otherwise. This subject is well out of the scope of this
294 specification, however, and the reader is invited to consult the SGML
295 standard [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90]
296 [SQ91] for further information.
298 The most important consequence of this reference processing model is
299 that numeric character references are always resolved with respect to
300 the fixed document character set, and thus to the same characters,
301 whatever the external encoding actually used. For an example, see
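
The consequence stated above can be sketched in Python as follows.
The same markup is serialized in two external encodings; after
decoding, the numeric character reference resolves to the same
character in both cases. This is a simplified illustration only: the
function name is hypothetical, a regular expression stands in for a
real SGML parser, and Python's chr() accepts character numbers only up
to 10FFFF hexadecimal.

```python
import re

_NUMERIC_REFERENCE = re.compile(r"&#([0-9]+);")


def to_document_characters(octets, charset):
    """Decode octets, then resolve decimal numeric character references."""
    text = octets.decode(charset)  # external encoding -> document character set
    # Each reference names a character number in the document character
    # set, so the external encoding plays no part in resolving it.
    return _NUMERIC_REFERENCE.sub(lambda m: chr(int(m.group(1))), text)


# One literal CYRILLIC CAPITAL LETTER I and one reference to it.
source = "<P>\u0418 &#1048;</P>"
results = {cs: to_document_characters(source.encode(cs), cs)
           for cs in ("iso-8859-5", "utf-8")}
```

The octet sequences differ between the two encodings, but both entries
of "results" hold the same string.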
304 2.2. The document character set
306 The document character set, in the SGML sense, is the Universal Char-
307 acter Set (UCS) of ISO 10646:1993 [ISO-10646], as amended. Cur-
308 rently, this is code-by-code identical with the Unicode standard,
309 version 1.1 [UNICODE].
311 NOTE -- implementers should be aware that ISO 10646 is
312 amended from time to time; 4 amendments have been adopted
313 since the initial 1993 publication, none of which signifi-
314 cantly affects this specification. A fifth amendment, now
315 under consideration, will introduce incompatible changes to
316 the standard: 6556 Korean Hangul syllables allocated
317 between code positions 3400 and 4DFF (hexadecimal) will be
318 moved to new positions (and 4516 new syllables added), thus
319 making references to the old positions invalid. Since the
320 Unicode consortium has already adopted a corresponding
321 amendment for inclusion in the forthcoming Unicode 2.0,
322 adoption of DAM 5 is considered likely and implementers
323 should probably consider the old code positions as already
324 invalid. Despite this one-time change, the relevant stan-
325 dard bodies appear to remain committed not to change any
326 allocated code position in the future. To encode Korean
327 Hangul irrespective of these changes, the combining Hangul
328 Jamo in the range 1110-11F9 can be used.
330 The adoption of this document character set implies a change in the
331 SGML declaration specified in the HTML 2.0 specification (section 9.5
340 of [RFC1866]). The change amounts to removing the first BASESET
341 specification and its accompanying DESCSET declaration, replacing
342 them with the following declaration:
344 BASESET "ISO Registration Number 177//CHARSET
345 ISO/IEC 10646-1:1993 UCS-4 with implementation level 3
357 Making the UCS the document character set does not create non-
358 conformance of any expression, construct or document that is conform-
359 ing to HTML 2.0. It does make conforming certain constructs that are
360 not admissible in HTML 2.0. One consequence is that data characters
361 outside the repertoire of ISO-8859-1, but within that of UCS-4 become
362 valid SGML characters. Another is that the upper limit of the range
363 of numeric character references is extended from 255 to 2147483645;
thus, &#1048; is a valid reference to a "CYRILLIC CAPITAL LETTER I".
[ERCS] is a good source of information on Unicode and SGML, although
its scope and technical content differ greatly from this specification.
369 NOTE -- the above SGML declaration, like that of HTML 2.0,
370 specifies the character numbers 128 to 159 (80 to 9F hex)
371 as UNUSED. This means that numeric character references
within that range (e.g. &#146;) are illegal in HTML. Neither
ISO 8859-1 nor ISO 10646 contains characters in that
range, which is reserved for control characters.
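
A minimal Python sketch of the range check implied by the text above
is given here for illustration. It models only the two constraints
stated in this section (the upper limit of 2147483645 and the UNUSED
range 128 to 159); any other UNUSED positions of the SGML declaration
are deliberately not modelled, and the names are hypothetical.

```python
MAX_CHARACTER_NUMBER = 2147483645   # upper limit quoted in this section
UNUSED_C1_RANGE = range(128, 160)   # 80 to 9F hex, declared UNUSED


def is_valid_character_number(number):
    """Check a numeric character reference against the stated limits."""
    if not 0 <= number <= MAX_CHARACTER_NUMBER:
        return False
    return number not in UNUSED_C1_RANGE
```

With this check, 1048 is accepted while 146 is rejected.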
376 ISO 10646-1:1993 is the most encompassing character set currently
377 existing, and there is no other character set that could take its
378 place as the document character set for HTML. If nevertheless for a
379 specific application there is a need to use characters outside this
380 standard, this should be done by avoiding any conflicts with present
381 or future versions of ISO 10646, i.e. by assigning these characters
382 to a private zone. Also, it should be borne in mind that such a use
383 will be highly unportable; in many cases, it may be better to use
396 2.3. Undisplayable characters
398 With the document character set being the full ISO 10646, the possi-
399 bility that a character cannot be displayed due to lack of appropri-
400 ate resources (fonts) cannot be avoided. Because there are many dif-
401 ferent things that can be done in such a case, this document does not
402 prescribe any specific behaviour. Depending on the implementation,
this may also be handled by the underlying display system and not
404 the application itself. The following considerations, however, may
- A clearly visible, but unobtrusive behaviour should be preferred.
  Some documents may contain many characters that cannot be rendered,
  and so showing an alert for each of them is not the right
  thing to do.
412 - In case a numeric representation of the missing character is
413 given, its hexadecimal (not decimal) form is to be preferred,
414 because this form is used in character set standards [ERCS].
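
As one possible reading of the second consideration, a user agent
could substitute a short hexadecimal label for a character it cannot
display. The Python sketch below is only an illustration; the function
name and the "U+" prefix are assumptions, not requirements of this
document.

```python
def missing_glyph_label(character):
    """Return a hexadecimal label for an undisplayable character."""
    # Hexadecimal, padded to at least four digits, as is customary in
    # character set standards; the "U+" prefix is an assumption.
    return "U+{:04X}".format(ord(character))
```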
416 3. The LANG attribute
418 Language tags can be used to control rendering of a marked up docu-
419 ment in various ways: glyph disambiguation, in cases where the char-
420 acter encoding is not sufficient to resolve to a specific glyph; quo-
421 tation marks; hyphenation; ligatures; spacing; voice synthesis; etc.
422 Independently of rendering issues, language markup is useful as con-
423 tent markup for purposes such as classification and searching.
425 Since any text can logically be assigned a language, almost all HTML
426 elements admit the LANG attribute. The DTD reflects this. It is
427 also intended that any new element introduced in later versions of
HTML will admit the LANG attribute, unless there is a good reason not
to do so.
431 The language attribute, LANG, takes as its value a language tag that
432 identifies a natural language spoken, written, or otherwise conveyed
433 by human beings for communication of information to other human
434 beings. Computer languages are explicitly excluded.
The syntax and registry of HTML language tags are the same as those
defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
of one or more parts: a primary language tag and a possibly empty
series of subtags:
    language-tag = primary-tag *( "-" subtag )
    primary-tag = 1*8ALPHA
    subtag = 1*8ALPHA
452 Whitespace is not allowed within the tag and all tags are case-
453 insensitive. The namespace of language tags is administered by the
454 IANA. Example tags include:
456 en, en-US, en-cockney, i-cherokee, x-pig-latin
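
The grammar above can be checked mechanically. The following Python
sketch is an illustration only; it assumes, as in RFC 1766, that
subtags follow the same 1*8ALPHA rule as the primary tag, and the
function name is hypothetical. It checks syntax, not registration with
the IANA.

```python
import re

# primary-tag and each subtag: one to eight ASCII letters.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(value):
    """Return True if value matches the language-tag grammar."""
    return _LANGUAGE_TAG.fullmatch(value) is not None


EXAMPLES = ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]
```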
458 In the context of HTML, a language tag is not to be interpreted as a
459 single token, as per RFC 1766, but as a hierarchy. For example, a
460 user agent that adjusts rendering according to language should con-
461 sider that it has a match when a language tag in a style sheet entry
462 matches the initial portion of the language tag of an element. An
463 exact match should be preferred. This interpretation allows an ele-
464 ment marked up as, for instance, "en-US" to trigger styles corre-
465 sponding to, in order of preference, US-English ("en-US") or 'plain'
466 or 'international' English ("en").
468 NOTE -- using the language tag as a hierarchy does not
469 imply that all languages with a common prefix will be
470 understood by those fluent in one or more of those lan-
471 guages; it simply allows the user to request this commonal-
472 ity when it is true for that user.
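
The hierarchical interpretation described above might be implemented
along the following lines. This Python sketch is an assumption about
one reasonable matching strategy, not a normative algorithm: a style
sheet tag matches when its parts equal the initial parts of the
element's tag, and the longest match (an exact match, if present) is
preferred. All names are hypothetical.

```python
def best_match(element_lang, style_langs):
    """Pick the style sheet language tag that best matches an element."""
    wanted = element_lang.lower().split("-")   # tags are case-insensitive
    best, best_length = None, 0
    for candidate in style_langs:
        parts = candidate.lower().split("-")
        # A candidate matches if it equals the initial portion of the
        # element's tag, compared part by part.
        if parts == wanted[:len(parts)] and len(parts) > best_length:
            best, best_length = candidate, len(parts)
    return best
```

For an element marked "en-US", a style sheet entry for "en-US" wins
over one for "en", and "en" is used when no "en-US" entry exists.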
474 The rendering of elements may be affected by the LANG attribute. For
475 any element, the value of the LANG attribute overrides the value
476 specified by the LANG attribute of any enclosing element and the
477 value (if any) of the HTTP Content-Language header. If none of these
478 are set, a suitable default, perhaps controlled by user preferences,
479 by automatic context analysis or by the user's locale, should be used
480 to control rendering.
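
The precedence rules above can be summarized in a few lines of
Python. This is a sketch under simplifying assumptions: the LANG
values are supplied as a list ordered from the element itself outward
to the root, the Content-Language value is treated as a single tag,
and the names are hypothetical.

```python
def effective_lang(lang_chain, content_language=None, default=None):
    """Resolve the language that governs rendering of an element.

    lang_chain lists LANG values from the element outward to the root,
    with None where an element carries no LANG attribute.
    """
    for lang in lang_chain:
        if lang:
            return lang              # nearest LANG attribute wins
    # No LANG anywhere: fall back to the HTTP header, then the default.
    return content_language or default
```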
482 4. Additional entities, attributes and elements
484 4.1. Full Latin-1 entity set
486 According to the suggestion of section 14 of [RFC1866], the set of
487 Latin-1 entities is extended to cover the whole right part of
ISO-8859-1 (all code positions with the high-order bit set),
including the already commonly used &nbsp;, &copy; and &reg;. The names of
490 the entities are taken from the appendices of SGML [ISO-8879]. A
491 list is provided in section 7.3 of this specification.
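
For illustration, the coverage claim can be checked against an entity
table. The Python sketch below uses the name2codepoint table from
Python's standard html.entities module, which reflects a later HTML
entity list; that this table agrees with the list in section 7.3 is an
assumption of the sketch.

```python
from html.entities import name2codepoint

# The "right part" of ISO-8859-1: code positions 160 to 255.
LATIN1_RIGHT_PART = set(range(160, 256))

# Code positions in that range that have a named entity.
covered = {code for code in name2codepoint.values()
           if code in LATIN1_RIGHT_PART}
```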
493 4.2. Markup for language-dependent presentation
498 For the correct presentation of text in certain languages (irrespec-
499 tive of formatting issues), some support in the form of additional
508 entities and elements is needed.
510 In particular, the following features are dealt with:
512 - Markup of bidirectional text, i.e. text where left-to-right and
513 right-to-left scripts are mixed.
515 - Control of cursive joining behaviour in contexts where the default
516 behaviour is not appropriate.
518 - Language-dependent rendering of short (in-line) quotations.
- Better justification control for languages where this is important.
523 - Superscripts and subscripts for languages where they appear as
524 part of general text.
526 Some of the above features need very little additional support; oth-
527 ers need more. The additional features are introduced below with
528 brief comments only. Explanations on cursive joining behaviour and
529 bidirectional text follow later. For cursive joining behaviour and
530 bidirectional text, this document follows [UNICODE] in that: i) char-
531 acter semantics, where applicable, are identical to [UNICODE], and
532 ii) where functionality is moved to HTML as a higher level protocol,
533 this is done in a way that allows straightforward conversion to the
534 lower-level mechanisms defined in [UNICODE].
537 4.2.2. List of entities, elements, and attributes
539 First, a generic container is needed to carry the LANG and DIR (see
540 below) attributes in cases where no other element is appropriate; the
541 SPAN element is introduced for that purpose.
543 A set of named character entities is added for use with bidirectional
544 rendering and cursive joining control:
<!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
<!ENTITY zwj  CDATA "&#8205;"--=zero width joiner-->
<!ENTITY lrm  CDATA "&#8206;"--=left-to-right mark-->
<!ENTITY rlm  CDATA "&#8207;"--=right-to-left mark-->
551 These entities can be used in place of the corresponding formatting
552 characters whenever convenient, for example to ease keyboard entry or
when a formatting character is not available in the character
encoding.
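
A processor that does not rely on a full SGML parser might expand
these four entities as sketched below in Python. The table restates
the character numbers 8204 to 8207 in hexadecimal; the function name
is hypothetical and the regular expression is a simplification of real
entity recognition.

```python
import re

# Character numbers 8204 to 8207, written in hexadecimal.
FORMATTING_ENTITIES = {
    "zwnj": 0x200C,  # zero width non-joiner
    "zwj": 0x200D,   # zero width joiner
    "lrm": 0x200E,   # left-to-right mark
    "rlm": 0x200F,   # right-to-left mark
}

_ENTITY = re.compile(r"&(zwnj|zwj|lrm|rlm);")


def expand_formatting_entities(text):
    """Replace the four named entities by their formatting characters."""
    return _ENTITY.sub(lambda m: chr(FORMATTING_ENTITIES[m.group(1)]), text)
```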
564 Next, an attribute called DIR is introduced, restricted to the values
565 LTR (left-to-right) and RTL (right-to-left) and admitted by most ele-
566 ments, for the indication of directionality in the context of bidi-
567 rectional text (see 4.2.4 below for details). Since any text and
568 many other elements (e.g. tables) can logically be assigned a direc-
569 tionality, almost all HTML elements admit the DIR attribute. The DTD
570 reflects this. It is also intended that any new element introduced
571 in later versions of HTML will admit the DIR attribute, unless there
572 is a good reason not to do so.
574 A new element called BDO (BIDI Override) is introduced, which
575 requires the DIR attribute to specify whether the override is left-
576 to-right or right-to-left. This element is required for bidirec-
577 tional text control; for detailed explanations, see section 4.2.4.
579 The <Q> element is introduced to allow language-dependent rendering
580 of short quotations depending on language and platform capability.
581 As the following examples show, in particular the quotation marks
582 surrounding the quotation are affected: "a quotation in English",
`another, slightly better one', ,,a quotation in German'', << a
quotation in French >>. The contents of the <Q> element do not
include quotation marks; they have to be added by the rendering
process.
NOTE -- <Q> elements can be nested. Many languages use different
quotation styles for outer and inner quotations, and this should be
respected by user-agents implementing this element.
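
The following Python sketch shows one way a rendering process could
add language-dependent quotation marks to nested <Q> elements. It is
an illustration only: the table of quotation marks, the fallback to
English, and the representation of content as nested lists are all
assumptions of the sketch.

```python
# (opening, closing) marks per nesting depth; illustrative values only.
QUOTATION_MARKS = {
    "en": [("\u201C", "\u201D"), ("\u2018", "\u2019")],
    "de": [("\u201E", "\u201C"), ("\u201A", "\u2018")],
    "fr": [("\u00AB\u00A0", "\u00A0\u00BB"), ("\u201C", "\u201D")],
}


def render_q(content, lang, depth=0):
    """Render Q content; strings are text, nested lists are inner Qs."""
    primary = lang.lower().split("-")[0]
    styles = QUOTATION_MARKS.get(primary, QUOTATION_MARKS["en"])
    opening, closing = styles[depth % len(styles)]   # alternate by depth
    inner = "".join(
        part if isinstance(part, str) else render_q(part, lang, depth + 1)
        for part in content
    )
    return opening + inner + closing
```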
593 Many languages require superscripts for proper rendering: as an exam-
594 ple, the French "Mlle Dupont" should have "lle" in superscript. The
595 <SUP> element, and its sibling <SUB>, are introduced to allow proper
596 markup of such text. <SUP> and <SUB> contents are restricted to
597 PCDATA to avoid nesting problems.
599 Finally, in many languages text justification is much more important
600 than it is in Western languages, and justifies markup. The ALIGN
601 attribute, admitting values of LEFT, RIGHT, CENTER and JUSTIFY, is
602 added to a selection of elements where it makes sense (block-like).
If a user-agent chooses to have LEFT as a default for blocks of
left-to-right directionality, it should use RIGHT for blocks of
right-to-left directionality.
607 In the DTD, the LANG and DIR attributes are grouped together in a
608 parameter entity called attrs. In addition, the ID and CLASS
attributes from RFC 1942 [RFC1942] are added to attrs, as is done
in that document. The ID and CLASS attributes are required for use
with style sheets, and RFC 1942 defines them as follows:
620 ID Used to define a document-wide identifier. This can be used
621 for naming positions within documents as the destination of a
622 hypertext link. It may also be used by style sheets for ren-
623 dering an element in a unique style. An ID attribute value is
624 an SGML NAME token. NAME tokens are formed by an initial let-
625 ter followed by letters, digits, "-" and "." characters. The
626 letters are restricted to A-Z and a-z.
628 CLASS A space separated list of SGML NAME tokens. CLASS names spec-
629 ify that the element belongs to the corresponding named
630 classes. It allows authors to distinguish different roles
631 played by the same tag. The classes may be used by style
sheets to provide different renderings as appropriate to
these roles.
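
The NAME token rule quoted above can be checked as sketched below in
Python. The sketch covers only the character rule given in the text
(an initial letter followed by letters, digits, "-" and "."); any
length limit imposed by the SGML declaration is not modelled, and the
function names are hypothetical.

```python
import re

# Initial letter, then letters, digits, "-" and "."; letters are A-Z, a-z.
_NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_name_token(value):
    """Return True if value follows the NAME token character rule."""
    return _NAME_TOKEN.fullmatch(value) is not None


def parse_class(value):
    """Split a CLASS value into names, rejecting invalid tokens."""
    names = value.split()
    for name in names:
        if not is_name_token(name):
            raise ValueError("not a NAME token: %r" % name)
    return names
```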
635 4.2.3. Cursive joining behaviour
637 Markup is needed in some cases to force cursive joining behavior in
638 contexts in which it would not normally occur, or to block it when it
639 would normally occur.
The zero-width joiner and non-joiner (&zwj; and &zwnj;) are used to
642 control cursive joining behaviour. For example, ARABIC LETTER HEH is
643 used in isolation to abbreviate "Hijri" (the Islamic calendrical sys-
644 tem); however, the initial form of the letter is desired, because the
645 isolated form of HEH looks like the digit five as employed in Arabic
646 script. This is obtained by following the HEH with a zero-width
647 joiner whose only effect is to provide context. In Persian texts,
648 there are cases where a letter that normally would join a subsequent
letter in a cursive connection does not. Here a zero-width
non-joiner is used.
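
The "Hijri" abbreviation described above corresponds to a two
character sequence, shown here in Python for illustration. The code
positions are those of the UCS; the variable names are hypothetical,
and the character names are looked up with Python's standard
unicodedata module.

```python
import unicodedata

HEH = "\u0647"    # ARABIC LETTER HEH
ZWJ = "\u200D"    # zero-width joiner (&zwj;)

# HEH followed by a joiner: the joiner is invisible and only supplies
# the context that selects the initial form of HEH.
hijri_abbreviation = HEH + ZWJ

names = [unicodedata.name(character) for character in hijri_abbreviation]
```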
652 4.2.4. Bidirectional text
654 Many languages are written in horizontal lines from left to right,
655 while others are written from right to left. When both writing
656 directions are present, one talks of bidirectional text (BIDI for
657 short). BIDI text requires markup in special circumstances where
658 ambiguities as to the directionality of some characters have to be
659 resolved. This markup affects the ability to render BIDI text in a
660 semantically legible fashion. That is, without this special BIDI
661 markup, cases arise which would prevent *any* rendering whatsoever
662 that reflected the basic meaning of the text. Plain text may contain
663 this markup (joining or BIDI) in the form of special-purpose charac-
664 ters; in HTML, these are supplemented by SGML markup.
666 BIDI is a complex issue, and implementers are advised to consult
667 appropriate documentation such as [UNICODE]. Here, explanations are
676 given only as far as they are needed to understand the necessity of
677 the features introduced and to define their exact semantics.
679 The Unicode BIDI algorithm is based on a logical sequence of text
680 characters and works mainly by reference to the implicit directional-
681 ity of characters (e.g. Hebrew and Arabic characters are specified to
682 be rendered from right to left, etc.).
The left-to-right and right-to-left marks (&lrm; and &rlm;) are used
685 to disambiguate directionality of neutral characters. For example,
686 when a double quote sits between an Arabic and a Latin letter, its
687 direction is ambiguous; if a directional mark is added on one side
688 such that the quotation mark is surrounded by characters of only one
689 directionality, the ambiguity is removed. These characters are like
zero width spaces which have a directional property (but no word/line
break property).
Nested embeddings of contra-directional text runs, due to nested
quotations or to the pasting of text from one BIDI context to another,
are also a case where the implicit directionality of characters is not
696 sufficient, requiring markup. Also, it is frequently desirable to
697 specify the basic directionality of a block of text. For these pur-
698 poses, the DIR attribute is used.
700 On block-type elements, the DIR attribute indicates the base direc-
701 tionality of the text in the block; if omitted it is inherited from
702 the parent element. The default directionality of the overall HTML
703 document is left-to-right.
705 On inline elements, it makes the element start a new embedding level
706 (to be explained below); if omitted the inline element does not start
707 a new embedding level.
709 NOTE -- the PRE, XMP and LISTING elements admit the DIR
710 attribute, indicating that the contents should not be con-
711 sidered as preformatted with respect to bidirectional lay-
712 out. The BIDI algorithm still needs to be applied to each
Following is an example of a case where embedding is needed, showing
its effect:
Given the following Latin (upper case) and Arabic (lower
case) letters in backing store with the specified embeddings:
722 <SPAN DIR=LTR> AB <SPAN DIR=RTL> xy <SPAN DIR=LTR> CD
723 </SPAN> zw </SPAN> EF </SPAN>
732 One gets the following rendering (with [] showing the
733 directional transitions):
735 [ AB [ wz [ CD ] yx ] EF ]
737 On the other hand, without this markup and with a base
738 direction of LTR one gets the following rendering:
740 [ AB [ yx ] CD [ wz ] EF ]
742 Notice that yx is on the left and wz on the right unlike
the above case where the embedding levels are used. Without
the embedding markup one has at most two levels: a base
directional level and a single counterflow directional
level.
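
The two renderings in this example can be reproduced with a toy model,
sketched below in Python. This is emphatically not the Unicode BIDI
algorithm: it assumes that every text run has a single direction, it
follows the example's convention that lower case stands for Arabic,
and it represents embeddings as nested (direction, children) tuples.
All names are hypothetical.

```python
def render(node, is_rtl_run=str.islower):
    """Lay out a (direction, children) tree from left to right."""
    direction, children = node
    pieces = []
    for child in children:
        if isinstance(child, tuple):
            pieces.append(render(child, is_rtl_run))  # nested embedding
        elif is_rtl_run(child):
            pieces.append(child[::-1])   # right-to-left run: reverse it
        else:
            pieces.append(child)
    if direction == "RTL":
        pieces.reverse()                 # order the runs right to left
    return " ".join(pieces)


with_markup = ("LTR", ["AB", ("RTL", ["xy", ("LTR", ["CD"]), "zw"]), "EF"])
without_markup = ("LTR", ["AB", "xy", "CD", "zw", "EF"])
```

Here render(with_markup) yields "AB wz CD yx EF" and
render(without_markup) yields "AB yx CD wz EF", matching the two
renderings shown in the example.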
748 The DIR attribute on inline elements is equivalent to the formatting
749 characters LEFT-TO-RIGHT EMBEDDING (202A) and RIGHT-TO-LEFT EMBED-
750 DING (202B) of ISO 10646. The end tag of the element is equivalent
751 to the POP DIRECTIONAL FORMATTING (202C) character.
753 Directional override, as provided by the <BDO> element, is needed to
754 deal with unusual short pieces of text in which directionality cannot
755 be resolved from context in an unambiguous fashion. For example, it
756 can be used to force left-to-right (or right-to-left) display of part
757 numbers composed of Latin letters, digits and Hebrew letters.
759 The effect of <BDO> is to force the directionality of all characters
760 within it to the value of DIR, irrespective of their intrinsic direc-
761 tional properties. It is equivalent to using the LEFT-TO-RIGHT OVER-
762 RIDE (202D) or RIGHT-TO-LEFT OVERRIDE (202E) characters of ISO 10646,
763 the end tag again being equivalent to the POP DIRECTIONAL FORMATTING (202C) character.
766 NOTE -- authors and authoring software writers should be
767 aware that conflicts can arise if the DIR attribute is used
768 on inline elements (including <BDO>) concurrently with the
769 use of the corresponding ISO 10646 formatting characters.
770 Preferably one or the other should be used exclusively; the
771 markup method is better able to guarantee document struc-
772 tural integrity, and alleviates some problems when editing
773 bidirectional HTML text with a simple text editor, but some
774 software may be more apt at using the 10646 characters. If
775 both methods are used, great care should be exercised to
776 ensure proper nesting of markup and directional embedding
777 or override; otherwise, rendering results are undefined.
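
One way an authoring tool might guard against improper nesting is to verify that the formatting characters inside the character data of each element are balanced on their own, so that no embedding or override crosses an element boundary. The following Python sketch assumes that per-element check is an adequate safeguard; the function name is hypothetical.

```python
# Characters that open an embedding or override level.
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def formatting_chars_balanced(text):
    """Return True if every opener in text is closed by a later PDF."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                # A PDF with nothing to pop would close a level opened elsewhere.
                return False
            depth -= 1
    return depth == 0
```

A tool could run this on the text content of every element carrying a DIR attribute and warn the author when it returns False.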
793 It is natural to expect input in any language in forms, as they
794 provide one of the few ways of obtaining user input. While this is
795 primarily a UI issue, there are some things that should be specified at
796 the HTML level to guide behavior and promote interoperability.
798 To ensure full interoperability, it is necessary for the user agent
799 (and the user) to have an indication of the character encoding(s)
800 that the server providing a form will be able to handle upon submis-
801 sion of the filled-in form. Such an indication is provided by the
802 ACCEPT-CHARSET attribute of the INPUT and TEXTAREA elements, modeled
803 on the HTTP Accept-Charset header (see [HTTP-1.1]), which contains a
804 space and/or comma delimited list of character sets acceptable to the
805 server. A user agent may want to advise the user of the contents
806 of this attribute, or to restrict the user's ability to enter
807 characters outside the repertoires of the listed character sets.
809 NOTE -- The list of character sets is to be interpreted as
810 an EXCLUSIVE-OR list; the server announces that it is ready
811 to accept any ONE of these character encoding schemes for
812 each part of a multipart entity. The client may perform
813 character encoding translation to satisfy the server if necessary.
816 NOTE -- The default value for the ACCEPT-CHARSET attribute
817 of an INPUT or TEXTAREA element is the reserved value
818 "UNKNOWN". A user agent may interpret that value as the
819 character encoding scheme that was used to transmit the
820 document containing that element.
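
As an illustration of how a user agent might act on ACCEPT-CHARSET, the Python sketch below splits the space and/or comma delimited list and picks the first listed encoding able to represent the text entered. The function names are hypothetical, Python codec names stand in for registered charset names, and "UNKNOWN" is mapped to the document's own encoding as the note above permits.

```python
import re


def parse_accept_charset(value):
    """Split a space and/or comma delimited ACCEPT-CHARSET value."""
    return [name for name in re.split(r"[\s,]+", value.strip()) if name]


def choose_submission_charset(text, accept_charset, document_charset):
    """Return the first acceptable charset that can encode text, or None."""
    names = parse_accept_charset(accept_charset) or ["UNKNOWN"]
    for name in names:
        # "UNKNOWN" may be read as the encoding the document arrived in.
        candidate = document_charset if name.upper() == "UNKNOWN" else name
        try:
            text.encode(candidate)
        except (UnicodeEncodeError, LookupError):
            continue  # cannot represent the text, or codec not available
        return candidate
    return None
```

Because the list is an exclusive-or list, the sketch settles on a single encoding for the whole value rather than mixing several.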
825 The HTML 2.0 form submission mechanism, based on the "application/x-
826 www-form-urlencoded" media type, is ill-equipped with regard to
827 internationalization. In fact, since URLs are restricted to ASCII
828 characters, the mechanism is awkward even for ISO-8859-1 text.
829 Section 2.2 of [RFC1738] specifies that octets may be encoded using the
830 "%HH" notation, but text submitted from a form is composed of charac-
831 ters, not octets. Lacking a specification of a character encoding
832 scheme, the "%HH" notation has no well-defined meaning.
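
The ambiguity is easy to demonstrate. In the Python sketch below, the same single character yields two different "%HH" strings depending on which character encoding scheme is assumed before the octets are escaped; the choice of the two encodings is only for illustration.

```python
from urllib.parse import quote

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# One octet in ISO-8859-1, hence one %HH triplet.
as_latin1 = quote(text, encoding="iso-8859-1")

# Two octets in UTF-8, hence two %HH triplets.
as_utf8 = quote(text, encoding="utf-8")

print(as_latin1)  # %E9
print(as_utf8)    # %C3%A9
```

A server receiving "%E9" or "%C3%A9" cannot tell which character was meant unless it knows the encoding that was used.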
834 The best solution is to use the "multipart/form-data" media type
835 described in [RFC1867] with the POST method of form submission. This
844 mechanism encapsulates the value part of each name-value pair in a
845 body-part of a multipart MIME body that is sent as the HTTP entity;
846 each body part can be labeled with an appropriate Content-Type,
847 including if necessary a charset parameter that specifies the charac-
848 ter encoding scheme. The changes to the DTD necessary to support
849 this method of form submission have been incorporated in the DTD
850 included in this specification.
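
The layout of such a submission can be sketched as follows. This Python fragment builds a "multipart/form-data" body by hand, labeling every body part with a charset parameter; the function names and the boundary string are hypothetical, field names are assumed to be ASCII, and a real user agent would also have to make sure the boundary does not occur in the data.

```python
def form_data_part(name, value, charset, boundary):
    """Build one body part carrying a single form value."""
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        f"Content-Type: text/plain; charset={charset}\r\n"
        "\r\n"
    ).encode("ascii")
    # The value is sent as octets in the labeled encoding, without %HH escaping.
    return header + value.encode(charset) + b"\r\n"


def form_data_body(fields, charset, boundary):
    """Concatenate the parts and append the closing boundary."""
    parts = b"".join(
        form_data_part(name, value, charset, boundary) for name, value in fields
    )
    return parts + f"--{boundary}--\r\n".encode("ascii")


body = form_data_body([("comment", "\u65e5\u672c\u8a9e")], "ISO-2022-JP", "XyZ123")
```

The body would be sent with the POST method under a Content-Type of multipart/form-data whose boundary parameter matches the one used here.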
852 A less satisfactory solution is to add a MIME charset parameter to
853 the "application/x-www-form-urlencoded" media type specifier sent
854 along with a POST method form submission, with the understanding that
855 the URL encoding of [RFC1738] is applied on top of the specified
856 character encoding, as a kind of implicit Content-Transfer-Encoding.
858 One problem with both solutions above is that current browsers do not
859 generally allow for bookmarks to specify the POST method; this should
860 be improved. Alternatively, the GET method could be used with the form
861 data transmitted in the body instead of in the URL. Nothing in the
862 protocol seems to prevent it, but no implementations appear to exist at present.
865 How the user agent determines the encoding of the text entered by the
866 user is outside the scope of this specification.
868 NOTE -- Designers of forms and their handling scripts
869 should be aware of an important caveat: when the default
870 value of a field (the VALUE attribute) is returned upon
871 form submission (i.e. the user did not modify this value),
872 it cannot be guaranteed to be transmitted as a sequence of
873 octets identical to that in the source document -- only as
874 a possibly different but valid encoding of the same
875 sequence of text elements. This may be true even if the
876 encoding of the document containing the form and that used
877 for submission are the same.
879 Differences can occur when a sequence of characters can be
880 represented by various sequences of octets, and also when a
881 composite sequence (a base character plus one or more com-
882 bining diacritics) can be represented by either a different
883 but equivalent composite sequence or by a fully precomposed
884 character. For instance, the UCS-2 sequence 00EA+0323
885 (LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT + COMBINING
886 DOT BELOW) may be transformed into 1EC7 (LATIN SMALL LETTER
887 E WITH CIRCUMFLEX ACCENT AND DOT BELOW), into
888 0065+0302+0323 (LATIN SMALL LETTER E + COMBINING CIRCUMFLEX
889 ACCENT + COMBINING DOT BELOW), as well as into other equiv-
890 alent composite sequences.
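
A handling script can protect itself against such differences by comparing text in a canonical form rather than octet by octet. The Python sketch below uses the Unicode normalization forms, which were defined after this draft and are therefore an assumption about the tooling available, not something this specification requires.

```python
import unicodedata

# Three spellings of the same text element.
partly_composed = "\u00EA\u0323"   # e with circumflex + COMBINING DOT BELOW
precomposed = "\u1EC7"             # e with circumflex and dot below
decomposed = "\u0065\u0302\u0323"  # e + combining circumflex + combining dot below


def same_text(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A script that stores a default VALUE and later checks whether the user changed it would call same_text() instead of comparing the raw sequences.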
902 Proper interpretation of a text document requires that the character
903 encoding scheme be known. Current HTTP servers, however, do not gen-
904 erally include an appropriate charset parameter with the Content-Type
905 header. This is bad behaviour[2], and as such strongly discouraged,
906 but some preventive measures can be taken to minimize the detrimental effects.
909 In the case where a document is accessed from a hyperlink in an ori-
910 gin HTML document, a CHARSET attribute is added to the attribute list
911 of elements with link semantics (A and LINK), specifically by adding
912 it to the linkExtraAttributes entity. The value of that attribute is
913 to be considered a hint to the User Agent as to the character
914 encoding scheme used by the resource pointed to by the hyperlink; it
915 should be the appropriate value of the MIME charset parameter for that resource.
918 In any document, it is possible to include an indication of the
919 encoding scheme like the following, as early as possible within the
920 HEAD of the document:
922 <META HTTP-EQUIV="Content-Type"
923 CONTENT="text/html; charset=ISO-2022-JP">
925 This is not foolproof, but will work if the encoding scheme is such
926 that ASCII characters stand for themselves at least until the META
927 element is parsed. Note that there are better ways for a server to
928 obtain character encoding information, instead of the unreliable
929 <META> above; see [NICOL2] for some details and a proposal.
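
A user agent could look for such a META element by scanning the first octets of the document as ASCII, and then combine the result with the other sources of charset information in the order of preference this document gives (protocol parameter first, then META, then the CHARSET hint of the link). The Python sketch below is an illustration only: the function names, the 1024-octet scan limit and the regular expression are assumptions, not requirements.

```python
import re

# Matches charset=... inside a META tag; tolerant of case and quoting.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def sniff_meta_charset(head_bytes, limit=1024):
    """Return the charset named in an early META element, or None."""
    match = META_CHARSET.search(head_bytes[:limit])
    return match.group(1).decode("ascii") if match else None


def choose_charset(http_charset, meta_charset, link_charset):
    """Pick the most authoritative charset indication that is present."""
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return None
```

The scan only works when ASCII characters stand for themselves up to the META element, which is the same limitation noted for the META approach itself.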
931 For definiteness, the "charset" parameter received from the source of
932 the document should be considered the most authoritative, followed in
933 order of preference by the contents of a META element such as the
934 above, and finally the CHARSET attribute of the anchor that was followed.
937 When HTML text is transmitted directly in UCS-2 or UCS-4 form, the
938 question of byte order arises: does the high-order byte of each
939 multi-byte character come first or last? For definiteness, this
940 specification recommends that UCS-2 and UCS-4 be transmitted in big-
942 2 This bad behaviour is even encouraged by the continued
943 existence of browsers that declare an unrecognized media
944 type when they receive a charset parameter. User agent
945 implementers are strongly encouraged to make their software
946 tolerant of this parameter, even if they cannot take advantage of it.
956 endian byte order (high order byte first), which corresponds to the
957 established network byte order for two- and four-byte quantities, to
958 the Unicode recommendation for serialized text data and to RFC 1641.
959 Furthermore, to maximize chances of proper interpretation, it is rec-
960 ommended that documents transmitted as UCS-2 or UCS-4 always begin
961 with a ZERO WIDTH NO-BREAK SPACE character (hexadecimal FEFF or
962 0000FEFF) which, when byte-reversed, becomes FFFE or FFFE0000,
963 a value guaranteed never to be assigned. Thus, a user agent
964 receiving FFFE as the first octets of a text would know that bytes
965 have to be reversed for the remainder of the text.
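
The detection described above can be sketched for the UCS-2 case as follows. The function name is hypothetical, and Python's UTF-16 codecs are used as a stand-in for UCS-2, which is an assumption that holds for characters representable in two octets; UCS-4 is not handled.

```python
def decode_ucs2(data):
    """Decode UCS-2 octets, using a leading FEFF/FFFE to settle byte order."""
    if data[:2] == b"\xFE\xFF":
        # FEFF seen in network order: the text is big-endian as recommended.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xFF\xFE":
        # FFFE seen: every pair of octets has to be reversed.
        return data[2:].decode("utf-16-le")
    # No signature: fall back to the recommended big-endian order.
    return data.decode("utf-16-be")
```

The signature character is dropped from the result, since it only serves to convey the byte order.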
967 There exist so-called UCS Transformation Formats that can be used to
968 transmit UCS data, in addition to UCS-2 and UCS-4. UTF-7 [RFC1642]
969 and UTF-8 [UTF-8] have favorable properties (no byte-ordering prob-
970 lem, different flavours of ASCII compatibility) that make them worthy
971 of consideration, especially for transmission of multilingual text.
972 Another encoding scheme, MNEM [RFC1345], also has interesting proper-
973 ties and the capability to transmit the full UCS. The UTF-1 trans-
974 formation format of ISO 10646:1993 (registered by IANA as
975 ISO-10646-UTF-1), has been removed from ISO 10646 by amendment 4, and should not be used.
978 The SOFT HYPHEN character (U+00AD) needs a little attention from
979 user-agent implementers. It is present in many character sets
980 (including the whole ISO 8859 series and, of course, ISO 10646), and
981 has semantics different from the plain HYPHEN. If not used for
982 hyphenation, the soft hyphen must be completely ignored. For example,
983 "rec&shy;ord" should display as "record", should match a search
984 for "record", and should sort as "record". Non-observance of these
985 semantics effectively discourages its use on the World Wide Web, even
986 with software that does support it.
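
For searching and sorting, the simplest way to honour these semantics is to drop the soft hyphen before comparing. The Python sketch below shows that approach; the function names are hypothetical, and display-time hyphenation is outside its scope.

```python
SOFT_HYPHEN = "\u00AD"


def strip_soft_hyphens(text):
    """Remove every SOFT HYPHEN so that comparisons ignore it."""
    return text.replace(SOFT_HYPHEN, "")


def contains(haystack, needle):
    """Search that treats soft hyphens as absent on both sides."""
    return strip_soft_hyphens(needle) in strip_soft_hyphens(haystack)


def sort_words(words):
    """Sort words as if they contained no soft hyphens."""
    return sorted(words, key=strip_soft_hyphens)
```

The original strings are left untouched, so a renderer can still use the soft hyphens as hyphenation opportunities.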
992 This section contains a DTD for HTML based on the HTML 2.0 DTD of RFC
993 1866, incorporating the changes for file upload as specified in RFC
994 1867, and the changes deriving from this document.
998 Document Type Definition for the HyperText Markup Language,
999 extended for internationalisation (HTML DTD)
1001 Last revised: 96/05/27
1003 Authors: Daniel W. Connolly <connolly@w3.org>
1012 Francois Yergeau <yergeau@alis.com>
1013 See Also: html.decl, html-1.dtd
1014 http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
1017 <!ENTITY % HTML.Version
1018 "-//IETF//DTD HTML//EN"
1022 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
1030 <!--============ Feature Test Entities ========================-->
1032 <!ENTITY % HTML.Recommended "IGNORE"
1033 -- Certain features of the language are necessary for
1034 compatibility with widespread usage, but they may
1035 compromise the structural integrity of a document.
1036 This feature test entity enables a more prescriptive
1037 document type definition that eliminates
1041 <![ %HTML.Recommended [
1042 <!ENTITY % HTML.Deprecated "IGNORE">
1045 <!ENTITY % HTML.Deprecated "INCLUDE"
1046 -- Certain features of the language are necessary for
1047 compatibility with earlier versions of the specification,
1048 but they tend to be used and implemented inconsistently,
1049 and their use is deprecated. This feature test entity
1050 enables a document type definition that eliminates
1054 <!ENTITY % HTML.Highlighting "INCLUDE"
1055 -- Use this feature test entity to validate that a
1056 document uses no highlighting tags, which may be
1057 ignored on minimal implementations.
1068 <!ENTITY % HTML.Forms "INCLUDE"
1069 -- Use this feature test entity to validate that a document
1070 contains no forms, which may not be supported in minimal
1074 <!--============== Imported Names ==============================-->
1076 <!ENTITY % Content-Type "CDATA"
1077 -- meaning an internet media type
1078 (aka MIME content type, as per RFC1521)
1081 <!ENTITY % HTTP-Method "GET | POST"
1082 -- as per HTTP specification, RFC1945
1085 <!--========= DTD "Macros" =====================-->
1087 <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
1089 <!ENTITY % list " UL | OL | DIR | MENU " >
1091 <!ENTITY % attrs -- common attributes for elements --
1092 "LANG NAME #IMPLIED -- RFC 1766 language tag --
1093 DIR (ltr|rtl) #IMPLIED -- text directionality --
1094 ID ID #IMPLIED -- element identifier (from RFC1942) --
1095 CLASS NAMES #IMPLIED -- for subclassing elements (from RFC1942) --">
1097 <!ENTITY % just -- an attribute for text justification --
1098 "ALIGN (left|right|center|justify) #IMPLIED"
1099 -- default is left for ltr paragraphs, right for rtl -- >
1101 <!--======= Character mnemonic entities =================-->
1103 <!ENTITY % ISOlat1 PUBLIC
1104 "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
1107 <!ENTITY amp CDATA "&#38;" -- ampersand -->
1108 <!ENTITY gt CDATA "&#62;" -- greater than -->
1109 <!ENTITY lt CDATA "&#60;" -- less than -->
1110 <!ENTITY quot CDATA "&#34;" -- double quote -->
1112 <!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
1113 <!ENTITY zwnj CDATA "&#8204;"-- zero width non-joiner-->
1114 <!ENTITY zwj CDATA "&#8205;"-- zero width joiner-->
1115 <!ENTITY lrm CDATA "&#8206;"-- left-to-right mark-->
1124 <!ENTITY rlm CDATA "&#8207;"-- right-to-left mark-->
1127 <!--========= SGML Document Access (SDA) Parameter Entities =====-->
1129 <!-- HTML contains SGML Document Access (SDA) fixed attributes
1130 in support of easy transformation to the International Committee
1131 for Accessible Document Design (ICADD) DTD
1132 "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
1133 ICADD applications are designed to support usable access to
1134 structured information by print-impaired individuals through
1135 Braille, large print and voice synthesis. For more information on
1137 - ISO 12083:1993, Annex A.8, Facilities for Braille,
1138 large print and computer voice
1140 <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
1141 - Usenet news group bit.listserv.easi
1142 - Recording for the Blind, +1 800 221 4792
1145 <!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
1146 -- one to one mapping -->
1147 <!ENTITY % SDARULE "SDARULE CDATA #FIXED"
1148 -- context-sensitive mapping -->
1149 <!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
1150 -- generated text prefix -->
1151 <!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
1152 -- generated text suffix -->
1153 <!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
1154 -- suspend transform process -->
1157 <!--========== Text Markup =====================-->
1159 <![ %HTML.Highlighting [
1161 <!ENTITY % font " TT | B | I ">
1163 <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
1165 <!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">
1167 <!ELEMENT (%font;|%phrase) - - (%text)*>
1168 <!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
1180 <!ATTLIST ( B | STRONG )
1184 <!ATTLIST ( I | EM | CITE )
1189 <!-- <TT> Typewriter text -->
1190 <!-- <B> Bold text -->
1191 <!-- <I> Italic text -->
1193 <!-- <EM> Emphasized phrase -->
1194 <!-- <STRONG> Strong emphasis -->
1195 <!-- <CODE> Source code phrase -->
1196 <!-- <SAMP> Sample text or characters -->
1197 <!-- <KBD> Keyboard phrase, e.g. user input -->
1198 <!-- <VAR> Variable phrase or substitutable -->
1199 <!-- <CITE> Name or title of cited work -->
1201 <!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">
1205 <!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">
1207 <!ELEMENT BR - O EMPTY>
1212 <!-- <BR> Line break -->
1214 <!ELEMENT SPAN - - (%text)*>
1217 %SDAFORM; "other #Attlist"
1220 <!-- <SPAN> Generic inline container -->
1221 <!-- <SPAN DIR=...> New counterflow embedding -->
1222 <!-- <SPAN LANG="..."> Language of contents -->
1224 <!ELEMENT Q - - (%text)*>
1239 <!-- <Q> Short quotation -->
1240 <!-- <Q LANG=xx> Language of quotation is xx -->
1241 <!-- <Q DIR=...> New counterflow embedding -->
1243 <!ELEMENT BDO - - (%text)+>
1246 DIR (ltr|rtl) #REQUIRED
1247 %SDAPREF "Bidi Override #Attval(DIR): "
1251 <!-- <BDO DIR=...> Override directionality of text to value of DIR -->
1252 <!-- <BDO LANG=...> Language of contents -->
1254 <!ELEMENT (SUP|SUB) - - (#PCDATA)>
1257 %SDAPREF "Superscript(#content)"
1261 %SDAPREF "Subscript(#content)"
1264 <!-- <SUP> Superscript -->
1265 <!-- <SUB> Subscript -->
1267 <!--========= Link Markup ======================-->
1269 <!ENTITY % linkType "NAMES">
1271 <!ENTITY % linkExtraAttributes
1272 "REL %linkType #IMPLIED
1273 REV %linkType #IMPLIED
1275 TITLE CDATA #IMPLIED
1276 METHODS NAMES #IMPLIED
1277 CHARSET NAME #IMPLIED
1280 <![ %HTML.Recommended [
1281 <!ENTITY % A.content "(%text)*"
1282 -- <H1><a name="xxx">Heading</a></H1>
1292 <a name="xxx"><H1>Heading</H1></a>
1296 <!ENTITY % A.content "(%heading|%text)*">
1298 <!ELEMENT A - - %A.content -(A)>
1303 %linkExtraAttributes;
1304 %SDAPREF; "<Anchor: #AttList>"
1306 <!-- <A> Anchor; source/destination of link -->
1307 <!-- <A NAME="..."> Name of this anchor -->
1308 <!-- <A HREF="..."> Address of link destination -->
1309 <!-- <A URN="..."> Permanent address of destination -->
1310 <!-- <A REL=...> Relationship to destination -->
1311 <!-- <A REV=...> Relationship of destination to this -->
1312 <!-- <A TITLE="..."> Title of destination (advisory) -->
1313 <!-- <A METHODS="..."> Operations on destination (advisory) -->
1314 <!-- <A CHARSET="..."> Charset of destination (advisory) -->
1315 <!-- <A LANG="..."> Language of contents btw <A> and </A> -->
1316 <!-- <A DIR=...> Contents is a new counterflow embedding -->
1318 <!--========== Images ==========================-->
1320 <!ELEMENT IMG - O EMPTY>
1325 ALIGN (top|middle|bottom) #IMPLIED
1326 ISMAP (ISMAP) #IMPLIED
1327 %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
1330 <!-- <IMG> Image; icon, glyph or illustration -->
1331 <!-- <IMG SRC="..."> Address of image object -->
1332 <!-- <IMG ALT="..."> Textual alternative -->
1333 <!-- <IMG ALIGN=...> Position relative to text -->
1334 <!-- <IMG LANG=...> Image contains "text" in that language -->
1335 <!-- <IMG DIR=rtl> Inline image acts as a right-to-left
1336 embedding w/r to BIDI algorithm -->
1337 <!-- <IMG ISMAP> Each pixel can be a link -->
1339 <!--========== Paragraphs=======================-->
1348 <!ELEMENT P - O (%text)*>
1355 <!-- <P> Paragraph -->
1356 <!-- <P LANG="..."> Language of paragraph text -->
1357 <!-- <P DIR=...> Base directionality of paragraph -->
1358 <!-- <P ALIGN=...> Paragraph alignment (justification) -->
1360 <!--========== Headings, Titles, Sections ===============-->
1362 <!ELEMENT HR - O EMPTY>
1365 %SDAPREF; "&#RE;&#RE;"
1368 <!-- <HR> Horizontal rule -->
1370 <!ELEMENT ( %heading ) - - (%text;)*>
1410 <!-- <H1> Heading, level 1 -->
1411 <!-- <H2> Heading, level 2 -->
1412 <!-- <H3> Heading, level 3 -->
1413 <!-- <H4> Heading, level 4 -->
1414 <!-- <H5> Heading, level 5 -->
1415 <!-- <H6> Heading, level 6 -->
1418 <!--========== Text Flows ======================-->
1421 <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
1424 <!ENTITY % block.forms "BLOCKQUOTE">
1426 <![ %HTML.Deprecated [
1427 <!ENTITY % preformatted "PRE | XMP | LISTING">
1430 <!ENTITY % preformatted "PRE">
1432 <!ENTITY % block "P | %list | DL
1436 <!ENTITY % flow "(%text|%block)*">
1438 <!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
1439 <!ELEMENT PRE - - (%pre.content)*>
1442 WIDTH NUMBER #implied
1446 <!-- <PRE> Preformatted text -->
1447 <!-- <PRE WIDTH=...> Maximum characters per line -->
1448 <!-- <PRE DIR=...> Base direction of preformatted block -->
1449 <!-- <PRE LANG=...> Language of contents -->
1451 <![ %HTML.Deprecated [
1460 <!ENTITY % literal "CDATA"
1461 -- historical, non-conforming parsing mode where
1462 the only markup signal is the end tag
1466 <!ELEMENT (XMP|LISTING) - - %literal>
1470 %SDAPREF; "Example:&#RE;"
1475 %SDAPREF; "Listing:&#RE;"
1478 <!-- <XMP> Example section -->
1479 <!-- <LISTING> Computer listing -->
1481 <!ELEMENT PLAINTEXT - O %literal>
1482 <!-- <PLAINTEXT> Plain text passage -->
1491 <!--========== Lists ==================-->
1493 <!ELEMENT DL - - (DT | DD)+>
1496 COMPACT (COMPACT) #IMPLIED
1498 %SDAPREF; "Definition List:"
1501 <!ELEMENT DT - O (%text)*>
1507 <!ELEMENT DD - O %flow>
1521 <!-- <DL> Definition list, or glossary -->
1522 <!-- <DL COMPACT> Compact style list -->
1523 <!-- <DT> Term in definition list -->
1524 <!-- <DD> Definition of term -->
1526 <!ELEMENT (OL|UL) - - (LI)+>
1530 COMPACT (COMPACT) #IMPLIED
1536 COMPACT (COMPACT) #IMPLIED
1539 <!-- <UL> Unordered list -->
1540 <!-- <UL COMPACT> Compact list style -->
1541 <!-- <OL> Ordered, or numbered list -->
1542 <!-- <OL COMPACT> Compact list style -->
1545 <!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
1549 COMPACT (COMPACT) #IMPLIED
1551 %SDAPREF; "<LHead>Directory</LHead>"
1556 COMPACT (COMPACT) #IMPLIED
1558 %SDAPREF; "<LHead>Menu</LHead>"
1561 <!-- <DIR> Directory list -->
1562 <!-- <DIR COMPACT> Compact list style -->
1563 <!-- <MENU> Menu list -->
1572 <!-- <MENU COMPACT> Compact list style -->
1574 <!ELEMENT LI - O %flow>
1581 <!-- <LI> List item -->
1583 <!--========== Document Body ===================-->
1585 <![ %HTML.Recommended [
1586 <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
1595 <!ENTITY % body.content "(%heading | %text | %block |
1598 <!ELEMENT BODY O O %body.content>
1603 <!-- <BODY> Document body -->
1604 <!-- <BODY DIR=...> Base direction of whole body -->
1605 <!-- <BODY LANG=...> Language of contents -->
1607 <!ELEMENT BLOCKQUOTE - - %body.content>
1608 <!ATTLIST BLOCKQUOTE
1614 <!-- <BLOCKQUOTE> Quoted passage -->
1616 <!ELEMENT ADDRESS - - (%text|P)*>
1629 %SDAPREF; "Address:&#RE;"
1632 <!-- <ADDRESS> Address, signature, or byline -->
1635 <!--======= Forms ====================-->
1639 <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
1642 ACTION CDATA #IMPLIED
1643 METHOD (%HTTP-Method) GET
1644 ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
1645 %SDAPREF; "<Para>Form:</Para>"
1646 %SDASUFF; "<Para>Form End.</Para>"
1649 <!-- <FORM> Fill-out or data-entry form -->
1650 <!-- <FORM ACTION="..."> Address for completed form -->
1651 <!-- <FORM METHOD=...> Method of submitting form -->
1652 <!-- <FORM ENCTYPE="..."> Representation of form data -->
1653 <!-- <FORM DIR=...> Base direction of form -->
1654 <!-- <FORM LANG=...> Language of contents -->
1656 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
1657 RADIO | SUBMIT | RESET |
1658 IMAGE | HIDDEN | FILE )">
1659 <!ELEMENT INPUT - O EMPTY>
1662 TYPE %InputType TEXT
1664 VALUE CDATA #IMPLIED
1666 CHECKED (CHECKED) #IMPLIED
1668 MAXLENGTH NUMBER #IMPLIED
1669 ALIGN (top|middle|bottom) #IMPLIED
1670 ACCEPT CDATA #IMPLIED --list of content types --
1671 ACCEPT-CHARSET CDATA #IMPLIED --list of charsets accepted by server --
1675 <!-- <INPUT> Form input datum -->
1684 <!-- <INPUT TYPE=...> Type of input interaction -->
1685 <!-- <INPUT NAME=...> Name of form datum -->
1686 <!-- <INPUT VALUE="..."> Default/initial/selected value -->
1687 <!-- <INPUT SRC="..."> Address of image -->
1688 <!-- <INPUT CHECKED> Initial state is "on" -->
1689 <!-- <INPUT SIZE=...> Field size hint -->
1690 <!-- <INPUT MAXLENGTH=...> Data length maximum -->
1691 <!-- <INPUT ALIGN=...> Image alignment -->
1692 <!-- <INPUT ACCEPT="..."> List of desired media types -->
1693 <!-- <INPUT ACCEPT-CHARSET="..."> List of acceptable charsets -->
1695 <!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
1698 NAME CDATA #REQUIRED
1699 SIZE NUMBER #IMPLIED
1700 MULTIPLE (MULTIPLE) #IMPLIED
1703 "<LHead>Select #AttVal(Multiple)</LHead>"
1706 <!-- <SELECT> Selection of option(s) -->
1707 <!-- <SELECT NAME=...> Name of form datum -->
1708 <!-- <SELECT SIZE=...> Options displayed at a time -->
1709 <!-- <SELECT MULTIPLE> Multiple selections allowed -->
1711 <!ELEMENT OPTION - O (#PCDATA)*>
1714 SELECTED (SELECTED) #IMPLIED
1715 VALUE CDATA #IMPLIED
1718 "Option: #AttVal(Value) #AttVal(Selected)"
1721 <!-- <OPTION> A selection option -->
1722 <!-- <OPTION SELECTED> Initial state -->
1723 <!-- <OPTION VALUE="..."> Form datum value for this option-->
1725 <!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
1728 NAME CDATA #REQUIRED
1729 ROWS NUMBER #REQUIRED
1730 COLS NUMBER #REQUIRED
1731 ACCEPT-CHARSET CDATA #IMPLIED -- list of charsets accepted by server --
1741 %SDAPREF; "Input Text -- #AttVal(Name): "
1744 <!-- <TEXTAREA> An area for text input -->
1745 <!-- <TEXTAREA NAME=...> Name of form datum -->
1746 <!-- <TEXTAREA ROWS=...> Height of area -->
1747 <!-- <TEXTAREA COLS=...> Width of area -->
1752 <!--======= Document Head ======================-->
1754 <![ %HTML.Recommended [
1755 <!ENTITY % head.extra "">
1757 <!ENTITY % head.extra "& NEXTID?">
1759 <!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">
1761 <!ELEMENT HEAD O O (%head.content) +(META|LINK)>
1765 <!-- <HEAD> Document head -->
1767 <!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)>
1772 <!-- <TITLE> Title of document -->
1774 <!ELEMENT LINK - O EMPTY>
1777 HREF CDATA #REQUIRED
1778 %linkExtraAttributes;
1779 %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >
1781 <!-- <LINK> Link from this document -->
1782 <!-- <LINK HREF="..."> Address of link destination -->
1783 <!-- <LINK URN="..."> Lasting name of destination -->
1784 <!-- <LINK REL=...> Relationship to destination -->
1785 <!-- <LINK REV=...> Relationship of destination to this -->
1786 <!-- <LINK TITLE="..."> Title of destination (advisory) -->
1787 <!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
1796 <!-- <LINK METHODS="..."> Operations allowed (advisory) -->
1798 <!ELEMENT ISINDEX - O EMPTY>
1802 "<Para>[Document is indexed/searchable.]</Para>">
1804 <!-- <ISINDEX> Document is a searchable index -->
1806 <!ELEMENT BASE - O EMPTY>
1808 HREF CDATA #REQUIRED >
1810 <!-- <BASE> Base context document -->
1811 <!-- <BASE HREF="..."> Address for this document -->
1813 <!ELEMENT NEXTID - O EMPTY>
1817 <!-- <NEXTID> Next ID to use for link name -->
1818 <!-- <NEXTID N=...> Next ID to use for link name -->
1820 <!ELEMENT META - O EMPTY>
1822 HTTP-EQUIV NAME #IMPLIED
1824 CONTENT CDATA #REQUIRED >
1826 <!-- <META> Generic Meta-information -->
1827 <!-- <META HTTP-EQUIV=...> HTTP response header name -->
1828 <!-- <META NAME=...> Meta-information name -->
1829 <!-- <META CONTENT="..."> Associated information -->
1831 <!--======= Document Structure =================-->
1833 <![ %HTML.Deprecated [
1834 <!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
1836 <!ENTITY % html.content "HEAD, BODY">
1838 <!ELEMENT HTML O O (%html.content)>
1839 <!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">
1855 <!-- <HTML> HTML Document -->
1858 7.2. SGML Declaration for HTML
1860 <!SGML "ISO 8879:1986"
1862 SGML Declaration for HyperText Markup Language version 2.x
1863 (HTML 2.x = HTML 2.0 + i18n).
1868 BASESET "ISO Registration Number 177//CHARSET
1869 ISO/IEC 10646-1:1993 UCS-4 with
1870 implementation level 3//ESC 2/5 2/15 4/6"
1888 SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1889 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
1890 BASESET "ISO 646:1983//CHARSET
1891 International Reference Version
1912 NAMECASE GENERAL YES
1914 DELIM GENERAL SGMLREF
1920 NAMELEN 72 -- somewhat arbitrary; taken from
1921 internet line length conventions --
1942 APPINFO "SDA" -- conforming SGML Document Access application
1947 7.3. ISO Latin 1 entity set
1949 The following public text lists each of the characters specified in
1950 the Added Latin 1 entity set, along with its name, syntax for use,
1951 and description. This list is derived from ISO Standard
1952 8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire
1953 entity set, and adds entities for all missing characters in the right
1964 <!-- (C) International Organization for Standardization 1986
1965 Permission to copy in any form is granted for use with
1966 conforming SGML systems and applications as defined in
1967 ISO 8879, provided this notice is included in all copies.
1969 <!-- Character entity set. Typical invocation:
1970 <!ENTITY % ISOlat1 PUBLIC
1971 "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
1974 <!ENTITY nbsp CDATA "&#160;" -- no-break space -->
1975 <!ENTITY iexcl CDATA "&#161;" -- inverted exclamation mark -->
1976 <!ENTITY cent CDATA "&#162;" -- cent sign -->
1977 <!ENTITY pound CDATA "&#163;" -- pound sterling sign -->
1978 <!ENTITY curren CDATA "&#164;" -- general currency sign -->
1979 <!ENTITY yen CDATA "&#165;" -- yen sign -->
1980 <!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
1981 <!ENTITY sect CDATA "&#167;" -- section sign -->
1982 <!ENTITY uml CDATA "&#168;" -- umlaut (dieresis) -->
1983 <!ENTITY copy CDATA "&#169;" -- copyright sign -->
1984 <!ENTITY ordf CDATA "&#170;" -- ordinal indicator, feminine -->
1985 <!ENTITY laquo CDATA "&#171;" -- angle quotation mark, left -->
1986 <!ENTITY not CDATA "&#172;" -- not sign -->
1987 <!ENTITY shy CDATA "&#173;" -- soft hyphen -->
1988 <!ENTITY reg CDATA "&#174;" -- registered sign -->
1989 <!ENTITY macr CDATA "&#175;" -- macron -->
1990 <!ENTITY deg CDATA "&#176;" -- degree sign -->
1991 <!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
1992 <!ENTITY sup2 CDATA "&#178;" -- superscript two -->
1993 <!ENTITY sup3 CDATA "&#179;" -- superscript three -->
1994 <!ENTITY acute CDATA "&#180;" -- acute accent -->
1995 <!ENTITY micro CDATA "&#181;" -- micro sign -->
1996 <!ENTITY para CDATA "&#182;" -- pilcrow (paragraph sign) -->
1997 <!ENTITY middot CDATA "&#183;" -- middle dot -->
1998 <!ENTITY cedil CDATA "&#184;" -- cedilla -->
1999 <!ENTITY sup1 CDATA "&#185;" -- superscript one -->
2000 <!ENTITY ordm CDATA "&#186;" -- ordinal indicator, masculine -->
2001 <!ENTITY raquo CDATA "&#187;" -- angle quotation mark, right -->
2002 <!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
2003 <!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
2004 <!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
2005 <!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
2006 <!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
2007 <!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
2008 <!ENTITY Acirc CDATA "&#194;" -- capital A, circumflex accent -->
2009 <!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
2010 <!ENTITY Auml CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
2011 <!ENTITY Aring CDATA "&#197;" -- capital A, ring -->
2020 <!ENTITY AElig CDATA "Æ" -- capital AE diphthong (ligature) -->
2021 <!ENTITY Ccedil CDATA "Ç" -- capital C, cedilla -->
2022 <!ENTITY Egrave CDATA "È" -- capital E, grave accent -->
2023 <!ENTITY Eacute CDATA "É" -- capital E, acute accent -->
2024 <!ENTITY Ecirc CDATA "Ê" -- capital E, circumflex accent -->
2025 <!ENTITY Euml CDATA "Ë" -- capital E, dieresis or umlaut mark -->
2026 <!ENTITY Igrave CDATA "Ì" -- capital I, grave accent -->
2027 <!ENTITY Iacute CDATA "Í" -- capital I, acute accent -->
2028 <!ENTITY Icirc CDATA "Î" -- capital I, circumflex accent -->
2029 <!ENTITY Iuml CDATA "Ï" -- capital I, dieresis or umlaut mark -->
2030 <!ENTITY ETH CDATA "Ð" -- capital Eth, Icelandic -->
2031 <!ENTITY Ntilde CDATA "Ñ" -- capital N, tilde -->
2032 <!ENTITY Ograve CDATA "Ò" -- capital O, grave accent -->
2033 <!ENTITY Oacute CDATA "Ó" -- capital O, acute accent -->
2034 <!ENTITY Ocirc CDATA "Ô" -- capital O, circumflex accent -->
2035 <!ENTITY Otilde CDATA "Õ" -- capital O, tilde -->
2036 <!ENTITY Ouml CDATA "Ö" -- capital O, dieresis or umlaut mark -->
2037 <!ENTITY times CDATA "×" -- multiply sign -->
2038 <!ENTITY Oslash CDATA "Ø" -- capital O, slash -->
2039 <!ENTITY Ugrave CDATA "Ù" -- capital U, grave accent -->
2040 <!ENTITY Uacute CDATA "Ú" -- capital U, acute accent -->
2041 <!ENTITY Ucirc CDATA "Û" -- capital U, circumflex accent -->
2042 <!ENTITY Uuml CDATA "Ü" -- capital U, dieresis or umlaut mark -->
2043 <!ENTITY Yacute CDATA "Ý" -- capital Y, acute accent -->
2044 <!ENTITY THORN CDATA "Þ" -- capital Thorn, Icelandic -->
2045 <!ENTITY szlig CDATA "ß" -- small sharp s, German (sz ligature) -->
2046 <!ENTITY agrave CDATA "à" -- small a, grave accent -->
2047 <!ENTITY aacute CDATA "á" -- small a, acute accent -->
2048 <!ENTITY acirc CDATA "â" -- small a, circumflex accent -->
2049 <!ENTITY atilde CDATA "ã" -- small a, tilde -->
2050 <!ENTITY auml CDATA "ä" -- small a, dieresis or umlaut mark -->
2051 <!ENTITY aring CDATA "å" -- small a, ring -->
2052 <!ENTITY aelig CDATA "æ" -- small ae diphthong (ligature) -->
2053 <!ENTITY ccedil CDATA "ç" -- small c, cedilla -->
2054 <!ENTITY egrave CDATA "è" -- small e, grave accent -->
2055 <!ENTITY eacute CDATA "é" -- small e, acute accent -->
2056 <!ENTITY ecirc CDATA "ê" -- small e, circumflex accent -->
2057 <!ENTITY euml CDATA "ë" -- small e, dieresis or umlaut mark -->
2058 <!ENTITY igrave CDATA "ì" -- small i, grave accent -->
2059 <!ENTITY iacute CDATA "í" -- small i, acute accent -->
2060 <!ENTITY icirc CDATA "î" -- small i, circumflex accent -->
2061 <!ENTITY iuml CDATA "ï" -- small i, dieresis or umlaut mark -->
2062 <!ENTITY eth CDATA "ð" -- small eth, Icelandic -->
2063 <!ENTITY ntilde CDATA "ñ" -- small n, tilde -->
2064 <!ENTITY ograve CDATA "ò" -- small o, grave accent -->
2065 <!ENTITY oacute CDATA "ó" -- small o, acute accent -->
2066 <!ENTITY ocirc CDATA "ô" -- small o, circumflex accent -->
2067 <!ENTITY otilde CDATA "õ" -- small o, tilde -->
2076 <!ENTITY ouml CDATA "ö" -- small o, dieresis or umlaut mark -->
2077 <!ENTITY divide CDATA "÷" -- divide sign -->
2078 <!ENTITY oslash CDATA "ø" -- small o, slash -->
2079 <!ENTITY ugrave CDATA "ù" -- small u, grave accent -->
2080 <!ENTITY uacute CDATA "ú" -- small u, acute accent -->
2081 <!ENTITY ucirc CDATA "û" -- small u, circumflex accent -->
2082 <!ENTITY uuml CDATA "ü" -- small u, dieresis or umlaut mark -->
2083 <!ENTITY yacute CDATA "ý" -- small y, acute accent -->
2084 <!ENTITY thorn CDATA "þ" -- small thorn, Icelandic -->
2085 <!ENTITY yuml CDATA "ÿ" -- small y, dieresis or umlaut mark -->
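As an illustration only, and not part of the entity set itself, the following Python sketch shows one way a processor might expand references to the entities declared above. It assumes the standard library table html.entities.name2codepoint, which includes these names. The helper expand_latin1_entities is hypothetical: it expands only names whose code position lies in the range 160 to 255 and leaves every other reference as written.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as "&eacute;".
# Entity names are case-sensitive ("Eacute" and "eacute" differ).
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_latin1_entities(text):
    """Replace references to the Added Latin 1 entities with characters.

    Hypothetical helper for illustration: only names that map to code
    positions 160..255 are expanded; anything else is left unchanged.
    """
    def _replace(match):
        code = name2codepoint.get(match.group(1))
        if code is not None and 160 <= code <= 255:
            return chr(code)
        # Unknown name, or a name outside the Latin 1 upper half.
        return match.group(0)

    return _ENTITY_REF.sub(_replace, text)


if __name__ == "__main__":
    print(expand_latin1_entities("caf&eacute; &copy; 1996 &amp; &unknown;"))
```

A real SGML or HTML parser resolves these references from the DTD during parsing; the regular expression here is only a stand-in for that step.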
2090 [BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
2091 Generalized Markup Language", Addison-Wesley, Reading,
2094 [ERCS] Extended Reference Concrete Syntax for SGML.
2095 <http://www.sgmlopen.org/sgml/docs/ercs/ercs-
2098 [GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
2099 Oxford University Press, 1990.
2101 [HTTP-1.1] R.T. Fielding, H. Frystyk Nielsen, and T. Berners-Lee,
2102 "Hypertext Transfer Protocol -- HTTP/1.1", Work in
2103 progress (draft-ietf-http-v11-spec-03.txt), MIT/LCS,
2106 [ISO-639] ISO 639:1988. Codes pour la représentation des noms de
2107 langue. Technical content in
2108 <http://www.sil.org/sgml/iso639a.html>
2110 [ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
2113 [ISO-8601] ISO 8601:1988. Éléments de données et formats
2114 d'échange -- Échange d'information -- Représentation
2115 de la date et de l'heure.
2117 [ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
2118 tion Processing -- 8-bit Single-Byte Coded Graphic
2119 Character Sets -- Part 1: Latin Alphabet No. 1.
2121 [ISO-8879] ISO 8879:1986. International Standard -- Information
2122 Processing -- Text and Office Systems -- Standard Gen-
2123 eralized Markup Language (SGML).
2132 [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
2133 mation technology -- Universal Multiple-Octet Coded
2134 Character Set (UCS) -- Part 1: Architecture and Basic
2137 [NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec-
2138 tronic Book Technologies, 1995,
2139 <http://www.ebt.com/docs/multling.html>
2141 [NICOL2] G.T. Nicol, "MIME Header Supplemented File Type", Work
2142 in progress, <draft-nicol-mime-header-type-00.txt>,
2145 [RFC1345] K. Simonsen, "Character Mnemonics & Character Sets",
2146 RFC 1345, Rationel Almen Planlaegning, June 1992.
2148 [RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese
2149 Character Encoding for Internet Messages", RFC 1468,
2150 Keio University, Panda Programming, June 1993.
2152 [RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
2153 net Mail Extensions) Part One: Mechanisms for Specify-
2154 ing and Describing the Format of Internet Message Bod-
2155 ies", RFC 1521, Bellcore, Innosoft, September 1993.
2157 [RFC1641] D. Goldsmith, M. Davis, "Using Unicode with MIME", RFC
2158 1641, Taligent inc., July 1994.
2160 [RFC1642] D. Goldsmith, M. Davis, "UTF-7: A Mail-safe Transfor-
2161 mation Format of Unicode", RFC 1642, Taligent inc.,
2164 [RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
2165 Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
2166 University of Minnesota, October 1994.
2168 [RFC1766] H. Alvestrand, "Tags for the Identification of
2169 Languages", RFC 1766, UNINETT, March 1995.
2171 [RFC1866] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
2172 guage - 2.0", RFC 1866, MIT/W3C, November 1995.
2174 [RFC1867] E. Nebel and L. Masinter, "Form-based File Upload in
2175 HTML", RFC 1867, Xerox Corporation, November 1995.
2177 [RFC1942] D. Raggett, "HTML Tables", RFC 1942, W3C, May 1996.
2188 [RFC1945] T. Berners-Lee, R.T. Fielding, and H. Frystyk Nielsen,
2189 "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945,
2190 MIT/LCS, UC Irvine, May 1996.
2192 [SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
2195 [TAKADA] Toshihiro Takada, "Multilingual Information Exchange
2196 through the World-Wide Web", Computer Networks and
2197 ISDN Systems, Vol. 27, No. 2, Nov. 1994, p. 235-241.
2199 [TEI] TEI Guidelines for Electronic Text Encoding and
2200 Interchange. <http://etext.virginia.edu/TEI.html>
2202 [UNICODE] The Unicode Consortium, "The Unicode Standard --
2203 Worldwide Character Encoding -- Version 1.0", Addison-
2204 Wesley, Volume 1, 1991, Volume 2, 1992, and Technical
2205 Report #4, 1993. The BIDI algorithm is in appendix A
2206 of volume 1, with corrections in appendix D of volume
2209 [UTF-8] ISO/IEC 10646-1:1993 AMENDMENT 2 (1996). UCS Transfor-
2210 mation Format 8 (UTF-8).
2212 [VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
2213 Publishers Group, Norwell and Dordrecht, 1990.
2219 100, boul. Alexis-Nihon, bureau 600
2223 Tel: +1 (514) 747-2547
2224 Fax: +1 (514) 747-2561
2225 EMail: fyergeau@alis.com
2229 Electronic Book Technologies, Japan
2235 Tel: +81-3-3230-8161
2244 Fax: +81-3-3230-8163
2245 EMail: gtn@ebt.com, gtn@twics.co.jp
2254 Tel: +1 (617) 864-5524
2255 Fax: +1 (617) 864-4965
2256 EMail: glenn@spyglass.com
2260 Multimedia-Laboratory
2261 Department of Computer Science
2262 University of Zurich
2263 Winterthurerstrasse 190
2267 Tel: +41 1 257 43 16
2268 Fax: +41 1 363 00 35
2269 EMail: mduerst@ifi.unizh.ch