            1 
            2 
            3 
            4 
            5 
            6 
            7 Network Working Group                                           K. Moore
            8 Request for Comments: 2047                       University of Tennessee
            9 Obsoletes: 1521, 1522, 1590                                November 1996
           10 Category: Standards Track
           11 
           12 
           13         MIME (Multipurpose Internet Mail Extensions) Part Three:
           14               Message Header Extensions for Non-ASCII Text
           15 
           16 Status of this Memo
           17 
           18    This document specifies an Internet standards track protocol for the
           19    Internet community, and requests discussion and suggestions for
           20    improvements.  Please refer to the current edition of the "Internet
           21    Official Protocol Standards" (STD 1) for the standardization state
           22    and status of this protocol.  Distribution of this memo is unlimited.
           23 
           24 Abstract
           25 
           26    STD 11, RFC 822, defines a message representation protocol specifying
           27    considerable detail about US-ASCII message headers, and leaves the
           28    message content, or message body, as flat US-ASCII text.  This set of
           29    documents, collectively called the Multipurpose Internet Mail
           30    Extensions, or MIME, redefines the format of messages to allow for
           31 
           32    (1) textual message bodies in character sets other than US-ASCII,
           33 
           34    (2) an extensible set of different formats for non-textual message
           35        bodies,
           36 
           37    (3) multi-part message bodies, and
           38 
           39    (4) textual header information in character sets other than US-ASCII.
           40 
           41    These documents are based on earlier work documented in RFC 934, STD
           42    11, and RFC 1049, but extend and revise them.  Because RFC 822 said
           43    so little about message bodies, these documents are largely
           44    orthogonal to (rather than a revision of) RFC 822.
           45 
           46    This particular document is the third document in the series.  It
           47    describes extensions to RFC 822 to allow non-US-ASCII text data in
           48    Internet mail header fields.
           49 
           50 
           51 
           52 
           53 
           54 
           55 
           56 
           57 
           58 Moore                       Standards Track                     [Page 1]
           59 
           60 RFC 2047               Message Header Extensions           November 1996
           61 
           62 
           63    Other documents in this series include:
           64 
           65    + RFC 2045, which specifies the various headers used to describe
           66      the structure of MIME messages,
           67 
           68    + RFC 2046, which defines the general structure of the MIME media
           69      typing system and defines an initial set of media types,
           70 
           71    + RFC 2048, which specifies various IANA registration procedures
           72      for MIME-related facilities, and
           73 
           74    + RFC 2049, which describes MIME conformance criteria and
           75      provides some illustrative examples of MIME message formats,
           76      acknowledgements, and the bibliography.
           77 
           78    These documents are revisions of RFCs 1521, 1522, and 1590, which
           79    themselves were revisions of RFCs 1341 and 1342.  An appendix in RFC
           80    2049 describes differences and changes from previous versions.
           81 
           82 1. Introduction
           83 
           84    RFC 2045 describes a mechanism for denoting textual body parts which
           85    are coded in various character sets, as well as methods for encoding
           86    such body parts as sequences of printable US-ASCII characters.  This
           87    memo describes similar techniques to allow the encoding of non-ASCII
           88    text in various portions of an RFC 822 [2] message header, in a manner
           89    which is unlikely to confuse existing message handling software.
           90 
           91    Like the encoding techniques described in RFC 2045, the techniques
           92    outlined here were designed to allow the use of non-ASCII characters
           93    in message headers in a way which is unlikely to be disturbed by the
           94    quirks of existing Internet mail handling programs.  In particular,
           95    some mail relaying programs are known to (a) delete some message
           96    header fields while retaining others, (b) rearrange the order of
           97    addresses in To or Cc fields, (c) rearrange the (vertical) order of
           98    header fields, and/or (d) "wrap" message headers at different places
           99    than those in the original message.  In addition, some mail reading
          100    programs are known to have difficulty correctly parsing message
          101    headers which, while legal according to RFC 822, make use of
          102    backslash-quoting to "hide" special characters such as "<", ",", or
          103    ":", or which exploit other infrequently-used features of that
          104    specification.
          105 
          106    While it is unfortunate that these programs do not correctly
          107    interpret RFC 822 headers, to "break" these programs would cause
          108    severe operational problems for the Internet mail system.  The
          109    extensions described in this memo therefore do not rely on little-
          110    used features of RFC 822.
          111 
          119    Instead, certain sequences of "ordinary" printable ASCII characters
          120    (known as "encoded-words") are reserved for use as encoded data.  The
          121    syntax of encoded-words is such that they are unlikely to
          122    "accidentally" appear as normal text in message headers.
          123    Furthermore, the characters used in encoded-words are restricted to
          124    those which do not have special meanings in the context in which the
          125    encoded-word appears.
          126 
          127    Generally, an "encoded-word" is a sequence of printable ASCII
          128    characters that begins with "=?", ends with "?=", and has two "?"s in
          129    between.  It specifies a character set and an encoding method, and
          130    also includes the original text encoded as graphic ASCII characters,
          131    according to the rules for that encoding method.
          132 
          133    A mail composer that implements this specification will provide a
          134    means of inputting non-ASCII text in header fields, but will
          135    translate these fields (or appropriate portions of these fields) into
          136    encoded-words before inserting them into the message header.
          137 
          138    A mail reader that implements this specification will recognize
          139    encoded-words when they appear in certain portions of the message
          140    header.  Instead of displaying the encoded-word "as is", it will
          141    reverse the encoding and display the original text in the designated
          142    character set.
          143 
          144 NOTES
          145 
          146    This memo relies heavily on notation and terms defined in RFC 822 and
          147    RFC 2045.  In particular, the syntax for the ABNF used in this memo
          148    is defined in RFC 822, and many of the terminal or nonterminal
          149    symbols from RFC 822 are used in the grammar for the header
          150    extensions defined here.  Among the symbols defined in RFC 822 and
          151    referenced in this memo are: 'addr-spec', 'atom', 'CHAR', 'comment',
          152    'CTLs', 'ctext', 'linear-white-space', 'phrase', 'quoted-pair',
          153    'quoted-string', 'SPACE', and 'word'.  Successful implementation of
          154    this protocol extension requires careful attention to the RFC 822
          155    definitions of these terms.
          156 
          157    When the term "ASCII" appears in this memo, it refers to the "7-Bit
          158    American Standard Code for Information Interchange", ANSI X3.4-1986.
          159    The MIME charset name for this character set is "US-ASCII".  When not
          160    specifically referring to the MIME charset name, this document uses
          161    the term "ASCII", both for brevity and for consistency with RFC 822.
          162    However, implementors are warned that the character set name must be
          163    spelled "US-ASCII" in MIME message and body part headers.
          164 
          175    This memo specifies a protocol for the representation of non-ASCII
          176    text in message headers.  It specifically DOES NOT define any
          177    translation between "8-bit headers" and pure ASCII headers, nor is
          178    any such translation assumed to be possible.
          179 
          180 2. Syntax of encoded-words
          181 
          182    An 'encoded-word' is defined by the following ABNF grammar.  The
          183    notation of RFC 822 is used, with the exception that white space
          184    characters MUST NOT appear between components of an 'encoded-word'.
          185 
          186    encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
          187 
          188    charset = token    ; see section 3
          189 
          190    encoding = token   ; see section 4
          191 
          192    token = 1*<Any CHAR except SPACE, CTLs, and especials>
          193 
          194    especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
          195                <"> / "/" / "[" / "]" / "?" / "." / "="
          196 
          197    encoded-text = 1*<Any printable ASCII character other than "?"
          198                      or SPACE>
          199                   ; (but see "Use of encoded-words in message
          200                   ; headers", section 5)
          201 
          202    Both 'encoding' and 'charset' names are case-independent.  Thus the
          203    charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
          204    encoding named "Q" may be spelled either "Q" or "q".
          205 
          206    An 'encoded-word' may not be more than 75 characters long, including
          207    'charset', 'encoding', 'encoded-text', and delimiters.  If it is
          208    desirable to encode more text than will fit in an 'encoded-word' of
          209    75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
          210    be used.
          211 
          212    While there is no limit to the length of a multiple-line header
          213    field, each line of a header field that contains one or more
          214    'encoded-word's is limited to 76 characters.
          215 
          216    The length restrictions are included both to ease interoperability
          217    through internetwork mail gateways, and to impose a limit on the
          218    amount of lookahead a header parser must employ (while looking for a
          219    final ?= delimiter) before it can decide whether a token is an
          220    "encoded-word" or something else.
          221 
          231    IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
          232    by an RFC 822 parser.  As a consequence, unencoded white space
          233    characters (such as SPACE and HTAB) are FORBIDDEN within an
          234    'encoded-word'.  For example, the character sequence
          235 
          236       =?iso-8859-1?q?this is some text?=
          237 
          238    would be parsed as four 'atom's, rather than as a single 'atom' (by
          239    an RFC 822 parser) or 'encoded-word' (by a parser which understands
          240    'encoded-words').  The correct way to encode the string "this is some
          241    text" is to encode the SPACE characters as well, e.g.
          242 
          243       =?iso-8859-1?q?this=20is=20some=20text?=
          244 
          245    The characters which may appear in 'encoded-text' are further
          246    restricted by the rules in section 5.
          247 
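As an illustration of the grammar and the 75-character limit in this
section, the following is a minimal Python sketch of a syntax check.
The names parse_encoded_word, ESPECIALS and MAX_LEN are hypothetical and
not part of this memo.  The sketch assumes that 'especials' includes the
backslash and the double quote, and it checks syntax only: it does not
decode anything and does not apply the context rules of section 5.

```python
import re

# 'especials' from the grammar above. ASSUMPTION: the backslash and the
# double quote belong to this set.
ESPECIALS = '()<>@,;:\\"/[]?.='

# token = 1*<any CHAR except SPACE, CTLs, and especials>, i.e. printable
# ASCII (0x21-0x7E) minus the especials.
TOKEN_CHARS = "".join(
    chr(c) for c in range(0x21, 0x7F) if chr(c) not in ESPECIALS
)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# encoded-text = 1*<any printable ASCII character other than "?" or SPACE>
ENCODED_TEXT = "[!->@-~]+"

ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)

MAX_LEN = 75  # whole encoded-word, delimiters included


def parse_encoded_word(candidate):
    """Return (charset, encoding, encoded_text), or None if the candidate
    does not satisfy the section 2 syntax."""
    if len(candidate) > MAX_LEN:
        return None
    match = ENCODED_WORD.fullmatch(candidate)
    return match.groups() if match else None
```

With this sketch, the correctly encoded example above yields its three
components, while the variant containing unencoded SPACE characters is
rejected.
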
          248 3. Character sets
          249 
          250    The 'charset' portion of an 'encoded-word' specifies the character
          251    set associated with the unencoded text.  A 'charset' can be any of
          252    the character set names allowed in a MIME "charset" parameter of a
          253    "text/plain" body part, or any character set name registered with
          254    IANA for use with the MIME text/plain content-type.
          255 
          256    Some character sets use code-switching techniques to switch between
          257    "ASCII mode" and other modes.  If unencoded text in an 'encoded-word'
          258    contains a sequence which causes the charset interpreter to switch
          259    out of ASCII mode, it MUST contain additional control codes such that
          260    ASCII mode is again selected at the end of the 'encoded-word'.  (This
          261    rule applies separately to each 'encoded-word', including adjacent
          262    'encoded-word's within a single header field.)
          263 
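The code-switching rule can be illustrated with ISO-2022-JP, which is
used here only as an example of such a character set and is not named
in the text above.  The helper ends_in_ascii_mode is hypothetical and
only understands the three-octet "ESC ( B" sequence that selects ASCII
in that charset; it is a sketch, not a general charset interpreter.

```python
ESC = b"\x1b"
SELECT_ASCII = b"\x1b(B"  # ISO-2022-JP escape sequence that selects ASCII


def ends_in_ascii_mode(octets):
    """True if the last escape sequence, if any, selects ASCII.

    Illustrative only, and specific to ISO-2022-JP-style charsets.
    """
    last = octets.rfind(ESC)
    return last == -1 or octets[last:last + 3] == SELECT_ASCII


# Encoding each chunk of text separately lets the codec append the switch
# back to ASCII at the end of every chunk.
octets = "\u65e5\u672c\u8a9e".encode("iso2022_jp")
print(ends_in_ascii_mode(octets))
```

Encoding the text of each 'encoded-word' on its own, rather than slicing
one long encoded octet string, is one way to satisfy the rule for every
'encoded-word' separately.
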
          264    When there is a possibility of using more than one character set to
          265    represent the text in an 'encoded-word', and in the absence of
          266    private agreements between sender and recipients of a message, it is
          267    recommended that members of the ISO-8859-* series be used in
          268    preference to other character sets.
          269 
          270 4. Encodings
          271 
          272    Initially, the legal values for "encoding" are "Q" and "B".  These
          273    encodings are described below.  The "Q" encoding is recommended for
          274    use when most of the characters to be encoded are in the ASCII
          275    character set; otherwise, the "B" encoding should be used.
          276    Nevertheless, a mail reader which claims to recognize 'encoded-word's
          277    MUST be able to accept either encoding for any character set which it
          278    supports.
          279 
          287    Only a subset of the printable ASCII characters may be used in
          288    'encoded-text'.  Space and tab characters are not allowed, so that
          289    the beginning and end of an 'encoded-word' are obvious.  The "?"
          290    character is used within an 'encoded-word' to separate the various
          291    portions of the 'encoded-word' from one another, and thus cannot
          292    appear in the 'encoded-text' portion.  Other characters are also
          293    illegal in certain contexts.  For example, an 'encoded-word' in a
          294    'phrase' preceding an address in a From header field may not contain
          295    any of the "specials" defined in RFC 822.  Finally, certain other
          296    characters are disallowed in some contexts, to ensure reliability for
          297    messages that pass through internetwork mail gateways.
          298 
          299    The "B" encoding automatically meets these requirements.  The "Q"
          300    encoding allows a wide range of printable characters to be used in
          301    non-critical locations in the message header (e.g., Subject), with
          302    fewer characters available for use in other locations.
          303 
          304 4.1. The "B" encoding
          305 
          306    The "B" encoding is identical to the "BASE64" encoding defined by RFC
          307    2045.
          308 
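A minimal Python sketch of the "B" encoding, using the standard base64
module.  The function names b_encode_word and b_decode_payload are
hypothetical, and the sketch ignores the 75-character limit of section
2.

```python
import base64


def b_encode_word(text, charset="utf-8"):
    """Build a "B"-encoded 'encoded-word' for text in the given charset."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)


def b_decode_payload(encoded_text, charset):
    """Decode the 'encoded-text' portion of a "B"-encoded 'encoded-word'.

    validate=True makes base64 reject characters outside its alphabet.
    """
    return base64.b64decode(encoded_text, validate=True).decode(charset)


word = b_encode_word("J\xf8rn", "iso-8859-1")
```

Because base64 output is always padded to a multiple of four characters,
the 'encoded-text' produced this way is self-contained.
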
          309 4.2. The "Q" encoding
          310 
          311    The "Q" encoding is similar to the "Quoted-Printable" content-
          312    transfer-encoding defined in RFC 2045.  It is designed to allow text
          313    containing mostly ASCII characters to be decipherable on an ASCII
          314    terminal without decoding.
          315 
          316    (1) Any 8-bit value may be represented by a "=" followed by two
          317        hexadecimal digits.  For example, if the character set in use
          318        were ISO-8859-1, the "=" character would thus be encoded as
          319        "=3D", and a SPACE by "=20".  (Upper case should be used for
          320        hexadecimal digits "A" through "F".)
          321 
          322    (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
          323        represented as "_" (underscore, ASCII 95.).  (This character may
          324        not pass through some internetwork mail gateways, but its use
          325        will greatly enhance readability of "Q" encoded data with mail
          326        readers that do not support this encoding.)  Note that the "_"
          327        always represents hexadecimal 20, even if the SPACE character
          328        occupies a different code position in the character set in use.
          329 
          330    (3) 8-bit values which correspond to printable ASCII characters other
          331        than "=", "?", and "_" (underscore), MAY be represented as those
          332        characters.  (But see section 5 for restrictions.)  In
          333        particular, SPACE and TAB MUST NOT be represented as themselves
          334        within encoded words.
          335 
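Rules (1) to (3) can be sketched in Python as follows.  The names
q_encode and q_decode are hypothetical.  The encoder works on octets
that have already been converted to the target character set, and it
applies only the general rules above, not the additional restrictions
of section 5.

```python
import string


def q_encode(octets):
    """Q-encode a bytes object following rules (1)-(3)."""
    out = []
    for value in octets:
        ch = chr(value)
        if value == 0x20:
            out.append("_")                 # rule (2)
        elif 0x21 <= value <= 0x7E and ch not in "=?_":
            out.append(ch)                  # rule (3)
        else:
            out.append("=%02X" % value)     # rule (1), upper-case hex
    return "".join(out)


def q_decode(encoded_text):
    """Reverse q_encode; raise ValueError on a malformed "=" escape."""
    out = bytearray()
    i = 0
    while i < len(encoded_text):
        ch = encoded_text[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            pair = encoded_text[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("'=' must be followed by two hex digits")
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

For example, q_encode("Keld J\xf8rn Simonsen".encode("iso-8859-1"))
gives "Keld_J=F8rn_Simonsen".
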
          343 5. Use of encoded-words in message headers
          344 
          345    An 'encoded-word' may appear in a message header or body part header
          346    according to the following rules:
          347 
          348 (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
          349     in any Subject or Comments header field, any extension message
          350     header field, or any MIME body part field for which the field body
          351     is defined as '*text'.  An 'encoded-word' may also appear in any
          352     user-defined ("X-") message or body part header field.
          353 
          354     Ordinary ASCII text and 'encoded-word's may appear together in the
          355     same header field.  However, an 'encoded-word' that appears in a
          356     header field defined as '*text' MUST be separated from any adjacent
          357     'encoded-word' or 'text' by 'linear-white-space'.
          358 
          359 (2) An 'encoded-word' may appear within a 'comment' delimited by "(" and
          360     ")", i.e., wherever a 'ctext' is allowed.  More precisely, the RFC
          361     822 ABNF definition for 'comment' is amended as follows:
          362 
          363     comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"
          364 
          365     A "Q"-encoded 'encoded-word' which appears in a 'comment' MUST NOT
          366     contain the characters "(", ")" or "\".  An
          367     'encoded-word' that appears in a 'comment' MUST be separated from
          368     any adjacent 'encoded-word' or 'ctext' by 'linear-white-space'.
          369 
          370     It is important to note that 'comment's are only recognized inside
          371     "structured" field bodies.  In fields whose bodies are defined as
          372     '*text', "(" and ")" are treated as ordinary characters rather than
          373     comment delimiters, and rule (1) of this section applies.  (See RFC
          374     822, sections 3.1.2 and 3.1.3)
          375 
          376 (3) An 'encoded-word' may replace a 'word' entity within a 'phrase',
          377     for example, one that precedes an address in a From, To, or Cc
          378     header.  The ABNF definition for 'phrase' from RFC 822 thus becomes:
          379 
          380     phrase = 1*( encoded-word / word )
          381 
          382     In this case the set of characters that may be used in a "Q"-encoded
          383     'encoded-word' is restricted to: <upper and lower case ASCII
          384     letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
          385     (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
          386     'phrase' MUST be separated from any adjacent 'word', 'text' or
          387     'special' by 'linear-white-space'.
          388 
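The three contexts above differ only in which characters a "Q"-encoded
'encoded-word' may carry as themselves.  The following Python sketch
makes that explicit.  The names SAFE and q_encode and the context labels
"text", "comment" and "phrase" are hypothetical.  For comments the
sketch conservatively escapes both the backslash and the double quote in
addition to the parentheses; escaping more characters than required is
always permitted by the "Q" encoding.

```python
import string

PRINTABLE = {chr(c) for c in range(0x21, 0x7F)}

# Characters that may appear unescaped, per context.
TEXT_SAFE = PRINTABLE - set("=?_")
COMMENT_SAFE = TEXT_SAFE - set('()\\"')   # conservative assumption
PHRASE_SAFE = set(string.ascii_letters + string.digits + "!*+-/")

SAFE = {"text": TEXT_SAFE, "comment": COMMENT_SAFE, "phrase": PHRASE_SAFE}


def q_encode(octets, context="phrase"):
    """Q-encode octets using the safe set of the given context."""
    safe = SAFE[context]
    out = []
    for value in octets:
        ch = chr(value)
        if value == 0x20:
            out.append("_")
        elif ch in safe:
            out.append(ch)
        else:
            out.append("=%02X" % value)
    return "".join(out)
```

In the 'phrase' set, "=" and "_" occur in the output only as the escape
introducer and the encoded SPACE, never as literal characters.
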
          399    These are the ONLY locations where an 'encoded-word' may appear.  In
          400    particular:
          401 
          402    + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
          403 
          404    + An 'encoded-word' MUST NOT appear within a 'quoted-string'.
          405 
          406    + An 'encoded-word' MUST NOT be used in a Received header field.
          407 
          408    + An 'encoded-word' MUST NOT be used in a parameter of a MIME
          409      Content-Type or Content-Disposition field, or in any structured
          410      field body except within a 'comment' or 'phrase'.
          411 
          412    The 'encoded-text' in an 'encoded-word' must be self-contained;
          413    'encoded-text' MUST NOT be continued from one 'encoded-word' to
          414    another.  This implies that the 'encoded-text' portion of a "B"
          415    'encoded-word' will be a multiple of 4 characters long; for a "Q"
          416    'encoded-word', any "=" character that appears in the 'encoded-text'
          417    portion will be followed by two hexadecimal characters.
          418 
          419    Each 'encoded-word' MUST encode an integral number of octets.  The
          420    'encoded-text' in each 'encoded-word' must be well-formed according
          421    to the encoding specified; the 'encoded-text' may not be continued in
          422    the next 'encoded-word'.  (For example, "=?charset?Q?=?=
          423    =?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
          424    must follow the "=" in the same 'encoded-word'.)
          425 
          426    Each 'encoded-word' MUST represent an integral number of characters.
          427    A multi-octet character may not be split across adjacent 'encoded-
          428    word's.
          429 
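The requirements above (self-contained 'encoded-text', an integral
number of octets, and an integral number of characters per
'encoded-word') can be met by splitting the text on character
boundaries before encoding.  The following Python sketch does this for
the "B" encoding.  The name split_b_encoded is hypothetical, and the
sketch assumes a stateless charset such as UTF-8 or ISO-8859-1 in which
each character can be encoded on its own.

```python
import base64


def split_b_encoded(text, charset="utf-8", max_len=75):
    """Return a list of "B" encoded-words, none longer than max_len,
    without splitting any character across two encoded-words."""
    overhead = len("=?%s?B??=" % charset)
    # base64 turns every 3 octets into 4 characters.
    max_octets = ((max_len - overhead) // 4) * 3
    if max_octets < 1:
        raise ValueError("charset name leaves no room for data")

    chunks, current = [], b""
    for ch in text:
        octets = ch.encode(charset)
        if current and len(current) + len(octets) > max_octets:
            chunks.append(current)
            current = b""
        current += octets
    if current:
        chunks.append(current)

    return [
        "=?%s?B?%s?=" % (charset, base64.b64encode(c).decode("ascii"))
        for c in chunks
    ]


words = split_b_encoded("\u00e9" * 40)
folded = "\r\n ".join(words)  # adjacent words separated by CRLF SPACE
```

Each element of words can be decoded independently of its neighbours.
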
          430    Only printable and white space character data should be encoded using
          431    this scheme.  However, since these encoding schemes allow the
          432    encoding of arbitrary octet values, mail readers that implement this
          433    decoding should also ensure that display of the decoded data on the
          434    recipient's terminal will not cause unwanted side-effects.
          435 
          436    Use of these methods to encode non-textual data (e.g., pictures or
          437    sounds) is not defined by this memo.  Use of 'encoded-word's to
          438    represent strings of purely ASCII characters is allowed, but
          439    discouraged.  In rare cases it may be necessary to encode ordinary
          440    text that looks like an 'encoded-word'.
          441 
          455 6. Support of 'encoded-word's by mail readers
          456 
          457 6.1. Recognition of 'encoded-word's in message headers
          458 
          459    A mail reader must parse the message and body part headers according
          460    to the rules in RFC 822 to correctly recognize 'encoded-word's.
          461 
          462    'encoded-word's are to be recognized as follows:
          463 
          464    (1) Any message or body part header field defined as '*text', or any
          465        user-defined header field, should be parsed as follows: Beginning
          466        at the start of the field-body and immediately following each
          467        occurrence of 'linear-white-space', each sequence of up to 75
          468        printable characters (not containing any 'linear-white-space')
          469        should be examined to see if it is an 'encoded-word' according to
          470        the syntax rules in section 2.  Any other sequence of printable
          471        characters should be treated as ordinary ASCII text.
          472 
          473    (2) Any header field not defined as '*text' should be parsed
          474        according to the syntax rules for that header field.  However,
          475        any 'word' that appears within a 'phrase' should be treated as an
          476        'encoded-word' if it meets the syntax rules in section 2.
          477        Otherwise it should be treated as an ordinary 'word'.
          478 
          479    (3) Within a 'comment', any sequence of up to 75 printable characters
          480        (not containing 'linear-white-space'), that meets the syntax
          481        rules in section 2, should be treated as an 'encoded-word'.
          482        Otherwise it should be treated as normal comment text.
          483 
          484    (4) A MIME-Version header field is NOT required to be present for
          485        'encoded-word's to be interpreted according to this
          486        specification.  One reason for this is that the mail reader is
          487        not expected to parse the entire message header before displaying
          488        lines that may contain 'encoded-word's.
          489 
          490 6.2. Display of 'encoded-word's
          491 
          492    Any 'encoded-word's so recognized are decoded, and if possible, the
          493    resulting unencoded text is displayed in the original character set.
          494 
          495    NOTE: Decoding and display of encoded-words occurs *after* a
          496    structured field body is parsed into tokens.  It is therefore
          497    possible to hide 'special' characters in encoded-words which, when
          498    displayed, will be indistinguishable from 'special' characters in the
          499    surrounding text.  For this and other reasons, it is NOT generally
          500    possible to translate a message header containing 'encoded-word's to
          501    an unencoded form which can be parsed by an RFC 822 mail reader.
          502 
          511    When displaying a particular header field that contains multiple
          512    'encoded-word's, any 'linear-white-space' that separates a pair of
          513    adjacent 'encoded-word's is ignored.  (This is to allow the use of
          514    multiple 'encoded-word's to represent long strings of unencoded text,
          515    without having to separate 'encoded-word's where spaces occur in the
          516    unencoded text.)
          517 
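The white space rule above can be sketched in Python for a field body
defined as '*text'.  The names EW, decode_word and decode_text_field are
hypothetical.  The sketch uses a simplified pattern for charset and
encoding, supports only "Q" and "B", and does not handle unknown
character sets or malformed data.

```python
import base64
import re

EW = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([!->@-~]+)\?=")


def decode_word(charset, encoding, payload):
    """Decode one encoded-word's payload to text."""
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # "_" stands for hexadecimal 20; then resolve "=XX" escapes.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def decode_text_field(body):
    """Decode encoded-words in a '*text' field body, dropping white space
    that separates two adjacent encoded-words."""
    out = []
    pending_ws = ""
    prev_encoded = False
    for tok in re.split(r"([ \t\r\n]+)", body):
        if tok == "":
            continue
        if tok.isspace():
            pending_ws = tok
            continue
        match = EW.fullmatch(tok) if len(tok) <= 75 else None
        if match:
            if not prev_encoded:
                out.append(pending_ws)
            out.append(decode_word(*match.groups()))
        else:
            out.append(pending_ws)
            out.append(tok)
        prev_encoded = match is not None
        pending_ws = ""
    out.append(pending_ws)
    return "".join(out)
```

With this sketch, "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=" is displayed
as "ab", whereas "=?ISO-8859-1?Q?a?= b" is displayed as "a b".
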
          518    In the event other encodings are defined in the future, and the mail
          519    reader does not support the encoding used, it may either (a) display
          520    the 'encoded-word' as ordinary text, or (b) substitute an appropriate
          521    message indicating that the text could not be decoded.
          522 
          523    If the mail reader does not support the character set used, it may
          524    (a) display the 'encoded-word' as ordinary text (i.e., as it appears
          525    in the header), (b) make a "best effort" to display using such
          526    characters as are available, or (c) substitute an appropriate message
          527    indicating that the decoded text could not be displayed.
          528 
          529    If the character set being used employs code-switching techniques,
          530    display of the encoded text implicitly begins in "ASCII mode".  In
          531    addition, the mail reader must ensure that the output device is once
          532    again in "ASCII mode" after the 'encoded-word' is displayed.
          533 
          534 6.3. Mail reader handling of incorrectly formed 'encoded-word's
          535 
          536    It is possible that an 'encoded-word' that is legal according to the
          537    syntax defined in section 2 is incorrectly formed according to the
          538    rules for the encoding being used.  For example:
          539 
          540    (1) An 'encoded-word' which contains characters which are not legal
          541        for a particular encoding (for example, a "-" in the "B"
          542        encoding, or a SPACE or HTAB in either the "B" or "Q" encoding),
          543        is incorrectly formed.
          544 
          545    (2) Any 'encoded-word' which encodes a non-integral number of
          546        characters or octets is incorrectly formed.
          547 
          548    A mail reader need not attempt to display the text associated with an
          549    'encoded-word' that is incorrectly formed.  However, a mail reader
          550    MUST NOT prevent the display or handling of a message because an
          551    'encoded-word' is incorrectly formed.
          552 
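One way to honour the last requirement is to fall back to the raw
'encoded-word' whenever decoding fails.  The following Python sketch
shows that policy.  The names EW and decode_or_keep are hypothetical,
and displaying the word "as is" is only one of the permitted choices.

```python
import base64
import re

EW = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([!->@-~]+)\?=")


def decode_or_keep(word):
    """Decode an encoded-word; return it unchanged if that is impossible."""
    match = EW.fullmatch(word)
    if not match:
        return word
    charset, encoding, payload = match.groups()
    try:
        if encoding.upper() == "B":
            # validate=True rejects characters such as "-".
            raw = base64.b64decode(payload, validate=True)
        elif encoding.upper() == "Q":
            if re.search(r"=(?![0-9A-Fa-f]{2})", payload):
                raise ValueError("'=' not followed by two hex digits")
            raw = re.sub(
                rb"=([0-9A-Fa-f]{2})",
                lambda m: bytes([int(m.group(1), 16)]),
                payload.replace("_", " ").encode("ascii"),
            )
        else:
            return word  # unknown encoding: show as ordinary text
        return raw.decode(charset)
    except (ValueError, LookupError):
        # ValueError covers bad base64, bad escapes and octet sequences
        # that are invalid in the charset; LookupError covers an unknown
        # charset name.
        return word
```

The message itself is never rejected; at worst the reader shows the
undecoded word.
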
          553 7. Conformance
          554 
          555    A mail composing program claiming compliance with this specification
          556    MUST ensure that any string of non-white-space printable ASCII
          557    characters within a '*text' or '*ctext' that begins with "=?" and
          558    ends with "?=" be a valid 'encoded-word'.  ("begins" means: at the
          567    start of the field-body, immediately following 'linear-white-space',
          568    or immediately following a "(" for an 'encoded-word' within '*ctext';
          569    "ends" means: at the end of the field-body, immediately preceding
          570    'linear-white-space', or immediately preceding a ")" for an
          571    'encoded-word' within '*ctext'.)  In addition, any 'word' within a
          572    'phrase' that begins with "=?" and ends with "?=" must be a valid
          573    'encoded-word'.
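
   A composing program could check the first requirement above with
   something like the following sketch.  It covers only a '*text' field
   body and only the syntax of an 'encoded-word'; it does not check that
   the 'encoded-text' is well formed for its encoding.  The syntax summary
   in the comments (75-character limit, 'token' excluding 'especials') is
   recalled from section 2 and should be verified against it.  The
   function names are hypothetical.

```python
import re
from typing import List

# 'especials' as listed in section 2 (assumption: recalled, verify there).
ESPECIALS = set('()<>@,;:\\"/[]?.=')
# A 'token' is made of printable, non-SPACE ASCII minus the especials.
TOKEN_CHARS = {chr(c) for c in range(0x21, 0x7F)} - ESPECIALS


def is_token(s: str) -> bool:
    return bool(s) and all(ch in TOKEN_CHARS for ch in s)


def is_valid_encoded_word(word: str) -> bool:
    """Syntax check only: =?charset?encoding?encoded-text?= , at most 75 chars."""
    if len(word) > 75:
        return False
    if not (word.startswith("=?") and word.endswith("?=")):
        return False
    parts = word[2:-2].split("?")
    if len(parts) != 3:
        return False
    charset, encoding, text = parts
    printable = all(0x21 <= ord(ch) <= 0x7E for ch in text)
    return is_token(charset) and is_token(encoding) and bool(text) and printable


def invalid_candidates(text_body: str) -> List[str]:
    """Return strings in a '*text' body that look like, but are not, encoded-words."""
    bad = []
    for piece in re.split(r"[ \t\r\n]+", text_body):
        if piece.startswith("=?") and piece.endswith("?="):
            if not is_valid_encoded_word(piece):
                bad.append(piece)
    return bad


print(invalid_candidates("=?ISO-8859-1?Q?Andr=E9?= Pirard"))
print(invalid_candidates("see =?bogus?= here"))
```

   A composer that finds any such string would need to encode it (or
   otherwise alter it) before sending the header.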
          574 
          575    A mail reading program claiming compliance with this specification
          576    must be able to distinguish 'encoded-word's from 'text', 'ctext', or
          577    'word's, according to the rules in section 6, anytime they appear in
          578    appropriate places in message headers.  It must support both the "B"
          579    and "Q" encodings for any character set which it supports.  The
          580    program must be able to display the unencoded text if the character
          581    set is "US-ASCII".  For the ISO-8859-* character sets, the mail
          582    reading program must at least be able to display the characters which
          583    are also in the ASCII set.
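
   The minimum display requirement for ISO-8859-* character sets might be
   met as in the sketch below.  It assumes the octets have already been
   recovered from the "B" or "Q" encoding, and it uses the U+FFFD
   replacement character for octets outside ASCII, which is one possible
   choice and not something this specification mandates.  The name render
   is hypothetical.

```python
from typing import Optional


def render(raw: bytes, charset: str) -> Optional[str]:
    """Decode octets for display; None means "cannot display this charset"."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None
    except LookupError:
        pass  # charset unknown to this program; try the fallback below
    if charset.upper().startswith("ISO-8859-"):
        # Every ISO-8859-* set shares its lower half with ASCII, so the
        # ASCII characters can still be shown; the rest are replaced.
        return raw.decode("ascii", errors="replace")
    return None


print(render(b"Keith Moore", "US-ASCII"))
print(render(b"Andr\xe9", "ISO-8859-99"))  # a part number this program lacks
```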
          584 
          585 8. Examples
          586 
          587    The following are examples of message headers containing
          588    'encoded-word's:
          589 
          590    From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
          591    To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
          592    CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
          593    Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
          594     =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
          595 
          596       Note: In the first 'encoded-word' of the Subject field above, the
          597       last "=" at the end of the 'encoded-text' is necessary because each
          598       'encoded-word' must be self-contained (the "=" character completes a
          599       group of 4 base64 characters representing 2 octets).  An additional
          600       octet could have been encoded in the first 'encoded-word' (so that
          601       the encoded-word would contain an exact multiple of 3 encoded
          602       octets), except that the second 'encoded-word' uses a different
          603       'charset' than the first one.
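
   The arithmetic in the note above can be checked with a short sketch.
   The helper name decode_b is hypothetical; the two 'encoded-text'
   strings are copied from the Subject field in the example.

```python
import base64


def decode_b(encoded_text: str, charset: str) -> str:
    """Decode one self-contained "B" encoded-text."""
    if len(encoded_text) % 4 != 0:
        raise ValueError("not a whole number of 4-character base64 groups")
    return base64.b64decode(encoded_text, validate=True).decode(charset)


# Each 'encoded-word' is decoded on its own, with its own charset.
first = decode_b("SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=", "ISO-8859-1")
second = decode_b("dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==", "ISO-8859-2")

print(repr(first))   # 'If you can read this yo'
print(repr(second))  # 'u understand the example.'
print(first + second)
```

   The first string decodes to 23 octets (seven full groups of 3 plus 2
   left over), which is why its final base64 group needs one "=" of
   padding.  Dropping that "=" leaves 31 characters, and decode_b rejects
   the result.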
          604 
          605    From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
          606    To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
          607    Subject: Time for ISO 10646?
          608 
          609    To: Dave Crocker <dcrocker@mordor.stanford.edu>
          610    Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
          611    From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
          612    Subject: Re: RFC-HDR care and feeding
          613 
          623    From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
          624          (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
          625    To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
          626       <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
          627    Subject: Test of new header generator
          628    MIME-Version: 1.0
          629    Content-type: text/plain; charset=ISO-8859-1
          630 
          631    The following examples illustrate how 'encoded-word's which appear
          632    in a structured field body are displayed.  The rules are slightly
          633    different for fields defined as '*text' because "(" and ")" are not
          634    recognized as 'comment' delimiters.  [Section 5, paragraph (1)].
          635 
          636    In each of the following examples, if the same sequence were to occur
          637    in a '*text' field, the sequence would NOT be treated as an
          638    'encoded-word', and the "displayed as" form would be identical to the
          639    "encoded form".  This is because each of the 'encoded-word's in the
          640    following examples is adjacent to a "(" or ")" character.
          641 
          642    encoded form                                displayed as
          643    ---------------------------------------------------------------------
          644    (=?ISO-8859-1?Q?a?=)                        (a)
          645 
          646    (=?ISO-8859-1?Q?a?= b)                      (a b)
          647 
          648            Within a 'comment', white space MUST appear between an
          649            'encoded-word' and surrounding text.  [Section 5,
          650            paragraph (2)].  However, white space is not needed between
          651            the initial "(" that begins the 'comment', and the
          652            'encoded-word'.
          653 
          654 
          655    (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)     (ab)
          656 
          657            White space between adjacent 'encoded-word's is not
          658            displayed.
          659 
          660    (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=)    (ab)
          661 
          662            Even multiple SPACEs between 'encoded-word's are ignored
          663            for the purpose of display.
          664 
          665    (=?ISO-8859-1?Q?a?=                         (ab)
          666        =?ISO-8859-1?Q?b?=)
          667 
          668            Any amount of linear-white-space between 'encoded-word's,
          669            even if it includes a CRLF followed by one or more SPACEs,
          670            is ignored for the purposes of display.
          671 
          679    (=?ISO-8859-1?Q?a_b?=)                      (a b)
          680 
          681            In order to cause a SPACE to be displayed within a portion
          682            of encoded text, the SPACE MUST be encoded as part of the
          683            'encoded-word'.
          684 
          685    (=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=)    (a b)
          686 
          687            In order to cause a SPACE to be displayed between two strings
          688            of encoded text, the SPACE MAY be encoded as part of one of
          689            the 'encoded-word's.
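
   The display rules shown in the table above can be reproduced with a
   small sketch.  It is an illustration under simplifying assumptions: it
   handles the text of a 'comment' only, it does not verify that each
   'encoded-word' is properly delimited by white space or parentheses, and
   it does no error handling.  The names display_comment and decode_word
   are hypothetical.

```python
import base64
import re

# One 'encoded-word', without capture groups, for building larger patterns.
W = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]+\?="
# White space (including CRLF plus SPACEs) separating two encoded-words.
GAP = re.compile(rf"({W})\s+(?={W})")
# The same shape with charset, encoding and encoded-text captured.
PARTS = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]+)\?=")


def decode_word(m: "re.Match") -> str:
    charset, encoding, text = m.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text, validate=True)
    else:
        # "Q": "_" stands for SPACE, "=XX" for the octet with hex value XX.
        s = text.replace("_", " ")
        s = re.sub(r"=([0-9A-Fa-f]{2})", lambda h: chr(int(h.group(1), 16)), s)
        raw = s.encode("latin-1")
    return raw.decode(charset)


def display_comment(comment: str) -> str:
    # Step 1: white space between adjacent encoded-words is not displayed.
    joined = GAP.sub(r"\1", comment)
    # Step 2: each encoded-word is replaced by its decoded text.
    return PARTS.sub(decode_word, joined)


for encoded in [
    "(=?ISO-8859-1?Q?a?=)",
    "(=?ISO-8859-1?Q?a?= b)",
    "(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)",
    "(=?ISO-8859-1?Q?a?=\r\n    =?ISO-8859-1?Q?b?=)",
    "(=?ISO-8859-1?Q?a_b?=)",
    "(=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=)",
]:
    print(repr(display_comment(encoded)))
```

   White space between an 'encoded-word' and ordinary text, as in the
   second input, is left alone; only the gap between two adjacent
   'encoded-word's is removed.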
          690 
          691 9. References
          692 
          693    [RFC 822] Crocker, D., "Standard for the Format of ARPA Internet Text
          694        Messages", STD 11, RFC 822, UDEL, August 1982.
          695 
          696    [RFC 2049] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
          697        Extensions (MIME) Part Five: Conformance Criteria and Examples",
          698        RFC 2049, November 1996.
          699 
          700    [RFC 2045] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
          701        Extensions (MIME) Part One: Format of Internet Message Bodies",
          702        RFC 2045, November 1996.
          703 
          704    [RFC 2046] Borenstein N., and N. Freed, "Multipurpose Internet Mail
          705        Extensions (MIME) Part Two: Media Types", RFC 2046,
          706        November 1996.
          707 
          708    [RFC 2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose
          709        Internet Mail Extensions (MIME) Part Four: Registration
          710        Procedures", RFC 2048, November 1996.
          711 
          735 10. Security Considerations
          736 
          737    Security issues are not discussed in this memo.
          738 
          739 11. Acknowledgements
          740 
          741    The author wishes to thank Nathaniel Borenstein, Issac Chan, Lutz
          742    Donnerhacke, Paul Eggert, Ned Freed, Andreas M. Kirchwitz, Olle
          743    Jarnefors, Mike Rosin, Yutaka Sato, Bart Schaefer, and Kazuhiko
          744    Yamamoto, for their helpful advice, insightful comments, and
          745    illuminating questions in response to earlier versions of this
          746    specification.
          747 
          748 12. Author's Address
          749 
          750    Keith Moore
          751    University of Tennessee
          752    107 Ayres Hall
          753    Knoxville TN 37996-1301
          754 
          755    EMail: moore@cs.utk.edu
          756 
          791 Appendix - changes since RFC 1522 (in no particular order)
          792 
          793    + explicitly state that the MIME-Version is not required to use
          794      'encoded-word's.
          795 
          796    + add explicit note that SPACEs and TABs are not allowed within
          797      'encoded-word's, explaining that an 'encoded-word' must look like an
          798      'atom' to an RFC822 parser.
          799 
          800    + add examples from Olle Jarnefors (thanks!) which illustrate how
          801      encoded-words with adjacent linear-white-space are displayed.
          802 
          803    + explicitly list terms defined in RFC822 and referenced in this memo
          804 
          805    + fix transcription typos that caused one or two lines and a couple of
          806      characters to disappear in the resulting text, due to nroff quirks.
          807 
          808    + clarify that encoded-words are allowed in '*text' fields in both
          809      RFC822 headers and MIME body part headers, but NOT as parameter
          810      values.
          811 
          812    + clarify the requirement to switch back to ASCII within the encoded
          813      portion of an 'encoded-word', for any charset that uses code switching
          814      sequences.
          815 
          816    + add a note about 'encoded-word's being delimited by "(" and ")"
          817      within a comment, but not in a *text (how bizarre!).
          818 
          819    + fix the Andre Pirard example to get rid of the trailing "_" after
          820      the =E9.  (no longer needed post-1342).
          821 
          822    + clarification: an 'encoded-word' may appear immediately following
          823      the initial "(" or immediately before the final ")" that delimits a
          824      comment, not just adjacent to "(" and ")" *within* *ctext.
          825 
          826    + add a note to explain that a "B" 'encoded-word' will always have a
          827      multiple of 4 characters in the 'encoded-text' portion.
          828 
          829    + add note about the "=" in the examples
          830 
          831    + note that processing of 'encoded-word's occurs *after* parsing, and
          832      some of the implications thereof.
          833 
          834    + explicitly state that you can't expect to translate between
          835      1522 and either vanilla 822 or so-called "8-bit headers".
          836 
          837    + explicitly state that 'encoded-word's are not valid within a
          838      'quoted-string'.
          839 