rfc2231.txt - rohrpost - A commandline mail client to change the world as we see it.
       ---
       rfc2231.txt (19280B)
       ---
            1 
            2 
            3 
            4 
            5 
            6 
            7 Network Working Group                                         N. Freed
            8 Request for Comments: 2231                                    Innosoft
            9 Updates: 2045, 2047, 2183                                     K. Moore
           10 Obsoletes: 2184                                University of Tennessee
           11 Category: Standards Track                                November 1997
           12 
           13 
           14            MIME Parameter Value and Encoded Word Extensions:
           15               Character Sets, Languages, and Continuations
           16 
           17 
           18 Status of this Memo
           19 
           20    This document specifies an Internet standards track protocol for the
           21    Internet community, and requests discussion and suggestions for
           22    improvements.  Please refer to the current edition of the "Internet
           23    Official Protocol Standards" (STD 1) for the standardization state
           24    and status of this protocol.  Distribution of this memo is unlimited.
           25 
           26 Copyright Notice
           27 
           28    Copyright (C) The Internet Society (1997).  All Rights Reserved.
           29 
           30 1.  Abstract
           31 
           32    This memo defines extensions to the RFC 2045 media type and RFC 2183
           33    disposition parameter value mechanisms to provide
           34 
           35     (1)   a means to specify parameter values in character sets
           36           other than US-ASCII,
           37 
           38     (2)   to specify the language to be used should the value be
           39           displayed, and
           40 
           41     (3)   a continuation mechanism for long parameter values to
           42           avoid problems with header line wrapping.
           43 
           44    This memo also defines an extension to the encoded words defined in
           45    RFC 2047 to allow the specification of the language to be used for
           46    display as well as the character set.
           47 
           48 2.  Introduction
           49 
            50    The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
            51    RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message format
            52    that allows for:
           53 
           54 
           55 
           56 
           57 
           58 Freed & Moore               Standards Track                     [Page 1]
           59 
           60 RFC 2231         MIME Value and Encoded Word Extensions    November 1997
           61 
           62 
           63     (1)   textual message bodies in character sets other than
           64           US-ASCII,
           65 
           66     (2)   non-textual message bodies,
           67 
           68     (3)   multi-part message bodies, and
           69 
           70     (4)   textual header information in character sets other than
           71           US-ASCII.
           72 
           73    MIME is now widely deployed and is used by a variety of Internet
           74    protocols, including, of course, Internet email.  However, MIME's
           75    success has resulted in the need for additional mechanisms that were
           76    not provided in the original protocol specification.
           77 
            78    In particular, existing MIME mechanisms provide for named media type
            79    (content-type field) parameters as well as named disposition
            80    (content-disposition field) parameters.  A MIME media type may
            81    specify any number of parameters associated with all of its subtypes,
            82    and any specific subtype may specify additional parameters for its
            83    own use. A MIME disposition value may specify any number of
            84    associated parameters, the most important of which is probably the
            85    attachment disposition's filename parameter.
           86 
           87    These parameter names and values end up appearing in the content-type
           88    and content-disposition header fields in Internet email.  This
           89    inherently imposes three crucial limitations:
           90 
           91     (1)   Lines in Internet email header fields are folded
           92           according to RFC 822 folding rules.  This makes long
           93           parameter values problematic.
           94 
           95     (2)   MIME headers, like the RFC 822 headers they often
           96           appear in, are limited to 7bit US-ASCII, and the
           97           encoded-word mechanisms of RFC 2047 are not available
           98           to parameter values.  This makes it impossible to have
           99           parameter values in character sets other than US-ASCII
          100           without specifying some sort of private per-parameter
          101           encoding.
          102 
          103     (3)   It has recently become clear that character set
          104           information is not sufficient to properly display some
          105           sorts of information -- language information is also
          106           needed [RFC-2130].  For example, support for
           107           handicapped users may require reading text strings
           119           aloud. The language the text is written in is needed
          120           for this to be done correctly.  Some parameter values
          121           may need to be displayed, hence there is a need to
          122           allow for the inclusion of language information.
          123 
          124    The last problem on this list is also an issue for the encoded words
          125    defined by RFC 2047, as encoded words are intended primarily for
          126    display purposes.
          127 
          128    This document defines extensions that address all of these
          129    limitations. All of these extensions are implemented in a fashion
          130    that is completely compatible at a syntactic level with existing MIME
          131    implementations. In addition, the extensions are designed to have as
          132    little impact as possible on existing uses of MIME.
          133 
           134    IMPORTANT NOTE:  These mechanisms end up being somewhat awkward when
           135    they are actually used. As such, these mechanisms should not be used
          136    lightly; they should be reserved for situations where a real need for
          137    them exists.
          138 
          139 2.1.  Requirements notation
          140 
          141    This document occasionally uses terms that appear in capital letters.
          142    When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
          143    appear capitalized, they are being used to indicate particular
          144    requirements of this specification. A discussion of the meanings of
           145    these terms appears in [RFC-2119].
          146 
          147 3.  Parameter Value Continuations
          148 
          149    Long MIME media type or disposition parameter values do not interact
          150    well with header line wrapping conventions.  In particular, proper
          151    header line wrapping depends on there being places where linear
          152    whitespace (LWSP) is allowed, which may or may not be present in a
          153    parameter value, and even if present may not be recognizable as such
          154    since specific knowledge of parameter value syntax may not be
          155    available to the agent doing the line wrapping. The result is that
          156    long parameter values may end up getting truncated or otherwise
          157    damaged by incorrect line wrapping implementations.
          158 
          159    A mechanism is therefore needed to break up parameter values into
          160    smaller units that are amenable to line wrapping. Any such mechanism
          161    MUST be compatible with existing MIME processors. This means that
          162 
          163     (1)   the mechanism MUST NOT change the syntax of MIME media
          164           type and disposition lines, and
          165 
          175     (2)   the mechanism MUST NOT depend on parameter ordering
          176           since MIME states that parameters are not order
          177           sensitive.  Note that while MIME does prohibit
          178           modification of MIME headers during transport, it is
          179           still possible that parameters will be reordered when
          180           user agent level processing is done.
          181 
          182    The obvious solution, then, is to use multiple parameters to contain
          183    a single parameter value and to use some kind of distinguished name
          184    to indicate when this is being done.  And this obvious solution is
          185    exactly what is specified here: The asterisk character ("*") followed
          186    by a decimal count is employed to indicate that multiple parameters
          187    are being used to encapsulate a single parameter value.  The count
          188    starts at 0 and increments by 1 for each subsequent section of the
          189    parameter value.  Decimal values are used and neither leading zeroes
          190    nor gaps in the sequence are allowed.
          191 
          192    The original parameter value is recovered by concatenating the
          193    various sections of the parameter, in order.  For example, the
          194    content-type field
          195 
          196         Content-Type: message/external-body; access-type=URL;
          197          URL*0="ftp://";
          198          URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
          199 
          200    is semantically identical to
          201 
          202         Content-Type: message/external-body; access-type=URL;
          203           URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
          204 
          205    Note that quotes around parameter values are part of the value
          206    syntax; they are NOT part of the value itself.  Furthermore, it is
          207    explicitly permitted to have a mixture of quoted and unquoted
          208    continuation fields.
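
The reassembly rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name join_continuations and the input shape (a dict of already-unquoted parameter values keyed by raw parameter name) are assumptions made for the example.

```python
import re

# Matches a continuation name such as "URL*0" or "URL*12".
# Leading zeroes ("URL*01") are deliberately not accepted.
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<index>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge "name*N" sections of a {name: value} dict into single values.

    Values are assumed to be already unquoted. The order of the input
    does not matter, because sections are joined by their index.
    """
    plain = {}
    sections = {}
    for key, value in params.items():
        match = _SECTION.match(key)
        if match is None:
            plain[key.lower()] = value
        else:
            name = match.group("name").lower()
            sections.setdefault(name, {})[int(match.group("index"))] = value

    for name, parts in sections.items():
        # The count starts at 0 and must not contain gaps.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        plain[name] = "".join(parts[i] for i in range(len(parts)))
    return plain


params = {
    "URL*1": "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar",
    "access-type": "URL",
    "URL*0": "ftp://",
}
merged = join_continuations(params)
print(merged["url"])
# prints ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar
```

The sections are given out of order on purpose, since the mechanism must not depend on parameter ordering. Names are lowercased because attribute matching is case-insensitive.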
          209 
          210 4.  Parameter Value Character Set and Language Information
          211 
          212    Some parameter values may need to be qualified with character set or
          213    language information.  It is clear that a distinguished parameter
          214    name is needed to identify when this information is present along
          215    with a specific syntax for the information in the value itself.  In
          216    addition, a lightweight encoding mechanism is needed to accommodate 8
          217    bit information in parameter values.
          218 
          231    Asterisks ("*") are reused to provide the indicator that language and
          232    character set information is present and encoding is being used. A
          233    single quote ("'") is used to delimit the character set and language
           234    information at the beginning of the parameter value. Percent signs
           235    ("%") are used as the encoding flag, as in URL hexadecimal escapes.
          236 
          237    Specifically, an asterisk at the end of a parameter name acts as an
          238    indicator that character set and language information may appear at
          239    the beginning of the parameter value. A single quote is used to
          240    separate the character set, language, and actual value information in
           241    the parameter value string, and a percent sign is used to flag
          242    octets encoded in hexadecimal.  For example:
          243 
          244         Content-Type: application/x-stuff;
          245          title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
          246 
          247    Note that it is perfectly permissible to leave either the character
          248    set or language field blank.  Note also that the single quote
          249    delimiters MUST be present even when one of the field values is
          250    omitted.  This is done when either character set, language, or both
          251    are not relevant to the parameter value at hand.  This MUST NOT be
          252    done in order to indicate a default character set or language --
          253    parameter field definitions MUST NOT assign a default character set
          254    or language.
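
As a rough sketch of how a receiver might decode such a value, the Python below splits on the two single quotes and then decodes the percent-encoded octets. The function name decode_extended_value is hypothetical, and the fallback to US-ASCII when the character set field is blank is a choice made only for this example, not something this memo specifies.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(raw):
    """Split charset'language'percent-encoded-text and decode the text.

    Returns (charset, language, text). Blank charset or language fields
    are returned as None. Decoding as US-ASCII when no charset is given
    is an assumption of this sketch.
    """
    # Both single quote delimiters must be present, even for blank fields.
    charset, language, encoded = raw.split("'", 2)
    data = unquote_to_bytes(encoded)
    text = data.decode(charset or "us-ascii")
    return (charset or None, language or None, text)


print(decode_extended_value("us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"))
# prints ('us-ascii', 'en-us', 'This is ***fun***')
```

A value with fewer than two single quotes makes the unpacking fail with ValueError, which matches the requirement that the delimiters always be present.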
          255 
          256 4.1.  Combining Character Set, Language, and Parameter Continuations
          257 
          258    Character set and language information may be combined with the
          259    parameter continuation mechanism. For example:
          260 
           261    Content-Type: application/x-stuff;
           262     title*0*=us-ascii'en'This%20is%20even%20more%20;
           263     title*1*=%2A%2A%2Afun%2A%2A%2A%20;
           264     title*2="isn't it!"
          265 
          266    Note that:
          267 
          268     (1)   Language and character set information only appear at
          269           the beginning of a given parameter value.
          270 
          271     (2)   Continuations do not provide a facility for using more
          272           than one character set or language in the same
          273           parameter value.
          274 
          275     (3)   A value presented using multiple continuations may
          276           contain a mixture of encoded and unencoded segments.
          277 
          287     (4)   The first segment of a continuation MUST be encoded if
          288           language and character set information are given.
          289 
          290     (5)   If the first segment of a continued parameter value is
          291           encoded the language and character set field delimiters
          292           MUST be present even when the fields are left blank.
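
The combined rules listed above might be implemented along the following lines. This is a sketch under stated assumptions: the function name decode_parameter is hypothetical, the input is a dict of already-unquoted values keyed by raw parameter name, and decoding as US-ASCII when no character set is given is a choice made for the example only.

```python
import re
from urllib.parse import unquote_to_bytes

# attribute, optional "*N" section, optional trailing "*" for an encoded segment
_NAME = re.compile(
    r"^(?P<name>[^*]+)(?:\*(?P<index>0|[1-9][0-9]*))?(?P<ext>\*)?$"
)


def decode_parameter(name, params):
    """Reassemble and decode one parameter from a {raw-name: value} dict.

    Returns (charset, language, text); charset and language are None
    when they are absent or left blank.
    """
    segments = {}
    for key, value in params.items():
        match = _NAME.match(key)
        if match is None or match.group("name").lower() != name.lower():
            continue
        index = int(match.group("index") or 0)
        segments[index] = (value, match.group("ext") is not None)

    if not segments:
        raise KeyError(name)
    if sorted(segments) != list(range(len(segments))):
        raise ValueError("missing or misnumbered segment")

    charset = language = ""
    first_value, first_encoded = segments[0]
    if first_encoded:
        # Charset and language appear only at the start of the first segment.
        charset, language, first_value = first_value.split("'", 2)
        segments[0] = (first_value, True)

    chunks = []
    for index in range(len(segments)):
        value, encoded = segments[index]
        # Only segments whose name ends in "*" carry percent-encoded octets.
        chunks.append(unquote_to_bytes(value) if encoded else value.encode("ascii"))
    text = b"".join(chunks).decode(charset or "us-ascii")
    return (charset or None, language or None, text)


params = {
    "title*0*": "us-ascii'en'This%20is%20even%20more%20",
    "title*1*": "%2A%2A%2Afun%2A%2A%2A%20",
    "title*2": "isn't it!",
}
print(decode_parameter("title", params))
# prints ('us-ascii', 'en', "This is even more ***fun*** isn't it!")
```

The octets of all segments are joined before decoding, so a multi-octet character split across two encoded segments would still decode correctly.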
          293 
          294 5.  Language specification in Encoded Words
          295 
          296    RFC 2047 provides support for non-US-ASCII character sets in RFC 822
          297    message header comments, phrases, and any unstructured text field.
          298    This is done by defining an encoded word construct which can appear
          299    in any of these places.  Given that these are fields intended for
          300    display, it is sometimes necessary to associate language information
          301    with encoded words as well as just the character set.  This
          302    specification extends the definition of an encoded word to allow the
          303    inclusion of such information.  This is simply done by suffixing the
          304    character set specification with an asterisk followed by the language
          305    tag.  For example:
          306 
          307           From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>
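
A parser for the extended encoded word could look roughly like the Python sketch below. The function names are hypothetical, and the "Q" and "B" decoding follows RFC 2047 in simplified form without validating the encoded text.

```python
import base64
import re

# charset, optional "*language", encoding ("Q" or "B"), encoded text
_ENCODED_WORD = re.compile(
    r"^=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[QqBb])\?(?P<text>[^?]*)\?=$"
)


def _decode_q(text):
    """Decode RFC 2047 "Q" text: "_" is a space, "=XX" is a hex octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


def decode_encoded_word(word):
    """Return (charset, language, text); language is None when absent."""
    match = _ENCODED_WORD.match(word)
    if match is None:
        raise ValueError("not an encoded word")
    raw = match.group("text")
    if match.group("encoding").upper() == "B":
        data = base64.b64decode(raw)
    else:
        data = _decode_q(raw)
    charset = match.group("charset")
    return (charset, match.group("language"), data.decode(charset))


print(decode_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?="))
# prints ('US-ASCII', 'EN', 'Keith Moore')
```

Because the language suffix is optional in the pattern, encoded words written without it are still accepted.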
          308 
          309 6.  IMAP4 Handling of Parameter Values
          310 
          311    IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
          312    when generating the BODY and BODYSTRUCTURE fetch attributes.
          313 
          314 7.  Modifications to MIME ABNF
          315 
          316    The ABNF for MIME parameter values given in RFC 2045 is:
          317 
          318    parameter := attribute "=" value
          319 
          320    attribute := token
          321                 ; Matching of attributes
          322                 ; is ALWAYS case-insensitive.
          323 
          324    This specification changes this ABNF to:
          325 
          326    parameter := regular-parameter / extended-parameter
          327 
          328    regular-parameter := regular-parameter-name "=" value
          329 
          330    regular-parameter-name := attribute [section]
          331 
          332    attribute := 1*attribute-char
          333 
          343    attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
          344                      "*", "'", "%", or tspecials>
          345 
          346    section := initial-section / other-sections
          347 
          348    initial-section := "*0"
          349 
           350    other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
           351                           "6" / "7" / "8" / "9") *DIGIT
          352 
           353    extended-parameter := (extended-initial-name "="
           354                           extended-initial-value) /
           355                          (extended-other-names "="
           356                           extended-other-values)
          357 
          358    extended-initial-name := attribute [initial-section] "*"
          359 
          360    extended-other-names := attribute other-sections "*"
          361 
          362    extended-initial-value := [charset] "'" [language] "'"
          363                              extended-other-values
          364 
          365    extended-other-values := *(ext-octet / attribute-char)
          366 
          367    ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
          368 
          369    charset := <registered character set name>
          370 
          371    language := <registered language tag [RFC-1766]>
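
The parameter name rules in this grammar can be approximated with a regular expression, as in the Python sketch below. The function name classify_parameter_name is hypothetical, and the character class spells out attribute-char using the tspecials of RFC 2045.

```python
import re

# attribute-char: printable US-ASCII except SPACE, "*", "'", "%" and
# the RFC 2045 tspecials ( ) < > @ , ; : \ " / [ ] ? =
_ATTR_CHAR = r"[!#$&+\-.0-9A-Z^_`a-z{|}~]"

_PARAM_NAME = re.compile(
    rf"^(?P<attribute>{_ATTR_CHAR}+)"
    r"(?:\*(?P<section>0|[1-9][0-9]*))?"  # "*0", or "*N" without leading zero
    r"(?P<extended>\*)?$"                 # trailing "*" marks an extended name
)


def classify_parameter_name(name):
    """Return (attribute, section, extended) for a raw parameter name.

    section is None when the name carries no section number; extended
    is True when the name ends in "*".
    """
    match = _PARAM_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid parameter name: {name!r}")
    section = match.group("section")
    return (
        match.group("attribute"),
        None if section is None else int(section),
        match.group("extended") is not None,
    )


for raw in ("title", "title*", "title*0*", "URL*1"):
    print(raw, classify_parameter_name(raw))
```

Names with a leading zero in the section number, such as "URL*01", are rejected, in line with the other-sections rule.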
          372 
          373    The ABNF given in RFC 2047 for encoded-words is:
          374 
          375    encoded-word := "=?" charset "?" encoding "?" encoded-text "?="
          376 
          377    This specification changes this ABNF to:
          378 
           379    encoded-word := "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
          380 
          381 8.  Character sets which allow specification of language
          382 
          383    In the future it is likely that some character sets will provide
          384    facilities for inline language labeling. Such facilities are
          385    inherently more flexible than those defined here as they allow for
          386    language switching in the middle of a string.
          387 
          399    If and when such facilities are developed they SHOULD be used in
          400    preference to the language labeling facilities specified here. Note
          401    that all the mechanisms defined here allow for the omission of
          402    language labels so as to be able to accommodate this possible future
          403    usage.
          404 
          405 9.  Security Considerations
          406 
          407    This RFC does not discuss security issues and is not believed to
          408    raise any security issues not already endemic in electronic mail and
          409    present in fully conforming implementations of MIME.
          410 
          411 10.  References
          412 
          413    [RFC-822]
          414         Crocker, D., "Standard for the Format of ARPA Internet
           415         Text Messages", STD 11, RFC 822, August 1982.
          416 
          417    [RFC-1766]
          418         Alvestrand, H., "Tags for the Identification of
          419         Languages", RFC 1766, March 1995.
          420 
          421    [RFC-2045]
           422         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
          423         Extensions (MIME) Part One: Format of Internet Message
          424         Bodies", RFC 2045, December 1996.
          425 
          426    [RFC-2046]
          427         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
          428         Extensions (MIME) Part Two: Media Types", RFC 2046,
          429         December 1996.
          430 
          431    [RFC-2047]
          432         Moore, K., "Multipurpose Internet Mail Extensions (MIME)
          433         Part Three: Representation of Non-ASCII Text in Internet
          434         Message Headers", RFC 2047, December 1996.
          435 
          436    [RFC-2048]
          437         Freed, N., Klensin, J. and J. Postel, "Multipurpose
          438         Internet Mail Extensions (MIME) Part Four: MIME
          439         Registration Procedures", RFC 2048, December 1996.
          440 
          441    [RFC-2049]
          442         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
          443         Extensions (MIME) Part Five: Conformance Criteria and
          444         Examples", RFC 2049, December 1996.
          445 
          455    [RFC-2060]
          456         Crispin, M., "Internet Message Access Protocol - Version
          457         4rev1", RFC 2060, December 1996.
          458 
          459    [RFC-2119]
          460         Bradner, S., "Key words for use in RFCs to Indicate
          461         Requirement Levels", RFC 2119, March 1997.
          462 
          463    [RFC-2130]
          464         Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
          465         Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
          466         IAB Character Set Workshop", RFC 2130, April 1997.
          467 
          468    [RFC-2183]
          469         Troost, R., Dorner, S. and K. Moore, "Communicating
          470         Presentation Information in Internet Messages:  The
          471         Content-Disposition Header", RFC 2183, August 1997.
          472 
          473 11.  Authors' Addresses
          474 
          475    Ned Freed
          476    Innosoft International, Inc.
          477    1050 Lakes Drive
          478    West Covina, CA 91790
          479    USA
          480 
          481    Phone: +1 626 919 3600
          482    Fax:   +1 626 919 3614
          483    EMail: ned.freed@innosoft.com
          484 
          485 
          486    Keith Moore
          487    Computer Science Dept.
          488    University of Tennessee
          489    107 Ayres Hall
          490    Knoxville, TN 37996-1301
          491    USA
          492 
          493    EMail: moore@cs.utk.edu
          494 
          511 12.  Full Copyright Statement
          512 
          513    Copyright (C) The Internet Society (1997).  All Rights Reserved.
          514 
          515    This document and translations of it may be copied and furnished to
          516    others, and derivative works that comment on or otherwise explain it
          517    or assist in its implementation may be prepared, copied, published
          518    and distributed, in whole or in part, without restriction of any
          519    kind, provided that the above copyright notice and this paragraph are
          520    included on all such copies and derivative works.  However, this
          521    document itself may not be modified in any way, such as by removing
          522    the copyright notice or references to the Internet Society or other
          523    Internet organizations, except as needed for the purpose of
          524    developing Internet standards in which case the procedures for
          525    copyrights defined in the Internet Standards process must be
          526    followed, or as required to translate it into languages other than
          527    English.
          528 
          529    The limited permissions granted above are perpetual and will not be
          530    revoked by the Internet Society or its successors or assigns.
          531 
          532    This document and the information contained herein is provided on an
          533    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
          534    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
          535    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
          536    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
          537    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
          538 