Network Working Group                            N. Borenstein, Bellcore
Request for Comments: 1341                            N. Freed, Innosoft
                                                               June 1992


MIME (Multipurpose Internet Mail Extensions):

Mechanisms for Specifying and Describing
the Format of Internet Message Bodies


Status of this Memo

This RFC specifies an IAB standards track protocol for the
Internet community, and requests discussion and suggestions
for improvements. Please refer to the current edition of
the "IAB Official Protocol Standards" for the
standardization state and status of this protocol.
Distribution of this memo is unlimited.

Abstract

RFC 822 defines a message representation protocol which
specifies considerable detail about message headers, but
which leaves the message content, or message body, as flat
ASCII text. This document redefines the format of message
bodies to allow multi-part textual and non-textual message
bodies to be represented and exchanged without loss of
information. This is based on earlier work documented in
RFC 934 and RFC 1049, but extends and revises that work.
Because RFC 822 said so little about message bodies, this
document is largely orthogonal to (rather than a revision
of) RFC 822.

In particular, this document is designed to provide
facilities to include multiple objects in a single message,
to represent body text in character sets other than
US-ASCII, to represent formatted multi-font text messages,
to represent non-textual material such as images and audio
fragments, and generally to facilitate later extensions
defining new types of Internet mail for use by cooperating
mail agents.

This document does NOT extend Internet mail header fields to
permit anything other than US-ASCII text data. It is
recognized that such extensions are necessary, and they are
the subject of a companion document [RFC-1342].

A table of contents appears at the end of this document.

Borenstein & Freed [Page i]

1 Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined
the standard format of textual mail messages on the
Internet. Its success has been such that the RFC 822 format
has been adopted, wholly or partially, well beyond the
confines of the Internet and the Internet SMTP transport
defined by RFC 821 [RFC-821]. As the format has seen wider
use, a number of limitations have proven increasingly
restrictive for the user community.

RFC 822 was intended to specify a format for text messages.
As such, non-text messages, such as multimedia messages that
might include audio or images, are simply not mentioned.
Even in the case of text, however, RFC 822 is inadequate for
the needs of mail users whose languages require the use of
character sets richer than US-ASCII [US-ASCII]. Since RFC 822
does not specify mechanisms for mail containing audio,
video, Asian language text, or even text in most European
languages, additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail
systems is the fact that they limit the contents of
electronic mail messages to relatively short lines of
seven-bit ASCII. This forces users to convert any
non-textual data that they may wish to send into seven-bit
bytes representable as printable ASCII characters before
invoking a local mail UA (User Agent, a program with which
human users send and receive mail). Examples of such
encodings currently used in the Internet include pure
hexadecimal, uuencode, the 3-in-4 base 64 scheme specified
in RFC 1113, the Andrew Toolkit Representation [ATK], and
many others.

The limitations of RFC 822 mail become even more apparent as
gateways are designed to allow for the exchange of mail
messages between RFC 822 hosts and X.400 hosts. X.400 [X400]
specifies mechanisms for the inclusion of non-textual body
parts within electronic mail messages. The current
standards for the mapping of X.400 messages to RFC 822
messages specify that either X.400 non-textual body parts
should be converted to (not encoded in) an ASCII format, or
that they should be discarded, notifying the RFC 822 user
that discarding has occurred. This is clearly undesirable,
as information that a user may wish to receive is lost.
Even though a user's UA may not have the capability of
dealing with the non-textual body part, the user might have
some mechanism external to the UA that can extract useful
information from the body part. Moreover, this practice does
not allow for the fact that the message may eventually be
gatewayed back into an X.400 message handling system (i.e.,
the X.400 message is "tunneled" through Internet mail),
where the non-textual information would definitely become
useful again.

RFC 1341        MIME: Multipurpose Internet Mail Extensions        June 1992

This document describes several mechanisms that combine to
solve most of these problems without introducing any serious
incompatibilities with the existing world of RFC 822 mail.
In particular, it describes:

1. A MIME-Version header field, which uses a version number
   to declare a message to be conformant with this
   specification and allows mail processing agents to
   distinguish between such messages and those generated
   by older or non-conformant software, which is presumed
   to lack such a field.

2. A Content-Type header field, generalized from RFC 1049
   [RFC-1049], which can be used to specify the type and
   subtype of data in the body of a message and to fully
   specify the native representation (encoding) of such
   data.

   2.a. A "text" Content-Type value, which can be used to
        represent textual information in a number of
        character sets and formatted text description
        languages in a standardized manner.

   2.b. A "multipart" Content-Type value, which can be
        used to combine several body parts, possibly of
        differing types of data, into a single message.

   2.c. An "application" Content-Type value, which can be
        used to transmit application data or binary data,
        and hence, among other uses, to implement an
        electronic mail file transfer service.

   2.d. A "message" Content-Type value, for encapsulating
        a mail message.

   2.e. An "image" Content-Type value, for transmitting
        still image (picture) data.

   2.f. An "audio" Content-Type value, for transmitting
        audio or voice data.

   2.g. A "video" Content-Type value, for transmitting
        video or moving image data, possibly with audio as
        part of the composite video data format.

3. A Content-Transfer-Encoding header field, which can be
   used to specify an auxiliary encoding that was applied
   to the data in order to allow it to pass through mail
   transport mechanisms which may have data or character
   set limitations.

4. Two optional header fields that can be used to further
   describe the data in a message body, the Content-ID and
   Content-Description header fields.

MIME has been carefully designed as an extensible mechanism,
and it is expected that the set of content-type/subtype
pairs and their associated parameters will grow
significantly with time. Several other MIME fields, notably
including character set names, are likely to have new values
defined over time. In order to ensure that the set of such
values is developed in an orderly, well-specified, and
public manner, MIME defines a registration process which
uses the Internet Assigned Numbers Authority (IANA) as a
central registry for such values. Appendix F provides
details about how IANA registration is accomplished.

Finally, to specify and promote interoperability, Appendix A
of this document provides a basic applicability statement
for a subset of the above mechanisms that defines a minimal
level of "conformance" with this document.

HISTORICAL NOTE: Several of the mechanisms described in
this document may seem somewhat strange or even baroque at
first reading. It is important to note that compatibility
with existing standards AND robustness across existing
practice were two of the highest priorities of the working
group that developed this document. In particular,
compatibility was always favored over elegance.

2 Notations, Conventions, and Generic BNF Grammar

This document is being published in two versions, one as
plain ASCII text and one as PostScript. The latter is
recommended, though the textual contents are identical. An
Andrew-format copy of this document is also available from
the first author (Borenstein).

Although the mechanisms specified in this document are all
described in prose, most are also described formally in the
modified BNF notation of RFC 822. Implementors will need to
be familiar with this notation in order to understand this
specification, and are referred to RFC 822 for a complete
explanation of the modified BNF notation.

Some of the modified BNF in this document makes reference to
syntactic entities that are defined in RFC 822 and not in
this document. A complete formal grammar, then, is obtained
by combining the collected grammar appendix of this document
with that of RFC 822.

The term CRLF, in this document, refers to the sequence of
the two ASCII characters CR (13) and LF (10) which, taken
together, in this order, denote a line break in RFC 822
mail.

The term "character set", wherever it is used in this
document, refers to a coded character set, in the sense of
ISO character set standardization work, and must not be
misinterpreted as meaning "a set of characters."

The term "message", when not further qualified, means either
the (complete or "top-level") message being transferred on a
network, or a message encapsulated in a body of type
"message".

The term "body part", in this document, means one of the
parts of the body of a multipart entity. A body part has a
header and a body, so it makes sense to speak about the body
of a body part.

The term "entity", in this document, means either a message
or a body part. All kinds of entities share the property
that they have a header and a body.

The term "body", when not further qualified, means the body
of an entity, that is the body of either a message or of a
body part.

Note: the previous four definitions are clearly circular.
This is unavoidable, since the overall structure of a MIME
message is indeed recursive.

In this document, all numeric and octet values are given in
decimal notation.

It must be noted that Content-Type values, subtypes, and
parameter names as defined in this document are
case-insensitive. However, parameter values are
case-sensitive unless otherwise specified for the specific
parameter.

FORMATTING NOTE: This document has been carefully formatted
for ease of reading. The PostScript version of this
document, in particular, places notes like this one, which
may be skipped by the reader, in a smaller, italicized font,
and indents them as well. In the text version, only the
indentation is preserved, so if you are reading the text
version of this you might consider using the PostScript
version instead. However, all such notes will be indented
and preceded by "NOTE:" or some similar introduction, even
in the text version.

The primary purpose of these non-essential notes is to
convey information about the rationale of this document, or
to place this document in the proper historical or
evolutionary context. Such information may be skipped by
those who are focused entirely on building a compliant
implementation, but may be of use to those who wish to
understand why this document is written as it is.

For ease of recognition, all BNF definitions have been
placed in a fixed-width font in the PostScript version of
this document.

3 The MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been
only one format standard for Internet messages, and there
has been little perceived need to declare the format
standard in use. This document is an independent document
that complements RFC 822. Although the extensions in this
document have been defined in such a way as to be compatible
with RFC 822, there are still circumstances in which it
might be desirable for a mail-processing agent to know
whether a message was composed with the new standard in
mind.

Therefore, this document defines a new header field,
"MIME-Version", which is to be used to declare the version
of the Internet message body format standard in use.

Messages composed in accordance with this document MUST
include such a header field, with the following verbatim
text:

    MIME-Version: 1.0

The presence of this header field is an assertion that the
message has been composed in compliance with this document.

Since it is possible that a future document might extend the
message format standard again, a formal BNF is given for the
content of the MIME-Version field:

    MIME-Version := text

Thus, future format specifiers, which might replace or
extend "1.0", are (minimally) constrained by the definition
of "text", which appears in RFC 822.

Note that the MIME-Version header field is required at the
top level of a message. It is not required for each body
part of a multipart entity. It is required for the embedded
headers of a body of type "message" if and only if the
embedded message is itself claimed to be MIME-compliant.
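
One way a receiving agent might check a header for the
required field is sketched below in Python. It is only a
sketch under stated assumptions: it relies on the
standard-library email parser, treats the field as
conformant only when its text is exactly "1.0" after
removing simple (non-nested) RFC 822 comments, and the
function name declares_mime_1_0 is hypothetical. The same
check could be applied to the embedded header of a
"message" body that claims to be MIME-compliant.

```python
import re
from email import message_from_string

# RFC 822 comments are parenthesised text. This simple pattern does not
# handle nested parentheses (an assumption that keeps the sketch short).
_COMMENT = re.compile(r"\([^()]*\)")


def declares_mime_1_0(raw_message: str) -> bool:
    """Return True if the header carries "MIME-Version: 1.0"."""
    msg = message_from_string(raw_message)
    value = msg.get("MIME-Version")  # header names are case-insensitive
    if value is None:
        # Presumed to come from older or non-conformant software.
        return False
    version = _COMMENT.sub("", str(value)).strip()
    return version == "1.0"
```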

4 The Content-Type Header Field

The purpose of the Content-Type field is to describe the
data contained in the body fully enough that the receiving
user agent can pick an appropriate agent or mechanism to
present the data to the user, or otherwise deal with the
data in an appropriate manner.

HISTORICAL NOTE: The Content-Type header field was first
defined in RFC 1049. RFC 1049 Content-types used a simpler
and less powerful syntax, but one that is largely compatible
with the mechanism given here.

The Content-Type header field is used to specify the nature
of the data in the body of an entity, by giving type and
subtype identifiers, and by providing auxiliary information
that may be required for certain types. After the type and
subtype names, the remainder of the header field is simply a
set of parameters, specified in an attribute/value notation.
The set of meaningful parameters differs for the different
types. The ordering of parameters is not significant.
Among the defined parameters is a "charset" parameter by
which the character set used in the body may be declared.
Comments are allowed in accordance with RFC 822 rules for
structured header fields.

In general, the top-level Content-Type is used to declare
the general type of data, while the subtype specifies a
specific format for that type of data. Thus, a Content-Type
of "image/xyz" is enough to tell a user agent that the data
is an image, even if the user agent has no knowledge of the
specific image format "xyz". Such information can be used,
for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason,
registered subtypes of audio, image, text, and video should
not contain embedded information that is really of a
different type. Such compound types should be represented
using the "multipart" or "application" types.

Parameters are modifiers of the content-subtype, and do not
fundamentally affect the requirements of the host system.
Although most parameters make sense only with certain
content-types, others are "global" in the sense that they
might apply to any subtype. For example, the "boundary"
parameter makes sense only for the "multipart" content-type,
but the "charset" parameter might make sense with several
content-types.

An initial set of seven Content-Types is defined by this
document. This set of top-level names is intended to be
substantially complete. It is expected that additions to
the larger set of supported types can generally be
accomplished by the creation of new subtypes of these
initial types. In the future, more top-level types may be
defined only by an extension to this standard. If another
primary type is to be used for any reason, it must be given
a name starting with "X-" to indicate its non-standard
status and to avoid a potential conflict with a future
official name.

In the Extended BNF notation of RFC 822, a Content-Type
header field value is defined as follows:

    Content-Type := type "/" subtype *[";" parameter]

    type := "application" / "audio"
          / "image"       / "message"
          / "multipart"   / "text"
          / "video"       / x-token

    x-token := <The two characters "X-" followed, with no
               intervening white space, by any token>

    subtype := token

    parameter := attribute "=" value

    attribute := token

    value := token / quoted-string

    token := 1*<any CHAR except SPACE, CTLs, or tspecials>

    tspecials := "(" / ")" / "<" / ">" / "@"  ; Must be in
               / "," / ";" / ":" / "\" / <">  ; quoted-string,
               / "/" / "[" / "]" / "?" / "."  ; to use within
               / "="                          ; parameter values

Note that the definition of "tspecials" is the same as the
RFC 822 definition of "specials" with the addition of the
three characters "/", "?", and "=".

Note also that a subtype specification is MANDATORY. There
are no default subtypes.

The type, subtype, and parameter names are not case
sensitive. For example, TEXT, Text, and TeXt are all
equivalent. Parameter values are normally case sensitive,
but certain parameters are interpreted to be
case-insensitive, depending on the intended use. (For
example, multipart boundaries are case-sensitive, but the
"access-type" for message/External-body is not
case-sensitive.)
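
The Content-Type grammar above can be parsed with a small
hand-written scanner. The Python sketch below assumes the
header value has already been unfolded and contains no
RFC 822 comments; the name parse_content_type is
hypothetical. Following the case rules just stated, it
lower-cases type, subtype, and attribute names but returns
parameter values exactly as written.

```python
# tspecials from the grammar above. A token character is any printable
# US-ASCII character (33..126) that is not a tspecial; this excludes
# SPACE and the CTLs.
TSPECIALS = set('()<>@,;:\\"/[]?.=')


def _is_token_char(c: str) -> bool:
    return 33 <= ord(c) <= 126 and c not in TSPECIALS


def parse_content_type(value: str):
    """Parse 'type "/" subtype *[";" parameter]' into (type, subtype, params).

    Assumes the value is already unfolded and contains no RFC 822 comments.
    """
    pos, n = 0, len(value)

    def skip_ws():
        nonlocal pos
        while pos < n and value[pos] in " \t":
            pos += 1

    def expect(ch: str):
        nonlocal pos
        skip_ws()
        if pos >= n or value[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def token() -> str:
        nonlocal pos
        skip_ws()
        start = pos
        while pos < n and _is_token_char(value[pos]):
            pos += 1
        if pos == start:
            raise ValueError(f"expected a token at position {start}")
        return value[start:pos]

    def quoted_string() -> str:
        nonlocal pos
        pos += 1  # skip the opening quote
        out = []
        while pos < n:
            c = value[pos]
            if c == "\\" and pos + 1 < n:  # quoted-pair: take next char literally
                out.append(value[pos + 1])
                pos += 2
            elif c == '"':
                pos += 1
                return "".join(out)
            else:
                out.append(c)
                pos += 1
        raise ValueError("unterminated quoted-string")

    ctype = token().lower()  # type names are case-insensitive
    expect("/")              # a subtype is MANDATORY
    subtype = token().lower()

    params = {}
    skip_ws()
    while pos < n:
        expect(";")
        attribute = token().lower()  # attribute names are case-insensitive
        expect("=")
        skip_ws()
        if pos < n and value[pos] == '"':
            params[attribute] = quoted_string()
        else:
            params[attribute] = token()  # values keep their original case
        skip_ws()
    return ctype, subtype, params
```

The top-level type is deliberately not checked against the
seven standard names; that decision is left to the caller,
which might, for example, handle an unknown type as
application/octet-stream, as this document recommends.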

Beyond this syntax, the only constraint on the definition of
subtype names is the desire that their uses must not
conflict. That is, it would be undesirable to have two
different communities using "Content-Type:
application/foobar" to mean two different things. The
process of defining new content-subtypes, then, is not
intended to be a mechanism for imposing restrictions, but
simply a mechanism for publicizing the usages. There are,
therefore, two acceptable mechanisms for defining new
Content-Type subtypes:

1. Private values (starting with "X-") may be
   defined bilaterally between two cooperating
   agents without outside registration or
   standardization.

2. New standard values must be documented,
   registered with, and approved by IANA, as
   described in Appendix F. Where intended for
   public use, the formats they refer to must
   also be defined by a published specification,
   and possibly offered for standardization.

The seven standard initial predefined Content-Types are
detailed in the bulk of this document. They are:

text -- textual information. The primary subtype,
    "plain", indicates plain (unformatted) text. No
    special software is required to get the full
    meaning of the text, aside from support for the
    indicated character set. Subtypes are to be used
    for enriched text in forms where application
    software may enhance the appearance of the text,
    but such software must not be required in order to
    get the general idea of the content. Possible
    subtypes thus include any readable word processor
    format. A very simple and portable subtype,
    richtext, is defined in this document.
multipart -- data consisting of multiple parts of
    independent data types. Four initial subtypes
    are defined, including the primary "mixed"
    subtype, "alternative" for representing the same
    data in multiple formats, "parallel" for parts
    intended to be viewed simultaneously, and "digest"
    for multipart entities in which each part is of
    type "message".
message -- an encapsulated message. A body of
    Content-Type "message" is itself a fully formatted
    RFC 822 conformant message which may contain its
    own different Content-Type header field. The
    primary subtype is "rfc822". The "partial"
    subtype is defined for partial messages, to permit
    the fragmented transmission of bodies that are
    thought to be too large to be passed through mail
    transport facilities. Another subtype,
    "External-body", is defined for specifying large
    bodies by reference to an external data source.
image -- image data. Image requires a display device
    (such as a graphical display, a printer, or a FAX
    machine) to view the information. Initial
    subtypes are defined for two widely-used image
    formats, jpeg and gif.
audio -- audio data, with initial subtype "basic".
    Audio requires an audio output device (such as a
    speaker or a telephone) to "display" the contents.
video -- video data. Video requires the capability to
    display moving images, typically including
    specialized hardware and software. The initial
    subtype is "mpeg".
application -- some other kind of data, typically
    either uninterpreted binary data or information to
    be processed by a mail-based application. The
    primary subtype, "octet-stream", is to be used in
    the case of uninterpreted binary data, in which
    case the simplest recommended action is to offer
    to write the information into a file for the user.
    Two additional subtypes, "ODA" and "PostScript",
    are defined for transporting ODA and PostScript
    documents in bodies. Other expected uses for
    "application" include spreadsheets, data for
    mail-based scheduling systems, and languages for
    "active" (computational) email. (Note that active
    email entails several security considerations,
    which are discussed later in this memo,
    particularly in the context of
    application/PostScript.)

Default RFC 822 messages are typed by this protocol as plain
text in the US-ASCII character set, which can be explicitly
specified as "Content-type: text/plain; charset=us-ascii".
If no Content-Type is specified, either by error or by an
older user agent, this default is assumed. In the presence
of a MIME-Version header field, a receiving User Agent can
also assume that plain US-ASCII text was the sender's
intent. In the absence of a MIME-Version specification,
plain US-ASCII text must still be assumed, but the sender's
intent might have been otherwise.
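
The defaulting rule above, together with its caveat about
the sender's intent, can be expressed in a few lines. This
Python sketch assumes the standard-library email parser; the
function name effective_content_type and the returned
"intent known" flag are illustrative and not part of this
specification.

```python
from email import message_from_string

# The RFC 822 default, spelled out explicitly.
DEFAULT_CONTENT_TYPE = "text/plain; charset=us-ascii"


def effective_content_type(raw_message: str) -> tuple[str, bool]:
    """Return (content_type, intent_known) for a top-level message.

    intent_known is False only when the default had to be assumed AND the
    message has no MIME-Version field, so the sender may have meant
    something else (for example, a uuencoded file in the body).
    """
    msg = message_from_string(raw_message)
    declared = msg.get("Content-Type")
    if declared is not None:
        return str(declared).strip(), True
    return DEFAULT_CONTENT_TYPE, msg.get("MIME-Version") is not None
```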

RATIONALE: In the absence of any Content-Type header field
or MIME-Version header field, it is impossible to be certain
that a message is actually text in the US-ASCII character
set, since it might well be a message that, using the
conventions that predate this document, includes text in
another character set or non-textual data in a manner that
cannot be automatically recognized (e.g., a uuencoded
compressed UNIX tar file). Although there is no fully
acceptable alternative to treating such untyped messages as
"text/plain; charset=us-ascii", implementors should remain
aware that if a message lacks both the MIME-Version and the
Content-Type header fields, it may in practice contain
almost anything.

It should be noted that the list of Content-Type values
given here may be augmented in time, via the mechanisms
described above, and that the set of subtypes is expected to
grow substantially.

When a mail reader encounters mail with an unknown
Content-type value, it should generally treat it as
equivalent to "application/octet-stream", as described later
in this document.
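
A reader following this advice needs only a fallback when
deciding how to present a body. In the Python sketch below,
the SUPPORTED table is a hypothetical example of what one
particular reader might handle, and presentation_type is an
illustrative name; any pair not in the table is presented as
application/octet-stream.

```python
# Hypothetical set of type/subtype pairs that one particular reader can
# present; a real agent would have its own list.
SUPPORTED = {
    ("text", "plain"),
    ("text", "richtext"),
    ("image", "gif"),
    ("image", "jpeg"),
    ("audio", "basic"),
}


def presentation_type(ctype: str, subtype: str) -> tuple[str, str]:
    """Return the type/subtype to present, falling back to octet-stream."""
    key = (ctype.lower(), subtype.lower())  # names are case-insensitive
    if key in SUPPORTED:
        return key
    # Unknown value: handle it as uninterpreted binary data, e.g. by
    # offering to write it to a file rather than displaying it.
    return ("application", "octet-stream")
```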

5 The Content-Transfer-Encoding Header Field

Many Content-Types which could usefully be transported via
email are represented, in their "natural" format, as 8-bit
character or binary data. Such data cannot be transmitted
over some transport protocols. For example, RFC 821
restricts mail messages to 7-bit US-ASCII data with 1000
character lines.

It is necessary, therefore, to define a standard mechanism
for re-encoding such data into a 7-bit short-line format.
This document specifies that such encodings will be
indicated by a new "Content-Transfer-Encoding" header field.
The Content-Transfer-Encoding field is used to indicate the
type of transformation that has been used in order to
represent the body in an acceptable manner for transport.

Unlike Content-Types, a proliferation of
Content-Transfer-Encoding values is undesirable and
unnecessary. However, establishing only a single
Content-Transfer-Encoding mechanism does not seem possible.
There is a tradeoff between the desire for a compact and
efficient encoding of largely-binary data and the desire for
a readable encoding of data that is mostly, but not
entirely, 7-bit data. For this reason, at least two encoding
mechanisms are necessary: a "readable" encoding and a
"dense" encoding.
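
The tradeoff between a "readable" and a "dense" encoding can
be seen by measuring both on different inputs. This Python
sketch uses the standard-library quopri and base64 modules
as stand-ins for the two encodings defined in later
sections; the sample inputs and the helper encoded_sizes are
illustrative assumptions.

```python
import base64
import quopri


def encoded_sizes(data: bytes) -> dict[str, int]:
    """Length of `data` after a readable and after a dense encoding."""
    return {
        "quoted-printable": len(quopri.encodestring(data)),
        "base64": len(base64.encodebytes(data)),
    }


# Illustrative inputs: mostly-ASCII Latin-1 text, and arbitrary octets.
mostly_text = ("Meeting notes for the caf\u00e9 project. " * 20).encode("iso-8859-1")
binary_blob = bytes(range(256)) * 4

print(encoded_sizes(mostly_text))  # quoted-printable comes out smaller
print(encoded_sizes(binary_blob))  # base64 comes out smaller
```

For mostly-ASCII text the quoted-printable output stays
close to the input length and remains legible, while for
arbitrary octets the fixed 4-for-3 expansion of base64 is
considerably smaller than escaping most octets as three
characters.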

The Content-Transfer-Encoding field is designed to specify
an invertible mapping between the "native" representation of
a type of data and a representation that can be readily
exchanged using 7-bit mail transport protocols, such as
those defined by RFC 821 (SMTP). This field has not been
defined by any previous standard. The field's value is a
single token specifying the type of encoding, as enumerated
below. Formally:

    Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                 "8BIT"   / "7BIT" /
                                 "BINARY" / x-token

These values are not case sensitive. That is, Base64 and
BASE64 and bAsE64 are all equivalent. An encoding type of
7BIT requires that the body is already in a seven-bit
mail-ready representation. This is the default value -- that
is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.
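
The grammar, the case rule, and the default can be combined
into one normalisation step. This Python sketch is an
illustration under stated assumptions: the function name
effective_transfer_encoding is hypothetical, and an x-token
is accepted when "x-" is followed by one or more token
characters as defined for Content-Type.

```python
import re
from typing import Optional

STANDARD_ENCODINGS = {"base64", "quoted-printable", "8bit", "7bit", "binary"}
# "x-" followed by at least one token character (printable ASCII that is
# not SPACE, a CTL, or one of the tspecials).
X_TOKEN = re.compile(r"x-[!#$%&'*+\-0-9A-Za-z^_`{|}~]+")


def effective_transfer_encoding(header_value: Optional[str]) -> str:
    """Normalise a Content-Transfer-Encoding value, applying the 7bit default."""
    if header_value is None:
        return "7bit"  # field absent: 7BIT is assumed
    value = header_value.strip().lower()  # values are not case sensitive
    if value in STANDARD_ENCODINGS or X_TOKEN.fullmatch(value):
        return value
    raise ValueError(f"not a valid Content-Transfer-Encoding: {header_value!r}")
```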

The values "8bit", "7bit", and "binary" all imply that NO
encoding has been performed. However, they are potentially
useful as indications of the kind of data contained in the
object, and therefore of the kind of encoding that might
need to be performed for transmission in a given transport
system. "7bit" means that the data is all represented as
short lines of US-ASCII data. "8bit" means that the lines
are short, but there may be non-ASCII characters (octets
with the high-order bit set). "Binary" means that not only
may non-ASCII characters be present, but also that the lines
are not necessarily short enough for SMTP transport.

The difference between "8bit" (or any other conceivable
bit-width token) and the "binary" token is that "binary"
does not require adherence to any limits on line length or
to the SMTP CRLF semantics, while the bit-width tokens do
require such adherence. If the body contains data in any
bit-width other than 7-bit, the appropriate bit-width
Content-Transfer-Encoding token must be used (e.g., "8bit"
for unencoded 8 bit wide data). If the body contains binary
data, the "binary" Content-Transfer-Encoding token must be
used.
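
Choosing among the three identity labels for an unencoded
body can be sketched as follows. The Python function below
is hypothetical; it assumes lines are delimited by CRLF and
takes "short" to mean at most 998 octets before the CRLF,
derived from the 1000-character line limit of RFC 821
mentioned above.

```python
# RFC 821 limits a line to 1000 characters including the CRLF, so this
# sketch treats a line as "short" if it has at most 998 octets before CRLF.
MAX_LINE = 998


def classify_body(data: bytes) -> str:
    """Return "7bit", "8bit" or "binary" for an unencoded body."""
    for line in data.split(b"\r\n"):
        # A lone CR or LF breaks CRLF semantics; an overlong line breaks
        # the length limit. Either one means only "binary" fits.
        if b"\r" in line or b"\n" in line or len(line) > MAX_LINE:
            return "binary"
    if any(octet >= 0x80 for octet in data):
        return "8bit"  # short lines, but some octets have the high bit set
    return "7bit"
```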

NOTE: The distinction between the Content-Transfer-Encoding
values of "binary," "8bit," etc. may seem unimportant, in
that all of them really mean "none" -- that is, there has
been no encoding of the data for transport. However, clear
labeling will be of enormous value to gateways between
future mail transport systems with differing capabilities in
transporting data that do not meet the restrictions of RFC
821 transport.

As of the publication of this document, there are no
standardized Internet transports for which it is legitimate
to include unencoded 8-bit or binary data in mail bodies.
Thus there are no circumstances in which the "8bit" or
"binary" Content-Transfer-Encoding is actually legal on the
Internet. However, in the event that 8-bit or binary mail
transport becomes a reality in Internet mail, or when this
document is used in conjunction with any other 8-bit or
binary-capable transport mechanism, 8-bit or binary bodies
should be labeled as such using this mechanism.

NOTE: The five values defined for the
Content-Transfer-Encoding field imply nothing about the
Content-Type other than the algorithm by which it was
encoded or the transport system requirements if unencoded.

Implementors may, if necessary, define new
Content-Transfer-Encoding values, but must use an x-token,
which is a name prefixed by "X-" to indicate its
non-standard status, e.g., "Content-Transfer-Encoding:
x-my-new-encoding". However, unlike Content-Types and
subtypes, the creation of new Content-Transfer-Encoding
values is explicitly and strongly discouraged, as it seems
likely to hinder interoperability with little potential
benefit. Their use is allowed only as the result of an
agreement between cooperating user agents.

If a Content-Transfer-Encoding header field appears as part
of a message header, it applies to the entire body of that
message. If a Content-Transfer-Encoding header field
appears as part of a body part's headers, it applies only to
the body of that body part. If an entity is of type
"multipart" or "message", the Content-Transfer-Encoding is
not permitted to have any value other than a bit width
(e.g., "7bit", "8bit", etc.) or "binary".
803
804 It should be noted that email is character-oriented, so that
805 the mechanisms described here are mechanisms for encoding
806 arbitrary byte streams, not bit streams. If a bit stream is
807 to be encoded via one of these mechanisms, it must first be
808 converted to an 8-bit byte stream using the network standard
809 bit order ("big-endian"), in which the earlier bits in a
810 stream become the higher-order bits in a byte. A bit stream
811 not ending at an 8-bit boundary must be padded with zeroes.
812 This document provides a mechanism for noting the addition
813 of such padding in the case of the application Content-Type,
814 which has a "padding" parameter.
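The conversion just described can be sketched in Python as follows. The function name pack_bits is hypothetical, and the input is assumed to be a list of 0/1 integers. The earliest bit becomes the high-order bit of each byte, and the number of zero bits appended is returned so that it could be recorded in a "padding" parameter.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, big-endian bit order.

    The earliest bit becomes the high-order bit of the first byte.
    Returns (packed bytes, number of zero bits added as padding).
    """
    padding = (-len(bits)) % 8  # zeros needed to reach an 8-bit boundary
    padded = list(bits) + [0] * padding
    out = bytearray()
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | (bit & 1)  # earlier bits end up higher
        out.append(byte)
    return bytes(out), padding


packed, pad = pack_bits([1, 0, 1])
print(packed.hex(), pad)  # a0 5
```

Here the three bits 1, 0, 1 are followed by five zero bits of padding, giving the single byte 0xA0.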

The encoding mechanisms defined here explicitly encode all
data in ASCII. Thus, for example, suppose an entity has
header fields such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This should be interpreted to mean that the body is a base64
ASCII encoding of data that was originally in ISO-8859-1,
and will be in that character set again after decoding.
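As an illustration, Python's standard base64 module can undo the transfer encoding, after which the octets are interpreted in the declared character set. The body "Y2Fm6Q==" is an invented sample: it encodes the ISO-8859-1 octets "caf" followed by 0xE9 (e with acute accent).

```python
import base64

# Hypothetical body of an entity labeled
#   Content-Type: text/plain; charset=ISO-8859-1
#   Content-transfer-encoding: base64
encoded_body = b"Y2Fm6Q=="

# Step 1: undo the transfer encoding, recovering the original octets.
octets = base64.b64decode(encoded_body)  # b"caf\xe9"

# Step 2: interpret those octets in the declared character set.
text = octets.decode("iso-8859-1")  # "caf\u00e9"
```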

The following sections will define the two standard encoding
mechanisms. The definition of new content-transfer-encodings
is explicitly discouraged and should only occur when
absolutely necessary. The entire content-transfer-encoding
namespace, except for names beginning with "X-", is
explicitly reserved to the IANA for future use. Private
agreements about content-transfer-encodings are also
explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on
certain Content-Types. In particular, it is expressly
forbidden to use any encodings other than "7bit", "8bit", or
"binary" with any Content-Type that recursively includes
other Content-Type fields, notably the "multipart" and
"message" Content-Types. All encodings that are desired for
bodies of type multipart or message must be done at the
innermost level, by encoding the actual body that needs to
be encoded.
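A composing or receiving agent might enforce this restriction with a check along the lines of the following Python sketch. The helper encoding_allowed is hypothetical, and it assumes that type names and encoding values are compared without regard to case.

```python
# Only these identity encodings may label a multipart or message entity.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def encoding_allowed(content_type: str, transfer_encoding: str) -> bool:
    """Return False if transfer_encoding is forbidden for content_type."""
    # Take the top-level type, ignoring the subtype and any parameters.
    top_level = content_type.split("/", 1)[0].strip().lower()
    encoding = transfer_encoding.strip().lower()
    if top_level in ("multipart", "message"):
        return encoding in IDENTITY_ENCODINGS
    # Non-composite types are not restricted by this rule.
    return True


print(encoding_allowed("multipart/mixed; boundary=xyz", "7bit"))  # True
print(encoding_allowed("message/rfc822", "base64"))               # False
print(encoding_allowed("image/gif", "base64"))                    # True
```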

NOTE ON ENCODING RESTRICTIONS: Though the prohibition
against using content-transfer-encodings on data of type
multipart or message may seem overly restrictive, it is
necessary to prevent nested encodings, in which data are
passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly
viewed. Nested encodings add considerable complexity to
user agents: aside from the obvious efficiency problems
with such multiple encodings, they can obscure the basic
structure of a message. In particular, they can imply that
several decoding operations are necessary simply to find out
what types of objects a message contains. Banning nested
encodings may complicate the job of certain mail gateways,
but this seems less of a problem than the effect of nested
encodings on user agents.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the
characteristics of the Content-Type that is to be encoded,
or, at the very least, that certain Content-Transfer-Encodings
could be mandated for use with specific Content-Types. There
are several reasons why this is not the case. First, given
the varying types of transports used for mail, some
encodings may be appropriate for some Content-Type/transport
combinations and not for others. (For example, in an 8-bit
transport, no encoding would be required for text in certain
character sets, while such encodings are clearly required
for 7-bit SMTP.) Second, certain Content-Types may require
different types of transfer encoding under different
circumstances. For example, many PostScript bodies might
consist entirely of short lines of 7-bit data and hence
require little or no encoding. Other PostScript bodies
(especially those using Level 2 PostScript's binary encoding
mechanism) may only be reasonably represented using a binary
transport encoding. Finally, since Content-Type is intended
to be an open-ended specification mechanism, strict
specification of an association between Content-Types and
encodings effectively couples the specification of an
application protocol with a specific lower-level transport.
This is not desirable since the developers of a Content-Type
should not have to be aware of all the transports in use and
what their limitations are.

NOTE ON TRANSLATING ENCODINGS: The quoted-printable and
base64 encodings are designed so that conversion between
them is possible. The only issue that arises in such a
conversion is the handling of line breaks. When converting
from quoted-printable to base64 a line break must be
converted into a CRLF sequence. Similarly, a CRLF sequence
in base64 data should be converted to a quoted-printable
line break, but ONLY when converting text data.

NOTE ON CANONICAL ENCODING MODEL: There was some
confusion, in earlier drafts of this memo, regarding the
model for when email data was to be converted to canonical
form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation
of newlines varies greatly from system to system. For this
reason, a canonical model for encoding is presented as
Appendix H.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined
by the following rules:

Rule #1 (General 8-bit representation): Any octet,
except those indicating a line break according to the
newline convention of the canonical form of the data
being encoded, may be represented by an "=" followed by
a two digit hexadecimal representation of the octet's
value. The digits of the hexadecimal alphabet, for this
purpose, are "0123456789ABCDEF". Uppercase letters must
be used when sending hexadecimal data, though a robust
implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12
(ASCII form feed) can be represented by "=0C", and the
value 61 (ASCII EQUAL SIGN) can be represented by
"=3D". Except when the following rules allow an
alternative encoding, this rule is mandatory.

Rule #2 (Literal representation): Octets with decimal
values of 33 through 60 inclusive, and 62 through 126,
inclusive, MAY be represented as the ASCII characters
which correspond to those octets (EXCLAMATION POINT
through LESS THAN, and GREATER THAN through TILDE,
respectively).

Rule #3 (White Space): Octets with values of 9 and 32
MAY be represented as ASCII TAB (HT) and SPACE
characters, respectively, but MUST NOT be so
represented at the end of an encoded line. Any TAB (HT)
or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In
particular, an "=" at the end of an encoded line,
indicating a soft line break (see rule #5), may follow
one or more TAB (HT) or SPACE characters. It follows
that an octet with value 9 or 32 appearing at the end
of an encoded line must be represented according to
Rule #1. This rule is necessary because some MTAs
(Message Transport Agents, programs which transport
messages from one user to another, or perform a part of
such transfers) are known to pad lines of text with
SPACEs, and others are known to remove "white space"
characters from the end of a line. Therefore, when
decoding a Quoted-Printable body, any trailing white
space on a line must be deleted, as it will necessarily
have been added by intermediate transport agents.

Rule #4 (Line Breaks): A line break in a text body
part, independent of what its representation is
following the canonical representation of the data
being encoded, must be represented by a (RFC 822) line
break, which is a CRLF sequence, in the
Quoted-Printable encoding. If isolated CRs and LFs, or
LF CR and CR LF sequences are allowed to appear in
binary data according to the canonical form, they must
be represented using the "=0D", "=0A", "=0A=0D" and
"=0D=0A" notations respectively.

Note that many implementations may elect to encode the
local representation of various content types directly.
In particular, this may apply to plain text material on
systems that use newline conventions other than CRLF
delimiters. Such an implementation is permissible, but
the generation of line breaks must be generalized to
account for the case where alternate representations of
newline sequences are used.

Rule #5 (Soft Line Breaks): The Quoted-Printable
encoding REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded with
the Quoted-Printable encoding, 'soft' line breaks must
be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft')
line break in the encoded text. Thus if the "raw" form
of the line is a single unencoded line that says:

     Now's the time for all folk to come to the aid of
     their country.

This can be represented, in the Quoted-Printable
encoding, as

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are
encoded in such a way as to be restored by the user
agent. The 76 character limit does not count the
trailing CRLF, but counts all other characters,
including any equal signs.
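Taken together, Rules #1 through #5 are enough for a small encoder. The Python sketch below is one possible reading, not a reference implementation, and all names in it are hypothetical. It assumes the input is already in canonical form with CRLF line breaks, so any other CR or LF octet is treated as data and quoted (Rule #4). The optional ebcdic_safe flag also quotes the characters that the EBCDIC note in this section suggests quoting.

```python
# Characters that some EBCDIC gateways mangle; quoting them is optional.
EBCDIC_UNSAFE = set(b'!"#$@[\\]^`{|}~')


def _encode_octet(octet: int, ebcdic_safe: bool) -> bytes:
    # Rule #2 (33-60, 62-126) and Rule #3 (TAB, SPACE) allow literals.
    literal = 33 <= octet <= 60 or 62 <= octet <= 126 or octet in (9, 32)
    if literal and not (ebcdic_safe and octet in EBCDIC_UNSAFE):
        return bytes([octet])
    # Rule #1: "=" plus two uppercase hexadecimal digits.
    return b"=%02X" % octet


def _encode_line(line: bytes, ebcdic_safe: bool, limit: int = 76) -> list:
    tokens = [_encode_octet(octet, ebcdic_safe) for octet in line]
    # Rule #3: a TAB or SPACE must not end the encoded line.
    if tokens and tokens[-1] in (b" ", b"\t"):
        tokens[-1] = b"=%02X" % tokens[-1][0]
    pieces, current = [], b""
    for index, token in enumerate(tokens):
        is_last = index == len(tokens) - 1
        # Rule #5: leave room for a soft-break "=" unless nothing
        # follows this token on the hard line.
        room = limit if is_last else limit - 1
        if len(current) + len(token) > room:
            pieces.append(current + b"=")  # soft line break
            current = b""
        current += token
    pieces.append(current)
    return pieces


def qp_encode(data: bytes, ebcdic_safe: bool = False) -> bytes:
    """Quoted-Printable encode canonical (CRLF-delimited) data."""
    encoded_lines = []
    for line in data.split(b"\r\n"):  # Rule #4: hard breaks stay CRLF
        encoded_lines.extend(_encode_line(line, ebcdic_safe))
    return b"\r\n".join(encoded_lines)
```

For example, qp_encode(b"a=b") yields b"a=3Db", a SPACE at the end of a line becomes "=20", and a 100-octet line of letters is split into a 75-character piece ending in a soft "=" followed by the remaining 25 characters.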

Since the hyphen character ("-") is represented as itself in
the Quoted-Printable encoding, care must be taken, when
encapsulating a quoted-printable encoded body in a multipart
entity, to ensure that the encapsulation boundary does not
appear anywhere in the encoded body. (A good strategy is to
choose a boundary that includes a character sequence such as
"=_" which can never appear in a quoted-printable body. See
the definition of multipart messages later in this
document.)

NOTE: The quoted-printable encoding represents something of
a compromise between readability and reliability in
transport. Bodies encoded with the quoted-printable
encoding will work reliably over most mail gateways, but may
not work perfectly over a few gateways, notably those
involving translation into EBCDIC. (In theory, an EBCDIC
gateway could decode a quoted-printable body and re-encode
it using base64, but such gateways do not yet exist.) A
higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the ASCII
characters

     !"#$@[\]^`{|}~

according to rule #1. See Appendix B for more information.

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the breaks between
the lines of quoted-printable data may be altered in
transport, in the same manner that plain text mail has
always been altered in Internet mail when passing between
systems with differing newline conventions. If such
alterations are likely to constitute a corruption of the
data, it is probably more sensible to use the base64
encoding rather than the quoted-printable encoding.
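Decoding reverses these rules. In the Python sketch below (qp_decode is a hypothetical name), CRLF-delimited input is assumed. Trailing white space is deleted as Rule #3 requires, a final "=" is treated as a soft line break, and lowercase hexadecimal digits are accepted, as Rule #1 permits a robust implementation to do. Passing a malformed "=" sequence through unchanged is one possible policy, not something this memo specifies.

```python
import re

# Rule #1: "=" followed by two hexadecimal digits (either case accepted).
_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def qp_decode(encoded: bytes) -> bytes:
    """Decode a Quoted-Printable body whose lines end in CRLF."""
    lines = encoded.split(b"\r\n")
    out = []
    for index, line in enumerate(lines):
        # Rule #3: trailing white space was added in transport; drop it.
        line = line.rstrip(b" \t")
        # Rule #5: a final "=" is a soft break and produces no output.
        soft_break = line.endswith(b"=")
        if soft_break:
            line = line[:-1]
        out.append(_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), line))
        # Rule #4: every other line break is a real CRLF.
        if not soft_break and index < len(lines) - 1:
            out.append(b"\r\n")
    return b"".join(out)
```

Applied to the three encoded lines of the Rule #5 example, it returns the original single line.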

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to
represent arbitrary sequences of octets in a form that is
not humanly readable. The encoding and decoding algorithms
are simple, but the encoded data are consistently only about
33 percent larger than the unencoded data. This encoding is
based on the one used in Privacy Enhanced Mail applications,
as defined in RFC 1113. The base64 encoding is adapted
from RFC 1113, with one change: base64 eliminates the "*"
mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits
to be represented per printable character. (The extra 65th
character, "=", is used to signify a special processing
function.)

NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646,
including US ASCII, and all characters in the subset are
also represented identically in all versions of EBCDIC.
Other popular encodings, such as the encoding used by the
UUENCODE utility and the base85 encoding specified as part
of Level 2 PostScript, do not share these properties, and
thus do not fulfill the portability requirements a binary
transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits
as output strings of 4 encoded characters. Proceeding from
left to right, a 24-bit input group is formed by
concatenating 3 8-bit input groups. These 24 bits are then
treated as 4 concatenated 6-bit groups, each of which is
translated into a single digit in the base64 alphabet. When
encoding a bit stream via the base64 encoding, the bit
stream must be presumed to be ordered with the
most-significant-bit first. That is, the first bit in the
stream will be the high-order bit in the first byte, and the
eighth bit will be the low-order bit in the first byte, and
so on.

Each 6-bit group is used as an index into an array of 64
printable characters. The character referenced by the index
is placed in the output string. These characters, identified
in Table 1, below, are selected so as to be universally
representable, and the set excludes characters with
particular significance to SMTP (e.g., ".", "CR", "LF") and
to the encapsulation boundaries defined in this document
(e.g., "-").

Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The output stream (encoded bytes) must be represented in
lines of no more than 76 characters each. All line breaks
or other characters not found in Table 1 must be ignored by
decoding software. In base64 data, characters other than
those in Table 1, line breaks, and other white space
probably indicate a transmission error, about which a
warning message or even a message rejection might be
appropriate under some circumstances.
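A decoder honoring both requirements might look like the following Python sketch (the function name and its return convention are hypothetical). It ignores every character outside Table 1, treats the first "=" as the end of the data, and counts skipped characters that are not white space, so that a caller can decide whether a warning or a rejection is appropriate.

```python
# Table 1, in index order: value n is encoded as ALPHABET[n].
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUE = {char: index for index, char in enumerate(ALPHABET)}


def b64_decode_lenient(text: str) -> tuple:
    """Return (decoded octets, count of suspicious skipped characters)."""
    suspicious = 0
    bits = 0    # pending bits not yet emitted as an octet
    nbits = 0   # how many pending bits there are
    out = bytearray()
    for char in text:
        if char == "=":
            break  # padding: no further data follows
        if char not in VALUE:
            # Line breaks and white space are expected; anything else
            # probably indicates a transmission error.
            if not char.isspace():
                suspicious += 1
            continue
        bits = (bits << 6) | VALUE[char]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unconsumed bits
    return bytes(out), suspicious
```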

Special processing is performed if fewer than 24 bits are
available at the end of the data being encoded. A full
encoding quantum is always completed at the end of a body.
When fewer than 24 input bits are available in an input
group, zero bits are added (on the right) to form an
integral number of 6-bit groups. Output character positions
which are not required to represent actual input data are
set to the character "=". Since all base64 input is an
integral number of octets, only the following cases can
arise: (1) the final quantum of encoding input is an
integral multiple of 24 bits; here, the final unit of
encoded output will be an integral multiple of 4 characters
with no "=" padding, (2) the final quantum of encoding input
is exactly 8 bits; here, the final unit of encoded output
will be two characters followed by two "=" padding
characters, or (3) the final quantum of encoding input is
exactly 16 bits; here, the final unit of encoded output will
be three characters followed by one "=" padding character.
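The grouping, Table 1 lookup, padding, and 76-character line limit can be traced step by step in the Python sketch below. The name b64_encode is hypothetical; in practice Python's standard base64.b64encode produces the same characters (without line breaks), and this version exists only to make each step visible.

```python
# Table 1, in index order.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes, line_length: int = 76) -> bytes:
    chars = bytearray()
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        count = len(group)
        # Concatenate up to three octets into a 24-bit group,
        # adding zero bits on the right when fewer are available.
        value = int.from_bytes(group + b"\x00" * (3 - count), "big")
        # Split the group into four 6-bit indexes, high-order first.
        quad = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 24 bits -> 4 characters; 16 bits -> 3 + "="; 8 bits -> 2 + "==".
        kept = count + 1
        chars += bytes(quad[:kept]) + b"=" * (4 - kept)
    # Represent the output in lines of at most line_length characters.
    lines = [
        bytes(chars[i:i + line_length])
        for i in range(0, len(chars), line_length)
    ]
    return b"\r\n".join(lines)
```

For instance, the inputs "Man", "Ma" and "M" exercise the three cases above and encode to "TWFu", "TWE=" and "TQ==" respectively.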

Care must be taken to use the proper octets for line breaks
if base64 encoding is applied directly to text material that
has not been converted to canonical form. In particular,
text line breaks should be converted into CRLF sequences
prior to base64 encoding. The important thing to note is
that this may be done directly by the encoder rather than in
a prior canonicalization step in some implementations.
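One way to fold the CRLF conversion into the encoder is sketched below in Python, assuming local text may use LF, CR, or CRLF line endings. The helper encode_text_base64 is hypothetical; base64.b64encode is the standard library function.

```python
import base64


def encode_text_base64(local_text: str, charset: str = "us-ascii") -> bytes:
    # Normalize every local newline convention to CRLF first, so the
    # encoded octets represent the canonical form of the text.
    canonical = (
        local_text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    )
    octets = canonical.encode(charset)
    encoded = base64.b64encode(octets)
    # Break the output into lines of no more than 76 characters.
    return b"\r\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))
```

Because every newline convention is normalized first, the inputs "a" LF "b", "a" CR LF "b" and "a" CR "b" all produce the same output, "YQ0KYg==".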

NOTE: There is no need to worry about quoting apparent
encapsulation boundaries within base64-encoded parts of
multipart entities because no hyphen characters are used in
the base64 encoding.

6 Additional Optional Content- Header Fields

6.1 Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable
to allow one body to make reference to another.
Accordingly, bodies may be labeled using the "Content-ID"
header field, which is syntactically identical to the
"Message-ID" header field:

     Content-ID := msg-id

Like the Message-ID values, Content-ID values must be
generated to be as unique as possible.
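Unique values are commonly built from a timestamp and random data followed by a domain name, in the <local-part@domain> shape of an RFC 822 msg-id. The Python sketch below follows that pattern; the helper name make_content_id and the choice of uniqueness sources are assumptions, not requirements of this memo.

```python
import socket
import time
import uuid
from typing import Optional


def make_content_id(domain: Optional[str] = None) -> str:
    """Return a msg-id style value of the form <timestamp.hex@domain>."""
    # Fall back to this host's fully qualified name.
    domain = domain or socket.getfqdn()
    # Seconds since the epoch plus 128 random bits make collisions unlikely.
    unique = f"{int(time.time())}.{uuid.uuid4().hex}"
    return f"<{unique}@{domain}>"


print("Content-ID: " + make_content_id("example.org"))
```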

6.2 Optional Content-Description Header Field

The ability to associate some descriptive information with a
given body is often desirable. For example, it may be useful
to mark an "image" body as "a picture of the Space Shuttle
Endeavour." Such text may be placed in the
Content-Description header field.

     Content-Description := *text

The description is presumed to be given in the US-ASCII
character set, although the mechanism specified in
[RFC-1342] may be used for non-US-ASCII Content-Description
values.

7 The Predefined Content-Type Values

This document defines seven initial Content-Type values and
an extension mechanism for private or experimental types.
Further standard types must be defined by new published
specifications. It is expected that most innovation in new
types of mail will take place as subtypes of the seven types
defined here. The most essential characteristics of the
seven content-types are summarized in Appendix G.

7.1 The Text Content-Type

The text Content-Type is intended for sending material which
is principally textual in form. It is the default
Content-Type. A "charset" parameter may be used to indicate
the character set of the body text. The primary subtype of
text is "plain". This indicates plain (unformatted) text.
The default Content-Type for Internet mail is
"text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing
what might be known as "extended text" -- text with embedded
formatting and presentation information. An interesting
characteristic of many such representations is that they are
to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them,
at the highest level, from such unreadable data as images,
audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is
reasonable to show subtypes of text to the user, while it is
not reasonable to do so with most nontextual data.

Such formatted textual data should be represented using
subtypes of text. Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/richtext".

7.1.1 The charset parameter

A critical parameter that may be specified in the
Content-Type field for text data is the character set. This
is specified with a "charset" parameter, as in:

     Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the
charset parameter are NOT case sensitive. The default
character set, which must be assumed in the absence of a
charset parameter, is US-ASCII.
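A receiving agent could apply both rules, case-insensitive comparison and the US-ASCII default, as in the following Python sketch. The name charset_of is hypothetical, and the parser is deliberately simplified: it assumes no quoted parameter value contains a semicolon, and it also matches the parameter name without regard to case.

```python
def charset_of(content_type: str) -> str:
    """Return the charset of a Content-Type value, normalized to lowercase."""
    _, _, params = content_type.partition(";")
    for param in params.split(";"):
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Charset values are not case sensitive, so normalize them.
            return value.strip().strip('"').lower()
    # No charset parameter: US-ASCII must be assumed.
    return "us-ascii"
```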

An initial list of predefined character set names can be
found at the end of this section. Additional character sets
may be registered with IANA as described in Appendix F,
although the standardization of their use requires the usual
IAB review and approval. Note that if the specified
character set includes 8-bit data, a
Content-Transfer-Encoding header field and a corresponding
encoding on the data are required in order to transmit the
body via some mail transfer protocols, such as SMTP.

The default character set, US-ASCII, has been the subject of
some confusion and ambiguity in the past. Not only were
there some ambiguities in the definition, there have been
wide variations in practice. In order to eliminate such
ambiguity and variations in the future, it is strongly
recommended that new user agents explicitly specify a
character set via the Content-Type header field. "US-ASCII"
does not indicate an arbitrary seven-bit character code, but
specifies that the body uses character coding that uses the
exact correspondence of codes to characters specified in
ASCII. National use variations of ISO 646 [ISO-646] are NOT
ASCII and their use in Internet mail is explicitly
discouraged. The omission of the ISO 646 character set is
deliberate in this regard. The character set name of
"US-ASCII" explicitly refers to ANSI X3.4-1986 [US-ASCII]
only. The character set name "ASCII" is reserved and must
not be used for any purpose.

NOTE: RFC 821 explicitly specifies "ASCII", and references
an earlier version of the American Standard. Insofar as one
of the purposes of specifying a Content-Type and character
set is to permit the receiver to unambiguously determine how
the sender intended the coded message to be interpreted,
assuming anything other than "strict ASCII" as the default
would risk unintentional and incompatible changes to the
semantics of messages now being transmitted. This also
implies that messages containing characters coded according
to national variations on ISO 646, or using code-switching
procedures (e.g., those of ISO 2022), as well as 8-bit or
multiple octet character encodings MUST use an appropriate
character set specification to be consistent with this
specification.

The complete US-ASCII character set is listed in [US-ASCII].
Note that the control characters including DEL (0-31, 127)
have no defined meaning apart from the combination CRLF
(ASCII values 13 and 10) indicating a new line. Two of the
characters have de facto meanings in wide use: FF (12) often
means "start subsequent text on the beginning of a new
page"; and TAB or HT (9) often (though not always) means
"move the cursor to the next available column after the
current position where the column number is a multiple of 8
(counting the first column as column 0)." Apart from this,
any use of the control characters or DEL in a body must be
part of a private agreement between the sender and
recipient. Such private agreements are discouraged and
should be replaced by the other capabilities of this
document.
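The de facto TAB convention can be made concrete with a short Python function. The name expand_tabs is hypothetical; for a single line, Python's built-in str.expandtabs(8) behaves the same way.

```python
def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each TAB with spaces up to the next multiple of tab_width."""
    out = []
    column = 0  # the first column is column 0
    for char in line:
        if char == "\t":
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(char)
            column += 1
    return "".join(out)


# "ab" ends at column 2, so the TAB advances six spaces to column 8.
print(repr(expand_tabs("ab\tc")))
```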

NOTE: Beyond US-ASCII, an enormous proliferation of
character sets is possible. It is the opinion of the IETF
working group that a large number of character sets is NOT a
good thing. We would prefer to specify a single character
set that can be used universally for representing all of the
world's languages in electronic mail. Unfortunately,
existing practice in several communities seems to point to
the continued use of multiple character sets in the near
future. For this reason, we define names for a small number
of character sets for which a strong constituent base
exists. It is our hope that ISO 10646 or some other
effort will eventually define a single world character set
which can then be specified for use in Internet mail, but in
advance of that definition we cannot specify the use of
ISO 10646, Unicode, or any other character set whose
definition is, as of this writing, incomplete.

The defined charset values are:

     US-ASCII    -- as defined in [US-ASCII].

     ISO-8859-X  -- where "X" is to be replaced, as
                    necessary, for the parts of ISO-8859
                    [ISO-8859]. Note that the ISO 646
                    character sets have deliberately been
                    omitted in favor of their 8859
                    replacements, which are the designated
                    character sets for Internet mail. As of
                    the publication of this document, the
                    legitimate values for "X" are the digits
                    1 through 9.

Note that the character set used, if anything other than
US-ASCII, must always be explicitly specified in the
Content-Type field.

No other character set name may be used in Internet mail
without the publication of a formal specification and its
registration with IANA as described in Appendix F, or by
private agreement, in which case the character set name must
begin with "X-".
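A sending agent could screen charset names against these rules with something like the following Python sketch. The name charset_name_ok is hypothetical, and its registered argument stands in for whatever names have been registered with IANA under Appendix F; no such list is given in this memo.

```python
import re

# Names defined in this section; ISO-8859-X allows the digits 1 through 9.
_DEFINED = re.compile(r"us-ascii|iso-8859-[1-9]", re.IGNORECASE)


def charset_name_ok(name: str, registered: frozenset = frozenset()) -> bool:
    """Check a charset name against the defined, registered, and X- rules."""
    lowered = name.lower()
    if _DEFINED.fullmatch(name):
        return True
    if lowered.startswith("x-"):
        return True  # private agreement between sender and recipient
    # Anything else (including the reserved name "ASCII") must be registered.
    return lowered in registered
```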

Implementors are discouraged from defining new character
sets for mail use unless absolutely necessary.

The "charset" parameter has been defined primarily for the
purpose of textual data, and is described in this section
for that reason. However, it is conceivable that non-textual
data might also wish to specify a charset value for
some purpose, in which case the same syntax and values
should be used.

In general, mail-sending software should always use the
"lowest common denominator" character set possible. For
example, if a body contains only US-ASCII characters, it
should be marked as being in the US-ASCII character set, not
ISO-8859-1, which, like all the ISO-8859 family of character
sets, is a superset of US-ASCII. More generally, if a
widely-used character set is a subset of another character
set, and a body contains only characters in the widely-used
subset, it should be labeled as being in that subset. This
will increase the chances that the recipient will be able to
view the mail correctly.
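Following this advice, a sender could try candidate character sets, most widely supported first, and label the body with the first one that can represent it. The Python sketch below assumes the candidates are US-ASCII followed by ISO-8859-1 through ISO-8859-9 in numeric order. That ordering among the ISO-8859 parts is an arbitrary choice, since those parts are not subsets of one another, and the helper name lowest_charset is hypothetical.

```python
# (label for the charset parameter, Python codec name), in preference order.
CANDIDATES = [("us-ascii", "ascii")] + [
    (f"iso-8859-{n}", f"iso-8859-{n}") for n in range(1, 10)
]


def lowest_charset(text: str) -> str:
    """Return the first candidate charset able to represent every character."""
    for label, codec in CANDIDATES:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        return label
    raise ValueError("no charset defined in this section can represent the text")
```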
1510
1511 7.1.2 The Text/plain subtype
1512
1513 The primary subtype of text is "plain". This indicates
1514 plain (unformatted) text. The default Content-Type for
1515 Internet mail, "text/plain; charset=us-ascii", describes
1516 existing Internet practice, that is, it is the type of body
1517 defined by RFC 822.
1518
1519 7.1.3 The Text/richtext subtype
1520
1521 In order to promote the wider interoperability of simple
1522 formatted text, this document defines an extremely simple
1523 subtype of "text", the "richtext" subtype. This subtype was
1524 designed to meet the following criteria:
1525
1526 1. The syntax must be extremely simple to parse,
1527 so that even teletype-oriented mail systems can
1528 easily strip away the formatting information and
1529 leave only the readable text.
1530
1531 2. The syntax must be extensible to allow for new
1532 formatting commands that are deemed essential.
1533
1534 3. The capabilities must be extremely limited, to
1535 ensure that it can represent no more than is
1536 likely to be representable by the user's primary
1537 word processor. While this limits what can be
1538 sent, it increases the likelihood that what is
1539 sent can be properly displayed.
1540
1541 4. The syntax must be compatible with SGML, so
1542 that, with an appropriate DTD (Document Type
1543 Definition, the standard mechanism for defining a
1544 document type using SGML), a general SGML parser
1545 could be made to parse richtext. However, despite
1546 this compatibility, the syntax should be far
1547 simpler than full SGML, so that no SGML knowledge
1548 is required in order to implement it.
1549
1550 The syntax of "richtext" is very simple. It is assumed, at
1551 the top-level, to be in the US-ASCII character set, unless
1552 of course a different charset parameter was specified in the
1553 Content-type field. All characters represent themselves,
1554 with the exception of the "<" character (ASCII 60), which is
1555 used to mark the beginning of a formatting command.
1556
1557
1558
1559 Borenstein & Freed [Page 23]
1560
1561
1562
1563
1564 RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992
1565
1566
Formatting instructions consist of formatting commands
surrounded by angle brackets ("<>", ASCII 60 and 62). Each
formatting command may be no more than 40 characters in
length, all in US-ASCII, restricted to the alphanumeric and
hyphen ("-") characters. Formatting commands may be preceded
by a forward slash or solidus ("/", ASCII 47), making them
negations, and such negations must always exist to balance
the initial opening commands, except as noted below. Thus,
if the formatting command "<bold>" appears at some point,
there must later be a "</bold>" to balance it. There are
only three exceptions to this "balancing" rule: First, the
command "<lt>" is used to represent a literal "<" character.
Second, the command "<nl>" is used to represent a required
line break. (Otherwise, CRLFs in the data are treated as
equivalent to a single SPACE character.) Finally, the
command "<np>" is used to represent a page break. (NOTE:
The 40-character limit on formatting commands does not
include the "<", ">", or "/" characters that might be
attached to such commands.)
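
As an illustration only, the lexical rules above can be checked with a few lines of C. The function name richtext_valid_command is hypothetical and not part of this specification. It examines the text found between "<" and ">", allows one leading "/", and accepts 1 to 40 US-ASCII letters, digits, or hyphens; it does not check whether the command is one of those listed below.

```c
#include <stdbool.h>
#include <string.h>

/* Limit on a command name; "<", ">" and "/" are not counted. */
#define RICHTEXT_MAX_COMMAND 40

/* True for US-ASCII letters and digits, independent of the C locale. */
static bool ascii_alnum(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/*
 * Checks the text between "<" and ">" (for example "bold" or "/bold").
 * Returns true if it is lexically a valid richtext command, and sets
 * *negation to true when it carries a leading "/".
 */
bool richtext_valid_command(const char *token, bool *negation)
{
    size_t len;

    *negation = (token[0] == '/');
    if (*negation)
        token++;                      /* the solidus is not counted */

    len = strlen(token);
    if (len == 0 || len > RICHTEXT_MAX_COMMAND)
        return false;

    for (size_t i = 0; i < len; i++)
        if (!ascii_alnum(token[i]) && token[i] != '-')
            return false;             /* only alphanumerics and "-" */
    return true;
}
```

Since unrecognized commands are to be treated as "No-op", a reader built on such a check would normally accept any well-formed name, including "X-" extensions, and simply ignore the ones it does not know.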

Initially defined formatting commands, not all of which will
be implemented by all richtext implementations, include:

Bold -- causes the subsequent text to be in a bold
    font.
Italic -- causes the subsequent text to be in an italic
    font.
Fixed -- causes the subsequent text to be in a fixed
    width font.
Smaller -- causes the subsequent text to be in a
    smaller font.
Bigger -- causes the subsequent text to be in a bigger
    font.
Underline -- causes the subsequent text to be
    underlined.
Center -- causes the subsequent text to be centered.
FlushLeft -- causes the subsequent text to be left
    justified.
FlushRight -- causes the subsequent text to be right
    justified.
Indent -- causes the subsequent text to be indented at
    the left margin.
IndentRight -- causes the subsequent text to be
    indented at the right margin.
Outdent -- causes the subsequent text to be outdented
    at the left margin.
OutdentRight -- causes the subsequent text to be
    outdented at the right margin.
SamePage -- causes the subsequent text to be grouped,
    if possible, on one page.
Subscript -- causes the subsequent text to be
    interpreted as a subscript.
Superscript -- causes the subsequent text to be
    interpreted as a superscript.
Heading -- causes the subsequent text to be interpreted
    as a page heading.
Footing -- causes the subsequent text to be interpreted
    as a page footing.
ISO-8859-X (for any value of X that is legal as a
    "charset" parameter) -- causes the subsequent text
    to be interpreted as text in the appropriate
    character set.
US-ASCII -- causes the subsequent text to be
    interpreted as text in the US-ASCII character set.
Excerpt -- causes the subsequent text to be interpreted
    as a textual excerpt from another source.
    Typically this will be displayed using indentation
    and an alternate font, but such decisions are up
    to the viewer.
Paragraph -- causes the subsequent text to be
    interpreted as a single paragraph, with
    appropriate paragraph breaks (typically blank
    space) before and after.
Signature -- causes the subsequent text to be
    interpreted as a "signature". Some systems may
    wish to display signatures in a smaller font or
    otherwise set them apart from the main text of the
    message.
Comment -- causes the subsequent text to be interpreted
    as a comment, and hence not shown to the reader.
No-op -- has no effect on the subsequent text.
lt -- <lt> is replaced by a literal "<" character. No
    balancing </lt> is allowed.
nl -- <nl> causes a line break. No balancing </nl> is
    allowed.
np -- <np> causes a page break. No balancing </np> is
    allowed.

Each positive formatting command affects all subsequent text
until the matching negative formatting command. Such pairs
of formatting commands must be properly balanced and nested.
Thus, a proper way to describe text in bold italics is:

<bold><italic>the-text</italic></bold>

or, alternately,

<italic><bold>the-text</bold></italic>

but, in particular, the following is illegal
richtext:

<bold><italic>the-text</bold></italic>

NOTE: The nesting requirement for formatting commands
imposes a slightly higher burden upon the composers of
richtext bodies, but potentially simplifies richtext
displayers by allowing them to be stack-based. The main
goal of richtext is to be simple enough to make multifont,
formatted email widely readable, so that those with the
capability of sending it will be able to do so with
confidence. Thus slightly increased complexity in the
composing software was deemed a reasonable tradeoff for
simplified reading software. Nonetheless, implementors of
richtext readers are encouraged to follow the general
Internet guidelines of being conservative in what you send
and liberal in what you accept. Those implementations that
can do so are encouraged to deal reasonably with improperly
nested richtext.
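
The stack-based reading mentioned above can be sketched in C as follows. richtext_properly_nested is a hypothetical helper: it pushes each opening command, requires every negation to match the command on top of the stack, and treats <lt>, <nl>, and <np> as the three unbalanced exceptions. Two assumptions are not taken from this document: command names are compared without regard to case (the list above spells "Bold" while the examples use "<bold>"), and nesting depth is capped at an arbitrary 64.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 64   /* arbitrary limit for this sketch */
#define MAX_CMD   41   /* 40 command characters plus NUL */

/* Compares two command names, ignoring ASCII case (an assumption). */
static bool same_command(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* The three commands that never take a balancing negation. */
static bool never_balanced(const char *cmd)
{
    return same_command(cmd, "lt") || same_command(cmd, "nl") ||
           same_command(cmd, "np");
}

/* Returns true if all paired commands in `text` are balanced and nested. */
bool richtext_properly_nested(const char *text)
{
    char stack[MAX_DEPTH][MAX_CMD];
    int depth = 0;

    for (const char *p = text; *p; p++) {
        if (*p != '<')
            continue;                       /* ordinary character */

        const char *end = strchr(p, '>');
        if (end == NULL)
            return false;                   /* unterminated command */

        bool neg = (p[1] == '/');
        const char *name = neg ? p + 2 : p + 1;
        size_t len = (size_t)(end - name);
        if (len == 0 || len >= MAX_CMD)
            return false;

        char cmd[MAX_CMD];
        memcpy(cmd, name, len);
        cmd[len] = '\0';
        p = end;                            /* loop resumes after ">" */

        if (never_balanced(cmd)) {
            if (neg)
                return false;               /* </lt>, </nl>, </np> */
        } else if (!neg) {
            if (depth == MAX_DEPTH)
                return false;
            memcpy(stack[depth++], cmd, len + 1);   /* push */
        } else {
            /* A negation must close the most recently opened command. */
            if (depth == 0 || !same_command(stack[depth - 1], cmd))
                return false;
            depth--;                        /* pop */
        }
    }
    return depth == 0;
}
```

With this check, the two bold-italic examples above are accepted and the crossed one is rejected. A liberal reader, as encouraged above, might fall back to best-effort rendering rather than rejecting such a body outright.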

Implementations must regard any unrecognized formatting
command as equivalent to "No-op", thus facilitating future
extensions to "richtext". Private extensions may be defined
using formatting commands that begin with "X-", by analogy
to Internet mail header field names.

It is worth noting that no special behavior is required for
the TAB (HT) character. It is recommended, however, that, at
least when fixed-width fonts are in use, the common
semantics of the TAB (HT) character should be observed,
namely that it moves to the next column position that is a
multiple of 8. (In other words, if a TAB (HT) occurs in
column n, where the leftmost column is column 0, then that
TAB (HT) should be replaced by 8-(n mod 8) SPACE
characters.)
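
The TAB rule amounts to advancing to the next multiple of 8. A minimal sketch in C follows, assuming a single line of US-ASCII text in which each character occupies one column; expand_tabs is a hypothetical name.

```c
#include <stddef.h>

/*
 * Expands TAB (HT) characters in one line of text for fixed-width display.
 * A TAB in column n (leftmost column 0) becomes 8 - (n mod 8) SPACEs.
 * Output is truncated to fit `outsize` bytes including the NUL; returns
 * the number of characters written.
 */
size_t expand_tabs(const char *in, char *out, size_t outsize)
{
    size_t col = 0;                 /* current column == output index */

    if (outsize == 0)
        return 0;
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            size_t spaces = 8 - (col % 8);
            while (spaces-- > 0 && col + 1 < outsize)
                out[col++] = ' ';
        } else if (col + 1 < outsize) {
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

For example, "a\tb" expands to "a", seven SPACEs, and "b", because the TAB sits in column 1.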

Richtext also differentiates between "hard" and "soft" line
breaks. A line break (CRLF) in the richtext data stream is
interpreted as a "soft" line break, one that is included
only for purposes of mail transport, and is to be treated as
white space by richtext interpreters. To include a "hard"
line break (one that must be displayed as such), the "<nl>"
or "<paragraph>" formatting constructs should be used. In
general, a soft line break should be treated as white space,
but when soft line breaks immediately follow a <nl> or a
</paragraph> tag they should be ignored rather than treated
as white space.

Putting all this together, the following "text/richtext"
body fragment:

<bold>Now</bold> is the time for
<italic>all</italic> good men
<smaller>(and <lt>women>)</smaller> to
<ignoreme></ignoreme> come

to the aid of their
<nl>
beloved <nl><nl>country. <comment> Stupid
quote! </comment> -- the end

represents the following formatted text (which will, no
doubt, look cryptic in the text-only version of this
document):

Now is the time for all good men (and <women>) to
come to the aid of their
beloved

country. -- the end

Richtext conformance: A minimal richtext implementation is
one that simply converts "<lt>" to "<", converts CRLFs to
SPACE, converts <nl> to a newline according to local newline
convention, removes everything between a <comment> command
and the next balancing </comment> command, and removes all
other formatting commands (all text enclosed in angle
brackets).
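
To show how little the minimal conformance level requires, here is a C sketch of such a converter. It is not the Appendix D program, and richtext_to_plain is a hypothetical name. Going slightly beyond the minimum, it also drops soft line breaks that immediately follow <nl> or </paragraph>, as recommended earlier in this section. It assumes that command names are matched case-insensitively and that a bare LF, as found in many local files, counts as a line break just like CRLF.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Compares a command name with `want`, ignoring ASCII case (an assumption). */
static bool cmd_is(const char *cmd, const char *want)
{
    while (*cmd && *want) {
        if (tolower((unsigned char)*cmd) != tolower((unsigned char)*want))
            return false;
        cmd++;
        want++;
    }
    return *cmd == *want;
}

/*
 * Minimal richtext to plain text conversion.  Writes at most outsize - 1
 * characters plus a NUL to `out` (outsize must be at least 1).
 */
void richtext_to_plain(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    int comment_depth = 0;      /* > 0 between <comment> and </comment> */
    bool after_hard = false;    /* just saw <nl> or </paragraph> */

    while (*in != '\0') {
        if (*in == '<') {
            const char *end = strchr(in, '>');
            char cmd[42];       /* "/" + 40 characters + NUL */
            size_t len;

            if (end == NULL)
                break;                          /* malformed: stop here */
            len = (size_t)(end - in - 1);
            if (len >= sizeof cmd)
                len = sizeof cmd - 1;           /* too long to be known */
            memcpy(cmd, in + 1, len);
            cmd[len] = '\0';
            in = end + 1;
            after_hard = false;

            if (cmd_is(cmd, "comment")) {
                comment_depth++;
            } else if (cmd_is(cmd, "/comment") && comment_depth > 0) {
                comment_depth--;
            } else if (comment_depth > 0) {
                /* any command inside a comment is dropped */
            } else if (cmd_is(cmd, "lt")) {
                if (n + 1 < outsize) out[n++] = '<';
            } else if (cmd_is(cmd, "nl")) {
                if (n + 1 < outsize) out[n++] = '\n';
                after_hard = true;
            } else if (cmd_is(cmd, "/paragraph")) {
                after_hard = true;
            }
            /* every other command is removed, i.e. treated as No-op */
            continue;
        }

        if (*in == '\n' || (in[0] == '\r' && in[1] == '\n')) {
            /* soft line break: a SPACE, unless it follows a hard break */
            if (comment_depth == 0 && !after_hard && n + 1 < outsize)
                out[n++] = ' ';
            in += (*in == '\r') ? 2 : 1;
            continue;
        }

        if (comment_depth == 0 && n + 1 < outsize)
            out[n++] = *in;
        after_hard = false;
        in++;
    }
    out[n] = '\0';
}
```

Applied to the fragment above, this sketch yields the same hard line breaks as the formatted rendering, although runs of SPACE may be wider where soft line breaks and removed commands sit next to each other.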

NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML: Richtext is
decidedly not SGML, and must not be used to transport
arbitrary SGML documents. Those who wish to use SGML
document types as a mail transport format must define a new
text or application subtype, e.g., "text/sgml-dtd-whatever"
or "application/sgml-dtd-whatever", depending on the
perceived readability of the DTD in use. Richtext is
designed to be compatible with SGML, and specifically so
that it will be possible to define a richtext DTD if one is
needed. However, this does not imply that arbitrary SGML
can be called richtext, nor that richtext implementors have
any need to understand SGML; the description in this
document is a complete definition of richtext, which is far
simpler than complete SGML.

NOTE ON THE INTENDED USE OF RICHTEXT: It is recognized that
implementors of future mail systems will want rich text
functionality far beyond that currently defined for
richtext. The intent of richtext is to provide a common
format for expressing that functionality in a form in which
much of it, at least, will be understood by interoperating
software. Thus, in particular, software with a richer
notion of formatted text than richtext can still use
richtext as its basic representation, but can extend it with
new formatting commands and by hiding information specific
to that software system in richtext comments. As such
systems evolve, it is expected that the definition of
richtext will be further refined by future published
specifications, but richtext as defined here provides a
platform on which evolutionary refinements can be based.

IMPLEMENTATION NOTE: In some environments, it might be
impossible to combine certain richtext formatting commands,
whereas in others they might be combined easily. For
example, the combination of <bold> and <italic> might
produce bold italics on systems that support such fonts, but
there exist systems that can make text bold or italicized,
but not both. In such cases, the most recently issued
recognized formatting command should be preferred.

One of the major goals in the design of richtext was to make
it so simple that even text-only mailers will implement
richtext-to-plain-text translators, thus increasing the
likelihood that multifont text will become "safe" to use
very widely. To demonstrate this simplicity, an extremely
simple 35-line C program that converts richtext input into
plain text output is included in Appendix D.

7.2 The Multipart Content-Type

In the case of multiple part messages, in which one or more
different sets of data are combined in a single body, a
"multipart" Content-Type field must appear in the entity's
header. The body must then contain one or more "body parts,"
each preceded by an encapsulation boundary, and the last one
followed by a closing boundary. Each part starts with an
encapsulation boundary, and then contains a body part
consisting of a header area, a blank line, and a body area.
Thus a body part is similar to an RFC 822 message in syntax,
but different in meaning.

A body part is NOT to be interpreted as actually being an
RFC 822 message. To begin with, NO header fields are
actually required in body parts. A body part that starts
with a blank line, therefore, is allowed and is a body part
for which all default values are to be assumed. In such a
case, the absence of a Content-Type header field implies
that the encapsulation is plain US-ASCII text. The only
header fields that have defined meaning for body parts are
those the names of which begin with "Content-". All other
header fields are generally to be ignored in body parts.
Although they should generally be retained in mail
processing, they may be discarded by gateways if necessary.
Such other fields are permitted to appear in body parts but
should not be depended on. "X-" fields may be created for
experimental or private purposes, with the recognition that
the information they contain may be lost at some gateways.

The distinction between an RFC 822 message and a body part
is subtle, but important. A gateway between Internet and
X.400 mail, for example, must be able to tell the difference
between a body part that contains an image and a body part
that contains an encapsulated message, the body of which is
an image. In order to represent the latter, the body part
must have "Content-Type: message", and its body (after the
blank line) must be the encapsulated message, with its own
"Content-Type: image" header field. The use of similar
syntax facilitates the conversion of messages to body parts,
and vice versa, but the distinction between the two must be
understood by implementors. (For the special case in which
all parts actually are messages, a "digest" subtype is also
defined.)

As stated previously, each body part is preceded by an
encapsulation boundary. The encapsulation boundary MUST NOT
appear inside any of the encapsulated parts. Thus, it is
crucial that the composing agent be able to choose and
specify the unique boundary that will separate the parts.

All present and future subtypes of the "multipart" type must
use an identical syntax. Subtypes may differ in their
semantics, and may impose additional restrictions on syntax,
but must conform to the required syntax for the multipart
type. This requirement ensures that all conformant user
agents will at least be able to recognize and separate the
parts of any multipart entity, even of an unrecognized
subtype.

As stated in the definition of the Content-Transfer-Encoding
field, no encoding other than "7bit", "8bit", or "binary" is
permitted for entities of type "multipart". The multipart
delimiters and header fields are always 7-bit ASCII in any
case, and data within the body parts can be encoded on a
part-by-part basis, with Content-Transfer-Encoding fields
for each appropriate body part.

Mail gateways, relays, and other mail handling agents are
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields. Such alterations are explicitly
forbidden for the body part headers embedded in the bodies
of messages of type "multipart."

7.2.1 Multipart: The common syntax

All subtypes of "multipart" share a common syntax, defined
in this section. A simple example of a multipart message
also appears in this section. An example of a more complex
multipart message is given in Appendix C.

The Content-Type field for multipart entities requires one
parameter, "boundary", which is used to specify the
encapsulation boundary. The encapsulation boundary is
defined as a line consisting entirely of two hyphen
characters ("-", decimal code 45) followed by the boundary
parameter value from the Content-Type header field.

NOTE: The hyphens are for rough compatibility with the
earlier RFC 934 method of message encapsulation, and for
ease of searching for the boundaries in some
implementations. However, it should be noted that multipart
messages are NOT completely compatible with RFC 934
encapsulations; in particular, they do not obey RFC 934
quoting conventions for embedded lines that begin with
hyphens. This mechanism was chosen over the RFC 934
mechanism because the latter causes lines to grow with each
level of quoting. The combination of this growth with the
fact that SMTP implementations sometimes wrap long lines
made the RFC 934 mechanism unsuitable for use in the event
that deeply-nested multipart structuring is ever desired.

Thus, a typical multipart Content-Type header field might
look like this:

Content-Type: multipart/mixed;
    boundary=gc0p4Jq0M2Yt08jU534c0p

This indicates that the entity consists of several parts,
each itself with a structure that is syntactically identical
to an RFC 822 message, except that the header area might be
completely empty, and that the parts are each preceded by
the line

--gc0p4Jq0M2Yt08jU534c0p

Note that the encapsulation boundary must occur at the
beginning of a line, i.e., following a CRLF, and that that
initial CRLF is considered to be part of the encapsulation
boundary rather than part of the preceding part. The
boundary must be followed immediately either by another CRLF
and the header fields for the next part, or by two CRLFs, in
which case there are no header fields for the next part (and
it is therefore assumed to be of Content-Type text/plain).

NOTE: The CRLF preceding the encapsulation line is
considered part of the boundary so that it is possible to
have a part that does not end with a CRLF (line break).
Body parts that must be considered to end with line breaks,
therefore, should have two CRLFs preceding the encapsulation
line, the first of which is part of the preceding body part,
and the second of which is part of the encapsulation
boundary.

The requirement that the encapsulation boundary begin with
a CRLF implies that the body of a multipart entity must
itself begin with a CRLF before the first encapsulation line
-- that is, if the "preamble" area is not used, the entity
headers must be followed by TWO CRLFs. This is indeed how
such entities should be composed. A tolerant mail reading
program, however, may interpret a body of type multipart
that begins with an encapsulation line NOT initiated by a
CRLF as also being an encapsulation boundary, but a
compliant mail sending program must not generate such
entities.

Encapsulation boundaries must not appear within the
encapsulations, and must be no longer than 70 characters,
not counting the two leading hyphens.

The encapsulation boundary following the last body part is a
distinguished delimiter that indicates that no further body
parts will follow. Such a delimiter is identical to the
previous delimiters, with the addition of two more hyphens
at the end of the line:

--gc0p4Jq0M2Yt08jU534c0p--
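
The CRLF rules are easiest to see from the composing side. The following C sketch, using a hypothetical function multipart_compose, writes a multipart body from parts that are already formatted (header area, blank line, body area). It writes no preamble, so the body begins with the CRLF that belongs to the first delimiter, and it assumes the caller has already chosen a boundary that occurs in none of the parts.

```c
#include <stdio.h>

/*
 * Writes a multipart body (no preamble or epilogue) into `out`.
 * Each entry of `parts` is a complete body part: header area, blank
 * line, body area.  A part meant to end with a line break must itself
 * end with CRLF, because the CRLF written before each delimiter belongs
 * to the boundary, not to the part.  Returns the number of characters
 * written, or -1 if `out` (capacity `size`) is too small.
 */
int multipart_compose(char *out, size_t size, const char *boundary,
                      const char *const parts[], int nparts)
{
    size_t used = 0;
    int n;

    for (int i = 0; i < nparts; i++) {
        /* CRLF "--" boundary, the CRLF ending that line, then the part */
        n = snprintf(out + used, size - used, "\r\n--%s\r\n%s",
                     boundary, parts[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    /* The closing delimiter: the same line with two more hyphens. */
    n = snprintf(out + used, size - used, "\r\n--%s--\r\n", boundary);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

A part such as "\r\nimplicit" has an empty header area and text that does not end with a line break, while "Content-type: text/plain\r\n\r\nexplicit\r\n" carries its own trailing CRLF and therefore does end with one.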

There appears to be room for additional information prior to
the first encapsulation boundary and following the final
boundary. These areas should generally be left blank, and
implementations should ignore anything that appears before
the first boundary or after the last one.

NOTE: These "preamble" and "epilogue" areas are not used
because of the lack of proper typing of these parts and the
lack of clear semantics for handling these areas at
gateways, particularly X.400 gateways.

NOTE: Because encapsulation boundaries must not appear in
the body parts being encapsulated, a user agent must
exercise care to choose a unique boundary. The boundary in
the example above could have been the result of an algorithm
designed to produce boundaries with a very low probability
of already existing in the data to be encapsulated without
having to prescan the data. Alternate algorithms might
result in more 'readable' boundaries for a recipient with an
old user agent, but would require more attention to the
possibility that the boundary might appear in the
encapsulated part. The simplest boundary possible is
something like "---", with a closing boundary of "-----".
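
A boundary-choosing routine in this spirit can be as simple as the C sketch below. make_boundary is a hypothetical helper; it draws only letters and digits, a subset of the permitted boundary characters that matches the style of the example boundary above. It uses the standard rand() purely for brevity; a production composer would want a stronger random source, or would scan the data if it needs certainty rather than low probability.

```c
#include <stddef.h>
#include <stdlib.h>

/* Letters and digits only: a subset of the legal boundary characters. */
static const char boundary_alphabet[] =
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/*
 * Fills `out` (at least len + 1 bytes) with a random boundary of `len`
 * characters.  Returns 0, or -1 if len is outside the legal 1..70 range.
 * rand() is used for illustration only; it is not a strong generator.
 */
int make_boundary(char *out, size_t len)
{
    const size_t nchars = sizeof boundary_alphabet - 1;

    if (len < 1 || len > 70)
        return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = boundary_alphabet[(size_t)rand() % nchars];
    out[len] = '\0';
    return 0;
}
```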

As a very simple example, the following multipart message
has two parts, both of them plain text, one of them
explicitly typed and one of them implicitly typed:

From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"

This is the preamble. It is to be ignored, though it
is a handy place for mail composers to include an
explanatory note to non-MIME compliant readers.
--simple boundary

This is implicitly typed plain ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii

This is explicitly typed plain ASCII text.
It DOES end with a linebreak.

--simple boundary--
This is the epilogue. It is also to be ignored.

The use of a Content-Type of multipart in a body part within
another multipart entity is explicitly allowed. In such
cases, for obvious reasons, care must be taken to ensure
that each nested multipart entity uses a different
boundary delimiter. See Appendix C for an example of nested
multipart entities.

The use of the multipart Content-Type with only a single
body part may be useful in certain contexts, and is
explicitly permitted.

The only mandatory parameter for the multipart Content-Type
is the boundary parameter, which consists of 1 to 70
characters from a set of characters known to be very robust
through email gateways, and NOT ending with white space.
(If a boundary appears to end with white space, the white
space must be presumed to have been added by a gateway, and
should be deleted.) It is formally specified by the
following BNF:

boundary := 0*69<bchars> bcharsnospace

bchars := bcharsnospace / " "

bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                 / "," / "-" / "." / "/" / ":" / "=" / "?"
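
A receiving agent might check a boundary parameter against this grammar as follows. The function names are hypothetical. The sketch first deletes trailing white space, as the text requires for white space presumed to have been added by a gateway (treating TAB as such white space is an assumption), and then accepts 1 to 70 characters from bchars whose last character comes from bcharsnospace.

```c
#include <stdbool.h>
#include <string.h>

/* bcharsnospace: DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" /
 *                "," / "-" / "." / "/" / ":" / "=" / "?"        */
static bool is_bcharnospace(char c)
{
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
        (c >= 'a' && c <= 'z'))
        return true;
    return c != '\0' && strchr("'()+_,-./:=?", c) != NULL;
}

/* Deletes trailing white space, presumed to have been added by a gateway. */
void boundary_strip_trailing_space(char *b)
{
    size_t len = strlen(b);

    while (len > 0 && (b[len - 1] == ' ' || b[len - 1] == '\t'))
        b[--len] = '\0';
}

/* boundary := 0*69<bchars> bcharsnospace, where bchars adds " ". */
bool boundary_is_valid(const char *b)
{
    size_t len = strlen(b);

    if (len < 1 || len > 70)
        return false;
    for (size_t i = 0; i + 1 < len; i++)
        if (!is_bcharnospace(b[i]) && b[i] != ' ')
            return false;
    return is_bcharnospace(b[len - 1]);     /* no trailing space */
}
```

Both example boundaries in this section, "gc0p4Jq0M2Yt08jU534c0p" and "simple boundary", pass this check; a boundary containing ";" does not.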

Overall, the body of a multipart entity may be specified as
follows:

multipart-body := preamble 1*encapsulation
                  close-delimiter epilogue

encapsulation := delimiter CRLF body-part

delimiter := CRLF "--" boundary   ; taken from Content-Type field
                                  ; when content-type is multipart.
                                  ; There must be no space
                                  ; between "--" and boundary.

close-delimiter := delimiter "--" ; Again, no space before "--"

preamble := *text                 ; to be ignored upon receipt.

epilogue := *text                 ; to be ignored upon receipt.

body-part := <"message" as defined in RFC 822,
             with all header fields optional, and with the
             specified delimiter not occurring anywhere in
             the message body, either on a line by itself
             or as a substring anywhere. Note that the
             semantics of a part differ from the semantics
             of a message, as described in the text.>
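
The grammar above maps fairly directly onto a splitter. In this C sketch, multipart_split and struct mime_part are hypothetical names. It searches for CRLF "--" boundary, which is safe because the delimiter may not occur anywhere inside a body part; it skips the preamble and epilogue, and it excludes from each part the CRLF that belongs to the following delimiter. It also accepts, as a tolerant reader may, a body whose first delimiter is not preceded by CRLF. For simplicity it rejects a delimiter line followed by anything other than CRLF or "--", including white space a gateway might have appended.

```c
#include <stddef.h>
#include <string.h>

/* One body part: a pointer into the original buffer and its length. */
struct mime_part {
    const char *data;
    size_t len;
};

/* Returns the first occurrence of needle[0..nlen) in hay[0..hlen), or NULL. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    if (nlen > hlen)
        return NULL;
    for (size_t i = 0; i <= hlen - nlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/*
 * Splits a multipart body into parts.  Stores up to max_parts entries in
 * `parts` and returns the total number of parts, or -1 if the body is
 * malformed (no delimiter, delimiter line not ended by CRLF, or no
 * close-delimiter).  The preamble and epilogue are skipped.
 */
int multipart_split(const char *body, size_t len, const char *boundary,
                    struct mime_part *parts, int max_parts)
{
    char delim[4 + 70];                 /* CRLF "--" boundary, no NUL */
    size_t blen = strlen(boundary), dlen = blen + 4;
    const char *end = body + len, *p, *next;
    int count = 0;

    if (blen < 1 || blen > 70)
        return -1;
    memcpy(delim, "\r\n--", 4);
    memcpy(delim + 4, boundary, blen);

    /* Tolerate a body that starts with "--boundary" and no leading CRLF. */
    if (len >= dlen - 2 && memcmp(body, delim + 2, dlen - 2) == 0) {
        p = body + (dlen - 2);
    } else {
        const char *first = find_bytes(body, len, delim, dlen);
        if (first == NULL)
            return -1;
        p = first + dlen;               /* everything before is preamble */
    }

    for (;;) {
        /* p points just past "--boundary" */
        if (end - p >= 2 && p[0] == '-' && p[1] == '-')
            return count;               /* close-delimiter; epilogue follows */
        if (end - p < 2 || p[0] != '\r' || p[1] != '\n')
            return -1;                  /* delimiter line must end in CRLF */
        p += 2;
        next = find_bytes(p, (size_t)(end - p), delim, dlen);
        if (next == NULL)
            return -1;                  /* no close-delimiter */
        if (count < max_parts) {
            parts[count].data = p;
            parts[count].len = (size_t)(next - p);
        }
        count++;
        p = next + dlen;                /* the delimiter's CRLF is not in the part */
    }
}
```

On a body shaped like the "simple boundary" example, this returns two parts: the first begins with the CRLF of its empty header area and does not end with a line break, while the second begins with its Content-type field and ends with CRLF.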

NOTE: Conspicuously missing from the multipart type is a
notion of structured, related body parts. In general, it
seems premature to try to standardize interpart structure
yet. It is recommended that those wishing to provide a more
structured or integrated multipart messaging facility should
define a subtype of multipart that is syntactically
identical, but that always expects the inclusion of a
distinguished part that can be used to specify the structure
and integration of the other parts, probably referring to
them by their Content-ID field. If this approach is used,
other implementations will not recognize the new subtype,
but will treat it as the primary subtype (multipart/mixed)
and will thus be able to show the user the parts that are
recognized.

7.2.2 The Multipart/mixed (primary) subtype

The primary subtype for multipart, "mixed", is intended for
use when the body parts are independent and intended to be
displayed serially. Any multipart subtypes that an
implementation does not recognize should be treated as being
of subtype "mixed".

7.2.3 The Multipart/alternative subtype

The multipart/alternative type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, each of the parts is an "alternative" version of
the same information. User agents should recognize that the
contents of the various parts are interchangeable. The user
agent should either choose the "best" type based on the
user's environment and preferences, or offer the user the
available alternatives. In general, choosing the best type
means displaying only the LAST part that can be displayed.
This may be used, for example, to send mail in a fancy text
format in such a way that it can easily be displayed
anywhere:

From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42


--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....
--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42
Content-Type: text/x-whatever

.... fanciest formatted version of same message goes here ...
--boundary42--

In this example, users whose mail system understood the
"text/x-whatever" format would see only the fancy version,
while other users would see only the richtext or plain text
version, depending on the capabilities of their system.

In general, user agents that compose multipart/alternative
entities should place the body parts in increasing order of
preference, that is, with the preferred format last. For
fancy text, the sending user agent should put the plainest
format first and the richest format last. Receiving user
agents should pick and display the last format they are
capable of displaying. In the case where one of the
alternatives is itself of type "multipart" and contains
unrecognized sub-parts, the user agent may choose either to
show that alternative, an earlier alternative, or both.
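
The receiving side of this rule can be sketched in C. pick_alternative is hypothetical: it walks the alternatives from last to first and returns the index of the first one whose type/subtype appears in a caller-supplied list of displayable types, comparing names case-insensitively and ignoring parameters such as charset. A real user agent would derive that list from its own capabilities, and might instead offer the user a choice, as discussed below.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * True if the type/subtype of Content-Type value `ct` (the text before
 * any ";" parameters) equals `want`, ignoring case.
 */
static bool type_matches(const char *ct, const char *want)
{
    while (*want != '\0') {
        if (tolower((unsigned char)*ct) != tolower((unsigned char)*want))
            return false;
        ct++;
        want++;
    }
    return *ct == '\0' || *ct == ';' || *ct == ' ' || *ct == '\t';
}

/*
 * Given the Content-Type of each alternative in the order it appears,
 * returns the index of the LAST one whose type is in `supported`,
 * or -1 if none can be displayed.
 */
int pick_alternative(const char *const types[], int ntypes,
                     const char *const supported[], int nsupported)
{
    for (int i = ntypes - 1; i >= 0; i--)
        for (int j = 0; j < nsupported; j++)
            if (type_matches(types[i], supported[j]))
                return i;
    return -1;
}
```

For the boundary42 example, an agent that displays only text/plain picks part 0, one that also understands text/richtext picks part 1, and one that knows text/x-whatever picks part 2.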

NOTE: From an implementor's perspective, it might seem more
sensible to reverse this ordering, and have the plainest
alternative last. However, placing the plainest alternative
first is the friendliest possible option when
multipart/alternative entities are viewed using a
non-MIME-compliant mail reader. While this approach does
impose some burden on compliant mail readers,
interoperability with older mail readers was deemed to be
more important in this case.

It may be the case that some user agents, if they can
recognize more than one of the formats, will prefer to offer
the user the choice of which format to view. This makes
sense, for example, if mail includes both a nicely-formatted
image version and an easily-edited text version. What is
most critical, however, is that the user not automatically
be shown multiple versions of the same data. Either the
user should be shown the last recognized version or should
explicitly be given the choice.

7.2.4 The Multipart/digest subtype

This document defines a "digest" subtype of the multipart
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a digest, the default Content-Type value for
a body part is changed from "text/plain" to
"message/rfc822". This is done to allow a more readable
digest format that is largely compatible (except for the
quoting convention) with RFC 934.

A digest in this format might, then, look something like
this:

From: Moderator-Address
MIME-Version: 1.0
Subject: Internet Digest, volume 42
Content-Type: multipart/digest;
    boundary="---- next message ----"


------ next message ----

From: someone-else
Subject: my opinion

...body goes here ...

------ next message ----

From: someone-else-again
Subject: my different opinion

... another body goes here...

------ next message ------
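
The default-type rules of this section and of 7.2.2 combine into a small lookup. In this C sketch, default_part_type is a hypothetical helper that returns the Content-Type a reader should assume for a body part that has no Content-Type field: "message/rfc822" inside multipart/digest, and plain US-ASCII text otherwise, since unrecognized subtypes are treated as "mixed". Matching the subtype case-insensitively is an assumption of the sketch.

```c
#include <ctype.h>
#include <stdbool.h>

/* Compares two ASCII strings, ignoring case. */
static bool eq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/*
 * Content-Type to assume for a body part with no Content-Type field,
 * given the subtype of the enclosing multipart entity.  Only "digest"
 * changes the default; unrecognized subtypes behave like "mixed".
 */
const char *default_part_type(const char *multipart_subtype)
{
    if (eq_nocase(multipart_subtype, "digest"))
        return "message/rfc822";
    return "text/plain; charset=us-ascii";
}
```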

7.2.5 The Multipart/parallel subtype

This document defines a "parallel" subtype of the multipart
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a parallel entity, all of the parts are
intended to be presented in parallel, i.e., simultaneously,
on hardware and software that are capable of doing so.
Composing agents should be aware that many mail readers will
lack this capability and will show the parts serially in any
event.

2412 7.3 The Message Content-Type
2413
2414 It is frequently desirable, in sending mail, to encapsulate
2415 another mail message. For this common operation, a special
2416 Content-Type, "message", is defined. The primary subtype,
2417 message/rfc822, has no required parameters in the Content-
2418 Type field. Additional subtypes, "partial" and "External-
2419 body", do have required parameters. These subtypes are
2420 explained below.
2421
2422 NOTE: It has been suggested that subtypes of message might
2423 be defined for forwarded or rejected messages. However,
2424 forwarded and rejected messages can be handled as multipart
2425 messages in which the first part contains any control or
2426 descriptive information, and a second part, of type
2427 message/rfc822, is the forwarded or rejected message.
2428 Composing rejection and forwarding messages in this manner
2429 will preserve the type information on the original message
2430 and allow it to be correctly presented to the recipient, and
2431 hence is strongly encouraged.
2432
2433 As stated in the definition of the Content-Transfer-Encoding
2434 field, no encoding other than "7bit", "8bit", or "binary" is
2435 permitted for messages or parts of type "message". The
2436 message header fields are always US-ASCII in any case, and
2437 data within the body can still be encoded, in which case the
2438 Content-Transfer-Encoding header field in the encapsulated
2439 message will reflect this. Non-ASCII text in the headers of
2440 an encapsulated message can be specified using the
2441 mechanisms described in [RFC-1342].
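A minimal sketch of the forwarding approach recommended in the NOTE above follows. It assumes Python 3.6 or later and its standard `email` package, and the helper name `make_forward` is hypothetical. The first part carries descriptive text and the second part is the original message as message/rfc822, so the original's type information survives. The library also enforces the encoding restriction just described.

```python
from email.message import EmailMessage


def make_forward(original: EmailMessage, note: str, subject: str) -> EmailMessage:
    """Hypothetical helper: descriptive text first, original message second."""
    forward = EmailMessage()
    forward["Subject"] = subject
    # Part 1: control or descriptive information for the recipient.
    forward.set_content(note)
    # Part 2: add_attachment() turns `forward` into multipart/mixed and, given
    # a message object, wraps it as a message/rfc822 part.
    forward.add_attachment(original)
    return forward


if __name__ == "__main__":
    original = EmailMessage()
    original["Subject"] = "Lunch"
    original.set_content("Noon at the usual place?")
    fwd = make_forward(original, "FYI, see the message below.", "Fwd: Lunch")
    print([part.get_content_type() for part in fwd.iter_parts()])
```

In this library, passing `cte="base64"` to `add_attachment` together with a message object raises ValueError; only None, "7bit", "8bit", and "binary" are accepted for message/rfc822 content.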
2442
2443 Mail gateways, relays, and other mail handling agents are
2444 commonly known to alter the top-level header of an RFC 822
2445 message. In particular, they frequently add, remove, or
2446 reorder header fields. Such alterations are explicitly
2447 forbidden for the encapsulated headers embedded in the
2448 bodies of messages of type "message."
2449
2450 7.3.1 The Message/rfc822 (primary) subtype
2451
2452 A Content-Type of "message/rfc822" indicates that the body
2453 contains an encapsulated message, with the syntax of an RFC
2454 822 message.
2455
2456 7.3.2 The Message/Partial subtype
2457
2458 A subtype of message, "partial", is defined in order to
2459 allow large objects to be delivered as several separate
2460 pieces of mail and automatically reassembled by the
2461 receiving user agent. (The concept is similar to IP
2462 fragmentation/reassembly in the basic Internet Protocols.)
2463 This mechanism can be used when intermediate transport
2464 agents limit the size of individual messages that can be
2465 sent. Content-Type "message/partial" thus indicates that
2477 the body contains a fragment of a larger message.
2478
Three parameters are defined for the Content-Type field of
type message/partial. The first, "id", is a unique
identifier, as close to a world-unique identifier as
possible, to be used to match the parts together. (In
general, the identifier is essentially a message-id; if
placed in double quotes, it can be any message-id, in
accordance with the BNF for "parameter" given earlier in
this specification.) The second, "number", an integer, is
the part number, which indicates where this part fits into
the sequence of fragments. The third, "total", another
integer, is the total number of parts. The "id" and
"number" parameters must be specified on every part;
"total" is required on the final part and optional on the
earlier parts. Note also that these parameters may be given
in any order.
2493
2494 Thus, part 2 of a 3-part message may have either of the
2495 following header fields:
2496
Content-Type: Message/Partial;
     number=2; total=3;
     id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Content-Type: Message/Partial;
     id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
     number=2

But part 3 MUST specify the total number of parts:

Content-Type: Message/Partial;
     number=3; total=3;
     id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
2510
2511 Note that part numbering begins with 1, not 0.
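The following Python sketch shows one way a sending agent might split an encapsulated message into message/partial fragments. It breaks only at line boundaries, numbers parts from 1, and gives "total" on every part, which is permitted since it is merely optional on the earlier ones. The function name, the use of "\n" rather than CRLF line endings, and copying the same outer headers onto every fragment are assumptions for illustration. A single line longer than the limit ends up in its own, oversized fragment.

```python
from typing import List, Sequence, Tuple


def fragment(inner: str, outer_headers: Sequence[Tuple[str, str]],
             frag_id: str, max_body_chars: int) -> List[str]:
    """Hypothetical sketch: split `inner` into message/partial fragments."""
    # Group whole lines into chunks of at most max_body_chars characters.
    chunks: List[str] = []
    current = ""
    for line in inner.splitlines(keepends=True):
        if current and len(current) + len(line) > max_body_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)

    total = len(chunks)
    top = "".join(f"{name}: {value}\n" for name, value in outer_headers)
    fragments = []
    for number, chunk in enumerate(chunks, start=1):  # numbering starts at 1
        content_type = (f'Content-Type: message/partial; id="{frag_id}"; '
                        f"number={number}; total={total}\n")
        # A blank line separates each fragment's header from its body.
        fragments.append(top + content_type + "\n" + chunk)
    return fragments
```

The first fragment's body begins with the encapsulated message's own header, so concatenating the fragment bodies in order yields the original message text.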
2512
2513 When the parts of a message broken up in this manner are put
2514 together, the result is a complete RFC 822 format message,
2515 which may have its own Content-Type header field, and thus
2516 may contain any other data type.
2517
2518 Message fragmentation and reassembly: The semantics of a
2519 reassembled partial message must be those of the "inner"
2520 message, rather than of a message containing the inner
2521 message. This makes it possible, for example, to send a
2522 large audio message as several partial messages, and still
2523 have it appear to the recipient as a simple audio message
2524 rather than as an encapsulated message containing an audio
2525 message. That is, the encapsulation of the message is
2526 considered to be "transparent".
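A receiving agent can decide when reassembly is possible from the "number" and "total" parameters alone. The Python sketch below, whose names are hypothetical, takes the fragments collected so far for one "id", each given as a dictionary of lower-cased parameter names plus the fragment body. It returns the concatenated bodies only once every part from 1 to "total" is present. Letting a repeated part number replace the earlier copy is an assumption of this sketch, and the header merging that must accompany reassembly is a separate step.

```python
from typing import Dict, Iterable, Optional, Tuple

Fragment = Tuple[Dict[str, str], str]  # (parameters, body)


def reassemble(fragments: Iterable[Fragment]) -> Optional[str]:
    """Return the joined bodies, or None while parts are still missing."""
    bodies: Dict[int, str] = {}
    ids = set()
    total: Optional[int] = None
    for params, body in fragments:
        ids.add(params["id"])
        number = int(params["number"])
        if number < 1:
            raise ValueError("part numbers begin with 1")
        bodies[number] = body  # a duplicate number replaces the earlier copy
        if "total" in params:  # required on the final part, optional elsewhere
            total = int(params["total"])
    if len(ids) > 1:
        raise ValueError("fragments belong to different messages")
    if total is None or set(bodies) != set(range(1, total + 1)):
        return None  # the final part, or some earlier part, has not arrived
    return "".join(bodies[n] for n in range(1, total + 1))
```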
2527
2528 When generating and reassembling the parts of a
2529 message/partial message, the headers of the encapsulated
2530 message must be merged with the headers of the enclosing
2542 entities. In this process the following rules must be
2543 observed:
2544
(1) All of the headers from the initial enclosing
entity (part one), except those whose names start
with "Content-" and the "Message-ID" header, must be
copied, in order, to the new message.

(2) Only those headers in the enclosed message whose
names start with "Content-", plus the "Message-ID"
header, must be appended, in order, to the headers
of the new message. All other headers in the
enclosed message will be ignored.
2556
2557 (3) All of the headers from the second and any
2558 subsequent messages will be ignored.
2559
2560 For example, if an audio message is broken into two parts,
2561 the first part might look something like this:
2562
2563 X-Weird-Header-1: Foo
2564 From: Bill@host.com
2565 To: joe@otherhost.com
2566 Subject: Audio mail
2567 Message-ID: id1@host.com
2568 MIME-Version: 1.0
Content-type: message/partial;
     id="ABC@host.com";
     number=1; total=2
2572
2573 X-Weird-Header-1: Bar
2574 X-Weird-Header-2: Hello
2575 Message-ID: anotherid@foo.com
2576 Content-type: audio/basic
2577 Content-transfer-encoding: base64
2578
2579 ... first half of encoded audio data goes here...
2580
2581 and the second half might look something like this:
2582
2583 From: Bill@host.com
2584 To: joe@otherhost.com
2585 Subject: Audio mail
2586 MIME-Version: 1.0
2587 Message-ID: id2@host.com
Content-type: message/partial;
     id="ABC@host.com"; number=2; total=2
2590
2591 ... second half of encoded audio data goes here...
2592
2593 Then, when the fragmented message is reassembled, the
2594 resulting message to be displayed to the user should look
2595 something like this:
2596
2607 X-Weird-Header-1: Foo
2608 From: Bill@host.com
2609 To: joe@otherhost.com
2610 Subject: Audio mail
2611 Message-ID: anotherid@foo.com
2612 MIME-Version: 1.0
2613 Content-type: audio/basic
2614 Content-transfer-encoding: base64
2615
2616 ... first half of encoded audio data goes here...
2617 ... second half of encoded audio data goes here...
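Rules (1) to (3) can be expressed as a short function over ordered lists of (name, value) pairs. This Python sketch assumes that header names are compared case-insensitively, as RFC 822 field names are, and that only the first fragment's outer headers are passed in, since rule (3) discards the rest; the function names are hypothetical. Applied strictly, the rules place the inner Message-ID after MIME-Version, so the output order differs slightly from the reassembled example above, which the text offers only as looking "something like" the result.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def _taken_from_inner(name: str) -> bool:
    lowered = name.lower()
    return lowered.startswith("content-") or lowered == "message-id"


def merge_partial_headers(first_outer: List[Header],
                          inner: List[Header]) -> List[Header]:
    """Merge the headers of fragment one with those of the enclosed message."""
    # Rule (1): keep part one's headers, in order, except Content-* and Message-ID.
    merged = [h for h in first_outer if not _taken_from_inner(h[0])]
    # Rule (2): append only the inner Content-* and Message-ID headers, in order.
    merged += [h for h in inner if _taken_from_inner(h[0])]
    # Rule (3): headers of the second and later fragments are never consulted.
    return merged
```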
2618
2619 It should be noted that, because some message transfer
2620 agents may choose to automatically fragment large messages,
2621 and because such agents may use different fragmentation
2622 thresholds, it is possible that the pieces of a partial
2623 message, upon reassembly, may prove themselves to comprise a
2624 partial message. This is explicitly permitted.
2625
2626 It should also be noted that the inclusion of a "References"
2627 field in the headers of the second and subsequent pieces of
2628 a fragmented message that references the Message-Id on the
2629 previous piece may be of benefit to mail readers that
2630 understand and track references. However, the generation of
2631 such "References" fields is entirely optional.
2632
2633 7.3.3 The Message/External-Body subtype
2634
2635 The external-body subtype indicates that the actual body
2636 data are not included, but merely referenced. In this case,
2637 the parameters describe a mechanism for accessing the
2638 external data.
2639
2640 When a message body or body part is of type
2641 "message/external-body", it consists of a header, two
2642 consecutive CRLFs, and the message header for the
2643 encapsulated message. If another pair of consecutive CRLFs
2644 appears, this of course ends the message header for the
2645 encapsulated message. However, since the encapsulated
2646 message's body is itself external, it does NOT appear in the
2647 area that follows. For example, consider the following
2648 message:
2649
2650 Content-type: message/external-body; access-
2651 type=local-file;
2652 name=/u/nsb/Me.gif
2653
2654 Content-type: image/gif
2655
2656 THIS IS NOT REALLY THE BODY!
2657
The area at the end, which might be called the "phantom
body", is ignored for most external-body messages. However,
it may be used to contain auxiliary information for some
2672 such messages, as indeed it is when the access-type is
2673 "mail-server". Of the access-types defined by this
2674 document, the phantom body is used only when the access-type
2675 is "mail-server". In all other cases, the phantom body is
2676 ignored.
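Because the body of a message/external-body entity follows RFC 822 syntax, a general-purpose message parser can separate the encapsulated header from the phantom body. The Python sketch below uses the standard `email` package for that split and, following the rule just stated, keeps the phantom body only for the "mail-server" access-type. The function name and the choice to pass the access-type in separately are assumptions.

```python
import email
from email.message import Message
from typing import Optional, Tuple


def split_external_body(body: str, access_type: str) -> Tuple[Message, Optional[str]]:
    """Return (encapsulated header, phantom body or None)."""
    # Everything before the first blank line is the encapsulated header;
    # whatever follows it is the phantom body.
    encapsulated = email.message_from_string(body)
    phantom: Optional[str] = encapsulated.get_payload()
    if access_type.strip().lower() != "mail-server":
        phantom = None  # ignored for every other access-type defined here
    return encapsulated, phantom
```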
2677
2678 The only always-mandatory parameter for message/external-
2679 body is "access-type"; all of the other parameters may be
2680 mandatory or optional depending on the value of access-type.
2681
2682 ACCESS-TYPE -- One or more case-insensitive words,
2683 comma-separated, indicating supported access
2684 mechanisms by which the file or data may be
2685 obtained. Values include, but are not limited to,
2686 "FTP", "ANON-FTP", "TFTP", "AFS", "LOCAL-FILE",
2687 and "MAIL-SERVER". Future values, except for
2688 experimental values beginning with "X-", must be
registered with IANA, as described in Appendix F.
2690
2691 In addition, the following two parameters are optional for
2692 ALL access-types:
2693
2694 EXPIRATION -- The date (in the RFC 822 "date-time"
2695 syntax, as extended by RFC 1123 to permit 4 digits
2696 in the date field) after which the existence of
2697 the external data is not guaranteed.
2698
2699 SIZE -- The size (in octets) of the data. The
2700 intent of this parameter is to help the recipient
2701 decide whether or not to expend the necessary
2702 resources to retrieve the external data.
2703
2704 PERMISSION -- A field that indicates whether or
2705 not it is expected that clients might also attempt
2706 to overwrite the data. By default, or if
2707 permission is "read", the assumption is that they
2708 are not, and that if the data is retrieved once,
2709 it is never needed again. If PERMISSION is "read-
2710 write", this assumption is invalid, and any local
2711 copy must be considered no more than a cache.
2712 "Read" and "Read-write" are the only defined
2713 values of permission.
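The three optional parameters above mainly inform a recipient's decision whether, and how, to fetch the data. A minimal Python sketch of one interpretation follows. It assumes that parameter names have already been lower-cased, that EXPIRATION carries a numeric time zone, and that the caller picks the size threshold; the function names are hypothetical. The standard `email.utils.parsedate_to_datetime` function reads the RFC 822 date-time form, including 4-digit years.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def is_expired(params: Dict[str, str], now: Optional[datetime] = None) -> bool:
    """True once EXPIRATION has passed; with no EXPIRATION, none is declared."""
    if "expiration" not in params:
        return False
    now = now or datetime.now(timezone.utc)
    return now > parsedate_to_datetime(params["expiration"])


def worth_fetching(params: Dict[str, str], max_octets: int) -> bool:
    """Use SIZE (in octets), when given, to skip retrievals over a local limit."""
    size = params.get("size")
    return size is None or int(size) <= max_octets


def local_copy_is_only_cache(params: Dict[str, str]) -> bool:
    """PERMISSION defaults to "read"; "read-write" means others may overwrite it."""
    return params.get("permission", "read").strip().lower() == "read-write"
```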
2714
2715 The precise semantics of the access-types defined here are
2716 described in the sections that follow.
2717
2718 7.3.3.1 The "ftp" and "tftp" access-types
2719
2720 An access-type of FTP or TFTP indicates that the message
2721 body is accessible as a file using the FTP [RFC-959] or TFTP
2722 [RFC-783] protocols, respectively. For these access-types,
2723 the following additional parameters are mandatory:
2724
2725
2737 NAME -- The name of the file that contains the
2738 actual body data.
2739
SITE -- A machine from which the file may be
obtained using the given protocol.

Before the data is retrieved using these protocols, the
user will generally need to be asked to provide a login id
and a password for the machine named by the SITE parameter.
2746
2747 In addition, the following optional parameters may also
2748 appear when the access-type is FTP or ANON-FTP:
2749
2750 DIRECTORY -- A directory from which the data named
2751 by NAME should be retrieved.
2752
2753 MODE -- A transfer mode for retrieving the
2754 information, e.g. "image".
2755
2756 7.3.3.2 The "anon-ftp" access-type
2757
2758 The "anon-ftp" access-type is identical to the "ftp" access
2759 type, except that the user need not be asked to provide a
2760 name and password for the specified site. Instead, the ftp
2761 protocol will be used with login "anonymous" and a password
2762 that corresponds to the user's email address.
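A Python sketch of anon-ftp retrieval with the standard `ftplib` module follows. It separates planning the FTP commands, which can be checked without a network, from executing them. Mapping MODE "image" to a binary transfer (`retrbinary`, which uses FTP type I) and every other MODE to a line-oriented transfer is an assumption, as are the function names.

```python
import ftplib
from typing import Dict, List, Tuple

Step = Tuple[str, Tuple[str, ...]]


def anon_ftp_plan(params: Dict[str, str], user_email: str) -> Tuple[str, List[Step]]:
    """Return (site, ordered FTP steps) for an anon-ftp external body."""
    p = {k.lower(): v for k, v in params.items()}
    for required in ("name", "site"):
        if required not in p:
            raise ValueError(f"anon-ftp requires the {required.upper()} parameter")
    # Log in as "anonymous", using the user's email address as the password.
    steps: List[Step] = [("login", ("anonymous", user_email))]
    if "directory" in p:
        steps.append(("cwd", (p["directory"],)))
    binary = p.get("mode", "").lower() == "image"
    steps.append(("retrbinary" if binary else "retrlines", ("RETR " + p["name"],)))
    return p["site"], steps


def fetch_anon_ftp(params: Dict[str, str], user_email: str) -> bytes:
    """Execute the plan against the named site (performs network I/O)."""
    site, steps = anon_ftp_plan(params, user_email)
    chunks: List[bytes] = []
    with ftplib.FTP(site) as ftp:
        for method, args in steps:
            if method == "retrbinary":
                ftp.retrbinary(args[0], chunks.append)
            elif method == "retrlines":
                # retrlines strips line endings; restore them as "\n".
                ftp.retrlines(args[0], lambda line: chunks.append(
                    (line + "\n").encode(ftp.encoding)))
            else:
                getattr(ftp, method)(*args)
    return b"".join(chunks)
```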
2763
2764 7.3.3.3 The "local-file" and "afs" access-types
2765
2766 An access-type of "local-file" indicates that the actual
2767 body is accessible as a file on the local machine. An
2768 access-type of "afs" indicates that the file is accessible
2769 via the global AFS file system. In both cases, only a
2770 single parameter is required:
2771
2772 NAME -- The name of the file that contains the
2773 actual body data.
2774
2775 The following optional parameter may be used to describe the
2776 locality of reference for the data, that is, the site or
2777 sites at which the file is expected to be visible:
2778
2779 SITE -- A domain specifier for a machine or set of
2780 machines that are known to have access to the data
2781 file. Asterisks may be used for wildcard matching
2782 to a part of a domain name, such as
2783 "*.bellcore.com", to indicate a set of machines on
2784 which the data should be directly visible, while a
2785 single asterisk may be used to indicate a file
2786 that is expected to be universally available,
2787 e.g., via a global file system.
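One way to evaluate the SITE parameter is shell-style wildcard matching. The Python sketch below uses the standard `fnmatch.fnmatchcase` function. In this reading an asterisk may also span dots, so "*.bellcore.com" matches "a.b.bellcore.com" but not "bellcore.com" itself; that interpretation, and the function name, are assumptions rather than something this document specifies.

```python
from fnmatch import fnmatchcase


def site_covers_host(site: str, hostname: str) -> bool:
    """True if `hostname` falls within the SITE pattern of a local-file/afs body."""
    # Domain names are case-insensitive, so compare lower-cased forms; a
    # trailing dot (fully qualified form) is dropped from the host name.
    pattern = site.strip().lower()
    host = hostname.strip().lower().rstrip(".")
    # "*" matches any run of characters, so a lone "*" matches every host.
    return fnmatchcase(host, pattern)
```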
2788
2789 7.3.3.4 The "mail-server" access-type
2790
2802 The "mail-server" access-type indicates that the actual body
2803 is available from a mail server. The mandatory parameter
2804 for this access-type is:
2805
2806 SERVER -- The email address of the mail server
2807 from which the actual body data can be obtained.
2808
2809 Because mail servers accept a variety of syntax, some of
2810 which is multiline, the full command to be sent to a mail
2811 server is not included as a parameter on the content-type
2812 line. Instead, it may be provided as the "phantom body"
2813 when the content-type is message/external-body and the
2814 access-type is mail-server.
2815
2816 Note that MIME does not define a mail server syntax.
2817 Rather, it allows the inclusion of arbitrary mail server
commands in the phantom body. Implementations should
include the phantom body in the body of the message they
send to the mail server address to retrieve the relevant data.
2821
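Since MIME leaves the mail server syntax undefined, a client's job reduces to mailing the phantom body, unchanged, to the SERVER address. A Python sketch using the standard `email` package follows; the function name is hypothetical, and actually submitting the result (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage
from typing import Dict


def build_mail_server_request(params: Dict[str, str], phantom_body: str,
                              requester: str) -> EmailMessage:
    """Wrap the phantom body in a message addressed to the SERVER parameter."""
    server = {k.lower(): v for k, v in params.items()}.get("server")
    if not server:
        raise ValueError("mail-server access-type requires the SERVER parameter")
    request = EmailMessage()
    request["From"] = requester
    request["To"] = server
    # The phantom body holds the server commands verbatim; MIME does not
    # interpret them.
    request.set_content(phantom_body)
    return request
```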
2867 7.3.3.5 Examples and Further Explanations
2868
2869 With the emerging possibility of very wide-area file
2870 systems, it becomes very hard to know in advance the set of
2871 machines where a file will and will not be accessible
2872 directly from the file system. Therefore it may make sense
2873 to provide both a file name, to be tried directly, and the
2874 name of one or more sites from which the file is known to be
2875 accessible. An implementation can try to retrieve remote
2876 files using FTP or any other protocol, using anonymous file
2877 retrieval or prompting the user for the necessary name and
2878 password. If an external body is accessible via multiple
2879 mechanisms, the sender may include multiple parts of type
2880 message/external-body within an entity of type
2881 multipart/alternative.
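When several message/external-body alternatives are offered, a reader first has to discard those it cannot use. The Python sketch below does this with a table of the mandatory parameters stated in 7.3.3.1 through 7.3.3.4, treating each ACCESS-TYPE value as a case-insensitive, comma-separated list. The names `REQUIRED_PARAMS` and `usable_alternatives` are hypothetical, and a supported access-type missing from the table is assumed to need no extra parameters.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Mandatory parameters per access-type, beyond ACCESS-TYPE itself.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},
    "local-file": {"name"},
    "afs": {"name"},
    "mail-server": {"server"},
}


def usable_alternatives(alternatives: Iterable[Dict[str, str]],
                        supported: Set[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (access-type, parameters) for each alternative the reader can use."""
    usable = []
    for params in alternatives:
        p = {k.lower(): v for k, v in params.items()}
        offered = [a.strip().lower() for a in p.get("access-type", "").split(",")]
        for access in offered:
            if access in supported and REQUIRED_PARAMS.get(access, set()).issubset(p):
                usable.append((access, p))
                break  # one usable mechanism per alternative is enough
    return usable
```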
2882
2883 However, the external-body mechanism is not intended to be
2884 limited to file retrieval, as shown by the mail-server
2885 access-type. Beyond this, one can imagine, for example,
2886 using a video server for external references to video clips.
2887
If an entity is of type "message/external-body", then the
body of the entity will contain the header fields of the
encapsulated message. The body itself is to be found in the
external location. This means that if the body of the
"message/external-body" message contains two consecutive
CRLFs, everything after that pair is NOT part of the
message itself. For most message/external-body messages,
this trailing area must simply be ignored. However, it is a
convenient place for additional data that cannot be included
in the content-type header field. In particular, if the
"access-type" value is "mail-server", then the trailing area
must contain commands to be sent to the mail server at the
address given by the value of the SERVER parameter.
2902
2903 The embedded message header fields which appear in the body
2904 of the message/external-body data can be used to declare the
2905 Content-type of the external body. Thus a complete
2906 message/external-body message, referring to a document in
2907 PostScript format, might look like this:
2908
From: Whomever
Subject: whatever
MIME-Version: 1.0
Message-ID: id1@host.com
Content-Type: multipart/alternative; boundary=42


--42
Content-Type: message/external-body;
     name="BodyFormats.ps";
     site="thumper.bellcore.com";
     access-type=ANON-FTP;
     directory="pub";
     mode="image";
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

--42
Content-Type: message/external-body;
     name="/u/nsb/writing/rfcs/RFC-XXXX.ps";
     site="thumper.bellcore.com";
     access-type=AFS;
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

--42
Content-Type: message/external-body;
     access-type=mail-server;
     server="listserv@bogus.bitnet";
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

get rfc-xxxx doc

--42--
2960
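Python's standard `email` parser treats the body of a message/external-body part as an encapsulated message, which is exactly the structure described here. As a sketch, a copy of the example above, with header continuation lines indented and the parameters separated by semicolons, can be inspected as follows; the `summarize` name is hypothetical.

```python
import email
from typing import List, Tuple

EXAMPLE = """\
From: Whomever
Subject: whatever
MIME-Version: 1.0
Message-ID: id1@host.com
Content-Type: multipart/alternative; boundary=42


--42
Content-Type: message/external-body;
     name="BodyFormats.ps";
     site="thumper.bellcore.com";
     access-type=ANON-FTP;
     directory="pub";
     mode="image";
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

--42
Content-Type: message/external-body;
     name="/u/nsb/writing/rfcs/RFC-XXXX.ps";
     site="thumper.bellcore.com";
     access-type=AFS;
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

--42
Content-Type: message/external-body;
     access-type=mail-server;
     server="listserv@bogus.bitnet";
     expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript

get rfc-xxxx doc

--42--
"""


def summarize(raw: str) -> List[Tuple[str, str, str]]:
    """List (access-type, external body type, phantom body) per alternative."""
    outer = email.message_from_string(raw)
    rows = []
    for part in outer.get_payload():   # the message/external-body alternatives
        inner = part.get_payload(0)    # encapsulated header plus phantom body
        rows.append((
            part.get_param("access-type").lower(),
            inner.get_content_type(),
            inner.get_payload().strip(),
        ))
    return rows


if __name__ == "__main__":
    for row in summarize(EXAMPLE):
        print(row)
```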
2961 Like the message/partial type, the message/external-body
2962 type is intended to be transparent, that is, to convey the
2963 data type in the external body rather than to convey a
2964 message with a body of that type. Thus the headers on the
2965 outer and inner parts must be merged using the same rules as
2966 for message/partial. In particular, this means that the
2967 Content-type header is overridden, but the From and Subject
2968 headers are preserved.
2969
2970 Note that since the external bodies are not transported as
2971 mail, they need not conform to the 7-bit and line length
2972 requirements, but might in fact be binary files. Thus a
2973 Content-Transfer-Encoding is not generally necessary, though
2974 it is permitted.
2975
2976 Note that the body of a message of type "message/external-
2977 body" is governed by the basic syntax for an RFC 822
2978 message. In particular, anything before the first
2979 consecutive pair of CRLFs is header information, while
2980 anything after it is body information, which is ignored for
2981 most access-types.
2982
2997 7.4 The Application Content-Type
2998
2999 The "application" Content-Type is to be used for data which
3000 do not fit in any of the other categories, and particularly
3001 for data to be processed by mail-based uses of application
3002 programs. This is information which must be processed by an
3003 application before it is viewable or usable to a user.
3004 Expected uses for Content-Type application include mail-
3005 based file transfer, spreadsheets, data for mail-based
3006 scheduling systems, and languages for "active"
3007 (computational) email. (The latter, in particular, can pose
3008 security problems which should be understood by
3009 implementors, and are considered in detail in the discussion
3010 of the application/PostScript content-type.)
3011
3012 For example, a meeting scheduler might define a standard
3013 representation for information about proposed meeting dates.
3014 An intelligent user agent would use this information to
3015 conduct a dialog with the user, and might then send further
3016 mail based on that dialog. More generally, there have been
3017 several "active" messaging languages developed in which
3018 programs in a suitably specialized language are sent through
3019 the mail and automatically run in the recipient's
3020 environment.
3021
3022 Such applications may be defined as subtypes of the
3023 "application" Content-Type. This document defines three
3024 subtypes: octet-stream, ODA, and PostScript.
3025
3026 In general, the subtype of application will often be the
3027 name of the application for which the data are intended.
3028 This does not mean, however, that any application program
3029 name may be used freely as a subtype of application. Such
3030 usages must be registered with IANA, as described in
3031 Appendix F.
3032
3033 7.4.1 The Application/Octet-Stream (primary) subtype
3034
3035 The primary subtype of application, "octet-stream", may be
3036 used to indicate that a body contains binary data. The set
3037 of possible parameters includes, but is not limited to:
3038
3039 NAME -- a suggested name for the binary data if
3040 stored as a file.
3041
3042 TYPE -- the general type or category of binary
3043 data. This is intended as information for the
3044 human recipient rather than for any automatic
3045 processing.
3046
3047 CONVERSIONS -- the set of operations that have
3048 been performed on the data before putting it in
3049 the mail (and before any Content-Transfer-Encoding
3050 that might have been applied). If multiple
3062 conversions have occurred, they must be separated
3063 by commas and specified in the order they were
3064 applied -- that is, the leftmost conversion must
3065 have occurred first, and conversions are undone
3066 from right to left. Note that NO conversion
3067 values are defined by this document. Any
conversion values that do not begin with "X-"
3069 must be preceded by a published specification and
3070 by registration with IANA, as described in
3071 Appendix F.
3072
3073 PADDING -- the number of bits of padding that were
3074 appended to the bitstream comprising the actual
3075 contents to produce the enclosed byte-oriented
3076 data. This is useful for enclosing a bitstream in
3077 a body when the total number of bits is not a
3078 multiple of the byte size.
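As a worked illustration of PADDING, a 3-bit stream occupies one octet with 5 bits of padding, so the sender would give padding=5. The Python sketch below packs and unpacks such a bitstream; most-significant-bit-first packing and zero-valued pad bits are assumptions, since this document defines neither, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_bits(bits: Sequence[int]) -> Tuple[bytes, int]:
    """Pack 0/1 values into octets; return (data, PADDING value)."""
    padding = (-len(bits)) % 8  # bits needed to reach a whole octet
    padded = list(bits) + [0] * padding
    data = bytes(
        int("".join(str(b) for b in padded[i:i + 8]), 2)
        for i in range(0, len(padded), 8)
    )
    return data, padding


def unpack_bits(data: bytes, padding: int) -> List[int]:
    """Recover the original bitstream by dropping the trailing pad bits."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[:len(bits) - padding]
```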
3079
3080 The values for these attributes are left undefined at
3081 present, but may require specification in the future. An
3082 example of a common (though UNIX-specific) usage might be:
3083
Content-Type: application/octet-stream;
     name="foo.tar.Z"; type=tar;
     conversions="x-encrypt,x-compress"
3087
3088 However, it should be noted that the use of such conversions
3089 is explicitly discouraged due to a lack of portability and
3090 standardization. The use of uuencode is particularly
3091 discouraged, in favor of the Content-Transfer-Encoding
3092 mechanism, which is both more standardized and more portable
3093 across mail boundaries.
3094
3095 The recommended action for an implementation that receives
3096 application/octet-stream mail is to simply offer to put the
3097 data in a file, with any Content-Transfer-Encoding undone,
3098 or perhaps to use it as input to a user-specified process.
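A minimal Python sketch of the recommended action follows: undo the Content-Transfer-Encoding and write the data to a file, treating NAME only as a suggestion. Reducing NAME to its final path component, the fallback name "attachment.bin", and refusing to overwrite an existing file are defensive choices of this sketch, not requirements of this document; the function names are hypothetical.

```python
import base64
import os
import quopri
from typing import Callable, Dict, Optional

# Decoders for the Content-Transfer-Encoding values defined in this document.
_DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "base64": base64.b64decode,
    "quoted-printable": quopri.decodestring,
    "7bit": bytes,
    "8bit": bytes,
    "binary": bytes,
}


def decode_body(body: bytes, cte: str) -> bytes:
    """Undo the Content-Transfer-Encoding (names are case-insensitive)."""
    return _DECODERS[cte.strip().lower()](body)


def safe_filename(suggested: Optional[str], fallback: str = "attachment.bin") -> str:
    """Treat NAME as a hint only: keep its last path component, if any."""
    name = os.path.basename(suggested or "")
    return fallback if name in ("", ".", "..") else name


def save_octet_stream(body: bytes, cte: str, suggested: Optional[str],
                      directory: str) -> str:
    """Write the decoded data into `directory`; refuse to overwrite a file."""
    path = os.path.join(directory, safe_filename(suggested))
    with open(path, "xb") as out:  # mode "x" raises FileExistsError if present
        out.write(decode_body(body, cte))
    return path
```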
3099
3100 To reduce the danger of transmitting rogue programs through
3101 the mail, it is strongly recommended that implementations
3102 NOT implement a path-search mechanism whereby an arbitrary
3103 program named in the Content-Type parameter (e.g., an
3104 "interpreter=" parameter) is found and executed using the
3105 mail body as input.
3106
3107 7.4.2 The Application/PostScript subtype
3108
3109 A Content-Type of "application/postscript" indicates a
3110 PostScript program. The language is defined in
[POSTSCRIPT]. It is recommended that PostScript sent through
email use the PostScript document structuring conventions
whenever possible, and use them correctly.
3114
3127 The execution of general-purpose PostScript interpreters
3128 entails serious security risks, and implementors are
3129 discouraged from simply sending PostScript email bodies to
3130 "off-the-shelf" interpreters. While it is usually safe to
3131 send PostScript to a printer, where the potential for harm
3132 is greatly constrained, implementors should consider all of
3133 the following before they add interactive display of
3134 PostScript bodies to their mail readers.
3135
3136 The remainder of this section outlines some, though probably
3137 not all, of the possible problems with sending PostScript
3138 through the mail.
3139
3140 Dangerous operations in the PostScript language include, but
3141 may not be limited to, the PostScript operators deletefile,
3142 renamefile, filenameforall, and file. File is only
3143 dangerous when applied to something other than standard
3144 input or output. Implementations may also define additional
3145 nonstandard file operators; these may also pose a threat to
3146 security. Filenameforall, the wildcard file search
3147 operator, may appear at first glance to be harmless. Note,
3148 however, that this operator has the potential to reveal
3149 information about what files the recipient has access to,
3150 and this information may itself be sensitive. Message
3151 senders should avoid the use of potentially dangerous file
3152 operators, since these operators are quite likely to be
3153 unavailable in secure PostScript implementations. Message-
3154 receiving and -displaying software should either completely
3155 disable all potentially dangerous file operators or take
3156 special care not to delegate any special authority to their
3157 operation. These operators should be viewed as being done by
3158 an outside agency when interpreting PostScript documents.
3159 Such disabling and/or checking should be done completely
3160 outside of the reach of the PostScript language itself; care
should be taken to ensure that no method exists for
3162 reenabling full-function versions of these operators.
3163
3164 The PostScript language provides facilities for exiting the
3165 normal interpreter, or server, loop. Changes made in this
3166 "outer" environment are customarily retained across
3167 documents, and may in some cases be retained semipermanently
3168 in nonvolatile memory. The operators associated with exiting
3169 the interpreter loop have the potential to interfere with
3170 subsequent document processing. As such, their unrestrained
3171 use constitutes a threat of service denial. PostScript
3172 operators that exit the interpreter loop include, but may
3173 not be limited to, the exitserver and startjob operators.
3174 Message-sending software should not generate PostScript that
3175 depends on exiting the interpreter loop to operate. The
3176 ability to exit will probably be unavailable in secure
3177 PostScript implementations. Message-receiving and
3178 -displaying software should, if possible, disable the
3179 ability to make retained changes to the PostScript
3180 environment. Eliminate the startjob and exitserver commands.
3192 If these commands cannot be eliminated, at least set the
3193 password associated with them to a hard-to-guess value.
3194
3195 PostScript provides operators for setting system-wide and
3196 device-specific parameters. These parameter settings may be
3197 retained across jobs and may potentially pose a threat to
3198 the correct operation of the interpreter. The PostScript
3199 operators that set system and device parameters include, but
3200 may not be limited to, the setsystemparams and setdevparams
3201 operators. Message-sending software should not generate
3202 PostScript that depends on the setting of system or device
3203 parameters to operate correctly. The ability to set these
3204 parameters will probably be unavailable in secure PostScript
3205 implementations. Message-receiving and -displaying software
3206 should, if possible, disable the ability to change system
3207 and device parameters. If these operators cannot be
3208 disabled, at least set the password associated with them to
3209 a hard-to-guess value.
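Receiving software cannot rely on inspecting the source, because a PostScript program can build an operator name from a string at run time (for example with cvn) and execute it; that is why disabling must happen outside the language. A cheap pre-display warning can still be useful. The Python sketch below, with hypothetical names, flags tokens matching the operators named in this section. It deliberately over-reports: it also matches text inside strings and comments, and it flags file even when applied to standard input or output.

```python
import re
from typing import FrozenSet, Set

# Operators named in this section; implementations may define others.
RISKY_OPERATORS: FrozenSet[str] = frozenset({
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob",
    "setsystemparams", "setdevparams",
})

# A token is a maximal run of characters that are neither whitespace nor
# PostScript delimiters: ( ) < > [ ] { } / %
_TOKEN = re.compile(r"[^\s()<>\[\]{}/%]+")


def risky_operators_mentioned(source: str) -> Set[str]:
    """Return the risky operator names that appear as tokens in `source`."""
    return {tok for tok in _TOKEN.findall(source) if tok in RISKY_OPERATORS}
```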
3210
3211 Some PostScript implementations provide nonstandard
3212 facilities for the direct loading and execution of machine
3213 code. Such facilities are quite obviously open to
3214 substantial abuse. Message-sending software should not
3215 make use of such features. Besides being totally hardware-
3216 specific, they are also likely to be unavailable in secure
3217 implementations of PostScript. Message-receiving and
3218 -displaying software should not allow such operators to be
3219 used if they exist.
3220
3221 PostScript is an extensible language, and many, if not most,
3222 implementations of it provide a number of their own
3223 extensions. This document does not deal with such extensions
3224 explicitly since they constitute an unknown factor.
3225 Message-sending software should not make use of nonstandard
3226 extensions; they are likely to be missing from some
3227 implementations. Message-receiving and -displaying software
3228 should make sure that any nonstandard PostScript operators
3229 are secure and don't present any kind of threat.
3230
3231 It is possible to write PostScript that consumes huge
3232 amounts of various system resources. It is also possible to
3233 write PostScript programs that loop infinitely. Both types
3234 of programs have the potential to cause damage if sent to
3235 unsuspecting recipients. Message-sending software should
3236 avoid the construction and dissemination of such programs,
3237 which is antisocial. Message-receiving and -displaying
3238 software should provide appropriate mechanisms to abort
3239 processing of a document after a reasonable amount of time
3240 has elapsed. In addition, PostScript interpreters should be
3241 limited to the consumption of only a reasonable amount of
3242 any given system resource.
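One way for receiving software to enforce such limits is to run the interpreter in a separate process with a wall-clock timeout and operating-system resource limits. The Python sketch below assumes a POSIX system (the `resource` module and `preexec_fn` are unavailable on Windows, and `preexec_fn` is documented as unsafe in multithreaded programs). It leaves the interpreter command line to the caller, since no particular PostScript interpreter is implied here, and its default limits are arbitrary placeholders.

```python
import resource
import subprocess
import sys
from typing import Optional, Sequence


def _cap(limit: int, value: int) -> None:
    """Set soft and hard limits to `value`, never above the current hard limit."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_untrusted(cmd: Sequence[str], wall_clock_s: float = 30.0, cpu_s: int = 30,
                  max_memory_bytes: Optional[int] = 512 * 1024 * 1024) -> Optional[int]:
    """Run `cmd` under limits; return its exit status, or None if it timed out."""
    def apply_limits() -> None:  # runs in the child just before exec
        _cap(resource.RLIMIT_CPU, cpu_s)
        if max_memory_bytes is not None:
            _cap(resource.RLIMIT_AS, max_memory_bytes)

    try:
        done = subprocess.run(list(cmd), capture_output=True,
                              timeout=wall_clock_s, preexec_fn=apply_limits)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    return done.returncode


if __name__ == "__main__":
    # Stand-in for a runaway document: an endless loop, stopped after 1 second.
    print(run_untrusted([sys.executable, "-c", "while True: pass"], wall_clock_s=1.0))
```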
3243
3244 Finally, bugs may exist in some PostScript interpreters
3245 which could possibly be exploited to gain unauthorized
access to a recipient's system. Apart from noting this
possibility, there is no specific action to take to prevent
it, other than the timely correction of such bugs if any
are found.
3261
3262 7.4.3 The Application/ODA subtype
3263
3264 The "ODA" subtype of application is used to indicate that a
3265 body contains information encoded according to the Office
3266 Document Architecture [ODA] standards, using the ODIF
3267 representation format. For application/oda, the Content-
3268 Type line should also specify an attribute/value pair that
3269 indicates the document application profile (DAP), using the
3270 key word "profile". Thus an appropriate header field might
3271 look like this:
3272
3273 Content-Type: application/oda; profile=Q112
3274
3275 Consult the ODA standard [ODA] for further information.
3276
3322 7.5 The Image Content-Type
3323
A Content-Type of "image" indicates that the body contains an
3325 image. The subtype names the specific image format. These
3326 names are case insensitive. Two initial subtypes are "jpeg"
3327 for the JPEG format, JFIF encoding, and "gif" for GIF format
3328 [GIF].
3329
3330 The list of image subtypes given here is neither exclusive
3331 nor exhaustive, and is expected to grow as more types are
3332 registered with IANA, as described in Appendix F.
3333
3334 7.6 The Audio Content-Type
3335
3336 A Content-Type of "audio" indicates that the body contains
3337 audio data. Although there is not yet a consensus on an
3338 "ideal" audio format for use with computers, there is a
3339 pressing need for a format capable of providing
3340 interoperable behavior.
3341
3342 The initial subtype of "basic" is specified to meet this
3343 requirement by providing an absolutely minimal lowest common
3344 denominator audio format. It is expected that richer
3345 formats for higher quality and/or lower bandwidth audio will
3346 be defined by a later document.
3347
3348 The content of the "audio/basic" subtype is audio encoded
3349 using 8-bit ISDN u-law [PCM]. When this subtype is present,
3350 a sample rate of 8000 Hz and a single channel is assumed.
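For illustration, the sketch below expands one audio/basic octet into a 16-bit linear sample. It also computes playing time from the fixed 8000 Hz, single-channel rate. The expansion shown is the widely used table-free G.711 u-law decoder (bias 0x84), which this document does not itself spell out. The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: expands one 8-bit u-law octet to a 16-bit
   linear sample using the common G.711 bias-and-shift method.
   Results range from -32124 to 32124. */
int ulaw_to_linear(unsigned char u)
{
    int t;

    u = (unsigned char)~u;                  /* octets are stored inverted */
    t = ((u & 0x0F) << 3) + 0x84;           /* 4-bit mantissa plus bias */
    t <<= (u & 0x70) >> 4;                  /* 3-bit segment (exponent) */
    return (u & 0x80) ? (0x84 - t) : (t - 0x84);  /* sign, minus bias */
}

/* audio/basic is one channel at 8000 samples per second with one
   octet per sample, so playing time is simply nbytes / 8000. */
double ulaw_duration_seconds(size_t nbytes)
{
    return (double)nbytes / 8000.0;
}
```

Because each sample is exactly one octet, a 16000-octet body plays for 2 seconds.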

7.7 The Video Content-Type

A Content-Type of "video" indicates that the body contains a
time-varying-picture image, possibly with color and
coordinated sound. The term "video" is used extremely
generically, rather than with reference to any particular
technology or format, and is not meant to preclude subtypes
such as animated drawings encoded compactly. The subtype
"mpeg" refers to video coded according to the MPEG standard
[MPEG].

Note that although in general this document strongly
discourages the mixing of multiple media in a single body,
it is recognized that many so-called "video" formats include
a representation for synchronized audio, and this is
explicitly permitted for subtypes of "video".

7.8 Experimental Content-Type Values

A Content-Type value beginning with the characters "X-" is a
private value, to be used by consenting mail systems by
mutual agreement. Any format without a rigorous and public
definition must be named with an "X-" prefix, and publicly
specified values shall never begin with "X-". (Older
versions of the widely-used Andrew system use the "X-BE2"
name, so new systems should probably choose a different
name.)

In general, the use of "X-" top-level types is strongly
discouraged. Implementors should invent subtypes of the
existing types whenever possible. The invention of new
types is intended to be restricted primarily to the
development of new media types for email, such as digital
odors or holography, and not for new data formats in
general. In many cases, a subtype of application will be
more appropriate than a new top-level type.
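The following is a minimal C sketch of the check described above. It assumes the type and subtype names are compared case-insensitively, so "x-" counts as well as "X-". The helper names are hypothetical. The second function distinguishes a private top-level type, which is strongly discouraged, from a private subtype of an existing type.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: does this type or subtype name begin with
   "X-" (in either case) followed by at least one more character? */
int is_private_name(const char *name)
{
    return toupper((unsigned char)name[0]) == 'X' && name[1] == '-'
           && name[2] != '\0' && name[2] != '/';
}

/* Classifies a "type/subtype" value: 2 for a private top-level type
   (strongly discouraged), 1 for a private subtype of a standard type,
   0 for neither. */
int private_kind(const char *value)
{
    const char *slash = strchr(value, '/');

    if (is_private_name(value))
        return 2;
    if (slash != NULL && is_private_name(slash + 1))
        return 1;
    return 0;
}
```

Under this sketch, "application/x-example" yields 1, the preferred form, while "X-Holography/hologram" yields 2.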

Summary

Using the MIME-Version, Content-Type, and
Content-Transfer-Encoding header fields, it is possible to
include, in a standardized way, arbitrary types of data
objects with RFC 822 conformant mail messages. No
restrictions imposed by either RFC 821 or RFC 822 are
violated, and care has been taken to avoid problems caused
by additional restrictions imposed by the characteristics of
some Internet mail transport mechanisms (see Appendix B).
The "multipart" and "message" Content-Types allow mixing and
hierarchical structuring of objects of different types in a
single message. Further Content-Types provide a standardized
mechanism for tagging messages or body parts as audio,
image, or several other kinds of data. A distinguished
parameter syntax allows further specification of data format
details, particularly the specification of alternate
character sets. Additional optional header fields provide
mechanisms for certain extensions deemed desirable by many
implementors. Finally, a number of useful Content-Types are
defined for general use by consenting user agents, notably
text/richtext, message/partial, and message/external-body.

Acknowledgements

This document is the result of the collective effort of a
large number of people, at several IETF meetings, on the
IETF-SMTP and IETF-822 mailing lists, and elsewhere.
Although any enumeration seems doomed to suffer from
egregious omissions, the following are among the many
contributors to this effort:

    Harald Tveit Alvestrand   Timo Lehtinen
    Randall Atkinson          John R. MacMillan
    Philippe Brandon          Rick McGowan
    Kevin Carosso             Leo Mclaughlin
    Uhhyung Choi              Goli Montaser-Kohsari
    Cristian Constantinof     Keith Moore
    Mark Crispin              Tom Moore
    Dave Crocker              Erik Naggum
    Terry Crowley             Mark Needleman
    Walt Daniels              John Noerenberg
    Frank Dawson              Mats Ohrman
    Hitoshi Doi               Julian Onions
    Kevin Donnelly            Michael Patton
    Keith Edwards             David J. Pepper
    Chris Eich                Blake C. Ramsdell
    Johnny Eriksson           Luc Rooijakkers
    Craig Everhart            Marshall T. Rose
    Patrik Faeltstroem        Jonathan Rosenberg
    Erik E. Fair              Jan Rynning
    Roger Fajman              Harri Salminen
    Alain Fontaine            Michael Sanderson
    James M. Galvin           Masahiro Sekiguchi
    Philip Gladstone          Mark Sherman
    Thomas Gordon             Keld Simonsen
    Phill Gross               Bob Smart
    James Hamilton            Peter Speck
    Steve Hardcastle-Kille    Henry Spencer
    David Herron              Einar Stefferud
    Bruce Howard              Michael Stein
    Bill Janssen              Klaus Steinberger
    Olle Jaernefors           Peter Svanberg
    Risto Kankkunen           James Thompson
    Phil Karn                 Steve Uhler
    Alan Katz                 Stuart Vance
    Tim Kehres                Erik van der Poel
    Neil Katin                Guido van Rossum
    Kyuho Kim                 Peter Vanderbilt
    Anders Klemets            Greg Vaudreuil
    John Klensin              Ed Vielmetti
    Valdis Kletniek           Ryan Waldron
    Jim Knowles               Wally Wedel
    Stev Knowles              Sven-Ove Westberg
    Bob Kummerfeld            Brian Wideen
    Pekka Kytolaakso          John Wobus
    Stellan Lagerström        Glenn Wright
    Vincent Lau               Rayan Zachariassen
    Donald Lindsay            David Zimmerman

The authors apologize for any omissions from this list,
which are certainly unintentional.

Appendix A -- Minimal MIME-Conformance

The mechanisms described in this document are open-ended.
It is definitely not expected that all implementations will
support all of the Content-Types described, nor that they
will all share the same extensions. In order to promote
interoperability, however, it is useful to define the
concept of "MIME-conformance" to denote a certain level of
implementation that allows the useful interworking of
messages with content that differs from US-ASCII text. In
this section, we specify the requirements for such
conformance.

A mail user agent that is MIME-conformant MUST:

  1.  Always generate a "MIME-Version: 1.0" header field.

  2.  Recognize the Content-Transfer-Encoding header field,
      and decode all received data encoded with either the
      quoted-printable or base64 encodings. Encode any data
      sent that is not in seven-bit mail-ready representation
      using one of these transformations and include the
      appropriate Content-Transfer-Encoding header field,
      unless the underlying transport mechanism supports
      non-seven-bit data, as SMTP does not.

  3.  Recognize and interpret the Content-Type header field,
      and avoid showing users raw data with a Content-Type
      field other than text. Be able to send at least
      text/plain messages, with the character set specified
      as a parameter if it is not US-ASCII.

  4.  Explicitly handle the following Content-Type values, to
      at least the following extents:

      Text:
        -- Recognize and display "text" mail with the
           character set "US-ASCII".
        -- Recognize other character sets at least to the
           extent of being able to inform the user about what
           character set the message uses.
        -- Recognize the "ISO-8859-*" character sets to the
           extent of being able to display those characters
           that are common to ISO-8859-* and US-ASCII, namely
           all characters represented by octet values 0-127.
        -- For unrecognized subtypes, show or offer to show
           the user the "raw" version of the data. An ability
           at least to convert "text/richtext" to plain text,
           as shown in Appendix D, is encouraged, but not
           required for conformance.
      Message:
        -- Recognize and display at least the primary (822)
           encapsulation.
      Multipart:
        -- Recognize the primary (mixed) subtype. Display all
           relevant information on the message level and the
           body part header level and then display or offer
           to display each of the body parts individually.
        -- Recognize the "alternative" subtype, and avoid
           showing the user redundant parts of
           multipart/alternative mail.
        -- Treat any unrecognized subtypes as if they were
           "mixed".
      Application:
        -- Offer the ability to remove either of the two
           types of Content-Transfer-Encoding defined in this
           document and put the resulting information in a
           user file.

  5.  Upon encountering any unrecognized Content-Type, an
      implementation must treat it as if it had a
      Content-Type of "application/octet-stream" with no
      parameter sub-arguments. How such data are handled is
      up to an implementation, but likely options for
      handling such unrecognized data include offering the
      user the option of writing it into a file (decoded from
      its mail transport format) or of naming a program to
      which the decoded data should be passed as input.
      Unrecognized predefined types, which in a
      MIME-conformant mailer might still include audio,
      image, or video, should also be treated in this way.
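The fallback rules above can be sketched as a small C function that maps a received type/subtype onto the one a minimal agent actually acts on. The table of natively handled types is a hypothetical example of a minimal agent, not a list this document mandates. Unknown text subtypes are mapped to text/plain to stand for "show the raw data".

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality (type and subtype names are
   case-insensitive). */
static int ieq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Types a hypothetical minimal agent handles natively. */
static const char *const known[][2] = {
    { "text", "plain" },
    { "message", "rfc822" },
    { "multipart", "mixed" },
    { "multipart", "alternative" },
    { "application", "octet-stream" },
};

/* Stores in *etype/*esub the type the agent should act on:
   recognized types are kept, unknown text subtypes are shown raw
   (as text/plain), unknown multipart subtypes are treated as
   "mixed", and everything else becomes application/octet-stream. */
void effective_type(const char *type, const char *subtype,
                    const char **etype, const char **esub)
{
    size_t i;

    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (ieq(type, known[i][0]) && ieq(subtype, known[i][1])) {
            *etype = known[i][0];
            *esub = known[i][1];
            return;
        }
    }
    if (ieq(type, "text")) {
        *etype = "text";
        *esub = "plain";
    } else if (ieq(type, "multipart")) {
        *etype = "multipart";
        *esub = "mixed";
    } else {
        *etype = "application";
        *esub = "octet-stream";
    }
}
```

With this table, multipart/parallel is displayed as multipart/mixed, while image/gif and message/partial both fall back to application/octet-stream.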

A user agent that meets the above conditions is said to be
MIME-conformant. The meaning of this phrase is that it is
assumed to be "safe" to send virtually any kind of
properly-marked data to users of such mail systems, because
such systems will at least be able to treat the data as
undifferentiated binary, and will not simply splash it onto
the screen of unsuspecting users. There is another sense
in which it is always "safe" to send data in a format that
is MIME-conformant, which is that such data will not break
or be broken by any known systems that are conformant with
RFC 821 and RFC 822. User agents that are MIME-conformant
have the additional guarantee that the user will not be
shown data that were never intended to be viewed as text.

Appendix B -- General Guidelines For Sending Email Data

Internet email is not a perfect, homogeneous system. Mail
may become corrupted at several stages in its travel to a
final destination. Specifically, email sent throughout the
Internet may travel across many networking technologies.
Many networking and mail technologies do not support the
full functionality possible in the SMTP transport
environment. Mail traversing these systems is likely to be
modified in such a way that it can be transported.

There exist many widely-deployed non-conformant MTAs in the
Internet. These MTAs, speaking the SMTP protocol, alter
messages on the fly to take advantage of the internal data
structure of the hosts they are implemented on, or are just
plain broken.

The following guidelines may be useful to anyone devising a
data format (Content-Type) that will survive the widest
range of networking technologies and known broken MTAs
unscathed. Note that anything encoded in the base64
encoding will satisfy these rules, but that some well-known
mechanisms, notably the UNIX uuencode facility, will not.
Note also that anything encoded in the Quoted-Printable
encoding will survive most gateways intact, but possibly not
some gateways to systems that use the EBCDIC character set.

  (1) Under some circumstances the encoding used for data
      may change as part of normal gateway or user agent
      operation. In particular, conversion from base64 to
      quoted-printable and vice versa may be necessary.
      This may result in the confusion of CRLF sequences
      with line breaks in text body parts. As such, the
      persistence of CRLF as something other than a line
      break should not be relied on.

  (2) Many systems may elect to represent and store text
      data using local newline conventions. Local newline
      conventions may not match the RFC 822 CRLF convention
      -- systems are known that use plain CR, plain LF,
      CRLF, or counted records. The result is that isolated
      CR and LF characters are not well tolerated in
      general; they may be lost or converted to delimiters
      on some systems, and hence should not be relied on.

  (3) TAB (HT) characters may be misinterpreted or may be
      automatically converted to variable numbers of spaces.
      This is unavoidable in some environments, notably
      those not based on the ASCII character set. Such
      conversion is STRONGLY DISCOURAGED, but it may occur,
      and mail formats should not rely on the persistence of
      TAB (HT) characters.

  (4) Lines longer than 76 characters may be wrapped or
      truncated in some environments. Line wrapping and
      line truncation are STRONGLY DISCOURAGED, but
      unavoidable in some cases. Applications which require
      long lines should somehow differentiate between soft
      and hard line breaks. (A simple way to do this is to
      use the quoted-printable encoding.)

  (5) Trailing "white space" characters (SPACE, TAB (HT))
      on a line may be discarded by some transport agents,
      while other transport agents may pad lines with these
      characters so that all lines in a mail file are of
      equal length. The persistence of trailing white
      space, therefore, should not be relied on.

  (6) Many mail domains use variations on the ASCII
      character set, or use character sets such as EBCDIC
      which contain most but not all of the US-ASCII
      characters. The correct translation of characters not
      in the "invariant" set cannot be depended on across
      character converting gateways. For example, this
      situation is a problem when sending uuencoded
      information across BITNET, an EBCDIC system. Similar
      problems can occur without crossing a gateway, since
      many Internet hosts use character sets other than
      ASCII internally. The definition of Printable Strings
      in X.400 adds further restrictions in certain special
      cases. In particular, the only characters that are
      known to be consistent across all gateways are the 73
      characters that correspond to the upper and lower case
      letters A-Z and a-z, the 10 digits 0-9, and the
      following eleven special characters:

        "'"  (ASCII code 39)
        "("  (ASCII code 40)
        ")"  (ASCII code 41)
        "+"  (ASCII code 43)
        ","  (ASCII code 44)
        "-"  (ASCII code 45)
        "."  (ASCII code 46)
        "/"  (ASCII code 47)
        ":"  (ASCII code 58)
        "="  (ASCII code 61)
        "?"  (ASCII code 63)

      A maximally portable mail representation, such as the
      base64 encoding, will confine itself to relatively
      short lines of text in which the only meaningful
      characters are taken from this set of 73 characters.
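The character repertoire in item (6) and the line length in item (4) can be checked mechanically. This C sketch assumes an ASCII execution character set, and the function names are hypothetical. SPACE is deliberately rejected because it is not among the 73 characters.

```c
#include <stddef.h>
#include <string.h>

/* The eleven special characters listed in item (6). */
static const char INVARIANT_SPECIALS[] = "'()+,-./:=?";

/* Hypothetical helper: is c one of the 73 gateway-invariant
   characters? Assumes an ASCII execution character set. */
int is_invariant_char(int c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return 1;
    /* c != 0 because strchr would otherwise match the terminator */
    return c != 0 && strchr(INVARIANT_SPECIALS, c) != NULL;
}

/* Hypothetical helper: does one line (without its line break) use
   only invariant characters and have at most 76 of them? */
int is_gateway_safe_line(const char *line)
{
    size_t n = 0;

    for (; *line != '\0'; line++, n++)
        if (!is_invariant_char((unsigned char)*line))
            return 0;
    return n <= 76;
}
```

Counting the octets 0-255 accepted by is_invariant_char gives exactly 73. A typical base64 line such as "SGVsbG8sIHdvcmxkIQ==" passes the line check.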

Please note that the above list is NOT a list of recommended
practices for MTAs. RFC 821 MTAs are prohibited from
altering the character of white space or wrapping long
lines. These BAD and illegal practices are known to occur
on established networks, and implementations should be
robust in dealing with the bad effects they can cause.
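One way for a receiving implementation to be robust against the newline damage described in items (1) and (2) is to convert every bare CR, bare LF, or CRLF in text data back to the canonical CRLF. The sketch below does this in C, and the function name is hypothetical. It must only be applied to text, since in binary data CR and LF octets are ordinary data.

```c
#include <stddef.h>

/* Hypothetical helper: copies len bytes of text from in to out,
   rewriting every bare CR, bare LF, or CRLF as CRLF, and adds a
   terminating NUL. out needs room for 2*len+1 bytes in the worst
   case. Returns the number of bytes written, excluding the NUL.
   Apply to text only: in binary data CR and LF are ordinary octets. */
size_t to_crlf(const char *in, size_t len, char *out)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == '\r' || in[i] == '\n') {
            out[o++] = '\r';
            out[o++] = '\n';
            if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
                i++;                    /* CRLF: consume its LF as well */
        } else {
            out[o++] = in[i];
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the 8-byte input "a\nb\rc\r\nd", which mixes all three conventions, becomes the 10-byte "a\r\nb\r\nc\r\nd".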

Appendix C -- A Complex Multipart Example

What follows is the outline of a complex multipart message.
This message has five parts to be displayed serially: two
introductory plain text parts, an embedded multipart
message, a richtext part, and a closing encapsulated text
message in a non-ASCII character set. The embedded
multipart message has two parts to be displayed in parallel,
a picture and an audio fragment.

    MIME-Version: 1.0
    From: Nathaniel Borenstein <nsb@bellcore.com>
    Subject: A multipart example
    Content-Type: multipart/mixed;
            boundary=unique-boundary-1

    This is the preamble area of a multipart message.
    Mail readers that understand multipart format
    should ignore this preamble.
    If you are reading this text, you might want to
    consider changing to a mail reader that understands
    how to properly display multipart messages.
    --unique-boundary-1

    ...Some text appears here...
    [Note that the preceding blank line means
    no header fields were given and this is text,
    with charset US-ASCII. It could have been
    done with explicit typing as in the next part.]

    --unique-boundary-1
    Content-type: text/plain; charset=US-ASCII

    This could have been part of the previous part,
    but illustrates explicit versus implicit
    typing of body parts.

    --unique-boundary-1
    Content-Type: multipart/parallel;
            boundary=unique-boundary-2


    --unique-boundary-2
    Content-Type: audio/basic
    Content-Transfer-Encoding: base64

    ... base64-encoded 8000 Hz single-channel
        u-law-format audio data goes here....

    --unique-boundary-2
    Content-Type: image/gif
    Content-Transfer-Encoding: Base64

    ... base64-encoded image data goes here....

    --unique-boundary-2--

    --unique-boundary-1
    Content-type: text/richtext

    This is <bold><italic>richtext.</italic></bold>
    <nl><nl>Isn't it
    <bigger><bigger>cool?</bigger></bigger>

    --unique-boundary-1
    Content-Type: message/rfc822

    From: (name in US-ASCII)
    Subject: (subject in US-ASCII)
    Content-Type: Text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: Quoted-printable

    ... Additional text in ISO-8859-1 goes here ...

    --unique-boundary-1--
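A reader of such a message must recognize delimiter lines only for the boundary of the enclosing multipart. The "unique-boundary-2" lines above are ordinary text at the outer level. The C sketch below classifies one line, with its line break removed, against a given boundary. The names are hypothetical. Tolerating trailing white space is an assumption made for robustness in light of Appendix B, item (5); the grammar in Appendix E does not permit it.

```c
#include <string.h>

enum boundary_line { NOT_BOUNDARY, PART_DELIMITER, CLOSE_DELIMITER };

/* Does the rest of s consist only of spaces and tabs? */
static int only_lwsp(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return *s == '\0';
}

/* Hypothetical helper: classifies one line of a multipart body
   (line break already removed) as "--" boundary, "--" boundary "--",
   or neither. Trailing spaces and tabs are tolerated. */
enum boundary_line classify_line(const char *line, const char *boundary)
{
    size_t blen = strlen(boundary);

    if (line[0] != '-' || line[1] != '-' ||
        strncmp(line + 2, boundary, blen) != 0)
        return NOT_BOUNDARY;
    line += 2 + blen;
    if (only_lwsp(line))
        return PART_DELIMITER;
    if (line[0] == '-' && line[1] == '-' && only_lwsp(line + 2))
        return CLOSE_DELIMITER;
    return NOT_BOUNDARY;
}
```

Applied to the outer level of the example with boundary "unique-boundary-1", this finds five part delimiters and one close-delimiter, matching the five serially displayed parts.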

Appendix D -- A Simple Richtext-to-Text Translator in C

One of the major goals in the design of the richtext subtype
of the text Content-Type is to make formatted text so simple
that even text-only mailers will implement
richtext-to-plain-text translators, thus increasing the
likelihood that multifont text will become "safe" to use
very widely. To demonstrate this simplicity, what follows is
a short and extremely simple C program that converts
richtext input into plain text output:

    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>

    int main(void) {
        int c, i;
        char token[50];

        while ((c = getc(stdin)) != EOF) {
            if (c == '<') {
                /* Read a command name, folded to lower case
                   and truncated to 49 characters. */
                for (i = 0; i < 49 && (c = getc(stdin)) != '>'
                            && c != EOF; ++i) {
                    token[i] = isupper(c) ? tolower(c) : c;
                }
                if (c == EOF) break;
                /* Skip the rest of an over-long command name. */
                if (c != '>') while ((c = getc(stdin)) != '>'
                                     && c != EOF) {;}
                if (c == EOF) break;
                token[i] = '\0';
                if (!strcmp(token, "lt")) {
                    putc('<', stdout);
                } else if (!strcmp(token, "nl")) {
                    putc('\n', stdout);
                } else if (!strcmp(token, "/paragraph")) {
                    fputs("\n\n", stdout);
                } else if (!strcmp(token, "comment")) {
                    /* Discard everything up to the matching
                       </comment>; comments may nest. */
                    int commct = 1;
                    while (commct > 0) {
                        while ((c = getc(stdin)) != '<'
                               && c != EOF) ;
                        if (c == EOF) break;
                        for (i = 0; i < 49 && (c = getc(stdin)) != '>'
                                    && c != EOF; ++i) {
                            token[i] = isupper(c) ? tolower(c) : c;
                        }
                        if (c == EOF) break;
                        if (c != '>') while ((c = getc(stdin)) != '>'
                                             && c != EOF) {;}
                        if (c == EOF) break;
                        token[i] = '\0';
                        if (!strcmp(token, "/comment")) --commct;
                        if (!strcmp(token, "comment")) ++commct;
                    }
                } /* Ignore all other tokens */
            } else if (c != '\n') putc(c, stdout);
        }
        putc('\n', stdout); /* for good measure */
        return 0;
    }

It should be noted that one can do considerably better than
this in displaying richtext data on a dumb terminal. In
particular, one can replace font information such as "bold"
with textual emphasis (like *this* or _T_H_I_S_). One can
also properly handle the richtext formatting commands
regarding indentation, justification, and others. However,
the above program is all that is necessary in order to
present richtext on a dumb terminal.
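As a small illustration of the "_T_H_I_S_" style of emphasis mentioned above, the following C helper renders one word that way. It is a hypothetical addition, not part of the translator in this appendix. A fuller translator might apply it to text between <bold> and </bold>.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes word in "_T_H_I_S_" style (each
   character upper-cased and preceded by an underscore, plus a final
   underscore) into out, which needs 2*strlen(word)+2 bytes.
   Returns the number of characters written. */
size_t underscore_emphasis(const char *word, char *out)
{
    size_t o = 0;

    if (*word == '\0') {
        out[0] = '\0';
        return 0;
    }
    for (; *word != '\0'; word++) {
        out[o++] = '_';
        out[o++] = (char)toupper((unsigned char)*word);
    }
    out[o++] = '_';
    out[o] = '\0';
    return o;
}
```

For example, underscore_emphasis("this", buf) stores "_T_H_I_S_" in buf and returns 9.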

Appendix E -- Collected Grammar

This appendix contains the complete BNF grammar for all the
syntax specified by this document.

By itself, however, this grammar is incomplete. It refers
to several entities that are defined by RFC 822. Rather
than reproduce those definitions here, and risk
unintentional differences between the two, this document
simply refers the reader to RFC 822 for the remaining
definitions. Wherever a term is undefined, it refers to the
RFC 822 definition.

    attribute := token

    body-part := <"message" as defined in RFC 822,
                  with all header fields optional, and with the
                  specified delimiter not occurring anywhere in
                  the message body, either on a line by itself
                  or as a substring anywhere.>

    boundary := 0*69<bchars> bcharsnospace

    bchars := bcharsnospace / " "

    bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                   / "," / "-" / "." / "/" / ":" / "=" / "?"

    close-delimiter := delimiter "--"

    Content-Description := *text

    Content-ID := msg-id

    Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                 "8BIT" / "7BIT" /
                                 "BINARY" / x-token

    Content-Type := type "/" subtype *[";" parameter]

    delimiter := CRLF "--" boundary   ; taken from Content-Type field
                                      ; when content-type is multipart.
                                      ; There should be no space
                                      ; between "--" and boundary.

    encapsulation := delimiter CRLF body-part

    epilogue := *text                 ; to be ignored upon receipt.

    MIME-Version := 1*text

    multipart-body := preamble 1*encapsulation close-delimiter
                      epilogue

    parameter := attribute "=" value

    preamble := *text                 ; to be ignored upon receipt.

    subtype := token

    token := 1*<any CHAR except SPACE, CTLs, or tspecials>

    tspecials := "(" / ")" / "<" / ">" / "@"   ; Must be in
               / "," / ";" / ":" / "\" / <">   ; quoted-string,
               / "/" / "[" / "]" / "?" / "."   ; to use within
               / "="                           ; parameter values

    type := "application" / "audio"   ; case-insensitive
          / "image" / "message"
          / "multipart" / "text"
          / "video" / x-token

    value := token / quoted-string

    x-token := <The two characters "X-" followed, with no
                intervening white space, by any token>
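The token and boundary productions are simple enough to check directly. The C sketch below follows this grammar literally, including "." among the tspecials. The function names are hypothetical, and an ASCII execution character set is assumed.

```c
#include <stddef.h>
#include <string.h>

/* tspecials exactly as listed in this grammar, including "." */
static const char TSPECIALS[] = "()<>@,;:\\\"/[]?.=";

/* Specials permitted in bcharsnospace, besides DIGIT and ALPHA. */
static const char BSPECIALS[] = "'()+_,-./:=?";

/* Hypothetical helper: token := 1*<any CHAR except SPACE, CTLs,
   or tspecials>. Assumes an ASCII execution character set. */
int is_token(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        /* <= 32: CTLs and SPACE; >= 127: DEL and non-ASCII octets */
        if (c <= 32 || c >= 127 || strchr(TSPECIALS, c) != NULL)
            return 0;
    }
    return 1;
}

static int is_bcharnospace(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c != 0 && strchr(BSPECIALS, c) != NULL);
}

/* Hypothetical helper: boundary := 0*69<bchars> bcharsnospace,
   i.e. 1 to 70 characters, spaces allowed except at the end. */
int is_valid_boundary(const char *b)
{
    size_t i, n = strlen(b);

    if (n == 0 || n > 70 || b[n - 1] == ' ')
        return 0;
    for (i = 0; i < n; i++)
        if (b[i] != ' ' && !is_bcharnospace((unsigned char)b[i]))
            return 0;
    return 1;
}
```

Because this grammar lists "." as a tspecial, a value such as "x.y" is not a token here and would have to be written as a quoted-string.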

Appendix F -- IANA Registration Procedures

MIME has been carefully designed to have extensible
mechanisms, and it is expected that the set of
content-type/subtype pairs and their associated parameters
will grow significantly with time. Several other MIME
fields, notably character set names, access-type parameters
for the message/external-body type, conversions parameters
for the application type, and possibly even
Content-Transfer-Encoding values, are likely to have new
values defined over time. In order to ensure that the set
of such values is developed in an orderly, well-specified,
and public manner, MIME defines a registration process
which uses the Internet Assigned Numbers Authority (IANA) as
a central registry for such values.

In general, parameters in the content-type header field are
used to convey supplemental information for various content
types, and their use is defined when the content-type and
subtype are defined. New parameters should not be defined
as a way to introduce new functionality.

In order to simplify and standardize the registration
process, this appendix gives templates for the registration
of new values with IANA. Each of these is given in the form
of an email message template, to be filled in by the
registering party.

F.1 Registration of New Content-type/subtype Values

Note that MIME is generally expected to be extended by
subtypes. If a new fundamental top-level type is needed,
its specification should be published as an RFC or
submitted in a form suitable to become an RFC, and be
subject to the Internet standards process.

    To: IANA@isi.edu
    Subject: Registration of new MIME content-type/subtype

    MIME type name:

    (If the above is not an existing top-level MIME type,
    please explain why an existing type cannot be used.)

    MIME subtype name:

    Required parameters:

    Optional parameters:

    Encoding considerations:

    Security considerations:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be if a new top-level type is being defined, and
    must be a publicly available specification in any
    case.)

    Person & email address to contact for further
    information:

F.2 Registration of New Character Set Values

    To: IANA@isi.edu
    Subject: Registration of new MIME character set value

    MIME character set name:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be or an international standard.)

    Person & email address to contact for further
    information:

F.3 Registration of New Access-type Values for
    Message/external-body

    To: IANA@isi.edu
    Subject: Registration of new MIME Access-type for
             Message/external-body content-type

    MIME access-type name:

    Required parameters:

    Optional parameters:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be.)

    Person & email address to contact for further
    information:

F.4 Registration of New Conversions Values for Application

    To: IANA@isi.edu
    Subject: Registration of new MIME Conversions value
             for Application content-type

    MIME Conversions name:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be.)

    Person & email address to contact for further
    information:
4612
4613
4614 Borenstein & Freed [Page 70]
4615
4616
4617
4618
4619 RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992
4620
4621
Appendix G -- Summary of the Seven Content-types

Content-type: text

Subtypes defined by this document: plain, richtext

Important Parameters: charset

Encoding notes: quoted-printable generally preferred if an
encoding is needed and the character set is mostly an
ASCII superset.

Security considerations: Rich text formats such as TeX and
Troff often contain mechanisms for executing arbitrary
commands or file system operations, and should not be
used automatically unless these security problems have
been addressed. Even plain text may contain control
characters that can be used to exploit the capabilities
of "intelligent" terminals and cause security
violations. User interfaces designed to run on such
terminals should be aware of and try to prevent such
problems.
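One way a mail reader might act on this advice is to filter a text body before writing it to a terminal. The C sketch below is illustrative only: the function name sanitize_for_terminal and its policy (replace C0 control characters other than TAB and LF, plus DEL, with '?') are assumptions, not requirements of this memo. It assumes the body has already been converted to local form with LF line ends, so a stray CR is also replaced. Bytes of 128 and above pass through unchanged, which may still be unsafe on terminals that act on 8-bit control codes.

```c
#include <stddef.h>

/* Hypothetical helper: copy the NUL-terminated string src into dst
   (capacity dstlen) for terminal display, replacing C0 control
   characters other than TAB and LF, and DEL, with '?'. Output is
   truncated to fit and always NUL-terminated when dstlen > 0.
   Returns the number of characters written, excluding the NUL. */
size_t sanitize_for_terminal(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    if (dstlen == 0)
        return 0;
    for (; *src != '\0' && n + 1 < dstlen; src++) {
        unsigned char c = (unsigned char)*src;
        if ((c < 0x20 && c != '\t' && c != '\n') || c == 0x7f)
            dst[n++] = '?';        /* neutralize escape sequences etc. */
        else
            dst[n++] = (char)c;    /* printable ASCII, TAB, LF, 8-bit */
    }
    dst[n] = '\0';
    return n;
}
```

A reader would call such a function on each decoded text body just before display, so an embedded ESC cannot start a terminal control sequence.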
________________________________________________________________

Content-type: multipart

Subtypes defined by this document: mixed, alternative,
digest, parallel.

Important Parameters: boundary

Encoding notes: No content-transfer-encoding is permitted.
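Because the boundary parameter delimits the body parts, a composing agent has to choose a value that does not occur inside any of the parts it encloses. The following C sketch is hypothetical (the name boundary_collides is an assumption). It rejects a candidate boundary if "--" followed by the boundary appears anywhere in a part, which is stricter than necessary but safe. It also rejects empty values and values longer than 70 characters, assuming the length limit given by the boundary grammar in Section 7.2.1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the candidate boundary is unusable
   for this body part, either because "--" + boundary occurs anywhere
   in body or because the boundary is empty or longer than 70
   characters (assumed limit). Return 0 if it is safe to use. */
int boundary_collides(const char *body, const char *boundary)
{
    char delim[80];                 /* "--" + up to 70 chars + NUL */
    size_t blen = strlen(boundary);

    if (blen == 0 || blen > 70)
        return 1;                   /* treat as a collision */
    delim[0] = '-';
    delim[1] = '-';
    memcpy(delim + 2, boundary, blen + 1);   /* copies the NUL too */
    return strstr(body, delim) != NULL;
}
```

A composer would generate a candidate value, test it against every part, and pick another value on a collision.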

________________________________________________________________

Content-type: message

Subtypes defined by this document: rfc822, partial,
external-body

Important Parameters: id, number, total

Encoding notes: No content-transfer-encoding is permitted.
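The id, number, and total parameters belong to the message/partial subtype. As a sketch, the C function below (format_partial_header, a hypothetical name) formats the Content-Type header for one fragment. It quotes the id because identifiers such as message IDs often contain characters like "@" that may not appear in an unquoted parameter value. It always includes total, which assumes the sender knows the fragment count in advance.

```c
#include <stdio.h>

/* Hypothetical helper: format the Content-Type header for fragment
   `number` (1-based) of `total` fragments. The same id must be used
   for every fragment of one message. The id is quoted but not
   escaped, so it must not contain a double quote or a backslash.
   Returns the length snprintf reports, or -1 if number is out of
   range. */
int format_partial_header(char *buf, size_t len, const char *id,
                          int number, int total)
{
    if (number < 1 || number > total)
        return -1;
    return snprintf(buf, len,
                    "Content-Type: message/partial; id=\"%s\"; "
                    "number=%d; total=%d",
                    id, number, total);
}
```

For example, fragment 2 of 3 with id abc@example.com yields the header `Content-Type: message/partial; id="abc@example.com"; number=2; total=3`.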

________________________________________________________________

Content-type: application

Subtypes defined by this document: octet-stream,
postscript, oda

Important Parameters: profile

Encoding notes: base64 generally preferred for octet-stream
or other unreadable subtypes.

Security considerations: This type is intended for the
transmission of data to be interpreted by locally-installed
programs. If used, for example, to transmit executable
binary programs or programs in general-purpose interpreted
languages, such as LISP programs or shell scripts, severe
security problems could result. In general, authors of
mail-reading agents are cautioned against giving their
systems the power to execute mail-based application data
without carefully considering the security implications.
While it is certainly possible to define safe application
formats and even safe interpreters for unsafe formats, each
interpreter should be evaluated separately for possible
security problems.
________________________________________________________________

Content-type: image

Subtypes defined by this document: jpeg, gif

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: audio

Subtypes defined by this document: basic

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: video

Subtypes defined by this document: mpeg

Important Parameters: none

Encoding notes: base64 generally preferred
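Base64 is the preferred encoding for the image, audio, and video types and for unreadable application subtypes. The C function below is a minimal sketch of the encoding defined in Section 5.2: each group of three octets becomes four characters from the 64-character alphabet, with "=" padding when the final group has only one or two octets. The name base64_encode is an assumption, and the sketch leaves out the line breaking that Section 5.2 requires (encoded lines of at most 76 characters), which a complete encoder would add.

```c
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode len octets from in as base64 into out,
   which must hold 4 * ((len + 2) / 3) + 1 bytes. No line breaks are
   inserted. Returns the number of characters written, excluding the
   terminating NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    /* Full 24-bit groups: three octets -> four 6-bit indices. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 0x3f];
        out[o++] = b64_alphabet[(v >> 12) & 0x3f];
        out[o++] = b64_alphabet[(v >> 6) & 0x3f];
        out[o++] = b64_alphabet[v & 0x3f];
    }
    /* One or two leftover octets: zero-fill and pad with '='. */
    if (i < len) {
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 0x3f];
        out[o++] = b64_alphabet[(v >> 12) & 0x3f];
        out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3f] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the three octets "Man" encode as "TWFu" and the single octet "M" as "TQ==".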

Appendix H -- Canonical Encoding Model

There was some confusion, in earlier drafts of this memo,
regarding the model for when email data was to be converted
to canonical form and encoded, and in particular how this
process would affect the treatment of CRLFs, given that the
representation of newlines varies greatly from system to
system. For this reason, a canonical model for encoding is
presented below.

The process of composing a MIME message part can be modelled
as being done in a number of steps. Note that these steps
are roughly similar to those used in RFC 1113:

Step 1. Creation of local form.

The body part to be transmitted is created in the system's
native format. The native character set is used, and where
appropriate local end of line conventions are used as well.
This may be a UNIX-style text file, a Sun raster image, a
VMS indexed file, audio data in a system-dependent format
stored only in memory, or anything else that corresponds to
the local model for the representation of some form of
information.

Step 2. Conversion to canonical form.

The entire body part, including "out-of-band" information
such as record lengths and possibly file attribute
information, is converted to a universal canonical form.
The specific content type of the body part as well as its
associated attributes dictate the nature of the canonical
form that is used. Conversion to the proper canonical form
may involve character set conversion, transformation of
audio data, compression, or various other operations
specific to the various content types.

For example, in the case of text/plain data, the text must
be converted to a supported character set and lines must be
delimited with CRLF delimiters in accordance with RFC 822.
Note that the restriction on line lengths implied by RFC 822
is eliminated if the next step employs either
quoted-printable or base64 encoding.
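For a UNIX-style text file, the newline part of this conversion amounts to turning each bare LF into CRLF. The C sketch below (lf_to_crlf is a hypothetical name) shows only that step; character set conversion is out of scope. An LF that is already preceded by CR is left alone, so running the function twice does no harm, and a lone CR is copied through unchanged, which a stricter converter might handle differently.

```c
#include <stddef.h>

/* Hypothetical helper: convert len bytes of text with local LF line
   ends into canonical CRLF form. out must hold up to 2 * len bytes.
   Returns the number of bytes written (no NUL terminator added). */
size_t lf_to_crlf(const char *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        /* Insert CR before an LF unless it is already there. */
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[o++] = '\r';
        out[o++] = in[i];
    }
    return o;
}
```

For example, the six bytes "a\nb\r\nc" become the seven bytes "a\r\nb\r\nc".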

Step 3. Apply transfer encoding.

A Content-Transfer-Encoding appropriate for this body part
is applied. Note that there is no fixed relationship
between the content type and the transfer encoding. In
particular, it may be appropriate to base the choice of
base64 or quoted-printable on character frequency counts
which are specific to a given instance of a body part.
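A simple version of such a frequency count is sketched below in C. The heuristic and the name choose_encoding are assumptions, not part of this memo. Quoted-printable turns each byte it must escape into three characters (=XX), while base64 expands all data by a factor of 4/3. Ignoring line breaks, a body of n bytes with k escaped bytes therefore grows to n + 2k characters under quoted-printable and 4n/3 under base64, and the two are equal when k = n/6. The sketch uses that ratio as its threshold. Its test for which bytes need escaping is simplified: it does not treat trailing white space or bare CR and LF specially.

```c
#include <stddef.h>

/* Hypothetical heuristic: count bytes that quoted-printable would
   have to escape as =XX, and choose base64 once more than one byte
   in six needs escaping, roughly where base64 output becomes
   smaller. */
const char *choose_encoding(const unsigned char *data, size_t len)
{
    size_t escaped = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];
        /* Printable US-ASCII other than '=' can appear literally;
           CR, LF, and TAB are treated as ordinary text here. */
        int literal = (c >= 0x20 && c <= 0x7e && c != '=') ||
                      c == '\r' || c == '\n' || c == '\t';
        if (!literal)
            escaped++;
    }
    return (escaped * 6 > len) ? "base64" : "quoted-printable";
}
```

Mostly-ASCII text thus stays readable in transit, while binary data gets the more compact encoding.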

Step 4. Insertion into message.

The encoded object is inserted into a MIME message with
appropriate body part headers and boundary markers.
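For a multipart body, insertion means writing the delimiter line, the body part headers, an empty line, and the encoded data. The C sketch below is hypothetical: write_part and its parameters are illustrative names. It emits only the Content-Type and Content-Transfer-Encoding headers and writes CRLF line ends directly, as in the model, rather than the local newlines an actual system might use at this stage.

```c
#include <stdio.h>

/* Hypothetical helper: write one body part of a multipart entity to
   fp: the delimiter line ("--" + boundary), the part headers, an
   empty line, and the already-encoded body. If is_last is nonzero,
   the closing delimiter ("--" + boundary + "--") follows. Returns
   the number of bytes written, or -1 on an output error. */
long write_part(FILE *fp, const char *boundary, const char *content_type,
                const char *encoding, const char *encoded_body, int is_last)
{
    int n[4];
    long total = 0;

    n[0] = fprintf(fp, "--%s\r\n", boundary);
    n[1] = fprintf(fp, "Content-Type: %s\r\n"
                       "Content-Transfer-Encoding: %s\r\n"
                       "\r\n",                 /* end of part headers */
                   content_type, encoding);
    n[2] = fprintf(fp, "%s\r\n", encoded_body); /* CRLF before next delimiter */
    n[3] = is_last ? fprintf(fp, "--%s--\r\n", boundary) : 0;

    for (int i = 0; i < 4; i++) {
        if (n[i] < 0)
            return -1;
        total += n[i];
    }
    return total;
}
```

A composer would call it once per part with the same boundary, passing a nonzero is_last only for the final part.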

It is vital to note that these steps are only a model; they
are specifically NOT a blueprint for how an actual system
would be built. In particular, the model fails to account
for two common designs:

1. In many cases the conversion to a canonical form prior
   to encoding will be subsumed into the encoder itself,
   which understands local formats directly. For example,
   the local newline convention for text body parts might
   be carried through to the encoder itself along with
   knowledge of what that format is.

2. The output of the encoders may have to pass through one
   or more additional steps prior to being transmitted as a
   message. As such, the output of the encoder may not be
   compliant with the formats specified by RFC 822. In
   particular, once again it may be appropriate for the
   encoder's output to be expressed using local newline
   conventions rather than using the standard RFC 822 CRLF
   delimiters.

Other implementation variations are conceivable as well.
The only important aspect of this discussion is that the
resulting messages are consistent with those produced by the
model described here.

References

[US-ASCII] Coded Character Set--7-Bit American Standard Code
for Information Interchange, ANSI X3.4-1986.

[ATK] Borenstein, Nathaniel S., Multimedia Applications
Development with the Andrew Toolkit, Prentice-Hall, 1990.

[GIF] Graphics Interchange Format (Version 89a), Compuserve,
Inc., Columbus, Ohio, 1990.

[ISO-2022] International Standard--Information Processing--
ISO 7-bit and 8-bit coded character sets--Code extension
techniques, ISO 2022:1986.

[ISO-8859] Information Processing -- 8-bit Single-Byte Coded
Graphic Character Sets -- Part 1: Latin alphabet No. 1, ISO
8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2:1987.
Part 3: Latin alphabet No. 3, ISO 8859-3:1988. Part 4: Latin
alphabet No. 4, ISO 8859-4:1988. Part 5: Latin/Cyrillic
alphabet, ISO 8859-5:1988. Part 6: Latin/Arabic alphabet,
ISO 8859-6:1987. Part 7: Latin/Greek alphabet, ISO
8859-7:1987. Part 8: Latin/Hebrew alphabet, ISO
8859-8:1988. Part 9: Latin alphabet No. 5, ISO 8859-9:1990.

[ISO-646] International Standard--Information Processing--
ISO 7-bit coded character set for information interchange,
ISO 646:1983.

[MPEG] Video Coding Draft Standard ISO 11172 CD, ISO/IEC
JTC1/SC2/WG11 (Moving Picture Experts Group), May, 1991.

[ODA] ISO 8613; Information Processing: Text and Office
System; Office Document Architecture (ODA) and Interchange
Format (ODIF), Part 1-8, 1989.

[PCM] CCITT, Fascicle III.4 - Recommendation G.711, Geneva,
1972, "Pulse Code Modulation (PCM) of Voice Frequencies".

[POSTSCRIPT] Adobe Systems, Inc., PostScript Language
Reference Manual, Addison-Wesley, 1985.

[X400] Schicker, Pietro, "Message Handling Systems, X.400",
Message Handling Systems and Distributed Applications, E.
Stefferud, O-j. Jacobsen, and P. Schicker, eds.,
North-Holland, 1989, pp. 3-41.

[RFC-783] Sollins, K.R. TFTP Protocol (revision 2). June,
1981, MIT, RFC-783.

[RFC-821] Postel, J.B. Simple Mail Transfer Protocol.
August, 1982, USC/Information Sciences Institute, RFC-821.

[RFC-822] Crocker, D. Standard for the format of ARPA
Internet text messages. August, 1982, UDEL, RFC-822.

[RFC-934] Rose, M.T.; Stefferud, E.A. Proposed standard
for message encapsulation. January, 1985, Delaware
and NMA, RFC-934.

[RFC-959] Postel, J.B.; Reynolds, J.K. File Transfer
Protocol. October, 1985, USC/Information Sciences
Institute, RFC-959.

[RFC-1049] Sirbu, M.A. Content-Type header field for
Internet messages. March, 1988, CMU, RFC-1049.

[RFC-1113] Linn, J. Privacy enhancement for Internet
electronic mail: Part I - message encipherment and
authentication procedures. August, 1989, IAB Privacy Task
Force, RFC-1113.

[RFC-1154] Robinson, D.; Ullmann, R. Encoding header field
for Internet messages. April, 1990, Prime Computer,
Inc., RFC-1154.

[RFC-1342] Moore, K. Representation of Non-ASCII Text in
Internet Message Headers. June, 1992, University of
Tennessee, RFC-1342.

Security Considerations

Security issues are discussed in Section 7.4.2 and in
Appendix G. Implementors should pay special attention to
the security implications of any mail content-types that can
cause the remote execution of any actions in the recipient's
environment. In such cases, the discussion of the
application/postscript content-type in Section 7.4.2 may
serve as a model for considering other content-types with
remote execution capabilities.

Authors' Addresses

For more information, the authors of this document may be
contacted via Internet mail:

Nathaniel S. Borenstein
MRE 2D-296, Bellcore
445 South St.
Morristown, NJ 07962-1910

Phone: +1 201 829 4270
Fax: +1 201 829 7019
Email: nsb@bellcore.com


Ned Freed
Innosoft International, Inc.
250 West First Street
Suite 240
Claremont, CA 91711

Phone: +1 714 624 7907
Fax: +1 714 621 5319
Email: ned@innosoft.com

Table of Contents

1       Introduction
2       Notations, Conventions, and Generic BNF Grammar
3       The MIME-Version Header Field
4       The Content-Type Header Field
5       The Content-Transfer-Encoding Header Field
5.1     Quoted-Printable Content-Transfer-Encoding
5.2     Base64 Content-Transfer-Encoding
6       Additional Optional Content- Header Fields
6.1     Optional Content-ID Header Field
6.2     Optional Content-Description Header Field
7       The Predefined Content-Type Values
7.1     The Text Content-Type
7.1.1   The charset parameter
7.1.2   The Text/plain subtype
7.1.3   The Text/richtext subtype
7.2     The Multipart Content-Type
7.2.1   Multipart: The common syntax
7.2.2   The Multipart/mixed (primary) subtype
7.2.3   The Multipart/alternative subtype
7.2.4   The Multipart/digest subtype
7.2.5   The Multipart/parallel subtype
7.3     The Message Content-Type
7.3.1   The Message/rfc822 (primary) subtype
7.3.2   The Message/Partial subtype
7.3.3   The Message/External-Body subtype
7.4     The Application Content-Type
7.4.1   The Application/Octet-Stream (primary) subtype
7.4.2   The Application/PostScript subtype
7.4.3   The Application/ODA subtype
7.5     The Image Content-Type
7.6     The Audio Content-Type
7.7     The Video Content-Type
7.8     Experimental Content-Type Values
        Summary
        Acknowledgements
        Appendix A -- Minimal MIME-Conformance
        Appendix B -- General Guidelines For Sending Email Data
        Appendix C -- A Complex Multipart Example
        Appendix D -- A Simple Richtext-to-Text Translator in C
        Appendix E -- Collected Grammar
        Appendix F -- IANA Registration Procedures
        F.1 Registration of New Content-type/subtype Values
        F.2 Registration of New Character Set Values
        F.3 Registration of New Access-type Values for
            Message/external-body
        F.4 Registration of New Conversions Values for Application
        Appendix G -- Summary of the Seven Content-types
        Appendix H -- Canonical Encoding Model
        References
        Security Considerations
        Authors' Addresses