
EML File Format Specification for Email Journalling


Written by Evie Lynch
Updated this week

File Structure

An EML file consists of two main parts:

1. Headers Section

Headers come first and contain metadata about the message. Each header follows this format:

Header-Name: Header-Value

Standard headers include:

  • From: Sender's email address (e.g., From: <[email protected]>)

  • To: Recipient's email address

  • Date: Timestamp in RFC 5322 format (e.g., Thu, 8 Mar 2018 10:43:37 +0100)

  • Subject: Email subject line

  • Content-Type: Defines how to interpret the body content

  • Content-Transfer-Encoding: Specifies encoding method (commonly base64 for attachments)
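As a rough illustration of reading these headers, here is a minimal sketch using Python's standard email package. The sample message, addresses, and subject are made up for the example and are not taken from any real journalled email.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message; addresses and subject are invented for illustration.
RAW = (
    "From: <sender@example.com>\r\n"
    "To: <recipient@example.com>\r\n"
    "Date: Thu, 8 Mar 2018 10:43:37 +0100\r\n"
    "Subject: Quarterly report\r\n"
    'Content-Type: text/plain; charset="UTF-8"\r\n'
    "\r\n"
    "Hello.\r\n"
)

msg = message_from_string(RAW)

# Header lookup is case-insensitive and returns None for a missing header.
sender = msg["From"]
subject = msg["Subject"]

# Convert the Date header into a timezone-aware datetime.
sent_at = parsedate_to_datetime(msg["Date"])

print(sender, subject, sent_at.isoformat())
```

Header names are matched case-insensitively, so msg["from"] and msg["From"] return the same value.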

2. Message Body

The body starts after a blank line following the headers. It can contain:

  • Plain text

  • HTML content

  • Embedded images

  • File attachments

  • Multiple parts separated by boundaries
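The blank-line rule above can be sketched in a few lines of Python. The helper name split_headers_and_body is hypothetical, and the sketch also tolerates LF-only line endings as an assumption about real-world files. In practice a full parser such as Python's email package is usually the safer choice; this only shows where the split happens.

```python
import re
from typing import Tuple

# First blank line: two consecutive line breaks (CRLF, or bare LF as a fallback).
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")

def split_headers_and_body(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into (header block, body) at the first blank line."""
    match = _BLANK_LINE.search(raw)
    if match is None:
        # No blank line found: treat the whole input as headers with an empty body.
        return raw, b""
    return raw[:match.start()], raw[match.end():]

# Hypothetical sample; later blank lines belong to the body and are left alone.
sample = b"Subject: Hi\r\nTo: <a@example.com>\r\n\r\nLine one\r\n\r\nLine two\r\n"
headers, body = split_headers_and_body(sample)
```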

Content Types

The Content-Type header tells you what you're dealing with - like a file extension but more reliable:

| Type | Purpose | Common Subtypes |
| --- | --- | --- |
| text | Human-readable content | text/plain, text/html |
| image | Pictures (not videos) | image/png, image/jpeg, image/gif |
| audio | Sound files | audio/wav, audio/mpeg |
| application | Binary data/documents | application/pdf, application/octet-stream |

For a full list of possible Content-Types see: https://www.iana.org/assignments/media-types/media-types.xhtml
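To show how the type, subtype, and parameters can be read programmatically, here is a minimal sketch with Python's standard email package. The one-header sample message is made up for illustration.

```python
from email import message_from_string

# Hypothetical single-part message with only a Content-Type header.
msg = message_from_string(
    'Content-Type: text/html; charset="UTF-8"\r\n'
    "\r\n"
    "<p>Hi</p>\r\n"
)

full_type = msg.get_content_type()      # "text/html"
main_type = msg.get_content_maintype()  # "text"
sub_type = msg.get_content_subtype()    # "html"
charset = msg.get_content_charset()     # "utf-8" (returned in lower case)

# Branch on the main type to decide how to handle the content.
is_text = main_type == "text"
```

When a message has no Content-Type header at all, get_content_type() falls back to text/plain, so the absence of the header does not need special handling.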

Handling Attachments

Attachments are identified by specific header combinations:

Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_abc123

[Base64 encoded file content follows here]

The Content-Disposition: attachment is your signal that what follows is a file attachment, not inline content.
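One way to act on that signal is sketched below with Python's standard email package. The function name extract_attachments, the sample message, and the placeholder file bytes are assumptions made for the example; real journalled messages will carry more headers.

```python
import base64
from email import message_from_bytes

# Placeholder bytes standing in for a real PDF file.
original = b"%PDF-1.4 placeholder bytes"
b64 = base64.b64encode(original).decode("ascii")

# Hypothetical EML with one text part and one base64-encoded attachment.
raw = (
    'Content-Type: multipart/mixed; boundary="boundary123"\r\n'
    "\r\n"
    "--boundary123\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "See attached.\r\n"
    "--boundary123\r\n"
    'Content-Type: application/pdf; name="report.pdf"\r\n'
    'Content-Disposition: attachment; filename="report.pdf"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    f"{b64}\r\n"
    "--boundary123--\r\n"
).encode("ascii")

def extract_attachments(eml_bytes):
    """Return (filename, decoded bytes) for every part marked as an attachment."""
    msg = message_from_bytes(eml_bytes)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # decode=True reverses the Content-Transfer-Encoding (base64 here).
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found

attachments = extract_attachments(raw)
```

Each tuple in attachments pairs the declared filename with the decoded file bytes, ready to be written to disk or passed on for further processing.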

Multipart Messages

When an email contains multiple sections (like text + HTML + attachments), it uses boundaries - think of them as dividers in a filing cabinet:

Content-Type: multipart/mixed; boundary="boundary123"

--boundary123
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

This is the plain text part
--boundary123
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

<p>This is the HTML part</p>
--boundary123--
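A minimal sketch of walking a multipart message with Python's standard email package follows. It parses the same example message, written out with explicit CRLF line endings and a blank line between each part's headers and its content. The UTF-8 decode at the end is an assumption, since the sample parts declare no charset.

```python
from email import message_from_string

RAW = (
    'Content-Type: multipart/mixed; boundary="boundary123"\r\n'
    "\r\n"
    "--boundary123\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "Content-Disposition: inline\r\n"
    "\r\n"
    "This is the plain text part\r\n"
    "--boundary123\r\n"
    "Content-Type: text/html\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "Content-Disposition: inline\r\n"
    "\r\n"
    "<p>This is the HTML part</p>\r\n"
    "--boundary123--\r\n"
)

msg = message_from_string(RAW)

# walk() yields the container first, then each nested part; skip containers.
parts = [
    (part.get_content_type(), part.get_payload(decode=True).decode("utf-8"))
    for part in msg.walk()
    if not part.is_multipart()
]

for content_type, text in parts:
    print(content_type, "->", text)
```

The line break immediately before each boundary line belongs to the boundary, not to the part, so the decoded text carries no trailing newline.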

Processing Considerations

  1. Encoding: Most attachments use base64 encoding (indicated by the Content-Transfer-Encoding header). You'll need to decode this to get the actual file data.

  2. Character Sets: Watch for charset parameters in Content-Type headers (e.g., charset="UTF-8").

  3. Line Endings: EML files use CRLF (\r\n) for line breaks, as required by RFC 5322.

  4. File Size: Base64 turns every 3 bytes into 4 characters, so an encoded attachment takes up roughly 33% more space inside the EML than the original file.
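The size overhead can be checked with a short calculation. The sketch below assumes encoded lines are wrapped at 76 characters (the MIME maximum) and end in CRLF; the helper name base64_size_ratio and the test data are made up for the example.

```python
import base64

def base64_size_ratio(size: int, line_len: int = 76) -> float:
    """Encoded-to-original size ratio for a file of `size` bytes (size > 0)."""
    chars = 4 * ((size + 2) // 3)               # 4 characters per 3 bytes, padded
    lines = (chars + line_len - 1) // line_len  # number of wrapped lines
    return (chars + 2 * lines) / size           # each line adds a 2-byte CRLF

# Arbitrary 3072-byte payload standing in for a real attachment.
original = bytes(range(256)) * 12

# encodebytes wraps at 76 characters with "\n"; convert to CRLF as used in EML.
wrapped = base64.encodebytes(original).replace(b"\n", b"\r\n")

measured = len(wrapped) / len(original)
predicted = base64_size_ratio(len(original))
print(f"measured {measured:.3f}, predicted {predicted:.3f}")
```

The encoding alone accounts for a ratio of 4/3; the line breaks push the total a little higher, which is why the overhead is best treated as an approximate figure when estimating storage.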
