File Structure
An EML file consists of two main parts:
1. Headers Section
Headers come first and contain metadata about the message. Each header follows this format:
Header-Name: Header-Value
Standard headers include:
From: Sender's email address (e.g., From: <sender@example.com>)
To: Recipient's email address
Date: Timestamp in RFC 5322 format (e.g., Thu, 8 Mar 2018 10:43:37 +0100)
Subject: Email subject line
Content-Type: Defines how to interpret the body content
Content-Transfer-Encoding: Specifies the encoding method (commonly base64 for attachments)
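The header section can be read with Python's standard-library `email` package. The sketch below is one way to do it, assuming Python 3.9 or later and a small made-up message (the addresses are placeholders, and `read_headers` is a hypothetical helper name). With `policy.default`, structured headers such as `Date` are parsed into objects, so the timestamp is also available as a `datetime`.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# A tiny, made-up message; the addresses are placeholders.
RAW = (
    b"From: Sender <sender@example.com>\r\n"
    b"To: Recipient <recipient@example.com>\r\n"
    b"Date: Thu, 8 Mar 2018 10:43:37 +0100\r\n"
    b"Subject: Quarterly report\r\n"
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"  # the blank line that ends the header section
    b"Hello!\r\n"
)


def read_headers(raw: bytes) -> EmailMessage:
    """Parse only the header section of an EML file (hypothetical helper)."""
    # headersonly=True stops after the blank line; the body is left unparsed.
    return BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)


msg = read_headers(RAW)
for name, value in msg.items():  # items() keeps repeated headers, in order
    print(f"{name}: {value}")

# With policy.default the Date header exposes a timezone-aware datetime.
print(msg["Date"].datetime.isoformat())
```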
2. Message Body
The body starts after a blank line following the headers. It can contain:
Plain text
HTML content
Embedded images
File attachments
Multiple parts separated by boundaries
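To make the header/body split concrete, here is a minimal Python sketch that cuts raw EML bytes at the first empty line. `split_eml` is a hypothetical helper, and it accepts both CRLF and bare LF line endings. It is illustrative only: headers can be folded across several lines and bodies can contain nested multipart structures, so real code should rely on a full parser such as Python's `email` package.

```python
import re


def split_eml(raw: bytes) -> tuple[bytes, bytes]:
    """Split raw EML bytes at the first empty line into (headers, body).

    Hypothetical helper; accepts CRLF (the RFC 5322 form) and bare LF.
    """
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:  # no blank line: the whole input is headers
        return raw, b""
    return raw[: match.start()], raw[match.end():]


raw = b"Subject: Hi\r\nFrom: a@example.com\r\n\r\nLine one\r\n\r\nLine two\r\n"
headers, body = split_eml(raw)
print(headers.decode())
print(body.decode())
```

Only the first blank line matters; later blank lines (like the one between "Line one" and "Line two") belong to the body.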
Content Types
The Content-Type header tells you what kind of data a part contains. Its value has the form type/subtype (for example, text/html), and it works like a file extension, but is more reliable:
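| Type | Purpose | Common Subtypes |
|---|---|---|
| text | Human-readable content | plain, html |
| image | Pictures (not videos) | jpeg, png, gif |
| audio | Sound files | mpeg, ogg |
| application | Binary data/documents | pdf, octet-stream, zip |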
For the full list of registered media types, see: https://www.iana.org/assignments/media-types/media-types.xhtml
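In Python's standard `email` package, the type and subtype of a part can be read without string-splitting the raw header yourself. This is a sketch under the assumption that you only need the top-level Content-Type; `describe_content_type` is a hypothetical helper and the sample messages are made up. Note the fallback: a message with no Content-Type header is treated as text/plain, the MIME default.

```python
from typing import Optional
from email import policy
from email.parser import BytesParser


def describe_content_type(raw: bytes) -> tuple[str, str, Optional[str]]:
    """Return (maintype, subtype, charset) for a message's top-level Content-Type."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_content_type() lowercases the value and falls back to text/plain
    # when the header is missing or malformed.
    maintype, subtype = msg.get_content_type().split("/")
    # get_content_charset() returns the charset parameter lowercased, or None.
    return maintype, subtype, msg.get_content_charset()


samples = [
    b'Content-Type: text/html; charset="UTF-8"\r\n\r\n<p>Hello</p>\r\n',
    b"Content-Type: Application/PDF\r\n\r\n",
    b"Subject: no Content-Type header\r\n\r\nHello\r\n",
]
for raw in samples:
    print(describe_content_type(raw))
```

Because the comparison is case-insensitive, `Application/PDF` and `application/pdf` are reported identically.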
Handling Attachments
Attachments are identified by specific header combinations:
Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_abc123

[Base64 encoded file content follows here]
Content-Disposition: attachment is your signal that the part is a file attachment rather than inline content; its filename parameter gives the suggested name for the saved file.
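The rule above (look for Content-Disposition: attachment, then decode the body) can be sketched with Python's standard `email` package. The message below is made up: it reuses the attachment headers from the example, adds a hypothetical text part and a `MIME-Version` header, and uses placeholder bytes instead of a real PDF. `extract_attachments` is a hypothetical helper name.

```python
import base64
from email import policy
from email.parser import BytesParser

# Stand-in bytes for a PDF file (not a real PDF).
FILE_BYTES = b"%PDF-1.4 placeholder content"

# A made-up multipart message: one text part plus one attachment part.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"See attached.\r\n"
    b"--b1\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"X-Attachment-Id: f_abc123\r\n"
    b"\r\n"
    + base64.encodebytes(FILE_BYTES).replace(b"\n", b"\r\n")
    + b"--b1--\r\n"
)


def extract_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, decoded bytes) for every part marked as an attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    attachments = []
    for part in msg.walk():  # visits the message itself and every nested part
        if part.get_content_disposition() == "attachment":
            # get_filename() reads Content-Disposition's filename parameter,
            # falling back to the name parameter of Content-Type.
            filename = part.get_filename() or "unnamed"
            # decode=True reverses the Content-Transfer-Encoding (base64 here).
            attachments.append((filename, part.get_payload(decode=True)))
    return attachments


for name, data in extract_attachments(RAW):
    print(name, len(data), "bytes")
```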
Multipart Messages
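When an email contains multiple parts (such as plain text, HTML, and attachments), it separates them with boundaries, which work like dividers in a filing cabinet. Each part begins with a line made of two hyphens followed by the boundary string, and the final delimiter adds two more hyphens at the end: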
Content-Type: multipart/mixed; boundary="boundary123"

--boundary123
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

This is the plain text part
--boundary123
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

<p>This is the HTML part</p>
--boundary123--
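The multipart example can be parsed and walked in Python, as a sketch assuming the standard `email` package. The bytes below spell out the CRLF line endings and add a `MIME-Version: 1.0` header that the example does not show; everything else follows the example.

```python
from email import policy
from email.parser import BytesParser

# The multipart/mixed example, with CRLF endings and an added MIME-Version.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="boundary123"\r\n'
    b"\r\n"
    b"--boundary123\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Disposition: inline\r\n"
    b"\r\n"
    b"This is the plain text part\r\n"
    b"--boundary123\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Disposition: inline\r\n"
    b"\r\n"
    b"<p>This is the HTML part</p>\r\n"
    b"--boundary123--\r\n"
)

msg = BytesParser(policy=policy.default).parsebytes(RAW)
print("boundary:", msg.get_boundary())

# iter_parts() yields the immediate subparts; get_content() undoes the
# quoted-printable encoding and returns text as a str.
for part in msg.iter_parts():
    print(part.get_content_type(), repr(part.get_content()))
```

`iter_parts()` only goes one level deep; for multiparts nested inside multiparts, `walk()` visits every level.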
Processing Considerations
Encoding: Most attachments use base64 encoding, as indicated by the Content-Transfer-Encoding header. You'll need to decode it to get the actual file data.
Character Sets: Watch for charset parameters in Content-Type headers (e.g., charset="UTF-8"); text parts must be decoded with that character set.
Line Endings: The message format defined in RFC 5322 uses CRLF (\r\n) line breaks, although some tools save .eml files with bare LF, so parsers should accept both.
File Size: Because attachments are base64-encoded, they take up roughly 33% more space in the EML file than the original files.
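The considerations above can be checked with a few lines of Python. This is a sketch with hypothetical helper names (`decode_base64_body`, `decode_text`, `mime_base64_size`); it assumes MIME-style base64 with a CRLF after every 76 characters, which is what `base64.encodebytes` produces once its LF endings are converted.

```python
import base64
from typing import Optional


def decode_base64_body(body: bytes) -> bytes:
    """Decode a base64 part body, including its CRLF line breaks."""
    # With validate=False (the default), b64decode discards bytes outside the
    # base64 alphabet, so the CRLF line breaks are ignored.
    return base64.b64decode(body)


def decode_text(payload: bytes, charset: Optional[str]) -> str:
    """Turn decoded bytes into text using the part's charset parameter."""
    # us-ascii is the MIME default when no charset parameter is present.
    # An unknown charset name raises LookupError.
    return payload.decode(charset or "us-ascii", errors="replace")


def mime_base64_size(n: int, line_length: int = 76) -> int:
    """Bytes needed to store n bytes as base64 with a CRLF after each line."""
    chars = 4 * ((n + 2) // 3)  # every 3 input bytes become 4 characters
    lines = (chars + line_length - 1) // line_length
    return chars + 2 * lines  # 2 bytes per CRLF


original = "Grüße aus Köln".encode("utf-8")
body = base64.encodebytes(original).replace(b"\n", b"\r\n")  # 76-char lines, CRLF
print(decode_text(decode_base64_body(body), "UTF-8"))

for n in (3_000, 3_000_000):
    size = mime_base64_size(n)
    print(f"{n} bytes -> {size} bytes ({size / n - 1:.1%} larger)")
```

For large files the overhead settles near 37%: the 4/3 base64 expansion (about 33%) plus the CRLF added after every 76 characters.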