Parse an email for e-discovery
Goal: turn raw email into the structured fields an e-discovery or investigation pipeline indexes — who sent it, to whom, when, what it said, and header forensics.
kaos-source parses .eml, .mbox, vCard, PACER, and other formats into typed records. This
example operates on raw bytes, so it runs fully offline.
    uv run examples/email-forensics.py

Source of examples/email-forensics.py:

    #!/usr/bin/env -S uv run --script
    # /// script
    # requires-python = ">=3.13"
    # dependencies = ["kaos-source>=0.1.3,<0.2"]
    # ///
    """Parse an email into structured fields, the basis of e-discovery.

    `kaos-source` parses `.eml` / `.mbox` / vCard / PACER and other formats into
    typed records. Here we parse a raw email into sender, recipients, subject, body,
    and header forensics: the structured shape an e-discovery or investigation
    pipeline indexes. Operating on raw bytes, so it's fully offline.

    Run it:

        uv run examples/email-forensics.py
    """

    from __future__ import annotations

    from kaos_source.parsers import EmlParser

    # A minimal RFC 5322 message: headers, a blank line, then the body.
    RAW_EML = b"""\
    From: counsel@example.com
    To: client@example.com
    Cc: paralegal@example.com
    Subject: Re: Merger Agreement
    Date: Wed, 1 Apr 2026 10:00:00 -0400

    The deal terms are confidential. Please review the indemnification clause
    before our call.
    """


    def main() -> dict:
        # Parse the raw bytes into a ParsedEmail; no network access is involved.
        parsed = EmlParser().parse(RAW_EML)

        # Flatten the typed address objects into plain strings for indexing.
        record = {
            "from": parsed.from_address.address,
            "to": [a.address for a in parsed.to_addresses],
            "cc": [a.address for a in parsed.cc_addresses],
            "subject": parsed.subject,
            "body": parsed.body_text.strip(),
        }
        print(f" from: {record['from']}")
        print(f" to: {record['to']}")
        print(f" cc: {record['cc']}")
        print(f" subject: {record['subject']}")
        print(f" body: {record['body'][:60]}...")
        print(f" header forensics: {type(parsed.forensics).__name__}")
        return record


    if __name__ == "__main__":
        record = main()
        assert record["from"] == "counsel@example.com"
        assert record["to"] == ["client@example.com"]
        assert record["cc"] == ["paralegal@example.com"]
        assert record["subject"] == "Re: Merger Agreement"
        assert "indemnification clause" in record["body"]

Notes
- EmlParser().parse(bytes) returns a ParsedEmail with typed EmailAddress lists, subject, dates, body_text / body_html, attachments, and forensics (header analysis).
- kaos-source is the on-ramp for real data: filesystem and archive discovery, materialize to the artifact store, plus REST connectors for EDGAR, Federal Register, eCFR, GovInfo, and GLEIF (those hit live APIs; the filesystem, archive, and parser surfaces run offline).
- Pair it with SQL analytics to build a searchable database from a mailbox, or feed bodies into near-duplicate detection to cluster threads.
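
As a rough illustration of the "searchable database" idea in the last note, the sketch below loads records shaped like the dict returned by main() into an in-memory SQLite database using Python's standard-library sqlite3 module. This is only a stand-in for whatever SQL analytics layer you pair with kaos-source: the table layout, the sample RECORDS, and the helper names build_index, search_bodies, and messages_involving are all assumptions made for this example, not part of kaos-source.

```python
from __future__ import annotations

import sqlite3

# Hypothetical sample data, shaped like the dict returned by main() above.
RECORDS = [
    {
        "from": "counsel@example.com",
        "to": ["client@example.com"],
        "cc": ["paralegal@example.com"],
        "subject": "Re: Merger Agreement",
        "body": "The deal terms are confidential. Please review the "
                "indemnification clause\nbefore our call.",
    },
    {
        "from": "client@example.com",
        "to": ["counsel@example.com"],
        "cc": [],
        "subject": "Re: Merger Agreement",
        "body": "Reviewed. Two questions on the escrow schedule.",
    },
]


def build_index(records: list[dict]) -> sqlite3.Connection:
    """Load parsed-email records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per message, one row per (message, recipient).
    conn.execute(
        "CREATE TABLE emails ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE recipients ("
        "email_id INTEGER REFERENCES emails(id), kind TEXT, address TEXT)"
    )
    for rec in records:
        cur = conn.execute(
            "INSERT INTO emails (sender, subject, body) VALUES (?, ?, ?)",
            (rec["from"], rec["subject"], rec["body"]),
        )
        email_id = cur.lastrowid
        rows = [(email_id, "to", a) for a in rec["to"]]
        rows += [(email_id, "cc", a) for a in rec["cc"]]
        conn.executemany("INSERT INTO recipients VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_bodies(conn: sqlite3.Connection, term: str) -> list[tuple]:
    """Return (sender, subject) for messages whose body contains `term`.

    Uses LIKE, so `%` and `_` inside `term` act as wildcards.
    """
    cur = conn.execute(
        "SELECT sender, subject FROM emails WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return cur.fetchall()


def messages_involving(conn: sqlite3.Connection, address: str) -> list[tuple]:
    """Return (id, sender, subject) for messages sent by or addressed to `address`."""
    cur = conn.execute(
        """
        SELECT DISTINCT e.id, e.sender, e.subject
        FROM emails AS e
        LEFT JOIN recipients AS r ON r.email_id = e.id
        WHERE e.sender = ? OR r.address = ?
        ORDER BY e.id
        """,
        (address, address),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_index(RECORDS)
    print(search_bodies(conn, "indemnification"))
    print(messages_involving(conn, "counsel@example.com"))
```

Keeping recipients in their own table, rather than storing the to and cc lists as text, lets a "who communicated with whom" question become a plain join. For a real mailbox you would likely swap ":memory:" for a file path and add a date column once you decide how to normalize the parsed dates.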