Build a document
kaos-content is KAOS’s document model: one Block/Inline AST with provenance.
Every extractor — PDF, DOCX, web, spreadsheets — produces this same shape, so the
rest of the stack (search, LLM programs, agents, citations) works against one
structure instead of a dozen formats. Here you build one by hand to see the model.
Fully offline and deterministic.
Run it
Section titled “Run it”uv run examples/build-a-document.py--- Markdown ---# Lease Agreement
This lease is between **the parties**.
## Rent
Tenant shall pay rent [monthly](https://example.com/terms).
--- Outline (depth | text | block ref) --- h1 'Lease Agreement' #/body/0 h2 'Rent' #/body/2The code
Section titled “The code”#!/usr/bin/env -S uv run --script# /// script# requires-python = ">=3.13"# dependencies = ["kaos-content>=0.1.6,<0.2"]# ///"""Build a structured document, serialize it, and address a block.
`kaos-content` is KAOS's document model: one Block/Inline AST that everyextractor (PDF, DOCX, web, ...) produces, so the rest of the stack workson a single shape. Here we build one by hand with `DocumentBuilder`,render it to Markdown, and read its outline — including each heading'sstable **block reference** (`ref`), which is how citations point at aprecise location in a document.
Fully offline and deterministic — no key, no network.
Run it:
uv run examples/build-a-document.py"""
from __future__ import annotations
import kaos_content as kc
def build() -> kc.ContentDocument: b = kc.DocumentBuilder() b.heading(1, "Lease Agreement") b.paragraph("This lease is between ", kc.DocumentBuilder.bold("the parties"), ".") b.heading(2, "Rent") b.paragraph( "Tenant shall pay rent ", kc.DocumentBuilder.link("monthly", "https://example.com/terms"), ".", ) return b.build()
def main() -> tuple[str, list]: doc = build()
# 1. Serialize the AST to Markdown. markdown = kc.serialize_markdown(doc) print("--- Markdown ---") print(markdown)
# 2. Read the outline. Each entry has a stable `ref` — the block # reference a citation would point at. outline = kc.document_outline(doc) print("--- Outline (depth | text | block ref) ---") for h in outline: print(f" h{h['depth']} {h['text']!r:<22} {h['ref']}")
return markdown, outline
if __name__ == "__main__": markdown, outline = main() # Deterministic self-checks: structure and the block refs are stable. assert "# Lease Agreement" in markdown assert "**the parties**" in markdown assert "[monthly](https://example.com/terms)" in markdown assert [(h["depth"], h["text"], h["ref"]) for h in outline] == [ (1, "Lease Agreement", "#/body/0"), (2, "Rent", "#/body/2"), ]What to notice
Section titled “What to notice”DocumentBuilderassembles blocks (heading,paragraph) and inlines (bold,link)..build()returns an immutableContentDocument.serialize_markdownrenders the AST. There are serializers for HTML, text, JSON, CSV, and more — the AST is the single source, the format is a choice.- Block references (
#/body/0) are stable addresses into the document. This is how a citation points at exactly where a fact came from — the foundation of the grounded-citation work you’ll see in the agent tutorials.
You can make tools and documents. Now bring in an LLM — without needing an API key.
Offline LLM with FunctionClient → Bring an LLM into the picture with no API key — the deterministic seam this whole site runs on.