Your first agent

This is what KAOS is for. kaos-agents gives you a stateful agent with memory, patterns, permissions, and cost accounting. Here you build a chat agent and run two turns — and watch it remember the first when answering the second.

It runs offline, with no API key, using the FunctionClient seam you already know.

Run it

uv run examples/first-agent.py

turn 1: Offline reply — I can see 1 user message(s) of history.
turn 2: Offline reply — I can see 2 user message(s) of history.
cost (offline): $0.0000

Turn 2 sees two messages of history — the agent remembered turn 1.

The code

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = [
#   "kaos-agents>=0.1.28,<0.2",
#   "kaos-llm-client>=0.1.9,<0.2",
#   "kaos-llm-core>=0.1.12,<0.2",
# ]
# ///
"""Your first KAOS agent — a stateful chat agent that remembers, run offline.

`kaos-agents` splits an agent into two parts: an `Agent` is frozen *config*
(instructions, model, pattern, tools); a `Runner` is the *engine* that drives
it. State doesn't live on the agent — it lives in session memory, hydrated from
a virtual filesystem each turn. That's why the agent below remembers turn 1 when
it answers turn 2.

Offline note: a real agent calls an LLM provider. To run this for free with no
API key, we substitute the model factory with a deterministic `FunctionClient`
— the same technique the kaos-agents test suite uses. Set `KAOS_LEARN_LIVE=1`
(and `ANTHROPIC_API_KEY`) to drop the substitution and use a real model.

Run it (offline, no key):

    uv run examples/first-agent.py
"""

from __future__ import annotations

import asyncio
import contextlib
import json
import os

from kaos_agents.config import Agent, AgentPattern
from kaos_agents.runtime.runner import Runner
from kaos_core import KaosRuntime


def _fake_model(messages: list[dict], profile):
    """A deterministic stand-in for an LLM provider.

    The chat loop makes two kinds of structured calls — intent classification
    and the response — so we return the right shape for each. The response
    reports how many remembered turns it can see, which makes session memory
    visible in the output.
    """
    from kaos_llm_client.types import ContentPart, ProviderResponse

    blob = " ".join(str(m.get("content", "")) for m in messages)
    if "reasoning" in blob.lower() and "intent" in blob.lower():
        payload = {"intent": "respond", "confidence": 0.95, "reasoning": "a direct question"}
    else:
        remembered = blob.count("user:")  # user turns visible in the prompt's history
        payload = {"response": f"Offline reply — I can see {remembered} user message(s) of history."}
    return ProviderResponse(
        provider="function",
        model="function-test",
        raw={},
        parts=[ContentPart(type="text", text=json.dumps(payload))],
    )


@contextlib.contextmanager
def offline_model():
    """Swap the LLM factory for a FunctionClient unless running live."""
    if os.environ.get("KAOS_LEARN_LIVE"):
        yield "anthropic:claude-haiku-4-5"  # real model id; needs a key
        return
    from unittest.mock import patch

    from kaos_llm_client.providers.function import FunctionClient

    fc = FunctionClient(function=_fake_model)
    with (
        patch("kaos_llm_core.programs.call.create_client", return_value=fc),
        patch("kaos_llm_client.create_client", return_value=fc),
    ):
        yield "function-test"


async def main() -> list[str]:
    # test_mode() = in-memory, isolated VFS. Critical: with the default disk VFS,
    # session memory would leak across runs and tests would false-green.
    runtime = KaosRuntime.test_mode()

    with offline_model() as model:
        agent = Agent(instructions="Be brief.", model=model, pattern=AgentPattern.CHAT)
        runner = Runner(agent, runtime=runtime)

        # Same session id across both turns -> the agent remembers turn 1.
        r1 = await runner.turn("Hello!", "demo-session")
        r2 = await runner.turn("What did I just say?", "demo-session")

    print(f"turn 1: {r1.text}")
    print(f"turn 2: {r2.text}")
    print(f"cost (offline): ${r1.cost_usd:.4f}")
    return [r1.text, r2.text]


if __name__ == "__main__":
    replies = asyncio.run(main())
    if not os.environ.get("KAOS_LEARN_LIVE"):
        # Memory persists: turn 2 sees more history than turn 1.
        assert "1 user message" in replies[0], replies[0]
        assert "2 user message" in replies[1], replies[1]

What to notice

Agent vs Runner. An Agent is frozen config (instructions, model, pattern, tools). A Runner is the engine. The agent holds no state — splitting them is what makes agents reconstructable and safe to share.
Memory lives in the session, not the agent. Both turns use the same session id, so the second turn sees the first. State is hydrated from a virtual filesystem each turn — which is why the same agent can serve many sessions.
KaosRuntime.test_mode() is mandatory for offline runs. It gives an in-memory, isolated VFS. With the default disk VFS, session memory would leak across runs and your tests would falsely pass, a real footgun the docs call out.
Cost is first-class. Every result carries cost_usd and total_tokens (the example prints r1.cost_usd). Offline the cost is $0.0000; live it is the real spend.
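To make the first two points concrete, here is a minimal plain-Python sketch of the same split. It is not the kaos-agents implementation: ToyAgent, ToyRunner, and the dict standing in for the virtual filesystem are hypothetical names invented for illustration. It only shows the shape of the idea: config is frozen, the engine owns the state, and state is keyed by session id.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyAgent:
    """Frozen config only: no history, no counters (hypothetical stand-in)."""

    instructions: str
    model: str


@dataclass
class ToyRunner:
    """The engine: it owns the session store and never mutates the agent."""

    agent: ToyAgent
    # Stand-in for the virtual filesystem: session id -> list of user messages.
    store: dict[str, list[str]] = field(default_factory=dict)

    def turn(self, message: str, session_id: str) -> str:
        # Load this session's history, append the new message, keep it stored.
        history = self.store.setdefault(session_id, [])
        history.append(message)
        return f"I can see {len(history)} user message(s) of history."


agent = ToyAgent(instructions="Be brief.", model="toy-model")
runner = ToyRunner(agent)

first = runner.turn("Hello!", "session-a")
second = runner.turn("What did I just say?", "session-a")
other = runner.turn("Hi from someone else.", "session-b")

print(first)   # I can see 1 user message(s) of history.
print(second)  # I can see 2 user message(s) of history.
print(other)   # I can see 1 user message(s) of history.
```

Because the toy agent is frozen and holds nothing per-session, one instance serves both sessions here, and session-b starts with an empty history even though session-a already has two messages.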

About the offline substitution

A real agent calls an LLM. To run this for free, the example swaps the model factory for a FunctionClient inside offline_model() — the same technique the kaos-agents test suite uses. Set KAOS_LEARN_LIVE=1 and an ANTHROPIC_API_KEY to drop the swap and use a real model; the agent code is identical.

You have an agent that remembers. Next, make its answers trustworthy.

Grounded citations → The trust mechanism: every claim carries a span that is checked against its source.