Your first agent
This is what KAOS is for. kaos-agents gives you a stateful agent with memory,
patterns, permissions, and cost accounting. Here you build a chat agent and run two
turns — and watch it remember the first when answering the second.
It runs offline, with no API key, using the FunctionClient seam you already know.
Run it
Section titled “Run it”uv run examples/first-agent.pyturn 1: Offline reply — I can see 1 user message(s) of history.turn 2: Offline reply — I can see 2 user message(s) of history.cost (offline): $0.0000Turn 2 sees two messages of history — the agent remembered turn 1.
The code
Section titled “The code”#!/usr/bin/env -S uv run --script# /// script# requires-python = ">=3.13"# dependencies = [# "kaos-agents>=0.1.28,<0.2",# "kaos-llm-client>=0.1.9,<0.2",# "kaos-llm-core>=0.1.12,<0.2",# ]# ///"""Your first KAOS agent — a stateful chat agent that remembers, run offline.
`kaos-agents` splits an agent into two parts: an `Agent` is frozen *config*(instructions, model, pattern, tools); a `Runner` is the *engine* that drivesit. State doesn't live on the agent — it lives in session memory, hydrated froma virtual filesystem each turn. That's why the agent below remembers turn 1 whenit answers turn 2.
Offline note: a real agent calls an LLM provider. To run this for free with noAPI key, we substitute the model factory with a deterministic `FunctionClient`— the same technique the kaos-agents test suite uses. Set `KAOS_LEARN_LIVE=1`(and `ANTHROPIC_API_KEY`) to drop the substitution and use a real model.
Run it (offline, no key):
uv run examples/first-agent.py"""
from __future__ import annotations
import asyncioimport contextlibimport jsonimport os
from kaos_agents.config import Agent, AgentPatternfrom kaos_agents.runtime.runner import Runnerfrom kaos_core import KaosRuntime
def _fake_model(messages: list[dict], profile): """A deterministic stand-in for an LLM provider.
The chat loop makes two kinds of structured calls — intent classification and the response — so we return the right shape for each. The response reports how many remembered turns it can see, which makes session memory visible in the output. """ from kaos_llm_client.types import ContentPart, ProviderResponse
blob = " ".join(str(m.get("content", "")) for m in messages) if "reasoning" in blob.lower() and "intent" in blob.lower(): payload = {"intent": "respond", "confidence": 0.95, "reasoning": "a direct question"} else: remembered = blob.count("user:") # user turns visible in the prompt's history payload = {"response": f"Offline reply — I can see {remembered} user message(s) of history."} return ProviderResponse( provider="function", model="function-test", raw={}, parts=[ContentPart(type="text", text=json.dumps(payload))], )
@contextlib.contextmanagerdef offline_model(): """Swap the LLM factory for a FunctionClient unless running live.""" if os.environ.get("KAOS_LEARN_LIVE"): yield "anthropic:claude-haiku-4-5" # real model id; needs a key return from unittest.mock import patch
from kaos_llm_client.providers.function import FunctionClient
fc = FunctionClient(function=_fake_model) with ( patch("kaos_llm_core.programs.call.create_client", return_value=fc), patch("kaos_llm_client.create_client", return_value=fc), ): yield "function-test"
async def main() -> list[str]: # test_mode() = in-memory, isolated VFS. Critical: with the default disk VFS, # session memory would leak across runs and tests would false-green. runtime = KaosRuntime.test_mode()
with offline_model() as model: agent = Agent(instructions="Be brief.", model=model, pattern=AgentPattern.CHAT) runner = Runner(agent, runtime=runtime)
# Same session id across both turns -> the agent remembers turn 1. r1 = await runner.turn("Hello!", "demo-session") r2 = await runner.turn("What did I just say?", "demo-session")
print(f"turn 1: {r1.text}") print(f"turn 2: {r2.text}") print(f"cost (offline): ${r1.cost_usd:.4f}") return [r1.text, r2.text]
if __name__ == "__main__": replies = asyncio.run(main()) if not os.environ.get("KAOS_LEARN_LIVE"): # Memory persists: turn 2 sees more history than turn 1. assert "1 user message" in replies[0], replies[0] assert "2 user message" in replies[1], replies[1]What to notice
Section titled “What to notice”- Agent vs Runner. An
Agentis frozen config (instructions, model, pattern, tools). ARunneris the engine. The agent holds no state — splitting them is what makes agents reconstructable and safe to share. - Memory lives in the session, not the agent. Both turns use the same session id, so the second turn sees the first. State is hydrated from a virtual filesystem each turn — which is why the same agent can serve many sessions.
KaosRuntime.test_mode()is mandatory offline. It gives an in-memory, isolated VFS. With the default disk VFS, session memory would leak across runs and your tests would falsely pass — a real footgun the docs call out.- Cost is first-class.
response.cost_usdandresponse.total_tokensare on every result. Offline they’re$0.0000; live they’re the real spend.
About the offline substitution
Section titled “About the offline substitution”A real agent calls an LLM. To run this for free, the example swaps the model factory for
a FunctionClient inside offline_model() — the same technique the kaos-agents test
suite uses. Set KAOS_LEARN_LIVE=1 and an ANTHROPIC_API_KEY to drop the swap and use
a real model; the agent code is identical.
You have an agent that remembers. Next, make its answers trustworthy.
Grounded citations → The trust mechanism: every claim carries a span that is checked against its source.