Offline LLM with FunctionClient

Before you call a real model, learn the seam that lets every LLM example here run offline, for free, with no API key, and stay tested in CI.

kaos-llm-client ships a FunctionClient: a provider client that runs a Python callable instead of making an HTTP request, while satisfying the same interface as the real Anthropic / OpenAI / Google clients. Your code calls client.chat(...) either way; only the model is faked. You therefore exercise the real code path against a deterministic, free, instant “model”.
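
The idea behind a function-backed client can be sketched with the standard library alone. The block below is only an illustration of the seam, not the kaos-llm-client implementation: `ChatClient`, `SketchResponse`, `SketchFunctionClient`, `shout`, and `greet` are hypothetical names invented here, and the real classes may differ in signature and behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class SketchResponse:
    """Hypothetical stand-in for a provider response."""

    text: str


class ChatClient(Protocol):
    """The only thing application code depends on: a chat() method."""

    def chat(self, messages: list[dict]) -> SketchResponse: ...


class SketchFunctionClient:
    """Hypothetical stand-in: answers by calling a Python function, no HTTP."""

    def __init__(self, function: Callable[[list[dict]], str]) -> None:
        self._function = function

    def chat(self, messages: list[dict]) -> SketchResponse:
        # Delegate to the callable instead of a network request.
        return SketchResponse(text=self._function(messages))


def shout(messages: list[dict]) -> str:
    """Deterministic 'model': uppercase the last message."""
    return messages[-1]["content"].upper()


def greet(client: ChatClient) -> str:
    """Application code: it cannot tell a fake client from a real one."""
    return client.chat([{"role": "user", "content": "hello"}]).text


print(greet(SketchFunctionClient(shout)))  # prints HELLO
```

Because `greet` depends only on the `chat()` method, any object with a matching method can be passed in. That is the property that lets a fake and a real client share one code path.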

Run it

uv run examples/functionclient-chat.py

model said: 'FAKE MODEL SAYS: HELLO, KAOS'
calls recorded: 1

The code

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-llm-client>=0.1.9,<0.2"]
# ///
"""The offline LLM seam: a deterministic fake model via FunctionClient.

Every LLM example on this site runs for free in CI with no API key. The
trick is `FunctionClient` — a provider client that runs a Python callable
instead of making an HTTP request, while satisfying the *same* interface
as the real Anthropic / OpenAI / Google clients. So the same code path
(`client.chat(...)`) is exercised; only the model is faked.

By default this runs offline. Set `KAOS_LEARN_LIVE=1` (and an API key) to
hit a real provider instead — the rest of the program is identical.

Run it (offline, no key):

    uv run examples/functionclient-chat.py

Run it live:

    KAOS_LEARN_LIVE=1 ANTHROPIC_API_KEY=sk-... uv run examples/functionclient-chat.py
"""

from __future__ import annotations

import os

from kaos_llm_client.providers.function import FunctionClient
from kaos_llm_client.types import ContentPart, ProviderResponse


def fake_model(messages: list[dict], profile) -> ProviderResponse:
    """A deterministic 'model': echo the user's text back, uppercased.

    A real provider would return a generated completion here; for tests we
    return exactly what we want so assertions are stable.
    """
    user_text = messages[-1]["content"]
    return ProviderResponse(
        provider="function",
        model="function-test",
        raw={},
        parts=[ContentPart(type="text", text=f"FAKE MODEL SAYS: {user_text.upper()}")],
    )


def make_client():
    """Offline by default; a real client when KAOS_LEARN_LIVE is set."""
    if os.environ.get("KAOS_LEARN_LIVE"):
        from kaos_llm_client import create_client

        # Anthropic Haiku is the documented default for live examples.
        return create_client("anthropic:claude-haiku-4-5")
    return FunctionClient(function=fake_model)


def main() -> str:
    client = make_client()
    response = client.chat([{"role": "user", "content": "hello, kaos"}])
    print(f"model said: {response.text!r}")
    # The client records every call — handy for asserting what was sent.
    if isinstance(client, FunctionClient):
        print(f"calls recorded: {len(client.call_history)}")
    return response.text


if __name__ == "__main__":
    text = main()
    if not os.environ.get("KAOS_LEARN_LIVE"):
        # Offline path is fully deterministic.
        assert text == "FAKE MODEL SAYS: HELLO, KAOS", text

What to notice

fake_model(messages, profile) is the whole “model”: it receives the chat messages and returns a ProviderResponse. Because you choose the return value, assertions on the output are deterministic.
FunctionClient(function=fake_model) is a drop-in for a real client. The same client.chat([...]) call works against both.
make_client() is the swap point: offline by default, a real provider when KAOS_LEARN_LIVE is set to any non-empty value (the example uses KAOS_LEARN_LIVE=1). Every example on this site uses this pattern, which is how the docs stay tested without burning tokens.
client.call_history records every call made through the client, so you can assert on exactly what your code sent. The example reads it only when the client is a FunctionClient.
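
Asserting on recorded calls is easiest to see in a test. The sketch below uses only the standard library so it runs anywhere: `RecordingClient` is a hypothetical stand-in that mimics the call-recording behavior, and `summarize` is invented application code. The shape of each recorded entry (here, the raw message list) is an assumption; check what `FunctionClient.call_history` actually stores before relying on it.

```python
from __future__ import annotations

from typing import Callable


class RecordingClient:
    """Hypothetical stand-in: runs a function and remembers every call."""

    def __init__(self, function: Callable[[list[dict]], str]) -> None:
        self._function = function
        # Assumption: one entry per chat() call, holding the messages sent.
        self.call_history: list[list[dict]] = []

    def chat(self, messages: list[dict]) -> str:
        self.call_history.append(messages)
        return self._function(messages)


def summarize(client, text: str) -> str:
    """Invented application code whose prompt we want to pin down."""
    messages = [
        {"role": "system", "content": "Summarize in one sentence."},
        {"role": "user", "content": text},
    ]
    return client.chat(messages)


def test_summarize_sends_expected_prompt() -> None:
    client = RecordingClient(lambda messages: "a short summary")
    result = summarize(client, "a long document")

    # The fake's reply is fixed, so the result is deterministic.
    assert result == "a short summary"
    # Exactly one call, carrying the system prompt and the user's text.
    assert len(client.call_history) == 1
    sent = client.call_history[0]
    assert sent[0] == {"role": "system", "content": "Summarize in one sentence."}
    assert sent[1]["content"] == "a long document"


test_summarize_sends_expected_prompt()
```

A test like this fails when someone edits the system prompt or drops the user text, which is the kind of regression that is hard to notice when the only check is a live model's output.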

You have the offline seam. Now make a model call with it.

Your first model call → Make a “real” model call: the same code, with the live provider path explained.
