Offline LLM with FunctionClient
Before you call a real model, learn the seam that makes every LLM example here runnable offline, for free, with no API key — and tested in CI.
kaos-llm-client ships a FunctionClient: a provider client that runs a Python
callable instead of making an HTTP request, while satisfying the exact same
interface as the real Anthropic / OpenAI / Google clients. Your code calls
client.chat(...) either way — only the model is faked. That means you exercise the
real code path with a deterministic, free, instant “model”.
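The pattern itself can be sketched without the library. What follows is a minimal illustration of the idea, not kaos-llm-client's implementation: Reply, ChatClient, FunctionBackedClient, shout, and greet are hypothetical names invented for this sketch, and the real client's signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Reply:
    """Hypothetical response type; stands in for a provider response."""
    text: str


class ChatClient(Protocol):
    """Hypothetical interface that real and fake clients both satisfy."""

    def chat(self, messages: list[dict]) -> Reply: ...


class FunctionBackedClient:
    """Hypothetical fake client: calls a Python function instead of HTTP."""

    def __init__(self, function: Callable[[list[dict]], Reply]) -> None:
        self._function = function

    def chat(self, messages: list[dict]) -> Reply:
        return self._function(messages)


def shout(messages: list[dict]) -> Reply:
    """Deterministic 'model': uppercase the last message."""
    return Reply(text=messages[-1]["content"].upper())


def greet(client: ChatClient, name: str) -> str:
    """Code under test: depends only on the ChatClient interface."""
    return client.chat([{"role": "user", "content": f"hello, {name}"}]).text


print(greet(FunctionBackedClient(shout), "kaos"))
```

Because greet only depends on the chat method, any object with that method can be passed in, which is what lets a function-backed fake replace a network-backed client without touching the calling code.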
Run it
uv run examples/functionclient-chat.py
model said: 'FAKE MODEL SAYS: HELLO, KAOS'
calls recorded: 1

The code
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-llm-client>=0.1.9,<0.2"]
# ///
"""The offline LLM seam: a deterministic fake model via FunctionClient.

Every LLM example on this site runs for free in CI with no API key. The
trick is `FunctionClient`: a provider client that runs a Python callable
instead of making an HTTP request, while satisfying the *same* interface
as the real Anthropic / OpenAI / Google clients. So the same code path
(`client.chat(...)`) is exercised; only the model is faked.

By default this runs offline. Set `KAOS_LEARN_LIVE=1` (and an API key) to
hit a real provider instead; the rest of the program is identical.

Run it (offline, no key):

    uv run examples/functionclient-chat.py

Run it live:

    KAOS_LEARN_LIVE=1 ANTHROPIC_API_KEY=sk-... uv run examples/functionclient-chat.py
"""

from __future__ import annotations

import os

from kaos_llm_client.providers.function import FunctionClient
from kaos_llm_client.types import ContentPart, ProviderResponse


def fake_model(messages: list[dict], profile) -> ProviderResponse:
    """A deterministic 'model': echo the user's text back, uppercased.

    A real provider would return a generated completion here; for tests we
    return exactly what we want so assertions are stable.
    """
    user_text = messages[-1]["content"]
    return ProviderResponse(
        provider="function",
        model="function-test",
        raw={},
        parts=[ContentPart(type="text", text=f"FAKE MODEL SAYS: {user_text.upper()}")],
    )


def make_client():
    """Offline by default; a real client when KAOS_LEARN_LIVE is set."""
    if os.environ.get("KAOS_LEARN_LIVE"):
        from kaos_llm_client import create_client

        # Anthropic Haiku is the documented default for live examples.
        return create_client("anthropic:claude-haiku-4-5")
    return FunctionClient(function=fake_model)


def main() -> str:
    client = make_client()
    response = client.chat([{"role": "user", "content": "hello, kaos"}])
    print(f"model said: {response.text!r}")
    # The client records every call, which is handy for asserting what was sent.
    if isinstance(client, FunctionClient):
        print(f"calls recorded: {len(client.call_history)}")
    return response.text


if __name__ == "__main__":
    text = main()
    if not os.environ.get("KAOS_LEARN_LIVE"):
        # Offline path is fully deterministic.
        assert text == "FAKE MODEL SAYS: HELLO, KAOS", text

What to notice
- fake_model(messages, profile) is the whole “model”: it receives the chat messages and returns a ProviderResponse. Return whatever you want; assertions become deterministic.
- FunctionClient(function=fake_model) is a drop-in for a real client. The same client.chat([...]) call works against both.
- make_client() is the swap point: offline by default, real provider when KAOS_LEARN_LIVE=1 is set. Every example on this site uses this pattern, which is how the docs stay tested without burning tokens.
- client.call_history records exactly what was sent, which is invaluable for asserting your prompts.
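To make the last point concrete, here is a rough sketch of a test that asserts on recorded calls. It uses a hypothetical stand-in, RecordingClient, rather than the real FunctionClient, because the page does not show what each call_history entry looks like; this sketch assumes an entry is simply the list of messages that was sent. The summarize function is likewise invented for illustration.

```python
from typing import Callable


class RecordingClient:
    """Hypothetical stand-in for a function-backed client that logs calls."""

    def __init__(self, function: Callable[[list[dict]], str]) -> None:
        self._function = function
        # Assumption: one entry per chat() call, holding the messages sent.
        self.call_history: list[list[dict]] = []

    def chat(self, messages: list[dict]) -> str:
        # Store a copy so later mutation by the caller cannot alter the record.
        self.call_history.append([dict(m) for m in messages])
        return self._function(messages)


def summarize(client: RecordingClient, document: str) -> str:
    """Code under test: builds a prompt and sends it to the client."""
    messages = [
        {"role": "system", "content": "Summarize in one sentence."},
        {"role": "user", "content": document},
    ]
    return client.chat(messages)


def test_summarize_sends_expected_prompt() -> None:
    client = RecordingClient(lambda messages: "a summary")
    assert summarize(client, "long text") == "a summary"
    # Exactly one call, and the prompt has the shape we intended.
    assert len(client.call_history) == 1
    sent = client.call_history[0]
    assert sent[0]["role"] == "system"
    assert sent[-1] == {"role": "user", "content": "long text"}


test_summarize_sends_expected_prompt()
```

The test checks the prompt that was built, not only the reply, so a regression in prompt construction fails even though the fake model's answer never changes.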
You have the offline seam. Now make a model call with it.
Your first model call → Make a “real” model call — the same code, with the live provider path explained.