Offline LLM with FunctionClient

Before you call a real model, learn the seam that makes every LLM example here runnable offline, for free, with no API key — and tested in CI.

kaos-llm-client ships a FunctionClient: a provider client that runs a Python callable instead of making an HTTP request, while satisfying the exact same interface as the real Anthropic / OpenAI / Google clients. Your code calls client.chat(...) either way — only the model is faked. That means you exercise the real code path with a deterministic, free, instant “model”.
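
The idea behind this seam can be sketched in plain Python with no dependencies. The names below (ChatClient, CallableClient, greet, shout) are hypothetical stand-ins written for this illustration; they are not kaos-llm-client's classes, and chat() returns a bare string here rather than a ProviderResponse. The point is only that application code depends on the chat() surface, so a callable-backed client can take the place of a networked one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol

Message = dict[str, str]


class ChatClient(Protocol):
    """The only surface the application code depends on (hypothetical)."""

    def chat(self, messages: list[Message]) -> str: ...


@dataclass
class CallableClient:
    """Hypothetical stand-in: answers with a Python callable, no network."""

    function: Callable[[list[Message]], str]
    call_history: list[list[Message]] = field(default_factory=list)

    def chat(self, messages: list[Message]) -> str:
        # Record a copy of what was sent, then delegate to the callable.
        self.call_history.append(list(messages))
        return self.function(messages)


def greet(client: ChatClient, name: str) -> str:
    """Application code: unaware of whether the client is real or fake."""
    return client.chat([{"role": "user", "content": f"hello, {name}"}])


def shout(messages: list[Message]) -> str:
    """Deterministic 'model': uppercase the last message."""
    return messages[-1]["content"].upper()


client = CallableClient(function=shout)
reply = greet(client, "kaos")
print(reply)                     # HELLO, KAOS
print(len(client.call_history))  # 1
```

Because greet() only ever calls chat(), swapping in a networked client would not change it; that is the property the FunctionClient example relies on.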

Run the example from a terminal:

uv run examples/functionclient-chat.py

It prints:

model said: 'FAKE MODEL SAYS: HELLO, KAOS'
calls recorded: 1

The full script, examples/functionclient-chat.py:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-llm-client>=0.1.9,<0.2"]
# ///
"""The offline LLM seam: a deterministic fake model via FunctionClient.

Every LLM example on this site runs for free in CI with no API key. The
trick is `FunctionClient`: a provider client that runs a Python callable
instead of making an HTTP request, while satisfying the *same* interface
as the real Anthropic / OpenAI / Google clients. So the same code path
(`client.chat(...)`) is exercised; only the model is faked.

By default this runs offline. Set `KAOS_LEARN_LIVE=1` (and an API key) to
hit a real provider instead; the rest of the program is identical.

Run it (offline, no key):

    uv run examples/functionclient-chat.py

Run it live:

    KAOS_LEARN_LIVE=1 ANTHROPIC_API_KEY=sk-... uv run examples/functionclient-chat.py
"""

from __future__ import annotations

import os

from kaos_llm_client.providers.function import FunctionClient
from kaos_llm_client.types import ContentPart, ProviderResponse


def fake_model(messages: list[dict], profile) -> ProviderResponse:
    """A deterministic 'model': echo the user's text back, uppercased.

    A real provider would return a generated completion here; for tests we
    return exactly what we want so assertions are stable.
    """
    user_text = messages[-1]["content"]
    return ProviderResponse(
        provider="function",
        model="function-test",
        raw={},
        parts=[ContentPart(type="text", text=f"FAKE MODEL SAYS: {user_text.upper()}")],
    )


def make_client():
    """Offline by default; a real client when KAOS_LEARN_LIVE is set."""
    if os.environ.get("KAOS_LEARN_LIVE"):
        from kaos_llm_client import create_client

        # Anthropic Haiku is the documented default for live examples.
        return create_client("anthropic:claude-haiku-4-5")
    return FunctionClient(function=fake_model)


def main() -> str:
    client = make_client()
    response = client.chat([{"role": "user", "content": "hello, kaos"}])
    print(f"model said: {response.text!r}")
    # The client records every call, which is handy for asserting what was sent.
    if isinstance(client, FunctionClient):
        print(f"calls recorded: {len(client.call_history)}")
    return response.text


if __name__ == "__main__":
    text = main()
    if not os.environ.get("KAOS_LEARN_LIVE"):
        # Offline path is fully deterministic.
        assert text == "FAKE MODEL SAYS: HELLO, KAOS", text

  • fake_model(messages, profile) is the whole “model”: it receives the chat messages (plus a profile argument that this example ignores) and returns a ProviderResponse. Because you choose the return value, assertions on it are deterministic.
  • FunctionClient(function=fake_model) is a drop-in for a real client. The same client.chat([...]) call works against both.
  • make_client() is the swap point: offline by default, a real provider when KAOS_LEARN_LIVE is set (the examples use KAOS_LEARN_LIVE=1, together with an API key). Every LLM example on this site uses this pattern, which is how the docs stay tested in CI without spending tokens.
  • client.call_history records exactly what was sent on each call, so a test can assert on the prompts your code built, not just on the reply.
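
One detail of the swap point is worth seeing in isolation: make_client() tests `os.environ.get("KAOS_LEARN_LIVE")` for truthiness, so any non-empty value selects the live path, including "0". The sketch below reproduces that check against a plain mapping instead of the real environment. Both function names are hypothetical, and live_enabled_strict is an illustrative alternative, not something the script or the library does.

```python
from __future__ import annotations

from collections.abc import Mapping


def live_enabled(env: Mapping[str, str]) -> bool:
    """Same truthiness check as make_client(): any non-empty value counts."""
    return bool(env.get("KAOS_LEARN_LIVE"))


def live_enabled_strict(env: Mapping[str, str]) -> bool:
    """Hypothetical stricter variant: only explicit 'on' values count."""
    return env.get("KAOS_LEARN_LIVE", "").strip().lower() in {"1", "true", "yes"}


for value in (None, "", "1", "0", "false"):
    env = {} if value is None else {"KAOS_LEARN_LIVE": value}
    print(f"{value!r:8} truthy={live_enabled(env)} strict={live_enabled_strict(env)}")
```

In practice this means the reliable way to return to the offline path is to unset the variable rather than to set it to 0.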

You have the offline seam. Now make a model call with it.