Prefetch models for offline use

kaos-nlp-transformers downloads ONNX models on first use. To run fully offline afterward (in CI, in an air-gapped environment, or simply for deterministic runs), pre-warm the cache once, then enforce offline mode.

Prefetch, then go offline

# Download the models you'll use into the cache (one time, needs network)
kaos-nlp-transformers prefetch --include embedding --include reranker
# ...or a specific model
kaos-nlp-transformers prefetch --model BAAI/bge-small-en-v1.5

# Afterwards, force offline so no network fetch is attempted
export KAOS_NLP_TRANSFORMERS_OFFLINE=1

kaos-nlp-transformers info     # confirm what's cached and the active device
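
The two phases above can be wrapped for a CI job. What follows is a minimal sketch, not the project's own tooling: the function names `warm_cache` and `go_offline` are hypothetical, and it only uses the `prefetch` invocation and the `KAOS_NLP_TRANSFORMERS_OFFLINE` variable shown above. It also assumes the CLI is on `PATH` during the online phase.

```shell
# warm_cache and go_offline are hypothetical helper names for this sketch.

# Phase 1 (needs network): download the model families into the cache.
# Returns non-zero if the CLI is missing or the prefetch itself fails.
warm_cache() {
  if ! command -v kaos-nlp-transformers >/dev/null 2>&1; then
    echo "kaos-nlp-transformers is not on PATH" >&2
    return 1
  fi
  kaos-nlp-transformers prefetch --include embedding --include reranker
}

# Phase 2: export the offline flag so later commands in this shell
# (and their child processes) inherit it.
go_offline() {
  export KAOS_NLP_TRANSFORMERS_OFFLINE=1
}
```

A pipeline step might then run `warm_cache && go_offline`, so the offline flag is only set once the prefetch has succeeded, and follow it with `kaos-nlp-transformers info` to confirm what is cached.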

Notes

The vendored static model minishlab/potion-base-8M (the [model2vec] extra) needs no prefetch at all: it loads without any download, which is why the embeddings how-to and clustering how-to run offline out of the box.
Models are license-vetted and SHA-pinned; prefetch respects the registry.
The cache location is set by KAOS_NLP_TRANSFORMERS_CACHE_DIR or HF_HOME (see environment variables).
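
To make the cache easy to save and restore between CI runs, it may help to point it at a known directory before prefetching. This is a sketch under assumptions: the directory name `.kaos-model-cache` is hypothetical, and it assumes `KAOS_NLP_TRANSFORMERS_CACHE_DIR` is read as the cache root by both the CLI and the library. The environment variables reference is the place to check the exact semantics and how it interacts with `HF_HOME`.

```shell
# Hypothetical project-local cache directory; any writable path would do.
CACHE_DIR="${PWD}/.kaos-model-cache"
mkdir -p "$CACHE_DIR"

# Assumption: the CLI and the library treat this variable as the cache root.
export KAOS_NLP_TRANSFORMERS_CACHE_DIR="$CACHE_DIR"

# Report the current size of the cache directory.
du -sh "$KAOS_NLP_TRANSFORMERS_CACHE_DIR"
```

Set the variable before running the prefetch command, and keep it set to the same value for the later offline runs so they look in the same place.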