
Extract entities with a local NER model

Goal: turn unstructured text into structured records — the people, organizations, amounts, and dates in a document — without an LLM or API key. kaos-nlp-transformers ships a zero-shot NER extractor (GLiNER) that runs locally on a small ONNX model: you just name the labels you want.

Run the example from a terminal:
uv run examples/extract-entities.py
date: 'January 5, 2026' (0.97)
organization: 'Acme Corporation' (0.99)
money: '$2,500,000' (0.95)
person: 'Jane Doe' (0.99)
examples/extract-entities.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-nlp-transformers>=0.1.5,<0.2"]
# ///
"""Extract entities from text with a local NER model: people, orgs, money, dates.

`kaos-nlp-transformers` ships a zero-shot NER extractor (GLiNER) that pulls typed
entities out of text *without an LLM or API key*: you just name the labels you
want. It runs locally on a small ONNX model. This is the offline
information-extraction backbone for building databases from documents.

Model note: the first run downloads the ONNX model (~tens of MB) from Hugging
Face and caches it; subsequent runs are offline. (To pre-warm a cache for CI or
air-gapped use, see how-to/prefetch-models.)

Run it:

    uv run examples/extract-entities.py
"""

from __future__ import annotations

import kaos_nlp_transformers as knt

TEXT = (
    "On January 5, 2026, Acme Corporation paid $2,500,000 to Jane Doe "
    "to settle the matter under the Master Services Agreement."
)
LABELS = ["person", "organization", "money", "date"]


def main() -> dict[str, str]:
    extractor = knt.GLiNERExtractor.load()
    # extract() takes a batch of texts and returns a list of entity lists.
    entities = extractor.extract([TEXT], labels=LABELS)[0]
    print(f"entities in:\n {TEXT!r}\n")
    found: dict[str, str] = {}
    for e in entities:
        print(f" {e.label:>13}: {e.text!r} ({e.score:.2f})")
        found[e.label] = e.text
    return found


if __name__ == "__main__":
    found = main()
    # The model reliably pulls the org, the amount, the person, and the date.
    assert found.get("organization") == "Acme Corporation"
    assert "2,500,000" in found.get("money", "")
    assert found.get("person") == "Jane Doe"
    assert "2026" in found.get("date", "")
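
The script stores results with `found[e.label] = e.text`, so a second entity with the same label overwrites the first. For documents that mention several people or organizations, a grouping step may be more useful. The following is a minimal sketch, not part of the library: `Entity` is a hypothetical stand-in that only mirrors the `label`, `text`, and `score` attributes used in the script, and `group_by_label` and its `min_score` threshold are illustrative names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for the objects returned by the extractor.

    Only the three attributes used in the script (label, text, score) are modeled.
    """

    label: str
    text: str
    score: float


def group_by_label(entities: list[Entity], min_score: float = 0.5) -> dict[str, list[str]]:
    """Collect every entity text per label, dropping low-confidence spans."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for e in entities:
        if e.score >= min_score:
            grouped[e.label].append(e.text)
    return dict(grouped)


# Made-up sample data for illustration; real scores come from the model.
sample = [
    Entity("person", "Jane Doe", 0.99),
    Entity("person", "John Roe", 0.91),
    Entity("organization", "Acme Corporation", 0.99),
    Entity("date", "last spring", 0.32),  # below the threshold, so it is dropped
]
grouped = group_by_label(sample, min_score=0.5)
print(grouped)
```

The threshold value is a judgment call that depends on how costly a false positive is in the downstream database; 0.5 here is only a placeholder.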

What to notice

  • Zero-shot. You pass labels=[...] — any labels, no fine-tuning. Need court, statute, product? Add them to the list.
  • Local and private. It runs on a local ONNX model; the text never leaves the machine. The first run downloads the model (~tens of MB) and caches it — pre-warm it for CI or air-gapped use with prefetch-models.
  • Deterministic enough to build on. The extractor returns high-confidence spans with offsets, which you can feed straight into a complaint database, a knowledge graph, or structured extraction.
  • This is the offline complement to LLM extraction: use NER for the entities a model recognizes out of the box, and a typed Call for bespoke, schema-shaped fields.
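
As an illustration of feeding spans into a database, here is a minimal sketch that writes entities to SQLite using only the standard library. It rests on assumptions: `Span` is a hypothetical record type, and the `start` and `end` offset attributes are assumed names, since the page mentions offsets without showing how the library exposes them. The table layout and the `store_spans` helper are likewise illustrative.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass

TEXT = "On January 5, 2026, Acme Corporation paid $2,500,000 to Jane Doe."


@dataclass(frozen=True)
class Span:
    """Hypothetical entity record; `start`/`end` are assumed attribute names."""

    label: str
    text: str
    score: float
    start: int
    end: int


def make_span(label: str, needle: str, score: float) -> Span:
    """Build a sample span by locating `needle` in TEXT (stands in for model output)."""
    start = TEXT.index(needle)
    return Span(label, needle, score, start, start + len(needle))


def store_spans(conn: sqlite3.Connection, doc_id: str, spans: list[Span]) -> int:
    """Insert one row per span and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "doc_id TEXT, label TEXT, text TEXT, score REAL, "
        "start_offset INTEGER, end_offset INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?, ?, ?)",
        [(doc_id, s.label, s.text, s.score, s.start, s.end) for s in spans],
    )
    conn.commit()
    return len(spans)


# Sample spans with the scores from the example output above.
spans = [
    make_span("person", "Jane Doe", 0.99),
    make_span("date", "January 5, 2026", 0.97),
    make_span("organization", "Acme Corporation", 0.99),
    make_span("money", "$2,500,000", 0.95),
]

conn = sqlite3.connect(":memory:")
written = store_spans(conn, "doc-1", spans)

# Read the entities back in document order.
rows = conn.execute(
    "SELECT label, text FROM entities WHERE doc_id = ? ORDER BY start_offset",
    ("doc-1",),
).fetchall()
print(rows)
```

Storing offsets alongside the text keeps each row traceable to its position in the source document, which helps when reviewing or deduplicating extracted records.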