Search text with BM25

Goal: find the most relevant sentences in a block of text for a query, using the same lexical retrieval that agents use to assemble context. No model, no API key, deterministic.

Use kaos-nlp-core’s search_sentences(text, query), which segments the text and ranks the sentences with BM25.

uv run examples/bm25-search.py

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-nlp-core>=0.1.6,<0.2"]
# ///
"""Rank sentences by relevance with BM25 — fast, offline retrieval.

`kaos-nlp-core` is a Rust-backed NLP engine. `search_sentences` segments a
block of text into sentences and ranks them against a query with BM25 — the
classic lexical retrieval algorithm agents use to pull relevant context out
of a corpus. No model, no key, fully deterministic.

Run it:

    uv run examples/bm25-search.py
"""

from __future__ import annotations

from kaos_nlp_core.search import search_sentences

TEXT = (
    "The lease term is five years. "
    "Rent is due monthly on the first. "
    "The tenant may renew for an additional term. "
    "Late rent incurs a five percent fee."
)


def main() -> list:
    hits = search_sentences(TEXT, "rent", top_k=3)
    print('Top matches for "rent":\n')
    for h in hits:
        # Each hit carries the matched sentence, its BM25 score, and the
        # character span it occupies in the source text.
        print(f"  {h.score:.3f}  {h.text!r}  (chars {h.start}-{h.end})")
    return hits


if __name__ == "__main__":
    hits = main()
    assert hits, "expected at least one hit"
    # Compare case-insensitively: "Late rent incurs..." is an equally valid top hit.
    assert "rent" in hits[0].text.lower(), f"unexpected top hit: {hits[0].text!r}"

Notes

search_sentences returns SegmentHits with the matched text, a BM25 score, and the start/end character span in the source.
For multi-document corpora, use Searcher.from_documents(...) instead.
BM25 is KAOS’s default retrieval strategy in agents. A concept page on why plain BM25 beats fancier schemes across domains is landing soon.
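
For intuition about what "segment, then rank with BM25" involves, here is a minimal pure-Python sketch. It is not kaos-nlp-core's implementation: the `Hit` class, the regex-based sentence splitter, the tokenizer, and the `k1`/`b` defaults are all assumptions made for illustration, and the library's real scores may differ.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

TEXT = (
    "The lease term is five years. "
    "Rent is due monthly on the first. "
    "The tenant may renew for an additional term. "
    "Late rent incurs a five percent fee."
)


@dataclass
class Hit:
    # Hypothetical stand-in for the library's hit type, not its real class.
    text: str
    score: float
    start: int
    end: int


def split_sentences(text: str) -> list[tuple[int, int]]:
    """Naive splitter (assumption): a sentence ends at '.', '!' or '?'."""
    pattern = r"[^.!?\s][^.!?]*[.!?]?"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def tokenize(s: str) -> list[str]:
    """Lowercase alphanumeric tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", s.lower())


def bm25_sentences(text: str, query: str, top_k: int = 3,
                   k1: float = 1.2, b: float = 0.75) -> list[Hit]:
    spans = split_sentences(text)
    docs = [tokenize(text[s:e]) for s, e in spans]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    hits = []
    for (s, e), d in zip(spans, docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with sentence-length normalization.
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        if score > 0:
            hits.append(Hit(text[s:e], score, s, e))

    # sorted() is stable, so tied sentences keep their source order.
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]


if __name__ == "__main__":
    for h in bm25_sentences(TEXT, "rent"):
        print(f"  {h.score:.3f}  {h.text!r}  (chars {h.start}-{h.end})")
```

In this sketch the two sentences containing "rent" tie: each has seven tokens, which is also the average length, so the length normalization cancels out and both score ln 2 ≈ 0.693. Only the stable sort decides their order, which is why a check on the top hit is safer when it ignores case.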