
Check parties against a client list

Goal: when a new matter comes in, check its parties against your existing clients — and catch the near matches a string equality test would miss (“Acme Corporaton”, “Initech, LLC”). kaos-nlp-core’s FstSet indexes the known names and does edit-distance and type-ahead lookups: deterministic, fast, offline, no model.

uv run examples/conflict-check.py
CONFLICT 'Acme Corporaton' ~ 'Acme Corporation' (distance 1)
CONFLICT 'globex industries' ~ 'Globex Industries' (distance 0)
CONFLICT 'Initech, LLC' ~ 'Initech LLC' (distance 1)
clear 'Stark Industries' (no known client within distance 2)
examples/conflict-check.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.13"
# dependencies = ["kaos-nlp-core>=0.1.6,<0.2"]
# ///
"""Check new parties against a client list — fuzzy, for conflicts of interest.
Conflict checking has to catch *near* matches: "Acme Corporaton" (a typo) and
"Initech, LLC" (punctuation) must still hit "Acme Corporation" and "Initech LLC".
`kaos-nlp-core`'s `FstSet` builds a finite-state index of known names and does
edit-distance (`fuzzy_search`) and type-ahead (`prefix_search`) lookups — fast,
deterministic, offline, no model.
Run it:
uv run examples/conflict-check.py
"""
from __future__ import annotations

from kaos_nlp_core.matching import FstSet

KNOWN_CLIENTS = [
    "Acme Corporation",
    "Globex Industries",
    "Initech LLC",
    "Wayne Enterprises",
]

# Names appearing on a new matter; some are the same parties, spelled loosely.
INCOMING = [
    "Acme Corporaton",    # typo -> Acme Corporation
    "globex industries",  # different case -> Globex Industries
    "Initech, LLC",       # punctuation -> Initech LLC
    "Stark Industries",   # genuinely new -> no conflict
]


def main() -> dict[str, str | None]:
    # Normalize case so a casing difference isn't counted as edits; keep a map
    # back to the canonical client name.
    canonical = {c.lower(): c for c in KNOWN_CLIENTS}
    index = FstSet(sorted(canonical))
    results: dict[str, str | None] = {}
    for name in INCOMING:
        hits = index.fuzzy_search(name.lower(), 2)  # within edit distance 2
        if hits:
            # Several clients may be within range; report the closest one.
            best = min(hits, key=lambda h: h.distance)
            match = canonical[best.key]
            print(f" CONFLICT {name!r:22} ~ {match!r} (distance {best.distance})")
            results[name] = match
        else:
            print(f" clear {name!r:22} (no known client within distance 2)")
            results[name] = None
    return results


if __name__ == "__main__":
    results = main()
    # The three variants resolve to known clients; the genuinely new party is clear.
    assert results["Acme Corporaton"] == "Acme Corporation"
    assert results["globex industries"] == "Globex Industries"
    assert results["Initech, LLC"] == "Initech LLC"
    assert results["Stark Industries"] is None
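
To make the reported distances concrete, here is a minimal pure-Python sketch of edit distance. It assumes plain Levenshtein distance (single-character insertions, deletions, and substitutions); the page does not say which variant FstSet uses, so treat this as an illustration of the idea and not as the library's implementation. The function name edit_distance is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (illustrative, not FstSet's code)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


# One missing "i" and one extra comma each cost a single edit.
print(edit_distance("acme corporaton", "acme corporation"))
print(edit_distance("initech, llc", "initech llc"))
# Without lowercasing, the two capitals alone would use up a budget of 2.
print(edit_distance("globex industries", "Globex Industries"))
```

The last call shows why the script lowercases both sides first: compared as written, "globex industries" is already two substitutions away from "Globex Industries", leaving no room for an actual typo.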

What to notice

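  • FstSet(keys) builds a finite-state index; fuzzy_search(query, max_distance) returns the matches within that edit distance, each with its .key and .distance. Tune max_distance to trade recall for precision: a larger budget catches more spelling variants but also flags more unrelated names, especially short ones.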
  • Normalize first. Lowercasing (as here) keeps a casing difference from eating your edit budget; map back to the canonical name for display. Add your own normalization (drop “Inc.”/“LLC”, collapse whitespace) for stricter entity resolution.
  • prefix_search on the same index powers type-ahead for a search box.
  • This is the deterministic counterpart to near-duplicate detection (which works on whole documents); use FstSet for short strings like names and titles.
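
The extra normalization mentioned above could look something like the sketch below. It is standard-library Python only and makes no FstSet calls; normalize_name and ENTITY_SUFFIXES are hypothetical names, and the suffix list is an assumption you would adapt to your own client base.

```python
import re

# Hypothetical suffix list for illustration; extend or trim it to suit.
ENTITY_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, drop entity suffixes."""
    lowered = name.lower()
    # Replace punctuation with spaces so "Initech, LLC" and "Initech LLC" agree.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no argument also collapses runs of whitespace.
    tokens = no_punct.split()
    # Drop trailing suffixes, but never reduce a name to nothing.
    while len(tokens) > 1 and tokens[-1] in ENTITY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_name("Initech, LLC"))
print(normalize_name("Globex   Industries Inc."))
```

Apply the same function to the known clients (when building the canonical map) and to each incoming name (before querying). Two caveats: distinct clients can collapse to the same normalized key, so the map back to canonical names may need to hold a list, and shorter keys make a fixed edit budget more permissive.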