Contract (core)

The dependency-free heart of the library: the ports every adapter implements, the domain models, the results the ports reference, and the error taxonomy. This is the frozen surface — everything else either implements a port here or orchestrates ports defined here (see Architecture).

Ports

Structural contracts (typing.Protocol). First-party adapters subclass their port explicitly so the type checker verifies completeness; third-party code may conform purely structurally.
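The two conformance routes can be sketched as below. This is an illustration, not the library's actual port definition: the `embed` method and its signature are assumptions, and `model_id` / `embedding_dim` are borrowed from the IncompatibleStore description with assumed types. `runtime_checkable` is added only so the snippet can demonstrate conformance with `isinstance`; it checks that members exist, not that signatures match.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical stand-in for the port; the real signature may differ."""

    model_id: str
    embedding_dim: int

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashEmbedder(Embedder):
    """First-party style: subclass the port explicitly."""

    model_id = "toy-hash"
    embedding_dim = 2

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        # Toy features: text length and a character-sum bucket.
        return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in texts]


class ThirdPartyEmbedder:
    """Third-party style: no inheritance, same shape."""

    model_id = "toy-length"
    embedding_dim = 1

    def embed(self, texts):
        return [[float(len(t))] for t in texts]


def width(embedder: Embedder) -> int:
    """Accepts anything that satisfies the protocol, however it conforms."""
    return len(embedder.embed(["probe"])[0])
```

Both classes are accepted by `width`; only `HashEmbedder` names the port in its bases, which is what lets a static type checker report a missing member at the class itself rather than at the first call site.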

Serializer

Embedder

Maps text to dense vectors.

VectorIndex

Spec for a vector-index backend — stateless and reusable.

SearchableIndex

Immutable fitted artifact produced by VectorIndex.build().

Blocker

Spec for candidate generation — stateless and reusable.

BlockingIndex

Immutable fitted artifact produced by Blocker.build().
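The spec / fitted-artifact split shared by VectorIndex and Blocker might look like the following minimal sketch. Only the existence of `build()` comes from this page; the class names `BruteForceIndex` and `FittedIndex`, the `build` arguments, and the `search` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FittedIndex:
    """Hypothetical fitted artifact: immutable once built."""

    ids: tuple[str, ...]
    vectors: tuple[tuple[float, ...], ...]

    def search(self, query: Sequence[float], top_k: int) -> list[str]:
        # Rank by squared Euclidean distance; ties keep insertion order.
        def dist(vector):
            return sum((a - b) ** 2 for a, b in zip(vector, query))

        order = sorted(range(len(self.ids)), key=lambda i: dist(self.vectors[i]))
        return [self.ids[i] for i in order[:top_k]]


@dataclass(frozen=True)
class BruteForceIndex:
    """Hypothetical spec: carries configuration only, never fitted state."""

    def build(self, ids, vectors) -> FittedIndex:
        # Copy the inputs into tuples so the artifact cannot be mutated later.
        return FittedIndex(tuple(ids), tuple(tuple(v) for v in vectors))


# One spec can be reused to build any number of independent artifacts.
spec = BruteForceIndex()
first = spec.build(["r1", "r2"], [[0.0, 0.0], [1.0, 1.0]])
second = spec.build(["r3"], [[5.0, 5.0]])
```

Because the spec holds no fitted state, building `second` cannot affect `first`, and each artifact can be shared or cached without defensive copying.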

Filter

A second comparison-space reduction, distinct from blocking: prune a candidate set before matching.

Matcher

Clusterer

Groups a LinkageResult's matches into entity clusters — a swappable strategy, not a fixed step.

Trainer

Produces a frozen component from supervised data (v2, [train]).

Models

Record

RecordId

CandidatePair

A pair to be matched.

MatchDecision

A successful decision — is_match is always a real bool.

MatchError

A pair the matcher could not decide (retries exhausted / backend error).
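A sketch of why the decision and error types are kept separate, under assumed field names: only `is_match` is named on this page, while `left`, `right`, `pair`, and `reason` are hypothetical, as is the `split` helper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CandidatePair:
    left: str   # hypothetical field: id of the first record
    right: str  # hypothetical field: id of the second record


@dataclass(frozen=True)
class MatchDecision:
    pair: CandidatePair
    is_match: bool  # always True or False, never None


@dataclass(frozen=True)
class MatchError:
    pair: CandidatePair
    reason: str  # hypothetical field, e.g. "retries exhausted"


Outcome = Union[MatchDecision, MatchError]


def split(outcomes):
    """Separate decided pairs from pairs the matcher could not decide."""
    decided = [o for o in outcomes if isinstance(o, MatchDecision)]
    failed = [o for o in outcomes if isinstance(o, MatchError)]
    return decided, failed
```

With undecided pairs carried by their own type, downstream code never has to treat a missing `is_match` as "no match", and failed pairs remain available for retry or reporting.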

Source

Data-bound config travelling with a frame.

Results

LinkageResult

All candidate pairs with their match decisions.

ClusteringResult

Entity clusters as a record-id -> cluster-id map (sklearn-style labels).
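To illustrate the record-id -> cluster-id shape, here is one possible clustering strategy: connected components over matched pairs, computed with union-find. It is a sketch and is not claimed to be the library's default Clusterer; the function name and its arguments are hypothetical.

```python
def connected_components(record_ids, matched_pairs):
    """Return {record_id: cluster_label}, with integer labels from 0."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two components of every matched pair.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Assign labels in first-seen order; unmatched records get their own.
    root_to_label = {}
    labels = {}
    for rid in record_ids:
        root = find(rid)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels[rid] = root_to_label[root]
    return labels


labels = connected_components(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c")],
)
```

Here `a`, `b`, and `c` share one label and `d` forms a singleton cluster. A stricter strategy (for example, one that refuses to merge through a single weak link) would return the same mapping shape, which is what makes the strategy swappable.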

LabeledPairs

The gold set of true matches — one type everywhere.

TrainingPairs

Supervised material for a Trainer (v2).

Errors

The hard-failure taxonomy. DenseLinkageError is the catchable root for data / runtime failures; API misuse raises a plain ValueError, kept deliberately outside this family.
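The split between the two families suggests a handling pattern like the one sketched below. The classes are local stand-ins so the snippet runs without the library installed; deriving the root from `Exception` is an assumption, and `run_safely` is a hypothetical helper.

```python
class DenseLinkageError(Exception):
    """Stand-in for the library's root error."""


class EmptySource(DenseLinkageError):
    """Stand-in for one member of the family."""


def run_safely(job):
    """Run a linkage job, reporting data/runtime failures instead of raising."""
    try:
        return ("ok", job())
    except DenseLinkageError as exc:
        # Data or runtime failure: report it and let the caller move on.
        return ("failed", type(exc).__name__)
    # ValueError is deliberately not caught: it signals a bug in the calling
    # code, so it should surface rather than be handled as bad data.
```

Catching the root keeps the handler stable as new error subclasses are added, while API misuse still fails loudly.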

DenseLinkageError

Root of every hard failure raised by denselinkage.

UnknownIdColumn

Source.id_column is not a column of Source.frame.

EmptySource

Source.frame has no rows.

DuplicateRecordId

Source.id_column contains duplicate ids (record identity must be unique within a source).
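The three Source checks above could be implemented roughly as follows. This is a sketch under stated assumptions: the frame is modelled as a plain dict of column name to list of values (the real frame type is not described here), the error classes are local stand-ins, the order of the checks is a guess, and `validate_source` is a hypothetical name.

```python
class DenseLinkageError(Exception):
    """Stand-in for the library's root error."""


class UnknownIdColumn(DenseLinkageError): ...
class EmptySource(DenseLinkageError): ...
class DuplicateRecordId(DenseLinkageError): ...


def validate_source(frame: dict[str, list], id_column: str) -> None:
    """Raise the matching error if the source is unusable; else return None."""
    if id_column not in frame:
        raise UnknownIdColumn(f"{id_column!r} is not a column of the frame")

    ids = frame[id_column]
    if len(ids) == 0:
        raise EmptySource("frame has no rows")

    # Collect every id that occurs more than once, for a useful message.
    seen, duplicates = set(), set()
    for record_id in ids:
        (duplicates if record_id in seen else seen).add(record_id)
    if duplicates:
        raise DuplicateRecordId(f"duplicate ids: {sorted(duplicates)}")
```

Validating once, when the source is constructed, means every downstream port can assume that ids are present and unique.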

DimensionMismatch

An embedder's output width does not match the vector index (or a query embedding does not match the indexed vectors).

InvalidTopK

A blocker's top_k is not a positive integer.

IncompatibleStore

A persisted index cannot be reloaded as requested: the re-supplied embedder's model_id / embedding_dim does not match the stored provenance, or the store's format version is unsupported.
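The reload check described above can be sketched as a comparison against stored provenance. Only `model_id` and `embedding_dim` are named on this page; the `Provenance` container, the `format_version` field, `SUPPORTED_FORMAT_VERSIONS`, and `check_reload` are hypothetical, and the error class is a local stand-in.

```python
from dataclasses import dataclass


class IncompatibleStore(Exception):
    """Stand-in; in the library this belongs to the DenseLinkageError family."""


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of what a persisted index was built with."""

    model_id: str
    embedding_dim: int
    format_version: int


SUPPORTED_FORMAT_VERSIONS = frozenset({1})  # assumed value, for illustration


def check_reload(stored: Provenance, model_id: str, embedding_dim: int) -> None:
    """Raise IncompatibleStore unless the store can be reloaded as requested."""
    if stored.format_version not in SUPPORTED_FORMAT_VERSIONS:
        raise IncompatibleStore(
            f"unsupported store format version {stored.format_version}"
        )
    if (stored.model_id, stored.embedding_dim) != (model_id, embedding_dim):
        raise IncompatibleStore(
            f"store was built with {stored.model_id!r} "
            f"(dim {stored.embedding_dim}), not {model_id!r} (dim {embedding_dim})"
        )
```

Failing at reload time is the point of the check: querying an index with vectors from a different model could return plausible-looking but meaningless neighbours.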