denselinkage.linkage.LinkageIndex

class denselinkage.linkage.LinkageIndex(*, blocking_index: BlockingIndex, matcher: Matcher)

Bases: object

Prepared linkage state: a built BlockingIndex fused with a Matcher. Instances are normally constructed by DenseLinker.index rather than built directly. The constructor is keyword-only, for consistency with DenseLinker.

candidates(source: Source, *, top_k: int | None = None, similarity_threshold: float | None = None) → list[CandidatePair]

Blocking-only counterpart to query(): reads source and returns the blocker’s CandidatePair objects (record_a is the indexed record, record_b comes from source) without running the matcher. The result is the natural input to blocking_metrics / pair_completeness_at_k.

top_k and similarity_threshold override the built blocker’s spec for this call only, reusing this prepared index instead of rebuilding it (for example, pass a large top_k to sweep pair-completeness over several k). Raises InvalidTopK if an override top_k <= 0; otherwise the failure modes are the same RecordReader ones as for query().
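The per-call override described above can be illustrated with a small stand-in. This is a minimal sketch, not denselinkage's implementation: ToyIndex, ToyPair, and the cosine helper are hypothetical names, the index is a plain dict of vectors, and the local InvalidTopK class only mimics the documented error. It shows that an override widens or filters one call while the spec the index was built with stays unchanged.

```python
import math
from dataclasses import dataclass


class InvalidTopK(ValueError):
    """Local stand-in for the library's InvalidTopK error (assumption)."""


@dataclass(frozen=True)
class ToyPair:
    record_a: str      # id of the indexed record
    record_b: str      # id of the record read from the query source
    similarity: float


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


class ToyIndex:
    """Hypothetical prepared index: vectors plus a built-in blocking spec."""

    def __init__(self, vectors, *, top_k=1, similarity_threshold=None):
        self.vectors = vectors
        self.top_k = top_k
        self.similarity_threshold = similarity_threshold

    def candidates(self, queries, *, top_k=None, similarity_threshold=None):
        # None means "use the spec the index was built with".
        k = self.top_k if top_k is None else top_k
        threshold = (
            self.similarity_threshold
            if similarity_threshold is None
            else similarity_threshold
        )
        if k <= 0:
            raise InvalidTopK(f"top_k must be positive, got {k}")
        pairs = []
        for query_id, query_vec in queries.items():
            scored = sorted(
                ((cosine(vec, query_vec), rid) for rid, vec in self.vectors.items()),
                reverse=True,
            )
            for similarity, rid in scored[:k]:
                if threshold is None or similarity >= threshold:
                    pairs.append(ToyPair(rid, query_id, similarity))
        return pairs


index = ToyIndex({"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]}, top_k=1)
queries = {"b1": [1.0, 0.1]}

default_pairs = index.candidates(queries)                    # built spec: top_k=1
wide_pairs = index.candidates(queries, top_k=3)              # override for this call
filtered_pairs = index.candidates(queries, top_k=3, similarity_threshold=0.5)
```

Because the overrides are resolved into local variables, the same prepared index can serve a sweep over several k values without being rebuilt.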

query(source: Source) → LinkageResult

Query the prepared index with source.

Raises (from the RecordReader seam): UnknownIdColumn if source.id_column is absent, EmptySource if the frame is empty, DuplicateRecordId on duplicate ids, DimensionMismatch if the query embedding width differs from the indexed vectors. All subclass denselinkage.core.errors.DenseLinkageError.
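Since all four errors share one base class, a caller can handle them with a single except clause. The sketch below is an assumption-laden illustration: the exception classes are local stand-ins that reuse the documented names, check_source and describe_failure are hypothetical helpers, the "frame" is a plain dict of columns, and vectors are supplied directly rather than produced by an embedder. The order of the checks is also an assumption, not something the library documents.

```python
class DenseLinkageError(Exception):
    """Local stand-in for denselinkage.core.errors.DenseLinkageError."""


class UnknownIdColumn(DenseLinkageError):
    pass


class EmptySource(DenseLinkageError):
    pass


class DuplicateRecordId(DenseLinkageError):
    pass


class DimensionMismatch(DenseLinkageError):
    pass


def check_source(frame, *, id_column, vector_column, indexed_dim):
    """Hypothetical reader checks mirroring the four documented failure modes."""
    if id_column not in frame:
        raise UnknownIdColumn(f"no column named {id_column!r}")
    ids = frame[id_column]
    if not ids:
        raise EmptySource("source has no rows")
    if len(set(ids)) != len(ids):
        raise DuplicateRecordId("record ids must be unique")
    for vec in frame[vector_column]:
        if len(vec) != indexed_dim:
            raise DimensionMismatch(
                f"query width {len(vec)} differs from indexed width {indexed_dim}"
            )
    return ids


def describe_failure(frame):
    """Catch every documented failure through the shared base class."""
    try:
        check_source(frame, id_column="id", vector_column="vec", indexed_dim=2)
    except DenseLinkageError as exc:
        return type(exc).__name__
    return "ok"
```

Catching the base class keeps calling code stable if further reader errors are added later, while the concrete type remains available for logging or finer-grained handling.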

save(path: str | Path) → None

Persist this prepared index to path (a directory) so the reference set can be reused later without re-embedding. Supported for the dependency-free reference stack (DenseBlocker over NumpyFlatIndex); other backends raise NotImplementedError. The matcher is not persisted — supply one (and the matching embedder) to load().

classmethod load(path: str | Path, *, embedder: Embedder, matcher: Matcher) → LinkageIndex

Reload an index saved by save(), re-supplying the live embedder and matcher. The embedder must match the stored provenance (model_id / embedding_dim) or denselinkage.core.errors.IncompatibleStore is raised — a persisted index cannot be queried with a different embedding model.
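The provenance check can be pictured with a toy store. This is a sketch under stated assumptions, not the library's on-disk format: save_store, load_store, ToyEmbedder, and the file names provenance.json and vectors.json are all hypothetical, and IncompatibleStore is a local stand-in for the documented error. It shows the idea of recording model_id and embedding_dim at save time and refusing to load with an embedder that does not match.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


class IncompatibleStore(Exception):
    """Local stand-in for denselinkage.core.errors.IncompatibleStore."""


@dataclass(frozen=True)
class ToyEmbedder:
    model_id: str
    embedding_dim: int


def save_store(path, vectors, embedder):
    """Write vectors plus the embedder's provenance into a directory."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    provenance = {
        "model_id": embedder.model_id,
        "embedding_dim": embedder.embedding_dim,
    }
    (directory / "provenance.json").write_text(json.dumps(provenance))
    (directory / "vectors.json").write_text(json.dumps(vectors))


def load_store(path, *, embedder):
    """Reload vectors, rejecting an embedder that differs from the stored one."""
    directory = Path(path)
    stored = json.loads((directory / "provenance.json").read_text())
    live = {"model_id": embedder.model_id, "embedding_dim": embedder.embedding_dim}
    if stored != live:
        raise IncompatibleStore(f"stored {stored}, live embedder {live}")
    return json.loads((directory / "vectors.json").read_text())


def roundtrip_demo():
    """Save with one embedder, reload with the same one, then with another."""
    original = ToyEmbedder("toy-model-v1", 2)
    other = ToyEmbedder("toy-model-v2", 2)
    vectors = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
    with tempfile.TemporaryDirectory() as tmp:
        save_store(tmp, vectors, original)
        reloaded = load_store(tmp, embedder=original)
        try:
            load_store(tmp, embedder=other)
            rejected = False
        except IncompatibleStore:
            rejected = True
    return reloaded == vectors, rejected
```

Rejecting the mismatch at load time, rather than at query time, surfaces the problem before any query is embedded into a vector space the stored index was not built for.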