denselinkage.linkage.DenseLinker
- class denselinkage.linkage.DenseLinker(*, blocker: Blocker | None = None, matcher: Matcher)
Bases: object
Pure config. link(a, b) == index(a).query(b).
The constructor is keyword-only (kw_only) so that blocker can be optional while matcher stays required, and to forbid positional ambiguity: callers always write DenseLinker(blocker=..., matcher=...).
blocker is optional: developers who already have candidate pairs from rule-based / external blocking have no blocker. link / dedupe / index require one and raise ValueError if it is None; match_pairs does not (it is inference with no blocking step and no learning on the linker, so the immutable-config contract holds). with_defaults always yields a linker with a blocker.
- classmethod with_defaults(*, blocker: Blocker | None = None, matcher: Matcher | None = None) -> DenseLinker
Low-floor entry point: wire the dependency-free reference stack (HashedNGramEmbedder + NumpyFlatIndex behind DenseBlocker, plus ThresholdMatcher). Pass blocker= / matcher= to override either half. The default stack is lexical (character n-gram hashing): it recovers abbreviations, punctuation differences and typos, not semantic renames. Imports are local so that import denselinkage stays light.
- index(source: Source) -> LinkageIndex
Build the prepared linkage state by delegating indexing to self.blocker.build (which returns a fresh BlockingIndex; this frozen config is never mutated) and composing it with self.matcher.
Raises ValueError if blocker is None. From the RecordReader seam: UnknownIdColumn, EmptySource, DuplicateRecordId; InvalidTopK if the blocker's top_k <= 0; DimensionMismatch if the embedder width differs from the index. These five are all denselinkage.core.errors subclasses.
- link(left: Source, right: Source) -> LinkageResult
Two-table linkage.
Raises ValueError if blocker is None; otherwise the same denselinkage.core.errors taxonomy as index (UnknownIdColumn, EmptySource, DuplicateRecordId, InvalidTopK, DimensionMismatch), evaluated for each of left / right.
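The identity link(a, b) == index(a).query(b) can be sketched with toy stand-ins. Everything below is an assumption made for illustration: ToyLinker, ToyIndex, the trigrams helper and the Jaccard threshold are hypothetical, and plain character trigrams stand in for the library's hashed n-gram embedding and matcher.

```python
from dataclasses import dataclass


def trigrams(text: str) -> frozenset[str]:
    """Character trigrams of a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return frozenset(padded[i:i + 3] for i in range(len(padded) - 2))


@dataclass(frozen=True)
class ToyIndex:
    """Prepared state: hypothetical stand-in for LinkageIndex."""

    grams: dict[str, frozenset[str]]
    threshold: float

    def query(self, right: dict[str, str]) -> list[tuple[str, str]]:
        pairs = []
        for right_id, text in right.items():
            probe = trigrams(text)
            for left_id, grams in self.grams.items():
                jaccard = len(probe & grams) / len(probe | grams)
                if jaccard >= self.threshold:
                    pairs.append((left_id, right_id))
        return pairs


@dataclass(frozen=True)
class ToyLinker:
    """Frozen config: index() returns new state instead of mutating self."""

    threshold: float = 0.5

    def index(self, left: dict[str, str]) -> ToyIndex:
        prepared = {rec_id: trigrams(text) for rec_id, text in left.items()}
        return ToyIndex(prepared, self.threshold)

    def link(self, left: dict[str, str], right: dict[str, str]) -> list[tuple[str, str]]:
        # Two-table linkage is just "index the left side, query with the right".
        return self.index(left).query(right)


left = {"a1": "Acme Corp.", "a2": "Globex"}
right = {"b1": "ACME Corp", "b2": "Initech"}
linker = ToyLinker()
linked = linker.link(left, right)
```

Separating index from query lets a caller build the prepared state once and query it with several right-hand tables, while link remains a one-call convenience.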
- block(left: Source, right: Source, *, top_k: int | None = None, similarity_threshold: float | None = None) -> list[CandidatePair]
Blocking-only two-table affordance: the blocker's CandidatePair objects for left / right without matching. block(a, b) mirrors link(a, b) == index(a).query(b) (here index(a).candidates(b)); feed the result to blocking_metrics / pair_completeness_at_k. top_k / similarity_threshold override the blocker's spec for this call (e.g. a large top_k to sweep pair-completeness).
Raises ValueError if blocker is None (via index()); otherwise the same denselinkage.core.errors taxonomy as link().
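A pair-completeness sweep over top_k can be sketched as follows. This is an illustration under stated assumptions, not the library's pair_completeness_at_k: the helper names, the ranked-neighbour dictionary and the ground-truth pairs are all hypothetical, and pair completeness is taken in its usual sense, the share of true matching pairs that survive blocking.

```python
def pair_completeness(candidates, truth):
    """Share of true pairs that appear among the candidate pairs."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth must contain at least one pair")
    return len(truth & set(candidates)) / len(truth)


def top_k_pairs(ranked, k):
    """Keep the k best-ranked right ids for every left id."""
    return [(left, right) for left, rights in ranked.items() for right in rights[:k]]


# Hypothetical ranked neighbours per left record, best first.
ranked = {"a1": ["b1", "b3", "b2"], "a2": ["b3", "b2", "b1"]}
truth = [("a1", "b1"), ("a2", "b2")]

# Larger k keeps more candidates, so completeness can only stay flat or rise.
sweep = {k: pair_completeness(top_k_pairs(ranked, k), truth) for k in (1, 2, 3)}
```

In this toy data the second true pair only appears once two neighbours per record are kept, which is the kind of effect a large top_k sweep is meant to expose.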
- dedupe(source: Source) -> LinkageResult
Single-table dedupe (self-pairs suppressed).
Raises ValueError if blocker is None; otherwise the same denselinkage.core.errors taxonomy as index.
- match_pairs(candidates: Sequence[CandidatePair]) -> LinkageResult
Matcher-only path: score externally supplied candidate pairs (e.g. rule-based / pre-blocked) with self.matcher, skipping blocking. Does not require blocker. Result flows through the same LinkageResult / metrics path as link.
To build the input from a DataFrame of id-pairs, use denselinkage.candidate_pairs_from_frame (an id-pair frame + the two sources -> list[CandidatePair]).
Raises no Source-validation errors here (it takes pre-built CandidatePair objects, whose similarity_score may be None); backend matcher failures surface per-pair as MatchError, never as exceptions.
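The "failures surface per pair, never as exceptions" behaviour can be illustrated with a small stand-in. This is a sketch under assumptions: PairSketch, ErrorSketch, score_pairs and flaky_score are hypothetical names, and the simulated failure is invented for the example; the real CandidatePair, MatchError and LinkageResult types may be shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PairSketch:
    """Hypothetical stand-in for a pre-built CandidatePair."""

    left_id: str
    right_id: str


@dataclass(frozen=True)
class ErrorSketch:
    """Hypothetical stand-in for a per-pair MatchError record."""

    left_id: str
    right_id: str
    reason: str


def score_pairs(pairs, score):
    """Score every pair; record a failure as data instead of raising."""
    decisions, errors = [], []
    for pair in pairs:
        try:
            decisions.append((pair.left_id, pair.right_id, score(pair)))
        except Exception as exc:  # one bad pair must not abort the batch
            errors.append(ErrorSketch(pair.left_id, pair.right_id, str(exc)))
    return decisions, errors


def flaky_score(pair: PairSketch) -> float:
    # Simulated backend: fails on one specific pair, purely for illustration.
    if pair.right_id == "b2":
        raise RuntimeError("simulated backend failure")
    return 0.9


pairs = [PairSketch("a1", "b1"), PairSketch("a2", "b2")]
decisions, errors = score_pairs(pairs, flaky_score)
```

Returning failures alongside decisions keeps a long batch running and lets the caller decide afterwards whether to retry or drop the affected pairs.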