Components

The pluggable adapters, grouped by pipeline stage. Each family has a port in Contract (core); the classes below are the reference adapters that declare it. Heavy adapters import their backend lazily and require an extra (noted per class) — installing the core package never pulls them in.
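The lazy-import behaviour can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: require_backend and LazyBackend are hypothetical names, not part of this package, and the real adapters may defer their imports differently. The idea is that the backend is imported on first use, and a missing install becomes an error naming the extra to install.

```python
from __future__ import annotations

import importlib
from types import ModuleType
from typing import Optional


def require_backend(module_name: str, extra: str) -> ModuleType:
    """Hypothetical helper: import an optional backend, with an install hint on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is not installed; install the [{extra}] extra to use this adapter"
        ) from exc


class LazyBackend:
    """Hypothetical adapter base: constructing it never imports the backend."""

    def __init__(self, module_name: str, extra: str) -> None:
        self._module_name = module_name
        self._extra = extra
        self._module: Optional[ModuleType] = None

    @property
    def module(self) -> ModuleType:
        # The import runs on first access and is cached for later calls.
        if self._module is None:
            self._module = require_backend(self._module_name, self._extra)
        return self._module
```

With this shape, LazyBackend("faiss", "faiss") costs nothing until .module is touched, which is what keeps the core install free of heavy dependencies.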

Serializing

TemplateSerializer

Renders each record through a format template such as "Name: {name}"; column_mapping maps this source's columns onto the template variables.
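A minimal sketch of template rendering, assuming str.format-style placeholders and a column_mapping keyed by source column with template variable names as values (the direction is inferred from the description above). render_template is a hypothetical helper, not the TemplateSerializer API.

```python
from __future__ import annotations

from typing import Mapping


def render_template(
    record: Mapping[str, object],
    template: str,
    column_mapping: Mapping[str, str],
) -> str:
    """Render one record; column_mapping is {source column: template variable}."""
    # Rename the mapped source columns to the variable names the template expects.
    values = {variable: record[column] for column, variable in column_mapping.items()}
    return template.format(**values)


row = {"full_name": "Ada Lovelace", "city": "London"}
print(render_template(row, "Name: {name}", {"full_name": "name"}))  # Name: Ada Lovelace
```

Columns absent from the mapping (city above) are simply ignored, so one template can serve sources whose column names differ.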

FieldwiseSerializer

WholeRowSerializer

The package default, used when Source(serializer=None).

default_serializer

Embedding

HashedNGramEmbedder

Dependency-free reference embedder.
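One common way to build a dependency-free embedder is the hashing trick over character n-grams, sketched below. The n-gram size, dimension, hash function, and signed buckets are assumptions for illustration; HashedNGramEmbedder's actual parameters are not documented here. The output is L2-normalized, so the inner product of two embeddings equals their cosine similarity.

```python
from __future__ import annotations

import hashlib
import math


def hashed_ngram_embed(text: str, dim: int = 256, n: int = 3) -> list[float]:
    """Hypothetical hashing-trick embedding of character n-grams, L2-normalized."""
    padded = f" {text.lower()} "  # pad so word boundaries form their own n-grams
    vec = [0.0] * dim
    for i in range(max(len(padded) - n + 1, 1)):
        gram = padded[i : i + n]
        # blake2b is stable across runs, unlike the salted built-in hash() for str.
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        sign = -1.0 if (h >> 63) & 1 else 1.0  # signed buckets let collisions cancel on average
        vec[h % dim] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = hashed_ngram_embed("Ada Lovelace")
print(dot(a, hashed_ngram_embed("Ada Lovelac")) > dot(a, hashed_ngram_embed("Grace Hopper")))
```

Strings that share most n-grams land on mostly the same buckets, which is what makes near-duplicates score high without any trained model.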

SentenceTransformerEmbedder

Semantic embedder over a sentence-transformers checkpoint (extra: [sentence-transformers]).

Indexing

NumpyFlatIndex

Dependency-free reference index spec (brute-force / flat search).

NumpySearchableIndex

Immutable artifact built by NumpyFlatIndex — exhaustive (flat) nearest-neighbour search by inner product (which equals cosine for the L2-normalized vectors the reference embedder produces).
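Exhaustive (flat) search scores every stored vector against the query and keeps the top k. The sketch below uses plain Python in place of a NumPy matrix-vector product; flat_search and cosine are hypothetical helpers, and breaking ties toward the lower row index is an assumption. It also checks the identity the description relies on: for unit-length vectors, cosine(a, b) = (a · b) / (|a| |b|) reduces to a · b.

```python
from __future__ import annotations

import heapq
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def flat_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> list[tuple[int, float]]:
    """Exhaustive top-k by inner product; ties go to the lower row index."""
    scored = [(i, dot(row, query)) for i, row in enumerate(vectors)]  # score every row
    return heapq.nlargest(k, scored, key=lambda pair: (pair[1], -pair[0]))


# Unit-length rows, as an L2-normalizing embedder would produce.
vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(flat_search(vectors, [1.0, 0.0], k=2))  # [(0, 1.0), (1, 0.6)]
```

Flat search is exact by construction; its cost grows linearly with the number of stored vectors, which is the usual trade-off against approximate indexes.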

FaissFlatIndex

FAISS-backed reference index spec — exact (flat) inner-product search (extra: [faiss]).

FaissSearchableIndex

Immutable artifact built by FaissFlatIndex — exact (flat) inner-product nearest-neighbour search over a faiss.IndexFlatIP.

Blocking

DenseBlocker

Dense-blocking spec; builds a DenseBlockingIndex for a reference set.

DenseBlockingIndex

Immutable artifact built by DenseBlocker for one reference set.
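The spec/artifact split can be sketched as a frozen configuration object that builds an immutable per-reference-set index, which then emits each query's top-k neighbours as candidate pairs. ToyDenseBlocker, ToyDenseBlockingIndex, build, candidates, and the k parameter are all hypothetical, and the sketch assumes blocking means "top-k nearest references by inner product", which the description above does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass(frozen=True)
class ToyDenseBlockingIndex:
    """Hypothetical artifact: one reference set, frozen at build time."""

    references: tuple[tuple[str, tuple[float, ...]], ...]
    k: int

    def candidates(self, query_id: str, query: Sequence[float]) -> list[tuple[str, str, float]]:
        # Score every reference, then keep the k best (ties broken by reference id).
        scored = sorted(
            ((ref_id, dot(vec, query)) for ref_id, vec in self.references),
            key=lambda pair: (-pair[1], pair[0]),
        )
        return [(query_id, ref_id, score) for ref_id, score in scored[: self.k]]


@dataclass(frozen=True)
class ToyDenseBlocker:
    """Hypothetical spec: holds configuration only and builds one index per reference set."""

    k: int = 10

    def build(self, references: Mapping[str, Sequence[float]]) -> ToyDenseBlockingIndex:
        # Copy into tuples so later changes to the input cannot leak into the artifact.
        frozen = tuple((ref_id, tuple(vec)) for ref_id, vec in references.items())
        return ToyDenseBlockingIndex(references=frozen, k=self.k)


index = ToyDenseBlocker(k=2).build({"r1": [1.0, 0.0], "r2": [0.6, 0.8], "r3": [0.0, 1.0]})
print(index.candidates("q1", [1.0, 0.0]))  # [('q1', 'r1', 1.0), ('q1', 'r2', 0.6)]
```

Keeping the spec free of data means one configured blocker can be reused across reference sets, while each built index stays safe to share.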

Filtering

SimilarityThresholdFilter

Keeps pairs whose similarity_score >= threshold.
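The comparison is inclusive, so a pair scoring exactly at the threshold survives. A minimal sketch, with ScoredPair and threshold_filter as hypothetical stand-ins for the package's pair type and filter:

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class ScoredPair(NamedTuple):
    left_id: str
    right_id: str
    similarity_score: float


def threshold_filter(pairs: Iterable[ScoredPair], threshold: float) -> list[ScoredPair]:
    # Inclusive comparison: a score equal to the threshold is kept.
    return [pair for pair in pairs if pair.similarity_score >= threshold]


pairs = [ScoredPair("a", "x", 0.91), ScoredPair("a", "y", 0.80), ScoredPair("b", "x", 0.79)]
print([(p.left_id, p.right_id) for p in threshold_filter(pairs, 0.80)])  # [('a', 'x'), ('a', 'y')]
```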

Matching

ThresholdMatcher

Dependency-free reference matcher; gates on the carried similarity.

LangChainMatcher

LLM matcher (extra: [langchain]).

RetryPolicy

Clustering

ConnectedComponentsClusterer

Reference Clusterer adapter; delegates to denselinkage.clustering.connected_components().

connected_components

Connected-components clustering: transitively close the matched pairs in result and label each record with its component id.
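Transitive closure over matched pairs is the classic union-find (disjoint-set) computation. The sketch below is an illustration, not denselinkage.clustering.connected_components() itself; label_components is a hypothetical name, and it assumes records are plain ids, unmatched records become singleton components, and component ids are assigned densely in order of first appearance.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Sequence


def label_components(
    record_ids: Sequence[Hashable], matched_pairs: Iterable[tuple[Hashable, Hashable]]
) -> dict[Hashable, int]:
    """Label every record with a component id via union-find."""
    parent = {r: r for r in record_ids}

    def find(x: Hashable) -> Hashable:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    # Dense ids in order of first appearance; unmatched records get their own id.
    root_label: dict[Hashable, int] = {}
    return {r: root_label.setdefault(find(r), len(root_label)) for r in record_ids}


print(label_components(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]))
# {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1}
```

Note that a and c share a component although they were never matched directly; that is the transitive closure at work.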

Mining

mine_hard_negatives

The hardest negatives among candidates: scored pairs not in gold, ordered by descending similarity (ties broken by id for determinism).
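A minimal sketch of the ordering described above, assuming scored pairs are (left_id, right_id, score) tuples and gold is a set of (left_id, right_id) tuples. hard_negatives and its limit parameter are hypothetical; this is not the signature of mine_hard_negatives.

```python
from __future__ import annotations

from typing import AbstractSet, Iterable, Optional


def hard_negatives(
    scored_pairs: Iterable[tuple[str, str, float]],
    gold: AbstractSet[tuple[str, str]],
    limit: Optional[int] = None,
) -> list[tuple[str, str, float]]:
    """Non-gold pairs, hardest (highest similarity) first; ties broken by ids."""
    negatives = [p for p in scored_pairs if (p[0], p[1]) not in gold]  # drop true matches
    negatives.sort(key=lambda p: (-p[2], p[0], p[1]))  # descending score, then ids
    return negatives if limit is None else negatives[:limit]


scored = [("a", "x", 0.90), ("a", "y", 0.95), ("b", "x", 0.90), ("b", "y", 0.20)]
print(hard_negatives(scored, gold={("a", "y")}))
# [('a', 'x', 0.9), ('b', 'x', 0.9), ('b', 'y', 0.2)]
```

The id tie-break is what makes repeated runs return the same list when several negatives share a score.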