Metrics

Pure functions over already-computed pipeline outputs, plus the typed report dataclasses they return. Each evaluator takes ground truth as a keyword-only gold argument. Per Architecture (ADR-0002), these live in the metrics layer, not in core: nothing in the contract depends on them.

Functions

linkage_metrics

Precision/recall/F1 over all candidate pairs against gold.
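As an illustration of the counting involved, here is a minimal sketch rather than the library's implementation. The function name `pair_prf`, the `decisions` mapping (candidate pair to predicted match flag), and the choice to count false negatives only among candidate pairs are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def _key(pair: Pair) -> Pair:
    """Order-insensitive pair key, so ("a", "b") equals ("b", "a")."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def pair_prf(decisions: Dict[Pair, bool], *, gold: Iterable[Pair]) -> Dict[str, float]:
    """Hypothetical helper: precision/recall/F1 over candidate pairs.

    Assumption: gold pairs that never appear as candidates are not counted
    here; that loss belongs to the blocker, not the matcher.
    """
    gold_keys = {_key(p) for p in gold}
    tp = fp = fn = 0
    for pair, is_match in decisions.items():
        in_gold = _key(pair) in gold_keys
        if is_match and in_gold:
            tp += 1
        elif is_match:
            fp += 1
        elif in_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Note that gold is passed by keyword, mirroring the convention described above.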

blocking_metrics

Pair-completeness@k for each k in ks.
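One common reading of pair-completeness@k is the fraction of gold pairs that the blocker surfaces when each record keeps only its top k candidates. The sketch below follows that reading; the `ranked` input shape (record id to ranked candidate ids) and both function names are assumptions for illustration, not the library's signatures.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def _key(a: str, b: str) -> Pair:
    """Order-insensitive pair key."""
    return (a, b) if a <= b else (b, a)


def pc_at_k(ranked: Dict[str, List[str]], k: int, *, gold: Iterable[Pair]) -> float:
    """Hypothetical helper: share of gold pairs found in the top-k candidates."""
    gold_keys = {_key(a, b) for a, b in gold}
    if not gold_keys:
        return 0.0
    retrieved = {
        _key(record, candidate)
        for record, candidates in ranked.items()
        for candidate in candidates[:k]
    }
    return len(retrieved & gold_keys) / len(gold_keys)


def pc_for_ks(ranked: Dict[str, List[str]], ks: Sequence[int], *,
              gold: Iterable[Pair]) -> Dict[int, float]:
    """Evaluate pair-completeness once per k in ks."""
    gold_list = list(gold)  # materialize so the iterable can be reused
    return {k: pc_at_k(ranked, k, gold=gold_list) for k in ks}
```

Materializing gold once avoids exhausting a generator on the first k.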

pair_completeness_at_k

Pair-completeness at a single k. Takes gold as a keyword argument, consistent with the other metrics.

clustering_metrics

B³ quality of a clustering, measured against gold.
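For readers unfamiliar with B³, a minimal sketch of the standard per-item definition follows. It assumes both the clustering and gold are given as mappings from item id to cluster label, and it scores only items present in both; the function name `b_cubed` and these input shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def b_cubed(clustering: Dict[str, Hashable], *,
            gold: Dict[str, Hashable]) -> Tuple[float, float, float]:
    """Hypothetical helper: B-cubed precision, recall, and F1.

    For each item, precision is the share of its predicted cluster that
    also shares its gold cluster; recall is the share of its gold cluster
    that also shares its predicted cluster. Both are averaged over items.
    """
    items = clustering.keys() & gold.keys()
    if not items:
        return 0.0, 0.0, 0.0

    def members(assignment: Dict[str, Hashable]) -> Dict[str, Set[str]]:
        groups = defaultdict(set)
        for item in items:
            groups[assignment[item]].add(item)
        return {item: groups[assignment[item]] for item in items}

    pred, true = members(clustering), members(gold)
    precision = sum(len(pred[i] & true[i]) / len(pred[i]) for i in items) / len(items)
    recall = sum(len(pred[i] & true[i]) / len(true[i]) for i in items) / len(items)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```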

tune_threshold

Sweep a similarity threshold over scored candidates and return the full precision/recall/F1 curve as a ThresholdSweep.
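The sweep can be pictured as re-deciding every scored candidate at each threshold and recomputing the metrics. The sketch below is one way to do that under stated assumptions: `scored` maps candidate pairs to similarity scores, a pair counts as a match when its score is at or above the threshold, and the curve is returned as plain tuples rather than a ThresholdSweep. The name `sweep_thresholds` is hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]
Point = Tuple[float, float, float, float]  # (threshold, precision, recall, f1)


def _key(pair: Pair) -> Pair:
    """Order-insensitive pair key."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def sweep_thresholds(scored: Dict[Pair, float], thresholds: Sequence[float], *,
                     gold: Iterable[Pair]) -> List[Point]:
    """Hypothetical helper: one precision/recall/F1 point per threshold."""
    gold_keys = {_key(p) for p in gold}
    # Gold pairs that were actually scored; unscored gold pairs are ignored here.
    scored_gold = sum(1 for pair in scored if _key(pair) in gold_keys)
    curve: List[Point] = []
    for t in thresholds:
        predicted = [pair for pair, score in scored.items() if score >= t]
        tp = sum(1 for pair in predicted if _key(pair) in gold_keys)
        fp = len(predicted) - tp
        fn = scored_gold - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        curve.append((t, precision, recall, f1))
    return curve
```

Returning the whole curve, rather than only the best point, leaves the choice of operating point (best F1, a precision floor, and so on) to the caller. For example, `max(curve, key=lambda point: point[3])` picks the best-F1 threshold.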

adjusted_metrics

Decompose end-to-end recall into matcher × blocker components.

Reports

LinkageMetrics

Contract: pairs that errored (a MatchError in LinkageResult.errors) are excluded from tp/fp/fn and reported as n_errors.
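To make the exclusion rule concrete, here is a small sketch of the counting it implies. The real shapes of MatchError and LinkageResult.errors are not shown in this section, so errored pairs are represented as a plain set of pairs; that representation, the `CountsSketch` dataclass, and the `count_with_errors` function are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


@dataclass(frozen=True)
class CountsSketch:
    """Hypothetical stand-in for the counting fields of a linkage report."""
    tp: int
    fp: int
    fn: int
    n_errors: int


def _key(pair: Pair) -> Pair:
    """Order-insensitive pair key."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def count_with_errors(decisions: Dict[Pair, bool], errored: Set[Pair], *,
                      gold: Iterable[Pair]) -> CountsSketch:
    """Count tp/fp/fn, skipping errored pairs and reporting them separately."""
    gold_keys = {_key(p) for p in gold}
    error_keys = {_key(p) for p in errored}
    tp = fp = fn = 0
    for pair, is_match in decisions.items():
        key = _key(pair)
        if key in error_keys:
            continue  # errored pairs affect neither precision nor recall
        if is_match and key in gold_keys:
            tp += 1
        elif is_match:
            fp += 1
        elif key in gold_keys:
            fn += 1
    return CountsSketch(tp=tp, fp=fp, fn=fn, n_errors=len(error_keys))
```

Reporting n_errors alongside the counts keeps failures visible instead of letting them silently count as misses or as non-matches.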

BlockingMetrics

Pair-completeness@k.

ClusteringMetrics

B³ (Bagga-Baldwin) clustering quality.

ThresholdSweep

Full precision/recall/F1 curve over a threshold grid.

AdjustedMetrics

Honest end-to-end figures: matcher metrics adjusted by blocker pair-completeness@k (recall_adjusted = matcher.recall * pc@k).
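The adjustment exists because matcher recall is measured only over pairs the blocker surfaced, so it overstates recall whenever the blocker drops gold pairs. A tiny worked sketch of the formula follows; the function name `adjusted_recall` and the range check are assumptions, and only the recall adjustment stated above is shown.

```python
def adjusted_recall(matcher_recall: float, pc_at_k: float) -> float:
    """Hypothetical helper: recall_adjusted = matcher.recall * pc@k."""
    for name, value in (("matcher_recall", matcher_recall), ("pc_at_k", pc_at_k)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return matcher_recall * pc_at_k


# If the blocker surfaces 90% of gold pairs and the matcher recovers 80%
# of those, roughly 72% of gold pairs are recovered end to end.
example = adjusted_recall(0.8, 0.9)
```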