denselinkage.metrics.blocking_metrics
- denselinkage.metrics.blocking_metrics(candidates: Sequence[CandidatePair], *, gold: LabeledPairs, ks: Sequence[int], directed: bool = True) -> BlockingMetrics
Pair-completeness@k (PC@k) for each k in ks.

Candidates are grouped by query record (record_b) and ranked by similarity within each group. PC@k is the fraction of gold pairs recalled when each query keeps only its top-k candidates.

Candidates are expected to be blocker-oriented (record_a is the indexed/reference record, record_b is the query), as produced by BlockingIndex.query. To sweep ks meaningfully, pass the blocker’s full ranked retrieval (a large top_k), not an already-truncated set: if the input was already cut at some top_k, PC@k is the same for every k at or above that cutoff.

Pair identity (D1): same rule as linkage_metrics. directed=True (link) compares ordered pairs; directed=False (dedupe) canonicalizes each pair to an unordered key.
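The grouping, ranking, and pair-identity rules described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: Pair is a hypothetical stand-in for CandidatePair (fields record_a, record_b, score, with higher score meaning more similar), gold is assumed to be an iterable of (a, b) tuples rather than a LabeledPairs object, and the function returns a plain dict of k to PC@k instead of a BlockingMetrics instance. Ties in score keep their input order because Python's sort is stable.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Hashable, Iterable, Sequence


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for CandidatePair (assumed fields)."""
    record_a: Hashable  # indexed/reference record
    record_b: Hashable  # query record
    score: float        # similarity; higher ranks first


def pair_key(a: Hashable, b: Hashable, directed: bool):
    # directed=True (link): ordered key, so (a, b) != (b, a).
    # directed=False (dedupe): unordered key, so (a, b) == (b, a).
    return (a, b) if directed else frozenset((a, b))


def pair_completeness_at_k(
    candidates: Iterable[Pair],
    gold: Iterable[tuple],
    ks: Sequence[int],
    directed: bool = True,
) -> dict[int, float]:
    gold_keys = {pair_key(a, b, directed) for a, b in gold}
    if not gold_keys:
        raise ValueError("gold must contain at least one pair")

    # Group candidates by query record (record_b).
    by_query: dict[Hashable, list[Pair]] = defaultdict(list)
    for c in candidates:
        by_query[c.record_b].append(c)

    # Rank each query's candidates by descending similarity.
    for group in by_query.values():
        group.sort(key=lambda c: c.score, reverse=True)

    results: dict[int, float] = {}
    for k in ks:
        # Keep the top-k candidates per query, then measure gold recall.
        kept = {
            pair_key(c.record_a, c.record_b, directed)
            for group in by_query.values()
            for c in group[:k]
        }
        results[k] = len(kept & gold_keys) / len(gold_keys)
    return results


cands = [
    Pair("r1", "q1", 0.90),
    Pair("r2", "q1", 0.80),
    Pair("r3", "q2", 0.70),
    Pair("r4", "q2", 0.95),
]
gold = [("r2", "q1"), ("r3", "q2")]
print(pair_completeness_at_k(cands, gold, ks=[1, 2]))  # {1: 0.0, 2: 1.0}
```

At k=1 each query keeps only its best-scoring candidate (r1 for q1, r4 for q2), neither of which is a gold pair, so PC@1 is 0.0; at k=2 both gold pairs are retained. Passing gold pairs in reversed orientation, such as ("q1", "r2"), only matches under directed=False, which mirrors the D1 rule.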