denselinkage.core.results.LabeledPairs
- class denselinkage.core.results.LabeledPairs(pairs: frozenset[tuple[str, str]])
Bases: object
The gold set of true matches: one type everywhere.
Pairs are stored exactly as given (ordered); no symmetrization happens at construction. Each tuple is (left_id, right_id). Evaluation comparison depends on the verb:
- link (two sources): the order is meaningful. A gold (a, b) matches a result pair whose left id is a and right id is b.
- dedupe (one source): left/right is arbitrary, so metrics canonicalize both gold and result pairs to an unordered key (frozenset({a, b})) before comparing. This removes the silent recall/precision fork.
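The difference between the two comparisons can be illustrated with a minimal standalone sketch. It does not call denselinkage; the helper names link_key, dedupe_key, and true_positives are hypothetical and exist only for this illustration. Only the keying rule (an ordered tuple for link, frozenset({a, b}) for dedupe) comes from the description above.

```python
# Hypothetical helpers, not part of denselinkage: they only mimic the
# keying rules described above so the two comparisons can be contrasted.

def link_key(pair):
    """link: order is meaningful, so the key is the ordered tuple itself."""
    left_id, right_id = pair
    return (left_id, right_id)


def dedupe_key(pair):
    """dedupe: left/right is arbitrary, so the key is unordered."""
    return frozenset(pair)


def true_positives(gold, predicted, key):
    """Count predicted pairs that match a gold pair under the given key."""
    gold_keys = {key(p) for p in gold}
    predicted_keys = {key(p) for p in predicted}
    return len(gold_keys & predicted_keys)


gold = frozenset({("a", "b"), ("c", "d")})
predicted = [("b", "a"), ("c", "d")]  # first pair is a reversed gold pair

link_tp = true_positives(gold, predicted, link_key)
dedupe_tp = true_positives(gold, predicted, dedupe_key)
print(link_tp, dedupe_tp)  # prints: 1 2
```

Under the ordered key the reversed pair ("b", "a") does not match gold ("a", "b"); under the unordered key it does.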
See the matching docstrings of linkage_metrics/pair_completeness_at_k for which comparison each applies.
- split(*, test_size: float, seed: int | None = None) -> tuple[LabeledPairs, LabeledPairs]
Partition the gold pairs into (train, test).
test_size is the fraction in [0.0, 1.0] routed to test (round(test_size * n) pairs); the rest go to train. Pairs are sorted before a seeded shuffle, so the split is reproducible given seed (None = nondeterministic). Raises ValueError if test_size is outside [0.0, 1.0].
The split is pair-level: a record/entity may appear in both halves, which is fine for tuning a scalar decision threshold. For entity-disjoint evaluation (e.g. of a trained matcher) split by gold cluster instead.
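The behavior described for split can be approximated with a short standalone sketch. This is not the library's implementation: the function split_pairs is hypothetical, it works on a plain frozenset rather than LabeledPairs, and taking the test pairs from the front of the shuffled list is an assumption. The range check, the sort before the seeded shuffle, and the round(test_size * n) test count follow the description above.

```python
import random


def split_pairs(pairs, *, test_size, seed=None):
    """Hypothetical stand-in for a pair-level (train, test) split."""
    if not 0.0 <= test_size <= 1.0:
        raise ValueError(f"test_size must be in [0.0, 1.0], got {test_size}")
    # Sorting first makes the shuffle independent of set iteration order,
    # so a fixed seed always yields the same split.
    ordered = sorted(pairs)
    # random.Random(None) seeds from system entropy, i.e. nondeterministic.
    random.Random(seed).shuffle(ordered)
    n_test = round(test_size * len(ordered))
    # Assumption: the first n_test shuffled pairs form the test half.
    return frozenset(ordered[n_test:]), frozenset(ordered[:n_test])


pairs = frozenset((f"l{i}", f"r{i}") for i in range(8))
train_pairs, test_pairs = split_pairs(pairs, test_size=0.25, seed=0)
repeat_train, repeat_test = split_pairs(pairs, test_size=0.25, seed=0)

print(len(train_pairs), len(test_pairs))  # prints: 6 2
print((train_pairs, test_pairs) == (repeat_train, repeat_test))  # prints: True
```

Because the split operates on pairs, nothing in this sketch prevents the same record id from appearing in both halves; an entity-disjoint split would have to group pairs by gold cluster before partitioning.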