denselinkage.clustering.connected_components
- denselinkage.clustering.connected_components(result: LinkageResult, *, all_record_ids: Iterable[str] | None = None) -> ClusteringResult
Connected-components clustering: transitively close the matched pairs in result and label each record with its component id. Convenience form of ConnectedComponentsClusterer; kept in the prelude.

Nodes are every record the pipeline compared, meaning those in result.decisions (matched or not) or result.errors. A record that was never matched, including one whose pairs the matcher could not decide, therefore becomes its own singleton cluster. Edges are the matched pairs only. Clustering is transitive: if A matches B and B matches C, all three share a cluster even if A and C were never matched directly. Cluster ids are 0..n_clusters-1, assigned deterministically by each component's smallest record id.

A record that produced no candidate pair at all (e.g. dedupe with a small top_k whose only neighbour was itself) does not appear in result. Pass all_record_ids to seed the clustering universe with the full id set: every listed record is labelled, and an unmatched one becomes its own singleton, so clustering_metrics reports a complete B³ over all records rather than one inflated because those records were dropped. Ids are stringified to match record ids; None (the default) keeps the universe to the records seen in result.
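
The semantics described above can be illustrated with a small standalone sketch. It is not the denselinkage implementation and does not use LinkageResult. The function name connected_components_sketch and the (left_id, right_id, matched) triple format are hypothetical stand-ins for the compared pairs. The sketch assumes that "smallest record id" means plain string ordering, and uses a union-find structure to take the transitive closure.

```python
from typing import Dict, Iterable, Optional, Tuple


def connected_components_sketch(
    compared_pairs: Iterable[Tuple[str, str, bool]],
    all_record_ids: Optional[Iterable[object]] = None,
) -> Dict[str, int]:
    """Map each record id to a cluster id (illustrative, not the library code).

    compared_pairs: hypothetical (left_id, right_id, matched) triples standing
    in for the pairs the pipeline compared.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep the smaller id as root, so a root is always the
        # smallest record id of its component.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    # Optional seeding: every listed id becomes a node (stringified).
    if all_record_ids is not None:
        for rid in all_record_ids:
            parent.setdefault(str(rid), str(rid))

    # Every compared record is a node; only matched pairs are edges.
    for left, right, matched in compared_pairs:
        parent.setdefault(left, left)
        parent.setdefault(right, right)
        if matched:
            union(left, right)

    # Number components 0..n_clusters-1 in order of smallest record id.
    roots = sorted({find(rid) for rid in parent})
    cluster_of_root = {root: i for i, root in enumerate(roots)}
    return {rid: cluster_of_root[find(rid)] for rid in sorted(parent)}


pairs = [("a", "b", True), ("b", "c", True), ("d", "e", False)]

# Without seeding, "f" (never compared) is absent from the labels.
seen_only = connected_components_sketch(pairs)
# {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 2}

# With seeding, "f" is labelled as its own singleton cluster.
seeded = connected_components_sketch(pairs, all_record_ids="abcdef")
# {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 2, 'f': 3}
```

Here a and c share a cluster through b although they were never matched directly, while d and e were compared but not matched and so stay singletons. The seeded call shows why all_record_ids matters for metrics: without it, f is simply missing from the labelling rather than counted as a singleton.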