denselinkage.embedding.HashedNGramEmbedder¶
- class denselinkage.embedding.HashedNGramEmbedder(n_features: int = 256, ngram: int = 3)[source]¶
Bases:
EmbedderDependency-free reference embedder.
Character n-gram feature hashing: count character n-grams into
n_featuresbuckets via a stable hash (zlib.crc32— deterministic across processes, unlike builtinhash()which isPYTHONHASHSEED- salted), then L2-normalize so inner product equals cosine. Lexical: it recovers abbreviations, punctuation and typos, not semantic renames.