denselinkage.embedding.HashedNGramEmbedder

class denselinkage.embedding.HashedNGramEmbedder(n_features: int = 256, ngram: int = 3)

Bases: Embedder

Dependency-free reference embedder.

Character n-gram feature hashing: character n-grams are counted into n_features buckets via a stable hash (zlib.crc32, which is deterministic across processes, unlike the builtin hash(), whose string hashes are salted by PYTHONHASHSEED), and the resulting vector is L2-normalized so that the inner product equals cosine similarity. The embedder is purely lexical: it recovers abbreviations, punctuation differences and typos, but not semantic renames.
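
The scheme described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the library's actual implementation: the function names hashed_ngram_embed and dot are hypothetical, and details such as case folding, padding, or how strings shorter than ngram are handled are assumptions (here they map to an all-zero vector).

```python
import math
import zlib


def hashed_ngram_embed(text: str, n_features: int = 256, ngram: int = 3) -> list[float]:
    """Hypothetical sketch of character n-gram feature hashing."""
    vec = [0.0] * n_features
    # Slide a window of `ngram` characters over the text.
    for i in range(max(len(text) - ngram + 1, 0)):
        gram = text[i:i + ngram]
        # zlib.crc32 gives the same value in every process, unlike hash().
        bucket = zlib.crc32(gram.encode("utf-8")) % n_features
        vec[bucket] += 1.0
    # L2-normalize so that a plain inner product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return vec  # assumption: text shorter than `ngram` yields all zeros
    return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Strings that differ only by punctuation share almost all of their trigrams.
with_dot = hashed_ngram_embed("Acme Corp.")
without_dot = hashed_ngram_embed("Acme Corp")
similarity = dot(with_dot, without_dot)
```

Because the two example strings share all but one trigram, their similarity is close to 1. A semantic rename with no shared character n-grams would score near 0, which is the limitation noted above.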

property model_id: str

Stable identifier of the embedding model (e.g. its name/checkpoint). Reserved for provenance: the Phase-B Reference Store records it so a persisted index can refuse a query embedded by a different model. Not consumed by the v1 link path.

property embedding_dim: int

Output width of encode(). Reserved for eager dimension validation; v1 instead detects width mismatches at search time via DimensionMismatch, so this is not yet consumed.