denselinkage.embedding.SentenceTransformerEmbedder

class denselinkage.embedding.SentenceTransformerEmbedder(model_name: str)

Bases: Embedder

Semantic embedder over a sentence-transformers checkpoint (extra: [sentence-transformers]).

Where the lexical HashedNGramEmbedder recovers typos and abbreviations, this embedder captures meaning: it can link semantic renames (e.g. Google / Alphabet) that have almost no lexical overlap. It encodes with normalize_embeddings=True, so every output is a unit vector and the inner product of two embeddings equals their cosine similarity, which is the similarity the numpy / FAISS indexes and similarity_threshold are defined against. The model loads eagerly at construction, so a bad model_name fails fast.
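The equivalence between the inner product of unit vectors and cosine similarity can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names (l2_normalize, dot, cosine) and the sample vectors are made up for the example and are not part of denselinkage.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical raw embeddings, chosen only so the arithmetic is easy to follow.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# After normalization, the inner product alone gives the cosine similarity.
inner_of_units = dot(l2_normalize(a), l2_normalize(b))
assert math.isclose(inner_of_units, cosine(a, b))
```

Because the vectors are already unit length, an inner-product index can rank by cosine similarity without a separate normalization step at query time.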

property model_id: str

Stable identifier of the embedding model (e.g. its name/checkpoint). Reserved for provenance: the Phase-B Reference Store records it so a persisted index can refuse a query embedded by a different model. Not consumed by the v1 link path.
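The provenance check described above might look roughly like the following. This is a sketch under assumptions, not the Phase-B design: IndexManifest, ModelMismatch, and check_query_model are hypothetical names invented here, and the model identifiers are placeholders.

```python
from dataclasses import dataclass


class ModelMismatch(ValueError):
    """Hypothetical error: query and index were embedded by different models."""


@dataclass(frozen=True)
class IndexManifest:
    """Hypothetical metadata stored next to a persisted index."""

    model_id: str  # the embedder's model_id at the time the index was built

    def check_query_model(self, query_model_id: str) -> None:
        """Refuse a query whose embedder does not match the index's embedder."""
        if query_model_id != self.model_id:
            raise ModelMismatch(
                f"index built with {self.model_id!r}, "
                f"query embedded with {query_model_id!r}"
            )


# Placeholder identifiers; a real embedder would report its own model_id.
manifest = IndexManifest(model_id="model-a")
manifest.check_query_model("model-a")  # same model: accepted silently
```

Comparing identifiers before searching keeps vectors from two different embedding spaces from being scored against each other, where the similarities would be meaningless.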

property embedding_dim: int

Output width of encode(). Reserved for eager dimension validation; v1 instead detects width mismatches at search time via DimensionMismatch, so this is not yet consumed.
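To make the difference between eager and search-time validation concrete, here is a minimal sketch of what an eager check could look like. It is an assumption about a possible future use of embedding_dim, not current behavior: validate_width is a hypothetical helper, and WidthMismatch is a local stand-in for the library's DimensionMismatch, whose actual signature is not shown in this page.

```python
class WidthMismatch(ValueError):
    """Local stand-in for the library's DimensionMismatch (signature assumed)."""


def validate_width(embedder_dim: int, index_dim: int) -> None:
    """Fail before any search if the embedder and index disagree on width."""
    if embedder_dim != index_dim:
        raise WidthMismatch(
            f"embedder produces {embedder_dim}-d vectors, "
            f"index expects {index_dim}-d vectors"
        )


# Hypothetical widths: the check passes when they agree.
validate_width(embedder_dim=384, index_dim=384)
```

An eager check of this kind would surface a mismatch when the embedder and index are wired together, rather than at the first search call as in v1.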