kb/openspec/changes/kb-search/specs/embedding-management/spec.md at 2030976b8546a0bafb0402a94e5a85457caf28e7


steve f245c24928 Initial MVP

2026-03-23 20:38:42 +00:00


ADDED Requirements

Requirement: Model initialisation

The system SHALL download the embedding model on kb init. The default model SHALL be all-MiniLM-L6-v2. The user MAY specify a different model via kb init --model <name>. The model SHALL be downloaded via sentence-transformers to the HuggingFace default cache (~/.cache/huggingface/). On first load, the model SHALL be exported to ONNX format for inference.

Scenario: Default init

WHEN user runs kb init
THEN the system downloads all-MiniLM-L6-v2, creates ~/.kb/kb.db with the schema, and records model_name=all-MiniLM-L6-v2 and embedding_dim=384 in the DB config table

Scenario: Init with custom model

WHEN user runs kb init --model nomic-embed-text
THEN the system downloads nomic-embed-text, creates the database, and records the model name and its dimension in the DB config table
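The config-recording step in the two init scenarios above could look roughly like the following. This is a minimal sketch, not the project's actual schema: the key/value layout of the `config` table and the helper names `record_model` and `read_config` are assumptions made for illustration.

```python
import sqlite3


def record_model(conn: sqlite3.Connection, model_name: str, embedding_dim: int) -> None:
    """Create the config table if needed and record the active model.

    The key/value table layout is an assumption; the spec only names
    the model_name and embedding_dim entries.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS config ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )


def read_config(conn: sqlite3.Connection) -> dict:
    """Return all config entries as a plain dict of strings."""
    return dict(conn.execute("SELECT key, value FROM config"))
```

With the default model, `record_model(conn, "all-MiniLM-L6-v2", 384)` would store the two values named in the default init scenario.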

Scenario: Init status check

WHEN user runs kb init --status
THEN the system reports: whether ~/.kb/ exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status

Scenario: ONNX export on first load

WHEN the embedding model is loaded for the first time after download
THEN the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.
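One way to get the "export once, then reuse" behaviour in this scenario is to check for a cached artifact before exporting. The sketch below assumes the export lands in a directory containing a file called `model.onnx`; that file name, the `ensure_onnx_export` helper, and the injected `export` callable are all hypothetical. In a real implementation the callable would presumably wrap the sentence-transformers ONNX export.

```python
from pathlib import Path
from typing import Callable

ONNX_MESSAGE = "Optimising model for ONNX inference (one-time)..."


def ensure_onnx_export(
    export_dir: Path,
    export: Callable[[Path], None],
    notify: Callable[[str], None] = print,
) -> bool:
    """Run the ONNX export only if no cached export exists.

    Returns True if the export ran, False if the cached copy was reused.
    The "model.onnx" file name is an assumption for this sketch.
    """
    if (export_dir / "model.onnx").exists():
        return False  # subsequent loads reuse the cached export
    notify(ONNX_MESSAGE)
    export_dir.mkdir(parents=True, exist_ok=True)
    export(export_dir)  # caller-supplied exporter writes into export_dir
    return True
```

Passing the exporter in as a callable keeps the cache check testable without downloading a model.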

Requirement: Model-database binding

The system SHALL store the active model name and embedding dimension in the database config table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error.

Scenario: Model mismatch on add

WHEN user runs kb add doc.pdf but the config YAML specifies a different model than what the DB was initialised with
THEN the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run kb reindex --model nomic-embed-text to switch models." and exit with non-zero status

Scenario: Model match on add

WHEN user runs kb add doc.pdf and the config model matches the DB model
THEN ingestion proceeds normally
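The binding check described in this requirement can be sketched as a single guard that runs before add, search, or reindex. The error text follows the mismatch scenario above; the `ModelMismatchError` class, the `verify_model` helper, and the key/value layout of the `config` table are assumptions for illustration.

```python
import sqlite3


class ModelMismatchError(RuntimeError):
    """Raised when the configured model differs from the one bound to the DB."""


def verify_model(conn: sqlite3.Connection, configured_model: str) -> None:
    """Raise ModelMismatchError unless configured_model matches the DB record.

    Assumes a key/value config table holding model_name and embedding_dim.
    """
    config = dict(conn.execute("SELECT key, value FROM config"))
    db_model = config["model_name"]
    db_dim = config["embedding_dim"]
    if configured_model != db_model:
        raise ModelMismatchError(
            f"Model mismatch: DB uses '{db_model}' ({db_dim} dim) but config "
            f"specifies '{configured_model}'. Run kb reindex --model "
            f"{configured_model} to switch models."
        )
```

A CLI entry point would catch `ModelMismatchError`, print the message, and exit with a non-zero status, which satisfies the hard-error requirement.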

Requirement: Full reindex with model switching

The system SHALL support re-embedding all chunks via kb reindex. If --model is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic — if interrupted, the old embeddings remain intact.

Scenario: Reindex with same model

WHEN user runs kb reindex
THEN all chunks are re-embedded with the current model and all vectors are replaced. This is useful when the stored vectors may be stale or invalid, for example if the model's ONNX export was corrupted or chunks were modified.

Scenario: Reindex with new model

WHEN user runs kb reindex --model bge-small-en-v1.5
THEN the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in chunks_vec (recreating the table if dimension changed), and updates model_name and embedding_dim in the DB config table

Scenario: Interrupted reindex

WHEN a reindex is interrupted partway through
THEN the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun kb reindex to retry.
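The atomicity described in these scenarios can be approximated by computing every new embedding first and only then swapping the table inside a single transaction. The following is a sketch under several assumptions: a plain table with JSON-encoded vectors stands in for whatever table type the real `chunks_vec` uses, the `chunks(id, text)` and key/value `config` layouts are hypothetical, and the connection is opened with `isolation_level=None` so that `BEGIN` and `COMMIT` are explicit.

```python
import json
import sqlite3
from typing import Callable, Sequence


def reindex(
    conn: sqlite3.Connection,
    model_name: str,
    embedding_dim: int,
    embed: Callable[[str], Sequence[float]],
) -> None:
    """Re-embed all chunks and replace the vectors only if every embed succeeds.

    Assumes conn was opened with isolation_level=None (explicit transactions).
    """
    chunks = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()
    # Phase 1: embed everything in memory. An interruption here leaves the
    # database untouched, so the old embeddings remain intact.
    vectors = [(cid, json.dumps(list(embed(text)))) for cid, text in chunks]

    # Phase 2: swap table and config together; SQLite rolls back DDL too.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DROP TABLE IF EXISTS chunks_vec")
        conn.execute(
            "CREATE TABLE chunks_vec ("
            "chunk_id INTEGER PRIMARY KEY, embedding TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO chunks_vec VALUES (?, ?)", vectors)
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
```

Holding all vectors in memory keeps the sketch short; for a large knowledge base, staging them in a temporary table and renaming it on completion would give the same guarantee with less memory.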

Requirement: Embedding model inference via ONNX

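The system SHALL use sentence-transformers with the ONNX backend for all embedding inference, and ONNX Runtime (onnxruntime) SHALL be the inference engine. This keeps PyTorch out of the inference path, although sentence-transformers itself still declares torch as an install-time dependency.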

Scenario: Embed a chunk

WHEN a chunk of text needs to be embedded during ingestion
THEN the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model

Scenario: Embed a query

WHEN a search query needs to be embedded
THEN the system applies the configured query_prefix (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
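The query path differs from the chunk path only in the optional prefix. A minimal sketch follows, assuming the prefix is prepended by plain string concatenation and carries its own separator (for example a trailing space); the spec does not say how the two are joined. The `embed_query` helper and the injected `encode` callable are hypothetical; in practice `encode` would be the same ONNX-backed encoder used for chunks.

```python
from typing import Callable, Optional, Sequence


def embed_query(
    query: str,
    encode: Callable[[str], Sequence[float]],
    query_prefix: Optional[str] = None,
) -> Sequence[float]:
    """Embed a search query, applying the configured prefix if one is set.

    Concatenating prefix and query directly is an assumption of this sketch.
    """
    text = f"{query_prefix}{query}" if query_prefix else query
    return encode(text)
```

Because `encode` is shared with ingestion, query and chunk vectors come from the same model and have the same dimension.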
