kb/openspec/changes/kb-search/specs/embedding-management/spec.md
2026-03-23 20:38:42 +00:00

ADDED Requirements

Requirement: Model initialisation

The system SHALL download the embedding model on kb init. The default model SHALL be all-MiniLM-L6-v2. The user MAY specify a different model via kb init --model <name>. The model SHALL be downloaded via sentence-transformers to the HuggingFace default cache (~/.cache/huggingface/). On first load, the model SHALL be exported to ONNX format for inference.

Scenario: Default init

  • WHEN user runs kb init
  • THEN the system downloads all-MiniLM-L6-v2, creates ~/.kb/kb.db with the schema, and records model_name=all-MiniLM-L6-v2 and embedding_dim=384 in the DB config table
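
The config-recording step of this scenario could be sketched as follows. This is a minimal illustration, not the project's schema: the spec names a "DB config table" but not its columns, so the `config (key, value)` layout and the helper names `init_config` and `read_config` are assumptions. An in-memory database stands in for ~/.kb/kb.db.

```python
import sqlite3


def init_config(conn: sqlite3.Connection, model_name: str, embedding_dim: int) -> None:
    """Create the (assumed) key/value config table and record the active model."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS config ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )


def read_config(conn: sqlite3.Connection) -> dict:
    """Return the config table as a plain dict of strings."""
    return dict(conn.execute("SELECT key, value FROM config"))


# In-memory stand-in for ~/.kb/kb.db.
conn = sqlite3.connect(":memory:")
init_config(conn, "all-MiniLM-L6-v2", 384)
print(read_config(conn))
```

Storing both values as text keeps the table uniform; a real schema might prefer typed columns.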

Scenario: Init with custom model

  • WHEN user runs kb init --model nomic-embed-text
  • THEN the system downloads nomic-embed-text, creates the database, and records the model name and its dimension in the DB config table

Scenario: Init status check

  • WHEN user runs kb init --status
  • THEN the system reports: whether ~/.kb/ exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status
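
A partial sketch of the status check, covering only what can be read from disk: whether the directory exists, whether the DB has a config table, and which model is recorded. The function name `init_status`, the returned keys, and the `config (key, value)` layout are assumptions. Whether the model is downloaded and the Docling model status are left out because the spec does not say how they are determined.

```python
import sqlite3
import tempfile
from pathlib import Path


def init_status(kb_dir: Path) -> dict:
    """Report directory, DB, and configured-model status for a kb directory."""
    db_path = kb_dir / "kb.db"
    status = {
        "kb_dir_exists": kb_dir.is_dir(),
        "db_initialised": False,
        "model_name": None,
    }
    if not db_path.is_file():
        return status  # avoid sqlite3.connect(), which would create the file
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT value FROM config WHERE key = 'model_name'"
        ).fetchone()
    except sqlite3.DatabaseError:
        row = None  # config table missing, or the file is not a SQLite database
    else:
        status["db_initialised"] = True
    finally:
        conn.close()
    if row is not None:
        status["model_name"] = row[0]
    return status


# Usage against an empty temporary directory standing in for ~/.kb/.
with tempfile.TemporaryDirectory() as tmp:
    print(init_status(Path(tmp)))
```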

Scenario: ONNX export on first load

  • WHEN the embedding model is loaded for the first time after download
  • THEN the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.

Requirement: Model-database binding

The system SHALL store the active model name and embedding dimension in the database config table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error.

Scenario: Model mismatch on add

  • WHEN user runs kb add doc.pdf but the config YAML specifies a different model from the one the DB was initialised with
  • THEN the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run kb reindex --model nomic-embed-text to switch models." and exit with non-zero status

Scenario: Model match on add

  • WHEN user runs kb add doc.pdf and the config model matches the DB model
  • THEN ingestion proceeds normally
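
The binding check in these two scenarios might look like the sketch below. The exception class `ModelMismatchError` and the function `verify_model` are hypothetical names; the error text follows the message quoted in the mismatch scenario. A CLI wrapper would catch the exception, print the message, and exit with non-zero status.

```python
class ModelMismatchError(Exception):
    """Raised when the configured model differs from the one bound to the DB."""


def verify_model(db_model: str, db_dim: int, config_model: str) -> None:
    """Hard-fail if the config names a model other than the DB's model."""
    if config_model != db_model:
        raise ModelMismatchError(
            f"Model mismatch: DB uses '{db_model}' ({db_dim} dim) but config "
            f"specifies '{config_model}'. Run kb reindex --model {config_model} "
            "to switch models."
        )


# Matching models pass silently; a mismatch raises.
verify_model("all-MiniLM-L6-v2", 384, "all-MiniLM-L6-v2")
try:
    verify_model("all-MiniLM-L6-v2", 384, "nomic-embed-text")
except ModelMismatchError as err:
    print(err)
```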

Requirement: Full reindex with model switching

The system SHALL support re-embedding all chunks via kb reindex. If --model is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic: if it is interrupted, the old embeddings remain intact.

Scenario: Reindex with same model

  • WHEN user runs kb reindex
  • THEN all chunks are re-embedded with the current model and vectors are replaced. This is useful if the model's ONNX export was corrupted or chunks were modified.

Scenario: Reindex with new model

  • WHEN user runs kb reindex --model bge-small-en-v1.5
  • THEN the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in chunks_vec (recreating the table if dimension changed), and updates model_name and embedding_dim in the DB config table

Scenario: Interrupted reindex

  • WHEN a reindex is interrupted partway through
  • THEN the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun kb reindex to retry.
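
One way to get the "replace only on successful completion" behaviour is to build the new vectors in a staging table and swap it in inside a single SQLite transaction, as sketched below. Several things here are assumptions: a plain table with BLOB vectors stands in for chunks_vec (the real table may be a vector-extension virtual table), the `chunks (id, text)` and `config (key, value)` layouts and the staging name `chunks_vec_new` are invented, and the `embed` callable stands in for the model. The progress bar is omitted.

```python
import sqlite3
import struct
from typing import Callable, Sequence


def reindex(
    conn: sqlite3.Connection,
    embed: Callable[[str], Sequence[float]],
    model_name: str,
    dim: int,
) -> None:
    """Re-embed every chunk; the old vectors survive any failure or interrupt.

    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT/ROLLBACK below are the only transaction control in play.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DROP TABLE IF EXISTS chunks_vec_new")
        conn.execute(
            "CREATE TABLE chunks_vec_new ("
            "chunk_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
        )
        # fetchall() so no read cursor is still open when tables are dropped.
        rows = conn.execute("SELECT id, text FROM chunks").fetchall()
        for chunk_id, text in rows:
            vec = embed(text)
            if len(vec) != dim:
                raise ValueError(f"expected {dim} dims, got {len(vec)}")
            conn.execute(
                "INSERT INTO chunks_vec_new VALUES (?, ?)",
                (chunk_id, struct.pack(f"{dim}f", *vec)),
            )
        # Swap tables and update config in the same transaction as the inserts.
        conn.execute("DROP TABLE IF EXISTS chunks_vec")
        conn.execute("ALTER TABLE chunks_vec_new RENAME TO chunks_vec")
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(dim))],
        )
        conn.execute("COMMIT")
    except BaseException:  # includes KeyboardInterrupt
        conn.execute("ROLLBACK")
        raise
```

Because SQLite schema changes are transactional, the rollback also discards the staging table. The trade-off in this sketch is that the write lock is held for the whole run; embedding outside the transaction and swapping at the end would shorten that window.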

Requirement: Embedding model inference via ONNX

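The system SHALL use sentence-transformers with the ONNX backend for all embedding inference, with ONNX Runtime (onnxruntime) as the inference engine. This keeps PyTorch out of the inference path, although the sentence-transformers package itself still declares torch as an install requirement.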

Scenario: Embed a chunk

  • WHEN a chunk of text needs to be embedded during ingestion
  • THEN the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model
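
A minimal sketch of chunk embedding with a dimension check. `SentenceTransformer(..., backend="onnx")` is the sentence-transformers API for selecting the ONNX backend (available from version 3.2); the helper names `load_onnx_model` and `embed_chunks` are hypothetical. The import is deferred so the helper can be exercised with a stand-in model when the package is not installed.

```python
from typing import Any, Sequence


def load_onnx_model(model_name: str) -> Any:
    """Load a sentence-transformers model on the ONNX backend."""
    # Lazy import: the rest of this sketch runs without the package installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, backend="onnx")


def embed_chunks(model: Any, texts: Sequence[str], expected_dim: int) -> Any:
    """Embed texts and verify each vector has the dimension recorded in the DB."""
    vectors = model.encode(list(texts))
    for vec in vectors:
        if len(vec) != expected_dim:
            raise ValueError(
                f"model produced {len(vec)}-dim vector, expected {expected_dim}"
            )
    return vectors


class FakeModel:
    """Stand-in with the same encode() shape, for demonstration only."""

    def encode(self, texts):
        return [[0.0] * 384 for _ in texts]


print(len(embed_chunks(FakeModel(), ["first chunk", "second chunk"], 384)))
```

With the real package installed, `embed_chunks(load_onnx_model("all-MiniLM-L6-v2"), texts, 384)` would take the place of the stand-in.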

Scenario: Embed a query

  • WHEN a search query needs to be embedded
  • THEN the system applies the configured query_prefix (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
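
The query-prefix step can be sketched as below. `embed_query` is a hypothetical helper, and `encode` stands for whatever callable wraps the ONNX model. The sketch assumes the prefix is prepended verbatim, so any separator (for example a trailing space) must be part of the configured query_prefix itself.

```python
from typing import Callable, Optional, Sequence


def embed_query(
    encode: Callable[[str], Sequence[float]],
    query: str,
    query_prefix: Optional[str] = None,
) -> Sequence[float]:
    """Prepend the configured prefix (if any) and embed with the chunk model."""
    text = f"{query_prefix}{query}" if query_prefix else query
    return encode(text)


# Demonstration with a recording stand-in for the model's encode function.
seen = []


def fake_encode(text: str) -> Sequence[float]:
    seen.append(text)
    return [0.0, 0.0, 0.0]


embed_query(fake_encode, "how do I reindex?", query_prefix="search_query: ")
embed_query(fake_encode, "how do I reindex?")
print(seen)
```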