kb/openspec/changes/kb-search/specs/embedding-management/spec.md

## ADDED Requirements

### Requirement: Model initialisation
The system SHALL download the embedding model on `kb init`. The default model SHALL be `all-MiniLM-L6-v2`. The user MAY specify a different model via `kb init --model <name>`. The model SHALL be downloaded via sentence-transformers to the HuggingFace default cache (`~/.cache/huggingface/`). On first load, the model SHALL be exported to ONNX format for inference.

#### Scenario: Default init
- **WHEN** user runs `kb init`
- **THEN** the system downloads `all-MiniLM-L6-v2`, creates `~/.kb/kb.db` with the schema, and records `model_name=all-MiniLM-L6-v2` and `embedding_dim=384` in the DB config table

#### Scenario: Init with custom model
- **WHEN** user runs `kb init --model nomic-embed-text`
- **THEN** the system downloads `nomic-embed-text`, creates the database, and records the model name and its dimension in the DB config table

#### Scenario: Init status check
- **WHEN** user runs `kb init --status`
- **THEN** the system reports: whether `~/.kb/` exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status

#### Scenario: ONNX export on first load
- **WHEN** the embedding model is loaded for the first time after download
- **THEN** the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.
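
As an illustration of the config-recording step in the init scenarios, here is a minimal sketch using Python's standard `sqlite3` module. The key/value column layout of the `config` table and the helper names `record_model` and `read_model` are assumptions; the spec names only the table and the two keys.

```python
import sqlite3

# Assumed layout: the spec names the config table and its two keys,
# but not the columns. A simple key/value table is used here.
CONFIG_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS config ("
    "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
)


def record_model(conn: sqlite3.Connection, model_name: str, embedding_dim: int) -> None:
    """Create the config table if needed and store the model binding."""
    with conn:  # commits on success, rolls back on error
        conn.execute(CONFIG_SCHEMA)
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )


def read_model(conn: sqlite3.Connection) -> tuple[str, int]:
    """Return (model_name, embedding_dim) as recorded at init time."""
    rows = dict(conn.execute("SELECT key, value FROM config"))
    return rows["model_name"], int(rows["embedding_dim"])
```

Storing the dimension next to the model name lets later commands validate vector sizes without loading the model.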

### Requirement: Model-database binding
The system SHALL store the active model name and embedding dimension in the database `config` table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error. `kb reindex --model <name>` is the one exception, since its purpose is to change the binding.

#### Scenario: Model mismatch on add
- **WHEN** user runs `kb add doc.pdf` but the config YAML specifies a different model than what the DB was initialised with
- **THEN** the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run `kb reindex --model nomic-embed-text` to switch models." and exit with non-zero status

#### Scenario: Model match on add
- **WHEN** user runs `kb add doc.pdf` and the config model matches the DB model
- **THEN** ingestion proceeds normally
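
The binding check could look roughly like the sketch below. It assumes a key/value `config` table; `ModelMismatchError` and `verify_model` are hypothetical names, and only the error text is taken from the mismatch scenario.

```python
import sqlite3


class ModelMismatchError(Exception):
    """Raised when the configured model differs from the one bound to the DB."""


def verify_model(conn: sqlite3.Connection, configured_model: str) -> None:
    """Raise ModelMismatchError unless configured_model matches the DB record."""
    # Assumed key/value layout for the config table.
    rows = dict(conn.execute("SELECT key, value FROM config"))
    db_model, db_dim = rows["model_name"], rows["embedding_dim"]
    if configured_model != db_model:
        raise ModelMismatchError(
            f"Model mismatch: DB uses '{db_model}' ({db_dim} dim) but config "
            f"specifies '{configured_model}'. Run `kb reindex --model "
            f"{configured_model}` to switch models."
        )
```

A CLI entry point would catch this exception, print the message, and exit with a non-zero status.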

### Requirement: Full reindex with model switching
The system SHALL support re-embedding all chunks via `kb reindex`. If `--model` is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic: if it is interrupted, the old embeddings remain intact.

#### Scenario: Reindex with same model
- **WHEN** user runs `kb reindex`
- **THEN** all chunks are re-embedded with the current model and vectors are replaced. This is useful if the model's ONNX export was corrupted or chunks were modified.

#### Scenario: Reindex with new model
- **WHEN** user runs `kb reindex --model bge-small-en-v1.5`
- **THEN** the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in `chunks_vec` (recreating the table if dimension changed), and updates `model_name` and `embedding_dim` in the DB config table

#### Scenario: Interrupted reindex
- **WHEN** a reindex is interrupted partway through
- **THEN** the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun `kb reindex` to retry.
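
One way to get this atomicity is to compute every embedding first and then swap vectors and config inside a single transaction. The sketch below assumes plain SQLite tables as stand-ins: a `chunks(id, text)` table, a `chunks_vec(chunk_id, embedding)` table storing packed floats, and a key/value `config` table. The real `chunks_vec` may be a vector-extension table with a different interface, and `reindex` and `embed` are hypothetical names.

```python
import sqlite3
import struct
from typing import Callable, Sequence

Embedder = Callable[[str], Sequence[float]]


def reindex(conn: sqlite3.Connection, embed: Embedder,
            model_name: str, embedding_dim: int) -> int:
    """Re-embed all chunks; stored vectors change only if every embedding succeeds."""
    chunks = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()

    # Phase 1: embed everything without touching stored vectors.
    # An interruption here leaves the database exactly as it was.
    staged = []
    for chunk_id, text in chunks:
        vec = embed(text)
        if len(vec) != embedding_dim:
            raise ValueError(f"expected {embedding_dim} dims, got {len(vec)}")
        staged.append((chunk_id, struct.pack(f"{embedding_dim}f", *vec)))

    # Phase 2: replace vectors and config in one transaction.
    # The context manager commits on success and rolls back on any error.
    with conn:
        conn.execute("DELETE FROM chunks_vec")
        conn.executemany(
            "INSERT INTO chunks_vec (chunk_id, embedding) VALUES (?, ?)", staged
        )
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )
    return len(staged)
```

Holding all staged vectors in memory keeps the sketch short. For a large corpus, writing them to a staging table and renaming it on completion would serve the same purpose.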

### Requirement: Embedding model inference via ONNX
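The system SHALL use `sentence-transformers` with the ONNX backend for all embedding inference, with ONNX Runtime (`onnxruntime`) as the inference engine. This runs the transformer forward pass in ONNX Runtime rather than PyTorch. Note that `sentence-transformers` itself still declares `torch` as an install dependency, so this does not remove PyTorch from the installation.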

#### Scenario: Embed a chunk
- **WHEN** a chunk of text needs to be embedded during ingestion
- **THEN** the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model

#### Scenario: Embed a query
- **WHEN** a search query needs to be embedded
- **THEN** the system applies the configured `query_prefix` (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
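
The two embedding scenarios can be sketched as follows. `load_onnx_encoder` and `embed_query` are hypothetical helper names, and joining the prefix to the query by plain concatenation is an assumption (prefixes such as `"search_query: "` usually carry their own trailing space). The `backend="onnx"` argument requires a recent `sentence-transformers` release with the ONNX extras installed.

```python
from typing import Callable, Optional, Sequence

Encoder = Callable[[str], Sequence[float]]


def load_onnx_encoder(model_name: str) -> Encoder:
    """Load a model through the sentence-transformers ONNX backend.

    The import is deferred so the rest of this sketch runs without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, backend="onnx")
    return lambda text: model.encode(text).tolist()


def embed_query(query: str, encode: Encoder,
                query_prefix: Optional[str] = None) -> Sequence[float]:
    """Apply the configured prefix (if any), then embed with the chunk encoder."""
    # Assumption: the prefix is prepended verbatim, with no separator added.
    text = f"{query_prefix}{query}" if query_prefix else query
    return encode(text)
```

Passing the encoder in as a parameter makes it easy to guarantee that queries and chunks share one model instance, and lets the prefix logic be tested without downloading a model.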