## ADDED Requirements

### Requirement: Model initialisation

The system SHALL download the embedding model on `kb init`. The default model SHALL be `all-MiniLM-L6-v2`. The user MAY specify a different model via `kb init --model `. The model SHALL be downloaded via sentence-transformers to the HuggingFace default cache (`~/.cache/huggingface/`). On first load, the model SHALL be exported to ONNX format for inference.

#### Scenario: Default init

- **WHEN** user runs `kb init`
- **THEN** the system downloads `all-MiniLM-L6-v2`, creates `~/.kb/kb.db` with the schema, and records `model_name=all-MiniLM-L6-v2` and `embedding_dim=384` in the DB config table

#### Scenario: Init with custom model

- **WHEN** user runs `kb init --model nomic-embed-text`
- **THEN** the system downloads `nomic-embed-text`, creates the database, and records the model name and its dimension in the DB config table

#### Scenario: Init status check

- **WHEN** user runs `kb init --status`
- **THEN** the system reports: whether `~/.kb/` exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status

#### Scenario: ONNX export on first load

- **WHEN** the embedding model is loaded for the first time after download
- **THEN** the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.

### Requirement: Model-database binding

The system SHALL store the active model name and embedding dimension in the database `config` table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error.

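
A minimal sketch of the binding check described above, in Python with the standard-library `sqlite3` module. The key/value layout of the `config` table, the `ModelMismatchError` class, and the function names are assumptions made for illustration; the spec only fixes the `model_name` and `embedding_dim` entries and the error text used in the mismatch scenario.

```python
import sqlite3


class ModelMismatchError(RuntimeError):
    """Hypothetical error type: configured model differs from the DB record."""


def read_binding(conn: sqlite3.Connection) -> tuple[str, int]:
    # Assumes config is a simple key/value table (not confirmed by the spec).
    rows = dict(
        conn.execute(
            "SELECT key, value FROM config "
            "WHERE key IN ('model_name', 'embedding_dim')"
        )
    )
    return rows["model_name"], int(rows["embedding_dim"])


def verify_model_binding(conn: sqlite3.Connection, configured_model: str) -> None:
    """Raise a hard error if the configured model is not the one in the DB."""
    db_model, db_dim = read_binding(conn)
    if configured_model != db_model:
        raise ModelMismatchError(
            f"Model mismatch: DB uses '{db_model}' ({db_dim} dim) but config "
            f"specifies '{configured_model}'. Run `kb reindex --model "
            f"{configured_model}` to switch models."
        )


# Demo database standing in for ~/.kb/kb.db after a default `kb init`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO config VALUES (?, ?)",
    [("model_name", "all-MiniLM-L6-v2"), ("embedding_dim", "384")],
)

verify_model_binding(conn, "all-MiniLM-L6-v2")  # matching model: no error
```

A CLI entry point for add, search, or reindex could call `verify_model_binding` before loading the model, catch the error, print its message, and exit with a non-zero status.
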
#### Scenario: Model mismatch on add

- **WHEN** user runs `kb add doc.pdf` but the config YAML specifies a different model than what the DB was initialised with
- **THEN** the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run `kb reindex --model nomic-embed-text` to switch models." and exit with non-zero status

#### Scenario: Model match on add

- **WHEN** user runs `kb add doc.pdf` and the config model matches the DB model
- **THEN** ingestion proceeds normally

### Requirement: Full reindex with model switching

The system SHALL support re-embedding all chunks via `kb reindex`. If `--model` is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic: if interrupted, the old embeddings remain intact.

#### Scenario: Reindex with same model

- **WHEN** user runs `kb reindex`
- **THEN** all chunks are re-embedded with the current model and vectors are replaced. Useful if the model's ONNX export was corrupted or chunks were modified.

#### Scenario: Reindex with new model

- **WHEN** user runs `kb reindex --model bge-small-en-v1.5`
- **THEN** the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in `chunks_vec` (recreating the table if dimension changed), and updates `model_name` and `embedding_dim` in the DB config table

#### Scenario: Interrupted reindex

- **WHEN** a reindex is interrupted partway through
- **THEN** the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun `kb reindex` to retry.

### Requirement: Embedding model inference via ONNX

The system SHALL use `sentence-transformers` with the ONNX backend for all embedding inference. This avoids a PyTorch dependency. The ONNX Runtime (`onnxruntime`) SHALL be the inference engine.

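
One way to meet the atomicity guarantee of `kb reindex` stated above is to build the new vectors in a staging table and swap it in inside a single transaction. The sketch below illustrates that idea with the standard-library `sqlite3` module. It is not the project's implementation: plain tables with JSON-encoded vectors stand in for whatever storage backs `chunks_vec`, and the `chunks` table, the `chunks_vec_new` staging table, the key/value `config` layout, and the `embed` callable are all assumptions.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN/COMMIT/ROLLBACK below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE chunks_vec (chunk_id INTEGER PRIMARY KEY, embedding TEXT);
        CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT);
        INSERT INTO chunks VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO chunks_vec VALUES (1, '[0.0]'), (2, '[0.0]');
        INSERT INTO config VALUES ('model_name', 'old-model'), ('embedding_dim', '1');
        """
    )
    return conn


def reindex(conn, embed, model_name: str, dim: int) -> None:
    """Re-embed every chunk; the live table changes only if all embeds succeed."""
    # Phase 1: fill a staging table. A failure here leaves chunks_vec untouched;
    # leftovers from an interrupted run are dropped on the next attempt.
    conn.execute("DROP TABLE IF EXISTS chunks_vec_new")
    conn.execute(
        "CREATE TABLE chunks_vec_new (chunk_id INTEGER PRIMARY KEY, embedding TEXT)"
    )
    for chunk_id, text in conn.execute("SELECT id, text FROM chunks").fetchall():
        vec = embed(text)  # may raise or be interrupted
        conn.execute(
            "INSERT INTO chunks_vec_new VALUES (?, ?)", (chunk_id, json.dumps(vec))
        )

    # Phase 2: swap tables and update the config in one transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DROP TABLE chunks_vec")
        conn.execute("ALTER TABLE chunks_vec_new RENAME TO chunks_vec")
        conn.execute(
            "UPDATE config SET value = ? WHERE key = 'model_name'", (model_name,)
        )
        conn.execute(
            "UPDATE config SET value = ? WHERE key = 'embedding_dim'", (str(dim),)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because SQLite treats schema changes as transactional, the drop, rename, and config update either all take effect or none do. Recreating the table in the staging step also covers the case where the new model has a different dimension.
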
#### Scenario: Embed a chunk

- **WHEN** a chunk of text needs to be embedded during ingestion
- **THEN** the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model

#### Scenario: Embed a query

- **WHEN** a search query needs to be embedded
- **THEN** the system applies the configured `query_prefix` (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
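
The two embedding scenarios above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the prefix is assumed to be prepended to the query with no extra separator, and the loader relies on the `backend="onnx"` argument that recent sentence-transformers releases accept on `SentenceTransformer`. The encoder is passed in as a callable so the helpers can be exercised with a stub.

```python
from typing import Callable, List, Sequence

Encoder = Callable[[str], Sequence[float]]


def load_onnx_encoder(model_name: str) -> Encoder:
    """Return an encode function backed by the ONNX backend."""
    # Imported lazily so the helpers below work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, backend="onnx")
    return model.encode


def embed_chunk(encode: Encoder, text: str, expected_dim: int) -> List[float]:
    """Embed text and check the vector has the dimension recorded for the model."""
    vec = [float(x) for x in encode(text)]
    if len(vec) != expected_dim:
        raise ValueError(
            f"expected a {expected_dim}-dim vector, got {len(vec)} dimensions"
        )
    return vec


def embed_query(
    encode: Encoder, query: str, expected_dim: int, query_prefix: str = ""
) -> List[float]:
    """Apply the configured query_prefix (if any), then embed with the same encoder."""
    return embed_chunk(encode, f"{query_prefix}{query}", expected_dim)
```

In use, ingestion and search would share one encoder, for example `encode = load_onnx_encoder("all-MiniLM-L6-v2")`, then `embed_chunk(encode, chunk_text, 384)` for chunks and `embed_query(encode, user_query, 384, query_prefix)` for searches.
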