## ADDED Requirements

### Requirement: Model initialisation
The system SHALL download the embedding model on `kb init`. The default model SHALL be `all-MiniLM-L6-v2`. The user MAY specify a different model via `kb init --model <name>`. The model SHALL be downloaded via `sentence-transformers` to the Hugging Face default cache (`~/.cache/huggingface/`). On first load, the model SHALL be exported to ONNX format for inference.

#### Scenario: Default init
- **WHEN** user runs `kb init`
- **THEN** the system downloads `all-MiniLM-L6-v2`, creates `~/.kb/kb.db` with the schema, and records `model_name=all-MiniLM-L6-v2` and `embedding_dim=384` in the DB config table
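
The bookkeeping half of this scenario can be sketched as follows. This is a minimal illustration, not the project's actual schema: the two-column `config(key, value)` layout and the helper names `record_model_config` and `read_model_config` are assumptions, and the model download itself is left out.

```python
import sqlite3
from typing import Tuple


def record_model_config(conn: sqlite3.Connection, model_name: str, embedding_dim: int) -> None:
    """Create the config table if needed and store the active model's identity."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )


def read_model_config(conn: sqlite3.Connection) -> Tuple[str, int]:
    """Return (model_name, embedding_dim) as recorded at init time."""
    rows = dict(conn.execute("SELECT key, value FROM config"))
    return rows["model_name"], int(rows["embedding_dim"])
```

Storing the dimension next to the name lets later commands validate vectors without loading the model first.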

#### Scenario: Init with custom model
- **WHEN** user runs `kb init --model nomic-embed-text`
- **THEN** the system downloads `nomic-embed-text`, creates the database, and records the model name and its dimension in the DB config table

#### Scenario: Init status check
- **WHEN** user runs `kb init --status`
- **THEN** the system reports: whether `~/.kb/` exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status

#### Scenario: ONNX export on first load
- **WHEN** the embedding model is loaded for the first time after download
- **THEN** the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.
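
The export-once behaviour can be sketched as a small guard around the export step. Everything named here is hypothetical: the `ensure_onnx_export` helper, the `.export-complete` marker file, and the `export` callable that performs the real conversion are illustrative assumptions, not part of any library.

```python
from pathlib import Path
from typing import Callable


def ensure_onnx_export(export_dir: Path, export: Callable[[Path], None]) -> bool:
    """Run `export` only if no completed export exists; return True if it ran."""
    marker = export_dir / ".export-complete"  # hypothetical marker file
    if marker.exists():
        return False  # a cached export is already present
    print("Optimising model for ONNX inference (one-time)...")
    export_dir.mkdir(parents=True, exist_ok=True)
    export(export_dir)
    # The marker is written last, so an interrupted export is retried next time.
    marker.touch()
    return True
```

In practice `export` might load the model with `SentenceTransformer(name, backend="onnx")` and save it into `export_dir`; that wiring is omitted here.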

### Requirement: Model-database binding
The system SHALL store the active model name and embedding dimension in the database `config` table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error.

#### Scenario: Model mismatch on add
- **WHEN** user runs `kb add doc.pdf` but the config YAML specifies a different model than what the DB was initialised with
- **THEN** the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run `kb reindex --model nomic-embed-text` to switch models." and exit with non-zero status

#### Scenario: Model match on add
- **WHEN** user runs `kb add doc.pdf` and the config model matches the DB model
- **THEN** ingestion proceeds normally
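
The binding check in the two scenarios above might look like the sketch below. It assumes a key/value `config` table in SQLite and compares model names only. `ModelMismatchError` and `verify_model_binding` are hypothetical names, and a fuller check would also compare the loaded model's dimension against `embedding_dim`.

```python
import sqlite3


class ModelMismatchError(Exception):
    """Raised when the configured model differs from the one bound to the DB."""


def verify_model_binding(conn: sqlite3.Connection, configured_model: str) -> None:
    """Raise ModelMismatchError unless the configured model matches the DB record."""
    rows = dict(conn.execute("SELECT key, value FROM config"))
    db_model, db_dim = rows["model_name"], rows["embedding_dim"]
    if configured_model != db_model:
        raise ModelMismatchError(
            f"Model mismatch: DB uses '{db_model}' ({db_dim} dim) but config "
            f"specifies '{configured_model}'. Run `kb reindex --model "
            f"{configured_model}` to switch models."
        )
```

A CLI entry point would catch this exception, print its message, and exit with a non-zero status before any ingestion work starts.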

### Requirement: Full reindex with model switching
The system SHALL support re-embedding all chunks via `kb reindex`. If `--model` is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic: if it is interrupted, the old embeddings SHALL remain intact.

#### Scenario: Reindex with same model
- **WHEN** user runs `kb reindex`
- **THEN** all chunks are re-embedded with the current model and vectors are replaced. This is useful if the model's ONNX export was corrupted or chunks were modified.

#### Scenario: Reindex with new model
- **WHEN** user runs `kb reindex --model bge-small-en-v1.5`
- **THEN** the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in `chunks_vec` (recreating the table if dimension changed), and updates `model_name` and `embedding_dim` in the DB config table

#### Scenario: Interrupted reindex
- **WHEN** a reindex is interrupted partway through
- **THEN** the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun `kb reindex` to retry.
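
One way to get the all-or-nothing behaviour is to compute every embedding first and only then swap the stored vectors inside a single transaction. The sketch below makes several assumptions: the `chunks(id, text)` layout, a plain `chunks_vec(chunk_id, embedding)` table holding JSON text instead of a real vector table, a key/value `config` table, and the injected `embed` callable. Holding all vectors in memory is a simplification; a large corpus would more likely use a staging table.

```python
import json
import sqlite3
from typing import Callable, List


def reindex(
    conn: sqlite3.Connection,
    embed: Callable[[str], List[float]],
    model_name: str,
    embedding_dim: int,
) -> int:
    """Re-embed every chunk, then replace vectors and config atomically."""
    chunks = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()

    # Phase 1: embed everything before touching stored vectors. An interruption
    # here leaves the database exactly as it was.
    new_vectors = []
    for chunk_id, text in chunks:
        vector = embed(text)
        if len(vector) != embedding_dim:
            raise ValueError(f"expected {embedding_dim} dims, got {len(vector)}")
        new_vectors.append((chunk_id, json.dumps(vector)))

    # Phase 2: swap vectors and update config in one transaction.
    with conn:
        conn.execute("DELETE FROM chunks_vec")
        conn.executemany(
            "INSERT INTO chunks_vec (chunk_id, embedding) VALUES (?, ?)", new_vectors
        )
        conn.executemany(
            "UPDATE config SET value = ? WHERE key = ?",
            [(model_name, "model_name"), (str(embedding_dim), "embedding_dim")],
        )
    return len(new_vectors)
```

If `embed` raises partway through, the function exits before the transaction opens, so rerunning it later starts from the same intact state.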

### Requirement: Embedding model inference via ONNX
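The system SHALL use `sentence-transformers` with the ONNX backend for all embedding inference, so that inference does not run through PyTorch. The ONNX Runtime (`onnxruntime`) SHALL be the inference engine. Note that the `sentence-transformers` package itself still lists `torch` as an install-time dependency.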

#### Scenario: Embed a chunk
- **WHEN** a chunk of text needs to be embedded during ingestion
- **THEN** the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model
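
A minimal sketch of chunk embedding, assuming a `sentence-transformers` release that accepts `backend="onnx"` and has its ONNX extras installed. The helper names `load_onnx_encoder` and `embed_chunk` are hypothetical, and the encoder is passed in so the dimension check can be exercised without downloading a model.

```python
from typing import List


def load_onnx_encoder(model_name: str):
    """Load a model with the ONNX backend (requires sentence-transformers)."""
    # Imported lazily so the rest of this module works without the library.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, backend="onnx")


def embed_chunk(encoder, text: str, expected_dim: int) -> List[float]:
    """Embed one chunk and check the vector against the DB's recorded dimension."""
    vector = [float(x) for x in encoder.encode([text])[0]]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(vector)}")
    return vector
```

Checking the length at this point surfaces a model/database mismatch before a wrongly sized vector reaches the vector table.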

#### Scenario: Embed a query
- **WHEN** a search query needs to be embedded
- **THEN** the system applies the configured `query_prefix` (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
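
Query-side prefixing can be sketched as below. The `embed_query` helper is hypothetical, and it assumes the prefix is prepended verbatim, so any separator (for example a trailing space) must be part of the configured `query_prefix`. The `encoder` is any object exposing an `encode(list_of_texts)` method, such as a loaded sentence-transformers model.

```python
from typing import List


def embed_query(encoder, query: str, query_prefix: str = "") -> List[float]:
    """Prepend the configured prefix (if any) and embed with the shared encoder."""
    text = f"{query_prefix}{query}" if query_prefix else query
    return [float(x) for x in encoder.encode([text])[0]]
```

Passing the same encoder instance used for chunk embeddings keeps query and chunk vectors in the same space.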