v2 restructure: Go client, Docker engine, release tooling

- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:52:25 +00:00
parent 2030976b85
commit 9aab79d49b
98 changed files with 4526 additions and 7776 deletions
@@ -1,91 +1,91 @@
 ## 1. Project scaffolding

- [ ] 1.1 Create v2 project structure: `engine/` (Python/FastAPI) and `client/` (Go) directories at repo root
- [ ] 1.2 Set up `engine/pyproject.toml` with dependencies: fastapi, uvicorn, sentence-transformers, sqlite-vec, docling, pyyaml
- [ ] 1.3 Set up `client/go.mod` with dependencies: cobra, gopkg.in/yaml.v3
- [ ] 1.4 Create engine entry point (`engine/main.py`) with uvicorn startup, eager model loading, and readiness gating
+- [x] 1.1 Create v2 project structure: `engine/` (Python/FastAPI) and `client/` (Go) directories at repo root
+- [x] 1.2 Set up `engine/pyproject.toml` with dependencies: fastapi, uvicorn, sentence-transformers, sqlite-vec, docling, pyyaml
+- [x] 1.3 Set up `client/go.mod` with dependencies: cobra, gopkg.in/yaml.v3
+- [x] 1.4 Create engine entry point (`engine/main.py`) with uvicorn startup, eager model loading, and readiness gating

 ## 2. Database layer

- [ ] 2.1 Implement database module (`engine/kb/database.py`): connection factory with WAL mode, schema initialisation (documents, chunks, chunks_fts, chunks_vec, tags, document_tags, config tables)
- [ ] 2.2 Add `jobs` table to schema: id, filename, status (queued/processing/done/failed/skipped), doc_type, tags_json, error, document_id, chunk_count, created_at, completed_at, staging_path
- [ ] 2.3 Implement job CRUD functions: create_job, get_job, list_jobs, update_job_status
+- [x] 2.1 Implement database module (`engine/kb/database.py`): connection factory with WAL mode, schema initialisation (documents, chunks, chunks_fts, chunks_vec, tags, document_tags, config tables)
+- [x] 2.2 Add `jobs` table to schema: id, filename, status (queued/processing/done/failed/skipped), doc_type, tags_json, error, document_id, chunk_count, created_at, completed_at, staging_path
+- [x] 2.3 Implement job CRUD functions: create_job, get_job, list_jobs, update_job_status

 ## 3. Embeddings and search

- [ ] 3.1 Implement embeddings module (`engine/kb/embeddings.py`): model loading with device resolution (auto/cpu/cuda), embed_texts, get_model_dim — model loaded once and cached in-process
- [ ] 3.2 Implement search module (`engine/kb/search.py`): FTS5 search, vector search via sqlite-vec, RRF merge, filter support (tags, doc_type, fts_only, vec_only, threshold)
+- [x] 3.1 Implement embeddings module (`engine/kb/embeddings.py`): model loading with device resolution (auto/cpu/cuda), embed_texts, get_model_dim — model loaded once and cached in-process
+- [x] 3.2 Implement search module (`engine/kb/search.py`): FTS5 search, vector search via sqlite-vec, RRF merge, filter support (tags, doc_type, fts_only, vec_only, threshold)

 ## 4. Ingestion pipelines

- [ ] 4.1 Implement file type detection (`engine/kb/ingest/detector.py`): extension-based detection for pdf, markdown, code, note
- [ ] 4.2 Implement Docling pipeline (`engine/kb/ingest/docling.py`): PDF/DOCX conversion with AcceleratorOptions device control, hierarchy and fixed chunking
- [ ] 4.3 Implement Markdown pipeline (`engine/kb/ingest/markdown.py`): header-based splitting with min/max token bounds
- [ ] 4.4 Implement code pipeline (`engine/kb/ingest/code.py`): AST-based chunking for Python, regex for Bash/Go, fallback fixed-size
- [ ] 4.5 Implement note pipeline (`engine/kb/ingest/note.py`): whole-text chunking with auto-title
+- [x] 4.1 Implement file type detection (`engine/kb/ingest/detector.py`): extension-based detection for pdf, markdown, code, note
+- [x] 4.2 Implement Docling pipeline (`engine/kb/ingest/docling.py`): PDF/DOCX conversion with AcceleratorOptions device control, hierarchy and fixed chunking
+- [x] 4.3 Implement Markdown pipeline (`engine/kb/ingest/markdown.py`): header-based splitting with min/max token bounds
+- [x] 4.4 Implement code pipeline (`engine/kb/ingest/code.py`): AST-based chunking for Python, regex for Bash/Go, fallback fixed-size
+- [x] 4.5 Implement note pipeline (`engine/kb/ingest/note.py`): whole-text chunking with auto-title

 ## 5. Async job queue and background worker

- [ ] 5.1 Implement staging manager (`engine/kb/staging.py`): write uploaded file/note to staging directory, generate staging path, cleanup after processing
- [ ] 5.2 Implement background worker (`engine/kb/worker.py`): asyncio background task that polls for queued jobs, processes sequentially (detect type → chunk → embed → insert), updates job status on success/failure/skip (duplicate detection)
- [ ] 5.3 Wire worker into FastAPI lifespan: start worker on app startup, graceful shutdown on app stop
+- [x] 5.1 Implement staging manager (`engine/kb/staging.py`): write uploaded file/note to staging directory, generate staging path, cleanup after processing
+- [x] 5.2 Implement background worker (`engine/kb/worker.py`): asyncio background task that polls for queued jobs, processes sequentially (detect type → chunk → embed → insert), updates job status on success/failure/skip (duplicate detection)
+- [x] 5.3 Wire worker into FastAPI lifespan: start worker on app startup, graceful shutdown on app stop

 ## 6. API routes

- [ ] 6.1 Implement health endpoint: `GET /api/v1/health` — returns 503 during startup, 200 when ready
- [ ] 6.2 Implement search endpoint: `POST /api/v1/search` — accepts query, top, tags, doc_type, fts_only, vec_only, threshold in JSON body
- [ ] 6.3 Implement ingestion endpoint: `POST /api/v1/jobs` — accepts multipart file upload or note text field with optional tags/doc_type/title metadata, writes to staging, creates job, returns 202
- [ ] 6.4 Implement job status endpoints: `GET /api/v1/jobs` (list with status filter), `GET /api/v1/jobs/{id}` (details)
- [ ] 6.5 Implement document endpoints: `GET /api/v1/documents` (list with filters), `GET /api/v1/documents/{id}` (details), `DELETE /api/v1/documents/{id}` (remove)
- [ ] 6.6 Implement tag endpoints: `GET /api/v1/tags` (list), `PUT /api/v1/documents/{id}/tags` (add/remove)
- [ ] 6.7 Implement status endpoint: `GET /api/v1/status` — model info, GPU info, DB stats, queue stats
- [ ] 6.8 Implement reindex endpoint: `POST /api/v1/reindex` — re-embed all chunks with current model
- [ ] 6.9 Implement API key authentication middleware: check `KB_API_KEY` env, validate Bearer token, skip when unset
+- [x] 6.1 Implement health endpoint: `GET /api/v1/health` — returns 503 during startup, 200 when ready
+- [x] 6.2 Implement search endpoint: `POST /api/v1/search` — accepts query, top, tags, doc_type, fts_only, vec_only, threshold in JSON body
+- [x] 6.3 Implement ingestion endpoint: `POST /api/v1/jobs` — accepts multipart file upload or note text field with optional tags/doc_type/title metadata, writes to staging, creates job, returns 202
+- [x] 6.4 Implement job status endpoints: `GET /api/v1/jobs` (list with status filter), `GET /api/v1/jobs/{id}` (details)
+- [x] 6.5 Implement document endpoints: `GET /api/v1/documents` (list with filters), `GET /api/v1/documents/{id}` (details), `DELETE /api/v1/documents/{id}` (remove)
+- [x] 6.6 Implement tag endpoints: `GET /api/v1/tags` (list), `PUT /api/v1/documents/{id}/tags` (add/remove)
+- [x] 6.7 Implement status endpoint: `GET /api/v1/status` — model info, GPU info, DB stats, queue stats
+- [x] 6.8 Implement reindex endpoint: `POST /api/v1/reindex` — re-embed all chunks with current model
+- [x] 6.9 Implement API key authentication middleware: check `KB_API_KEY` env, validate Bearer token, skip when unset
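
Reviewer sketch (not part of the commit): task 6.9 describes an optional Bearer-token check. The snippet below shows one way that decision could look as a plain function, independent of FastAPI. The function name `is_authorized` and the choice to treat an empty `KB_API_KEY` as unset are assumptions made for illustration, not details confirmed by the diff.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: Optional[str]) -> bool:
    """Return True when a request may proceed (hypothetical helper)."""
    # Assumption: an unset or empty key disables authentication entirely.
    if not expected_key:
        return True
    if not authorization_header:
        return False
    # Expect a header of the form "Bearer <token>".
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

A middleware would call this with the `Authorization` header and the configured key, and respond with 401 when it returns `False`.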

 ## 7. Engine configuration

- [ ] 7.1 Implement config module (`engine/kb/config.py`): read all settings from environment variables (KB_DATA_DIR, KB_MODEL, KB_DEVICE, KB_INGEST_DEVICE, KB_API_KEY), apply defaults
+- [x] 7.1 Implement config module (`engine/kb/config.py`): read all settings from environment variables (KB_DATA_DIR, KB_MODEL, KB_DEVICE, KB_INGEST_DEVICE, KB_API_KEY), apply defaults

 ## 8. Docker images

- [ ] 8.1 Create `Dockerfile.nvidia`: CUDA runtime base, system deps (libgl1, libglib2.0, poppler), uv install, onnxruntime-gpu overlay, engine entrypoint
- [ ] 8.2 Create `Dockerfile.rocm`: ROCm/PyTorch base, system deps, uv install, onnxruntime-rocm, engine entrypoint
- [ ] 8.3 Create `compose.nvidia.yaml`: NVIDIA runtime, GPU reservation, bind mount for /data, environment variables, restart policy, port mapping
- [ ] 8.4 Create `compose.rocm.yaml`: ROCm device passthrough (/dev/kfd, /dev/dri), bind mount, environment variables, restart policy, port mapping
- [ ] 8.5 Create `.dockerignore` for engine context
+- [x] 8.1 Create `Dockerfile.nvidia`: CUDA runtime base, system deps (libgl1, libglib2.0, poppler), uv install, onnxruntime-gpu overlay, engine entrypoint
+- [x] 8.2 Create `Dockerfile.rocm`: ROCm/PyTorch base, system deps, uv install, onnxruntime-rocm, engine entrypoint
+- [x] 8.3 Create `compose.nvidia.yaml`: NVIDIA runtime, GPU reservation, bind mount for /data, environment variables, restart policy, port mapping
+- [x] 8.4 Create `compose.rocm.yaml`: ROCm device passthrough (/dev/kfd, /dev/dri), bind mount, environment variables, restart policy, port mapping
+- [x] 8.5 Create `.dockerignore` for engine context

 ## 9. Go client — project setup and config

- [ ] 9.1 Initialise Cobra CLI structure: root command with `--engine`, `--format`, `--api-key` persistent flags
- [ ] 9.2 Implement client config loading: read `~/.kb/client.yaml`, merge with env vars (KB_ENGINE_URL, KB_API_KEY), merge with CLI flags
- [ ] 9.3 Implement HTTP client helper: base URL handling, Bearer token injection, JSON request/response helpers, error formatting for connection failures and HTTP errors
+- [x] 9.1 Initialise Cobra CLI structure: root command with `--engine`, `--format`, `--api-key` persistent flags
+- [x] 9.2 Implement client config loading: read `~/.kb/client.yaml`, merge with env vars (KB_ENGINE_URL, KB_API_KEY), merge with CLI flags
+- [x] 9.3 Implement HTTP client helper: base URL handling, Bearer token injection, JSON request/response helpers, error formatting for connection failures and HTTP errors

 ## 10. Go client — commands

- [ ] 10.1 Implement `kb search <query>` command: POST to /api/v1/search, human and JSON output formatting
- [ ] 10.2 Implement `kb add <path>` command: file discovery (single file, directory with --recursive), multipart upload to /api/v1/jobs, human summary output ("Queued: N files"), JSON output with job IDs
- [ ] 10.3 Implement `kb add --note <text>` command: submit note via multipart to /api/v1/jobs
- [ ] 10.4 Implement `kb jobs` command: list jobs (with --status filter), single job detail via `kb jobs <id>`
- [ ] 10.5 Implement `kb list` command: GET /api/v1/documents with --type and --tags filters
- [ ] 10.6 Implement `kb info <id>` command: GET /api/v1/documents/{id}
- [ ] 10.7 Implement `kb remove <id>` command: confirmation prompt (skip with --yes), DELETE /api/v1/documents/{id}
- [ ] 10.8 Implement `kb tags` command: GET /api/v1/tags
- [ ] 10.9 Implement `kb tag <id>` command: --add and --remove flags, PUT /api/v1/documents/{id}/tags
- [ ] 10.10 Implement `kb status` command: GET /api/v1/status with human formatting
+- [x] 10.1 Implement `kb search <query>` command: POST to /api/v1/search, human and JSON output formatting
+- [x] 10.2 Implement `kb add <path>` command: file discovery (single file, directory with --recursive), multipart upload to /api/v1/jobs, human summary output ("Queued: N files"), JSON output with job IDs
+- [x] 10.3 Implement `kb add --note <text>` command: submit note via multipart to /api/v1/jobs
+- [x] 10.4 Implement `kb jobs` command: list jobs (with --status filter), single job detail via `kb jobs <id>`
+- [x] 10.5 Implement `kb list` command: GET /api/v1/documents with --type and --tags filters
+- [x] 10.6 Implement `kb info <id>` command: GET /api/v1/documents/{id}
+- [x] 10.7 Implement `kb remove <id>` command: confirmation prompt (skip with --yes), DELETE /api/v1/documents/{id}
+- [x] 10.8 Implement `kb tags` command: GET /api/v1/tags
+- [x] 10.9 Implement `kb tag <id>` command: --add and --remove flags, PUT /api/v1/documents/{id}/tags
+- [x] 10.10 Implement `kb status` command: GET /api/v1/status with human formatting

 ## 11. Go client — build and distribution

- [ ] 11.1 Create Makefile or build script: cross-compile for linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64
- [ ] 11.2 Add version injection via `-ldflags` at build time
+- [x] 11.1 Create Makefile or build script: cross-compile for linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64
+- [x] 11.2 Add version injection via `-ldflags` at build time

 ## 12. Integration testing

- [ ] 12.1 Test engine startup: health endpoint transitions from 503 → 200 after model load
- [ ] 12.2 Test full ingestion flow: upload PDF via API → job queued → job completes → document appears in list → chunks searchable
- [ ] 12.3 Test note ingestion: submit note via API → job completes → note searchable
- [ ] 12.4 Test search: hybrid search returns ranked results, filters work, fts_only/vec_only modes work
- [ ] 12.5 Test document management: list, info, remove, tag operations via API
- [ ] 12.6 Test job queue: multiple uploads queue correctly, failures don't block queue, duplicates are skipped
- [ ] 12.7 Test API authentication: requests rejected without key when KB_API_KEY set, accepted with valid key, all requests pass when unset
- [ ] 12.8 Test Docker GPU: `kb doctor`-style verification that GPU is accessible inside container (NVIDIA build)
- [ ] 12.9 Test data portability: copy data directory, start engine on new container, verify all documents and search work
+- [x] 12.1 Test engine startup: health endpoint transitions from 503 → 200 after model load
+- [x] 12.2 Test full ingestion flow: upload PDF via API → job queued → job completes → document appears in list → chunks searchable
+- [x] 12.3 Test note ingestion: submit note via API → job completes → note searchable
+- [x] 12.4 Test search: hybrid search returns ranked results, filters work, fts_only/vec_only modes work
+- [x] 12.5 Test document management: list, info, remove, tag operations via API
+- [x] 12.6 Test job queue: multiple uploads queue correctly, failures don't block queue, duplicates are skipped
+- [x] 12.7 Test API authentication: requests rejected without key when KB_API_KEY set, accepted with valid key, all requests pass when unset
+- [x] 12.8 Test Docker GPU: `kb doctor`-style verification that GPU is accessible inside container (NVIDIA build)
+- [x] 12.9 Test data portability: copy data directory, start engine on new container, verify all documents and search work
@@ -1,2 +0,0 @@
-schema: spec-driven
-created: 2026-03-22
@@ -1,396 +0,0 @@
-## Context
-
-This is a greenfield Python CLI project. No existing codebase, no migration concerns. The tool will live at `~/.kb/` on the user's machine and be installed via `pipx install kb-search`. It must work entirely offline after initial model download.
-
-Primary consumer is Claude Code (or similar LLM tools) via a skill wrapper that calls `kb search` and feeds JSON results to the LLM for synthesis. Secondary consumer is the user directly in a terminal. This dual-consumer constraint means output must be machine-parseable first, human-readable second.
-
-The document corpus is ~3,000 items (2,000 PDFs of varying complexity, 500 markdown/text notes, 500 code snippets) producing ~22,000 chunks. This is small enough that brute-force vector search is viable and SQLite is more than sufficient.
-
-## Goals / Non-Goals
-
-**Goals:**
- Single-command install (`pipx install kb-search`) with `kb init` for model setup
- Ingest heterogeneous documents with format-appropriate chunking
- Hybrid search (keyword + semantic) with a single command
- JSON output contract stable enough for skill integration
- Configurable but works with zero configuration
- All state in one SQLite file for easy backup/portability
-
-**Non-Goals:**
- LLM-based answer synthesis (the calling skill handles this)
- Multi-user or networked access
- Real-time / streaming ingestion
- Web UI or TUI dashboard
- Support for every possible document format (start with PDF, markdown, code, notes)
- Clustering, deduplication, or automatic organisation of documents
-
-## Decisions
-
-### 1. Package Structure
-
-```
-kb-search/
-├── pyproject.toml
-├── src/
-│   └── kb_search/
-│       ├── __init__.py
-│       ├── cli.py              # Click CLI entry point
-│       ├── config.py           # YAML config loading + ENV overrides
-│       ├── database.py         # SQLite schema, migrations, connection
-│       ├── embeddings.py       # Model download, loading, inference
-│       ├── search.py           # Hybrid search + RRF merging
-│       ├── ingest/
-│       │   ├── __init__.py
-│       │   ├── detector.py     # File type detection + routing
-│       │   ├── docling.py      # Docling pipeline (PDF, DOCX, HTML, images)
-│       │   ├── markdown.py     # Header-based markdown splitting
-│       │   ├── code.py         # AST/regex code splitting
-│       │   └── note.py         # Whole-document note handler
-│       └── output.py           # JSON + human-readable formatters
-├── tests/
-└── SKILL.md                    # Claude Code skill definition
-```
-
-**Why this structure:** Flat enough to navigate easily, but the `ingest/` subpackage isolates format-specific logic. Each ingestion module exports the same interface (`ingest(path, config) -> list[Chunk]`), making it easy to add formats later. Using `src/` layout per Python packaging best practices.
-
-### 2. SQLite as Sole Storage Backend
-
-All data lives in `~/.kb/kb.db`:
-
-```sql
-- Documents
-CREATE TABLE documents (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    title TEXT NOT NULL,
-    source_path TEXT,
-    content_hash TEXT NOT NULL,          -- SHA-256 for dedup/change detection
-    doc_type TEXT NOT NULL CHECK(doc_type IN ('pdf','markdown','code','note')),
-    language TEXT,                        -- for code: 'python','bash','go'
-    created_at TEXT DEFAULT (datetime('now')),
-    metadata TEXT DEFAULT '{}'           -- JSON: page_count, author, etc.
-);
-
-- Chunks
-CREATE TABLE chunks (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
-    chunk_index INTEGER NOT NULL,
-    text TEXT NOT NULL,
-    token_count INTEGER,
-    metadata TEXT DEFAULT '{}',          -- JSON: page, section_header, symbol_name
-    created_at TEXT DEFAULT (datetime('now'))
-);
-
-- FTS5 index (content-sync with chunks table)
-CREATE VIRTUAL TABLE chunks_fts USING fts5(
-    text,
-    content='chunks',
-    content_rowid='id',
-    tokenize='porter unicode61'
-);
-
-- Triggers to keep FTS in sync
-CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
-    INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
-END;
-CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
-    INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES('delete', old.id, old.text);
-END;
-CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
-    INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES('delete', old.id, old.text);
-    INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
-END;
-
-- Vector storage (sqlite-vec)
-CREATE VIRTUAL TABLE chunks_vec USING vec0(
-    chunk_id INTEGER PRIMARY KEY,
-    embedding FLOAT[384]                 -- dimension matches model
-);
-
-- Tags
-CREATE TABLE tags (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    name TEXT UNIQUE NOT NULL
-);
-
-CREATE TABLE document_tags (
-    document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
-    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
-    PRIMARY KEY (document_id, tag_id)
-);
-
-- Config stored in DB (model binding)
-CREATE TABLE config (
-    key TEXT PRIMARY KEY,
-    value TEXT NOT NULL
-);
-- Keys: schema_version, model_name, embedding_dim, model_max_tokens
-```
-
-**Why SQLite for everything:** At ~22,000 chunks, SQLite handles FTS, vector search, and relational data without breaking a sweat. One file = trivial backup (`cp kb.db kb.db.bak`), no server process, no port conflicts. FTS5 is built into SQLite. sqlite-vec is a single loadable extension.
-
-**Why store config in DB _and_ YAML:** The YAML file holds user preferences (chunking params, model choice). The DB `config` table records what the DB was _actually built with_ (model name, dimension). This separation lets us detect mismatches: "config says use nomic-embed-text but DB was built with all-MiniLM-L6-v2."
-
-**Alternatives considered:**
- ChromaDB/Qdrant: External services, overkill for this scale, breaks single-file story
- DuckDB: Good at analytics, but FTS support is weaker than SQLite FTS5
- LanceDB: Interesting but less mature, no FTS built in
-
-### 3. Docling for Complex Document Ingestion
-
-Docling handles PDF, DOCX, HTML, and image files through a unified pipeline with ML-based layout detection and table reconstruction.
-
-**Why Docling over simpler extractors:** The 2,000 PDFs are "many and varied" — simple text extraction (pymupdf, pdfplumber) works for clean PDFs but silently produces garbage for complex layouts, tables, or multi-column documents. Docling's layout model correctly identifies structural elements, and its table reconstruction preserves data that would otherwise be lost. The quality difference matters because bad chunks → bad search results → useless tool.
-
-**Docling configuration for this project:**
- Use `pypdfium2` backend (default, fast for text-based PDFs)
- Enable OCR only when needed (detect pages with no extractable text)
- Use hierarchy-aware chunking (respects section/paragraph boundaries)
- Disable image extraction (we're indexing text, not images)
- Run with multiple workers for batch ingestion
-
-**Model download:** Docling models (~1.5 GB) download on first use or via `kb init`. Stored in `~/.kb/models/docling/` or HuggingFace's default cache.
-
-**Alternatives considered:**
- pymupdf4llm: Fast, lightweight, but poor table/layout handling
- Unstructured: Heavier than Docling, commercial focus, less predictable output
- LlamaParse: Cloud-only, violates local-first constraint
-
-### 4. Per-Type Chunking Strategy
-
-Each document type gets a purpose-built chunker with configurable parameters:
-
-**PDF (Docling):** Hierarchy-aware chunking. Docling's `HierarchicalChunker` splits at section/paragraph boundaries respecting the document's logical structure. Falls back to fixed-size if hierarchy detection fails.
-
-**Markdown:** Header-based splitting. Split at `##` and `###` boundaries. Preserve parent header chain as context (so a chunk under "## Config > ### Advanced" carries that path). Merge small sections (< `min_tokens`) with their neighbor. Configurable: `min_tokens`, `max_tokens`.
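
Reviewer sketch (not part of the removed file): the header-based splitting described above could look roughly like this. It only covers splitting at `##`/`###` and carrying the parent header chain. Merging sections below `min_tokens`, enforcing `max_tokens`, and skipping `#` lines inside fenced code are deliberately left out. The name `split_markdown` and the chunk dictionary shape are assumptions.

```python
import re

# Matches level-2 and level-3 headers only, as in the design text.
HEADER = re.compile(r"^(#{2,3})\s+(.*)$")


def split_markdown(text):
    """Split markdown at ## / ### boundaries, keeping the header chain."""
    chunks = []
    path = []  # current header chain, e.g. ["Config", "Advanced"]
    body = []  # lines collected for the section being built

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({"headers": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1)) - 2  # 0 for ##, 1 for ###
            del path[level:]                 # drop headers at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries its header path as a string such as `Config > Advanced`, which is the context the design text asks to preserve.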
-
-**Code (Python):** Use stdlib `ast` module. Each function and class becomes a chunk. Class methods include the class docstring for context. Top-level code between definitions becomes its own chunk.
-
-**Code (Bash):** Regex-based. Split on `function name() {` and `name() {` patterns with brace-depth counting. Comment blocks preceding a function attach to that function's chunk. Fall back to fixed-size windowed chunks if no functions detected.
-
-**Code (Go):** Regex-based. Split on `func ` declarations. Type definitions with methods are grouped. Fall back to fixed-size if no recognisable boundaries.
-
-**Notes:** Whole document = one chunk. Notes are small by definition.
-
-**Configurable defaults (in `~/.kb/config.yaml`):**
-```yaml
-chunking:
-  defaults:
-    max_tokens: 512
-    overlap_tokens: 50
-  pdf:
-    strategy: hierarchy    # hierarchy | fixed
-    max_tokens: 1024       # for fixed strategy fallback
-  markdown:
-    strategy: header       # header | fixed
-    min_tokens: 50         # merge sections smaller than this
-    max_tokens: 1024
-  code:
-    strategy: ast          # ast | fixed
-    include_context: true  # include class/module docstring with methods
-    max_tokens: 1024
-  note:
-    strategy: whole
-```
-
-### 5. Embedding Model Management
-
-**Default model:** `all-MiniLM-L6-v2` (384 dimensions, 90 MB, good quality/speed tradeoff for CPU).
-
-**Model loading:** Use `sentence-transformers` library which provides a unified API across models. Models stored in HuggingFace's default cache (`~/.cache/huggingface/`), shared with other tools that use HF models. No custom cache directory override.
-
-**Model binding:** On `kb init`, the chosen model's name and dimension are written to the DB `config` table. Every subsequent `kb add` checks the loaded model matches the DB. Mismatch = hard error with clear message.
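
Reviewer sketch (not part of the removed file): the model-binding check above can be illustrated with the stdlib `sqlite3` module. The helper names `bind_model` and `check_model` and the exception type are hypothetical. Only the `config` table layout and the `model_name` / `embedding_dim` keys come from the design text.

```python
import sqlite3


class ModelMismatchError(RuntimeError):
    """Raised when the loaded model differs from the one the DB was built with."""


def bind_model(conn, model_name, embedding_dim):
    # Record which model the database is built with (done once, at init).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            [("model_name", model_name), ("embedding_dim", str(embedding_dim))],
        )


def check_model(conn, model_name, embedding_dim):
    # Compare the model about to be used against the stored binding.
    stored = dict(
        conn.execute(
            "SELECT key, value FROM config WHERE key IN ('model_name', 'embedding_dim')"
        )
    )
    bound = (stored.get("model_name"), stored.get("embedding_dim"))
    if bound != (model_name, str(embedding_dim)):
        raise ModelMismatchError(
            f"DB was built with {bound[0]} ({bound[1]} dims), "
            f"but {model_name} ({embedding_dim} dims) is loaded"
        )
```

Failing hard here is what prevents vectors from two different models being mixed in `chunks_vec`.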
-
-**Model switching (`kb reindex`):**
-1. Download new model
-2. Read all chunks from DB
-3. Re-embed in batches (with progress bar)
-4. Replace all vectors in `chunks_vec`
-5. Update DB config (model_name, embedding_dim)
-6. Recreate `chunks_vec` table if dimension changed
-
-**ONNX Runtime for inference:** Use `sentence-transformers` with ONNX backend (`model = SentenceTransformer(model_name, backend="onnx")`). This gives us sentence-transformers' correct tokenization/pooling/normalization while using ONNX Runtime (~30 MB) instead of PyTorch (~200 MB) for inference. Models are automatically exported to ONNX format on first load. This keeps the install lightweight without sacrificing the convenience of the sentence-transformers API.
-
-**Model compatibility:** All models on HuggingFace that work with `sentence-transformers` are supported. The only per-model differences handled in code:
- Dimension (read from model config)
- Max sequence length (read from model config, used to cap chunk size)
- Query/passage prefixes (configurable in YAML, empty by default)
-
-```yaml
-embedding:
-  model: all-MiniLM-L6-v2
-  query_prefix: ""          # some models need "search_query: "
-  passage_prefix: ""        # some models need "search_document: "
-```
-
-### 6. Hybrid Search with Reciprocal Rank Fusion
-
-**Search flow:**
-
-```
-Query: "how to install git"
-         │
-         ├──▶ FTS5 query ──▶ BM25-ranked results (chunk_id, fts_score)
-         │
-         └──▶ Embed query ──▶ vec similarity search ──▶ (chunk_id, vec_score)
-                              (cosine distance, top-K)
-         │
-         ▼
-    Reciprocal Rank Fusion (RRF)
-    score(d) = Σ 1/(k + rank_in_list)  where k=60 (standard)
-         │
-         ▼
-    Merged results, sorted by RRF score
-         │
-         ▼
-    Apply filters (tags, doc_type) ──▶ Top-N results
-```
-
-**Why RRF over learned re-ranking:** RRF is simple, parameter-free (k=60 is standard), and performs surprisingly well. A learned re-ranker (e.g., cross-encoder) would add another model download, slow down queries, and the marginal quality improvement isn't worth it at this scale. RRF can be swapped out later if needed.
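
Reviewer sketch (not part of the removed file): a minimal Reciprocal Rank Fusion merge matching the formula in the flow diagram, assuming ranks are 1-based and each input list is already sorted best-first. The function name `rrf_merge` and the tie-break on chunk id are illustrative choices.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, top_n=10):
    """Fuse several ranked lists of chunk ids into one list of (chunk_id, score)."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # rank is 1-based (assumption): the best hit contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    merged = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top_n]


fts_hits = [1, 2, 3]  # chunk ids from the FTS5 query, best first
vec_hits = [3, 1, 4]  # chunk ids from the vector query, best first
order = [chunk_id for chunk_id, _ in rrf_merge([fts_hits, vec_hits])]
```

Chunks that appear in both lists accumulate two terms, so they rise above chunks found by only one retriever.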
-
-**FTS5 query construction:** Pass the raw query string to FTS5. FTS5's porter stemmer handles basic normalisation. For queries with special characters, escape them. No query expansion or synonym handling — keep it simple.
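
Reviewer sketch (not part of the removed file): one common way to neutralise special characters in an FTS5 `MATCH` expression is to wrap each term in double quotes and double any embedded quote. This is an assumed approach, not necessarily what the v1 code did, and `fts5_escape` is a hypothetical name.

```python
def fts5_escape(query: str) -> str:
    """Quote every whitespace-separated term so FTS5 treats it as a literal string."""
    terms = query.split()
    # Inside an FTS5 string, a double quote is escaped by doubling it.
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)
```

The quoted terms are still combined with FTS5's implicit AND, and the table's tokenizer still applies to the text inside each quoted string.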
-
-**Vector search:** Embed the query with the same model used for chunks. Retrieve top-K (K = 3× requested results, to give RRF enough candidates). sqlite-vec does brute-force cosine similarity over all vectors — at 22K vectors this is ~2-5ms.
-
-**Filter application:** Tag and type filters are applied as SQL WHERE clauses _before_ search where possible (for FTS5 via JOIN), or as post-filters on the merged results. This is a design choice per filter type:
- Type filter: Applied in the SQL query (efficient)
- Tag filter: Applied in the SQL query via JOIN (efficient)
- Score threshold: Applied post-RRF as a cutoff
-
-### 7. Output Format (Skill Contract)
-
-**JSON output (`--format json`, default):**
-
-```json
-{
-  "query": "how to install git",
-  "results": [
-    {
-      "chunk_id": 1423,
-      "score": 0.87,
-      "score_breakdown": {"fts": 0.72, "vector": 0.94},
-      "text": "To install the latest version of git from source...",
-      "source": {
-        "document_id": 42,
-        "title": "Git Admin Guide",
-        "path": "/home/user/docs/git-admin.pdf",
-        "type": "pdf",
-        "page": 12,
-        "chunk_index": 3,
-        "total_chunks": 28,
-        "tags": ["git", "admin"]
-      }
-    }
-  ],
-  "total_matches": 47,
-  "returned": 10
-}
-```
-
-**Human output (`--format human`):**
-
-```
-Search: "how to install git" (47 matches, showing top 10)
-
- 1. [0.87] Git Admin Guide (p.12)                    [pdf] [git, admin]
-    To install the latest version of git from source...
-
- 2. [0.65] setup-notes.md §Installation               [markdown] [git]
-    First, add the PPA repository for the latest git...
-```
-
-**Stability commitment:** The JSON schema is the contract with the skill. Fields may be _added_ but not removed or renamed once the skill is built.
-
-### 8. Configuration Architecture
-
-```
-Precedence (highest to lowest):
-  1. CLI flags (--top, --tags, --format)
-  2. Environment variables (KB_MODEL, KB_DATA_DIR, KB_DEFAULT_TOP)
-  3. ~/.kb/config.yaml
-  4. Built-in defaults
-
-ENV variable naming: KB_ prefix + UPPER_SNAKE_CASE
-  KB_DATA_DIR     → ~/.kb/
-  KB_MODEL        → all-MiniLM-L6-v2
-  KB_DEFAULT_TOP  → 10
-```
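
Reviewer sketch (not part of the removed file): the precedence order above can be expressed as successive dictionary updates, lowest priority first. The flat key names are a simplification of the nested YAML layout, and `resolve_config` is a hypothetical helper. The three environment variable names are the ones listed in the block above.

```python
import os

DEFAULTS = {"data_dir": "~/.kb", "model": "all-MiniLM-L6-v2", "default_top": 10}
ENV_VARS = {"data_dir": "KB_DATA_DIR", "model": "KB_MODEL", "default_top": "KB_DEFAULT_TOP"}


def resolve_config(cli_flags, file_config, environ=None):
    """Merge settings: built-in defaults < config file < environment < CLI flags."""
    environ = os.environ if environ is None else environ
    resolved = dict(DEFAULTS)
    # Values from the config file override the defaults.
    resolved.update({k: v for k, v in file_config.items() if v is not None})
    # Environment variables override the file; integers are parsed from strings.
    for key, var in ENV_VARS.items():
        if var in environ:
            raw = environ[var]
            resolved[key] = int(raw) if isinstance(DEFAULTS[key], int) else raw
    # CLI flags win; None means the flag was not passed.
    resolved.update({k: v for k, v in cli_flags.items() if v is not None})
    return resolved
```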
-
-**Full default config.yaml:**
-
-```yaml
-# ~/.kb/config.yaml
-
-data_dir: ~/.kb
-
-embedding:
-  model: all-MiniLM-L6-v2
-  query_prefix: ""
-  passage_prefix: ""
-
-search:
-  default_top: 10
-  default_format: json
-  rrf_k: 60
-
-chunking:
-  defaults:
-    max_tokens: 512
-    overlap_tokens: 50
-  pdf:
-    strategy: hierarchy
-    max_tokens: 1024
-  markdown:
-    strategy: header
-    min_tokens: 50
-    max_tokens: 1024
-  code:
-    strategy: ast
-    include_context: true
-    max_tokens: 1024
-  note:
-    strategy: whole
-
-ingestion:
-  workers: 4                 # parallel Docling workers
-  batch_size: 50             # commit to DB every N documents
-  enable_ocr: auto           # auto | always | never
-```
-
-### 9. CLI Framework: Click
-
-**Why Click:** Mature, well-documented, supports nested command groups, automatic `--help` generation, parameter validation, and progress bars (via `click.progressbar`). The alternative (Typer) adds type-hint magic but less control. argparse is too verbose for this many commands.
-
-### 10. Error Handling and Resumability
-
-**Batch ingestion must be resumable.** When adding a directory of 2,000 PDFs:
- Each document is processed independently
- On success: document + chunks inserted in a single transaction
- On failure: error logged, document skipped, processing continues
- `content_hash` (SHA-256 of file contents) enables skip-if-already-indexed
- Progress shown via `click.progressbar` or `rich.progress`
- Summary at end: "Added 1,847 documents. 12 failed. 141 skipped (already indexed)."
-
-Failed documents are logged to `~/.kb/ingest-errors.log` with the file path and error for later investigation.
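
Reviewer sketch (not part of the removed file): the resumable batch loop described above, reduced to its skeleton with stdlib `sqlite3` and `hashlib`. The table here is a cut-down version of the `documents` schema, failures are only counted rather than written to a log file, and the helper names are assumptions.

```python
import hashlib
import sqlite3


def open_db():
    # In-memory database with a simplified documents table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " title TEXT NOT NULL,"
        " source_path TEXT,"
        " content_hash TEXT NOT NULL)"
    )
    return conn


def ingest_batch(conn, files):
    """files: iterable of (path, bytes). Returns added/failed/skipped counts."""
    summary = {"added": 0, "failed": 0, "skipped": 0}
    for path, data in files:
        digest = hashlib.sha256(data).hexdigest()
        seen = conn.execute(
            "SELECT 1 FROM documents WHERE content_hash = ?", (digest,)
        ).fetchone()
        if seen:
            summary["skipped"] += 1  # already indexed, so a rerun is cheap
            continue
        try:
            with conn:  # one transaction per document
                conn.execute(
                    "INSERT INTO documents (title, source_path, content_hash)"
                    " VALUES (?, ?, ?)",
                    (path, path, digest),
                )
            summary["added"] += 1
        except sqlite3.Error:
            summary["failed"] += 1  # record the failure and keep going
    return summary
```

Running the same batch twice reports every file as skipped the second time, which is what makes an interrupted run safe to restart.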
-
-## Risks / Trade-offs
-
-**[Docling model size] → Mitigation:** ~1.5 GB download on first init. Clear progress indication during download. Models cached permanently in `~/.kb/models/`. Document this in `kb init` output and SKILL.md.
-
-**[Docling ingestion speed on CPU] → Mitigation:** ~17 hours for 2,000 PDFs on CPU. Support parallel workers (configurable). Show per-document progress. Resumable by design (skip already-indexed). Suggest GPU if available. This is a one-time cost.
-
-**[ONNX model export on first load] → Mitigation:** First time a model is loaded, sentence-transformers exports it to ONNX format. This takes 10-30 seconds and is cached for subsequent runs. Users see a one-time delay on first `kb add` or `kb search` after init. Show a clear message: "Optimising model for ONNX inference (one-time)..."
-
-**[sqlite-vec maturity] → Mitigation:** sqlite-vec is relatively new. At 22K vectors, brute-force search means we're not relying on its ANN indexing. If sqlite-vec has issues, swapping to numpy cosine similarity over a stored blob column is straightforward — same DB, different query path.
-
-**[FTS5 trigger sync] → Mitigation:** FTS5 content-sync triggers add write overhead. At our scale (inserts during ingestion, not real-time) this is negligible. If it becomes an issue, switch to manual sync with `INSERT INTO chunks_fts(chunks_fts) VALUES('rebuild')` after batch operations.
-
-**[Model lock-in] → Mitigation:** Changing embedding models requires full reindex (~22K embeddings, ~10-30 minutes on CPU). `kb reindex` with progress bar makes this manageable. Model name stored in DB prevents silent mixing.
-
-## Resolved Questions
-
-1. **ONNX for inference from day one.** Use sentence-transformers with ONNX backend. Smaller install (~30 MB vs ~200 MB for PyTorch), faster CPU inference. No PyTorch dependency.
-
-2. **HuggingFace default cache for models.** Both embedding and Docling models use `~/.cache/huggingface/`. Shared with other HF tools — no duplicate downloads if the user already has models cached.
-
-3. **Manual schema migrations.** Version number in `config` table. `database.py` checks version on open and runs ALTER TABLE scripts sequentially. Simple enough for this project's schema complexity.
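A minimal sketch of the version-check-then-migrate flow described in item 3, using only the standard-library `sqlite3` module. The layout of the `config` table (`key TEXT PRIMARY KEY, value TEXT`), the `schema_version` key, and the two example `ALTER TABLE` statements are assumptions made for illustration; they are not taken from the original `database.py`.

```python
import sqlite3

# Hypothetical migration scripts, keyed by the schema version they produce.
MIGRATIONS = {
    2: ["ALTER TABLE documents ADD COLUMN language TEXT"],
    3: ["ALTER TABLE chunks ADD COLUMN page INTEGER"],
}


def current_version(conn):
    """Read the schema version from the config table (assumed key/value layout)."""
    row = conn.execute(
        "SELECT value FROM config WHERE key = 'schema_version'"
    ).fetchone()
    return int(row[0]) if row else 1


def migrate(conn):
    """Apply every migration newer than the stored version, in ascending order."""
    version = current_version(conn)
    for target in sorted(v for v in MIGRATIONS if v > version):
        with conn:  # commits on success, rolls back pending changes on error
            for statement in MIGRATIONS[target]:
                conn.execute(statement)
            conn.execute(
                "INSERT OR REPLACE INTO config (key, value) "
                "VALUES ('schema_version', ?)",
                (str(target),),
            )
    return current_version(conn)
```

Recording the version after each step means an interrupted upgrade resumes from the last completed step on the next open.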
@@ -1,39 +0,0 @@
-## Why
-
-There is no simple, local-first CLI tool for building a personal knowledge base across heterogeneous document types (PDFs, markdown, code snippets, text notes) with hybrid search that combines keyword matching and semantic understanding. Existing tools either require cloud services, lack semantic search, or can't handle the variety of document formats. This tool fills the gap — a retrieval engine that can be used standalone from the terminal or wrapped as an AI skill (e.g. Claude Code) where the LLM layer provides natural language synthesis over retrieved results.
-
-## What Changes
-
- New Python CLI tool (`kb`) distributed via pipx (PyPI package: `kb-search`)
- Ingestion pipeline with per-format handling:
-  - **PDFs/DOCX/HTML/images**: Docling (layout-aware, table reconstruction, optional OCR)
-  - **Markdown/text**: Header-based semantic splitting
-  - **Code (Python, Bash, Go)**: AST/regex-based splitting at function/class boundaries
-  - **Notes**: Inline text stored as whole-document chunks
- Hybrid search combining SQLite FTS5 (BM25 keyword scoring) and sqlite-vec (vector similarity), merged via Reciprocal Rank Fusion
- Local embedding models downloaded from HuggingFace on first run (`kb init`), with multi-model support and full reindex capability when switching models
- Document tagging system for manual categorisation and filtered search
- Structured JSON output designed for LLM skill consumption, plus human-readable terminal output
- Configurable chunking parameters per document type with sensible defaults
- All state in a single SQLite database (`~/.kb/kb.db`)
- Configuration via YAML (`~/.kb/config.yaml`) with ENV variable overrides
-
-## Capabilities
-
-### New Capabilities
- `document-ingestion`: Ingest PDFs, markdown, code, and text notes into chunked, embedded, searchable storage. Handles format detection, per-type chunking strategies, Docling pipeline for complex documents, and resumable batch imports.
- `hybrid-search`: Hybrid retrieval combining FTS5 full-text search and sqlite-vec vector similarity via Reciprocal Rank Fusion. Supports tag/type filtering, configurable result counts, score thresholds, and JSON/human output formats.
- `embedding-management`: Local embedding model lifecycle — download on init, bind model to database, detect mismatches, and full re-embedding via reindex when switching models.
- `document-management`: CRUD operations on the document store — list, inspect, remove documents. Tag management (add/remove tags, filter by tags, list tags with counts).
- `configuration`: YAML-based configuration with per-document-type chunking parameters, model selection, and ENV variable overrides. Sensible defaults that work without any config file.
- `skill-interface`: Structured JSON output contract designed for LLM skill consumption — chunks with scores, source metadata, and provenance for citation.
-
-### Modified Capabilities
-_(none — greenfield project)_
-
-## Impact
-
- **Dependencies**: Docling (~1.5 GB models), sentence-transformers with ONNX Runtime backend, sqlite-vec, Click
- **Storage**: `~/.kb/` directory containing the SQLite database and config file (the database grows with content); downloaded models (~1.6 GB on init) live in the HuggingFace default cache (`~/.cache/huggingface/`)
- **First-run experience**: `kb init` required before use to download models. Batch ingestion of 2,000 PDFs estimated at ~17 hours CPU / ~3 hours GPU (one-time cost, resumable)
- **External integration**: Designed to be wrapped as a Claude Code skill — the skill definition (SKILL.md) is a deliverable alongside the code
@@ -1,72 +0,0 @@
-## ADDED Requirements
-
-### Requirement: YAML configuration file
-The system SHALL read configuration from `~/.kb/config.yaml`. If the file does not exist, the system SHALL use built-in defaults. The configuration file SHALL be optional — the tool MUST work with zero configuration.
-
-#### Scenario: No config file
- **WHEN** `~/.kb/config.yaml` does not exist
- **THEN** the system uses built-in defaults for all settings and operates normally
-
-#### Scenario: Partial config file
- **WHEN** `~/.kb/config.yaml` exists but only specifies `chunking.pdf.max_tokens: 2048`
- **THEN** the system uses built-in defaults for all other settings, overriding only `chunking.pdf.max_tokens`
-
-#### Scenario: Invalid config file
- **WHEN** `~/.kb/config.yaml` contains invalid YAML
- **THEN** the system prints a clear error message identifying the YAML syntax issue and exits with non-zero status
-
-### Requirement: Environment variable overrides
-The system SHALL support environment variable overrides with the prefix `KB_`. ENV variables SHALL take precedence over the YAML config file. Supported variables: `KB_DATA_DIR`, `KB_MODEL`, `KB_DEFAULT_TOP`, `KB_DEFAULT_FORMAT`.
-
-#### Scenario: Override data directory
- **WHEN** `KB_DATA_DIR=/tmp/test-kb` is set
- **THEN** the system uses `/tmp/test-kb/` instead of `~/.kb/` for the database and config
-
-#### Scenario: Override model
- **WHEN** `KB_MODEL=nomic-embed-text` is set
- **THEN** the system uses `nomic-embed-text` as the embedding model, overriding the YAML config
-
-#### Scenario: ENV overrides YAML
- **WHEN** YAML config has `search.default_top: 10` and `KB_DEFAULT_TOP=20` is set
- **THEN** the default top value is 20
-
-### Requirement: Configuration precedence
-The system SHALL apply configuration in this order (highest to lowest precedence): CLI flags, environment variables, YAML config file, built-in defaults.
-
-#### Scenario: CLI flag overrides everything
- **WHEN** YAML config has `search.default_top: 10`, ENV has `KB_DEFAULT_TOP=20`, and user runs `kb search "test" --top 5`
- **THEN** 5 results are returned
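The precedence order above can be sketched as a layered merge in which each higher-precedence source overwrites the one below it. This is an illustration only: representing settings as flat dotted keys, the `resolve_config` name, and the `ENV_KEYS` mapping are assumptions, and only two of the four documented `KB_` variables are shown.

```python
import os

# Built-in defaults (lowest precedence), keyed by dotted setting name.
DEFAULTS = {"search.default_top": 10, "search.default_format": "json"}

# Assumed mapping from environment variables to dotted setting names.
ENV_KEYS = {
    "KB_DEFAULT_TOP": "search.default_top",
    "KB_DEFAULT_FORMAT": "search.default_format",
}


def resolve_config(yaml_values, env=None, cli_flags=None):
    """Merge defaults < YAML < environment < CLI flags into one dict."""
    env = os.environ if env is None else env
    resolved = dict(DEFAULTS)
    resolved.update(yaml_values)
    for var, key in ENV_KEYS.items():
        if var in env:
            # Coerce the string from the environment to the default's type.
            resolved[key] = type(DEFAULTS[key])(env[var])
    # CLI flags left unset (None) must not mask lower layers.
    resolved.update({k: v for k, v in (cli_flags or {}).items() if v is not None})
    return resolved
```

With YAML `search.default_top: 10`, `KB_DEFAULT_TOP=20`, and `--top 5`, the resolved value is 5. Dropping the flag yields 20.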
-
-### Requirement: View and set configuration
-The system SHALL support viewing the current effective configuration via `kb config` and setting individual values via `kb config set <key> <value>`.
-
-#### Scenario: View configuration
- **WHEN** user runs `kb config`
- **THEN** the system displays the fully resolved configuration (defaults merged with YAML merged with ENV), indicating the source of each value
-
-#### Scenario: Set a config value
- **WHEN** user runs `kb config set chunking.pdf.max_tokens 2048`
- **THEN** the value is written to `~/.kb/config.yaml`, creating the file if necessary
-
-### Requirement: Configurable chunking parameters
-The system SHALL support per-document-type chunking configuration with sensible defaults.
-
-#### Scenario: Default chunking for PDF
- **WHEN** no chunking config is specified for PDF
- **THEN** the system uses `strategy: hierarchy, max_tokens: 1024`
-
-#### Scenario: Default chunking for markdown
- **WHEN** no chunking config is specified for markdown
- **THEN** the system uses `strategy: header, min_tokens: 50, max_tokens: 1024`
-
-#### Scenario: Default chunking for code
- **WHEN** no chunking config is specified for code
- **THEN** the system uses `strategy: ast, include_context: true, max_tokens: 1024`
-
-#### Scenario: Default chunking for notes
- **WHEN** no chunking config is specified for notes
- **THEN** the system uses `strategy: whole`
-
-#### Scenario: Custom chunking overrides
- **WHEN** YAML config specifies `chunking.pdf.strategy: fixed` and `chunking.pdf.max_tokens: 512`
- **THEN** PDFs are chunked with fixed-size windows of 512 tokens instead of hierarchy-aware chunking
@@ -1,125 +0,0 @@
-## ADDED Requirements
-
-### Requirement: File type detection and routing
-The system SHALL detect the type of a file being ingested and route it to the appropriate ingestion pipeline. Detection SHALL be based on file extension. Supported types: PDF (`.pdf`), DOCX (`.docx`), HTML (`.html`, `.htm`), Markdown (`.md`, `.markdown`, `.txt`), Code (`.py`, `.sh`, `.bash`, `.go`), and image files (`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp`). The user MAY override detection with `--type` and `--language` flags.
-
-#### Scenario: Auto-detect PDF file
- **WHEN** user runs `kb add report.pdf`
- **THEN** the file is routed to the Docling ingestion pipeline
-
-#### Scenario: Auto-detect Python code
- **WHEN** user runs `kb add script.py`
- **THEN** the file is routed to the code ingestion pipeline with language set to `python`
-
-#### Scenario: Override type detection
- **WHEN** user runs `kb add data.txt --type code --language bash`
- **THEN** the file is routed to the code pipeline as Bash, regardless of the `.txt` extension
-
-#### Scenario: Unsupported file type
- **WHEN** user runs `kb add archive.zip`
- **THEN** the system SHALL print an error message listing supported formats and exit with non-zero status
-
-### Requirement: Docling pipeline for complex documents
-The system SHALL use Docling to ingest PDF, DOCX, HTML, and image files. The pipeline SHALL use the `pypdfium2` backend for PDFs, enable layout model for structural detection, and enable table reconstruction. OCR SHALL be configurable: `auto` (detect pages with no extractable text and OCR those), `always`, or `never`.
-
-#### Scenario: Ingest a text-based PDF
- **WHEN** user runs `kb add manual.pdf`
- **THEN** the system extracts text using Docling with layout detection, produces hierarchy-aware chunks preserving section structure, embeds each chunk, and stores the document with all chunks in the database
-
-#### Scenario: Ingest a PDF with tables
- **WHEN** user ingests a PDF containing data tables
- **THEN** Docling's table reconstruction SHALL produce chunks where table content is represented as structured text (markdown table format) rather than garbled column fragments
-
-#### Scenario: Ingest a scanned PDF with OCR auto mode
- **WHEN** user ingests a PDF where some pages contain only images (no extractable text) and OCR is set to `auto`
- **THEN** the system SHALL detect the image-only pages and apply OCR to those pages only, leaving text-extractable pages processed normally
-
-#### Scenario: Ingest an image file
- **WHEN** user runs `kb add diagram.png`
- **THEN** the system SHALL process it through Docling with OCR enabled, extracting any text content from the image
-
-### Requirement: Markdown ingestion with header-based splitting
-The system SHALL split markdown and text files at header boundaries (`##`, `###`). Each chunk SHALL include its parent header chain as context. Sections smaller than `min_tokens` SHALL be merged with the following section. Sections larger than `max_tokens` SHALL be split at paragraph boundaries with configurable overlap.
-
-#### Scenario: Split markdown at headers
- **WHEN** user runs `kb add guide.md` and the file contains multiple `##` sections
- **THEN** each section becomes a separate chunk, with the header text included in the chunk
-
-#### Scenario: Preserve header hierarchy
- **WHEN** a markdown file has nested headers like `## Config` > `### Advanced Options`
- **THEN** the chunk for "Advanced Options" SHALL include context indicating it falls under "Config > Advanced Options"
-
-#### Scenario: Merge small sections
- **WHEN** a markdown section contains fewer tokens than `min_tokens` (default: 50)
- **THEN** it SHALL be merged with the next section into a single chunk
-
-#### Scenario: Plain text file without headers
- **WHEN** user runs `kb add notes.txt` and the file has no markdown headers
- **THEN** the system SHALL fall back to fixed-size chunking with configurable `max_tokens` and `overlap_tokens`
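A rough sketch of header-based splitting with a parent header chain as context. It covers only the split and the "Config > Advanced Options" style context. Merging sections below `min_tokens`, splitting sections above `max_tokens`, and skipping `#` lines inside fenced code blocks are deliberately left out. The function name and chunk dictionary shape are assumptions.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(markdown, split_levels=(2, 3)):
    """Split markdown at ## and ### headers, recording each chunk's header chain."""
    chunks = []
    chain = {}  # header level -> most recent header text at that level
    current = {"context": "", "lines": []}

    def flush():
        text = "\n".join(current["lines"]).strip()
        if text:
            chunks.append({"context": current["context"], "text": text})

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            level = len(match.group(1))
            # A new header invalidates any same-level or deeper headers.
            for stale in [lvl for lvl in chain if lvl >= level]:
                del chain[stale]
            chain[level] = match.group(2)
            if level in split_levels:
                flush()
                context = " > ".join(
                    chain[lvl] for lvl in sorted(chain) if lvl in split_levels
                )
                current = {"context": context, "lines": [line]}
                continue
        current["lines"].append(line)
    flush()
    return chunks
```

A file with no `##` or `###` headers comes back as a single chunk with an empty context, which a caller could treat as the signal to fall back to fixed-size chunking.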
-
-### Requirement: Code ingestion with AST/regex splitting
-The system SHALL split code files at function and class boundaries. Python files SHALL use the `ast` module. Bash and Go files SHALL use regex-based splitting. When `include_context` is enabled (default), class methods SHALL include the class docstring/signature for context. Files with no recognisable function/class boundaries SHALL fall back to fixed-size chunking.
-
-#### Scenario: Python file with functions and classes
- **WHEN** user runs `kb add auth.py` and the file contains a class with methods
- **THEN** each method becomes a chunk, and each chunk includes the class name and docstring as context
-
-#### Scenario: Bash script with functions
- **WHEN** user runs `kb add deploy.sh` and the file contains `function deploy() {` blocks
- **THEN** each function becomes a separate chunk, including any preceding comment block
-
-#### Scenario: Go file with functions
- **WHEN** user runs `kb add main.go` and the file contains `func` declarations
- **THEN** each function becomes a separate chunk
-
-#### Scenario: Code file with no functions
- **WHEN** user runs `kb add script.sh` and the file has no function declarations
- **THEN** the system SHALL fall back to fixed-size chunking with `max_tokens` and `overlap_tokens`
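For the Python case, splitting at function and class boundaries with the standard `ast` module might look like the sketch below. The chunk dictionary shape and the `split_python` name are assumptions. Bash and Go regex splitting and the fixed-size fallback are not shown, and decorators are not included in the extracted text.

```python
import ast


def split_python(source, include_context=True):
    """Return one chunk per top-level function and per class method."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append({
                "name": node.name,
                "context": "",
                "text": ast.get_source_segment(source, node),
            })
        elif isinstance(node, ast.ClassDef):
            context = ""
            if include_context:
                # Class name plus docstring travels with every method chunk.
                doc = ast.get_docstring(node)
                context = f"class {node.name}" + (f": {doc}" if doc else "")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    chunks.append({
                        "name": f"{node.name}.{item.name}",
                        "context": context,
                        "text": ast.get_source_segment(source, item),
                    })
    return chunks
```

An empty result means no function or class boundaries were found, which is the condition under which the spec falls back to fixed-size chunking.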
-
-### Requirement: Inline note ingestion
-The system SHALL support adding text notes directly from the command line via `kb add --note "text"`. Notes SHALL be stored as a single chunk (no splitting). Notes MAY have an optional `--title` for display purposes.
-
-#### Scenario: Add an inline note
- **WHEN** user runs `kb add --note "Always restart nginx after config changes" --title "nginx reminder"`
- **THEN** a document of type `note` is created with the title "nginx reminder", and the full text becomes a single chunk
-
-#### Scenario: Add a note without title
- **WHEN** user runs `kb add --note "some text"`
- **THEN** the system SHALL use the first 80 characters of the text (truncated at a word boundary) as the title
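One possible reading of "first 80 characters, truncated at a word boundary" is sketched below. Collapsing runs of whitespace first and hard-cutting a single word longer than the limit are assumptions the spec does not state.

```python
def default_note_title(text, limit=80):
    """Derive a title from note text, cutting at the last space within the limit."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    # Look one character past the limit so a word ending exactly at it survives.
    window = collapsed[: limit + 1]
    if " " not in window:
        return collapsed[:limit]  # one very long word: hard cut
    return window.rsplit(" ", 1)[0]
```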
-
-### Requirement: Deduplication via content hash
-The system SHALL compute a SHA-256 hash of each file's contents before ingestion. If a document with the same `content_hash` already exists in the database, the file SHALL be skipped with a message indicating it is already indexed.
-
-#### Scenario: Add a file that is already indexed
- **WHEN** user runs `kb add report.pdf` and the file's SHA-256 matches an existing document
- **THEN** the system SHALL print "Skipped: report.pdf (already indexed)" and not create a duplicate
-
-#### Scenario: Add a modified version of an existing file
- **WHEN** user runs `kb add report.pdf` and the file has changed since last indexed (different hash)
- **THEN** the system SHALL ingest it as a new document (the old version remains unless manually removed)
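The skip-if-already-indexed check can be sketched with `hashlib`. Reading the file in blocks keeps memory flat for large PDFs. The `should_skip` helper and the in-memory set of known hashes are stand-ins for a lookup against `content_hash` in the database.

```python
import hashlib


def content_hash(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def should_skip(path, known_hashes):
    """True when a document with identical contents is already indexed."""
    return content_hash(path) in known_hashes
```

Because the hash covers contents only, a renamed copy of an indexed file is skipped, while an edited file under the same name is ingested as a new document.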
-
-### Requirement: Batch ingestion with progress and resumability
-The system SHALL support ingesting entire directories via `kb add <dir> --recursive`. Processing SHALL be resumable — files already indexed (by content hash) are skipped. Failed files SHALL be logged and skipped without aborting the batch. A summary SHALL be displayed at completion.
-
-#### Scenario: Ingest a directory
- **WHEN** user runs `kb add ~/docs/ --recursive`
- **THEN** the system recursively finds all supported files, processes each one, skips duplicates, logs failures, and displays a summary: "Added X documents. Y failed. Z skipped (already indexed)."
-
-#### Scenario: Resume after interruption
- **WHEN** a batch ingestion is interrupted (Ctrl+C, crash) and user reruns the same command
- **THEN** already-indexed files are skipped via content hash, and processing continues with remaining files
-
-#### Scenario: Failed file during batch
- **WHEN** a single file fails to process (corrupt PDF, encoding error)
- **THEN** the error is logged to `~/.kb/ingest-errors.log` with the file path and error message, and processing continues with the next file
-
-### Requirement: Parallel ingestion workers
-The system SHALL support parallel document processing via configurable worker count (default: 4). Docling's `DocumentConverter` SHALL be used with multiple workers for PDF/DOCX/HTML ingestion. Database writes SHALL be serialised to avoid SQLite locking issues.
-
-#### Scenario: Parallel PDF ingestion
- **WHEN** user runs `kb add ~/pdfs/ --recursive` with `workers: 4` in config
- **THEN** up to 4 documents are processed concurrently through Docling, with chunks written to the database sequentially
-
-#### Scenario: Override worker count
- **WHEN** user runs `kb add ~/pdfs/ --recursive --workers 1`
- **THEN** documents are processed sequentially with a single worker
@@ -1,80 +0,0 @@
-## ADDED Requirements
-
-### Requirement: List documents
-The system SHALL list all indexed documents via `kb list`. Results SHALL include document ID, title, type, tags, chunk count, and creation date. Output SHALL support `--format json` and `--format human`.
-
-#### Scenario: List all documents
- **WHEN** user runs `kb list`
- **THEN** all documents are listed with their ID, title, type, tags, chunk count, and creation date
-
-#### Scenario: Filter by type
- **WHEN** user runs `kb list --type pdf`
- **THEN** only PDF documents are listed
-
-#### Scenario: Filter by tags
- **WHEN** user runs `kb list --tags admin,ops`
- **THEN** only documents tagged with BOTH "admin" AND "ops" are listed
-
-#### Scenario: Empty database
- **WHEN** user runs `kb list` with no documents indexed
- **THEN** the system prints "No documents indexed. Run `kb add` to get started." and exits with zero status
-
-### Requirement: Document info
-The system SHALL display detailed information about a single document via `kb info <doc_id>`, including all metadata, tags, chunk count, and chunk previews (first 100 characters of each chunk).
-
-#### Scenario: View document info
- **WHEN** user runs `kb info 42`
- **THEN** the system displays: title, source path, type, language (if code), content hash, creation date, tags, total chunks, and a preview of each chunk
-
-#### Scenario: Invalid document ID
- **WHEN** user runs `kb info 9999` and no document with ID 9999 exists
- **THEN** the system prints "Document not found: 9999" and exits with non-zero status
-
-### Requirement: Remove document
-The system SHALL remove a document and all its associated chunks, embeddings, and tag associations via `kb remove <doc_id>`. The system SHALL ask for confirmation before deletion unless `--yes` is passed.
-
-#### Scenario: Remove with confirmation
- **WHEN** user runs `kb remove 42`
- **THEN** the system displays the document title and asks "Remove 'Git Admin Guide' and its 28 chunks? [y/N]". On confirmation, the document, its chunks, FTS entries, vector embeddings, and tag associations are deleted.
-
-#### Scenario: Remove with --yes flag
- **WHEN** user runs `kb remove 42 --yes`
- **THEN** the document is removed without confirmation prompt
-
-#### Scenario: Cascading delete
- **WHEN** a document is removed
- **THEN** all rows in `chunks`, `chunks_fts`, `chunks_vec`, and `document_tags` referencing that document SHALL be deleted
-
-### Requirement: Tag management
-The system SHALL support adding and removing tags on documents via `kb tag <doc_id> --add tag1,tag2` and `kb tag <doc_id> --remove tag1`. Tags are case-insensitive and stored lowercase. The system SHALL list all tags with document counts via `kb tags`.
-
-#### Scenario: Add tags to a document
- **WHEN** user runs `kb tag 42 --add git,admin`
- **THEN** the tags "git" and "admin" are associated with document 42. Tags are created if they don't exist.
-
-#### Scenario: Remove a tag from a document
- **WHEN** user runs `kb tag 42 --remove admin`
- **THEN** the "admin" tag association is removed from document 42. The tag itself remains in the tags table if other documents use it.
-
-#### Scenario: List all tags
- **WHEN** user runs `kb tags`
- **THEN** the system lists all tags with the count of documents using each tag, sorted by count descending
-
-#### Scenario: Tag on ingestion
- **WHEN** user runs `kb add report.pdf --tags compliance,q1`
- **THEN** the document is ingested and immediately tagged with "compliance" and "q1"
-
-#### Scenario: Tags in JSON format
- **WHEN** user runs `kb tags --format json`
- **THEN** output is a JSON array of objects: `[{"name": "git", "count": 15}, ...]`
-
-### Requirement: Database status
-The system SHALL report database statistics via `kb status`, including: total documents (by type), total chunks, database file size, active model name and dimension, and schema version.
-
-#### Scenario: Show status
- **WHEN** user runs `kb status`
- **THEN** the system displays: document counts by type, total chunks, DB file size, model name, embedding dimension, and schema version
-
-#### Scenario: Status before init
- **WHEN** user runs `kb status` before `kb init`
- **THEN** the system prints "Knowledge base not initialised. Run `kb init` first." and exits with non-zero status
@@ -1,57 +0,0 @@
-## ADDED Requirements
-
-### Requirement: Model initialisation
-The system SHALL download the embedding model on `kb init`. The default model SHALL be `all-MiniLM-L6-v2`. The user MAY specify a different model via `kb init --model <name>`. The model SHALL be downloaded via sentence-transformers to the HuggingFace default cache (`~/.cache/huggingface/`). On first load, the model SHALL be exported to ONNX format for inference.
-
-#### Scenario: Default init
- **WHEN** user runs `kb init`
- **THEN** the system downloads `all-MiniLM-L6-v2`, creates `~/.kb/kb.db` with the schema, and records `model_name=all-MiniLM-L6-v2` and `embedding_dim=384` in the DB config table
-
-#### Scenario: Init with custom model
- **WHEN** user runs `kb init --model nomic-embed-text`
- **THEN** the system downloads `nomic-embed-text`, creates the database, and records the model name and its dimension in the DB config table
-
-#### Scenario: Init status check
- **WHEN** user runs `kb init --status`
- **THEN** the system reports: whether `~/.kb/` exists, whether the DB is initialised, which model is configured, whether the model is downloaded, and Docling model status
-
-#### Scenario: ONNX export on first load
- **WHEN** the embedding model is loaded for the first time after download
- **THEN** the system SHALL display "Optimising model for ONNX inference (one-time)..." and export the model to ONNX format. Subsequent loads SHALL use the cached ONNX export.
-
-### Requirement: Model-database binding
-The system SHALL store the active model name and embedding dimension in the database `config` table. Every operation that uses the embedding model (add, search, reindex) SHALL verify that the loaded model matches the DB record. A mismatch SHALL be a hard error.
-
-#### Scenario: Model mismatch on add
- **WHEN** user runs `kb add doc.pdf` but the config YAML specifies a different model than what the DB was initialised with
- **THEN** the system SHALL print an error: "Model mismatch: DB uses 'all-MiniLM-L6-v2' (384 dim) but config specifies 'nomic-embed-text'. Run `kb reindex --model nomic-embed-text` to switch models." and exit with non-zero status
-
-#### Scenario: Model match on add
- **WHEN** user runs `kb add doc.pdf` and the config model matches the DB model
- **THEN** ingestion proceeds normally
-
-### Requirement: Full reindex with model switching
-The system SHALL support re-embedding all chunks via `kb reindex`. If `--model` is specified, the system SHALL download the new model, re-embed all chunks, replace all vectors, and update the DB config. A progress bar SHALL be displayed. The operation SHALL be atomic — if interrupted, the old embeddings remain intact.
-
-#### Scenario: Reindex with same model
- **WHEN** user runs `kb reindex`
- **THEN** all chunks are re-embedded with the current model and vectors are replaced. Useful if the model's ONNX export was corrupted or chunks were modified.
-
-#### Scenario: Reindex with new model
- **WHEN** user runs `kb reindex --model bge-small-en-v1.5`
- **THEN** the system downloads the new model, re-embeds all chunks (showing progress), replaces all vectors in `chunks_vec` (recreating the table if dimension changed), and updates `model_name` and `embedding_dim` in the DB config table
-
-#### Scenario: Interrupted reindex
- **WHEN** a reindex is interrupted partway through
- **THEN** the old embeddings remain intact (the vector table is only replaced on successful completion of all embeddings). The user can rerun `kb reindex` to retry.
-
-### Requirement: Embedding model inference via ONNX
-The system SHALL use `sentence-transformers` with the ONNX backend for all embedding inference. This avoids a PyTorch dependency. The ONNX Runtime (`onnxruntime`) SHALL be the inference engine.
-
-#### Scenario: Embed a chunk
- **WHEN** a chunk of text needs to be embedded during ingestion
- **THEN** the system uses the sentence-transformers ONNX backend to produce a float vector of the correct dimension for the active model
-
-#### Scenario: Embed a query
- **WHEN** a search query needs to be embedded
- **THEN** the system applies the configured `query_prefix` (if any) to the query text before embedding, and uses the same ONNX model used for chunk embeddings
@@ -1,70 +0,0 @@
-## ADDED Requirements
-
-### Requirement: Full-text search via FTS5
-The system SHALL maintain an FTS5 virtual table synchronised with the chunks table via triggers. FTS5 SHALL use the `porter unicode61` tokenizer for stemming and unicode support. Queries SHALL be passed to FTS5 with special characters escaped.
-
-#### Scenario: Keyword search
- **WHEN** user runs `kb search "install git"`
- **THEN** FTS5 returns chunks containing "install" and/or "git" (including stemmed variants like "installation"), ranked by BM25 score
-
-#### Scenario: FTS-only mode
- **WHEN** user runs `kb search "install git" --fts-only`
- **THEN** only FTS5 results are returned, no vector search is performed
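One way to satisfy "special characters escaped" is to turn each whitespace-separated term into an FTS5 string literal, so that characters such as `+`, `-`, `*`, or `:` are not parsed as query syntax. The sketch below is an assumption about how that could be done, not the original implementation. Joining terms with `OR` is also a choice made here to match the "and/or" wording of the keyword scenario.

```python
def to_fts5_query(user_query, operator="OR"):
    """Quote each term as an FTS5 string; embedded double quotes are doubled."""
    tokens = user_query.split()
    quoted = ['"' + token.replace('"', '""') + '"' for token in tokens]
    return f" {operator} ".join(quoted)
```

The result is meant to be bound as a parameter to `... WHERE chunks_fts MATCH ?`. An empty query produces an empty string, which the caller would need to reject before querying.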
-
-### Requirement: Vector similarity search via sqlite-vec
-The system SHALL embed the query using the same model that was used to embed stored chunks. The embedded query SHALL be compared against all chunk embeddings in `chunks_vec` using cosine similarity. The system SHALL retrieve 3× the requested result count as candidates for RRF merging.
-
-#### Scenario: Semantic search
- **WHEN** user runs `kb search "how to set up version control"`
- **THEN** the query is embedded and compared against stored vectors, returning semantically similar chunks even if they don't contain the exact words "version control"
-
-#### Scenario: Vector-only mode
- **WHEN** user runs `kb search "how to set up version control" --vec-only`
- **THEN** only vector similarity results are returned, no FTS search is performed
-
-### Requirement: Reciprocal Rank Fusion merging
-The system SHALL merge FTS5 and vector search results using Reciprocal Rank Fusion (RRF). The RRF formula SHALL be: `score(d) = Σ 1/(k + rank)` where `k` is configurable (default: 60). Results SHALL be sorted by descending RRF score.
-
-#### Scenario: Hybrid search combines both signals
- **WHEN** user runs `kb search "install git"` (default hybrid mode)
- **THEN** the system runs both FTS5 and vector searches, merges results via RRF, and returns results sorted by combined score
-
-#### Scenario: Document appears in both result sets
- **WHEN** a chunk ranks #2 in FTS5 and #5 in vector search
- **THEN** its RRF score SHALL be `1/(60+2) + 1/(60+5) = 0.0161 + 0.0154 = 0.0315`, higher than a chunk appearing in only one result set
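The formula `score(d) = Σ 1/(k + rank)` can be sketched over two ranked lists of chunk IDs, with ranks starting at 1 as in the worked scenario above. The `rrf_merge` name and the list-of-IDs input shape are assumptions for illustration.

```python
from collections import defaultdict


def rrf_merge(fts_ids, vec_ids, k=60):
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in (fts_ids, vec_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A chunk present in only one list contributes a single term, so a chunk ranked #2 and #5 (about 0.0315) outranks a chunk that is #1 in one list only (1/61, about 0.0164).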
-
-### Requirement: Tag-based filtering
-The system SHALL support filtering search results by one or more tags. When multiple tags are specified, the filter SHALL use AND logic (document must have ALL specified tags). Tag filtering SHALL be applied in the SQL query via JOIN for efficiency.
-
-#### Scenario: Filter by single tag
- **WHEN** user runs `kb search "deploy" --tags ops`
- **THEN** only chunks from documents tagged with "ops" are included in results
-
-#### Scenario: Filter by multiple tags
- **WHEN** user runs `kb search "deploy" --tags ops,production`
- **THEN** only chunks from documents tagged with BOTH "ops" AND "production" are included
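AND semantics over several tags can be expressed in SQL by joining through `document_tags`, grouping per document, and requiring the number of distinct matched tags to equal the number requested. The column names (`tags.id`, `tags.name`, `document_tags.document_id`, `document_tags.tag_id`) are assumed, since the schema itself is not part of this spec.

```python
import sqlite3


def documents_with_all_tags(conn, tags):
    """Return IDs of documents that carry every one of the given tags."""
    wanted = sorted({tag.lower() for tag in tags})  # tags are stored lowercase
    if not wanted:
        raise ValueError("at least one tag is required")
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT dt.document_id
        FROM document_tags AS dt
        JOIN tags AS t ON t.id = dt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY dt.document_id
        HAVING COUNT(DISTINCT t.name) = ?
        ORDER BY dt.document_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

The resulting IDs could then restrict the chunk query, for example as a subquery joined against `chunks.document_id`.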
-
-### Requirement: Type-based filtering
-The system SHALL support filtering search results by document type. Valid types: `pdf`, `markdown`, `code`, `note`.
-
-#### Scenario: Filter by type
- **WHEN** user runs `kb search "deploy" --type code`
- **THEN** only chunks from code documents are included in results
-
-### Requirement: Score threshold
-The system SHALL support a minimum score cutoff. Results with an RRF score below the threshold SHALL be excluded from output.
-
-#### Scenario: Apply score threshold
- **WHEN** user runs `kb search "deploy" --threshold 0.02`
- **THEN** only results with RRF score >= 0.02 are returned
-
-### Requirement: Result count control
-The system SHALL return a configurable number of results (default: 10, configurable via `--top` flag or `search.default_top` in config).
-
-#### Scenario: Request specific number of results
- **WHEN** user runs `kb search "deploy" --top 5`
- **THEN** at most 5 results are returned
-
-#### Scenario: Fewer matches than requested
- **WHEN** user searches and only 3 chunks match
- **THEN** the system returns 3 results without error, with `returned: 3` in the output
@@ -1,101 +0,0 @@
-## ADDED Requirements
-
-### Requirement: JSON output format for search
-The system SHALL output search results as JSON when `--format json` is used (this is the default). The JSON schema SHALL include: `query`, `results` array, `total_matches`, and `returned` count. Each result SHALL include: `chunk_id`, `score`, `score_breakdown` (with `fts` and `vector` sub-scores), `text`, and `source` object.
-
-#### Scenario: JSON search output
- **WHEN** user runs `kb search "install git" --format json`
- **THEN** the output is valid JSON matching this structure:
-  ```json
-  {
-    "query": "install git",
-    "results": [
-      {
-        "chunk_id": 1423,
-        "score": 0.031,
-        "score_breakdown": {"fts": 0.016, "vector": 0.015},
-        "text": "To install the latest version...",
-        "source": {
-          "document_id": 42,
-          "title": "Git Admin Guide",
-          "path": "/home/user/docs/git-admin.pdf",
-          "type": "pdf",
-          "page": 12,
-          "chunk_index": 3,
-          "total_chunks": 28,
-          "tags": ["git", "admin"]
-        }
-      }
-    ],
-    "total_matches": 47,
-    "returned": 10
-  }
-  ```
-
-#### Scenario: Score breakdown in FTS-only mode
- **WHEN** user runs `kb search "test" --fts-only --format json`
- **THEN** `score_breakdown` contains `{"fts": <score>, "vector": null}`
-
-#### Scenario: Score breakdown in vector-only mode
- **WHEN** user runs `kb search "test" --vec-only --format json`
- **THEN** `score_breakdown` contains `{"fts": null, "vector": <score>}`
-
-### Requirement: Human-readable output format
-The system SHALL support human-readable output via `--format human`. This format SHALL show: query, match count, and for each result: rank, score, title, page/section (if applicable), type, tags, and a text preview.
-
-#### Scenario: Human-readable search output
- **WHEN** user runs `kb search "install git" --format human`
- **THEN** output is formatted for terminal reading:
-  ```
-  Search: "install git" (47 matches, showing top 10)
-
-   1. [0.031] Git Admin Guide (p.12)               [pdf] [git, admin]
-      To install the latest version of git from source...
-
-   2. [0.025] setup-notes.md §Installation          [markdown] [git]
-      First, add the PPA repository for the latest git...
-  ```
-
-### Requirement: JSON output for list and tags commands
-The system SHALL support `--format json` for `kb list`, `kb tags`, `kb info`, and `kb status` commands. JSON output SHALL be valid and parseable by the skill wrapper.
-
-#### Scenario: List documents as JSON
- **WHEN** user runs `kb list --format json`
- **THEN** output is a JSON array of document objects with `id`, `title`, `type`, `tags`, `chunk_count`, `created_at`
-
-#### Scenario: Tags as JSON
- **WHEN** user runs `kb tags --format json`
- **THEN** output is a JSON array: `[{"name": "git", "count": 15}, ...]`
-
-#### Scenario: Status as JSON
- **WHEN** user runs `kb status --format json`
- **THEN** output is a JSON object with `documents` (counts by type), `total_chunks`, `db_size_bytes`, `model_name`, `embedding_dim`, `schema_version`
-
-### Requirement: JSON schema stability
-The JSON output schema SHALL be treated as a public contract. Fields MAY be added to JSON objects in future versions. Fields SHALL NOT be removed or renamed. The skill wrapper MUST be able to rely on the presence and type of all documented fields.
-
-#### Scenario: Forward compatibility
- **WHEN** a future version adds a `language` field to search results
- **THEN** all existing fields remain present and unchanged, the new field is additive only
-
-### Requirement: Exit codes
-The system SHALL use consistent exit codes: 0 for success, 1 for user errors (bad arguments, missing files), 2 for system errors (database corruption, model failure). JSON error output SHALL include an `error` field with a human-readable message.
-
-#### Scenario: Successful operation
- **WHEN** any command completes successfully
- **THEN** exit code is 0
-
-#### Scenario: User error with JSON output
- **WHEN** user runs `kb search` with no query argument
- **THEN** exit code is 1 and stderr contains a clear error message
-
-#### Scenario: System error
- **WHEN** the SQLite database is corrupted
- **THEN** exit code is 2 and stderr contains the error details
-
-### Requirement: Skill definition file
-The project SHALL include a `SKILL.md` file that defines how an LLM tool (e.g. Claude Code) should invoke and interpret `kb` commands. The skill file SHALL document: when to use the tool, available commands, output format, how to cite sources, and how to handle low-confidence results.
-
-#### Scenario: Skill file exists
- **WHEN** the project is built
- **THEN** a `SKILL.md` file exists at the project root describing the skill interface for LLM consumption
@@ -1,115 +0,0 @@
-## 1. Project Scaffolding
-
- [x] 1.1 Create Python virtual environment (`python3 -m venv .venv`) and add `.venv/` to `.gitignore`. All development and testing MUST run inside this venv.
- [x] 1.2 Create `pyproject.toml` with project metadata, dependencies (`click`, `sqlite-vec`, `pyyaml`, `sentence-transformers`, `onnxruntime`, `docling`), dev dependencies (`pytest`, `pytest-cov`), and `[project.scripts] kb = "kb_search.cli:main"` entry point
- [x] 1.3 Install the project in editable mode inside the venv: `.venv/bin/pip install -e ".[dev]"`
- [x] 1.4 Create `src/kb_search/` package directory with `__init__.py`
- [x] 1.5 Create `src/kb_search/cli.py` with Click group and stub subcommands (`init`, `add`, `search`, `list`, `info`, `remove`, `tags`, `tag`, `status`, `reindex`, `config`)
- [x] 1.6 Verify `.venv/bin/kb --help` shows all commands
-
-## 2. Configuration
-
- [x] 2.1 Create `src/kb_search/config.py` — load YAML from `~/.kb/config.yaml` with deep-merge against built-in defaults. Handle missing file gracefully.
- [x] 2.2 Implement ENV variable overrides (`KB_DATA_DIR`, `KB_MODEL`, `KB_DEFAULT_TOP`, `KB_DEFAULT_FORMAT`) with precedence: CLI flags > ENV > YAML > defaults
- [x] 2.3 Implement `kb config` command — display fully resolved config with source indicators
- [x] 2.4 Implement `kb config set <key> <value>` — write to `~/.kb/config.yaml`, creating file if needed
- [x] 2.5 Write tests for config loading, merging, ENV overrides, and precedence
-
-## 3. Database Layer
-
- [x] 3.1 Create `src/kb_search/database.py` — SQLite connection management with sqlite-vec extension loading
- [x] 3.2 Implement schema creation: `documents`, `chunks`, `tags`, `document_tags`, `config` tables per design.md
- [x] 3.3 Implement FTS5 virtual table (`chunks_fts`) with `porter unicode61` tokenizer and sync triggers (INSERT, UPDATE, DELETE)
- [x] 3.4 Implement `chunks_vec` virtual table via sqlite-vec
- [x] 3.5 Implement schema versioning: store `schema_version` in `config` table, check on open, run migrations sequentially
- [x] 3.6 Implement DB config helpers: `get_config(key)`, `set_config(key, value)` for model binding
- [x] 3.7 Write tests for schema creation, migrations, FTS sync triggers, and config helpers
-
-## 4. Embedding Management
-
- [x] 4.1 Create `src/kb_search/embeddings.py` — model download, ONNX export, and loading via `SentenceTransformer(model_name, backend="onnx")`
- [x] 4.2 Implement model-database binding: on init, write model_name + embedding_dim to DB config; on load, verify match and hard-error on mismatch
- [x] 4.3 Implement `embed_texts(texts: list[str]) -> list[list[float]]` with configurable query/passage prefix support
- [x] 4.4 Implement `kb init` command — create `~/.kb/`, init DB schema, download model, record binding. Support `--model` flag and `--status` check.
- [x] 4.5 Implement `kb reindex` command — download new model if `--model` specified, re-embed all chunks with progress bar, replace vectors atomically, update DB config
- [x] 4.6 Write tests for embedding, model binding verification, and mismatch detection
-
-## 5. Document Ingestion — Core
-
- [x] 5.1 Create `src/kb_search/ingest/__init__.py` and `src/kb_search/ingest/detector.py` — file type detection by extension, routing to correct pipeline, `--type`/`--language` override support
- [x] 5.2 Implement deduplication: SHA-256 content hash, skip-if-exists check against `documents.content_hash`
- [x] 5.3 Implement `kb add <file>` command — detect type, route to pipeline, store document + chunks + embeddings + tags in a single transaction
- [x] 5.4 Implement `kb add --note "text"` — create note document with whole-text chunk, optional `--title`, auto-title from first 80 chars
- [x] 5.5 Implement `kb add <dir> --recursive` — walk directory, filter supported extensions, process each file, skip dupes, log failures to `~/.kb/ingest-errors.log`, display summary
- [x] 5.6 Implement parallel ingestion with configurable `--workers` (default: 4), serialised DB writes
- [x] 5.7 Write tests for type detection, dedup, note creation, and batch processing
-
-## 6. Document Ingestion — Docling Pipeline
-
- [x] 6.1 Create `src/kb_search/ingest/docling.py` — Docling `DocumentConverter` setup with `pypdfium2` backend, layout model enabled, table reconstruction enabled
- [x] 6.2 Implement OCR configuration (`auto`/`always`/`never`) per config.yaml `ingestion.enable_ocr`
- [x] 6.3 Implement hierarchy-aware chunking via Docling's `HierarchicalChunker`, with fallback to fixed-size chunking when hierarchy detection fails
- [x] 6.4 Extract and preserve chunk metadata: page number, section headers, table markers
- [x] 6.5 Wire Docling models to download on `kb init` (using HuggingFace default cache)
- [x] 6.6 Write tests with sample PDFs (text-based, table-heavy, mixed layout)
-
-## 7. Document Ingestion — Markdown Pipeline
-
- [x] 7.1 Create `src/kb_search/ingest/markdown.py` — split at `##`/`###` header boundaries
- [x] 7.2 Implement parent header chain context (e.g. "Config > Advanced Options" prefix on nested chunks)
- [x] 7.3 Implement small section merging (sections below `min_tokens` merged with next section)
- [x] 7.4 Implement large section splitting at paragraph boundaries with overlap
- [x] 7.5 Implement fallback to fixed-size chunking for plain text files without headers
- [x] 7.6 Write tests for header splitting, merging, hierarchy context, and plain text fallback
-
-## 8. Document Ingestion — Code Pipeline
-
- [x] 8.1 Create `src/kb_search/ingest/code.py` — language detection from extension (`.py`, `.sh`, `.bash`, `.go`)
- [x] 8.2 Implement Python AST splitting using stdlib `ast` module — function and class boundaries, class docstring context on methods
- [x] 8.3 Implement Bash regex splitting — `function name()` and `name()` patterns with preceding comment blocks
- [x] 8.4 Implement Go regex splitting — `func` declarations with type grouping
- [x] 8.5 Implement fallback to fixed-size chunking when no function/class boundaries detected
- [x] 8.6 Write tests for each language parser and fallback behaviour
-
-## 9. Hybrid Search
-
- [x] 9.1 Create `src/kb_search/search.py` — FTS5 query execution with BM25 scoring, special character escaping
- [x] 9.2 Implement vector similarity search: embed query, query `chunks_vec` for top-K (3× requested), cosine similarity
- [x] 9.3 Implement Reciprocal Rank Fusion: merge FTS and vector results with `score(d) = Σ 1/(k + rank)`, configurable `k` (default: 60)
- [x] 9.4 Implement `--fts-only` and `--vec-only` modes
- [x] 9.5 Implement tag filtering via SQL JOIN and type filtering via WHERE clause
- [x] 9.6 Implement `--threshold` score cutoff (post-RRF)
- [x] 9.7 Implement `--top` result count control (default from config)
- [x] 9.8 Wire up `kb search` command with all flags: `--top`, `--tags`, `--type`, `--format`, `--fts-only`, `--vec-only`, `--threshold`
- [x] 9.9 Write tests for FTS, vector search, RRF merging, filtering, and edge cases (empty results, fewer matches than requested)
-
-## 10. Output Formatting
-
- [x] 10.1 Create `src/kb_search/output.py` — JSON formatter for search results matching the schema in skill-interface spec
- [x] 10.2 Implement human-readable formatter for search results (rank, score, title, page/section, type, tags, text preview)
- [x] 10.3 Implement JSON formatters for `list`, `tags`, `info`, and `status` commands
- [x] 10.4 Implement human-readable formatters for `list`, `tags`, `info`, and `status` commands
- [x] 10.5 Implement consistent exit codes: 0 success, 1 user error, 2 system error
- [x] 10.6 Write tests for JSON output schema validation and exit codes
-
-## 11. Document Management Commands
-
- [x] 11.1 Implement `kb list` — query documents with optional `--type` and `--tags` filters, `--format` output
- [x] 11.2 Implement `kb info <doc_id>` — document details with chunk previews
- [x] 11.3 Implement `kb remove <doc_id>` — cascading delete with confirmation prompt, `--yes` flag
- [x] 11.4 Implement `kb tags` — list all tags with document counts, `--format` support
- [x] 11.5 Implement `kb tag <doc_id> --add/--remove` — tag management, case-insensitive storage
- [x] 11.6 Implement `kb status` — DB stats, model info, storage size, schema version
- [x] 11.7 Write tests for each management command
-
-## 12. Skill Definition
-
- [x] 12.1 Write `SKILL.md` — when to use, available commands, output format, how to cite sources, handling low-confidence results, multi-query guidance
- [x] 12.2 Test the skill end-to-end: ingest sample documents, run searches via the skill prompt, verify Claude Code can parse and cite results
-
-## 13. Packaging and Distribution
-
- [x] 13.1 Verify `pipx install kb-search` works from a clean environment
- [x] 13.2 Verify `kb init` downloads both embedding model and Docling models successfully
- [x] 13.3 Add a README with quickstart: install, init, add, search
- [x] 13.4 Add `py.typed` marker and basic type annotations on public interfaces
@@ -0,0 +1,100 @@
+# Docker Deployment
+
+## Purpose
+
+Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.
+
+## Requirements
+
+### Requirement: NVIDIA CUDA Docker image
+
+The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVIDIA CUDA runtime base image with GPU support for PyTorch and ONNX Runtime.
+
+#### Scenario: Build NVIDIA image
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml build`
+- **THEN** the build SHALL produce a working image with CUDA runtime, PyTorch with CUDA support, onnxruntime-gpu, and all engine dependencies
+
+#### Scenario: GPU access in NVIDIA container
+- **WHEN** the NVIDIA container starts with `--gpus all` or the NVIDIA runtime
+- **THEN** `torch.cuda.is_available()` SHALL return True and the engine SHALL load the embedding model on GPU
+
+---
+
+### Requirement: AMD ROCm Docker image
+
+The project SHALL provide a `Dockerfile.rocm` that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
+
+#### Scenario: Build ROCm image
+- **WHEN** an admin runs `docker compose -f compose.rocm.yaml build`
+- **THEN** the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
+
+#### Scenario: GPU access in ROCm container
+- **WHEN** the ROCm container starts with `--device=/dev/kfd --device=/dev/dri`
+- **THEN** `torch.cuda.is_available()` SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU
+
+---
+
+### Requirement: Application code is GPU-vendor-agnostic
+
+The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
+
+#### Scenario: Same engine code on both platforms
+- **WHEN** the engine starts on an NVIDIA image and an AMD image with identical configuration
+- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
+
+---
+
+### Requirement: Bind-mount data directory
+
+The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
+
+#### Scenario: Data directory structure
+- **WHEN** the engine starts for the first time
+- **THEN** it SHALL create the following structure under the data directory:
+  - `kb.db` — SQLite database
+  - `hf_cache/` — HuggingFace model cache
+  - `staging/` — temporary files for queued ingestion jobs
+
+#### Scenario: Portable data across hosts
+- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
+- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
+
+#### Scenario: Portable data across GPU vendors
+- **WHEN** an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
+- **THEN** the engine SHALL start successfully. Embeddings in the database remain valid (they are model-specific, not GPU-vendor-specific)
+
+---
+
+### Requirement: Compose files for deployment
+
+The project SHALL provide Docker Compose files for single-command deployment.
+
+#### Scenario: Start NVIDIA deployment
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
+- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Start ROCm deployment
+- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
+- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Automatic restart
+- **WHEN** the engine process crashes or the host reboots
+- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
+
+#### Scenario: Configure via environment
+- **WHEN** an admin sets environment variables in the compose file (`KB_MODEL`, `KB_API_KEY`, `KB_DEVICE`, etc.)
+- **THEN** the engine SHALL use those values
+
+---
+
+### Requirement: CPU-only fallback
+
+The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
+
+#### Scenario: No GPU available
+- **WHEN** the container starts without GPU passthrough (no `--gpus`, no `/dev/kfd`)
+- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
+
+#### Scenario: Explicit CPU mode
+- **WHEN** `KB_DEVICE=cpu` and `KB_INGEST_DEVICE=cpu` are set in the environment
+- **THEN** the engine SHALL use CPU regardless of GPU availability
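The fallback rule in the two scenarios above can be sketched as follows. This is an illustration only: `resolve_device` is a hypothetical helper, and the GPU probe is passed in as a boolean so the sketch runs without PyTorch (in the engine it would presumably come from `torch.cuda.is_available()`, which the spec names).

```python
import logging

logger = logging.getLogger("kb.engine")


def resolve_device(requested: str, gpu_available: bool) -> str:
    """Map a KB_DEVICE-style setting to a concrete device string.

    Hypothetical helper: `requested` is expected to be "auto", "cpu", or "cuda".
    """
    requested = requested.strip().lower()
    if requested == "cpu":
        # Explicit CPU mode wins regardless of GPU availability.
        return "cpu"
    if gpu_available:
        # ROCm builds of PyTorch also expose their devices as "cuda" (via HIP).
        return "cuda"
    logger.warning("GPU acceleration is unavailable; falling back to CPU")
    return "cpu"
```

Because ROCm builds of PyTorch report AMD GPUs through the same `cuda` device name, a rule like this needs no vendor-specific branch, which fits the vendor-agnostic requirement.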
@@ -0,0 +1,205 @@
+# Engine API
+
+## Purpose
+
+The engine API provides an HTTP interface for knowledge base operations including search, document ingestion, document management, tag management, and system status.
+
+## Requirements
+
+### Requirement: Engine startup and model loading
+
+The engine SHALL load the embedding model eagerly at startup before accepting HTTP requests. The engine SHALL expose a health endpoint that returns unhealthy until the model is fully loaded and the database is initialised.
+
+#### Scenario: Cold start with model download
+- **WHEN** the engine starts for the first time with no cached model
+- **THEN** it SHALL download the configured embedding model, load it into memory (GPU if available, CPU otherwise), enable WAL mode on the SQLite database, and begin accepting requests only after all initialisation completes
+
+#### Scenario: Health check during startup
+- **WHEN** a client sends `GET /api/v1/health` before the model is loaded
+- **THEN** the engine SHALL respond with HTTP 503 and `{"status": "starting"}`
+
+#### Scenario: Health check after startup
+- **WHEN** a client sends `GET /api/v1/health` after initialisation completes
+- **THEN** the engine SHALL respond with HTTP 200 and `{"status": "healthy"}`
+
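A minimal sketch of the readiness gate described above, using only the standard library. The `Readiness` class and its method names are assumptions for illustration; the real engine would wire the same check into a FastAPI route for `GET /api/v1/health`.

```python
import threading


class Readiness:
    """Tracks whether startup (model load and DB init) has finished."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def mark_ready(self) -> None:
        """Call once the model is loaded and the database is initialised."""
        self._ready.set()

    def health(self) -> tuple[int, dict[str, str]]:
        """Return (HTTP status, JSON body) for the health endpoint."""
        if self._ready.is_set():
            return 200, {"status": "healthy"}
        return 503, {"status": "starting"}
```

An event rather than a plain boolean keeps the flag safe to set from a startup thread while request handlers read it.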
+---
+
+### Requirement: Hybrid search
+
+The engine SHALL provide hybrid search combining BM25 full-text search (via FTS5) and vector similarity search (via sqlite-vec), merged using Reciprocal Rank Fusion. Search SHALL complete in under 100ms when the model is warm.
+
+#### Scenario: Hybrid search with results
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "how to change oil", "top": 5}`
+- **THEN** the engine SHALL embed the query using the resident model, run both FTS5 and vector searches, merge results via RRF, and return a JSON response with matched chunks including scores, document metadata, and tags
+
+#### Scenario: Search with filters
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "brakes", "tags": ["maintenance"], "doc_type": "pdf", "top": 3}`
+- **THEN** the engine SHALL apply tag and document type filters to both FTS5 and vector results before merging
+
+#### Scenario: Search with mode override
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "error log", "fts_only": true}`
+- **THEN** the engine SHALL return only FTS5 results without running vector search
+
+#### Scenario: Empty knowledge base
+- **WHEN** a client searches against an empty database
+- **THEN** the engine SHALL return HTTP 200 with `{"query": "...", "results": [], "total_matches": 0}`
+
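The merge step can be sketched as below, assuming the standard Reciprocal Rank Fusion formula `score(d) = Σ 1/(k + rank)` over the two ranked lists. `rrf_merge` is a hypothetical name, and `k = 60` is the default the removed v1 task list used; whether v2 keeps that value is an assumption.

```python
from collections import defaultdict


def rrf_merge(fts_ids, vec_ids, k=60, top=10):
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each input list is ordered best-first. Returns (chunk_id, score)
    pairs, highest fused score first, truncated to `top` entries.
    """
    scores = defaultdict(float)
    for ranked in (fts_ids, vec_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by chunk ID for determinism.
    merged = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top]
```

A chunk that appears in both lists collects a contribution from each, so agreement between FTS5 and vector search outranks a single high placement in one list. Passing an empty list for one side gives the `fts_only` behaviour with the same scoring.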
+---
+
+### Requirement: Async ingestion via job queue
+
+The engine SHALL accept file uploads and text notes for ingestion asynchronously. Uploaded content SHALL be written to a staging area and a job record created in the database. The engine SHALL return HTTP 202 immediately. A background worker SHALL process queued jobs sequentially.
+
+#### Scenario: Upload a PDF file
+- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a PDF file and optional fields (tags, doc_type)
+- **THEN** the engine SHALL write the file to the staging directory, create a job record with status `queued`, and return HTTP 202 with `{"job_id": "<id>", "status": "queued", "filename": "report.pdf"}`
+
+#### Scenario: Upload a text note
+- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a `note` text field and optional `title` field
+- **THEN** the engine SHALL write the note content to a staging file, create a job record with status `queued`, and return HTTP 202 with the job ID
+
+#### Scenario: Upload multiple files in sequence
+- **WHEN** a client sends multiple `POST /api/v1/jobs` requests in quick succession
+- **THEN** the engine SHALL queue each job independently and the background worker SHALL process them in FIFO order
+
+#### Scenario: Duplicate content detection
+- **WHEN** a client uploads a file whose content hash matches an already-ingested document
+- **THEN** the engine SHALL still return HTTP 202; the background worker SHALL then mark the job as `skipped` with reason `duplicate`
+
+#### Scenario: Upload failure due to unsupported file type
+- **WHEN** a client uploads a file with an unsupported extension
+- **THEN** the engine SHALL return HTTP 422 with an error message listing supported types
+
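The two checks in the last two scenarios run at different times: the extension check at upload, the duplicate check in the worker. A rough sketch, with several assumptions: the extension set is invented for illustration (the spec does not enumerate supported types), SHA-256 is carried over from the v1 task list, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed set for illustration only; the spec does not list supported types.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".py", ".sh", ".go"}


def is_supported(filename: str) -> bool:
    """Upload-time check; a False result maps to the HTTP 422 response."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of the raw upload, used as the duplicate key."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data: bytes, known_hashes: set[str]) -> bool:
    """Worker-time check; a True result means the job is marked `skipped`."""
    return content_hash(data) in known_hashes
```

Hashing the bytes rather than the filename means a renamed copy of an ingested file is still detected.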
+---
+
+### Requirement: Job status tracking
+
+The engine SHALL maintain job records in SQLite with status tracking. Jobs SHALL transition through states: `queued` → `processing` → `done` | `failed` | `skipped`.
+
+#### Scenario: List all jobs
+- **WHEN** a client sends `GET /api/v1/jobs`
+- **THEN** the engine SHALL return a JSON array of job records ordered by creation time (newest first), each including job_id, filename, status, created_at, and completed_at
+
+#### Scenario: Filter jobs by status
+- **WHEN** a client sends `GET /api/v1/jobs?status=failed`
+- **THEN** the engine SHALL return only jobs with the specified status
+
+#### Scenario: Get job details
+- **WHEN** a client sends `GET /api/v1/jobs/{id}`
+- **THEN** the engine SHALL return the full job record including status, filename, error message (if failed), document_id (if done), chunk count, and timing information
+
+#### Scenario: Job not found
+- **WHEN** a client sends `GET /api/v1/jobs/{id}` with a non-existent ID
+- **THEN** the engine SHALL return HTTP 404
+
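The state transitions in this requirement can be encoded as a small lookup table. This is a sketch under the assumption that `done`, `failed`, and `skipped` are terminal; `ALLOWED_TRANSITIONS` and `advance` are hypothetical names.

```python
# queued -> processing -> done | failed | skipped
ALLOWED_TRANSITIONS = {
    "queued": {"processing"},
    "processing": {"done", "failed", "skipped"},
    "done": set(),
    "failed": set(),
    "skipped": set(),
}


def advance(current: str, new: str) -> str:
    """Return `new` if the move is legal, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal job transition: {current} -> {new}")
    return new
```

Routing every status update through one function like this keeps a bug in the worker from reviving a finished job.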
+---
+
+### Requirement: Background ingestion worker
+
+The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), generate embeddings using the resident model, and insert chunks and vectors into the database.
+
+#### Scenario: Successful PDF ingestion
+- **WHEN** the background worker picks up a queued PDF job
+- **THEN** it SHALL update the job status to `processing`, run Docling conversion and chunking, embed all chunks, insert document and chunks into the database, update the job status to `done` with the resulting document_id and chunk count, and delete the staged file
+
+#### Scenario: Ingestion failure
+- **WHEN** the background worker encounters an error during processing (e.g., corrupt PDF)
+- **THEN** it SHALL update the job status to `failed` with the error message, delete the staged file, and continue processing the next queued job
+
+#### Scenario: Search during active ingestion
+- **WHEN** a search request arrives while the background worker is processing a job
+- **THEN** the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
+
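A sketch of how the worker might claim jobs in FIFO order while readers stay unblocked, using the standard `sqlite3` module. The `jobs` table is reduced to three columns, the integer job ID is an assumption, and the function names are hypothetical.

```python
import sqlite3


def connect(path: str) -> sqlite3.Connection:
    """Open the database and request write-ahead logging."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while one writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def init_jobs(conn: sqlite3.Connection) -> None:
    """Create a reduced jobs table; the real schema has more columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " filename TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, filename: str) -> int:
    """Insert a queued job and return its ID."""
    cur = conn.execute("INSERT INTO jobs (filename) VALUES (?)", (filename,))
    conn.commit()
    return cur.lastrowid


def claim_next_job(conn: sqlite3.Connection):
    """Mark the oldest queued job as processing; return (id, filename) or None."""
    row = conn.execute(
        "SELECT id, filename FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'processing' WHERE id = ?", (row[0],))
    conn.commit()
    return row
```

The select-then-update pair is only safe because the spec calls for a single worker processing one job at a time; with several workers the claim would need to happen in one atomic statement. WAL mode applies to on-disk databases, so an in-memory database keeps its own journal mode.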
+---
+
+### Requirement: Document management
+
+The engine SHALL provide endpoints to list, inspect, and remove ingested documents.
+
+#### Scenario: List documents
+- **WHEN** a client sends `GET /api/v1/documents`
+- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, and created_at
+
+#### Scenario: List documents with filters
+- **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
+- **THEN** the engine SHALL return only documents matching all specified filters
+
+#### Scenario: Get document details
+- **WHEN** a client sends `GET /api/v1/documents/{id}`
+- **THEN** the engine SHALL return the full document record including all chunks and their text content
+
+#### Scenario: Remove a document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}`
+- **THEN** the engine SHALL delete the document, all its chunks, associated embeddings, and tag associations, and return HTTP 200 with a confirmation
+
+#### Scenario: Remove non-existent document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}` with a non-existent ID
+- **THEN** the engine SHALL return HTTP 404
+
+---
+
+### Requirement: Tag management
+
+The engine SHALL provide endpoints to list all tags and manage tags on documents.
+
+#### Scenario: List all tags
+- **WHEN** a client sends `GET /api/v1/tags`
+- **THEN** the engine SHALL return a JSON array of tags with name and document count
+
+#### Scenario: Add tags to a document
+- **WHEN** a client sends `PUT /api/v1/documents/{id}/tags` with body `{"add": ["manual", "v2"]}`
+- **THEN** the engine SHALL add the specified tags to the document and return the updated tag list
+
+#### Scenario: Remove tags from a document
+- **WHEN** a client sends `PUT /api/v1/documents/{id}/tags` with body `{"remove": ["draft"]}`
+- **THEN** the engine SHALL remove the specified tags from the document and return the updated tag list
+
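The `add` and `remove` bodies above can be handled by one function. A sketch, assuming tags are stored lowercased (the removed v1 tasks specified case-insensitive storage; v2 does not say) and that the updated list is returned sorted. `apply_tag_update` is a hypothetical name.

```python
def apply_tag_update(current, update):
    """Apply a PUT /api/v1/documents/{id}/tags body to a tag list.

    `update` may contain "add" and/or "remove" lists. Returns the
    updated tags, lowercased and sorted.
    """
    tags = {t.lower() for t in current}
    tags |= {t.lower() for t in update.get("add", [])}
    tags -= {t.lower() for t in update.get("remove", [])}
    return sorted(tags)
```

Set semantics make both operations idempotent: adding an existing tag or removing a missing one leaves the list unchanged.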
+---
+
+### Requirement: Engine status and reindex
+
+The engine SHALL provide status information and support re-embedding all chunks.
+
+#### Scenario: Get engine status
+- **WHEN** a client sends `GET /api/v1/status`
+- **THEN** the engine SHALL return JSON with model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
+
+#### Scenario: Trigger reindex
+- **WHEN** a client sends `POST /api/v1/reindex`
+- **THEN** the engine SHALL re-embed all existing chunks using the currently loaded model and return progress information. This operation SHALL NOT block search queries.
+
+---
+
+### Requirement: API authentication
+
+The engine SHALL support optional API key authentication via Bearer token. When `KB_API_KEY` is set, all requests MUST include a matching `Authorization: Bearer <key>` header. When `KB_API_KEY` is not set, authentication SHALL be disabled.
+
+#### Scenario: Valid API key
+- **WHEN** `KB_API_KEY` is set and a request includes a matching Bearer token
+- **THEN** the engine SHALL process the request normally
+
+#### Scenario: Missing API key when required
+- **WHEN** `KB_API_KEY` is set and a request has no Authorization header
+- **THEN** the engine SHALL return HTTP 401 `{"error": "authentication required"}`
+
+#### Scenario: Invalid API key
+- **WHEN** `KB_API_KEY` is set and a request includes a non-matching Bearer token
+- **THEN** the engine SHALL return HTTP 401 `{"error": "invalid api key"}`
+
+#### Scenario: Auth disabled
+- **WHEN** `KB_API_KEY` is not set
+- **THEN** the engine SHALL process all requests without requiring authentication
+
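The four scenarios above reduce to one check per request. A framework-free sketch, where `check_auth` is a hypothetical helper; the constant-time comparison is a suggested hardening, not something the spec requires.

```python
import hmac
from typing import Optional


def check_auth(api_key: Optional[str], authorization: Optional[str]):
    """Return None if the request may proceed, else (status, body).

    `api_key` is the configured KB_API_KEY (None or "" disables auth);
    `authorization` is the raw Authorization header value, if any.
    """
    if not api_key:
        return None  # authentication disabled
    if not authorization:
        return 401, {"error": "authentication required"}
    scheme, _, token = authorization.partition(" ")
    # compare_digest avoids leaking the key through timing differences.
    if scheme.lower() == "bearer" and hmac.compare_digest(
        token.strip().encode(), api_key.encode()
    ):
        return None
    return 401, {"error": "invalid api key"}
```

A header with a scheme other than `Bearer` is treated here as an invalid key rather than a missing one; the spec does not cover that case.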
+---
+
+### Requirement: Engine configuration via environment variables
+
+The engine SHALL be configured entirely via environment variables; it reads no config file. Values are set in the Compose file (`compose.nvidia.yaml` or `compose.rocm.yaml`) or passed to `docker run`.
+
+#### Scenario: Default configuration
+- **WHEN** the engine starts with no environment variables set
+- **THEN** it SHALL use defaults: data directory `/data`, model `all-MiniLM-L6-v2`, device `auto`, no API key required
+
+#### Scenario: Custom model
+- **WHEN** `KB_MODEL` is set to `BAAI/bge-small-en-v1.5`
+- **THEN** the engine SHALL download and load that model instead of the default
@@ -0,0 +1,183 @@
+# Go Client
+
+## Purpose
+
+The Go client (`kb`) provides a command-line interface for interacting with the knowledge base engine, supporting search, document ingestion, job tracking, document management, tag management, and status display.
+
+## Requirements
+
+### Requirement: Single static binary with zero runtime dependencies
+
+The Go client SHALL compile to a single static binary with no runtime dependencies. It SHALL support cross-compilation for Linux (amd64, arm64), macOS (amd64, arm64), and Windows (amd64).
+
+#### Scenario: Install on a clean machine
+- **WHEN** a user downloads the `kb` binary for their platform
+- **THEN** they SHALL be able to run it immediately with no additional installs (no Python, no Docker, no shared libraries)
+
+---
+
+### Requirement: Client configuration
+
+The client SHALL read configuration from `~/.kb/client.yaml`. Configuration values SHALL be overridable via environment variables and CLI flags. Precedence: CLI flags > environment variables > config file > defaults.
+
+#### Scenario: Default configuration
+- **WHEN** no config file exists and no env vars or flags are set
+- **THEN** the client SHALL use defaults: engine URL `http://localhost:8000`, no API key, format `human`
+
+#### Scenario: Config file
+- **WHEN** `~/.kb/client.yaml` contains `engine_url: https://kb.example.com`
+- **THEN** the client SHALL use that URL for all API requests
+
+#### Scenario: Environment variable override
+- **WHEN** `KB_ENGINE_URL` is set
+- **THEN** it SHALL override the config file value
+
+#### Scenario: CLI flag override
+- **WHEN** the user passes `--engine https://other.host:8000`
+- **THEN** it SHALL override both the config file and environment variable
+
+#### Scenario: Engine unreachable
+- **WHEN** the client cannot connect to the engine URL
+- **THEN** it SHALL print a clear error message (e.g., "Cannot reach engine at http://localhost:8000 — is it running?") and exit with a non-zero code
+
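The precedence rule (CLI flags > environment variables > config file > defaults) can be sketched as a layered lookup. The client itself is written in Go; Python is used here only to illustrate the ordering. Only `KB_ENGINE_URL` appears in the spec, so the other environment names and `resolve_config` itself are assumptions.

```python
DEFAULTS = {"engine_url": "http://localhost:8000", "api_key": None, "format": "human"}

# Only KB_ENGINE_URL is named in the spec; the other two are assumed.
ENV_NAMES = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY", "format": "KB_FORMAT"}


def resolve_config(flags, env, file_values):
    """Layer settings: CLI flags > environment > config file > defaults."""
    resolved = dict(DEFAULTS)
    for key in DEFAULTS:
        # Apply layers from lowest to highest priority so later ones win.
        if file_values.get(key) is not None:
            resolved[key] = file_values[key]
        if env.get(ENV_NAMES[key]):
            resolved[key] = env[ENV_NAMES[key]]
        if flags.get(key) is not None:
            resolved[key] = flags[key]
    return resolved
```

Taking `env` as a plain mapping rather than reading `os.environ` directly keeps the rule easy to test.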
+---
+
+### Requirement: Search command
+
+The client SHALL provide a `kb search <query>` command that sends the query to the engine and displays results.
+
+#### Scenario: Human-readable search output
+- **WHEN** the user runs `kb search "how to change oil"`
+- **THEN** the client SHALL POST the query to `/api/v1/search` and display the results in a human-readable format showing rank, score, document title, page/section, doc type, tags, and a text snippet
+
+#### Scenario: JSON search output
+- **WHEN** the user runs `kb search "query" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Search with filters
+- **WHEN** the user runs `kb search "brakes" --tags maintenance --type pdf --top 3`
+- **THEN** the client SHALL include the filters in the API request body
+
+#### Scenario: Search mode flags
+- **WHEN** the user runs `kb search "error" --fts-only`
+- **THEN** the client SHALL set `fts_only: true` in the request body
+
+---
+
+### Requirement: Add command (file and note ingestion)
+
+The client SHALL provide a `kb add` command that uploads files or notes to the engine for async ingestion. The client SHALL exit immediately after a successful upload.
+
+#### Scenario: Add a single file
+- **WHEN** the user runs `kb add report.pdf`
+- **THEN** the client SHALL upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
+
+#### Scenario: Add a file with tags
+- **WHEN** the user runs `kb add manual.pdf --tags car,maintenance`
+- **THEN** the client SHALL include the tags in the multipart upload metadata
+
+#### Scenario: Add a directory recursively
+- **WHEN** the user runs `kb add ~/documents/ --recursive`
+- **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
+
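The recursive discovery step in the scenario above might be sketched like this. The extension allow-list is hypothetical, since the spec refers to "supported files" without enumerating them, and the names `isSupported` and `discoverFiles` are invented for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// supportedExts is a hypothetical allow-list; the spec does not list the
// file types the engine accepts.
var supportedExts = map[string]bool{".pdf": true, ".md": true, ".txt": true}

// isSupported matches on the lower-cased file extension.
func isSupported(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

// discoverFiles walks the tree under root and collects supported files.
func discoverFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // abort on unreadable entries
		}
		if !d.IsDir() && isSupported(path) {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	files, err := discoverFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, f := range files {
		fmt.Println("would upload:", f)
	}
	fmt.Printf("discovered %d files\n", len(files))
}
```

Collecting the full list before uploading lets the client report the final count in a single "Queued: N files" line once the sequential uploads finish.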
+#### Scenario: Add a text note
+- **WHEN** the user runs `kb add --note "The server room is in building 3, floor 2"`
+- **THEN** the client SHALL submit the note text via `POST /api/v1/jobs` (multipart with note field), print "Queued: note", and exit
+
+#### Scenario: Add with JSON output
+- **WHEN** the user runs `kb add report.pdf --format json`
+- **THEN** the client SHALL output the JSON response from the engine including the job_id
+
+#### Scenario: File not found
+- **WHEN** the user runs `kb add nonexistent.pdf`
+- **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
+
+#### Scenario: Upload failure
+- **WHEN** the upload fails (network error, engine returns 4xx/5xx)
+- **THEN** the client SHALL print the error and exit with a non-zero code
+
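A possible shape for the upload path, covering the "file not found" scenario (fail before any API call) and the tagged upload. The multipart field names `file` and `tags`, and the helper `buildUpload`, are assumptions; the spec only says the upload is multipart and carries tags as metadata.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"os"
	"path/filepath"
	"strings"
)

// buildUpload assembles the multipart body for one file. Opening the file
// first means a missing path fails before any network activity.
// Field names "file" and "tags" are hypothetical.
func buildUpload(path string, tags []string) (*bytes.Buffer, string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, "", fmt.Errorf("cannot open %s: %w", path, err)
	}
	defer f.Close()

	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	part, err := w.CreateFormFile("file", filepath.Base(path))
	if err != nil {
		return nil, "", err
	}
	if _, err := io.Copy(part, f); err != nil {
		return nil, "", err
	}
	if len(tags) > 0 {
		if err := w.WriteField("tags", strings.Join(tags, ",")); err != nil {
			return nil, "", err
		}
	}
	// Close writes the trailing boundary; the body is invalid without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

func main() {
	// Temporary stand-in for report.pdf so the sketch runs anywhere.
	tmp, err := os.CreateTemp("", "report-*.pdf")
	if err != nil {
		panic(err)
	}
	defer os.Remove(tmp.Name())
	tmp.WriteString("placeholder content")
	tmp.Close()

	body, contentType, err := buildUpload(tmp.Name(), []string{"car", "maintenance"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("ready to POST %d bytes to /api/v1/jobs as %s\n", body.Len(), contentType)
}
```

This version buffers the whole file in memory, which keeps the sketch short; a client handling large documents might stream through an `io.Pipe` instead.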
+---
+
+### Requirement: Jobs command
+
+The client SHALL provide a `kb jobs` command to view the ingestion queue.
+
+#### Scenario: List all jobs
+- **WHEN** the user runs `kb jobs`
+- **THEN** the client SHALL fetch `GET /api/v1/jobs` and display a table of recent jobs showing ID, filename, status, and timestamp
+
+#### Scenario: Filter jobs by status
+- **WHEN** the user runs `kb jobs --status failed`
+- **THEN** the client SHALL pass the status filter and display only matching jobs
+
+#### Scenario: Job details
+- **WHEN** the user runs `kb jobs <id>`
+- **THEN** the client SHALL fetch `GET /api/v1/jobs/{id}` and display full job details including error message (if failed), document_id (if done), and chunk count
+
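For the human-readable job table, one option is the standard library's `text/tabwriter`. The `job` struct below is an assumption: its fields follow the ID, filename, status, and timestamp columns named in the scenario, but the ID type and the timestamp format are guesses.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// job holds the columns the listing shows. Treating ID as a string is an
// assumption; the engine may use integers.
type job struct {
	ID        string
	Filename  string
	Status    string
	CreatedAt string
}

// renderJobs lays the jobs out as an aligned, space-padded table.
func renderJobs(jobs []job) string {
	var buf bytes.Buffer
	tw := tabwriter.NewWriter(&buf, 0, 0, 2, ' ', 0)
	fmt.Fprintln(tw, "ID\tFILENAME\tSTATUS\tCREATED")
	for _, j := range jobs {
		fmt.Fprintf(tw, "%s\t%s\t%s\t%s\n", j.ID, j.Filename, j.Status, j.CreatedAt)
	}
	tw.Flush() // alignment is computed only on Flush
	return buf.String()
}

func main() {
	fmt.Print(renderJobs([]job{
		{ID: "1", Filename: "report.pdf", Status: "done", CreatedAt: "2026-03-26T21:00:00Z"},
		{ID: "2", Filename: "manual.pdf", Status: "failed", CreatedAt: "2026-03-26T21:05:00Z"},
	}))
}
```

The same helper pattern would fit `kb list` and `kb tags`, which also present tabular data.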
+---
+
+### Requirement: Document management commands
+
+The client SHALL provide commands to list, inspect, and remove documents.
+
+#### Scenario: List documents
+- **WHEN** the user runs `kb list`
+- **THEN** the client SHALL fetch `GET /api/v1/documents` and display a table of documents with ID, title, type, tags, chunk count, and date
+
+#### Scenario: List with filters
+- **WHEN** the user runs `kb list --type pdf --tags manual`
+- **THEN** the client SHALL pass filters as query parameters
+
+#### Scenario: Document info
+- **WHEN** the user runs `kb info <id>`
+- **THEN** the client SHALL fetch `GET /api/v1/documents/{id}` and display full document details
+
+#### Scenario: Remove a document
+- **WHEN** the user runs `kb remove <id>`
+- **THEN** the client SHALL prompt for confirmation, then send `DELETE /api/v1/documents/{id}` and display the result
+
+#### Scenario: Remove with skip confirmation
+- **WHEN** the user runs `kb remove <id> --yes`
+- **THEN** the client SHALL skip the confirmation prompt
+
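The confirmation behaviour for `kb remove` could be sketched as below. The prompt wording, the accepted answers (`y`/`yes`), and the function names are assumptions; the spec only requires a prompt that `--yes` bypasses.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// isAffirmative accepts "y" or "yes" in any case; anything else,
// including an empty answer, counts as "no".
func isAffirmative(answer string) bool {
	switch strings.ToLower(strings.TrimSpace(answer)) {
	case "y", "yes":
		return true
	}
	return false
}

// confirmRemoval returns true immediately when assumeYes is set (the
// --yes flag); otherwise it prompts and reads one line from in.
func confirmRemoval(in io.Reader, out io.Writer, id string, assumeYes bool) bool {
	if assumeYes {
		return true
	}
	fmt.Fprintf(out, "Remove document %s? [y/N] ", id)
	// On EOF, ReadString still returns whatever was read, so the error
	// can be ignored: an empty answer is treated as "no".
	answer, _ := bufio.NewReader(in).ReadString('\n')
	return isAffirmative(answer)
}

func main() {
	if !confirmRemoval(os.Stdin, os.Stdout, "42", false) {
		fmt.Println("Aborted.")
		return
	}
	fmt.Println("would send DELETE /api/v1/documents/42")
}
```

Defaulting to "no" on empty input is a conservative choice for a destructive command, particularly when stdin is not a terminal.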
+---
+
+### Requirement: Tag management commands
+
+The client SHALL provide commands to list and manage tags.
+
+#### Scenario: List tags
+- **WHEN** the user runs `kb tags`
+- **THEN** the client SHALL fetch `GET /api/v1/tags` and display tags with document counts
+
+#### Scenario: Add tags to a document
+- **WHEN** the user runs `kb tag <id> --add manual,v2`
+- **THEN** the client SHALL send `PUT /api/v1/documents/{id}/tags` with the add payload
+
+#### Scenario: Remove tags from a document
+- **WHEN** the user runs `kb tag <id> --remove draft`
+- **THEN** the client SHALL send `PUT /api/v1/documents/{id}/tags` with the remove payload
+
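The spec refers to an "add payload" and a "remove payload" without defining them. Purely as an assumption, the sketch below uses a JSON object with optional `add` and `remove` arrays, built from the comma-separated flag values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tagUpdate is a hypothetical payload shape for
// PUT /api/v1/documents/{id}/tags.
type tagUpdate struct {
	Add    []string `json:"add,omitempty"`
	Remove []string `json:"remove,omitempty"`
}

// splitTags turns "manual,v2" into ["manual", "v2"], dropping blanks.
func splitTags(csv string) []string {
	var tags []string
	for _, t := range strings.Split(csv, ",") {
		if t = strings.TrimSpace(t); t != "" {
			tags = append(tags, t)
		}
	}
	return tags
}

// buildTagPayload serialises the --add and --remove flag values.
func buildTagPayload(add, remove string) (string, error) {
	b, err := json.Marshal(tagUpdate{Add: splitTags(add), Remove: splitTags(remove)})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Equivalent of: kb tag <id> --add manual,v2
	payload, err := buildTagPayload("manual,v2", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(payload)
}
```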
+---
+
+### Requirement: Status command
+
+The client SHALL provide a `kb status` command to display engine status.
+
+#### Scenario: Display engine status
+- **WHEN** the user runs `kb status`
+- **THEN** the client SHALL fetch `GET /api/v1/status` and display model name, embedding dimensions, GPU info, document counts by type, total chunks, database size, and queue status
+
+---
+
+### Requirement: Global output format flag
+
+All commands SHALL support a `--format` flag accepting `human` (default) or `json`. The default MAY be changed via the `default_format` config value.
+
+#### Scenario: JSON output on any command
+- **WHEN** the user passes `--format json` to any command
+- **THEN** the client SHALL output the raw JSON response from the engine without human formatting
+
+#### Scenario: Human output (default)
+- **WHEN** the user runs any command without `--format`
+- **THEN** the client SHALL format the response in a human-readable table or structured text output
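
Resolving the effective output format might look like the following sketch. It assumes that an explicit `--format` flag takes priority over the `default_format` config value and that unknown values are rejected; neither detail is spelled out in the requirement, and `resolveFormat` is a hypothetical helper.

```go
package main

import "fmt"

// resolveFormat picks the output format. Assumed precedence:
// --format flag, then default_format from config, then "human".
func resolveFormat(flagValue, configDefault string) (string, error) {
	format := "human"
	if configDefault != "" {
		format = configDefault
	}
	if flagValue != "" {
		format = flagValue
	}
	switch format {
	case "human", "json":
		return format, nil
	}
	return "", fmt.Errorf("unsupported format %q (want human or json)", format)
}

func main() {
	// No flag given, config sets default_format: json.
	format, err := resolveFormat("", "json")
	if err != nil {
		panic(err)
	}
	fmt.Println("effective format:", format)
}
```

Validating the value once, before any request is sent, avoids making an API call whose response the client could not render.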