Remove AMD ROCm support — CPU and NVIDIA only

BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image build/push from the release pipeline. Remove AMD quick-start and ROCm references from README and DEVELOPER docs. Update docker-deployment and developer-docs specs to reflect CPU + NVIDIA only. The ROCm variant added significant complexity (4.2GB torch wheel, >20GB container) with limited usage. Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch nvidia and rocm Dockerfiles from onnxruntime to torch
2026-04-06 16:39:37 +01:00 · 2026-04-06 16:13:41 +01:00 · 2026-04-06 16:12:58 +01:00 · 2026-04-04 22:50:19 +01:00 · 2026-04-04 22:43:42 +01:00 · 2026-04-04 22:34:47 +01:00
139 changed files with 8116 additions and 507 deletions
@@ -1,2 +1,9 @@
 examples/
-.claude/
+.claude/
 __pycache__/
 engine/data/
 TMP/
 .env
 .venv/
 test_mcp_client.py
@@ -0,0 +1,96 @@
 # Developer Guide
 Instructions for building from source, releasing, and contributing to kb.
 ## Building from source
 ### Engine
 ```bash
 cd engine
 # NVIDIA GPU
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
 ```
 ### Client
 ```bash
 cd client
 make build    # produces ./kb binary
 make all      # or cross-compile: dist/kb-{os}-{arch}
 ```
 ## Building and releasing
 Client and engine are versioned independently via `client/VERSION` and `engine/VERSION`. Each has its own release script and git tag prefix.
 ### Release client
 ```bash
 ./release-client.sh --gitea              # patch bump, release via Gitea
 ./release-client.sh --github --minor     # minor bump, release via GitHub
 ./release-client.sh --gitea --no-increment  # release current version as-is
 ./release-client.sh --gitea --dry-run    # preview without doing anything
 ```
 Creates tag `client-vX.Y.Z`, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.
 The client embeds a `MinEngineVersion` (from `client/MIN_ENGINE_VERSION`) and will hard-fail if the connected engine is too old.
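As a rough illustration of the minimum-version gate described above, the sketch below compares two `X.Y.Z` strings numerically. It is not the client's actual code: `parseVersion` and `engineTooOld` are hypothetical names, and the sketch assumes plain three-part versions with no pre-release suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "X.Y.Z" (optionally prefixed with "v") into three integers.
// Hypothetical helper for illustration only.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(s), "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q: want X.Y.Z", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// engineTooOld reports whether the engine version is strictly lower than the
// minimum the client requires, comparing major, then minor, then patch.
func engineTooOld(engine, minimum string) (bool, error) {
	e, err := parseVersion(engine)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range e {
		if e[i] != m[i] {
			return e[i] < m[i], nil
		}
	}
	return false, nil // equal versions are acceptable
}

func main() {
	for _, engine := range []string{"3.1.4", "3.2.0", "3.10.0"} {
		old, err := engineTooOld(engine, "3.2.0")
		fmt.Println(engine, old, err)
	}
}
```

Comparing the parts as integers rather than as strings matters once a component reaches two digits: `3.10.0` is newer than `3.2.0`, although it sorts lower lexically.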
 ### Release engine
 ```bash
 ./release-engine.sh --gitea              # patch bump, release via Gitea
 ./release-engine.sh --github --minor     # minor bump, release via GitHub
 ./release-engine.sh --gitea --no-increment  # release current version as-is
 ./release-engine.sh --gitea --dry-run    # preview without doing anything
 ```
 Creates tag `engine-vX.Y.Z`, builds NVIDIA and CPU Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
 ### Checking versions
 ```bash
 # Client
 kb --version
 # Engine
 curl http://localhost:8000/api/v1/status | jq .version
 ```
 ### Docker images
 Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:
 - `engine-v2.0.6-nvidia` / `engine-v2.0.6-cpu` — versioned
 - `latest-nvidia` / `latest-cpu` — latest release
 Override the registry and org via environment variables:
 ```bash
 REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github
 ```
 ## API reference
 All endpoints are under `/api/v1/`. Requires `Authorization: Bearer <key>` header when `KB_API_KEY` is set.
 | Method | Endpoint | Description |
 |---|---|---|
 | `GET` | `/health` | Health check (bypasses auth) |
 | `POST` | `/search` | Hybrid search (JSON body) |
 | `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
 | `GET` | `/jobs` | List ingestion jobs |
 | `GET` | `/jobs/{id}` | Job details |
 | `GET` | `/documents` | List documents |
 | `GET` | `/documents/{id}` | Document details with chunks |
 | `GET` | `/documents/{id}/file` | Download original file |
 | `DELETE` | `/documents/{id}` | Remove a document (and stored file) |
 | `PUT` | `/documents/{id}/tags` | Add/remove tags |
 | `GET` | `/tags` | List all tags |
 | `GET` | `/status` | Engine status, GPU info, DB stats |
 | `POST` | `/reindex` | Re-embed all chunks |
 | `POST` | `/bulk/delete` | Bulk delete documents by filter |
 | `POST` | `/bulk/tags` | Bulk add/remove tags by filter |
 | `POST` | `/bulk/set-tags` | Bulk replace tags by filter |
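
The auth rule above (Bearer header required when `KB_API_KEY` is set, `/health` exempt) could be exercised from Go roughly as follows. This is a sketch under assumptions: `newEngineRequest` is a hypothetical helper, the server is a local stand-in rather than the real engine, and the `401` response for a missing key is assumed, since the table does not state which status the engine returns.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newEngineRequest builds a request for an endpoint under /api/v1 and attaches
// the Bearer header only when an API key is configured. Hypothetical helper.
func newEngineRequest(method, baseURL, endpoint, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(method, baseURL+"/api/v1"+endpoint, nil)
	if err != nil {
		return nil, err
	}
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	// Stand-in server mimicking the documented rule; NOT the real engine.
	// The 401 status is an assumption made for this sketch.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/health" && r.Header.Get("Authorization") != "Bearer secret" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	for _, key := range []string{"", "secret"} {
		req, err := newEngineRequest(http.MethodGet, srv.URL, "/status", key)
		if err != nil {
			fmt.Println("build error:", err)
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("request error:", err)
			return
		}
		resp.Body.Close()
		fmt.Printf("key=%q status=%d\n", key, resp.StatusCode)
	}
}
```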
@@ -0,0 +1,174 @@
 # MCP Server (Agent Integration)
 The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.
 ## Start the MCP server
 The compose files include a `kb-mcp` service alongside the engine. Set `KB_MCP_API_KEY` to require Bearer token auth from connecting agents:
 ```bash
 KB_API_KEY=your-engine-key KB_MCP_API_KEY=your-agent-key \
  docker compose -f engine/compose.nvidia.yaml up -d
 ```
 Or run the MCP server standalone:
 ```bash
 docker run -d --name kb-mcp \
  -p 3000:3000 \
  -e KB_ENGINE_URL=http://your-engine-host:8000 \
  -e KB_API_KEY=your-engine-key \
  -e KB_MCP_API_KEY=your-agent-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/mcp:latest
 ```
 ## MCP tools
 | Tool | Description |
 |---|---|
 | `kb_search` | Hybrid search with optional tag/type filters |
 | `kb_addnote` | Add a text note (queued for async ingestion) |
 | `kb_update_note` | Update an existing note in place |
 | `kb_get` | Get document details by ID or source path |
 | `kb_delete` | Permanently delete a document by ID |
 | `kb_status` | Engine health and statistics |
 | `kb_jobs` | Ingestion queue status |
 | `kb_upload_start` | Start a chunked file upload |
 | `kb_upload_chunk` | Upload a base64-encoded file chunk |
 | `kb_upload_finish` | Finish upload and submit for ingestion |
 | `kb_bulk_delete` | Delete multiple documents matching a filter |
 | `kb_bulk_tags` | Add/remove tags on multiple documents |
 | `kb_bulk_set_tags` | Replace all tags on multiple documents |
 ## Organising with tags
 Use tags to separate agent data from user documents. For example, an agent can tag all its notes with `agent:mybot` and filter by that tag when searching. This is a naming convention — configure it in your agent's system prompt. No special server-side enforcement is needed.
 Bulk tools accept filter-based selection (by tags, doc_type, ID list, or ID range) so agents can manage thousands of documents in a single call instead of looping. A safety threshold (default 70%, configurable via engine env var `KB_BULK_SAFETY_PERCENT`) prevents accidental mass operations unless `force: true` is set.
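To make the safety threshold concrete, here is a minimal sketch of the check as described: an operation is rejected when it would affect more than the configured percentage of all documents, unless `force` is set, and a threshold of 0 disables the check. The function name `exceedsSafety` is hypothetical, and the engine's real implementation may differ, for example in how it rounds.

```go
package main

import "fmt"

// exceedsSafety reports whether a bulk operation touching `matched` of `total`
// documents should be rejected under a threshold of `percent`.
// Integer cross-multiplication avoids floating-point rounding at the boundary.
func exceedsSafety(matched, total, percent int, force bool) bool {
	if force || percent == 0 || total == 0 {
		return false // forced, check disabled, or nothing to protect
	}
	return matched*100 > total*percent
}

func main() {
	fmt.Println(exceedsSafety(70, 100, 70, false)) // exactly 70% is not "more than"
	fmt.Println(exceedsSafety(71, 100, 70, false)) // above the threshold
	fmt.Println(exceedsSafety(71, 100, 70, true))  // force overrides the check
	fmt.Println(exceedsSafety(100, 100, 0, false)) // 0 disables the check
}
```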
 ## MCP server configuration
 | Variable | Default | Description |
 |---|---|---|
 | `KB_ENGINE_URL` | `http://localhost:8000` | Engine API URL |
 | `KB_API_KEY` | (none) | Engine API key |
 | `KB_MCP_API_KEY` | (none) | Bearer token required from agents (disabled if unset) |
 | `KB_MCP_PORT` | `3000` | Port to listen on |
 ## Connecting AI coding tools
 The kb MCP server uses **Streamable HTTP** transport at `http://your-host:3000/mcp`. Below are configuration examples for popular AI coding tools.
 ### Claude Code (CLI / Desktop / Web)
 Add the server to your project or user settings:
 ```bash
 claude mcp add kb-server --transport http http://localhost:3000/mcp
 ```
 Or add it manually to `.mcp.json` in the project root (project scope) or `~/.claude.json` (user scope):
 ```json
 {
  "mcpServers": {
    "kb-server": {
      "type": "http",
      "url": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your-agent-key"
      }
    }
  }
 }
 ```
 ### VS Code (GitHub Copilot)
 Add to your `.vscode/settings.json` (workspace) or user settings:
 ```json
 {
  "mcp": {
    "servers": {
      "kb-server": {
        "type": "http",
        "url": "http://localhost:3000/mcp",
        "headers": {
          "Authorization": "Bearer your-agent-key"
        }
      }
    }
  }
 }
 ```
 Or add to `.vscode/mcp.json` in your workspace:
 ```json
 {
  "servers": {
    "kb-server": {
      "type": "http",
      "url": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your-agent-key"
      }
    }
  }
 }
 ```
 ### Cursor
 Add to `.cursor/mcp.json` in your project root:
 ```json
 {
  "mcpServers": {
    "kb-server": {
      "type": "streamable-http",
      "url": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your-agent-key"
      }
    }
  }
 }
 ```
 ### Windsurf
 Add to `~/.codeium/windsurf/mcp_config.json`:
 ```json
 {
  "mcpServers": {
    "kb-server": {
      "serverUrl": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your-agent-key"
      }
    }
  }
 }
 ```
 ### JetBrains IDEs (IntelliJ, WebStorm, PyCharm, etc.)
 Add to `.junie/mcp.json` in your project root, or configure via **Settings > Tools > AI Assistant > MCP Servers**:
 ```json
 {
  "servers": {
    "kb-server": {
      "type": "http",
      "url": "http://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer your-agent-key"
      }
    }
  }
 }
 ```
@@ -2,33 +2,62 @@
 Personal knowledge base with hybrid search (full-text + semantic vector search).
-v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
+Client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration), a lightweight **Go CLI client**, and an **MCP server** for native agent integration.
 ## Architecture
 ```
 Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
                            ▲
 MCP Agents  ──MCP/HTTP──▶ MCP Server (Docker) ──┘
 ```
- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU or CPU-only support.
 - **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
 - **MCP Server**: Exposes kb operations as native MCP tools over Streamable HTTP. Runs as a separate Docker container alongside the engine. Use tags to scope agent data from user documents.
 - **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
 ## Quick start
 ### 1. Start the engine
 **From pre-built images** (recommended):
 ```bash
 cd engine
 # NVIDIA GPU
-KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
+docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia
-# AMD GPU (ROCm)
+# CPU only (no GPU required — smaller image)
-KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
+docker run -d --name kb-engine \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
 ```
-The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
+Or use a compose file from the repo:
 ```bash
 # NVIDIA GPU
 KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
 # CPU only
 KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
 ```
 See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.
 The engine will download the embedding model on first start (~90MB) and load it into memory (GPU or CPU). Check readiness:
 ```bash
 curl http://localhost:8000/api/v1/health
@@ -37,18 +66,32 @@ curl http://localhost:8000/api/v1/health
 ### 2. Install the client
-Build from source:
+**From a release** (recommended):
 Check [releases](https://gitea.dcglab.co.uk/steve/kb/releases) for the latest client tag, then:
 ```bash
-cd client
+# Set the version tag
-make build    # produces ./kb binary
+TAG=client-v3.0.0
 # Linux (amd64)
 curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64
 # Linux (arm64)
 curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64
 # macOS (Apple Silicon)
 curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64
 # macOS (Intel)
 curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64
 # Then install
 chmod +x kb
 sudo mv kb /usr/local/bin/
 ```
-Or cross-compile for all platforms:
+See [DEVELOPER.md](DEVELOPER.md) to build the client from source.
 ```bash
 make all      # produces dist/kb-{os}-{arch} binaries
 ```
 ### 3. Configure the client
@@ -65,10 +108,13 @@ Override via environment variables (`KB_ENGINE_URL`, `KB_API_KEY`) or CLI flags
 ### 4. Use it
 ```bash
-# Add documents (async — uploads and exits immediately)
+# Add notes
-kb add ~/docs/manual.pdf --tags admin
+kb addnote "Always restart nginx after config changes"
-kb add ~/notes/ --recursive
+kb addnote "Server room is building 3, floor 2" --tags ops
-kb add --note "Always restart nginx after config changes" --tags ops
+
 # Add files (async — uploads and exits immediately)
 kb addfile ~/docs/manual.pdf --tags admin
 kb addfile ~/notes/ --recursive
 # Check ingestion progress
 kb jobs
@@ -77,13 +123,22 @@ kb jobs
 kb search "how to install git"
 kb search "deploy process" --tags ops --type pdf
 # Update a note in place
 kb updatenote 42 "revised note content"
 # Manage
 kb list
 kb info 1
 kb tags
 kb tag 1 --add important
 kb export 1 -o manual.pdf    # download original file
 kb remove 3 --yes
 kb status
 # Bulk operations
 kb bulk-remove --tags "draft,old" --type note --yes
 kb bulk-tag --type note --add "archived" --yes
 kb bulk-set-tags --tags "old-scheme" --set "new-scheme" --yes
 ```
 ## How it works
@@ -100,12 +155,15 @@ The engine is configured via environment variables (set in the compose file or v
 |---|---|---|
 | `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
 | `KB_MODEL` | `all-MiniLM-L6-v2` | HuggingFace embedding model name |
-| `KB_DEVICE` | `auto` | Embedding device: `auto`, `cpu`, or `cuda` |
+| `KB_DEVICE` | `auto` | Embedding/search device: `auto`, `cpu`, or `cuda` |
-| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device |
+| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device: `auto`, `cpu`, or `cuda` |
 | `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
 | `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
 | `KB_BULK_SAFETY_PERCENT` | `70` | Bulk operations affecting more than this % of documents are rejected unless `force` is set (0 disables) |
 | `KB_PORT` | `8000` | Port to expose |
-| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable) |
+| `KB_HOST` | `0.0.0.0` | Host to bind to |
 | `HF_HUB_OFFLINE` | (none) | Set to `1` to prevent model downloads (use cached only) |
 | `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable, not used by engine) |
 ## Data portability
@@ -119,77 +177,14 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
 ```
-Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
+Data is device-agnostic — you can ingest on NVIDIA and serve from CPU (or vice versa) with the same data directory.
-## API reference
+## MCP server (agent integration)
-All endpoints are under `/api/v1/`. Requires `Authorization: Bearer <key>` header when `KB_API_KEY` is set.
+The MCP server exposes kb operations as native MCP tools over Streamable HTTP, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI. MCP.md also includes setup guides for Claude Code, VS Code, Cursor, Windsurf, and JetBrains IDEs.
-| Method | Endpoint | Description |
+See **[MCP.md](MCP.md)** for full details — server setup, available tools, tag-based organisation, configuration, and client examples.
 |---|---|---|
 | `GET` | `/health` | Health check (bypasses auth) |
 | `POST` | `/search` | Hybrid search (JSON body) |
 | `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
 | `GET` | `/jobs` | List ingestion jobs |
 | `GET` | `/jobs/{id}` | Job details |
 | `GET` | `/documents` | List documents |
 | `GET` | `/documents/{id}` | Document details with chunks |
 | `DELETE` | `/documents/{id}` | Remove a document |
 | `PUT` | `/documents/{id}/tags` | Add/remove tags |
 | `GET` | `/tags` | List all tags |
 | `GET` | `/status` | Engine status, GPU info, DB stats |
 | `POST` | `/reindex` | Re-embed all chunks |
-## Building and releasing
+## Agent skill
-Versioning is managed via `client/VERSION` and `engine/VERSION` files. The release script bumps these, builds all artifacts, tags, and publishes in one step.
+If you cannot use the MCP server, or simply prefer agent skills, see `SKILL.md` for the skill definition.
 ### Release
 ```bash
 ./release.sh --gitea              # patch bump (e.g. 2.0.0 → 2.0.1), release via Gitea
 ./release.sh --github --minor     # minor bump (e.g. 2.0.1 → 2.1.0), release via GitHub
 ./release.sh --gitea --major      # major bump (e.g. 2.1.0 → 3.0.0)
 ./release.sh --gitea --no-increment  # release current version as-is
 ./release.sh --gitea --dry-run    # preview without doing anything
 ```
 The script will:
 1. Bump the version in both `client/VERSION` and `engine/VERSION` (unless `--no-increment`)
 2. Build Go client binaries for all platforms (linux/darwin/windows, amd64/arm64)
 3. Build Docker engine images for NVIDIA and ROCm
 4. Commit the version bump, create an annotated git tag, and push
 5. Create a release (with client binaries attached) via `tea` or `gh`
 6. Push Docker images to the registry
 ### Checking versions
 ```bash
 # Client
 kb --version
 # Engine
 curl http://localhost:8000/api/v1/status | jq .version
 ```
 ### Docker images
 Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:
 - `v2.1.0-nvidia` / `v2.1.0-rocm` — versioned
 - `latest-nvidia` / `latest-rocm` — latest release
 Override the registry and org via environment variables:
 ```bash
 REGISTRY=ghcr.io IMAGE_ORG=myorg ./release.sh --github
 ```
 ## Future: ROCm runtime migration
 The `onnxruntime-rocm` execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the **MIGraphX execution provider** as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from `onnxruntime-rocm` to `onnxruntime` with the MIGraphX EP and install the `migraphx` runtime libraries instead.
 ## Claude Code skill
 This tool is designed to be wrapped as a Claude Code skill. See `SKILL.md` for the skill definition.
@@ -1,6 +1,6 @@
 # kb-search skill
-Search the user's personal knowledge base containing PDFs, markdown documents, code snippets, and text notes.
+Search, manage, and add to the user's personal knowledge base containing PDFs, Word docs, HTML, markdown, code files, and text notes.
 ## When to use
@@ -8,10 +8,18 @@ Search the user's personal knowledge base containing PDFs, markdown documents, c
 - User explicitly says "check my notes", "search kb", "look in my knowledge base", "what do my docs say about..."
 - User references documents or notes they've previously stored
 - User asks "how do I..." style questions that their knowledge base likely covers
 - User wants to save a note, add a file, or manage their knowledge base
-## Available commands
+## Adding notes
-### Search (primary)
+```bash
 kb addnote "remember to update DNS records"                # add a note
 kb addnote "server room is building 3, floor 2" --tags ops # add a tagged note
 ```
 The note text must be a single quoted argument.
 ## Search (primary use case)
 ```bash
 kb search "<query>" --top 10 --format json
@@ -20,25 +28,109 @@ kb search "<query>" --top 10 --format json
 Returns JSON with ranked results combining full-text and semantic search.
 **Flags:**
- `--top N` — number of results (default: 10)
+- `-n, --top N` — number of results (default: 10)
 - `--tags tag1,tag2` — filter by tags (AND logic)
 - `--type pdf|markdown|code|note` — filter by document type
- `--format json|human` — output format (always use json)
+- `--format json|human` — output format (always use json for parsing)
 - `--fts-only` — keyword search only (skip semantic)
 - `--vec-only` — semantic search only (skip keyword)
 - `--threshold FLOAT` — minimum score cutoff
-### Other useful commands
+## Adding files
 ```bash
-kb list --format json                    # List all documents
+kb addfile report.pdf                           # single file
-kb list --type pdf --format json         # List only PDFs
+kb addfile report.pdf --tags admin,reference    # with tags
-kb tags --format json                    # List tags with counts
+kb addfile ~/docs/ --recursive                  # directory (recursive)
-kb info <doc_id> --format json           # Document details
+kb addfile ~/docs/ --recursive --tags reference # directory with tags
 kb status --format json                  # DB stats
 ```
-## Output format (search)
+Supported file types: `.pdf`, `.docx`, `.html`, `.md`, `.txt`, `.py`, `.sh`, `.go`. Unsupported extensions are rejected before upload.
 **Flags:**
 - `--tags tag1,tag2` — tags (comma-separated)
 - `-r, --recursive` — recursively add directory contents
 ## Document management
 ```bash
 kb list --format json                    # list all documents
 kb list --type pdf --format json         # filter by type
 kb list --tags admin --format json       # filter by tags
 kb info <doc_id> --format json           # document details with chunks
 kb export <doc_id> -o file.pdf           # download original file
 kb remove <doc_id>                       # remove (prompts for confirmation)
 kb remove <doc_id> --yes                 # remove without confirmation
 ```
 ## Tag management
 ```bash
 kb tags --format json                    # list all tags with counts
 kb tag <doc_id> --add important,ops      # add tags to a document
 kb tag <doc_id> --remove draft           # remove tags from a document
 ```
 ## Bulk operations
 Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.
 **Filter flags (shared across all bulk commands):**
 - `--tags tag1,tag2` — match documents with ALL specified tags
 - `--type pdf|note|...` — match by document type
 - `--ids 1,5,12` — match specific document IDs
 - `--from-id N` — match documents with id >= N
 - `--to-id N` — match documents with id <= N
 - `--force` / `-f` — override safety threshold (blocks operations affecting >70% of all documents)
 - `--yes` / `-y` — skip confirmation prompt
 ```bash
 # Bulk delete
 kb bulk-remove --tags "draft,old" --type note --yes             # delete matching docs
 kb bulk-remove --from-id 10 --to-id 50 --yes                   # delete by ID range
 kb bulk-remove --ids "3,7,12" --yes                             # delete specific IDs
 # Bulk tag add/remove
 kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
 kb bulk-tag --type note --add "archived" --yes                  # tag all notes
 # Bulk replace tags
 kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes
 ```
 All bulk commands return a summary: matched count, succeeded count, failed count, and errors.
 A safety threshold prevents accidentally affecting more than 70% of documents unless `--force` is used.
 The threshold is configurable on the engine via `KB_BULK_SAFETY_PERCENT` (integer 0-100, default 70; 0 disables).
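The filter semantics in this section (every supplied flag must match, and `--tags` requires all listed tags) can be sketched as a predicate. `Doc`, `Filter`, and `Matches` are hypothetical names for illustration; the sketch treats zero values as "flag not set", which is an assumption and not necessarily how the engine represents an absent filter.

```go
package main

import "fmt"

// Doc is a hypothetical, minimal view of a stored document.
type Doc struct {
	ID   int
	Type string
	Tags []string
}

// Filter mirrors the bulk filter flags; zero values mean "not set".
type Filter struct {
	Tags   []string // --tags: document must carry ALL of these
	Type   string   // --type
	IDs    []int    // --ids
	FromID int      // --from-id (id >= N)
	ToID   int      // --to-id (id <= N)
}

func hasTag(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

func hasID(ids []int, want int) bool {
	for _, id := range ids {
		if id == want {
			return true
		}
	}
	return false
}

// Matches returns true only if the document satisfies every filter that is set.
func (f Filter) Matches(d Doc) bool {
	for _, t := range f.Tags {
		if !hasTag(d.Tags, t) {
			return false
		}
	}
	if f.Type != "" && d.Type != f.Type {
		return false
	}
	if len(f.IDs) > 0 && !hasID(f.IDs, d.ID) {
		return false
	}
	if f.FromID > 0 && d.ID < f.FromID {
		return false
	}
	if f.ToID > 0 && d.ID > f.ToID {
		return false
	}
	return true
}

func main() {
	docs := []Doc{
		{ID: 3, Type: "note", Tags: []string{"draft", "old"}},
		{ID: 7, Type: "note", Tags: []string{"draft"}},
		{ID: 12, Type: "pdf", Tags: []string{"draft", "old"}},
	}
	// Equivalent in spirit to: --tags "draft,old" --type note
	f := Filter{Tags: []string{"draft", "old"}, Type: "note"}
	for _, d := range docs {
		fmt.Println(d.ID, f.Matches(d))
	}
}
```

Only document 3 matches here: document 7 lacks the `old` tag and document 12 has the wrong type.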
 ## Jobs (ingestion queue)
 ```bash
 kb jobs --format json                    # list recent jobs
 kb jobs --status failed --format json    # filter by status
 kb jobs <job_id> --format json           # job details
 ```
 ## Examples
 ```bash
 kb examples                              # show common usage examples
 ```
 ## Engine status and maintenance
 ```bash
 kb status --format json                  # engine status, GPU info, DB stats
 kb reindex --yes                         # re-embed all chunks (skip confirmation)
 ```
 ## Global flags
 All commands support:
 - `--format json|human` — output format (always use `json` for machine parsing)
 - `--engine <url>` — engine API URL (default: http://localhost:8000)
 - `--api-key <key>` — API key for authentication
 ## Search output format
 ```json
 {
@@ -47,18 +139,14 @@ kb status --format json                  # DB stats
    {
      "chunk_id": 1423,
      "score": 0.031,
      "score_breakdown": {"fts": 0.016, "vector": 0.015},
      "text": "To install the latest version of git from source...",
-      "source": {
+      "chunk_index": 3,
-        "document_id": 42,
+      "chunk_metadata": {"page": 12},
-        "title": "Git Admin Guide",
+      "title": "Git Admin Guide",
-        "path": "/home/user/docs/git-admin.pdf",
+      "doc_type": "pdf",
-        "type": "pdf",
+      "source_path": "/home/user/docs/git-admin.pdf",
-        "page": 12,
+      "created_at": "2026-03-15T10:30:00",
-        "chunk_index": 3,
+      "tags": ["git", "admin"]
        "total_chunks": 28,
        "tags": ["git", "admin"]
      }
    }
  ],
  "total_matches": 47,
@@ -66,7 +154,7 @@ kb status --format json                  # DB stats
 }
 ```
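For agents or tools that consume this output from Go, a single result object could be decoded along these lines. The struct covers only the per-result fields shown in the example above. `SearchResult`, `decodeResult`, and `pageOf` are hypothetical names, and the response envelope is deliberately left out because its keys are not fully visible here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchResult models one entry of the search output, using the field names
// from the documented example. Hypothetical type for illustration.
type SearchResult struct {
	ChunkID        int                    `json:"chunk_id"`
	Score          float64                `json:"score"`
	ScoreBreakdown map[string]float64     `json:"score_breakdown"`
	Text           string                 `json:"text"`
	ChunkIndex     int                    `json:"chunk_index"`
	ChunkMetadata  map[string]interface{} `json:"chunk_metadata"`
	Title          string                 `json:"title"`
	DocType        string                 `json:"doc_type"`
	SourcePath     string                 `json:"source_path"`
	CreatedAt      string                 `json:"created_at"`
	Tags           []string               `json:"tags"`
}

// decodeResult parses one result object from JSON.
func decodeResult(data []byte) (SearchResult, error) {
	var r SearchResult
	err := json.Unmarshal(data, &r)
	return r, err
}

// pageOf returns the page number when chunk_metadata carries one
// (documented as present only for PDF documents).
func pageOf(r SearchResult) (int, bool) {
	v, ok := r.ChunkMetadata["page"].(float64) // JSON numbers decode as float64
	if !ok {
		return 0, false
	}
	return int(v), true
}

// sample reuses the values from the documented example.
const sample = `{
  "chunk_id": 1423,
  "score": 0.031,
  "score_breakdown": {"fts": 0.016, "vector": 0.015},
  "text": "To install the latest version of git from source...",
  "chunk_index": 3,
  "chunk_metadata": {"page": 12},
  "title": "Git Admin Guide",
  "doc_type": "pdf",
  "source_path": "/home/user/docs/git-admin.pdf",
  "created_at": "2026-03-15T10:30:00",
  "tags": ["git", "admin"]
}`

func main() {
	r, err := decodeResult([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	if page, ok := pageOf(r); ok {
		fmt.Printf("%s (page %d), score %.3f\n", r.Title, page, r.Score)
	} else {
		fmt.Printf("%s, score %.3f\n", r.Title, r.Score)
	}
}
```

Keeping `chunk_metadata` as a loose map reflects that its keys vary by document type, so a missing `page` is handled as a normal case and not as an error.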
-## How to answer
+## How to answer search queries
 1. Run `kb search "<query>" --top 10 --format json`
 2. Read the returned chunks
@@ -93,7 +181,7 @@ Query 2: kb search "git merge explanation" --top 5 --format json
 Query 3: kb search "git rebase vs merge" --top 5 --format json
 ```
-## Filtering
+## Filtering tips
 Use filters when the question implies a specific domain:
@@ -101,10 +189,36 @@ Use filters when the question implies a specific domain:
 - From a specific topic → `--tags <topic>`
 - Check available tags first: `kb tags --format json`
 ## Updating notes
 ```bash
 kb updatenote 42 "revised note content"           # update note by ID
 ```
 Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.
 ## MCP server (agent integration)
 For agent-to-agent integration, kb provides an MCP server alongside the CLI. The MCP server
 exposes the same operations as native MCP tools over Streamable HTTP transport, which agents
 can connect to directly without subprocess overhead.
 **MCP tools:** `kb_search`, `kb_addnote`, `kb_update_note`, `kb_get`, `kb_delete`, `kb_status`,
 `kb_jobs`, `kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`, `kb_bulk_delete`,
 `kb_bulk_tags`, `kb_bulk_set_tags`.
 Use tags to separate agent data from user documents (e.g. tag all agent notes with
 `agent:mybot` and filter by that tag when searching). This convention is communicated
 via system prompt — no special server-side enforcement needed.
 If the kb engine is already running via Docker Compose, add the MCP server by deploying the
 `kb-mcp` service from the same compose file. Agents connect to it on port 3000 (default).
 ## Important notes
 - Always use `--format json` for machine parsing
 - The `score` field is relative, not absolute — compare scores within a result set
- `source.page` is only present for PDF documents
+- `chunk_metadata.page` is only present for PDF documents
- `source.section_header` is only present for markdown documents with headers
+- `chunk_metadata.section_header` is only present for markdown documents with headers
 - Results are already ranked by relevance (hybrid FTS + vector search)
 - Duplicate files are detected at upload time (HTTP 409) — the client handles this gracefully
@@ -0,0 +1 @@
 3.2.0
@@ -1,5 +1,6 @@
 VERSION ?= $(shell cat VERSION 2>/dev/null || echo "dev")
-LDFLAGS := -ldflags "-s -w -X github.com/kb-search/kb/cmd.Version=$(VERSION)"
+MIN_ENGINE_VERSION ?= $(shell cat MIN_ENGINE_VERSION 2>/dev/null || echo "dev")
 LDFLAGS := -ldflags "-s -w -X github.com/kb-search/kb/cmd.Version=$(VERSION) -X github.com/kb-search/kb/cmd.MinEngineVersion=$(MIN_ENGINE_VERSION)"
 PLATFORMS := linux/amd64 linux/arm64 darwin/amd64 darwin/arm64 windows/amd64
@@ -1 +1 @@
-2.0.5
+3.2.0
@@ -6,6 +6,7 @@ import (
 	"net/http"
 	"os"
 	"path/filepath"
 	"sort"
 	"strings"
 	"github.com/kb-search/kb/internal/api"
@@ -39,93 +40,25 @@ var supportedExts = map[string]bool{
 	".go":   true,
 }
-var addCmd = &cobra.Command{
+var addfileCmd = &cobra.Command{
-	Use:   "add <path>",
+	Use:   "addfile <path>",
-	Short: "Add a document or directory to the knowledge base",
+	Short: "Upload a file or directory to the knowledge base",
-	Args:  cobra.MaximumNArgs(1),
+	Args:  cobra.ExactArgs(1),
-	RunE:  runAdd,
+	RunE:  runAddfile,
 }
 func init() {
-	addCmd.Flags().String("tags", "", "tags (comma-separated)")
+	addfileCmd.Flags().String("tags", "", "tags (comma-separated)")
-	addCmd.Flags().String("type", "", "document type")
+	addfileCmd.Flags().BoolP("recursive", "r", false, "recursively add directory contents")
-	addCmd.Flags().BoolP("recursive", "r", false, "recursively add directory contents")
+	rootCmd.AddCommand(addfileCmd)
 	addCmd.Flags().String("note", "", "add a text note instead of a file")
 	addCmd.Flags().String("title", "", "title for the note")
 	rootCmd.AddCommand(addCmd)
 }
-func runAdd(cmd *cobra.Command, args []string) error {
+func runAddfile(cmd *cobra.Command, args []string) error {
 	tags, _ := cmd.Flags().GetString("tags")
 	docType, _ := cmd.Flags().GetString("type")
 	recursive, _ := cmd.Flags().GetBool("recursive")
 	note, _ := cmd.Flags().GetString("note")
 	title, _ := cmd.Flags().GetString("title")
 	client := api.NewClient()
 	// Note mode
 	if note != "" {
 		fields := map[string]string{
 			"note": note,
 		}
 		if title != "" {
 			fields["title"] = title
 		}
 		if tags != "" {
 			fields["tags"] = tags
 		}
 		if docType != "" {
 			fields["type"] = docType
 		}
 		resp, err := client.PostMultipart("/api/v1/jobs", fields, nil)
 		if err != nil {
 			fmt.Fprintln(os.Stderr, err)
 			os.Exit(1)
 		}
 		if resp.StatusCode == http.StatusConflict {
 			var result interface{}
 			if err := api.DecodeJSON(resp, &result); err != nil {
 				return fmt.Errorf("failed to decode response: %w", err)
 			}
 			if output.IsJSON() {
 				output.PrintJSON(result)
 			} else {
 				if m, ok := result.(map[string]interface{}); ok {
 					if docID, ok := m["document_id"].(float64); ok {
 						fmt.Printf("Already imported: %s (doc ID: %.0f)\n", m["title"], docID)
 					} else if jobID, ok := m["job_id"].(float64); ok {
 						fmt.Printf("Already queued: %s (job ID: %.0f)\n", m["title"], jobID)
 					}
 				}
 			}
 			return nil
 		}
 		if err := api.CheckError(resp); err != nil {
 			fmt.Fprintln(os.Stderr, err)
 			os.Exit(1)
 		}
 		var result interface{}
 		if err := api.DecodeJSON(resp, &result); err != nil {
 			return fmt.Errorf("failed to decode response: %w", err)
 		}
 		if output.IsJSON() {
 			output.PrintJSON(result)
 		} else {
 			fmt.Println("Queued: note")
 		}
 		return nil
 	}
 	if len(args) == 0 {
 		return fmt.Errorf("path argument is required (or use --note)")
 	}
 	path := args[0]
 	info, err := os.Stat(path)
 	if err != nil {
@@ -133,8 +66,19 @@ func runAdd(cmd *cobra.Command, args []string) error {
 	}
 	if !info.IsDir() {
 		// Validate extension
 		ext := strings.ToLower(filepath.Ext(path))
 		if !supportedExts[ext] {
 			supported := make([]string, 0, len(supportedExts))
 			for e := range supportedExts {
 				supported = append(supported, e)
 			}
 			sort.Strings(supported)
 			return fmt.Errorf("unsupported file type %q — supported: %s", ext, strings.Join(supported, ", "))
 		}
 		// Single file upload
-		result, err := uploadFile(client, path, tags, docType)
+		result, err := uploadFile(client, path, tags)
 		if err != nil {
 			fmt.Fprintln(os.Stderr, err)
 			os.Exit(1)
@@ -177,7 +121,7 @@ func runAdd(cmd *cobra.Command, args []string) error {
 	queued := 0
 	duplicates := 0
 	for _, f := range files {
-		result, err := uploadFile(client, f, tags, docType)
+		result, err := uploadFile(client, f, tags)
 		if err != nil {
 			fmt.Fprintf(os.Stderr, "Error uploading %s: %v\n", f, err)
 			continue
@@ -206,7 +150,7 @@ func runAdd(cmd *cobra.Command, args []string) error {
 	return nil
 }
-func uploadFile(client *api.Client, path, tags, docType string) (*uploadResult, error) {
+func uploadFile(client *api.Client, path, tags string) (*uploadResult, error) {
 	f, err := os.Open(path)
 	if err != nil {
 		return nil, fmt.Errorf("cannot open %s: %w", path, err)
@@ -217,9 +161,6 @@ func uploadFile(client *api.Client, path, tags, docType string) (*uploadResult,
 	if tags != "" {
 		fields["tags"] = tags
 	}
-	if docType != "" {
-		fields["type"] = docType
-	}
 	upload := &api.FileUpload{
 		FieldName: "file",
@@ -264,3 +205,4 @@ func uploadFile(client *api.Client, path, tags, docType string) (*uploadResult,
 	}
 	return &uploadResult{Raw: result}, nil
 }
@@ -0,0 +1,88 @@
 package cmd
 import (
 	"fmt"
 	"net/http"
 	"os"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var addnoteCmd = &cobra.Command{
 	Use:   "addnote <text>",
 	Short: "Add a text note to the knowledge base",
 	Args: func(cmd *cobra.Command, args []string) error {
 		if len(args) == 0 {
 			return fmt.Errorf("requires a note text argument\n\n  Usage: kb addnote \"your note text here\"")
 		}
 		if len(args) > 1 {
 			return fmt.Errorf("accepts 1 arg but received %d — quote your note text, e.g. kb addnote \"your note text here\"", len(args))
 		}
 		return nil
 	},
 	RunE:  runAddnote,
 }
 func init() {
 	addnoteCmd.Flags().String("tags", "", "tags (comma-separated)")
 	rootCmd.AddCommand(addnoteCmd)
 }
 func runAddnote(cmd *cobra.Command, args []string) error {
 	tags, _ := cmd.Flags().GetString("tags")
 	client := api.NewClient()
 	return submitNote(client, args[0], tags)
 }
 func submitNote(client *api.Client, note, tags string) error {
 	fields := map[string]string{
 		"note": note,
 	}
 	if tags != "" {
 		fields["tags"] = tags
 	}
 	resp, err := client.PostMultipart("/api/v1/jobs", fields, nil)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if resp.StatusCode == http.StatusConflict {
 		var result interface{}
 		if err := api.DecodeJSON(resp, &result); err != nil {
 			return fmt.Errorf("failed to decode response: %w", err)
 		}
 		if output.IsJSON() {
 			output.PrintJSON(result)
 		} else {
 			if m, ok := result.(map[string]interface{}); ok {
 				if docID, ok := m["document_id"].(float64); ok {
 					fmt.Printf("Already imported: %s (doc ID: %.0f)\n", m["title"], docID)
 				} else if jobID, ok := m["job_id"].(float64); ok {
 					fmt.Printf("Already queued: %s (job ID: %.0f)\n", m["title"], jobID)
 				}
 			}
 		}
 		return nil
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result interface{}
 	if err := api.DecodeJSON(resp, &result); err != nil {
 		return fmt.Errorf("failed to decode response: %w", err)
 	}
 	if output.IsJSON() {
 		output.PrintJSON(result)
 	} else {
 		fmt.Println("Queued: note")
 	}
 	return nil
 }
@@ -0,0 +1,186 @@
 package cmd
 import (
 	"bufio"
 	"fmt"
 	"os"
 	"strconv"
 	"strings"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var bulkRemoveCmd = &cobra.Command{
 	Use:   "bulk-remove",
 	Short: "Delete multiple documents matching a filter",
 	RunE:  runBulkRemove,
 }
 func init() {
 	addBulkFilterFlags(bulkRemoveCmd)
 	rootCmd.AddCommand(bulkRemoveCmd)
 }
 func runBulkRemove(cmd *cobra.Command, args []string) error {
 	body, err := buildBulkBody(cmd)
 	if err != nil {
 		return err
 	}
 	yes, _ := cmd.Flags().GetBool("yes")
 	if !yes {
 		desc := describeBulkFilter(cmd)
 		fmt.Printf("This will delete documents matching: %s\nProceed? [y/N] ", desc)
 		reader := bufio.NewReader(os.Stdin)
 		answer, _ := reader.ReadString('\n')
 		answer = strings.TrimSpace(strings.ToLower(answer))
 		if answer != "y" && answer != "yes" {
 			fmt.Println("Cancelled.")
 			return nil
 		}
 	}
 	client := api.NewClient()
 	resp, err := client.Post("/api/v1/bulk/delete", body)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result map[string]interface{}
 	if err := api.DecodeJSON(resp, &result); err != nil {
 		return fmt.Errorf("failed to decode response: %w", err)
 	}
 	if output.IsJSON() {
 		output.PrintJSON(result)
 	} else {
 		printBulkResult("Deleted", result)
 	}
 	return nil
 }
 // ---------------------------------------------------------------------------
 // Shared helpers for all bulk commands
 // ---------------------------------------------------------------------------
 func addBulkFilterFlags(cmd *cobra.Command) {
 	cmd.Flags().String("tags", "", "filter by tags (comma-separated)")
 	cmd.Flags().String("type", "", "filter by document type")
 	cmd.Flags().String("ids", "", "filter by document IDs (comma-separated)")
 	cmd.Flags().Int("from-id", 0, "filter by id >= value")
 	cmd.Flags().Int("to-id", 0, "filter by id <= value")
 	cmd.Flags().BoolP("force", "f", false, "override safety threshold")
 	cmd.Flags().BoolP("yes", "y", false, "skip confirmation prompt")
 }
 func buildBulkBody(cmd *cobra.Command) (map[string]interface{}, error) {
 	body := map[string]interface{}{}
 	tagsStr, _ := cmd.Flags().GetString("tags")
 	if tagsStr != "" {
 		body["tags"] = splitTags(tagsStr)
 	}
 	docType, _ := cmd.Flags().GetString("type")
 	if docType != "" {
 		body["doc_type"] = docType
 	}
 	idsStr, _ := cmd.Flags().GetString("ids")
 	if idsStr != "" {
 		ids, err := parseIntList(idsStr)
 		if err != nil {
 			return nil, fmt.Errorf("invalid --ids: %w", err)
 		}
 		body["document_ids"] = ids
 	}
 	fromID, _ := cmd.Flags().GetInt("from-id")
 	if fromID > 0 {
 		body["from_id"] = fromID
 	}
 	toID, _ := cmd.Flags().GetInt("to-id")
 	if toID > 0 {
 		body["to_id"] = toID
 	}
 	force, _ := cmd.Flags().GetBool("force")
 	if force {
 		body["force"] = true
 	}
 	// Ensure at least one filter
 	hasFilter := tagsStr != "" || docType != "" || idsStr != "" || fromID > 0 || toID > 0
 	if !hasFilter {
 		return nil, fmt.Errorf("at least one filter is required (--tags, --type, --ids, --from-id, --to-id)")
 	}
 	return body, nil
 }
 func describeBulkFilter(cmd *cobra.Command) string {
 	var parts []string
 	tagsStr, _ := cmd.Flags().GetString("tags")
 	if tagsStr != "" {
 		parts = append(parts, fmt.Sprintf("tags=[%s]", tagsStr))
 	}
 	docType, _ := cmd.Flags().GetString("type")
 	if docType != "" {
 		parts = append(parts, fmt.Sprintf("type=%s", docType))
 	}
 	idsStr, _ := cmd.Flags().GetString("ids")
 	if idsStr != "" {
 		parts = append(parts, fmt.Sprintf("ids=[%s]", idsStr))
 	}
 	fromID, _ := cmd.Flags().GetInt("from-id")
 	if fromID > 0 {
 		parts = append(parts, fmt.Sprintf("from_id=%d", fromID))
 	}
 	toID, _ := cmd.Flags().GetInt("to-id")
 	if toID > 0 {
 		parts = append(parts, fmt.Sprintf("to_id=%d", toID))
 	}
 	return strings.Join(parts, " ")
 }
 func printBulkResult(action string, result map[string]interface{}) {
 	matched := int(result["matched"].(float64))
 	succeeded := int(result["succeeded"].(float64))
 	failed := int(result["failed"].(float64))
 	fmt.Printf("%s %d of %d documents", action, succeeded, matched)
 	if failed > 0 {
 		fmt.Printf(" (%d failed)", failed)
 	}
 	fmt.Println()
 }
 func parseIntList(s string) ([]int, error) {
 	var ids []int
 	for _, part := range strings.Split(s, ",") {
 		part = strings.TrimSpace(part)
 		if part == "" {
 			continue
 		}
 		id, err := strconv.Atoi(part)
 		if err != nil {
 			return nil, fmt.Errorf("invalid ID %q: %w", part, err)
 		}
 		ids = append(ids, id)
 	}
 	return ids, nil
 }
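Note on `printBulkResult` above: it type-asserts `matched`, `succeeded`, and `failed` straight to `float64`, which panics if the engine omits one of them. A minimal sketch of one alternative follows, assuming the bulk endpoints return those three fields as integer JSON numbers. The `bulkResult` type and `formatBulkResult` helper are hypothetical names and are not part of this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkResult is a hypothetical typed view of a bulk endpoint response.
// The field names are the keys that printBulkResult reads.
type bulkResult struct {
	Matched   int `json:"matched"`
	Succeeded int `json:"succeeded"`
	Failed    int `json:"failed"`
}

// formatBulkResult builds the same summary line as printBulkResult.
// Fields missing from the JSON stay at zero instead of causing a panic.
func formatBulkResult(action string, raw []byte) (string, error) {
	var r bulkResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", fmt.Errorf("failed to decode response: %w", err)
	}
	line := fmt.Sprintf("%s %d of %d documents", action, r.Succeeded, r.Matched)
	if r.Failed > 0 {
		line += fmt.Sprintf(" (%d failed)", r.Failed)
	}
	return line, nil
}

func main() {
	// Hypothetical response body, for illustration only.
	line, err := formatBulkResult("Deleted", []byte(`{"matched": 5, "succeeded": 4, "failed": 1}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```

Decoding into a struct moves the failure mode from a runtime panic to an error the command can return, at the cost of a second type to maintain next to the raw JSON output path.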
@@ -0,0 +1,73 @@
 package cmd
 import (
 	"bufio"
 	"fmt"
 	"os"
 	"strings"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var bulkSetTagsCmd = &cobra.Command{
 	Use:   "bulk-set-tags",
 	Short: "Replace all tags on multiple documents matching a filter",
 	RunE:  runBulkSetTags,
 }
 func init() {
 	addBulkFilterFlags(bulkSetTagsCmd)
 	bulkSetTagsCmd.Flags().String("set", "", "replacement tags (comma-separated)")
 	rootCmd.AddCommand(bulkSetTagsCmd)
 }
 func runBulkSetTags(cmd *cobra.Command, args []string) error {
 	body, err := buildBulkBody(cmd)
 	if err != nil {
 		return err
 	}
 	setStr, _ := cmd.Flags().GetString("set")
 	if setStr == "" {
 		return fmt.Errorf("--set is required (comma-separated list of replacement tags)")
 	}
 	body["new_tags"] = splitTags(setStr)
 	yes, _ := cmd.Flags().GetBool("yes")
 	if !yes {
 		desc := describeBulkFilter(cmd)
 		fmt.Printf("This will replace all tags with [%s] on documents matching: %s\nProceed? [y/N] ", setStr, desc)
 		reader := bufio.NewReader(os.Stdin)
 		answer, _ := reader.ReadString('\n')
 		answer = strings.TrimSpace(strings.ToLower(answer))
 		if answer != "y" && answer != "yes" {
 			fmt.Println("Cancelled.")
 			return nil
 		}
 	}
 	client := api.NewClient()
 	resp, err := client.Post("/api/v1/bulk/set-tags", body)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result map[string]interface{}
 	if err := api.DecodeJSON(resp, &result); err != nil {
 		return fmt.Errorf("failed to decode response: %w", err)
 	}
 	if output.IsJSON() {
 		output.PrintJSON(result)
 	} else {
 		printBulkResult("Set tags on", result)
 	}
 	return nil
 }
@@ -0,0 +1,92 @@
 package cmd
 import (
 	"bufio"
 	"fmt"
 	"os"
 	"strings"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var bulkTagCmd = &cobra.Command{
 	Use:   "bulk-tag",
 	Short: "Add or remove tags on multiple documents matching a filter",
 	RunE:  runBulkTag,
 }
 func init() {
 	addBulkFilterFlags(bulkTagCmd)
 	bulkTagCmd.Flags().String("add", "", "tags to add (comma-separated)")
 	bulkTagCmd.Flags().String("remove", "", "tags to remove (comma-separated)")
 	rootCmd.AddCommand(bulkTagCmd)
 }
 func runBulkTag(cmd *cobra.Command, args []string) error {
 	body, err := buildBulkBody(cmd)
 	if err != nil {
 		return err
 	}
 	addStr, _ := cmd.Flags().GetString("add")
 	removeStr, _ := cmd.Flags().GetString("remove")
 	if addStr == "" && removeStr == "" {
 		return fmt.Errorf("specify --add and/or --remove")
 	}
 	if addStr != "" {
 		body["add"] = splitTags(addStr)
 	}
 	if removeStr != "" {
 		body["remove"] = splitTags(removeStr)
 	}
 	yes, _ := cmd.Flags().GetBool("yes")
 	if !yes {
 		desc := describeBulkFilter(cmd)
 		action := ""
 		if addStr != "" {
 			action += fmt.Sprintf("add=[%s]", addStr)
 		}
 		if removeStr != "" {
 			if action != "" {
 				action += " "
 			}
 			action += fmt.Sprintf("remove=[%s]", removeStr)
 		}
 		fmt.Printf("This will update tags (%s) on documents matching: %s\nProceed? [y/N] ", action, desc)
 		reader := bufio.NewReader(os.Stdin)
 		answer, _ := reader.ReadString('\n')
 		answer = strings.TrimSpace(strings.ToLower(answer))
 		if answer != "y" && answer != "yes" {
 			fmt.Println("Cancelled.")
 			return nil
 		}
 	}
 	client := api.NewClient()
 	resp, err := client.Post("/api/v1/bulk/tags", body)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result map[string]interface{}
 	if err := api.DecodeJSON(resp, &result); err != nil {
 		return fmt.Errorf("failed to decode response: %w", err)
 	}
 	if output.IsJSON() {
 		output.PrintJSON(result)
 	} else {
 		printBulkResult("Tagged", result)
 	}
 	return nil
 }
@@ -0,0 +1,40 @@
 package cmd
 import (
 	"fmt"
 	"github.com/spf13/cobra"
 )
 var examplesCmd = &cobra.Command{
 	Use:   "examples",
 	Short: "Show common usage examples",
 	Args:  cobra.NoArgs,
 	Run: func(cmd *cobra.Command, args []string) {
 		fmt.Print(`Add notes:
  kb addnote "Remember to update DNS records"
  kb addnote "Server room is building 3" --tags ops
 Add files:
  kb addfile report.pdf
  kb addfile ~/docs/ --recursive --tags reference
 Search:
  kb search "how to restart nginx"
  kb search "deploy" --tags ops --top 5
 Update notes:
  kb updatenote 42 "revised note content"
 Manage documents:
  kb list --type pdf
  kb info 3
  kb tag 3 --add important,ops
  kb remove 3 --yes
 `)
 	},
 }
 func init() {
 	rootCmd.AddCommand(examplesCmd)
 }
@@ -0,0 +1,74 @@
 package cmd
 import (
 	"fmt"
 	"io"
 	"mime"
 	"os"
 	"path/filepath"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/spf13/cobra"
 )
 var exportCmd = &cobra.Command{
 	Use:   "export <id>",
 	Short: "Download original document file",
 	Args:  cobra.ExactArgs(1),
 	RunE:  runExport,
 }
 func init() {
 	exportCmd.Flags().StringP("output", "o", "", "output file path (default: original filename to current directory)")
 	rootCmd.AddCommand(exportCmd)
 }
 func runExport(cmd *cobra.Command, args []string) error {
 	client := api.NewClient()
 	resp, err := client.Get("/api/v1/documents/" + args[0] + "/file")
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	defer resp.Body.Close()
 	outPath, _ := cmd.Flags().GetString("output")
 	if outPath == "" {
 		// Try to get filename from Content-Disposition header
 		cd := resp.Header.Get("Content-Disposition")
 		if cd != "" {
 			_, params, err := mime.ParseMediaType(cd)
 			if err == nil && params["filename"] != "" {
 				outPath = params["filename"]
 			}
 		}
 		if outPath == "" {
 			outPath = "document-" + args[0]
 		}
 	}
 	if outPath == "-" {
 		_, err := io.Copy(os.Stdout, resp.Body)
 		return err
 	}
 	outPath = filepath.Clean(outPath)
 	f, err := os.Create(outPath)
 	if err != nil {
 		return fmt.Errorf("failed to create output file: %w", err)
 	}
 	defer f.Close()
 	n, err := io.Copy(f, resp.Body)
 	if err != nil {
 		return fmt.Errorf("failed to write file: %w", err)
 	}
 	fmt.Fprintf(os.Stderr, "Saved %s (%d bytes)\n", outPath, n)
 	return nil
 }
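Note on `runExport` above: the default output path comes from the server's `Content-Disposition` header and is only passed through `filepath.Clean`, so a filename containing directory components would be written outside the current directory. The sketch below shows one way to reduce the header to a bare file name. `exportFilename` is a hypothetical helper, not part of this change.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
)

// exportFilename is a hypothetical helper. It picks a local file name from a
// Content-Disposition header value and falls back to "document-<id>".
// filepath.Base drops any directory components the server may have sent.
func exportFilename(contentDisposition, id string) string {
	fallback := "document-" + id
	if contentDisposition == "" {
		return fallback
	}
	_, params, err := mime.ParseMediaType(contentDisposition)
	if err != nil || params["filename"] == "" {
		return fallback
	}
	name := filepath.Base(params["filename"])
	if name == "." || name == ".." || name == string(filepath.Separator) {
		return fallback
	}
	return name
}

func main() {
	fmt.Println(exportFilename(`attachment; filename="report.pdf"`, "3"))
	fmt.Println(exportFilename(`attachment; filename="../../etc/passwd"`, "3"))
	fmt.Println(exportFilename("", "3"))
}
```

An explicit `--output` path from the user would still be honored as given; only the server-supplied default is restricted.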
@@ -0,0 +1,83 @@
 package cmd
 import (
 	"bufio"
 	"fmt"
 	"os"
 	"strings"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var reindexCmd = &cobra.Command{
 	Use:   "reindex",
 	Short: "Re-embed all chunks with the current engine model",
 	Args:  cobra.NoArgs,
 	RunE:  runReindex,
 }
 func init() {
 	reindexCmd.Flags().BoolP("yes", "y", false, "skip confirmation prompt")
 	rootCmd.AddCommand(reindexCmd)
 }
 func runReindex(cmd *cobra.Command, args []string) error {
 	yes, _ := cmd.Flags().GetBool("yes")
 	client := api.NewClient()
 	if !yes {
 		// Fetch model name from engine status
 		modelName := "current"
 		statusResp, err := client.Get("/api/v1/status")
 		if err == nil && api.CheckError(statusResp) == nil {
 			var status struct {
 				ModelName string `json:"model_name"`
 			}
 			if api.DecodeJSON(statusResp, &status) == nil && status.ModelName != "" {
 				modelName = status.ModelName
 			}
 		}
 		fmt.Printf("Reindex all chunks? This will re-embed everything with the %s model. [y/N] ", modelName)
 		reader := bufio.NewReader(os.Stdin)
 		answer, _ := reader.ReadString('\n')
 		answer = strings.TrimSpace(strings.ToLower(answer))
 		if answer != "y" && answer != "yes" {
 			fmt.Println("Cancelled.")
 			return nil
 		}
 	}
 	resp, err := client.Post("/api/v1/reindex", nil)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result struct {
 		ChunksReindexed int    `json:"chunks_reindexed"`
 		Model           string `json:"model"`
 	}
 	if output.IsJSON() {
 		var raw interface{}
 		if err := api.DecodeJSON(resp, &raw); err != nil {
 			fmt.Fprintln(os.Stderr, "Failed to parse response:", err)
 			os.Exit(1)
 		}
 		output.PrintJSON(raw)
 	} else {
 		if err := api.DecodeJSON(resp, &result); err != nil {
 			fmt.Fprintln(os.Stderr, "Failed to parse response:", err)
 			os.Exit(1)
 		}
 		fmt.Printf("Reindexed %d chunks (model: %s)\n", result.ChunksReindexed, result.Model)
 	}
 	return nil
 }
@@ -4,6 +4,7 @@ import (
 	"fmt"
 	"os"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/config"
 	"github.com/spf13/cobra"
 )
@@ -11,6 +12,9 @@ import (
 // Version is set at build time via -ldflags.
 var Version = "dev"
 // MinEngineVersion is set at build time via -ldflags.
 var MinEngineVersion = "dev"
 var (
 	flagEngine string
 	flagFormat string
@@ -18,9 +22,9 @@ var (
 )
 var rootCmd = &cobra.Command{
-	Use:   "kb",
+	Use:   "kb [command]",
 	Short: "kb-search CLI client",
-	Long:  "A CLI client for the kb-search v2 engine API.",
+	Long:  "A CLI client for the kb-search v2 engine API.\nRun 'kb examples' for common usage patterns.",
 	PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
 		if err := config.Load(); err != nil {
 			return err
@@ -31,6 +35,7 @@ var rootCmd = &cobra.Command{
 }
 func init() {
 	api.SetVersionInfo(Version, MinEngineVersion)
 	rootCmd.Version = Version
 	rootCmd.PersistentFlags().StringVar(&flagEngine, "engine", "", "engine API URL")
 	rootCmd.PersistentFlags().StringVar(&flagFormat, "format", "", "output format (human|json)")
@@ -0,0 +1,69 @@
 package cmd
 import (
 	"bytes"
 	"strings"
 	"testing"
 )
 func TestRootCmd_NoArgs_ShowsHelp(t *testing.T) {
 	rootCmd.SetArgs([]string{})
 	var stdout bytes.Buffer
 	rootCmd.SetOut(&stdout)
 	err := rootCmd.Execute()
 	if err != nil {
 		t.Fatalf("expected no error for zero args, got: %v", err)
 	}
 	output := stdout.String()
 	if !strings.Contains(output, "Available Commands") {
 		t.Errorf("expected help output, got: %s", output)
 	}
 }
 func TestRootCmd_UnknownCommand_ReturnsError(t *testing.T) {
 	rootCmd.SetArgs([]string{"notacommand"})
 	var stderr bytes.Buffer
 	rootCmd.SetErr(&stderr)
 	err := rootCmd.Execute()
 	if err == nil {
 		t.Fatal("expected error for unknown command, got nil")
 	}
 	errMsg := err.Error()
 	if !strings.Contains(errMsg, "unknown command") {
 		t.Errorf("expected 'unknown command' error, got: %s", errMsg)
 	}
 }
 func TestAddnoteCmd_NoArgs_ReturnsError(t *testing.T) {
 	rootCmd.SetArgs([]string{"addnote"})
 	err := rootCmd.Execute()
 	if err == nil {
 		t.Fatal("expected error for addnote with no args, got nil")
 	}
 	errMsg := err.Error()
 	if !strings.Contains(errMsg, "requires a note text argument") {
 		t.Errorf("expected 'requires a note text argument' error, got: %s", errMsg)
 	}
 }
 func TestAddnoteCmd_TooManyArgs_ReturnsError(t *testing.T) {
 	rootCmd.SetArgs([]string{"addnote", "hello", "world"})
 	err := rootCmd.Execute()
 	if err == nil {
 		t.Fatal("expected error for addnote with too many args, got nil")
 	}
 	errMsg := err.Error()
 	if !strings.Contains(errMsg, "quote your note text") {
 		t.Errorf("expected 'quote your note text' error, got: %s", errMsg)
 	}
 }
@@ -67,15 +67,12 @@ func runSearch(cmd *cobra.Command, args []string) error {
 	var result struct {
 		Results []struct {
-			Score    float64 `json:"score"`
+			Score         float64                `json:"score"`
-			Document struct {
+			Title         string                 `json:"title"`
-				Title string `json:"title"`
+			DocType       string                 `json:"doc_type"`
-				Type  string `json:"doc_type"`
+			Tags          []string               `json:"tags"`
-				Tags  []string `json:"tags"`
+			ChunkMetadata map[string]interface{} `json:"chunk_metadata"`
-			} `json:"document"`
+			Text          string                 `json:"text"`
-			Page    interface{} `json:"page"`
-			Section string      `json:"section"`
-			Text    string      `json:"text"`
 		} `json:"results"`
 	}
@@ -103,26 +100,28 @@ func runSearch(cmd *cobra.Command, args []string) error {
 			snippet = snippet[:200] + "..."
 		}
-		fmt.Printf("\n%d. [%.4f] %s\n", i+1, r.Score, r.Document.Title)
+		fmt.Printf("\n%d. [%.4f] %s\n", i+1, r.Score, r.Title)
 		location := ""
-		if r.Page != nil {
+		if page, ok := r.ChunkMetadata["page"]; ok && page != nil {
-			location = fmt.Sprintf("Page %v", r.Page)
+			location = fmt.Sprintf("Page %v", page)
 		}
-		if r.Section != "" {
+		if section, ok := r.ChunkMetadata["section_header"]; ok && section != nil {
-			if location != "" {
+			if s, ok := section.(string); ok && s != "" {
-				location += " / "
+				if location != "" {
+					location += " / "
+				}
+				location += s
 			}
-			location += r.Section
 		}
 		if location != "" {
 			fmt.Printf("   Location: %s\n", location)
 		}
-		if r.Document.Type != "" {
+		if r.DocType != "" {
-			fmt.Printf("   Type: %s\n", r.Document.Type)
+			fmt.Printf("   Type: %s\n", r.DocType)
 		}
-		if len(r.Document.Tags) > 0 {
+		if len(r.Tags) > 0 {
-			fmt.Printf("   Tags: %s\n", joinStrings(r.Document.Tags))
+			fmt.Printf("   Tags: %s\n", joinStrings(r.Tags))
 		}
 		fmt.Printf("   %s\n", snippet)
 	}
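Note on the `runSearch` change above: page and section now come from the `chunk_metadata` map instead of top-level fields. The sketch below isolates that lookup so it can be run alone. The `location` function name and the sample payload are assumptions; only the `page` and `section_header` keys come from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// location mirrors the lookup in runSearch: an optional page plus an optional
// non-empty section_header, joined with " / ".
func location(meta map[string]interface{}) string {
	loc := ""
	if page, ok := meta["page"]; ok && page != nil {
		loc = fmt.Sprintf("Page %v", page)
	}
	// The comma-ok assertion is false for a missing key, a null, or a non-string.
	if s, ok := meta["section_header"].(string); ok && s != "" {
		if loc != "" {
			loc += " / "
		}
		loc += s
	}
	return loc
}

func main() {
	// Hypothetical chunk_metadata payload, for illustration only.
	raw := []byte(`{"page": 3, "section_header": "Setup"}`)
	var meta map[string]interface{}
	if err := json.Unmarshal(raw, &meta); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(location(meta)) // prints "Page 3 / Setup"
}
```

JSON numbers decode to `float64` in a `map[string]interface{}`, and `%v` prints a whole-number `float64` without a decimal point, which is why the page renders as `3` rather than `3.000000`.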
@@ -0,0 +1,61 @@
 package cmd
 import (
 	"fmt"
 	"os"
 	"strconv"
 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/output"
 	"github.com/spf13/cobra"
 )
 var updatenoteCmd = &cobra.Command{
 	Use:   "updatenote <id> <text>",
 	Short: "Update an existing note's content",
 	Args: func(cmd *cobra.Command, args []string) error {
 		if len(args) < 2 {
 			return fmt.Errorf("requires document ID and text arguments\n\n  Usage: kb updatenote 42 \"updated note text\"")
 		}
 		if _, err := strconv.Atoi(args[0]); err != nil {
 			return fmt.Errorf("document ID must be an integer, got %q", args[0])
 		}
 		return nil
 	},
 	RunE: runUpdatenote,
 }
 func init() {
 	rootCmd.AddCommand(updatenoteCmd)
 }
 func runUpdatenote(cmd *cobra.Command, args []string) error {
 	docID := args[0]
 	text := args[1]
 	client := api.NewClient()
 	body := map[string]string{"text": text}
 	resp, err := client.Patch(fmt.Sprintf("/api/v1/notes/%s", docID), body)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	if err := api.CheckError(resp); err != nil {
 		fmt.Fprintln(os.Stderr, err)
 		os.Exit(1)
 	}
 	var result interface{}
 	if err := api.DecodeJSON(resp, &result); err != nil {
 		return fmt.Errorf("failed to decode response: %w", err)
 	}
 	if output.IsJSON() {
 		output.PrintJSON(result)
 	} else {
 		fmt.Printf("Updated note %s\n", docID)
 	}
 	return nil
 }
@@ -7,6 +7,9 @@ import (
 	"io"
 	"mime/multipart"
 	"net/http"
 	"os"
 	"strconv"
 	"strings"
 	"github.com/kb-search/kb/internal/config"
 )
@@ -18,11 +21,25 @@ type FileUpload struct {
 	Reader    io.Reader
 }
 // Package-level version info, set once by cmd.init via SetVersionInfo.
 var (
 	clientVersion    string
 	minEngineVersion string
 )
 // SetVersionInfo configures the client and minimum engine version for compatibility checking.
 // Called once from cmd package initialization.
 func SetVersionInfo(cv, minEV string) {
 	clientVersion = cv
 	minEngineVersion = minEV
 }
 // Client is an HTTP client for the kb-search engine API.
 type Client struct {
-	baseURL    string
+	baseURL        string
-	apiKey     string
+	apiKey         string
-	httpClient *http.Client
+	httpClient     *http.Client
 	versionChecked bool
 }
 // NewClient creates a Client from the current configuration.
@@ -48,6 +65,7 @@ func (c *Client) newRequest(method, path string, body io.Reader) (*http.Request,
 }
 func (c *Client) do(req *http.Request) (*http.Response, error) {
 	c.checkEngineVersion()
 	resp, err := c.httpClient.Do(req)
 	if err != nil {
 		return nil, fmt.Errorf("Cannot reach engine at %s: %v", c.baseURL, err)
@@ -55,6 +73,75 @@ func (c *Client) do(req *http.Request) (*http.Response, error) {
 	return resp, nil
 }
 func (c *Client) checkEngineVersion() {
 	if c.versionChecked {
 		return
 	}
 	c.versionChecked = true
 	minVer := minEngineVersion
 	if minVer == "" || minVer == "dev" {
 		return
 	}
 	statusReq, err := c.newRequest(http.MethodGet, "/api/v1/status", nil)
 	if err != nil {
 		return
 	}
 	resp, err := c.httpClient.Do(statusReq)
 	if err != nil {
 		return // unreachable — let the actual request surface the error
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
 		return // auth error or other issue — let the actual request surface it
 	}
 	var status struct {
 		Version string `json:"version"`
 	}
 	if err := json.NewDecoder(resp.Body).Decode(&status); err != nil {
 		return
 	}
 	if !semverAtLeast(status.Version, minVer) {
 		fmt.Fprintf(os.Stderr, "Error: kb client v%s requires engine v%s+ (connected engine is v%s)\nUpdate your engine image to engine-v%s or later.\n",
 			clientVersion, minVer, status.Version, minVer)
 		os.Exit(1)
 	}
 }
 // semverAtLeast returns true if version >= minimum, comparing major.minor.patch.
 func semverAtLeast(version, minimum string) bool {
 	parse := func(s string) (int, int, int) {
 		s = strings.TrimPrefix(s, "v")
 		parts := strings.SplitN(s, ".", 3)
 		var major, minor, patch int
 		if len(parts) >= 1 {
 			major, _ = strconv.Atoi(parts[0])
 		}
 		if len(parts) >= 2 {
 			minor, _ = strconv.Atoi(parts[1])
 		}
 		if len(parts) >= 3 {
 			patch, _ = strconv.Atoi(parts[2])
 		}
 		return major, minor, patch
 	}
 	vMaj, vMin, vPat := parse(version)
 	mMaj, mMin, mPat := parse(minimum)
 	if vMaj != mMaj {
 		return vMaj > mMaj
 	}
 	if vMin != mMin {
 		return vMin > mMin
 	}
 	return vPat >= mPat
 }
 // Get performs a GET request to the given path.
 func (c *Client) Get(path string) (*http.Response, error) {
 	req, err := c.newRequest(http.MethodGet, path, nil)
@@ -134,6 +221,20 @@ func (c *Client) Put(path string, body interface{}) (*http.Response, error) {
 	return c.do(req)
 }
 // Patch performs a PATCH request with a JSON body.
 func (c *Client) Patch(path string, body interface{}) (*http.Response, error) {
 	data, err := json.Marshal(body)
 	if err != nil {
 		return nil, fmt.Errorf("failed to marshal request body: %w", err)
 	}
 	req, err := c.newRequest(http.MethodPatch, path, bytes.NewReader(data))
 	if err != nil {
 		return nil, err
 	}
 	req.Header.Set("Content-Type", "application/json")
 	return c.do(req)
 }
 // DecodeJSON reads the response body and decodes it into target.
 func DecodeJSON(resp *http.Response, target interface{}) error {
 	defer resp.Body.Close()
@@ -0,0 +1,136 @@
 package api
 import (
 	"encoding/json"
 	"net/http"
 	"net/http/httptest"
 	"testing"
 )
 func TestSemverAtLeast(t *testing.T) {
 	tests := []struct {
 		version  string
 		minimum  string
 		expected bool
 	}{
 		{"2.1.0", "2.0.0", true},
 		{"2.0.0", "2.0.0", true},
 		{"2.0.5", "2.0.0", true},
 		{"2.1.5", "2.1.0", true},
 		{"2.0.9", "2.1.0", false},
 		{"1.9.9", "2.0.0", false},
 		{"3.0.0", "2.9.9", true},
 		{"2.0.0", "2.0.1", false},
 	}
 	for _, tt := range tests {
 		t.Run(tt.version+">="+tt.minimum, func(t *testing.T) {
 			got := semverAtLeast(tt.version, tt.minimum)
 			if got != tt.expected {
 				t.Errorf("semverAtLeast(%q, %q) = %v, want %v", tt.version, tt.minimum, got, tt.expected)
 			}
 		})
 	}
 }
 func TestCheckEngineVersion_Compatible(t *testing.T) {
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		json.NewEncoder(w).Encode(map[string]string{"version": "2.1.0"})
 	}))
 	defer srv.Close()
 	clientVersion = "2.2.0"
 	minEngineVersion = "2.1.0"
 	defer func() { clientVersion = ""; minEngineVersion = "" }()
 	c := &Client{
 		baseURL:    srv.URL,
 		httpClient: &http.Client{},
 	}
 	// Should not panic or exit
 	c.checkEngineVersion()
 	if !c.versionChecked {
 		t.Error("versionChecked should be true after check")
 	}
 }
 func TestCheckEngineVersion_SkipsWhenDev(t *testing.T) {
 	clientVersion = "dev"
 	minEngineVersion = "dev"
 	defer func() { clientVersion = ""; minEngineVersion = "" }()
 	c := &Client{
 		baseURL:    "http://localhost:99999",
 		httpClient: &http.Client{},
 	}
 	// Should not attempt connection
 	c.checkEngineVersion()
 	if !c.versionChecked {
 		t.Error("versionChecked should be true after skipping")
 	}
 }
 func TestCheckEngineVersion_SkipsWhenEmpty(t *testing.T) {
 	clientVersion = "1.0.0"
 	minEngineVersion = ""
 	defer func() { clientVersion = ""; minEngineVersion = "" }()
 	c := &Client{
 		baseURL:    "http://localhost:99999",
 		httpClient: &http.Client{},
 	}
 	c.checkEngineVersion()
 	if !c.versionChecked {
 		t.Error("versionChecked should be true after skipping")
 	}
 }
 func TestCheckEngineVersion_SkipsWhenUnreachable(t *testing.T) {
 	clientVersion = "2.0.0"
 	minEngineVersion = "2.0.0"
 	defer func() { clientVersion = ""; minEngineVersion = "" }()
 	c := &Client{
 		baseURL:    "http://localhost:99999",
 		httpClient: &http.Client{},
 	}
 	// Should not panic — just skip
 	c.checkEngineVersion()
 	if !c.versionChecked {
 		t.Error("versionChecked should be true even when unreachable")
 	}
 }
 func TestCheckEngineVersion_CachedAfterFirstCall(t *testing.T) {
 	callCount := 0
 	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		callCount++
 		json.NewEncoder(w).Encode(map[string]string{"version": "2.1.0"})
 	}))
 	defer srv.Close()
 	clientVersion = "2.1.0"
 	minEngineVersion = "2.0.0"
 	defer func() { clientVersion = ""; minEngineVersion = "" }()
 	c := &Client{
 		baseURL:    srv.URL,
 		httpClient: &http.Client{},
 	}
 	c.checkEngineVersion()
 	c.checkEngineVersion()
 	c.checkEngineVersion()
 	if callCount != 1 {
 		t.Errorf("expected 1 status call, got %d", callCount)
 	}
 }
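Note on `semverAtLeast`: the test table above only covers plain `major.minor.patch` strings. Because the helper discards `strconv.Atoi` errors, a component with a pre-release suffix parses as 0. The sketch below is a condensed re-implementation intended to behave the same way (it is not the code from the diff), and the inputs are made-up examples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "major.minor.patch" (optional leading "v") into three ints.
// Like the helper in the diff, it ignores strconv.Atoi errors, so a missing
// or non-numeric component becomes 0.
func parse(s string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(s, "v"), ".", 3)
	for i, p := range parts {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// semverAtLeast reports whether version >= minimum, component by component.
func semverAtLeast(version, minimum string) bool {
	v, m := parse(version), parse(minimum)
	for i := range v {
		if v[i] != m[i] {
			return v[i] > m[i]
		}
	}
	return true
}

func main() {
	cases := [][2]string{
		{"v2.1.0", "2.1.0"},    // leading "v" is stripped
		{"2.1", "2.1.0"},       // missing patch counts as 0
		{"2.1.0-rc1", "2.1.0"}, // "0-rc1" parses as 0, so this passes
		{"2.1.5-rc1", "2.1.3"}, // "5-rc1" parses as 0, so this fails
	}
	for _, c := range cases {
		fmt.Printf("%s >= %s: %v\n", c[0], c[1], semverAtLeast(c[0], c[1]))
	}
}
```

If engine images are ever tagged with pre-release suffixes, the last case means a newer release candidate could be rejected as too old; stripping everything after the first `-` before parsing would be one way to avoid that.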
@@ -0,0 +1,36 @@
 FROM ubuntu:24.04
 ENV DEBIAN_FRONTEND=noninteractive
 RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.12 python3.12-venv python3.12-dev python3-pip \
    libpoppler-cpp-dev poppler-utils \
    libgl1 libglib2.0-0 \
    build-essential curl \
    && rm -rf /var/lib/apt/lists/*
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
 WORKDIR /app
 COPY pyproject.toml ./
 COPY kb/ kb/
 COPY main.py ./
 COPY VERSION ./
 RUN uv venv .venv && \
    . .venv/bin/activate && \
    uv pip install -e . && \
    uv pip install "sentence-transformers[onnx]" && \
    uv pip install --reinstall torch torchvision --index-url https://download.pytorch.org/whl/cpu
 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
 ENV KB_DEVICE=cpu
 ENV KB_INGEST_DEVICE=cpu
 ENV KB_DATA_DIR=/data
 EXPOSE 8000
 VOLUME ["/data"]
 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -20,8 +20,8 @@ COPY VERSION ./
 RUN uv venv .venv && \
    . .venv/bin/activate && \
-    uv pip install -e . && \
+    UV_HTTP_TIMEOUT=600 uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130 && \
-    uv pip install --no-deps onnxruntime-gpu
+    uv pip install -e .
 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
@@ -1,68 +0,0 @@
 # Stage 1: Build — install Python deps with dev tools available
 FROM rocm/dev-ubuntu-24.04:6.4-complete AS builder
 ENV DEBIAN_FRONTEND=noninteractive
 RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.12 python3.12-venv python3.12-dev python3-pip \
    libpoppler-cpp-dev poppler-utils \
    build-essential curl \
    && rm -rf /var/lib/apt/lists/*
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
 WORKDIR /app
 COPY pyproject.toml ./
 COPY kb/ kb/
 COPY main.py ./
 COPY VERSION ./
 RUN uv venv .venv && \
    . .venv/bin/activate && \
    uv pip install -e . && \
    uv pip install --no-deps onnxruntime-rocm
 # Stage 2: Runtime — minimal ROCm runtime libs only
 FROM ubuntu:24.04
 ENV DEBIAN_FRONTEND=noninteractive
 # Add ROCm apt repository
 RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates curl gnupg \
    && mkdir -p /etc/apt/keyrings \
    && curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key \
       | gpg --dearmor -o /etc/apt/keyrings/rocm.gpg \
    && echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 noble main" \
       > /etc/apt/sources.list.d/rocm.list \
    && printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
       > /etc/apt/preferences.d/rocm-pin-600 \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.12 python3.12-venv \
    libpoppler-cpp0t64 poppler-utils \
    libgl1 libglib2.0-0 \
    rocm-hip-runtime \
    rocm-hip-libraries \
    miopen-hip \
    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 # Copy built venv and application from builder
 COPY --from=builder /app/.venv .venv
 COPY --from=builder /app/kb kb
 COPY --from=builder /app/main.py .
 COPY --from=builder /app/pyproject.toml .
 COPY --from=builder /app/VERSION .
 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
 ENV KB_DEVICE=auto
 ENV KB_INGEST_DEVICE=auto
 ENV KB_DATA_DIR=/data
 EXPOSE 8000
 VOLUME ["/data"]
 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -1 +1 @@
-2.0.5
+3.2.1
@@ -0,0 +1,33 @@
 services:
  kb-engine:
    build:
      context: .
      dockerfile: Dockerfile.cpu
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=cpu
      - KB_INGEST_DEVICE=cpu
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
  kb-mcp:
    build:
      context: ../mcp
      dockerfile: Dockerfile
    ports:
      - "${KB_MCP_PORT:-3000}:3000"
    environment:
      - KB_ENGINE_URL=http://kb-engine:8000
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_MCP_API_KEY=${KB_MCP_API_KEY:-}
      # Comma-separated IPs/FQDNs allowed to connect remotely (e.g. 192.168.1.50,kb.example.com)
      - KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}
    depends_on:
      - kb-engine
    restart: unless-stopped
@@ -21,4 +21,21 @@ services:
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
  kb-mcp:
    build:
      context: ../mcp
      dockerfile: Dockerfile
    ports:
      - "${KB_MCP_PORT:-3000}:3000"
    environment:
      - KB_ENGINE_URL=http://kb-engine:8000
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_MCP_API_KEY=${KB_MCP_API_KEY:-}
      # Comma-separated IPs/FQDNs allowed to connect remotely (e.g. 192.168.1.50,kb.example.com)
      - KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}
    depends_on:
      - kb-engine
    restart: unless-stopped
@@ -1,21 +0,0 @@
 services:
  kb-engine:
    build:
      context: .
      dockerfile: Dockerfile.rocm
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    group_add:
      - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
    restart: unless-stopped
@@ -0,0 +1,4 @@
 #!/bin/bash
 docker stop engine-kb-engine-1
 KB_MODEL=BAAI/bge-base-en-v1.5 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d --build
@@ -20,6 +20,7 @@ class Config:
        self.ingest_device = os.environ.get("KB_INGEST_DEVICE", "auto")
        self.api_key = os.environ.get("KB_API_KEY") or None
        self.search_threshold = float(os.environ.get("KB_SEARCH_THRESHOLD", "0.01"))
        self.bulk_safety_percent = int(os.environ.get("KB_BULK_SAFETY_PERCENT", "70"))
        self.host = os.environ.get("KB_HOST", "0.0.0.0")
        self.port = int(os.environ.get("KB_PORT", "8000"))
@@ -35,10 +36,15 @@ class Config:
    def staging_dir(self) -> Path:
        return self.data_dir / "staging"
    @property
    def documents_dir(self) -> Path:
        return self.data_dir / "documents"
    def ensure_dirs(self):
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.hf_cache.mkdir(exist_ok=True)
        self.staging_dir.mkdir(exist_ok=True)
        self.documents_dir.mkdir(exist_ok=True)
 cfg = Config()
@@ -10,6 +10,60 @@ import struct
 from typing import Any, Optional
 def build_enriched_text(title: str, chunk_text: str, metadata: dict | None = None) -> str:
    """Build enriched text by prepending document title and optional section header.
    Format: "{title} > {section_header}\\n\\n{chunk_text}" or "{title}\\n\\n{chunk_text}".
    """
    section_header = (metadata or {}).get("section_header")
    if section_header:
        return f"{title} > {section_header}\n\n{chunk_text}"
    return f"{title}\n\n{chunk_text}"
 def _backfill_enriched_text(conn: sqlite3.Connection) -> None:
    """Backfill enriched_text for all existing chunks."""
    rows = conn.execute(
        "SELECT c.id, c.text, c.metadata, d.title "
        "FROM chunks c JOIN documents d ON c.document_id = d.id"
    ).fetchall()
    for row in rows:
        metadata = json.loads(row["metadata"]) if row["metadata"] else None
        enriched = build_enriched_text(row["title"], row["text"], metadata)
        conn.execute("UPDATE chunks SET enriched_text = ? WHERE id = ?", (enriched, row["id"]))
 def _rebuild_fts(conn: sqlite3.Connection) -> None:
    """Drop and recreate chunks_fts to index enriched_text, with updated triggers."""
    conn.executescript("""
        DROP TRIGGER IF EXISTS chunks_ai;
        DROP TRIGGER IF EXISTS chunks_ad;
        DROP TRIGGER IF EXISTS chunks_au;
        DROP TABLE IF EXISTS chunks_fts;
        CREATE VIRTUAL TABLE chunks_fts USING fts5(
            text,
            content=chunks,
            content_rowid=id
        );
        CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.enriched_text);
        END;
        CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.enriched_text);
        END;
        CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.enriched_text);
            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.enriched_text);
        END;
    """)
    # Repopulate FTS from existing enriched_text
    conn.execute("INSERT INTO chunks_fts(rowid, text) SELECT id, enriched_text FROM chunks")
 def get_connection(db_path: str) -> sqlite3.Connection:
    """Return a sqlite3 connection with WAL mode, Row factory, and foreign keys enabled."""
    import sqlite_vec
@@ -20,6 +74,7 @@ def get_connection(db_path: str) -> sqlite3.Connection:
    conn.enable_load_extension(False)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
@@ -34,6 +89,8 @@ def init_schema(conn: sqlite3.Connection, embedding_dim: int) -> None:
            content_hash TEXT UNIQUE,
            doc_type TEXT,
            language TEXT,
            stored_path TEXT,
            original_filename TEXT,
            created_at TEXT DEFAULT current_timestamp
        );
@@ -42,6 +99,7 @@ def init_schema(conn: sqlite3.Connection, embedding_dim: int) -> None:
            document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
            chunk_index INTEGER,
            text TEXT,
            enriched_text TEXT,
            token_count INTEGER,
            metadata TEXT DEFAULT '{{}}',
            UNIQUE(document_id, chunk_index)
@@ -53,18 +111,18 @@ def init_schema(conn: sqlite3.Connection, embedding_dim: int) -> None:
            content_rowid=id
        );
-        -- Triggers to keep FTS index in sync with chunks table
+        -- Triggers to keep FTS index in sync with chunks table (using enriched_text)
        CREATE TRIGGER IF NOT EXISTS chunks_ai AFTER INSERT ON chunks BEGIN
-            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
+            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.enriched_text);
        END;
        CREATE TRIGGER IF NOT EXISTS chunks_ad AFTER DELETE ON chunks BEGIN
-            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.text);
+            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.enriched_text);
        END;
        CREATE TRIGGER IF NOT EXISTS chunks_au AFTER UPDATE ON chunks BEGIN
-            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.text);
+            INSERT INTO chunks_fts(chunks_fts, rowid, text) VALUES ('delete', old.id, old.enriched_text);
-            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.text);
+            INSERT INTO chunks_fts(rowid, text) VALUES (new.id, new.enriched_text);
        END;
        CREATE TABLE IF NOT EXISTS tags (
@@ -114,6 +172,29 @@ def init_schema(conn: sqlite3.Connection, embedding_dim: int) -> None:
    if "content_hash" not in cols:
        conn.execute("ALTER TABLE jobs ADD COLUMN content_hash TEXT")
    # Migrate: add stored_path and original_filename to documents if missing
    doc_cols = {row[1] for row in conn.execute("PRAGMA table_info(documents)").fetchall()}
    if "stored_path" not in doc_cols:
        conn.execute("ALTER TABLE documents ADD COLUMN stored_path TEXT")
    if "original_filename" not in doc_cols:
        conn.execute("ALTER TABLE documents ADD COLUMN original_filename TEXT")
    # Migrate: add enriched_text to chunks and rebuild FTS to index it
    chunk_cols = {row[1] for row in conn.execute("PRAGMA table_info(chunks)").fetchall()}
    if "enriched_text" not in chunk_cols:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")
        _backfill_enriched_text(conn)
        _rebuild_fts(conn)
    # Migrate: add updated_at to documents if missing (v3.0.0)
    if "updated_at" not in doc_cols:
        conn.execute("ALTER TABLE documents ADD COLUMN updated_at TEXT")
    # Migrate: add job_type to jobs if missing (bulk operations)
    job_cols = {row[1] for row in conn.execute("PRAGMA table_info(jobs)").fetchall()}
    if "job_type" not in job_cols:
        conn.execute("ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'")
    conn.commit()
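
The migration steps in this hunk all follow one pattern: read the column names from `PRAGMA table_info`, then run `ALTER TABLE ... ADD COLUMN` only when the column is absent, which makes `init_schema` safe to run on every start. A minimal sketch of that pattern as a reusable helper follows. The name `add_column_if_missing` is hypothetical and does not appear in the diff.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` unless it already exists; return True if added.

    Hypothetical helper, not part of the diff. `table`, `column`, and `decl`
    must come from trusted code because identifiers cannot be bound as parameters.
    """
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, filename TEXT)")
first = add_column_if_missing(conn, "jobs", "job_type", "TEXT DEFAULT 'ingest'")
second = add_column_if_missing(conn, "jobs", "job_type", "TEXT DEFAULT 'ingest'")
conn.commit()
```

Running the helper twice is harmless: the second call sees the column and does nothing.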
@@ -196,6 +277,7 @@ def insert_chunk(
    document_id: int,
    chunk_index: int,
    text: str,
    enriched_text: str | None = None,
    token_count: Optional[int] = None,
    metadata: Any = None,
 ) -> int:
@@ -208,8 +290,8 @@ def insert_chunk(
        metadata_str = str(metadata)
    cur = conn.execute(
-        "INSERT INTO chunks(document_id, chunk_index, text, token_count, metadata) VALUES (?, ?, ?, ?, ?)",
+        "INSERT INTO chunks(document_id, chunk_index, text, enriched_text, token_count, metadata) VALUES (?, ?, ?, ?, ?, ?)",
-        (document_id, chunk_index, text, token_count, metadata_str),
+        (document_id, chunk_index, text, enriched_text or text, token_count, metadata_str),
    )
    conn.commit()
    return cur.lastrowid
@@ -253,6 +335,92 @@ def untag_document(conn: sqlite3.Connection, document_id: int, tag_names: list[s
    conn.commit()
 # ---------------------------------------------------------------------------
 # Bulk operation helpers
 # ---------------------------------------------------------------------------
 def resolve_bulk_selection(
    conn: sqlite3.Connection,
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
 ) -> list[int]:
    """Return document IDs matching the bulk selection filter.
    Filters combine with AND logic. At least one filter must be provided.
    """
    sql = "SELECT DISTINCT d.id FROM documents d"
    joins: list[str] = []
    where: list[str] = []
    params: list = []
    if tags:
        for i, tag in enumerate(tags):
            joins.append(f"JOIN document_tags dt{i} ON d.id = dt{i}.document_id")
            joins.append(f"JOIN tags t{i} ON dt{i}.tag_id = t{i}.id")
            where.append(f"t{i}.name = ?")
            params.append(tag)
    if doc_type:
        where.append("d.doc_type = ?")
        params.append(doc_type)
    if document_ids:
        placeholders = ",".join("?" for _ in document_ids)
        where.append(f"d.id IN ({placeholders})")
        params.extend(document_ids)
    if from_id is not None:
        where.append("d.id >= ?")
        params.append(from_id)
    if to_id is not None:
        where.append("d.id <= ?")
        params.append(to_id)
    if joins:
        sql += " " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    rows = conn.execute(sql, params).fetchall()
    return [row["id"] for row in rows]
 def create_bulk_job(
    conn: sqlite3.Connection,
    job_type: str,
    filters_json: str,
    matched: int,
    succeeded: int,
    failed: int,
    errors_json: str = "[]",
 ) -> int:
    """Create an audit log entry for a bulk operation and return its id."""
    cur = conn.execute(
        """INSERT INTO jobs(filename, status, job_type, document_id, chunk_count, error, completed_at)
           VALUES (?, ?, ?, ?, ?, ?, current_timestamp)""",
        (
            filters_json,
            "done" if failed == 0 else "partial_failure",
            job_type,
            matched,
            succeeded,
            errors_json if failed > 0 else None,
        ),
    )
    conn.commit()
    return cur.lastrowid
 def count_documents(conn: sqlite3.Connection) -> int:
    """Return total number of documents in the database."""
    row = conn.execute("SELECT COUNT(*) AS cnt FROM documents").fetchone()
    return row["cnt"]
 # ---------------------------------------------------------------------------
 # Vec table management
 # ---------------------------------------------------------------------------
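
The tag filter in `resolve_bulk_selection` adds one `document_tags`/`tags` join pair per requested tag, so a document is selected only when it carries every tag in the list. The sketch below reproduces that query shape against an in-memory database. The schema is a reduced stand-in for the real tables, and `ids_with_all_tags` is a hypothetical name used for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE document_tags (document_id INTEGER, tag_id INTEGER);
    INSERT INTO documents VALUES (1, 'note'), (2, 'pdf'), (3, 'note');
    INSERT INTO tags VALUES (1, 'draft'), (2, 'review');
    INSERT INTO document_tags VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")


def ids_with_all_tags(conn: sqlite3.Connection, tags: list[str]) -> list[int]:
    """One JOIN pair per tag: a document survives only if every tag matches."""
    sql = "SELECT DISTINCT d.id FROM documents d"
    where: list[str] = []
    params: list[str] = []
    for i, tag in enumerate(tags):
        # Aliases dt0/t0, dt1/t1, ... keep each tag's join independent.
        sql += f" JOIN document_tags dt{i} ON d.id = dt{i}.document_id"
        sql += f" JOIN tags t{i} ON dt{i}.tag_id = t{i}.id"
        where.append(f"t{i}.name = ?")
        params.append(tag)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sorted(row["id"] for row in conn.execute(sql, params))


both = ids_with_all_tags(conn, ["draft", "review"])  # only document 1 has both tags
draft_only = ids_with_all_tags(conn, ["draft"])      # documents 1 and 2
```

Only the alias index is interpolated into the SQL string; tag names are always bound as parameters.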
@@ -1 +1 @@
-from kb.routes import health, search, jobs, documents, tags, status, reindex, auth
+from kb.routes import health, search, jobs, documents, tags, status, reindex, auth, notes
@@ -0,0 +1,281 @@
 """Bulk operation endpoints — delete, tag, and set-tags on multiple documents."""
 import json
 import logging
 from pathlib import Path
 from typing import Optional
 from fastapi import HTTPException
 from pydantic import BaseModel, model_validator
 from main import app
 from kb.config import cfg
 from kb.database import (
    get_connection,
    resolve_bulk_selection,
    count_documents,
    create_bulk_job,
    tag_document,
    untag_document,
 )
 logger = logging.getLogger("kb.routes.bulk")
 # ---------------------------------------------------------------------------
 # Request models
 # ---------------------------------------------------------------------------
 class BulkSelectionRequest(BaseModel):
    document_ids: Optional[list[int]] = None
    tags: Optional[list[str]] = None
    doc_type: Optional[str] = None
    from_id: Optional[int] = None
    to_id: Optional[int] = None
    force: bool = False
    @model_validator(mode="after")
    def require_at_least_one_filter(self):
        if not any([self.document_ids, self.tags, self.doc_type,
                     self.from_id is not None, self.to_id is not None]):
            raise ValueError("At least one selection filter is required")
        return self
 class BulkDeleteRequest(BulkSelectionRequest):
    pass
 class BulkTagsRequest(BulkSelectionRequest):
    add: Optional[list[str]] = None
    remove: Optional[list[str]] = None
    @model_validator(mode="after")
    def require_add_or_remove(self):
        if not self.add and not self.remove:
            raise ValueError("At least one of 'add' or 'remove' is required")
        return self
 class BulkSetTagsRequest(BulkSelectionRequest):
    new_tags: list[str]
 # ---------------------------------------------------------------------------
 # Shared helpers
 # ---------------------------------------------------------------------------
 def _check_safety_threshold(matched: int, total: int, force: bool) -> None:
    """Raise 409 if the operation would affect too many documents."""
    threshold = cfg.bulk_safety_percent
    if threshold <= 0 or force or total == 0:
        return
    percent = (matched / total) * 100
    if percent > threshold:
        raise HTTPException(
            status_code=409,
            detail={
                "error": "safety_threshold_exceeded",
                "message": (
                    f"Operation would affect {matched} of {total} documents "
                    f"({percent:.1f}%). Exceeds safety threshold of {threshold}%. "
                    f"Use force: true to proceed."
                ),
                "matched": matched,
                "total": total,
                "percent": round(percent, 1),
                "threshold": threshold,
            },
        )
 def _filters_dict(req: BulkSelectionRequest) -> str:
    """Build a JSON string of the selection filter for audit logging."""
    d = {}
    if req.document_ids:
        d["document_ids"] = req.document_ids
    if req.tags:
        d["tags"] = req.tags
    if req.doc_type:
        d["doc_type"] = req.doc_type
    if req.from_id is not None:
        d["from_id"] = req.from_id
    if req.to_id is not None:
        d["to_id"] = req.to_id
    return json.dumps(d)
 # ---------------------------------------------------------------------------
 # Endpoints
 # ---------------------------------------------------------------------------
@app.post("/api/v1/bulk/delete")
 async def bulk_delete(req: BulkDeleteRequest):
    conn = get_connection(cfg.db_path)
    try:
        doc_ids = resolve_bulk_selection(
            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
        )
        total = count_documents(conn)
        _check_safety_threshold(len(doc_ids), total, req.force)
        succeeded = 0
        failed = 0
        errors = []
        stored_files: list[str] = []
        for doc_id in doc_ids:
            try:
                doc = conn.execute(
                    "SELECT id, stored_path FROM documents WHERE id = ?", (doc_id,)
                ).fetchone()
                if not doc:
                    failed += 1
                    errors.append({"document_id": doc_id, "error": "not found"})
                    continue
                if doc["stored_path"]:
                    stored_files.append(doc["stored_path"])
                # Delete embeddings
                chunk_ids = conn.execute(
                    "SELECT id FROM chunks WHERE document_id = ?", (doc_id,)
                ).fetchall()
                for row in chunk_ids:
                    conn.execute("DELETE FROM chunks_vec WHERE chunk_id = ?", (row["id"],))
                # Delete document (cascades to chunks, document_tags)
                conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
                succeeded += 1
            except Exception as exc:
                failed += 1
                errors.append({"document_id": doc_id, "error": str(exc)})
        conn.commit()
        # Best-effort file cleanup after commit
        for path in stored_files:
            try:
                f = Path(path)
                if f.exists():
                    f.unlink()
            except OSError as exc:
                logger.warning("Failed to delete stored file %s: %s", path, exc)
        errors_json = json.dumps(errors) if errors else "[]"
        job_id = create_bulk_job(
            conn, "bulk_delete", _filters_dict(req),
            len(doc_ids), succeeded, failed, errors_json,
        )
        return {
            "job_id": job_id,
            "status": "done" if failed == 0 else "partial_failure",
            "matched": len(doc_ids),
            "succeeded": succeeded,
            "failed": failed,
            "errors": errors,
        }
    finally:
        conn.close()
@app.post("/api/v1/bulk/tags")
 async def bulk_tags(req: BulkTagsRequest):
    conn = get_connection(cfg.db_path)
    try:
        doc_ids = resolve_bulk_selection(
            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
        )
        total = count_documents(conn)
        _check_safety_threshold(len(doc_ids), total, req.force)
        succeeded = 0
        failed = 0
        errors = []
        for doc_id in doc_ids:
            try:
                if req.add:
                    tag_document(conn, doc_id, req.add)
                if req.remove:
                    untag_document(conn, doc_id, req.remove)
                conn.execute(
                    "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
                    (doc_id,),
                )
                succeeded += 1
            except Exception as exc:
                failed += 1
                errors.append({"document_id": doc_id, "error": str(exc)})
        conn.commit()
        errors_json = json.dumps(errors) if errors else "[]"
        job_id = create_bulk_job(
            conn, "bulk_tags", _filters_dict(req),
            len(doc_ids), succeeded, failed, errors_json,
        )
        return {
            "job_id": job_id,
            "status": "done" if failed == 0 else "partial_failure",
            "matched": len(doc_ids),
            "succeeded": succeeded,
            "failed": failed,
            "errors": errors,
        }
    finally:
        conn.close()
@app.post("/api/v1/bulk/set-tags")
 async def bulk_set_tags(req: BulkSetTagsRequest):
    conn = get_connection(cfg.db_path)
    try:
        doc_ids = resolve_bulk_selection(
            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
        )
        total = count_documents(conn)
        _check_safety_threshold(len(doc_ids), total, req.force)
        succeeded = 0
        failed = 0
        errors = []
        for doc_id in doc_ids:
            try:
                # Remove all existing tags
                conn.execute(
                    "DELETE FROM document_tags WHERE document_id = ?", (doc_id,)
                )
                # Apply new tag set
                if req.new_tags:
                    tag_document(conn, doc_id, req.new_tags)
                conn.execute(
                    "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
                    (doc_id,),
                )
                succeeded += 1
            except Exception as exc:
                failed += 1
                errors.append({"document_id": doc_id, "error": str(exc)})
        conn.commit()
        errors_json = json.dumps(errors) if errors else "[]"
        job_id = create_bulk_job(
            conn, "bulk_set_tags", _filters_dict(req),
            len(doc_ids), succeeded, failed, errors_json,
        )
        return {
            "job_id": job_id,
            "status": "done" if failed == 0 else "partial_failure",
            "matched": len(doc_ids),
            "succeeded": succeeded,
            "failed": failed,
            "errors": errors,
        }
    finally:
        conn.close()
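
All three bulk endpoints call `_check_safety_threshold` before touching any rows. Its decision logic can be sketched as a pure function without FastAPI, which makes the bypass conditions easier to see. The name `exceeds_threshold` is hypothetical; the real helper raises an HTTP 409 instead of returning a boolean.

```python
def exceeds_threshold(matched: int, total: int, threshold_percent: int, force: bool = False) -> bool:
    """Return True when a bulk operation should be refused.

    Mirrors the conditions in the diff: a threshold of 0 or less disables the
    check, `force` bypasses it, and an empty database never trips it.
    """
    if threshold_percent <= 0 or force or total == 0:
        return False
    percent = (matched / total) * 100
    # Strictly greater than: hitting the threshold exactly is still allowed.
    return percent > threshold_percent


blocked = exceeds_threshold(matched=8, total=10, threshold_percent=70)
allowed = exceeds_threshold(matched=7, total=10, threshold_percent=70)
forced = exceeds_threshold(matched=10, total=10, threshold_percent=70, force=True)
```

With the default `KB_BULK_SAFETY_PERCENT` of 70 shown in the config hunk, a request matching 8 of 10 documents would be refused unless it sets `force`.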
@@ -1,14 +1,20 @@
 """Document management endpoints — list, view, and delete documents."""
 import json
 import logging
 import mimetypes
 from pathlib import Path
 from typing import Optional
 from fastapi import HTTPException, Query
 from fastapi.responses import FileResponse
 from main import app
 from kb.config import cfg
 from kb.database import get_connection
 logger = logging.getLogger("kb.routes.documents")
@app.get("/api/v1/documents")
 async def list_documents(
@@ -20,7 +26,7 @@ async def list_documents(
        sql = """
            SELECT d.id, d.title, d.doc_type,
                   (SELECT COUNT(*) FROM chunks c WHERE c.document_id = d.id) AS chunk_count,
-                   d.created_at
+                   d.created_at, d.updated_at
            FROM documents d
        """
        joins: list[str] = []
@@ -44,7 +50,7 @@ async def list_documents(
        if where:
            sql += " WHERE " + " AND ".join(where)
-        sql += " ORDER BY d.created_at DESC"
+        sql += " ORDER BY COALESCE(d.updated_at, d.created_at) DESC"
        rows = conn.execute(sql, params).fetchall()
@@ -68,6 +74,7 @@ async def list_documents(
                "tags": [t["name"] for t in tag_rows],
                "chunk_count": row["chunk_count"],
                "created_at": row["created_at"],
                "updated_at": row["updated_at"],
            })
        return results
@@ -100,8 +107,12 @@ async def get_document(doc_id: int):
            (doc_id,),
        ).fetchall()
        stored_path = doc["stored_path"]
        has_file = bool(stored_path and Path(stored_path).exists())
        return {
            **dict(doc),
            "has_file": has_file,
            "tags": [t["name"] for t in tag_rows],
            "chunks": [dict(c) for c in chunks],
        }
@@ -109,12 +120,53 @@ async def get_document(doc_id: int):
        conn.close()
@app.get("/api/v1/documents/{doc_id}/file")
 async def download_document_file(doc_id: int):
    conn = get_connection(cfg.db_path)
    try:
        doc = conn.execute(
            "SELECT id, title, stored_path, original_filename FROM documents WHERE id = ?",
            (doc_id,),
        ).fetchone()
        if not doc:
            raise HTTPException(status_code=404, detail="Document not found.")
        stored_path = doc["stored_path"]
        if not stored_path:
            raise HTTPException(
                status_code=404,
                detail="Original file not available - ingested before document storage was enabled.",
            )
        file_path = Path(stored_path)
        if not file_path.exists():
            raise HTTPException(
                status_code=404,
                detail="Stored file not found on disk.",
            )
        original_filename = doc["original_filename"]
        if not original_filename:
            ext = file_path.suffix
            original_filename = (doc["title"] or "document") + ext
        media_type = mimetypes.guess_type(original_filename)[0] or "application/octet-stream"
        return FileResponse(
            path=str(file_path),
            media_type=media_type,
            filename=original_filename,
        )
    finally:
        conn.close()
@app.delete("/api/v1/documents/{doc_id}")
 async def delete_document(doc_id: int):
    conn = get_connection(cfg.db_path)
    try:
        doc = conn.execute(
-            "SELECT id, title FROM documents WHERE id = ?", (doc_id,)
+            "SELECT id, title, stored_path FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if not doc:
            raise HTTPException(status_code=404, detail="Document not found.")
@@ -134,6 +186,19 @@ async def delete_document(doc_id: int):
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
        conn.commit()
        # Delete stored file from disk
        stored_path = doc["stored_path"]
        if stored_path:
            try:
                file_path = Path(stored_path)
                if file_path.exists():
                    file_path.unlink()
                    logger.info("Deleted stored file: %s", stored_path)
                else:
                    logger.warning("Stored file already missing: %s", stored_path)
            except OSError as exc:
                logger.warning("Failed to delete stored file %s: %s", stored_path, exc)
        return {
            "status": "deleted",
            "document_id": doc_id,
@@ -0,0 +1,120 @@
 """Note mutation endpoint — update existing notes in place."""
 import hashlib
 import logging
 from fastapi import HTTPException
 from pydantic import BaseModel
 from main import app
 from kb.config import cfg
 from kb.database import (
    get_connection,
    build_enriched_text,
    insert_chunk,
    insert_embedding,
 )
 from kb.embeddings import embed_texts
 from kb.ingest.note import chunk_note
 logger = logging.getLogger("kb.routes.notes")
 class NoteUpdateRequest(BaseModel):
    text: str
@app.patch("/api/v1/notes/{doc_id}")
 async def update_note(doc_id: int, req: NoteUpdateRequest):
    conn = get_connection(cfg.db_path)
    try:
        doc = conn.execute(
            "SELECT id, title, doc_type FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if not doc:
            raise HTTPException(status_code=404, detail="Document not found.")
        if doc["doc_type"] != "note":
            raise HTTPException(
                status_code=422,
                detail="Only notes can be updated via this endpoint.",
            )
        title = doc["title"]
        # Delete existing chunks and their embeddings
        chunk_ids = conn.execute(
            "SELECT id FROM chunks WHERE document_id = ?", (doc_id,)
        ).fetchall()
        for row in chunk_ids:
            conn.execute("DELETE FROM chunks_vec WHERE chunk_id = ?", (row["id"],))
        conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
        # Run note chunking pipeline on new text
        chunks = chunk_note(req.text)
        chunk_texts = [c["text"] for c in chunks]
        chunk_metas = [
            {k: v for k, v in c.items() if k != "text"} or None for c in chunks
        ]
        enriched_texts = [
            build_enriched_text(title, ct, cm)
            for ct, cm in zip(chunk_texts, chunk_metas)
        ]
        # Embed — if this fails, the transaction rolls back
        vectors = embed_texts(enriched_texts)
        for idx, (chunk_text, enriched, vector) in enumerate(
            zip(chunk_texts, enriched_texts, vectors)
        ):
            chunk_id = insert_chunk(
                conn,
                document_id=doc_id,
                chunk_index=idx,
                text=chunk_text,
                enriched_text=enriched,
                metadata=chunk_metas[idx],
            )
            insert_embedding(conn, chunk_id, vector)
        # Update content_hash and updated_at
        content_hash = hashlib.sha256(req.text.encode("utf-8")).hexdigest()
        conn.execute(
            "UPDATE documents SET content_hash = ?, updated_at = current_timestamp WHERE id = ?",
            (content_hash, doc_id),
        )
        conn.commit()
        # Return updated document
        updated_doc = conn.execute(
            "SELECT * FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        new_chunks = conn.execute(
            "SELECT * FROM chunks WHERE document_id = ? ORDER BY chunk_index",
            (doc_id,),
        ).fetchall()
        tag_rows = conn.execute(
            """
            SELECT t.name FROM tags t
            JOIN document_tags dt ON t.id = dt.tag_id
            WHERE dt.document_id = ?
            ORDER BY t.name
            """,
            (doc_id,),
        ).fetchall()
        return {
            **dict(updated_doc),
            "tags": [t["name"] for t in tag_rows],
            "chunks": [dict(c) for c in new_chunks],
        }
    except HTTPException:
        raise
    except Exception:
        conn.rollback()
        logger.exception("Failed to update note %d", doc_id)
        raise HTTPException(status_code=500, detail="Failed to update note.")
    finally:
        conn.close()
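
The note update deletes the old chunks first and relies on `conn.rollback()` to restore them if embedding fails. The sketch below shows that delete-then-rollback behaviour with the standard `sqlite3` module. The table is a reduced stand-in, and `failing_embed` and `replace_chunks` are hypothetical names, not functions from the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, text TEXT)")
conn.execute("INSERT INTO chunks(document_id, text) VALUES (1, 'old text')")
conn.commit()


def failing_embed(texts):
    """Stand-in for an embedding call that fails (hypothetical)."""
    raise RuntimeError("embedding backend unavailable")


def replace_chunks(conn, doc_id, new_texts, embed) -> bool:
    """Delete old chunks, embed, insert new ones; roll back on any failure."""
    try:
        conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
        embed(new_texts)  # raises before anything has been committed
        for text in new_texts:
            conn.execute(
                "INSERT INTO chunks(document_id, text) VALUES (?, ?)", (doc_id, text)
            )
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # restores the rows removed by the DELETE above
        return False


ok = replace_chunks(conn, 1, ["new text"], failing_embed)
remaining = [r[0] for r in conn.execute("SELECT text FROM chunks WHERE document_id = 1")]
```

One caveat worth checking against the real code: `insert_chunk` in the `database.py` hunk calls `conn.commit()` itself, so in the endpoint the rollback appears to protect only the steps up to the first inserted chunk.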
@@ -19,10 +19,10 @@ async def reindex():
    conn = get_connection(cfg.db_path)
    try:
-        # Fetch all chunks
+        # Fetch all chunks — use enriched_text for embedding (includes title context)
-        rows = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()
+        rows = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id").fetchall()
        chunk_ids = [row["id"] for row in rows]
-        chunk_texts = [row["text"] for row in rows]
+        chunk_texts = [row["enriched_text"] or "" for row in rows]
        logger.info("Reindexing %d chunks with model '%s'", len(chunk_ids), cfg.model)
@@ -48,6 +48,13 @@ async def update_document_tags(doc_id: int, req: TagUpdateRequest):
        if req.remove:
            untag_document(conn, doc_id, req.remove)
+        if req.add or req.remove:
+            conn.execute(
+                "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
+                (doc_id,),
+            )
+            conn.commit()
        tag_rows = conn.execute(
            """
            SELECT t.name FROM tags t
@@ -16,7 +16,8 @@ def stage_file(staging_dir: Path, filename: str, content: bytes) -> Path:
        The path to the newly created staged file.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{filename}"
+    safe_filename = filename.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_filename}"
    dest.write_bytes(content)
    logger.debug("Staged file: %s (%d bytes)", dest, len(content))
    return dest
@@ -31,7 +32,8 @@ def stage_note(staging_dir: Path, title: str, text: str) -> Path:
        The path to the newly created staged note file.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{title}.note"
+    safe_title = title.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_title}.note"
    dest.write_text(text, encoding="utf-8")
    logger.debug("Staged note: %s (%d chars)", dest, len(text))
    return dest
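The separator replacement used in `stage_file` and `stage_note` can be exercised in isolation. This is a minimal sketch, assuming a hypothetical helper name `safe_component` (not part of the diff) that applies the same two `replace` calls, and a hypothetical `staged_path` wrapper that mirrors the `<uuid>_<name>` scheme:

```python
import tempfile
import uuid
from pathlib import Path


def safe_component(name: str) -> str:
    """Hypothetical helper: the same replacement the diff applies inline."""
    return name.replace("/", "_").replace("\\", "_")


def staged_path(staging_dir: Path, filename: str) -> Path:
    # Mirror the naming scheme from the diff: "<uuid4>_<sanitised name>".
    return staging_dir / f"{uuid.uuid4()}_{safe_component(filename)}"


with tempfile.TemporaryDirectory() as tmp:
    staging_dir = Path(tmp)
    dest = staged_path(staging_dir, "../../etc/passwd")
    # With separators replaced, the name is a single path component,
    # so the destination stays directly inside the staging directory.
    assert dest.parent == staging_dir
    print(dest.name.split("_", 1)[1])  # .._.._etc_passwd
```

Because the UUID prefix is always prepended, a leftover `..` in the name is harmless: it can no longer stand alone as a path component.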
@@ -4,9 +4,11 @@ import asyncio
 import hashlib
 import json
 import logging
+import shutil
 from pathlib import Path
 from kb import config, database, embeddings, staging
+from kb.database import build_enriched_text
 from kb.ingest import detector
 logger = logging.getLogger("kb.worker")
@@ -145,20 +147,30 @@ def _process_job(job_row) -> tuple[str, int | None, int]:
        )
        chunk_texts = [c if isinstance(c, str) else c["text"] for c in chunks]
-        vectors = embeddings.embed_texts(chunk_texts)
-        for idx, (chunk_text, vector) in enumerate(zip(chunk_texts, vectors)):
-            metadata = None
-            if not isinstance(chunks[idx], str):
-                metadata = {
-                    k: v for k, v in chunks[idx].items() if k != "text"
-                } or None
+        chunk_metas = []
+        for idx, c in enumerate(chunks):
+            if isinstance(c, str):
+                chunk_metas.append(None)
+            else:
+                meta = {k: v for k, v in c.items() if k != "text"} or None
+                chunk_metas.append(meta)
+        enriched_texts = [
+            build_enriched_text(title, ct, cm)
+            for ct, cm in zip(chunk_texts, chunk_metas)
+        ]
+        vectors = embeddings.embed_texts(enriched_texts)
+
+        for idx, (chunk_text, enriched, vector) in enumerate(
+            zip(chunk_texts, enriched_texts, vectors)
+        ):
            chunk_id = database.insert_chunk(
                conn,
                document_id=doc_id,
                chunk_index=idx,
                text=chunk_text,
-                metadata=metadata,
+                enriched_text=enriched,
+                metadata=chunk_metas[idx],
            )
            database.insert_embedding(conn, chunk_id, vector)
@@ -168,8 +180,31 @@ def _process_job(job_row) -> tuple[str, int | None, int]:
            database.tag_document(conn, doc_id, tags)
        conn.commit()
+        # --- Move original file to persistent storage ---------------------
+        ext = Path(filename).suffix or staged_path.suffix
+        dest = cfg.documents_dir / f"{content_hash}{ext}"
+        try:
+            cfg.documents_dir.mkdir(parents=True, exist_ok=True)
+            shutil.move(str(staged_path), str(dest))
+            conn_update = database.get_connection(cfg.db_path)
+            try:
+                conn_update.execute(
+                    "UPDATE documents SET stored_path = ?, original_filename = ? WHERE id = ?",
+                    (str(dest), filename, doc_id),
+                )
+                conn_update.commit()
+            finally:
+                conn_update.close()
+            logger.info("Stored original file: %s", dest)
+        except Exception as exc:
+            logger.warning("Failed to store original file: %s", exc)
+            staging.cleanup(staged_path)
        return ("done", doc_id, len(chunk_texts))
    finally:
        conn.close()
-        staging.cleanup(staged_path)
+        # Only clean up staging if the file is still there (not moved)
+        if staged_path.exists():
+            staging.cleanup(staged_path)
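The storage step above names the persisted file by its content hash plus the original extension. A minimal sketch of that naming and move, assuming a hypothetical helper `store_original` and assuming the hash is the SHA-256 of the file bytes (as in the accompanying tests; the worker's own hash computation is not shown in this hunk):

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def store_original(staged: Path, documents_dir: Path, filename: str) -> Path:
    """Hypothetical helper: move a staged file to <sha256><ext>."""
    content_hash = hashlib.sha256(staged.read_bytes()).hexdigest()
    # Prefer the uploaded filename's extension, else the staged file's.
    ext = Path(filename).suffix or staged.suffix
    documents_dir.mkdir(parents=True, exist_ok=True)
    dest = documents_dir / f"{content_hash}{ext}"
    shutil.move(str(staged), str(dest))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    staged = root / "staging" / "upload.pdf"
    staged.parent.mkdir()
    staged.write_bytes(b"%PDF-1.4 example")
    dest = store_original(staged, root / "documents", "report.pdf")
    # After the move the staged file is gone, which is why the worker's
    # finally block checks staged_path.exists() before cleaning up.
    assert dest.exists() and not staged.exists()
```

Naming by content hash means two uploads with identical bytes map to the same destination path, whatever the user called the file; the user-facing name is kept separately in `original_filename`.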
@@ -62,7 +62,7 @@ async def lifespan(app: FastAPI):
 app = FastAPI(title="kb-engine", version=__version__, lifespan=lifespan)
 # Import routes after app is created
-from kb.routes import health, search, jobs, documents, tags, status, reindex, auth  # noqa: E402, F401
+from kb.routes import health, search, jobs, documents, tags, status, reindex, auth, notes, bulk  # noqa: E402, F401
 if __name__ == "__main__":
    import uvicorn
@@ -0,0 +1,223 @@
 """Tests for original document storage feature."""
 import hashlib
 import shutil
 import sqlite3
 from pathlib import Path
 from unittest.mock import patch
 import pytest
 from fastapi.testclient import TestClient
@pytest.fixture
 def data_dir(tmp_path):
    """Create a temporary data directory with required subdirectories."""
    staging = tmp_path / "staging"
    staging.mkdir()
    documents = tmp_path / "documents"
    documents.mkdir()
    return tmp_path
@pytest.fixture
 def db_conn(data_dir):
    """Create a file-backed SQLite DB in the temp data dir with the tables these tests need."""
    db_path = data_dir / "kb.db"
    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT,
            source_path TEXT,
            content_hash TEXT UNIQUE,
            doc_type TEXT,
            language TEXT,
            stored_path TEXT,
            original_filename TEXT,
            created_at TEXT DEFAULT current_timestamp
        );
        CREATE TABLE IF NOT EXISTS chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
            chunk_index INTEGER,
            text TEXT,
            token_count INTEGER,
            metadata TEXT DEFAULT '{}',
            UNIQUE(document_id, chunk_index)
        );
        CREATE TABLE IF NOT EXISTS tags (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE COLLATE NOCASE
        );
        CREATE TABLE IF NOT EXISTS document_tags (
            document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
            tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
            UNIQUE(document_id, tag_id)
        );
        CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY,
            filename TEXT,
            status TEXT DEFAULT 'queued',
            doc_type TEXT,
            tags_json TEXT DEFAULT '[]',
            title TEXT,
            error TEXT,
            document_id INTEGER,
            chunk_count INTEGER DEFAULT 0,
            staging_path TEXT,
            content_hash TEXT,
            created_at TEXT DEFAULT current_timestamp,
            completed_at TEXT
        );
    """)
    conn.commit()
    yield conn
    conn.close()
@pytest.fixture
 def sample_pdf(data_dir):
    """Create a fake PDF file in staging."""
    content = b"%PDF-1.4 fake pdf content for testing"
    staging = data_dir / "staging"
    path = staging / "test_upload.pdf"
    path.write_bytes(content)
    return path, content
 class TestWorkerFileStorage:
    """Tests for worker moving files to persistent storage."""
    def test_successful_ingestion_stores_file(self, data_dir, db_conn, sample_pdf):
        """7.1 - Test successful ingestion stores file at expected path."""
        staged_path, content = sample_pdf
        content_hash = hashlib.sha256(content).hexdigest()
        documents_dir = data_dir / "documents"
        expected_dest = documents_dir / f"{content_hash}.pdf"
        # Simulate what the worker does: move file to documents dir
        shutil.move(str(staged_path), str(expected_dest))
        assert expected_dest.exists()
        assert expected_dest.read_bytes() == content
        assert not staged_path.exists()
        # Simulate DB update
        db_conn.execute(
            "INSERT INTO documents(title, source_path, content_hash, doc_type, stored_path, original_filename) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            ("Test PDF", str(staged_path), content_hash, "pdf", str(expected_dest), "test_upload.pdf"),
        )
        db_conn.commit()
        row = db_conn.execute("SELECT stored_path, original_filename FROM documents WHERE content_hash = ?", (content_hash,)).fetchone()
        assert row["stored_path"] == str(expected_dest)
        assert row["original_filename"] == "test_upload.pdf"
    def test_failed_ingestion_no_file_in_documents(self, data_dir, sample_pdf):
        """7.2 - Test failed ingestion does not leave file in documents dir."""
        staged_path, _ = sample_pdf
        documents_dir = data_dir / "documents"
        # Simulate failure: staging file gets cleaned up, nothing in documents dir
        staged_path.unlink()
        assert len(list(documents_dir.iterdir())) == 0
    def test_document_deletion_removes_stored_file(self, data_dir, db_conn, sample_pdf):
        """7.4 - Test document deletion removes stored file."""
        staged_path, content = sample_pdf
        content_hash = hashlib.sha256(content).hexdigest()
        documents_dir = data_dir / "documents"
        dest = documents_dir / f"{content_hash}.pdf"
        shutil.move(str(staged_path), str(dest))
        db_conn.execute(
            "INSERT INTO documents(title, source_path, content_hash, doc_type, stored_path, original_filename) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            ("Test PDF", str(staged_path), content_hash, "pdf", str(dest), "test_upload.pdf"),
        )
        db_conn.commit()
        # Simulate delete: remove from DB and disk
        doc = db_conn.execute("SELECT id, stored_path FROM documents WHERE content_hash = ?", (content_hash,)).fetchone()
        stored = Path(doc["stored_path"])
        db_conn.execute("DELETE FROM documents WHERE id = ?", (doc["id"],))
        db_conn.commit()
        if stored.exists():
            stored.unlink()
        assert not stored.exists()
        assert db_conn.execute("SELECT COUNT(*) FROM documents", ()).fetchone()[0] == 0
    def test_download_404_for_document_without_stored_file(self, db_conn):
        """7.5 - A document without a stored file has a NULL stored_path (the download 404 case)."""
        db_conn.execute(
            "INSERT INTO documents(title, source_path, content_hash, doc_type) "
            "VALUES (?, ?, ?, ?)",
            ("Old Doc", "/tmp/gone", "abc123", "pdf"),
        )
        db_conn.commit()
        row = db_conn.execute("SELECT stored_path FROM documents WHERE content_hash = 'abc123'").fetchone()
        assert row["stored_path"] is None
 class TestFileDownloadEndpoint:
    """Tests for the /api/v1/documents/{id}/file endpoint logic."""
    def test_file_response_uses_original_filename(self, data_dir, db_conn, sample_pdf):
        """7.3 - Test file download uses correct original filename."""
        staged_path, content = sample_pdf
        content_hash = hashlib.sha256(content).hexdigest()
        documents_dir = data_dir / "documents"
        dest = documents_dir / f"{content_hash}.pdf"
        shutil.move(str(staged_path), str(dest))
        db_conn.execute(
            "INSERT INTO documents(title, source_path, content_hash, doc_type, stored_path, original_filename) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            ("My Report", str(staged_path), content_hash, "pdf", str(dest), "quarterly_report.pdf"),
        )
        db_conn.commit()
        doc = db_conn.execute("SELECT stored_path, original_filename, title FROM documents WHERE content_hash = ?", (content_hash,)).fetchone()
        # Verify the original filename is preserved and different from title
        assert doc["original_filename"] == "quarterly_report.pdf"
        assert doc["title"] == "My Report"
        assert Path(doc["stored_path"]).exists()
    def test_fallback_to_title_when_no_original_filename(self, data_dir, db_conn):
        """Test that title+ext is used when original_filename is NULL."""
        documents_dir = data_dir / "documents"
        fake_file = documents_dir / "somehash.pdf"
        fake_file.write_bytes(b"fake")
        db_conn.execute(
            "INSERT INTO documents(title, source_path, content_hash, doc_type, stored_path) "
            "VALUES (?, ?, ?, ?, ?)",
            ("Engine Manual", "/tmp/old", "hash456", "pdf", str(fake_file)),
        )
        db_conn.commit()
        doc = db_conn.execute("SELECT original_filename, title, stored_path FROM documents WHERE content_hash = 'hash456'").fetchone()
        # When original_filename is NULL, the endpoint should fall back to title + ext
        original_filename = doc["original_filename"]
        if not original_filename:
            ext = Path(doc["stored_path"]).suffix
            original_filename = (doc["title"] or "document") + ext
        assert original_filename == "Engine Manual.pdf"
@@ -0,0 +1,17 @@
 FROM python:3.12-slim
 WORKDIR /app
 COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
 COPY *.py ./
 ENV KB_ENGINE_URL=http://engine:8000
 ENV KB_API_KEY=
 ENV KB_MCP_API_KEY=
 ENV KB_MCP_PORT=3000
 EXPOSE 3000
 CMD ["python", "server.py"]
@@ -0,0 +1,17 @@
 """Configuration from environment variables."""
 import os
 KB_ENGINE_URL = os.environ.get("KB_ENGINE_URL", "http://localhost:8000")
 KB_API_KEY = os.environ.get("KB_API_KEY", "")
 KB_MCP_API_KEY = os.environ.get("KB_MCP_API_KEY", "")
 KB_MCP_PORT = int(os.environ.get("KB_MCP_PORT", "3000"))
 KB_MCP_ALLOWED_HOSTS = os.environ.get("KB_MCP_ALLOWED_HOSTS", "")
 def parse_allowed_hosts() -> list[str]:
    """Parse KB_MCP_ALLOWED_HOSTS into a list of host strings."""
    if not KB_MCP_ALLOWED_HOSTS:
        return []
    return [h.strip() for h in KB_MCP_ALLOWED_HOSTS.split(",") if h.strip()]
@@ -0,0 +1,208 @@
 """HTTP client for the kb engine API."""
 import httpx
 from config import KB_ENGINE_URL, KB_API_KEY
 def _auth_headers() -> dict[str, str]:
    h: dict[str, str] = {}
    if KB_API_KEY:
        h["Authorization"] = f"Bearer {KB_API_KEY}"
    return h
 def _client() -> httpx.Client:
    return httpx.Client(base_url=KB_ENGINE_URL, headers=_auth_headers(), timeout=60.0)
 def search(query: str, top: int = 10, tags: list[str] | None = None,
           doc_type: str | None = None, fts_only: bool = False,
           vec_only: bool = False, threshold: float | None = None) -> dict:
    body: dict = {"query": query, "top": top}
    if tags:
        body["tags"] = tags
    if doc_type:
        body["doc_type"] = doc_type
    if fts_only:
        body["fts_only"] = True
    if vec_only:
        body["vec_only"] = True
    if threshold is not None:
        body["threshold"] = threshold
    with _client() as c:
        r = c.post("/api/v1/search", json=body)
        r.raise_for_status()
        return r.json()
 def add_note(text: str, tags: list[str] | None = None,
             title: str | None = None) -> dict:
    fields = {"note": text}
    if tags:
        fields["tags"] = ",".join(tags)
    if title:
        fields["title"] = title
    with _client() as c:
        r = c.post("/api/v1/jobs", data=fields)
        r.raise_for_status()
        return r.json()
 def update_note(doc_id: int, text: str) -> dict:
    with _client() as c:
        r = c.patch(f"/api/v1/notes/{doc_id}", json={"text": text})
        r.raise_for_status()
        return r.json()
 def get_document(doc_id: int) -> dict:
    with _client() as c:
        r = c.get(f"/api/v1/documents/{doc_id}")
        r.raise_for_status()
        return r.json()
 def list_documents(doc_type: str | None = None,
                   tags: str | None = None) -> list[dict]:
    params: dict = {}
    if doc_type:
        params["type"] = doc_type
    if tags:
        params["tags"] = tags
    with _client() as c:
        r = c.get("/api/v1/documents", params=params)
        r.raise_for_status()
        return r.json()
 def get_status() -> dict:
    with _client() as c:
        r = c.get("/api/v1/status")
        r.raise_for_status()
        return r.json()
 def list_jobs(status: str | None = None) -> list[dict]:
    params: dict = {}
    if status:
        params["status"] = status
    with _client() as c:
        r = c.get("/api/v1/jobs", params=params)
        r.raise_for_status()
        return r.json()
 def update_tags(doc_id: int, add: list[str] | None = None,
                remove: list[str] | None = None) -> dict:
    body: dict = {}
    if add:
        body["add"] = add
    if remove:
        body["remove"] = remove
    with _client() as c:
        r = c.put(f"/api/v1/documents/{doc_id}/tags", json=body)
        r.raise_for_status()
        return r.json()
 def delete_document(doc_id: int) -> dict:
    with _client() as c:
        r = c.delete(f"/api/v1/documents/{doc_id}")
        r.raise_for_status()
        return r.json()
 def _bulk_body(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    force: bool = False,
    **extra,
 ) -> dict:
    body: dict = {}
    if document_ids:
        body["document_ids"] = document_ids
    if tags:
        body["tags"] = tags
    if doc_type:
        body["doc_type"] = doc_type
    if from_id is not None:
        body["from_id"] = from_id
    if to_id is not None:
        body["to_id"] = to_id
    if force:
        body["force"] = True
    body.update(extra)
    return body
 def bulk_delete(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    force: bool = False,
 ) -> dict:
    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force)
    with _client() as c:
        r = c.post("/api/v1/bulk/delete", json=body)
        r.raise_for_status()
        return r.json()
 def bulk_tags(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    add: list[str] | None = None,
    remove: list[str] | None = None,
    force: bool = False,
 ) -> dict:
    extra = {}
    if add:
        extra["add"] = add
    if remove:
        extra["remove"] = remove
    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force, **extra)
    with _client() as c:
        r = c.post("/api/v1/bulk/tags", json=body)
        r.raise_for_status()
        return r.json()
 def bulk_set_tags(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    new_tags: list[str] | None = None,
    force: bool = False,
 ) -> dict:
    extra = {"new_tags": new_tags or []}
    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force, **extra)
    with _client() as c:
        r = c.post("/api/v1/bulk/set-tags", json=body)
        r.raise_for_status()
        return r.json()
 def upload_file(filename: str, file_bytes: bytes,
                tags: list[str] | None = None) -> dict:
    fields: dict = {}
    if tags:
        fields["tags"] = ",".join(tags)
    with _client() as c:
        r = c.post(
            "/api/v1/jobs",
            data=fields,
            files={"file": (filename, file_bytes)},
        )
        r.raise_for_status()
        return r.json()
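The bulk request builders above drop unset filters, so the engine only sees the keys the caller supplied. A condensed sketch of that behaviour, using a simplified copy of `_bulk_body` under the hypothetical name `build_filter_body`:

```python
def build_filter_body(document_ids=None, tags=None, doc_type=None,
                      from_id=None, to_id=None, force=False, **extra) -> dict:
    """Simplified copy of _bulk_body under a hypothetical name."""
    body: dict = {}
    # Truthiness checks: None and empty lists/strings are omitted.
    if document_ids:
        body["document_ids"] = document_ids
    if tags:
        body["tags"] = tags
    if doc_type:
        body["doc_type"] = doc_type
    # ID bounds use "is not None" so that a bound of 0 is still sent.
    if from_id is not None:
        body["from_id"] = from_id
    if to_id is not None:
        body["to_id"] = to_id
    # force is only sent when set; its absence means False.
    if force:
        body["force"] = True
    body.update(extra)
    return body


print(build_filter_body(tags=[], from_id=0, add=["archived"]))
# {'from_id': 0, 'add': ['archived']}
```

Nothing on the client side rejects an empty body; the tool docstrings state that at least one selection filter is required, which is presumably enforced by the engine.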
@@ -0,0 +1,4 @@
 mcp>=1.9.0
 httpx>=0.27
 uvicorn>=0.30
 starlette>=0.38
@@ -0,0 +1,446 @@
 """kb MCP server — exposes knowledge base operations as MCP tools."""
 import asyncio
 import json
 import logging
 from mcp.server.fastmcp import FastMCP
 from mcp.server.transport_security import TransportSecuritySettings
 from starlette.applications import Starlette
 from starlette.middleware import Middleware
 from starlette.middleware.base import BaseHTTPMiddleware
 from starlette.requests import Request
 from starlette.responses import JSONResponse
 from starlette.routing import Mount
 import config
 import engine
 import uploads
 logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
 logger = logging.getLogger("kb.mcp")
 # ---------------------------------------------------------------------------
 # Transport security — DNS rebinding protection with configurable allowed hosts
 # ---------------------------------------------------------------------------
 _LOCALHOST_HOSTS = ["127.0.0.1:*", "localhost:*", "[::1]:*"]
 _LOCALHOST_ORIGINS = ["http://127.0.0.1:*", "http://localhost:*", "http://[::1]:*"]
 _extra_hosts = config.parse_allowed_hosts()
 _allowed_hosts = _LOCALHOST_HOSTS + [f"{h}:*" for h in _extra_hosts]
 _allowed_origins = _LOCALHOST_ORIGINS + [f"http://{h}:*" for h in _extra_hosts]
 _transport_security = TransportSecuritySettings(
    enable_dns_rebinding_protection=True,
    allowed_hosts=_allowed_hosts,
    allowed_origins=_allowed_origins,
 )
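How `KB_MCP_ALLOWED_HOSTS` expands into host and origin patterns can be sketched without the MCP dependency. In this sketch `parse_allowed_hosts` takes the raw string as a parameter (an assumption made for testability; the diff reads it from the environment at import time), and `build_patterns` is a hypothetical helper that repeats the list concatenation above:

```python
def parse_allowed_hosts(raw: str) -> list[str]:
    """Split a comma-separated host list, dropping blank entries."""
    if not raw:
        return []
    return [h.strip() for h in raw.split(",") if h.strip()]


def build_patterns(raw: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: localhost defaults plus the configured hosts."""
    localhost_hosts = ["127.0.0.1:*", "localhost:*", "[::1]:*"]
    localhost_origins = ["http://127.0.0.1:*", "http://localhost:*", "http://[::1]:*"]
    extra = parse_allowed_hosts(raw)
    # Each extra host is allowed on any port, mirroring the diff.
    hosts = localhost_hosts + [f"{h}:*" for h in extra]
    origins = localhost_origins + [f"http://{h}:*" for h in extra]
    return hosts, origins


hosts, origins = build_patterns("kb.example.lan, ,10.0.0.5")
print(hosts[3:])    # ['kb.example.lan:*', '10.0.0.5:*']
print(origins[3:])  # ['http://kb.example.lan:*', 'http://10.0.0.5:*']
```

Only `http://` origins are generated for the extra hosts, so a deployment served over TLS may need the origin list extended.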
 # ---------------------------------------------------------------------------
 # FastMCP server
 # ---------------------------------------------------------------------------
 mcp = FastMCP(
    "kb",
    instructions=(
        "Knowledge base MCP server. Provides tools for searching, adding, and "
        "managing documents and notes. Use tags to organise and filter documents "
        "(e.g. tag notes with 'agent:mybot' and filter searches by that tag). "
        "This server requires Bearer token authentication — all requests are "
        "authenticated via the Authorization header at the HTTP transport layer."
    ),
    transport_security=_transport_security,
 )
@mcp.tool()
 async def kb_search(
    query: str,
    top: int = 10,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    fts_only: bool = False,
 ) -> str:
    """Search the knowledge base for relevant documents and notes.
    Returns ranked chunks matching the query, with text content, relevance scores,
    and document metadata.
    Args:
        query: The search query. Can be a natural language question or keywords.
        top: Maximum number of results to return (default 10).
        tags: Filter results to documents with ALL of these tags.
        doc_type: Filter by document type (e.g. "note", "pdf", "markdown", "code").
        fts_only: If true, use only full-text search (no vector similarity).
    Tips for complex queries:
    - Consider expanding into 2-3 variant phrasings and calling this tool multiple
      times, then deduplicating results by chunk_id. For example, search for both
      "pension revaluation rules" and "how are pensions revalued" to cast a wider net.
    - For precision, rerank the returned results using your own judgement based on
      relevance to the original question.
    """
    result = engine.search(
        query=query,
        top=top,
        tags=tags or None,
        doc_type=doc_type,
        fts_only=fts_only,
    )
    results_list = result if isinstance(result, list) else result.get("results", [])
    return json.dumps(results_list, indent=2)
@mcp.tool()
 async def kb_addnote(
    text: str,
    tags: list[str] | None = None,
    title: str | None = None,
 ) -> str:
    """Add a text note to the knowledge base for indexing and search.
    The note is queued for ingestion — it will be chunked, embedded, and made
    searchable. Use kb_jobs to check ingestion status.
    Args:
        text: The note text content.
        tags: Tags to apply to the note.
        title: Optional title (auto-derived from first line if omitted).
    """
    result = engine.add_note(text=text, tags=tags or None, title=title)
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_update_note(
    document_id: int,
    text: str,
 ) -> str:
    """Update an existing note's content in place.
    Replaces the note text, re-chunks, and re-embeds while preserving the
    document ID, creation timestamp, and tags. Only works on documents with
    doc_type "note".
    Args:
        document_id: The ID of the note document to update.
        text: The new text content for the note.
    """
    result = engine.update_note(document_id, text)
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_get(
    document_id: int | None = None,
    source_path: str | None = None,
 ) -> str:
    """Retrieve document details from the knowledge base.
    Look up a document by its ID or source path. Returns full document metadata,
    tags, and chunk contents.
    Args:
        document_id: The numeric document ID.
        source_path: The document's source path (alternative to document_id).
    """
    if document_id is not None:
        result = engine.get_document(document_id)
        return json.dumps(result, indent=2)
    elif source_path is not None:
        docs = engine.list_documents()
        matches = [d for d in docs if d.get("source_path") == source_path]
        if not matches:
            return json.dumps({"error": "No document found with that source_path"})
        doc = engine.get_document(matches[0]["id"])
        return json.dumps(doc, indent=2)
    else:
        return json.dumps({"error": "Provide either document_id or source_path"})
@mcp.tool()
 async def kb_status() -> str:
    """Get knowledge base engine status.
    Returns engine version, embedding model info, device info, document counts,
    database size, and ingestion queue state.
    """
    result = engine.get_status()
    result["authenticated"] = bool(config.KB_MCP_API_KEY)
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_jobs(
    status: str | None = None,
 ) -> str:
    """List ingestion jobs and their status.
    Returns recent jobs showing what has been queued, is processing, completed,
    or failed.
    Args:
        status: Filter by job status ("queued", "processing", "done", "failed", "skipped").
    """
    result = engine.list_jobs(status=status)
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_delete(
    document_id: int,
 ) -> str:
    """Permanently delete a document from the knowledge base.
    Removes the document and all associated data (chunks, embeddings, tags,
    stored files). This action cannot be undone.
    Args:
        document_id: The ID of the document to delete.
    """
    result = engine.delete_document(document_id)
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_upload_start(
    filename: str,
    total_size: int,
    tags: list[str] | None = None,
 ) -> str:
    """Start a chunked file upload to the knowledge base.
    Use this for uploading files from a remote agent. The upload process is:
    1. Call kb_upload_start to get an upload_id
    2. Call kb_upload_chunk repeatedly with base64-encoded file chunks (recommended ~1MB each)
    3. Call kb_upload_finish to submit the file for ingestion
    Example for a 3MB file:
        upload = kb_upload_start(filename="report.pdf", total_size=3145728, tags=["project:x"])
        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 0>", chunk_index=0)
        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 1>", chunk_index=1)
        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 2>", chunk_index=2)
        result = kb_upload_finish(upload_id=upload["upload_id"])
    Args:
        filename: Original filename (used for type detection).
        total_size: Total file size in bytes.
        tags: Tags to apply to the uploaded document.
    """
    upload_id = uploads.start_upload(filename, total_size, tags or [])
    return json.dumps({"upload_id": upload_id})
@mcp.tool()
 async def kb_upload_chunk(
    upload_id: str,
    data: str,
    chunk_index: int,
 ) -> str:
    """Upload a base64-encoded chunk of a file.
    Part of the chunked upload flow started by kb_upload_start.
    Args:
        upload_id: The upload ID from kb_upload_start.
        data: Base64-encoded file data for this chunk.
        chunk_index: Zero-based index of this chunk.
    """
    try:
        uploads.add_chunk(upload_id, data, chunk_index)
        return json.dumps({"status": "ok", "chunk_index": chunk_index})
    except KeyError as e:
        return json.dumps({"error": str(e)})
@mcp.tool()
 async def kb_upload_finish(
    upload_id: str,
 ) -> str:
    """Finish a chunked upload and submit the file for ingestion.
    Reassembles all uploaded chunks and forwards the complete file to the
    engine for processing. Returns the ingestion job ID.
    Args:
        upload_id: The upload ID from kb_upload_start.
    """
    try:
        filename, file_bytes, tags = uploads.finish_upload(upload_id)
        result = engine.upload_file(filename, file_bytes, tags)
        return json.dumps(result, indent=2)
    except KeyError as e:
        return json.dumps({"error": str(e)})
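The chunked upload flow relies on the caller splitting the file and base64-encoding each piece. A minimal client-side sketch, assuming hypothetical helpers `split_base64_chunks` and `reassemble`; the `uploads` module is not shown in this diff, so the reassembly here (decode each chunk, join in `chunk_index` order) is an assumption about its behaviour rather than its actual code:

```python
import base64


def split_base64_chunks(data: bytes, chunk_size: int = 1024 * 1024) -> list[str]:
    """Split raw bytes into base64 strings covering at most chunk_size bytes each."""
    return [
        base64.b64encode(data[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(data), chunk_size)
    ]


def reassemble(chunks: dict[int, str]) -> bytes:
    """Decode chunks and join them in chunk_index order (assumed server behaviour)."""
    return b"".join(base64.b64decode(chunks[i]) for i in sorted(chunks))


payload = bytes(range(256)) * 12288  # 3 MiB of sample data
encoded = split_base64_chunks(payload)
# Chunks may arrive out of order; the index restores the sequence.
received = {2: encoded[2], 0: encoded[0], 1: encoded[1]}
assert len(encoded) == 3
assert reassemble(received) == payload
```

Each chunk is encoded on its own, so every `data` argument is a valid base64 string by itself and can be decoded without waiting for its neighbours.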
 # ---------------------------------------------------------------------------
 # Bulk operation tools
 # ---------------------------------------------------------------------------
@mcp.tool()
 async def kb_bulk_delete(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    force: bool = False,
 ) -> str:
    """Permanently delete multiple documents matching a filter.
    Removes matched documents and all associated data (chunks, embeddings, tags,
    stored files). This action cannot be undone.
    Selection filters combine with AND logic — at least one is required.
    A safety threshold applies: if the operation would affect more than 70% of
    all documents, it is rejected unless force=true.
    Args:
        document_ids: Delete documents with these specific IDs.
        tags: Delete documents that have ALL of these tags (selection filter).
        doc_type: Delete documents of this type (e.g. "note", "pdf").
        from_id: Delete documents with id >= this value.
        to_id: Delete documents with id <= this value.
        force: Override the safety threshold if it would block the operation.
    """
    result = engine.bulk_delete(
        document_ids=document_ids, tags=tags, doc_type=doc_type,
        from_id=from_id, to_id=to_id, force=force,
    )
    return json.dumps(result, indent=2)
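The 70% safety threshold described in the docstring is enforced by the engine, whose code is not part of this hunk. The check below is therefore only an illustration of the stated rule, with a hypothetical function name and integer arithmetic chosen to avoid floating-point edge cases at exactly 70%:

```python
def exceeds_safety_threshold(matched: int, total: int, percent: int = 70) -> bool:
    """Hypothetical check: True when matched is MORE than percent% of total."""
    if total <= 0:
        return False
    # Compare matched/total > percent/100 without division.
    return matched * 100 > total * percent


def should_reject(matched: int, total: int, force: bool = False) -> bool:
    # force=True overrides the threshold, as the docstring describes.
    return exceeds_safety_threshold(matched, total) and not force


print(should_reject(7, 10))              # False: exactly 70% is allowed
print(should_reject(8, 10))              # True: 80% exceeds the threshold
print(should_reject(8, 10, force=True))  # False: overridden by force
```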
@mcp.tool()
 async def kb_bulk_tags(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    add: list[str] | None = None,
    remove: list[str] | None = None,
    force: bool = False,
 ) -> str:
    """Add and/or remove tags on multiple documents matching a filter.
    Selection filters combine with AND logic — at least one is required.
    Note: the 'tags' parameter is a SELECTION FILTER (which documents to target),
    while 'add' and 'remove' specify the TAG CHANGES to apply to those documents.
    Args:
        document_ids: Target documents with these specific IDs.
        tags: Target documents that have ALL of these tags (selection filter).
        doc_type: Target documents of this type.
        from_id: Target documents with id >= this value.
        to_id: Target documents with id <= this value.
        add: Tags to add to matched documents.
        remove: Tags to remove from matched documents.
        force: Override the safety threshold if it would block the operation.
    """
    result = engine.bulk_tags(
        document_ids=document_ids, tags=tags, doc_type=doc_type,
        from_id=from_id, to_id=to_id, add=add, remove=remove, force=force,
    )
    return json.dumps(result, indent=2)
@mcp.tool()
 async def kb_bulk_set_tags(
    document_ids: list[int] | None = None,
    tags: list[str] | None = None,
    doc_type: str | None = None,
    from_id: int | None = None,
    to_id: int | None = None,
    new_tags: list[str] | None = None,
    force: bool = False,
 ) -> str:
    """Replace all tags on multiple documents with a new set.
    Removes ALL existing tags from matched documents, then applies the new tag set.
    Selection filters combine with AND logic — at least one is required.
    Note: the 'tags' parameter is a SELECTION FILTER (which documents to target),
    while 'new_tags' is the REPLACEMENT tag set to apply.
    Args:
        document_ids: Target documents with these specific IDs.
        tags: Target documents that have ALL of these tags (selection filter).
        doc_type: Target documents of this type.
        from_id: Target documents with id >= this value.
        to_id: Target documents with id <= this value.
        new_tags: The replacement tag set to apply to all matched documents.
        force: Override the safety threshold if it would block the operation.
    """
    result = engine.bulk_set_tags(
        document_ids=document_ids, tags=tags, doc_type=doc_type,
        from_id=from_id, to_id=to_id, new_tags=new_tags, force=force,
    )
    return json.dumps(result, indent=2)
 # ---------------------------------------------------------------------------
 # Auth middleware
 # ---------------------------------------------------------------------------
 class BearerAuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        if not config.KB_MCP_API_KEY:
            return await call_next(request)
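        auth_header = request.headers.get("authorization", "")
        # Compare in constant time so response timing does not leak the key.
        import hmac
        token_ok = auth_header.startswith("Bearer ") and hmac.compare_digest(
            auth_header[7:].encode(), config.KB_MCP_API_KEY.encode()
        )
        if token_ok: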
            return await call_next(request)
        return JSONResponse(
            status_code=401,
            content={"error": "Unauthorized"},
        )
 # ---------------------------------------------------------------------------
 # ASGI app assembly
 # ---------------------------------------------------------------------------
 def create_app():
    """Create the ASGI app with auth middleware wrapping the MCP server."""
    from contextlib import asynccontextmanager
    mcp_app = mcp.streamable_http_app()
    @asynccontextmanager
    async def lifespan(app):
        uploads.start_cleanup_task()
        logger.info("Upload cleanup task started")
        # Delegate to the MCP app's lifespan if it has one
        if hasattr(mcp_app, 'router') and hasattr(mcp_app.router, 'lifespan_context'):
            async with mcp_app.router.lifespan_context(app):
                yield
        else:
            yield
    app = Starlette(
        routes=[Mount("/", app=mcp_app)],
        middleware=[Middleware(BearerAuthMiddleware)],
        lifespan=lifespan,
    )
    return app
 # ---------------------------------------------------------------------------
 # Entry point
 # ---------------------------------------------------------------------------
 if __name__ == "__main__":
    import uvicorn
    logger.info(
        "Starting kb MCP server on port %d, engine=%s",
        config.KB_MCP_PORT,
        config.KB_ENGINE_URL,
    )
    app = create_app()
    uvicorn.run(app, host="0.0.0.0", port=config.KB_MCP_PORT)
@@ -0,0 +1,96 @@
 """Chunked upload staging management."""
 import asyncio
 import base64
 import logging
 import shutil
 import tempfile
 import time
 import uuid
 from dataclasses import dataclass, field
 from pathlib import Path
 logger = logging.getLogger("kb.mcp.uploads")
 UPLOAD_TIMEOUT_SECONDS = 600  # 10 minutes
@dataclass
 class StagedUpload:
    upload_id: str
    filename: str
    total_size: int
    tags: list[str]
    staging_dir: Path
    created_at: float = field(default_factory=time.time)
    chunks: dict[int, Path] = field(default_factory=dict)
 _uploads: dict[str, StagedUpload] = {}
 _cleanup_task: asyncio.Task | None = None
 def start_upload(filename: str, total_size: int, tags: list[str]) -> str:
    upload_id = str(uuid.uuid4())
    staging_dir = Path(tempfile.mkdtemp(prefix=f"kb_upload_{upload_id[:8]}_"))
    _uploads[upload_id] = StagedUpload(
        upload_id=upload_id,
        filename=filename,
        total_size=total_size,
        tags=tags,
        staging_dir=staging_dir,
    )
    logger.info("Started upload %s for %s (%d bytes)", upload_id, filename, total_size)
    return upload_id
 def add_chunk(upload_id: str, data_b64: str, chunk_index: int) -> None:
    upload = _uploads.get(upload_id)
    if upload is None:
        raise KeyError(f"Upload ID not found: {upload_id}")
    chunk_bytes = base64.b64decode(data_b64)
    chunk_path = upload.staging_dir / f"chunk_{chunk_index:06d}"
    chunk_path.write_bytes(chunk_bytes)
    upload.chunks[chunk_index] = chunk_path
    logger.info("Added chunk %d to upload %s (%d bytes)", chunk_index, upload_id, len(chunk_bytes))
 def finish_upload(upload_id: str) -> tuple[str, bytes, list[str]]:
    """Reassemble chunks and return (filename, file_bytes, tags)."""
    upload = _uploads.get(upload_id)
    if upload is None:
        raise KeyError(f"Upload ID not found: {upload_id}")
    try:
        parts = []
        for idx in sorted(upload.chunks.keys()):
            parts.append(upload.chunks[idx].read_bytes())
        file_bytes = b"".join(parts)
        return upload.filename, file_bytes, upload.tags
    finally:
        _cleanup_upload(upload_id)
 def _cleanup_upload(upload_id: str) -> None:
    upload = _uploads.pop(upload_id, None)
    if upload and upload.staging_dir.exists():
        shutil.rmtree(upload.staging_dir, ignore_errors=True)
 async def cleanup_abandoned_uploads() -> None:
    """Background task that removes uploads older than the timeout."""
    while True:
        await asyncio.sleep(60)
        now = time.time()
        expired = [
            uid for uid, u in _uploads.items()
            if now - u.created_at > UPLOAD_TIMEOUT_SECONDS
        ]
        for uid in expired:
            logger.warning("Cleaning up abandoned upload %s", uid)
            _cleanup_upload(uid)
 def start_cleanup_task() -> None:
    global _cleanup_task
    if _cleanup_task is None or _cleanup_task.done():
        _cleanup_task = asyncio.create_task(cleanup_abandoned_uploads())
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-27
@@ -0,0 +1,84 @@
 ## Context
 Currently, uploaded files pass through a staging directory and are deleted after the worker extracts chunks and embeddings. The `documents.source_path` column stores the (now-stale) staging path. Users who want the original file must re-source it externally. The data directory structure today is:
 ```
 /data/
  kb.db
  hf_cache/
  staging/      # temporary, cleaned after processing
 ```
 ## Goals / Non-Goals
 **Goals:**
 - Persist every successfully-ingested original file for the lifetime of the document
 - Serve the original file via API (`GET /api/v1/documents/{id}/file`)
 - Clean up stored files when a document is deleted
 - Work transparently with the existing Docker volume mount (`/data`)
 **Non-Goals:**
 - Serving transformed/converted versions of documents (e.g. PDF→HTML)
 - De-duplicating file storage (same content hash = same row, so 1:1 is fine)
 - Compression or archival of stored files
 - Retroactive storage of files ingested before this change (they're already gone)
 ## Decisions
 ### 1. Storage layout: content-hash-based flat directory
 Store files at `{data_dir}/documents/{content_hash}{ext}` (e.g. `documents/a1b2c3...d4.pdf`).
 **Why over document-ID naming:** Content hash is available at staging time before the DB row exists, avoids race conditions, and makes dedup trivially safe (same hash = same file, overwrite is harmless). The hash is already computed for dedup checks.
 **Why flat over nested:** The KB is a personal tool — expected scale is hundreds to low-thousands of documents. A flat directory is simpler and sufficient. If needed later, a `ab/cd/` prefix scheme is easy to add.
 **Alternatives considered:**
 - *Store in SQLite as BLOBs*: Bloats the DB, complicates backups, and degrades WAL performance for large files. Rejected.
 - *Keep the staging path as-is*: Staging uses UUID prefixes which are meaningless; content-hash naming is deterministic and self-deduplicating.
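
A minimal sketch of the naming scheme in this decision, assuming the content hash is a SHA-256 hex digest (as the spec for this change states) and that the extension is taken verbatim from the upload filename. The helper name `stored_path_for` is hypothetical, not the engine's actual function.

```python
import hashlib
from pathlib import Path


def stored_path_for(data_dir: Path, content: bytes, original_filename: str) -> Path:
    """Return {data_dir}/documents/{sha256 hex}{ext} for an uploaded file.

    Hypothetical helper: the real engine may compute the hash elsewhere.
    """
    content_hash = hashlib.sha256(content).hexdigest()
    ext = Path(original_filename).suffix  # includes the leading dot, e.g. ".pdf"
    return data_dir / "documents" / f"{content_hash}{ext}"
```

Because the path depends only on the bytes and the extension, two uploads of the same content under different names with the same extension resolve to the same file, which is why overwriting is harmless.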
 ### 2. Move file from staging to documents dir (not copy)
 Use `shutil.move()` from staging to documents dir after successful ingestion, before `staging.cleanup()`. This avoids doubling disk usage during processing.
 **Why not copy-then-delete:** Move is atomic on the same filesystem (which `/data/staging` and `/data/documents` share). Faster, no temporary disk spike.
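
The move step could look roughly like the sketch below. `persist_staged_file` and its parameters are hypothetical names chosen for illustration; only `shutil.move()` comes from the decision above.

```python
import shutil
from pathlib import Path


def persist_staged_file(staged: Path, documents_dir: Path, content_hash: str) -> Path:
    """Move a staged upload into the documents dir and return its new path.

    Hypothetical helper. shutil.move() renames in place when source and
    destination are on the same filesystem, so no second copy is created.
    """
    documents_dir.mkdir(parents=True, exist_ok=True)
    dest = documents_dir / f"{content_hash}{staged.suffix}"
    shutil.move(str(staged), str(dest))
    return dest
```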
 ### 3. New columns `stored_path` and `original_filename` on `documents` table
 Add two nullable columns:
 - `stored_path TEXT` — permanent file location on disk
 - `original_filename TEXT` — the exact filename from the upload (e.g. `report.pdf`)
 Both are nullable because existing documents (ingested before this change) won't have values.
 **Why `original_filename` separate from `title`:** The `title` field can be user-overridden (e.g. "Engine Manual" instead of `report.pdf`). When serving the file for download, the `Content-Disposition` header should use the original filename so the downloaded file has the correct name and extension. The `original_filename` is sourced from `jobs.filename` which is already captured at upload time.
 Keep `source_path` as-is for backward compatibility (it records what the staging path was). `stored_path` is the permanent location.
 **Migration:** Two `ALTER TABLE` statements — safe additive migrations, no data rewrite needed.
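
One way to make the additive migration idempotent is to check `PRAGMA table_info` before each `ALTER TABLE`. This is a sketch under that assumption; the function name `migrate_documents_table` is hypothetical and the real `init_schema()` may be structured differently.

```python
import sqlite3

# Columns introduced by this change; both are nullable TEXT.
NEW_COLUMNS = {"stored_path": "TEXT", "original_filename": "TEXT"}


def migrate_documents_table(conn: sqlite3.Connection) -> list[str]:
    """Add any missing columns to `documents` and return the names added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE documents ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Running it a second time adds nothing, so it is safe to call on every startup.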
 ### 4. File download endpoint returns the file directly
 `GET /api/v1/documents/{id}/file` uses FastAPI's `FileResponse` with:
 - `media_type` derived from the file extension
 - `Content-Disposition: attachment; filename="{original_filename}"` (falls back to `{title}{ext}` if `original_filename` is NULL)
 - Returns 404 if `stored_path` is NULL or file is missing from disk
 ### 5. Delete cascades to file removal
 When `DELETE /api/v1/documents/{id}` is called, delete the stored file from disk after the DB delete succeeds. If file removal fails (already gone, permissions), log a warning but don't fail the API call — the DB is the source of truth.
 ## Risks / Trade-offs
 - **Disk usage increases** — every ingested file persists. For the personal-use scale this is expected and acceptable. Users manage this via document deletion.
  → Mitigation: Document the storage behavior; `GET /api/v1/status` already shows DB size, could add documents-dir size later.
 - **Pre-existing documents have no stored file** — `stored_path` will be NULL for documents ingested before this change.
  → Mitigation: The download endpoint returns 404 with a clear message ("original file not available — ingested before document storage was enabled"). No attempt to backfill.
 - **File-DB consistency** — crash between DB commit and file move could leave orphan staged files or missing stored files.
  → Mitigation: Move file first, then commit DB. If DB commit fails, the file in documents dir is harmless (orphan cleanup can be added later). If move fails, the job fails and staged file remains for retry.
 ## Open Questions
 None — the scope is straightforward enough to proceed.
@@ -0,0 +1,30 @@
 ## Why
 The knowledge base currently discards original files after chunking and embedding. Once a document is ingested, only the extracted text chunks and vectors remain — the original PDF, markdown, or code file is deleted from staging. Users cannot retrieve the source document from the KB, which limits its usefulness as a document store and prevents use cases like re-processing with a different model or serving the original file to downstream tools.
 ## What Changes
 - Add a persistent document storage directory (`{data_dir}/documents/`) alongside the SQLite database
 - After successful ingestion, move the original file from staging to permanent storage instead of deleting it
 - Store the permanent file path in the `documents` table (`stored_path` column) and the original upload filename (`original_filename` column) so downloads use the correct name
 - Add an API endpoint to download the original file by document ID
 - Add a CLI command to export/retrieve the original document
 - **BREAKING**: Delete document now also removes the stored file from disk
 - Notes (text-only) are stored as `.note` files in the same directory for consistency
 ## Capabilities
 ### New Capabilities
 - `document-storage`: Persistent storage of original uploaded files on disk, lifecycle management (store on ingest, delete on document removal), and retrieval via API
 ### Modified Capabilities
 - `engine-api`: New endpoint `GET /api/v1/documents/{id}/file` to download the original file; delete endpoint must also clean up stored files; ingestion worker stores files instead of discarding them
 ## Impact
 - **Engine config**: New `documents_dir` property on Config, new directory created at startup via `ensure_dirs()`
 - **Worker**: After successful ingestion, move the file from staging to the documents dir and record the permanent location in `stored_path` (`source_path` is left unchanged)
 - **Database schema**: Add `stored_path` and `original_filename` columns to `documents` table (migration for existing DBs)
 - **Routes**: New file-download endpoint; update delete handler to remove stored file
 - **Go client**: New `export` / `get-file` subcommand to download original documents
 - **Docker**: `documents/` directory lives inside the existing `/data` volume — no new mounts needed
@@ -0,0 +1,83 @@
 ## ADDED Requirements
 ### Requirement: Persistent original file storage
 The engine SHALL persistently store the original uploaded file on disk after successful ingestion. Files SHALL be stored at `{data_dir}/documents/{content_hash}{extension}` where `content_hash` is the SHA-256 hex digest already computed for dedup and `extension` is preserved from the original filename. The `documents` table SHALL record the stored file path in a `stored_path` column and the original upload filename in an `original_filename` column.
 #### Scenario: File stored after successful ingestion
 - **WHEN** the background worker successfully processes an ingestion job for a PDF file
 - **THEN** the worker SHALL move the staged file to `{data_dir}/documents/{content_hash}.pdf`, store the permanent path in `documents.stored_path`, store the original filename in `documents.original_filename`, and delete the staging entry
 #### Scenario: Note stored after successful ingestion
 - **WHEN** the background worker successfully processes an ingestion job for a text note
 - **THEN** the worker SHALL move the staged `.note` file to `{data_dir}/documents/{content_hash}.note` and store the permanent path in `documents.stored_path`
 #### Scenario: Markdown file stored after successful ingestion
 - **WHEN** the background worker successfully processes an ingestion job for a markdown file
 - **THEN** the worker SHALL move the staged file to `{data_dir}/documents/{content_hash}.md` and store the permanent path in `documents.stored_path`
 #### Scenario: Code file stored after successful ingestion
 - **WHEN** the background worker successfully processes an ingestion job for a code file (e.g. `.py`, `.go`)
 - **THEN** the worker SHALL move the staged file to `{data_dir}/documents/{content_hash}{original_extension}` and store the permanent path in `documents.stored_path`
 #### Scenario: Documents directory created at startup
 - **WHEN** the engine starts up and calls `ensure_dirs()`
 - **THEN** the `{data_dir}/documents/` directory SHALL be created if it does not exist
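
For illustration, the `documents_dir` property and `ensure_dirs()` behaviour could be sketched as follows. This is a simplified stand-in for the engine's `Config`, assuming it only needs a data directory; the real class reads its values from environment variables.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    """Simplified stand-in for the engine config (illustrative only)."""

    data_dir: Path = Path("/data")

    @property
    def staging_dir(self) -> Path:
        return self.data_dir / "staging"

    @property
    def documents_dir(self) -> Path:
        return self.data_dir / "documents"

    def ensure_dirs(self) -> None:
        # exist_ok makes repeated startups a no-op.
        for directory in (self.staging_dir, self.documents_dir):
            directory.mkdir(parents=True, exist_ok=True)
```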
 #### Scenario: Ingestion failure does not store file
 - **WHEN** the background worker fails to process an ingestion job
 - **THEN** the staged file SHALL be cleaned up as before and no file SHALL be written to the documents directory
 ---
 ### Requirement: File retrieval via API
 The engine SHALL serve the original stored file for any document that has a stored file on disk.
 #### Scenario: Download original file
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file` for a document with a stored file
 - **THEN** the engine SHALL return the file with appropriate `Content-Type` based on file extension and `Content-Disposition: attachment; filename="{original_filename}"` header, falling back to `{title}{ext}` if `original_filename` is NULL
 #### Scenario: Download file for pre-existing document
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file` for a document ingested before this feature was added (stored_path is NULL)
 - **THEN** the engine SHALL return HTTP 404 with `{"error": "Original file not available - ingested before document storage was enabled"}`
 #### Scenario: Download file when file missing from disk
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file` for a document whose `stored_path` is set but the file no longer exists on disk
 - **THEN** the engine SHALL return HTTP 404 with `{"error": "Stored file not found on disk"}`
 #### Scenario: Download file for non-existent document
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file` with a non-existent document ID
 - **THEN** the engine SHALL return HTTP 404 with `{"error": "Document not found"}`
 ---
 ### Requirement: File cleanup on document deletion
 The engine SHALL remove the stored original file from disk when a document is deleted.
 #### Scenario: Delete document with stored file
 - **WHEN** a client sends `DELETE /api/v1/documents/{id}` for a document with a stored file
 - **THEN** the engine SHALL delete the document from the database (cascading to chunks, embeddings, tags) AND delete the stored file from disk
 #### Scenario: Delete document when stored file already missing
 - **WHEN** a client sends `DELETE /api/v1/documents/{id}` for a document whose stored file has been manually removed from disk
 - **THEN** the engine SHALL delete the document from the database successfully and log a warning about the missing file
 #### Scenario: Delete document without stored file (pre-existing)
 - **WHEN** a client sends `DELETE /api/v1/documents/{id}` for a document with `stored_path` NULL
 - **THEN** the engine SHALL delete the document from the database without attempting file removal
 ---
 ### Requirement: Database schema migration for stored_path and original_filename
 The engine SHALL add `stored_path` and `original_filename` columns to the `documents` table for tracking permanent file locations and original upload filenames.
 #### Scenario: Fresh database initialization
 - **WHEN** the engine initializes a new database
 - **THEN** the `documents` table SHALL include `stored_path TEXT` and `original_filename TEXT` columns in its schema
 #### Scenario: Existing database migration
 - **WHEN** the engine starts with a database created before this feature
 - **THEN** the engine SHALL add `stored_path TEXT` and `original_filename TEXT` to the `documents` table via `ALTER TABLE` if the columns do not exist
@@ -0,0 +1,61 @@
 ## MODIFIED Requirements
 ### Requirement: Background ingestion worker
 The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), generate embeddings using the resident model, insert chunks and vectors into the database, and move the original file to persistent storage.
 #### Scenario: Successful PDF ingestion
 - **WHEN** the background worker picks up a queued PDF job
 - **THEN** it SHALL update the job status to `processing`, run Docling conversion and chunking, embed all chunks, insert document and chunks into the database, move the staged file to `{data_dir}/documents/{content_hash}.pdf`, update `documents.stored_path` with the permanent path, store the original filename in `documents.original_filename`, update the job status to `done` with the resulting document_id and chunk count, and clean up the staging entry
 #### Scenario: Ingestion failure
 - **WHEN** the background worker encounters an error during processing (e.g., corrupt PDF)
 - **THEN** it SHALL update the job status to `failed` with the error message, delete the staged file, and continue processing the next queued job
 #### Scenario: Search during active ingestion
 - **WHEN** a search request arrives while the background worker is processing a job
 - **THEN** the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
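
The non-blocking read behaviour can be demonstrated with plain `sqlite3`. This is a sketch, not the engine's connection code: `open_wal` is a hypothetical helper and the single-column `documents` table is a toy schema.

```python
import sqlite3
from pathlib import Path


def open_wal(db_path: Path) -> sqlite3.Connection:
    """Open a connection and switch the database to WAL journaling."""
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn
```

WAL mode is stored in the database file, so a second connection opened with a plain `sqlite3.connect()` reads committed rows while the writer still has an open transaction.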
 ---
 ### Requirement: Document management
 The engine SHALL provide endpoints to list, inspect, remove, and download original files for ingested documents.
 #### Scenario: List documents
 - **WHEN** a client sends `GET /api/v1/documents`
 - **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, and created_at
 #### Scenario: List documents with filters
 - **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
 - **THEN** the engine SHALL return only documents matching all specified filters
 #### Scenario: Get document details
 - **WHEN** a client sends `GET /api/v1/documents/{id}`
 - **THEN** the engine SHALL return the full document record including all chunks, their text content, and whether the original file is available (`has_file: true/false`)
 #### Scenario: Download original file
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file`
 - **THEN** the engine SHALL return the original file with appropriate Content-Type and `Content-Disposition: attachment; filename="{original_filename}"` headers, or HTTP 404 if the file is not available
 #### Scenario: Remove a document
 - **WHEN** a client sends `DELETE /api/v1/documents/{id}`
 - **THEN** the engine SHALL delete the document, all its chunks, associated embeddings, tag associations, and the stored original file from disk, and return HTTP 200 with a confirmation
 #### Scenario: Remove non-existent document
 - **WHEN** a client sends `DELETE /api/v1/documents/{id}` with a non-existent ID
 - **THEN** the engine SHALL return HTTP 404
 ---
 ### Requirement: Engine configuration via environment variables
 The engine SHALL be configured via environment variables. No config file is read by the engine — all configuration comes from the environment (set via compose.yaml or Docker run).
 #### Scenario: Default configuration
 - **WHEN** the engine starts with no environment variables set
 - **THEN** it SHALL use defaults: data directory `/data`, model `all-MiniLM-L6-v2`, device `auto`, no API key required. It SHALL create `staging/` and `documents/` subdirectories under the data directory.
 #### Scenario: Custom model
 - **WHEN** `KB_MODEL` is set to `BAAI/bge-small-en-v1.5`
 - **THEN** the engine SHALL download and load that model instead of the default
@@ -0,0 +1,38 @@
 ## 1. Config and Schema
 - [x] 1.1 Add `documents_dir` property to `Config` in `engine/kb/config.py` returning `{data_dir}/documents`
 - [x] 1.2 Add `documents_dir.mkdir()` to `Config.ensure_dirs()`
 - [x] 1.3 Add `stored_path TEXT` and `original_filename TEXT` columns to `documents` table in `init_schema()` (both CREATE TABLE and ALTER TABLE migration for existing DBs)
 ## 2. Worker — File Persistence
 - [x] 2.1 In `worker._process_job()`, after successful DB commit, move staged file to `{documents_dir}/{content_hash}{ext}` using `shutil.move()`
 - [x] 2.2 Update `documents.stored_path` and `documents.original_filename` (from `jobs.filename`) after moving the file
 - [x] 2.3 Remove `staging.cleanup()` call for successful jobs (file is moved, not deleted); keep cleanup on failure path
 ## 3. API — File Download Endpoint
 - [x] 3.1 Add `GET /api/v1/documents/{id}/file` route in `engine/kb/routes/documents.py` using FastAPI `FileResponse`
 - [x] 3.2 Return appropriate `Content-Type` from file extension and `Content-Disposition: attachment; filename="{original_filename}"` (fall back to `{title}{ext}` if NULL)
 - [x] 3.3 Handle 404 cases: document not found, `stored_path` is NULL, file missing from disk
 ## 4. API — Delete Cleanup
 - [x] 4.1 Update `DELETE /api/v1/documents/{id}` in `engine/kb/routes/documents.py` to also delete the stored file from disk
 - [x] 4.2 Handle missing file gracefully (log warning, don't fail the request)
 ## 5. Document Details Enhancement
 - [x] 5.1 Add `has_file` boolean to `GET /api/v1/documents/{id}` response based on `stored_path` presence and file existence on disk
 ## 6. Go Client
 - [x] 6.1 Add `kb export <doc_id>` subcommand to the Go client that calls `GET /api/v1/documents/{id}/file` and writes to stdout or a specified output path
 ## 7. Testing
 - [x] 7.1 Test successful ingestion stores file at expected path
 - [x] 7.2 Test failed ingestion does not leave file in documents dir
 - [x] 7.3 Test file download endpoint returns correct content and headers
 - [x] 7.4 Test document deletion removes stored file
 - [x] 7.5 Test download returns 404 for documents without stored files
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-29
@@ -0,0 +1,23 @@
 ## Context
 The engine's `POST /api/v1/reindex` re-embeds all chunks synchronously and returns `{"chunks_reindexed": N, "model": "..."}`. The client has an established confirmation pattern in `remove.go` using `--yes`/`-y` flag.
 ## Goals / Non-Goals
 **Goals:**
 - Add `kb reindex` with confirmation prompt matching `kb remove` pattern
 - Display human-readable and JSON output
 **Non-Goals:**
 - Progress reporting during reindex (engine returns synchronously)
 - Model selection from the client (model is engine-side config)
 ## Decisions
 ### 1. Confirmation prompt before reindex
 Reindex drops and rebuilds the vector table — destructive if interrupted. Use the same `[y/N]` prompt pattern as `kb remove`, skippable with `--yes`/`-y`.
 ### 2. Warn that it may take a while
 The prompt should mention that reindex re-embeds all chunks, so the user knows it's not instant.
@@ -0,0 +1,22 @@
 ## Why
 The engine exposes `POST /api/v1/reindex` but there's no client command for it. Users switching embedding models must use curl directly. Adding `kb reindex` with a confirmation prompt keeps it consistent with other destructive commands like `kb remove`.
 ## What Changes
 - Add `kb reindex` command to the Go client with confirmation prompt (skip with `--yes`/`-y`)
 - Display reindex results (chunks reindexed, model used)
 ## Capabilities
 ### New Capabilities
 (none)
 ### Modified Capabilities
 - `go-client`: Add reindex command requirement
 ## Impact
 - New file: `client/cmd/reindex.go`
@@ -0,0 +1,25 @@
 ## ADDED Requirements
 ### Requirement: Reindex command
 The client SHALL provide a `kb reindex` command that triggers re-embedding of all chunks on the engine. The command SHALL prompt for confirmation before proceeding.
 #### Scenario: Reindex with confirmation
 - **WHEN** the user runs `kb reindex`
 - **THEN** the client SHALL display a warning that all chunks will be re-embedded and prompt `Reindex all chunks? This will re-embed everything. [y/N]`. If confirmed, it SHALL POST to `/api/v1/reindex` and display the result.
 #### Scenario: Reindex with skip confirmation
 - **WHEN** the user runs `kb reindex --yes`
 - **THEN** the client SHALL skip the confirmation prompt and POST to `/api/v1/reindex` immediately
 #### Scenario: Reindex cancelled
 - **WHEN** the user runs `kb reindex` and responds with anything other than `y` or `yes`
 - **THEN** the client SHALL print `Cancelled.` and exit with code 0
 #### Scenario: Reindex human output
 - **WHEN** the reindex completes successfully with default format
 - **THEN** the client SHALL print `Reindexed N chunks (model: <model_name>)`
 #### Scenario: Reindex JSON output
 - **WHEN** the user runs `kb reindex --yes --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine
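
The confirmation and output rules in the scenarios above can be summarised in a short Python sketch. The actual client is written in Go, so this only illustrates the logic; the function names are hypothetical, and treating the answer case-insensitively is an assumption the spec does not state.

```python
import json


def is_confirmed(answer: str) -> bool:
    """True only for 'y' or 'yes'; anything else cancels.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    return answer.strip().lower() in {"y", "yes"}


def format_reindex_result(response_body: str, fmt: str = "human") -> str:
    """Render the engine's reindex response for the chosen output format."""
    if fmt == "json":
        return response_body  # raw JSON passthrough
    data = json.loads(response_body)
    return f"Reindexed {data['chunks_reindexed']} chunks (model: {data['model']})"
```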
@@ -0,0 +1,5 @@
 ## 1. Implementation
 - [x] 1.1 Create `client/cmd/reindex.go` with `kb reindex` command, `--yes`/`-y` flag, confirmation prompt matching `remove.go` pattern
 - [x] 1.2 POST to `/api/v1/reindex`, handle human output (`Reindexed N chunks (model: ...)`) and JSON output
 - [x] 1.3 Verify build compiles and command appears in `kb --help`
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-29
@@ -0,0 +1,43 @@
 ## Context
 The `add` command currently handles both file uploads and notes via a `--note` string flag. This creates confusing flag parsing and a muddled help screen. The engine already auto-detects file type from extension (`detector.py`) and rejects unsupported ones, so the client's `--type` flag is redundant.
 ## Goals / Non-Goals
 **Goals:**
 - `kb "my note"` as the sole note entry path (replaces `kb add --note`)
 - `kb addfile <path>` as a file-only upload command (replaces `kb add`)
 - Client-side extension validation before uploading
 - Clean, unambiguous help text for both paths
 **Non-Goals:**
 - Engine changes — type detection stays server-side
 - Backward compatibility shim for `kb add` — clean break
 - Client-side MIME type detection — extension check is sufficient
 ## Decisions
 ### Rename add → addfile, strip note/type flags
 Rename the cobra command from `add` to `addfile`. Remove `--note`, `--title`, and `--type` flags. Keep `--tags`, `--recursive`. The command becomes purely about file uploads.
 **Why not keep `add` as an alias?** Clean break is simpler. The old form was confusing — better to force a quick migration than maintain two paths.
 ### Extension validation on single file uploads
 The `supportedExts` map already gates recursive walks. Apply the same check to single file uploads — reject with a clear error listing supported extensions. This gives instant feedback instead of a round-trip to the engine.
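
A Python sketch of the same check, for illustration only: the client is Go, and the extension set below is a hypothetical subset, not the contents of the real `supportedExts` map.

```python
from pathlib import Path

# Hypothetical subset; the real list lives in the Go client's supportedExts map.
SUPPORTED_EXTS = frozenset({".pdf", ".md", ".py", ".go"})


def validate_extension(path: str, supported: frozenset[str] = SUPPORTED_EXTS) -> None:
    """Raise ValueError listing supported extensions if `path` is not uploadable."""
    ext = Path(path).suffix.lower()
    if ext not in supported:
        shown = ext or "(none)"
        allowed = ", ".join(sorted(supported))
        raise ValueError(
            f"unsupported file extension {shown}: supported extensions are {allowed}"
        )
```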
 ### Root command RunE for note shorthand
 Use cobra's `Args: cobra.ArbitraryArgs` and `RunE` on the root command. When args are present and no subcommand matched, join all args into a single note string and submit. `--tags` flag on root for tagging notes. No `--title` — keep it minimal.
 **Why join all args?** `kb remember to update dns` (unquoted) should work the same as `kb "remember to update dns"`.
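
The routing rule can be sketched in Python as below. The real implementation relies on cobra's own subcommand matching; `route` and the `SUBCOMMANDS` set are hypothetical and list only commands named elsewhere in these specs.

```python
# Hypothetical set of known subcommands, for illustration only.
SUBCOMMANDS = frozenset({"addfile", "reindex", "remove", "export"})


def route(args: list[str]) -> tuple[str, str]:
    """Classify CLI args as help, a subcommand, or an implicit note."""
    if not args:
        return ("help", "")
    if args[0] in SUBCOMMANDS:
        return ("subcommand", args[0])
    # No subcommand matched: every argument becomes part of one note.
    return ("note", " ".join(args))
```

Under this rule a note whose first word happens to be a subcommand name would be routed to that subcommand, so such notes need to be quoted as a single argument.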
 ### Reuse note submission logic via shared helper
 Extract `submitNote` from the current `runAdd` so both the root command and any future callers use the same POST + duplicate-handling + output logic.
 ## Risks / Trade-offs
 - **Breaking change** → Anyone with `kb add` in scripts needs to update to `kb addfile`. Acceptable for a personal tool.
 - **No `--type` override** → If a user ever needs to force a type, they'd have to go through the engine API directly. Low risk since the engine's auto-detection covers all supported formats.
@@ -0,0 +1,34 @@
 ## Why
 Adding a note requires `kb add --note "my note"` — too much ceremony for what should be instant. The `--note` flag taking a string value also creates confusing flag parsing (e.g. `kb add --note --tags foo` parses `--tags` as the note value). Meanwhile, `kb add` tries to do two things (files and notes) which muddies its help text and UX.
 Splitting these into distinct paths makes the CLI clearer:
 - **Notes**: `kb "my note"` — zero-friction, no subcommand needed
 - **Files**: `kb addfile report.pdf` — explicit, file-only command
 ## What Changes
 - **Add `kb "text"` shorthand**: bare string arguments without a subcommand are treated as notes, submitted via `POST /api/v1/jobs`
 - **Rename `add` → `addfile`**: the command becomes file-only, no more `--note`/`--title` flags
 - **Drop `--type` flag**: the engine already auto-detects type from file extension (`detector.py`); the client doesn't need to override this
 - **Add client-side extension validation**: reject unsupported file extensions with a clear error before uploading, using the same extension set as recursive directory walks
 - **Update README**: document the new shorthand and renamed command
 - **BREAKING**: `kb add` no longer exists; `kb add --note` no longer exists
 ## Capabilities
 ### New Capabilities
 _(none)_
 ### Modified Capabilities
 - `go-client`: Rename `add` to `addfile`, remove `--note`/`--title`/`--type` flags, add extension validation for single file uploads, add implicit note shorthand on root command
 ## Impact
 - `client/cmd/add.go` → renamed/refactored to `addfile` command, stripped of note logic, added extension check
 - `client/cmd/root.go` — bare args handling + `--tags` flag for note shorthand
 - `README.md` — updated usage examples
 - No engine changes — engine already detects type from extension and rejects unsupported files
 - Breaking change for any scripts using `kb add` or `kb add --note`
@@ -0,0 +1,95 @@
 ## ADDED Requirements
 ### Requirement: Implicit note shorthand
 The client SHALL treat bare string arguments (with no subcommand) as an implicit note. `kb "my note"` SHALL behave identically to submitting a note via `POST /api/v1/jobs`. All persistent flags (`--format`, `--engine`, `--api-key`) and the root `--tags` flag SHALL work with the shorthand form.
 #### Scenario: Quick note via bare argument
 - **WHEN** the user runs `kb "remember to update DNS"`
 - **THEN** the client SHALL submit the text as a note via `POST /api/v1/jobs` and print `Queued: note`
 #### Scenario: Bare argument with tags
 - **WHEN** the user runs `kb "server room is building 3" --tags ops`
 - **THEN** the client SHALL submit the note with the specified tags
 #### Scenario: Bare argument with JSON output
 - **WHEN** the user runs `kb "my note" --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine
 #### Scenario: Bare argument duplicate detection
 - **WHEN** the user runs `kb "my note"` and the engine returns HTTP 409
 - **THEN** the client SHALL handle the duplicate response identically to the previous `kb add --note` behaviour
 #### Scenario: Multiple unquoted words
 - **WHEN** the user runs `kb remember to update dns` (without quotes)
 - **THEN** the client SHALL join all arguments into a single note string and submit it
 #### Scenario: No interference with subcommands
 - **WHEN** the user runs `kb search "query"` or any other existing subcommand
 - **THEN** the client SHALL route to the subcommand as before — the implicit note shorthand SHALL NOT interfere
 #### Scenario: No arguments
 - **WHEN** the user runs `kb` with no arguments
 - **THEN** the client SHALL display the help text
 ---
 ## MODIFIED Requirements
 ### Requirement: Add command (file and note ingestion)
 The client SHALL provide a `kb addfile` command that uploads files to the engine for async ingestion. The command SHALL validate file extensions before uploading and reject unsupported types. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document information. The command SHALL NOT handle notes — notes are submitted via the implicit note shorthand (`kb "text"`).
 #### Scenario: Add a single file
 - **WHEN** the user runs `kb addfile report.pdf`
 - **THEN** the client SHALL validate the file extension, upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
 #### Scenario: Add a file with tags
 - **WHEN** the user runs `kb addfile manual.pdf --tags car,maintenance`
 - **THEN** the client SHALL include the tags in the multipart upload metadata
 #### Scenario: Add a directory recursively
 - **WHEN** the user runs `kb addfile ~/documents/ --recursive`
 - **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
 #### Scenario: Unsupported file extension
 - **WHEN** the user runs `kb addfile photo.jpg`
 - **THEN** the client SHALL print an error listing supported extensions and exit with a non-zero code without making any API call
 #### Scenario: Duplicate file rejected (already ingested)
 - **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "document_id": 42, "title": "report.pdf"}`
 - **THEN** the client SHALL print "Already imported: report.pdf (doc ID: 42)" and exit with code 0
 #### Scenario: Duplicate file rejected (in-flight job)
 - **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "job_id": 7, "title": "report.pdf"}`
 - **THEN** the client SHALL print "Already queued: report.pdf (job ID: 7)" and exit with code 0
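The two duplicate scenarios above differ only in which ID the 409 body carries. A minimal sketch of that branching, assuming the body shapes shown in the scenarios; `duplicateResponse` and `duplicateMessage` are hypothetical names, and the fallback wording is not part of the spec.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// duplicateResponse mirrors the 409 bodies from the scenarios. Pointer
// fields distinguish an absent ID from a zero value.
type duplicateResponse struct {
	Error      string `json:"error"`
	DocumentID *int   `json:"document_id"`
	JobID      *int   `json:"job_id"`
	Title      string `json:"title"`
}

// duplicateMessage turns a 409 body into the human-readable line.
func duplicateMessage(body []byte) (string, error) {
	var r duplicateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode duplicate response: %w", err)
	}
	switch {
	case r.DocumentID != nil:
		return fmt.Sprintf("Already imported: %s (doc ID: %d)", r.Title, *r.DocumentID), nil
	case r.JobID != nil:
		return fmt.Sprintf("Already queued: %s (job ID: %d)", r.Title, *r.JobID), nil
	default:
		// Assumption: neither ID present. This wording is illustrative only.
		return fmt.Sprintf("Duplicate: %s", r.Title), nil
	}
}

func main() {
	for _, body := range []string{
		`{"error": "duplicate", "document_id": 42, "title": "report.pdf"}`,
		`{"error": "duplicate", "job_id": 7, "title": "report.pdf"}`,
	} {
		msg, err := duplicateMessage([]byte(body))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```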
 #### Scenario: Duplicate file in recursive add
 - **WHEN** the user runs `kb addfile ~/documents/ --recursive` and some files are rejected as duplicates
 - **THEN** the client SHALL print the duplicate message for each rejected file, continue uploading remaining files, and include a summary (e.g., "Queued: 5 files, 2 duplicates skipped")
 #### Scenario: Duplicate with JSON output
 - **WHEN** the user runs `kb addfile report.pdf --format json` and the engine returns HTTP 409
 - **THEN** the client SHALL output the raw JSON response from the engine including the document_id and title
 #### Scenario: Add with JSON output
 - **WHEN** the user runs `kb addfile report.pdf --format json`
 - **THEN** the client SHALL output the JSON response from the engine including the job_id
 #### Scenario: File not found
 - **WHEN** the user runs `kb addfile nonexistent.pdf`
 - **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
 #### Scenario: Upload failure
 - **WHEN** the upload fails (network error, engine returns 4xx/5xx other than 409)
 - **THEN** the client SHALL print the error and exit with a non-zero code
 ## REMOVED Requirements
 ### Requirement: Note ingestion via add command
 **Reason**: Notes are now submitted via the implicit note shorthand (`kb "text"`). The `--note` and `--title` flags on the add command are removed.
 **Migration**: Use `kb "my note"` or `kb "my note" --tags ops` instead of `kb add --note "my note" --tags ops`.
 ### Requirement: Document type override via add command
 **Reason**: The engine auto-detects document type from file extension (`detector.py`). The client `--type` flag is redundant.
 **Migration**: Remove `--type` from scripts. The engine handles type detection automatically.
@@ -0,0 +1,19 @@
 ## 1. Refactor note submission
 - [x] 1.1 Extract note submission logic from `runAdd` into a shared `submitNote` helper (multipart POST, duplicate detection, output formatting)
 ## 2. Root command shorthand
 - [x] 2.1 Add `Args: cobra.ArbitraryArgs` and `RunE` to the root command — join args into a note string, call `submitNote`; show help when no args
 - [x] 2.2 Add `--tags` flag on the root command for note tagging
 ## 3. Rename add → addfile
 - [x] 3.1 Rename command from `add` to `addfile` (`Use: "addfile <path>"`)
 - [x] 3.2 Remove `--note`, `--title`, and `--type` flags from the command
 - [x] 3.3 Add extension validation for single file uploads — reject unsupported extensions with a clear error listing supported types
 ## 4. Documentation and verification
 - [x] 4.1 Update README.md usage section: show `kb "text"` shorthand, rename `add` references to `addfile`
 - [x] 4.2 Verify build compiles, `kb --help` and `kb addfile --help` show expected output
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-28
@@ -0,0 +1,93 @@
 ## Context
 Currently the project uses a single version number shared between client and engine, managed by `release.sh`. Both `client/VERSION` and `engine/VERSION` are always bumped to the same value. A single git tag `vX.Y.Z` is created, and a single Gitea release bundles Go client binaries and Docker engine image references. This means any change to either component forces a full release of both.
 The client is a Go binary distributed as platform-specific downloads. The engine is a Python FastAPI server distributed as Docker images. They communicate over HTTP via `/api/v1/` endpoints. The engine already exposes its version via `GET /api/v1/status` → `{"version": "X.Y.Z", ...}`.
 ## Goals / Non-Goals
 **Goals:**
 - Allow client and engine to have independent version numbers and release cadences
 - Provide a runtime compatibility check so users get a clear error when their client is too new for their engine
 - Split release tooling so each component can be released without touching the other
 **Non-Goals:**
 - API versioning beyond the existing `/api/v1/` path prefix
 - Backward-compatible negotiation or feature detection (client either works or fails)
 - Automatic upgrades or update notifications
 - Version checking in the other direction (engine requiring minimum client)
 ## Decisions
 ### 1. Tag naming: `client-vX.Y.Z` and `engine-vX.Y.Z`
 Prefix-style tags clearly identify which component a release belongs to and sort well in git tag listings.
 **Why over path-style (`client/vX.Y.Z`):** Slashes in git tags can cause issues with some tooling and are less conventional. Prefix-style is simpler and widely used in monorepos.
 **Why over separate repos:** The project is small and tightly coupled at the API level. A monorepo with prefixed tags keeps everything together while allowing independent releases.
 ### 2. Two release scripts: `release-client.sh` and `release-engine.sh`
 Each script handles its own component end-to-end: version bump, build, tag, release, push.
 **Why over a single script with flags:** Two simple scripts are easier to understand and maintain than one script with component-selection logic. Each script is ~100 lines instead of one ~200-line script with branching. The shared logic (version helpers, pre-flight checks) is minimal and acceptable to duplicate.
 **Shared structure for both scripts:**
 1. Pre-flight checks (on main branch, tag doesn't exist)
 2. Version bump (reads/writes component's VERSION file only)
 3. Build artifacts (Go binaries or Docker images)
 4. Commit version bump, create prefixed tag, push
 5. Create Gitea release with assets
 6. (Engine only) Push Docker images
 ### 3. `MinEngineVersion` as a build-time constant in the Go client
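 The client embeds a `MinEngineVersion` string alongside the existing `Version` value. It is declared as a package-level string variable rather than a `const`, because Go's `-X` linker flag can only set variables, and it is set via `-ldflags` at build time, sourced from a `client/MIN_ENGINE_VERSION` file.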
 **Why a separate file over embedding in `VERSION`:** The two values have different lifecycles. `VERSION` changes every release; `MIN_ENGINE_VERSION` changes only when the client starts using a new engine feature. A separate file makes the intent clear.
 **Why ldflags over hardcoding in Go source:** Consistent with how `Version` is already injected. The value lives in a plain text file that's easy to bump manually.
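A sketch of what the injected values could look like on the Go side. It uses package `main` so it runs standalone; in the real client the `-X` symbol path would name the package that declares the variables, and `shouldCheckEngine` is a hypothetical helper name.

```go
package main

import "fmt"

// Both values are package-level string variables so the linker can
// overwrite them, for example:
//
//	go build -ldflags "-X main.Version=2.1.0 -X main.MinEngineVersion=2.0.0"
//
// Without the flags they keep the "dev" default.
var (
	Version          = "dev"
	MinEngineVersion = "dev"
)

// shouldCheckEngine reports whether the compatibility check applies.
// Development builds (empty or "dev") skip it.
func shouldCheckEngine(minVersion string) bool {
	return minVersion != "" && minVersion != "dev"
}

func main() {
	fmt.Printf("client %s, min engine %s, check enabled: %v\n",
		Version, MinEngineVersion, shouldCheckEngine(MinEngineVersion))
}
```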
 ### 4. Compatibility check on first API call via the `Client` struct
 The `api.Client` checks engine compatibility on its first HTTP call by hitting `GET /api/v1/status` and comparing the `version` field against `MinEngineVersion`. The result is cached on the `Client` instance — subsequent calls skip the check.
 **Flow:**
 1. First call to any `Client` method (Get/Post/Delete/Put)
 2. Before the actual request, call `GET /api/v1/status`
 3. Parse `version` from response
 4. Compare against `MinEngineVersion` using semver major.minor.patch comparison
 5. If engine version < min: print error to stderr, `os.Exit(1)`
 6. If check passes: set `versionChecked = true`, proceed with original request
 7. If status endpoint unreachable: proceed with original request (connectivity error will surface on the actual call)
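The flow above can be sketched as follows. This is a stand-in, not the real `api.Client`: it returns an error on mismatch instead of printing to stderr and calling `os.Exit(1)`, so that it can be exercised in-process. The `compatible` hook, the `countStatusHits` helper, and the `/api/v1/demo` path exist only for this demo.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Client is a stripped-down stand-in for api.Client.
type Client struct {
	BaseURL        string
	HTTP           *http.Client
	versionChecked bool
	// compatible is a hypothetical hook standing in for the semver comparison.
	compatible func(engineVersion string) bool
}

// ensureCompatible queries the status endpoint until the check has passed once.
func (c *Client) ensureCompatible() error {
	if c.versionChecked {
		return nil
	}
	resp, err := c.HTTP.Get(c.BaseURL + "/api/v1/status")
	if err != nil {
		return nil // unreachable: the real request will surface the error
	}
	defer resp.Body.Close()

	var status struct {
		Version string `json:"version"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&status); err != nil {
		return nil // assumption: an unreadable status is treated like an unreachable one
	}
	if !c.compatible(status.Version) {
		return fmt.Errorf("connected engine is v%s, which is too old", status.Version)
	}
	c.versionChecked = true
	return nil
}

// Get checks compatibility first, then performs the actual request.
func (c *Client) Get(path string) (*http.Response, error) {
	if err := c.ensureCompatible(); err != nil {
		return nil, err
	}
	return c.HTTP.Get(c.BaseURL + path)
}

// countStatusHits issues n requests against a throwaway test server and
// returns how many times the status endpoint was called.
func countStatusHits(n int) int {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/api/v1/status" {
			atomic.AddInt32(&hits, 1)
		}
		fmt.Fprint(w, `{"version": "2.1.5"}`)
	}))
	defer srv.Close()

	c := &Client{
		BaseURL:    srv.URL,
		HTTP:       srv.Client(),
		compatible: func(string) bool { return true },
	}
	for i := 0; i < n; i++ {
		resp, err := c.Get("/api/v1/demo") // path exists only in this demo
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
	return int(atomic.LoadInt32(&hits))
}

func main() {
	fmt.Println("status calls for 3 requests:", countStatusHits(3))
}
```

Since each `kb` invocation is a fresh process with a fresh `Client`, the cache only saves round-trips for commands that make several API calls in one run.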
 **Why hard fail, no skip flag:** This is a personal tool. If the client needs a newer engine, the user needs to update. A skip flag adds complexity for a scenario where the outcome (broken behavior) is worse than the error.
 **Why check on first API call, not at startup:** The `PersistentPreRunE` in cobra runs before every command, but some future commands might not need the engine (e.g. `kb version`, `kb help`). Checking in the `Client` ensures we only check when actually contacting the engine.
 **Why proceed when status endpoint is unreachable:** If we can't reach `/status`, the actual API call will also fail with a connection error. No point in double-failing. The compatibility check is for version mismatch, not connectivity.
 ### 5. Compose files: use `build:` context, not pinned image tags
 The compose files currently use `build:` directives, not pre-built image references. Users who build locally don't need pinned tags — they're building from source. Users pulling pre-built images will reference the image tag directly in their own compose file or `docker run` command.
 **Decision:** Leave compose files as-is. Release notes for engine releases will include the exact `docker pull` command with the versioned tag.
 ### 6. Semver comparison: major.minor.patch, no pre-release
 Compare versions as three integers. No support for pre-release suffixes (`-rc1`, `-beta`) — the project doesn't use them. If `MinEngineVersion` is `2.1.0` and engine reports `2.1.5`, the check passes. If engine reports `2.0.9`, it fails.
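A minimal sketch of the three-integer comparison, using the two examples from the paragraph above. `parseVersion` and `engineSatisfies` are hypothetical names, and tolerating a leading `v` is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "X.Y.Z" (optionally prefixed with "v") into three
// integers. Pre-release suffixes such as "-rc1" are rejected.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q: want major.minor.patch", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q: bad component %q", s, p)
		}
		out[i] = n
	}
	return out, nil
}

// engineSatisfies reports whether engine >= minimum, comparing major,
// then minor, then patch.
func engineSatisfies(engine, minimum string) (bool, error) {
	e, err := parseVersion(engine)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if e[i] != m[i] {
			return e[i] > m[i], nil
		}
	}
	return true, nil // all three components equal
}

func main() {
	for _, engine := range []string{"2.1.5", "2.0.9"} {
		ok, err := engineSatisfies(engine, "2.1.0")
		fmt.Println(engine, ok, err)
	}
}
```

Comparing integers rather than strings matters once a component reaches two digits: as strings, `2.10.0` would sort below `2.9.0`.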
 ## Risks / Trade-offs
 - **Extra HTTP round-trip on first command** — One additional `GET /api/v1/status` call per client invocation. Negligible for a local-network tool.
  → Mitigation: Cached after first check within the Client instance.
 - **Developer must remember to bump `MIN_ENGINE_VERSION`** — When adding client code that depends on a new engine endpoint/field, the developer must manually update the file.
  → Mitigation: This is a conscious decision point. The file's existence serves as a reminder. Could add a CI check later if needed.
 - **Breaking change to git tag format** — Existing `v2.0.x` tags won't match the new `client-v*` / `engine-v*` convention. Old tags remain in history.
  → Mitigation: No migration needed. Old tags stay as historical artifacts. New convention starts from the first independent release.
 - **Two Gitea releases per coordinated release** — When both components change, two releases are created instead of one.
  → Mitigation: Acceptable trade-off. Each release is self-contained with its own assets and notes.
@@ -0,0 +1,32 @@
 ## Why
 Client and engine are currently locked to the same version number and released together via a single script. This means a client-only bug fix (e.g. output formatting) forces a full engine Docker image rebuild and push, and vice versa. Decoupling versions allows each component to be released independently on its own cadence, while a compatibility check ensures users don't run a client that requires engine features not yet deployed.
 ## What Changes
 - **Separate version files** — `client/VERSION` and `engine/VERSION` may diverge (they already exist as separate files, but are currently always set to the same value)
 - **Split release script** — Replace single `release.sh` with `release-client.sh` (builds Go binaries, tags `client-vX.Y.Z`, creates release) and `release-engine.sh` (builds Docker images, tags `engine-vX.Y.Z`, creates release, pushes images)
 - **Client compatibility check** — Client embeds a `MinEngineVersion` constant (set at build time or in code). On every command that contacts the engine, the client calls `GET /api/v1/status`, compares the engine's reported version against `MinEngineVersion`, and hard-fails with an actionable error if the engine is too old. No skip flag, no warning — just a clear error with upgrade instructions.
 - **Tag naming convention** — `client-vX.Y.Z` and `engine-vX.Y.Z` replace the current `vX.Y.Z` tag format. **BREAKING** — existing tag format changes.
 ## Capabilities
 ### New Capabilities
 (none)
 ### Modified Capabilities
 - `go-client`: Add engine version compatibility check requirement (hard fail if engine version < MinEngineVersion)
 - `engine-api`: Status endpoint already returns `version` — no change needed, but delta spec documents the contract that the version field is required for compatibility checking
 - `docker-deployment`: Compose files keep their `build:` context; release notes document the versioned image tag, and release script changes affect image tagging
 ## Impact
 - `release.sh` — replaced by `release-client.sh` + `release-engine.sh`
 - `client/cmd/root.go` — new `MinEngineVersion` constant
 - `client/internal/api/client.go` — version check on first API call
 - `client/Makefile` — may inject `MinEngineVersion` via ldflags alongside `Version`
 - Git tags — new naming convention (`client-v*`, `engine-v*`)
 - Gitea releases — two separate releases per independent release cycle
 - `engine/compose.nvidia.yaml`, `engine/compose.rocm.yaml`: unchanged, they keep the `build:` context (versioned image tags are documented in the release notes instead)
@@ -0,0 +1,25 @@
 ## MODIFIED Requirements
 ### Requirement: Compose files for deployment
 The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use `build:` context for local development. Release notes SHALL document the versioned image tag for users pulling pre-built images.
 #### Scenario: Start NVIDIA deployment
 - **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
 - **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
 #### Scenario: Start ROCm deployment
 - **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
 - **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
 #### Scenario: Automatic restart
 - **WHEN** the engine process crashes or the host reboots
 - **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
 #### Scenario: Configure via environment
 - **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, etc.)
 - **THEN** the engine SHALL use those values
 #### Scenario: Pre-built image deployment
 - **WHEN** an admin wants to use a pre-built engine image without building from source
 - **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)
@@ -0,0 +1,13 @@
 ## MODIFIED Requirements
 ### Requirement: Engine status and reindex
 The engine SHALL provide status information and support re-embedding all chunks. The `version` field in the status response SHALL always be present and SHALL reflect the engine's release version as read from the `VERSION` file. This field is the contract used by clients for compatibility checking.
 #### Scenario: Get engine status
 - **WHEN** a client sends `GET /api/v1/status`
 - **THEN** the engine SHALL return JSON with `version` (string, from VERSION file), model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
 #### Scenario: Trigger reindex
 - **WHEN** a client sends `POST /api/v1/reindex`
 - **THEN** the engine SHALL re-embed all existing chunks using the currently loaded model and return progress information. This operation SHALL NOT block search queries.
@@ -0,0 +1,45 @@
 ## ADDED Requirements
 ### Requirement: Engine version compatibility check
 The client SHALL verify that the connected engine meets a minimum version requirement before executing any API command. The minimum required engine version SHALL be embedded in the client binary at build time. If the engine version is below the minimum, the client SHALL print an error message and exit with a non-zero code. There SHALL be no flag to skip or suppress this check.
 #### Scenario: Compatible engine version
 - **WHEN** the client connects to an engine reporting version `2.1.5` and `MinEngineVersion` is `2.1.0`
 - **THEN** the client SHALL proceed with the command normally
 #### Scenario: Incompatible engine version
 - **WHEN** the client connects to an engine reporting version `2.0.3` and `MinEngineVersion` is `2.1.0`
 - **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v2.1.0+ (connected engine is v2.0.3)` followed by an upgrade hint, and exit with code 1
 #### Scenario: Engine unreachable during version check
 - **WHEN** the client cannot reach the engine's `/api/v1/status` endpoint
 - **THEN** the client SHALL skip the version check and proceed with the original command (the actual API call will surface the connectivity error)
 #### Scenario: Version check is cached per session
 - **WHEN** the client has already verified engine compatibility during the current invocation
 - **THEN** subsequent API calls within the same invocation SHALL NOT repeat the version check
 #### Scenario: Client version command does not check engine
 - **WHEN** the user runs `kb --version`
 - **THEN** the client SHALL print the client version without contacting the engine
 #### Scenario: MinEngineVersion not set
 - **WHEN** the client binary has `MinEngineVersion` set to empty string or `dev`
 - **THEN** the client SHALL skip the version check entirely (development builds)
 ---
 ## MODIFIED Requirements
 ### Requirement: Single static binary with zero runtime dependencies
 The Go client SHALL compile to a single static binary with no runtime dependencies. It SHALL support cross-compilation for Linux (amd64, arm64), macOS (amd64, arm64), and Windows (amd64). The build SHALL inject both `Version` and `MinEngineVersion` via ldflags.
 #### Scenario: Install on a clean machine
 - **WHEN** a user downloads the `kb` binary for their platform
 - **THEN** they SHALL be able to run it immediately with no additional installs (no Python, no Docker, no shared libraries)
 #### Scenario: Version and compatibility info embedded at build time
 - **WHEN** the client is built with `make all VERSION=2.1.0 MIN_ENGINE_VERSION=2.0.0`
 - **THEN** `kb --version` SHALL report `2.1.0` and the compatibility check SHALL use `2.0.0` as the minimum engine version
@@ -0,0 +1,35 @@
 ## 1. Client Compatibility Check
 - [x] 1.1 Create `client/MIN_ENGINE_VERSION` file with initial value `2.0.0`
 - [x] 1.2 Add `MinEngineVersion` variable to `client/cmd/root.go` (set via ldflags, default `dev`)
 - [x] 1.3 Update `client/Makefile` to read `MIN_ENGINE_VERSION` file and inject via `-ldflags "-X cmd.MinEngineVersion=..."` alongside existing `Version`
 - [x] 1.4 Add `CheckEngineVersion(minVersion string)` method to `client/internal/api/client.go` that calls `GET /api/v1/status`, parses `version` field, and compares against `minVersion` using semver major.minor.patch
 - [x] 1.5 Add `versionChecked bool` field to `Client` struct; guard `CheckEngineVersion` so it runs at most once per Client instance
 - [x] 1.6 Call `CheckEngineVersion` at the start of `Client.do()` (before executing the actual request); skip if `MinEngineVersion` is empty or `dev`
 - [x] 1.7 On version mismatch: print `Error: kb client vX.Y.Z requires engine vM.N.P+ (connected engine is vA.B.C)\nUpdate your engine image to engine-vM.N.P or later.` to stderr and `os.Exit(1)`
 - [x] 1.8 On status endpoint unreachable: skip version check silently (let the actual request surface the error)
 ## 2. Release Script — Client
 - [x] 2.1 Create `release-client.sh` extracting client-specific logic from `release.sh`: version bump of `client/VERSION`, Go binary build, git tag `client-vX.Y.Z`, Gitea release with binary assets
 - [x] 2.2 Release notes template: include `MinEngineVersion` requirement (e.g. "Requires engine v2.0.0+")
 - [x] 2.3 Pass `MIN_ENGINE_VERSION` to `make all` in the build step
 ## 3. Release Script — Engine
 - [x] 3.1 Create `release-engine.sh` extracting engine-specific logic from `release.sh`: version bump of `engine/VERSION`, Docker image build (nvidia + rocm), git tag `engine-vX.Y.Z`, Gitea release, image push
 - [x] 3.2 Release notes template: include Docker pull commands with `engine-vX.Y.Z` prefixed tags
 ## 4. Cleanup
 - [x] 4.1 Remove old `release.sh` (replaced by the two new scripts)
 - [x] 4.2 Update Docker image tag format in release scripts from `vX.Y.Z-nvidia` to `engine-vX.Y.Z-nvidia` (and same for rocm/latest)
 ## 5. Testing
 - [x] 5.1 Test client version check passes when engine version >= MinEngineVersion
 - [x] 5.2 Test client version check fails with correct error message when engine version < MinEngineVersion
 - [x] 5.3 Test client skips version check when MinEngineVersion is empty or `dev`
 - [x] 5.4 Test client skips version check when engine is unreachable
 - [x] 5.5 Dry-run `release-client.sh --dry-run --gitea` and verify correct tag format and build
 - [x] 5.6 Dry-run `release-engine.sh --dry-run --gitea` and verify correct tag format and image names
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-29
@@ -0,0 +1,69 @@
 ## Context
 When a document is ingested, the worker chunks its content and stores each chunk's text in the `chunks` table. FTS5 triggers index that text, and the embedding model embeds it. The document title is stored only in `documents.title` — it never participates in search. This means short documents (or documents whose content lacks the title keywords) are invisible to queries that match the title.
 The reindex endpoint (`POST /api/v1/reindex`) currently reads `chunks.text` and re-embeds it. Any fix must apply consistently at both ingestion and reindex time.
 ## Goals / Non-Goals
 **Goals:**
 - Document titles are searchable via both FTS5 and vector search
 - Section header breadcrumbs (when present in chunk metadata) are also searchable
 - Search results continue to return the original chunk text (no title prefix in the `text` field returned to clients)
 - Existing documents become searchable by title after a `kb reindex`
 - No schema-breaking migration — additive column only
 **Non-Goals:**
 - Changing the chunking strategies themselves (note, markdown, code, docling)
 - Adding a separate title-search endpoint or client-side title filtering
 - Changing the search result JSON structure
 ## Decisions
 ### 1. Add an `enriched_text` column to the `chunks` table
 Store the title-prefixed text in a new `chunks.enriched_text` column alongside the existing `chunks.text`. The `text` column remains the raw chunk content (used for display in search results). The `enriched_text` column holds `"{title}\n\n{text}"`, or `"{title} > {section_header}\n\n{text}"` when the chunk metadata carries a section header.
 **Why not just modify `chunks.text`?** The title would then appear in every search result's text field, which is redundant (title is already a separate field) and would confuse consumers that display results.
 **Why not reconstruct enriched text on-the-fly at search time?** FTS5 uses an external content table and triggers — it needs a real column to index. Reconstructing via JOIN at FTS query time would defeat the purpose of the FTS index.
 ### 2. Point FTS5 at `enriched_text` instead of `text`
 Update the FTS5 virtual table definition and its sync triggers to index `enriched_text` rather than `text`. This is the core change that makes titles searchable via keyword search.
 Since FTS5 external content tables cannot be ALTERed, existing databases require a rebuild: drop and recreate `chunks_fts` and its triggers, then repopulate. This is handled as a schema migration in `init_schema`.
 ### 3. Embed `enriched_text` instead of `text`
 At ingestion time, pass `enriched_text` values to `embed_texts()` instead of raw chunk text. At reindex time, read `enriched_text` from the database. This makes titles searchable via vector similarity too.
 ### 4. Build enriched text in the worker, not in the ingest modules
 The enrichment format is: `"{title}\n\n{chunk_text}"` or `"{title} > {section_header}\n\n{chunk_text}"` when a section header exists in chunk metadata.
 This happens in `worker._process_job()` after chunking and before embedding/insertion. The ingest modules remain unchanged — they continue to return raw chunk text and metadata.
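The two format variants can be pinned down with a small function. The engine is Python (`worker.py`), so the Go below only illustrates the string format and is not the engine's code; `enrich` is a hypothetical name.

```go
package main

import "fmt"

// enrich builds the text that is indexed and embedded. An empty
// sectionHeader means the chunk metadata carried no section header.
func enrich(title, sectionHeader, chunkText string) string {
	if sectionHeader == "" {
		return title + "\n\n" + chunkText
	}
	return title + " > " + sectionHeader + "\n\n" + chunkText
}

func main() {
	// %q makes the embedded newlines visible.
	fmt.Printf("%q\n", enrich("Suitcase Locks", "", "Steve = 363"))
	fmt.Printf("%q\n", enrich("DCG Lab Hardware", "GRIMDAWN > motherboard", "MSI X870 Tomahawk"))
}
```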
 ### 5. Schema migration adds `enriched_text` and rebuilds FTS
 The `init_schema` function will:
 1. Add `enriched_text TEXT` column to `chunks` if missing
 2. Backfill `enriched_text` from existing data (join with `documents.title` and chunk metadata)
 3. Drop and recreate `chunks_fts` to index `enriched_text` instead of `text`
 4. Recreate the FTS sync triggers
 This is safe because the migration only runs when the column is missing (first startup after upgrade). The backfill uses a single UPDATE...FROM query.
 ## Risks / Trade-offs
 **Slightly larger database** — Each chunk stores the title string twice (once in `enriched_text`, once via the document FK). For a typical KB with short titles this is negligible (< 1% size increase).
 → Acceptable for the search quality improvement.
 **FTS rebuild on upgrade** — First startup after upgrade will rebuild the FTS index, which takes a few seconds for large KBs.
 → This is a one-time cost and happens automatically.
 **Embedding drift** — Existing vector embeddings won't include title context until `kb reindex` is run. The FTS backfill happens automatically, but vectors require an explicit reindex.
 → Document this in release notes. The FTS improvement alone is a significant win even without reindexing vectors.
 **Title changes not propagated** — If a document's title were ever updated, `enriched_text` would be stale. Currently the engine has no title-update endpoint, so this is not a concern.
 → No mitigation needed now. If title editing is added later, it should update enriched_text.
@@ -0,0 +1,28 @@
 ## Why
 Short documents and notes are unsearchable when the user's query matches the document title but not the chunk content. For example, a document titled "Suitcase Locks" containing only "Steve = 1234 / Theresa = 4567" is invisible to both FTS and vector search for the query "suitcase locks". This is because chunk text — the only thing indexed and embedded — does not include the document title. This is a standard RAG deficiency that most pipelines solve by prepending title context to each chunk.
 ## What Changes
 - **Prepend document title to chunk text at ingestion time**: Before embedding and FTS indexing, each chunk's text will be prefixed with the document title (e.g., `"Suitcase Locks\n\nSteve = 1234 / Theresa = 4567"`). This ensures the title participates in both full-text and semantic search.
 - **Include section header context in chunk text**: For chunks that have a `section_header` in their metadata, prepend the header breadcrumb too (e.g., `"DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk..."`). This improves search for queries that reference section names.
 - **Store the raw chunk text separately from the enriched text**: The original chunk text (without title prefix) must remain accessible so that search results don't display the prepended title redundantly — the title is already returned as a separate field.
 - **Reindex command must apply the same enrichment**: When `kb reindex` re-embeds all chunks, it must reconstruct the enriched text (title + section header + chunk text) from stored metadata.
 ## Capabilities
 ### New Capabilities
 - `chunk-enrichment`: Prepending document title and section context to chunk text before indexing and embedding, while preserving the original text for display.
 ### Modified Capabilities
 - `engine-api`: The search endpoint's returned `text` field must continue to show the original chunk text (without the prepended title), so no visible API change, but the internal indexing behaviour changes. The reindex endpoint must apply enrichment consistently.
 ## Impact
 - **Engine ingestion pipeline** (`worker.py`): The `_process_job` function must build enriched text from title + section headers + chunk text before passing to `embed_texts()` and `insert_chunk()`.
 - **Database schema** (`database.py`): Need to store both raw `text` (for display) and enriched `text` (for FTS/embedding), or reconstruct enriched text at index time. Simplest approach: store raw text in `chunks.text`, use enriched text only for FTS content and embedding vectors.
 - **FTS triggers** (`database.py`): The FTS5 external content table currently mirrors `chunks.text`. If we add an `enriched_text` column, the FTS index should be built from that instead.
 - **Reindex flow** (`worker.py` / `database.py`): Must reconstruct enriched text by joining chunk metadata with document title.
 - **Search result enrichment** (`routes/search.py`): No change needed — results already return `chunks.text` (raw) and `documents.title` separately.
 - **All four ingest modules** (`note.py`, `markdown.py`, `code.py`, `docling_pipeline.py`): No changes needed — enrichment happens after chunking, in the worker.
 - **Existing documents**: Require a `reindex` to benefit from the new enrichment. No data migration needed since the original text is preserved.
@@ -0,0 +1,75 @@
 ## ADDED Requirements
 ### Requirement: Chunk text enrichment with document title
 The engine SHALL prepend the document title to each chunk's text before FTS indexing and vector embedding. The enriched text SHALL be stored in a dedicated `enriched_text` column on the `chunks` table. The original chunk text SHALL remain in the `text` column for display purposes.
 The enrichment format SHALL be:
 - Without section header: `"{title}\n\n{chunk_text}"`
 - With section header: `"{title} > {section_header}\n\n{chunk_text}"`
 Where `section_header` is the value from the chunk's metadata `section_header` field, when present.
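The two formats above can be sketched as a small helper. This is a minimal illustration, not the engine's actual code: the name and signature follow the `build_enriched_text` helper mentioned in the task list, and treating an empty `section_header` the same as a missing one is an assumption.

```python
from typing import Optional


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict] = None) -> str:
    """Prefix a chunk with its document title and, when present, its section header."""
    # Assumption: an empty or missing section_header means "no header".
    section_header = (metadata or {}).get("section_header")
    prefix = f"{title} > {section_header}" if section_header else title
    return f"{prefix}\n\n{chunk_text}"


note = build_enriched_text("Suitcase Locks", "Steve = 363")
section = build_enriched_text(
    "DCG Lab Hardware",
    "MSI X870 Tomahawk",
    {"section_header": "GRIMDAWN > motherboard"},
)
print(repr(note))
print(repr(section))
```

The raw chunk text is never modified, so the caller can store both strings side by side.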
 #### Scenario: Note ingestion with title enrichment
 - **WHEN** a note titled "Suitcase Locks" with content "Steve = 363" is ingested
 - **THEN** the `chunks.text` column SHALL contain "Steve = 363" and the `chunks.enriched_text` column SHALL contain "Suitcase Locks\n\nSteve = 363"
 #### Scenario: Markdown chunk with section header enrichment
 - **WHEN** a markdown document titled "DCG Lab Hardware" produces a chunk with section_header "GRIMDAWN > motherboard" and text "MSI X870 Tomahawk"
 - **THEN** the `chunks.enriched_text` SHALL contain "DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk"
 #### Scenario: Chunk without section header
 - **WHEN** a document titled "Docker Tips" produces a chunk with no section_header in metadata and text "dbash() { docker exec -it $1 bash; }"
 - **THEN** the `chunks.enriched_text` SHALL contain "Docker Tips\n\ndbash() { docker exec -it $1 bash; }"
 ---
 ### Requirement: FTS5 indexes enriched text
 The FTS5 virtual table `chunks_fts` SHALL index the `enriched_text` column instead of the `text` column. All FTS sync triggers (insert, update, delete) SHALL operate on `enriched_text`.
 #### Scenario: FTS search matches document title
 - **WHEN** a user searches for "suitcase locks" and a document titled "Suitcase Locks" exists with chunk text "Steve = 363"
 - **THEN** the FTS5 search SHALL return that chunk as a match
 #### Scenario: FTS search still matches chunk content
 - **WHEN** a user searches for "MSI X870" and a chunk contains that text in its body
 - **THEN** the FTS5 search SHALL return that chunk as a match (enrichment does not break content matching)
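A sketch of how an FTS5 external-content table could index `enriched_text` while search results still return the raw `text`. The table, trigger and column names `chunks_fts`, `chunks_ai`, `chunks_ad`, `chunks_au`, `text` and `enriched_text` come from this change. The rest of the schema (`documents.id`, `chunks.document_id`) and the `fts_search` function are assumptions made for illustration, and the example needs an SQLite build with FTS5 enabled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    text TEXT NOT NULL,           -- raw text, returned to clients
    enriched_text TEXT NOT NULL   -- title (+ section header) + raw text
);
-- External-content FTS5 table: the index is built from chunks.enriched_text.
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text) VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text) VALUES (new.id, new.enriched_text);
END;
"""


def fts_search(conn: sqlite3.Connection, query: str) -> list:
    """Match against enriched text, but return the title and raw text separately."""
    return conn.execute(
        """
        SELECT documents.title, chunks.text
        FROM chunks_fts
        JOIN chunks ON chunks.id = chunks_fts.rowid
        JOIN documents ON documents.id = chunks.document_id
        WHERE chunks_fts MATCH ?
        """,
        (query,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO documents (id, title) VALUES (1, 'Suitcase Locks')")
conn.execute(
    "INSERT INTO chunks (document_id, text, enriched_text) VALUES (?, ?, ?)",
    (1, "Steve = 363", "Suitcase Locks\n\nSteve = 363"),
)
title_hits = fts_search(conn, "suitcase locks")  # matches only via the title
body_hits = fts_search(conn, "steve")            # matches via the chunk body
print(title_hits, body_hits)
```

Because the query joins back to `chunks.text`, the prepended title never leaks into the returned `text` field.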
 ---
 ### Requirement: Vector embeddings use enriched text
 The embedding model SHALL receive `enriched_text` (not raw `text`) when generating vectors during both initial ingestion and reindex operations.
 #### Scenario: Vector search matches document title
 - **WHEN** a user searches semantically for "luggage combination codes" and a document titled "Suitcase Locks" exists
 - **THEN** the vector search SHALL return that chunk with higher similarity than it would without title enrichment
 #### Scenario: Reindex uses enriched text
 - **WHEN** `POST /api/v1/reindex` is called
 - **THEN** the engine SHALL read `enriched_text` from the chunks table and embed that (not `text`)
 ---
 ### Requirement: Schema migration adds enriched_text column
 On startup, `init_schema` SHALL add the `enriched_text` column to the `chunks` table if it does not exist. It SHALL then backfill `enriched_text` for all existing chunks by joining with `documents.title` and parsing chunk metadata for section headers. It SHALL rebuild the FTS5 table and triggers to index `enriched_text`.
 #### Scenario: First startup after upgrade
 - **WHEN** the engine starts and `chunks.enriched_text` column does not exist
 - **THEN** the engine SHALL add the column, backfill all rows, drop and recreate `chunks_fts` to index `enriched_text`, and recreate the FTS sync triggers
 #### Scenario: Subsequent startup
 - **WHEN** the engine starts and `chunks.enriched_text` column already exists
 - **THEN** the engine SHALL not perform any migration and start normally
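The column check and backfill might look roughly like the following. It is a sketch under stated assumptions: chunk metadata is taken to be JSON text in a `chunks.metadata` column, chunks reference documents through `chunks.document_id`, and `migrate_enriched_text` is a hypothetical name. Rebuilding `chunks_fts` and its triggers is left out to keep the example short.

```python
import json
import sqlite3


def migrate_enriched_text(conn: sqlite3.Connection) -> bool:
    """Add and backfill chunks.enriched_text once; return True if a migration ran."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" in columns:
        return False  # subsequent startup: nothing to do

    conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")
    rows = conn.execute(
        "SELECT chunks.id, documents.title, chunks.text, chunks.metadata "
        "FROM chunks JOIN documents ON documents.id = chunks.document_id"
    ).fetchall()
    for chunk_id, title, text, metadata_json in rows:
        # Assumption: metadata is stored as a JSON object, or NULL when absent.
        metadata = json.loads(metadata_json) if metadata_json else {}
        header = metadata.get("section_header")
        prefix = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{prefix}\n\n{text}", chunk_id),
        )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, text TEXT, metadata TEXT);
INSERT INTO documents VALUES (1, 'DCG Lab Hardware');
INSERT INTO chunks VALUES (1, 1, 'MSI X870 Tomahawk', '{"section_header": "GRIMDAWN > motherboard"}');
INSERT INTO chunks VALUES (2, 1, 'Rack layout', NULL);
""")
first_run = migrate_enriched_text(conn)
second_run = migrate_enriched_text(conn)
enriched = [row[0] for row in conn.execute("SELECT enriched_text FROM chunks ORDER BY id")]
print(first_run, second_run, enriched)
```

Checking `PRAGMA table_info` first is what makes the second startup a no-op.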
 ---
 ### Requirement: Search results return raw text
 Search results SHALL continue to return the original chunk text (from `chunks.text`) in the `text` field, not the enriched text. The document title is already returned as a separate `title` field.
 #### Scenario: Search result text field
 - **WHEN** a search returns a chunk from document "Suitcase Locks" with raw text "Steve = 363"
 - **THEN** the result `text` field SHALL be "Steve = 363" (not "Suitcase Locks\n\nSteve = 363")
@@ -0,0 +1,31 @@
 ## MODIFIED Requirements
 ### Requirement: Background ingestion worker
 The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), build enriched text by prepending the document title (and section header when present) to each chunk's text, generate embeddings using the enriched text and the resident model, insert chunks (with both raw text and enriched text) and vectors into the database, and move the original file to persistent storage.
 #### Scenario: Successful PDF ingestion
 - **WHEN** the background worker picks up a queued PDF job
 - **THEN** it SHALL update the job status to `processing`, run Docling conversion and chunking, build enriched text for each chunk by prepending the document title, embed all chunks using enriched text, insert document and chunks into the database, move the staged file to `{data_dir}/documents/{content_hash}.pdf`, update `documents.stored_path` with the permanent path, store the original filename in `documents.original_filename`, update the job status to `done` with the resulting document_id and chunk count, and clean up the staging entry
 #### Scenario: Ingestion failure
 - **WHEN** the background worker encounters an error during processing (e.g., corrupt PDF)
 - **THEN** it SHALL update the job status to `failed` with the error message, delete the staged file, and continue processing the next queued job
 #### Scenario: Search during active ingestion
 - **WHEN** a search request arrives while the background worker is processing a job
 - **THEN** the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
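The enrichment step of the worker can be illustrated in isolation. In this sketch `embed_texts` and `insert_chunk` are named after the functions referenced in the proposal, but their signatures here are assumptions, and both are replaced by in-memory stand-ins so the flow can run without a model or a database.

```python
from typing import Callable, List, Optional


def enrich(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    header = (metadata or {}).get("section_header")
    prefix = f"{title} > {header}" if header else title
    return f"{prefix}\n\n{chunk_text}"


def process_chunks(title: str, chunks: List[dict],
                   embed_texts: Callable, insert_chunk: Callable) -> int:
    """Embed the enriched text, but persist raw and enriched text side by side."""
    enriched = [enrich(title, c["text"], c.get("metadata")) for c in chunks]
    vectors = embed_texts(enriched)  # the model sees title + header + body
    for chunk, enriched_text, vector in zip(chunks, enriched, vectors):
        insert_chunk(text=chunk["text"], enriched_text=enriched_text, vector=vector)
    return len(chunks)


# Stand-ins for the real embedding model and database layer (hypothetical).
seen_by_model: List[str] = []
stored: List[dict] = []


def fake_embed_texts(texts: List[str]) -> List[List[float]]:
    seen_by_model.extend(texts)
    return [[float(len(t))] for t in texts]


def fake_insert_chunk(**row) -> None:
    stored.append(row)


count = process_chunks(
    "Suitcase Locks", [{"text": "Steve = 363"}], fake_embed_texts, fake_insert_chunk
)
print(count, stored)
```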
 ---
 ### Requirement: Engine status and reindex
 The engine SHALL provide status information and support re-embedding all chunks. The `version` field in the status response SHALL always be present and SHALL reflect the engine's release version as read from the `VERSION` file. This field is the contract used by clients for compatibility checking.
 #### Scenario: Get engine status
 - **WHEN** a client sends `GET /api/v1/status`
 - **THEN** the engine SHALL return JSON with `version` (string, from the `VERSION` file), `model_name`, `embedding_dim`, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
 #### Scenario: Trigger reindex
 - **WHEN** a client sends `POST /api/v1/reindex`
 - **THEN** the engine SHALL re-embed all existing chunks using the `enriched_text` column and the currently loaded model, and return progress information. This operation SHALL NOT block search queries.
@@ -0,0 +1,33 @@
 ## 1. Schema Migration
 - [x] 1.1 Add `enriched_text TEXT` column to `chunks` table in `database.py:init_schema` (with migration check for existing DBs)
 - [x] 1.2 Write backfill query: `UPDATE chunks SET enriched_text = ... FROM documents` joining title and parsing chunk metadata for section_header
 - [x] 1.3 Drop and recreate `chunks_fts` virtual table to index `enriched_text` instead of `text`
 - [x] 1.4 Update FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
 ## 2. Enrichment Helper
 - [x] 2.1 Create `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"` or `"{title}\n\n{chunk_text}"`
 ## 3. Ingestion Pipeline
 - [x] 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
 - [x] 3.2 Pass enriched text to `embed_texts()` instead of raw chunk text
 - [x] 3.3 Pass enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
 - [x] 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
 ## 4. Reindex
 - [x] 4.1 Update `routes/reindex.py` to read `enriched_text` from chunks table and embed that instead of `text`
 ## 5. Search Results
 - [x] 5.1 Verify `search.py:_enrich()` returns `chunks.text` (raw) not `enriched_text` — no change expected, but confirm
 ## 6. Testing
 - [x] 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
 - [x] 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
 - [x] 6.3 Test: verify search result `text` field does not contain the prepended title
 - [x] 6.4 Test: run `reindex`, verify enriched text is used for new embeddings
 - [x] 6.5 Test: verify schema migration backfills enriched_text for pre-existing chunks on startup
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-29
@@ -0,0 +1,29 @@
 ## Context
 The root cobra command in `client/cmd/root.go` uses `cobra.ArbitraryArgs` and its `RunE` handler to catch any arguments not matching a subcommand. Currently, any non-empty args are joined and submitted as a note. This means a single mistyped word (e.g., `kb infow` instead of `kb info`) silently creates a junk note in the knowledge base.
 ## Goals / Non-Goals
 **Goals:**
 - Prevent single bare words from being silently ingested as notes
 - Provide a clear error message that helps the user correct their input
 - Preserve the multi-word implicit note shorthand (`kb remember to update dns`)
 **Non-Goals:**
 - Detecting "close matches" to real commands (fuzzy matching / did-you-mean)
 - Changing how quoted strings work at the shell level (we can't detect quotes after shell expansion)
 ## Decisions
 ### Guard on argument count in RunE
 When `len(args) == 1`, reject with an error message instead of submitting as a note. When `len(args) > 1`, continue treating as implicit note shorthand.
 **Rationale**: This is the simplest reliable heuristic. The shell strips quotes before cobra sees args, so we cannot distinguish `kb "singleword"` from `kb singleword`. However, single-word notes are rare in practice, and the error message tells the user how to work around it (use multiple words or the full note workflow). Multi-word input is almost certainly intentional note text, not a mistyped command.
 **Alternative considered**: Checking against a list of known subcommand names — rejected because it wouldn't catch typos of commands we don't know about and adds maintenance burden.
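The quoting limitation is easy to demonstrate. The real client is Go with cobra, so the sketch below only models the argument-count rule in Python: `shlex.split` approximates POSIX shell word splitting, and `classify_bare_args` is a hypothetical stand-in for the branching in `RunE`.

```python
import shlex
from typing import List


def classify_bare_args(args: List[str]) -> str:
    """Model the guard: no args -> help, one arg -> error, several -> note."""
    if len(args) == 0:
        return "help"
    if len(args) == 1:
        return "error"
    return "note"


# After shell-style splitting, the quotes are gone: both forms yield the same args.
quoted = shlex.split('kb "singleword"')[1:]
bare = shlex.split("kb singleword")[1:]
multi = shlex.split("kb remember to update dns")[1:]

print(quoted, bare, classify_bare_args(bare))
print(multi, classify_bare_args(multi))
```

Since the program only ever sees the split result, no check inside the client can tell the quoted and unquoted single word apart.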
 ## Risks / Trade-offs
 - **Single-word notes no longer work via shorthand** → Users must use `kb add --note "singleword"` or include additional words. This is an acceptable trade-off since single-word notes are uncommon and the error message is clear.
 - **Shell quote stripping means we can't be perfect** → A quoted single word such as `kb "singleword"` reaches cobra as one bare word after quote removal and will be rejected. This is a known limitation but very rare in practice.
@@ -0,0 +1,24 @@
 ## Why
 A single unquoted word passed to `kb` (e.g., `kb infow`) is silently treated as a note and ingested. This is almost always a mistyped command, not an intentional note. Users lose trust when typos pollute their knowledge base.
 ## What Changes
 - The implicit note shorthand will require **more than one argument** to be treated as a note. A single bare word will be rejected with a helpful error suggesting the user check their command or quote a multi-word note.
 - This is a **BREAKING** change to the implicit note shorthand: `kb singleword` no longer creates a note. Users must write `kb "singleword is important"` or use multiple words.
 ## Capabilities
 ### New Capabilities
 _(none)_
 ### Modified Capabilities
 - `go-client`: The "Implicit note shorthand" requirement changes to reject single-word bare arguments and print an error instead of submitting them as notes.
 ## Impact
 - **Code**: `client/cmd/root.go` — `RunE` handler for the root command
 - **Tests**: `client/cmd/root_test.go` or equivalent — add/update tests for single-word rejection
 - **Users**: Anyone who intentionally used `kb singleword` as a note shorthand will need to use multiple words or quotes
@@ -0,0 +1,37 @@
 ## MODIFIED Requirements
 ### Requirement: Implicit note shorthand
 The client SHALL treat bare string arguments (with no subcommand) as an implicit note only when **more than one argument** is provided. `kb "my note"` SHALL behave identically to submitting a note via `POST /api/v1/jobs`. All persistent flags (`--format`, `--engine`, `--api-key`) and the root `--tags` flag SHALL work with the shorthand form. A single bare word SHALL be rejected with an error message.
 #### Scenario: Quick note via bare argument
 - **WHEN** the user runs `kb "remember to update DNS"`
 - **THEN** the client SHALL submit the text as a note via `POST /api/v1/jobs` and print `Queued: note`
 #### Scenario: Bare argument with tags
 - **WHEN** the user runs `kb "server room is building 3" --tags ops`
 - **THEN** the client SHALL submit the note with the specified tags
 #### Scenario: Bare argument with JSON output
 - **WHEN** the user runs `kb "my note" --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine
 #### Scenario: Bare argument duplicate detection
 - **WHEN** the user runs `kb "my note"` and the engine returns HTTP 409
 - **THEN** the client SHALL handle the duplicate response identically to the previous `kb add --note` behaviour
 #### Scenario: Multiple unquoted words
 - **WHEN** the user runs `kb remember to update dns` (without quotes)
 - **THEN** the client SHALL join all arguments into a single note string and submit it
 #### Scenario: Single bare word rejected
 - **WHEN** the user runs `kb infow` (a single unrecognized word)
 - **THEN** the client SHALL print to stderr: `Unknown command "infow". Run 'kb --help' for available commands.` followed by a hint about note usage, and exit with a non-zero code
 #### Scenario: No interference with subcommands
 - **WHEN** the user runs `kb search "query"` or any other existing subcommand
 - **THEN** the client SHALL route to the subcommand as before — the implicit note shorthand SHALL NOT interfere
 #### Scenario: No arguments
 - **WHEN** the user runs `kb` with no arguments
 - **THEN** the client SHALL display the help text
@@ -0,0 +1,10 @@
 ## 1. Core Implementation
 - [x] 1.1 Update `RunE` in `client/cmd/root.go` to reject single-word bare arguments with an error message and non-zero exit
 - [x] 1.2 Update usage template in `root.go` to reflect that note shorthand requires multiple words
 ## 2. Tests
 - [x] 2.1 Add test: single bare word prints error to stderr and exits non-zero
 - [x] 2.2 Add test: multiple bare words are submitted as a note (existing behavior preserved)
 - [x] 2.3 Add test: zero arguments shows help (existing behavior preserved)
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-31
@@ -0,0 +1,52 @@
 ## Context
 README.md currently serves as a single documentation file for both users and developers. It contains ~290 lines mixing installation/usage instructions with build-from-source steps, release scripts, Docker image internals, and developer notes (e.g., ROCm migration plans). There is no DEVELOPER.md or CONTRIBUTING.md file.
 ## Goals / Non-Goals
 **Goals:**
 - Separate user-facing documentation (README.md) from developer-facing documentation (DEVELOPER.md)
 - README.md should answer: "What is this? How do I install it? How do I use it?"
 - DEVELOPER.md should answer: "How do I build from source? How do I release? How do I contribute?"
 - Provide a clear cross-reference link between the two files
 **Non-Goals:**
 - Rewriting or improving documentation content itself (just moving it)
 - Creating additional docs files (CONTRIBUTING.md, architecture docs, etc.)
 - Changing any code, build scripts, or CI configuration
 ## Decisions
 ### 1. Single DEVELOPER.md file (not multiple docs files)
 All developer content goes into one top-level DEVELOPER.md rather than a `docs/` directory or separate CONTRIBUTING.md / BUILDING.md files. The total developer content is small enough (~80 lines) that splitting further would be unnecessary overhead. A single file at the repo root is immediately discoverable.
 **Alternative considered**: `docs/` directory with multiple files. Rejected because the content volume doesn't justify the structure, and root-level DEVELOPER.md is a well-known convention.
 ### 2. Content split boundary
 Content stays in README.md if it's needed by someone who just wants to **run** kb. Content moves to DEVELOPER.md if it's only needed by someone who wants to **build, modify, or release** kb.
 Specifically moving to DEVELOPER.md:
 - "From source" subsections under both engine and client install
 - Entire "Building and releasing" section (release scripts, version checking, Docker image tags, registry overrides)
 - "Future: ROCm runtime migration" developer note
 Staying in README.md:
 - Architecture overview (helps users understand what they're running)
 - Pre-built image / release install instructions
 - Client configuration
 - Usage examples
 - Engine configuration table
 - Data portability
 - API reference
 - Claude Code skill reference
 ### 3. Cross-reference approach
 Add a short note to README.md's Quick Start section pointing to DEVELOPER.md for building from source. No back-link is needed from DEVELOPER.md, since developers will naturally find README.md first.
 ## Risks / Trade-offs
 - **[Stale cross-references]** If DEVELOPER.md sections are renamed, the link from README.md could break. Mitigation: link to the file, not to a specific anchor.
 - **[Discoverability]** Some users who want to build from source might miss DEVELOPER.md. Mitigation: explicit "See DEVELOPER.md" callout in the Quick Start section where "from source" instructions used to be.
@@ -0,0 +1,28 @@
 ## Why
 README.md currently mixes user-facing content (what kb does, how to install and use it) with developer-facing content (building from source, releasing, Docker image internals, architecture deep-dives). Users looking for quick-start instructions have to scroll past release scripts and build commands. Developers looking for contribution/build info have to hunt through user docs. Splitting these into README.md (users) and DEVELOPER.md (developers/contributors) follows standard open-source convention and makes both audiences' experience cleaner.
 ## What Changes
 - **Trim README.md** to focus on user-facing content: what kb is, how to install (from pre-built images/releases), how to configure, how to use, engine configuration reference, data portability, and API reference.
 - **Remove "from source" build instructions** from README.md (both engine and client sections).
 - **Remove "Building and releasing" section** from README.md entirely.
 - **Remove "Future: ROCm runtime migration"** developer note from README.md.
 - **Create DEVELOPER.md** containing: building engine from source, building client from source, release process (client and engine), Docker image details, version checking, ROCm migration notes, and any other contributor-oriented content.
 - **Add a link** from README.md to DEVELOPER.md for developers who want to build from source or contribute.
 ## Capabilities
 ### New Capabilities
 - `developer-docs`: Developer-facing documentation covering building from source, releasing, and contributing.
 ### Modified Capabilities
 (none - no spec-level behavior changes, this is a documentation restructuring)
 ## Impact
 - **Files modified**: `README.md` (trimmed)
 - **Files created**: `DEVELOPER.md` (new)
 - **No code changes**: purely documentation restructuring
 - **No API changes**: no functional impact
@@ -0,0 +1,63 @@
 ## ADDED Requirements
 ### Requirement: DEVELOPER.md exists at repo root
 The repository SHALL have a `DEVELOPER.md` file at the project root containing all developer-facing documentation.
 #### Scenario: File exists
 - **WHEN** a developer navigates to the repository root
 - **THEN** a `DEVELOPER.md` file SHALL be present
 ### Requirement: DEVELOPER.md contains build-from-source instructions
 DEVELOPER.md SHALL contain instructions for building both the engine and client from source.
 #### Scenario: Engine build from source
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include instructions for starting the engine from source using compose files (both NVIDIA and ROCm)
 #### Scenario: Client build from source
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include instructions for building the client binary from source using `make build` and `make all`
 ### Requirement: DEVELOPER.md contains release process
 DEVELOPER.md SHALL document the release process for both client and engine, including release scripts, version bumping, and Docker image tagging.
 #### Scenario: Client release documentation
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include `release-client.sh` usage with flag options (--gitea, --github, --minor, --no-increment, --dry-run)
 #### Scenario: Engine release documentation
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include `release-engine.sh` usage with flag options and Docker image tag conventions
 #### Scenario: Version checking
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include how to check client and engine versions
 ### Requirement: DEVELOPER.md contains developer notes
 DEVELOPER.md SHALL include any forward-looking developer notes such as migration plans or technical debt items.
 #### Scenario: ROCm migration note
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include the ROCm runtime migration note about onnxruntime and MIGraphX
 ### Requirement: README.md excludes developer-only content
 README.md SHALL NOT contain build-from-source instructions, release processes, or developer-only notes.
 #### Scenario: No from-source build steps in README
 - **WHEN** a user reads README.md
 - **THEN** there SHALL be no "From source" subsections under engine or client installation
 #### Scenario: No release section in README
 - **WHEN** a user reads README.md
 - **THEN** there SHALL be no "Building and releasing" section
 #### Scenario: No developer notes in README
 - **WHEN** a user reads README.md
 - **THEN** there SHALL be no "Future: ROCm runtime migration" section
 ### Requirement: README.md cross-references DEVELOPER.md
 README.md SHALL include a link to DEVELOPER.md for users who want to build from source or contribute.
 #### Scenario: Developer link in quick start
 - **WHEN** a user reads the Quick Start section of README.md
 - **THEN** there SHALL be a note pointing to DEVELOPER.md for building from source
@@ -0,0 +1,17 @@
 ## 1. Create DEVELOPER.md
 - [x] 1.1 Create DEVELOPER.md at repo root with engine build-from-source instructions (compose.nvidia.yaml and compose.rocm.yaml)
 - [x] 1.2 Add client build-from-source instructions (make build, make all)
 - [x] 1.3 Add "Building and releasing" section: release-client.sh and release-engine.sh usage with all flag options
 - [x] 1.4 Add version checking instructions (kb --version, curl status endpoint)
 - [x] 1.5 Add Docker image tag conventions and registry override documentation
 - [x] 1.6 Add "Future: ROCm runtime migration" developer note
 ## 2. Trim README.md
 - [x] 2.1 Remove "From source (for development)" subsection under engine quick start
 - [x] 2.2 Remove "From source (for development)" subsection under client installation
 - [x] 2.3 Remove entire "Building and releasing" section
 - [x] 2.4 Remove "Future: ROCm runtime migration" section
 - [x] 2.5 Add cross-reference note to DEVELOPER.md in the Quick Start section for building from source
 - [x] 2.6 Move API reference section from README.md to DEVELOPER.md
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-03-31
@@ -0,0 +1,51 @@
 ## Context
 The kb client currently overloads the root Cobra command to handle both command dispatch and implicit note ingestion. Any unrecognized multi-word input is silently submitted as a note via `POST /api/v1/jobs`. This was introduced to reduce friction for note-taking but has proven error-prone — typos in commands create unwanted notes. A single-word guard was added but multi-word typos still slip through.
 The root command has: custom `ArbitraryArgs` validation, a `RunE` with arg-count branching, a `--tags` flag for the note shorthand, a custom usage template with `isRootCmd` template function, and `submitNote()` living in `add.go`.
 ## Goals / Non-Goals
 **Goals:**
 - Eliminate accidental note creation from mistyped commands
 - Provide a clean, explicit `addnote` command that pairs with existing `addfile`
 - Revert root command to standard Cobra behaviour (no custom args, no custom template)
 - Keep the same API contract — `POST /api/v1/jobs` with `note` field unchanged
 **Non-Goals:**
 - Changing the engine API
 - Modifying `addfile` behaviour
 - Adding new content types (url, bookmark, etc.)
 - Backward compatibility shim for `kb "text"` syntax
 ## Decisions
 ### 1. New `addnote` command in its own file
 Create `client/cmd/addnote.go` with a `cobra.Command` that takes `ExactArgs(1)` — a single quoted string. This mirrors `addfile` which also takes `ExactArgs(1)`.
 **Rationale**: Keeps each command in its own file (consistent with the existing pattern). `ExactArgs(1)` means the user must quote multi-word notes, which is unambiguous and avoids the flag-parsing edge cases that plagued the implicit shorthand.
 **Alternative considered**: Joining `ArbitraryArgs` like the old shorthand. Rejected — this is exactly the ambiguity we're removing.
 ### 2. Move `submitNote()` from `add.go` to `addnote.go`
 The function is only used by the addnote command, so it belongs in the same file.
 **Rationale**: `add.go` becomes purely about file operations (it already is, aside from hosting `submitNote()`). Clean separation.
 ### 3. Fully revert root command to Cobra defaults
 Remove: `ArbitraryArgs`, custom `RunE` (replace with nil — Cobra shows help by default), `--tags` flag on root, custom usage template, `isRootCmd` template function.
 **Rationale**: The root command should do one thing — dispatch to subcommands. All the custom logic was there to support the implicit shorthand which is being removed.
 ### 4. `addnote` gets its own `--tags` flag
 The `--tags` flag moves from the root command to `addnote`, matching how `addfile` already has its own `--tags` flag.
 ## Risks / Trade-offs
 - **Breaking change for existing users** → Mitigated by clear error messaging. If someone types `kb "some text"`, Cobra will say "unknown command". The `examples` command will show the new syntax.
 - **Slightly more typing for notes** (`kb addnote "text"` vs `kb "text"`) → Acceptable trade-off for eliminating accidental ingestion. Tab-completion helps.
 - **Scripts using old syntax will break** → This is intentional. The old syntax was a foot-gun.
@@ -0,0 +1,32 @@
 ## Why
 The implicit note shorthand (`kb "some text"`) makes it too easy to accidentally add notes when mistyping commands. Despite the single-word guard, any multi-word typo (e.g. `kb lisst --type pdf`) silently creates a note. The root command doing double-duty as both command dispatcher and note ingester undermines user trust. Reverting to explicit, structured add commands eliminates accidental ingestion and gives every content type a clear, discoverable verb.
 ## What Changes
 - **New `addnote` command**: `kb addnote <text>` takes a single quoted positional argument and submits it as a note. Supports `--tags`. The `submitNote()` logic moves from `add.go` to a new `addnote.go` command file.
 - **Remove implicit note shorthand**: The root command reverts to standard Cobra behaviour — no `ArbitraryArgs`, no special arg-count logic, no `--tags` flag on root. Unknown input gets Cobra's default "unknown command" error.
 - **Remove custom usage template**: The root command no longer needs the `isRootCmd` template logic. Standard Cobra usage template for all commands.
 - **Update examples**: `examples.go` updated to show `kb addnote` instead of bare `kb "text"`.
 - **Update tests**: Remove implicit note shorthand tests, add `addnote` command tests.
 - **`addfile` unchanged**: Stays exactly as-is.
 - **BREAKING**: `kb "note text"` no longer works. Users must use `kb addnote "note text"`.
 ## Capabilities
 ### New Capabilities
 _(none)_
 ### Modified Capabilities
 - `go-client`: The "Implicit note shorthand" requirement is removed entirely and replaced by a new "Add note command" requirement. The "Add command (file and note ingestion)" requirement description is updated to reflect `addnote` / `addfile` as the two ingestion commands. The root command reverts to standard Cobra behaviour with no custom arg handling or usage template.
 ## Impact
 - `client/cmd/root.go` — remove `ArbitraryArgs`, `RunE` note logic, `--tags` flag, custom usage template, `isRootCmd` template func
 - `client/cmd/add.go` — `submitNote()` function moves to new `addnote.go` (or stays in `add.go` alongside `addfile` — design decision)
 - `client/cmd/addnote.go` — new file defining the `addnote` command
 - `client/cmd/examples.go` — update example text
 - `client/cmd/root_test.go` — remove implicit note shorthand tests, add standard Cobra behaviour tests
 - No engine changes — the API contract (`POST /api/v1/jobs` with `note` field) is unchanged
@@ -0,0 +1,87 @@
 ## ADDED Requirements
 ### Requirement: Add note command
 The client SHALL provide a `kb addnote <text>` command that submits a text note to the engine for ingestion. The command SHALL take exactly one positional argument (the note text) and support a `--tags` flag for comma-separated tags. The note SHALL be submitted via `POST /api/v1/jobs` with the `note` field in a multipart request.
 #### Scenario: Add a note
 - **WHEN** the user runs `kb addnote "remember to update DNS records"`
 - **THEN** the client SHALL submit the text as a note via `POST /api/v1/jobs` and print `Queued: note`
 #### Scenario: Add a note with tags
 - **WHEN** the user runs `kb addnote "server room is building 3" --tags ops`
 - **THEN** the client SHALL submit the note with the specified tags
 #### Scenario: Add a note with JSON output
 - **WHEN** the user runs `kb addnote "my note" --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine
 #### Scenario: Duplicate note detection
 - **WHEN** the user runs `kb addnote "my note"` and the engine returns HTTP 409
 - **THEN** the client SHALL display the duplicate information (document ID or job ID) and exit with code 0
 #### Scenario: Missing argument
 - **WHEN** the user runs `kb addnote` with no arguments
 - **THEN** the client SHALL display an error indicating that the note text argument is required
 #### Scenario: Too many arguments
 - **WHEN** the user runs `kb addnote remember to update dns` (unquoted, multiple args)
 - **THEN** the client SHALL display an error indicating that exactly one argument is required, with a hint to quote the text
 ## MODIFIED Requirements
 ### Requirement: Add command (file and note ingestion)
 The client SHALL provide a `kb addfile` command that uploads files to the engine for async ingestion. The command SHALL validate file extensions before uploading and reject unsupported types. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document information. Notes are handled by the separate `addnote` command — `addfile` is exclusively for file uploads.
 #### Scenario: Add a single file
 - **WHEN** the user runs `kb addfile report.pdf`
 - **THEN** the client SHALL validate the file extension, upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
 #### Scenario: Add a file with tags
 - **WHEN** the user runs `kb addfile manual.pdf --tags car,maintenance`
 - **THEN** the client SHALL include the tags in the multipart upload metadata
 #### Scenario: Add a directory recursively
 - **WHEN** the user runs `kb addfile ~/documents/ --recursive`
 - **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
 #### Scenario: Unsupported file extension
 - **WHEN** the user runs `kb addfile photo.jpg`
 - **THEN** the client SHALL print an error listing supported extensions and exit with a non-zero code without making any API call
 #### Scenario: Duplicate file rejected (already ingested)
 - **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "document_id": 42, "title": "report.pdf"}`
 - **THEN** the client SHALL print "Already imported: report.pdf (doc ID: 42)" and exit with code 0
 #### Scenario: Duplicate file rejected (in-flight job)
 - **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "job_id": 7, "title": "report.pdf"}`
 - **THEN** the client SHALL print "Already queued: report.pdf (job ID: 7)" and exit with code 0
 #### Scenario: Duplicate file in recursive add
 - **WHEN** the user runs `kb addfile ~/documents/ --recursive` and some files are rejected as duplicates
 - **THEN** the client SHALL print the duplicate message for each rejected file, continue uploading remaining files, and include a summary (e.g., "Queued: 5 files, 2 duplicates skipped")
 #### Scenario: Duplicate with JSON output
 - **WHEN** the user runs `kb addfile report.pdf --format json` and the engine returns HTTP 409
 - **THEN** the client SHALL output the raw JSON response from the engine including the document_id and title
 #### Scenario: Add with JSON output
 - **WHEN** the user runs `kb addfile report.pdf --format json`
 - **THEN** the client SHALL output the JSON response from the engine including the job_id
 #### Scenario: File not found
 - **WHEN** the user runs `kb addfile nonexistent.pdf`
 - **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
 #### Scenario: Upload failure
 - **WHEN** the upload fails (network error, engine returns 4xx/5xx other than 409)
 - **THEN** the client SHALL print the error and exit with a non-zero code
 ## REMOVED Requirements
 ### Requirement: Implicit note shorthand
 **Reason**: The implicit shorthand caused accidental note creation from mistyped commands. Any unrecognized multi-word input was silently ingested as a note. Replaced by the explicit `addnote` command.
 **Migration**: Replace `kb "note text"` with `kb addnote "note text"`. Replace `kb "note text" --tags foo` with `kb addnote "note text" --tags foo`.
@@ -0,0 +1,29 @@
 ## 1. Create addnote command
 - [x] 1.1 Create `client/cmd/addnote.go` with `addnoteCmd` using `ExactArgs(1)`, `--tags` flag, and `RunE` calling `submitNote()`
 - [x] 1.2 Move `submitNote()` function from `client/cmd/add.go` to `client/cmd/addnote.go`
 ## 2. Revert root command to standard Cobra behaviour
 - [x] 2.1 Remove `ArbitraryArgs`, custom `RunE` logic, and `--tags` flag from root command in `client/cmd/root.go`
 - [x] 2.2 Remove custom usage template and `isRootCmd` template function — let Cobra use its default template
 - [x] 2.3 Set root command to show help when called with no args (standard Cobra `RunE` returning `cmd.Help()` or nil)
 ## 3. Update examples and help text
 - [x] 3.1 Update `client/cmd/examples.go` to show `kb addnote` syntax instead of `kb "text"` shorthand
 - [x] 3.2 Update root command `Long` description to remove reference to note shorthand
 ## 4. Update tests
 - [x] 4.1 Remove implicit note shorthand tests from `client/cmd/root_test.go` (`TestRootCmd_SingleWordRejected`, `TestRootCmd_MultipleWordsNotRejected`)
 - [x] 4.2 Add test for `addnote` command (verify it wires up correctly, takes exactly one arg)
 - [x] 4.3 Add test that root command with unknown args returns an error (standard Cobra behaviour)
 - [x] 4.4 Verify `addfile` tests still pass (no changes expected)
 ## 5. Build and verify
 - [x] 5.1 Run `go build` and verify all commands appear in `kb --help`
 - [x] 5.2 Run `go test ./...` and verify all tests pass
 - [x] 5.3 Verify `kb addnote --help` shows correct usage line and flags
 - [x] 5.4 Verify `kb addfile --help` is unchanged
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-04-02
@@ -0,0 +1,41 @@
 ## Context
 The engine's `/api/v1/search` endpoint returns flat result objects:
 ```json
 {
  "chunk_id": 123,
  "score": 0.031,
  "text": "...",
  "chunk_index": 3,
  "chunk_metadata": {"page": 12, "section_header": "Installation"},
  "title": "Git Admin Guide",
  "doc_type": "pdf",
  "source_path": "/home/user/docs/git-admin.pdf",
  "created_at": "2026-03-15T10:30:00",
  "tags": ["git", "admin"]
 }
 ```
 The Go client's human-mode struct in `client/cmd/search.go` incorrectly expects a nested `document` object and top-level `page`/`section` fields. This causes all metadata to display as zero values.
 ## Goals / Non-Goals
 **Goals:**
 - Fix the search result struct to match the flat engine response
 - Extract `page` and `section_header` from `chunk_metadata` for human display
 - Maintain identical JSON output (already passes through raw response)
 **Non-Goals:**
 - Changing the engine API response format
 - Adding new display fields beyond what was originally intended
 ## Decisions
 **Flatten the struct to match API response.** The result struct will have `Title`, `DocType`, `Tags` as top-level fields (matching `title`, `doc_type`, `tags` JSON keys). `ChunkMetadata` will be decoded as `map[string]interface{}` to extract `page` and `section_header` dynamically, since its contents vary by document type.
 **Why not a typed ChunkMetadata struct?** The metadata keys depend on the ingestion pipeline (PDFs have `page`, markdown has `section_header`, code may have others in future). A map is more resilient to engine-side additions.
 ## Risks / Trade-offs
 - [Minimal risk] If the engine adds new top-level fields, the Go struct silently ignores them — this is existing behavior and acceptable for human-mode display.
@@ -0,0 +1,24 @@
 ## Why
 The Go client's human-mode search output struct expects a nested `document` object and top-level `page`/`section` fields, but the engine API returns flat results with `title`, `doc_type`, `tags` at the result level and `page`/`section_header` inside `chunk_metadata`. This means human-mode display shows empty values for title, type, tags, page, and section.
 ## What Changes
 - Fix the Go client search result struct to match the flat engine API response format
 - Extract `page` and `section_header` from the `chunk_metadata` map instead of expecting them as top-level fields
 - Human-mode output will correctly display document title, type, tags, page number, and section header
 ## Capabilities
 ### New Capabilities
 (none)
 ### Modified Capabilities
 - `go-client`: Fix search result parsing to match actual engine API response shape
 ## Impact
 - `client/cmd/search.go` — struct definition and display logic
 - No API changes, no breaking changes — this is a bug fix aligning the client with the existing API contract
@@ -0,0 +1,40 @@
 ## MODIFIED Requirements
 ### Requirement: Search command
 The client SHALL provide a `kb search <query>` command that sends the query to the engine and displays results.
 #### Scenario: Human-readable search output
 - **WHEN** the user runs `kb search "how to change oil"`
 - **THEN** the client SHALL POST to `/api/v1/search` and display results in a human-readable format showing rank, score, document title, page/section, doc type, tags, and a text snippet
 - **THEN** the client SHALL parse search results as flat objects with top-level `title`, `doc_type`, `tags`, `score`, `text`, `chunk_index` fields
 - **THEN** the client SHALL extract `page` from `chunk_metadata` when present (PDF documents)
 - **THEN** the client SHALL extract `section_header` from `chunk_metadata` when present (markdown documents)
 #### Scenario: JSON search output
 - **WHEN** the user runs `kb search "query" --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine
 #### Scenario: Search with filters
 - **WHEN** the user runs `kb search "brakes" --tags maintenance --type pdf --top 3`
 - **THEN** the client SHALL include the filters in the API request body
 #### Scenario: Search mode flags
 - **WHEN** the user runs `kb search "error" --fts-only`
 - **THEN** the client SHALL set `fts_only: true` in the request body
 #### Scenario: PDF result with page number
 - **WHEN** a search result has `chunk_metadata` containing `{"page": 12}`
 - **THEN** the human output SHALL display "Page 12" in the location line
 #### Scenario: Markdown result with section header
 - **WHEN** a search result has `chunk_metadata` containing `{"section_header": "Installation > Prerequisites"}`
 - **THEN** the human output SHALL display "Installation > Prerequisites" in the location line
 #### Scenario: Result with both page and section
 - **WHEN** a search result has `chunk_metadata` containing both `page` and `section_header`
 - **THEN** the human output SHALL display both separated by " / "
 #### Scenario: Result with no location metadata
 - **WHEN** a search result has empty `chunk_metadata` or no page/section keys
 - **THEN** the human output SHALL omit the location line entirely
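The four location scenarios above reduce to one small formatting rule. The sketch below expresses it in Python for illustration only (the real client is Go). `location_line` is a hypothetical name, and placing the page before the section header is an assumption, since the scenarios only fix the " / " separator.

```python
from typing import Any, Optional


def location_line(chunk_metadata: Optional[dict[str, Any]]) -> Optional[str]:
    """Build the human-mode location line, or None when it should be omitted."""
    meta = chunk_metadata or {}
    parts = []

    # PDFs carry a page number.
    page = meta.get("page")
    if page is not None:
        parts.append(f"Page {page}")

    # Markdown documents carry a section header.
    section = meta.get("section_header")
    if section:
        parts.append(str(section))

    # Both present: joined with " / ". Neither present: no location line.
    return " / ".join(parts) if parts else None
```

Returning `None` rather than an empty string lets the caller skip the line entirely, which matches the "omit the location line" scenario.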
@@ -0,0 +1,14 @@
 ## 1. Fix search result struct
 - [x] 1.1 Replace nested `Document` struct with flat fields (`Title`, `DocType`, `Tags`) matching engine JSON keys
 - [x] 1.2 Add `ChunkMetadata map[string]interface{}` field to capture `chunk_metadata`
 ## 2. Fix display logic
 - [x] 2.1 Update title/type/tags references in the display loop to use the new flat fields
 - [x] 2.2 Extract `page` from `ChunkMetadata` map (replacing top-level `Page` field)
 - [x] 2.3 Extract `section_header` from `ChunkMetadata` map (replacing top-level `Section` field)
 ## 3. Verify
 - [x] 3.1 Build the client and verify it compiles cleanly
@@ -0,0 +1,2 @@
 schema: spec-driven
 created: 2026-04-04
@@ -0,0 +1,194 @@
 ## Context
 The engine API (`engine/kb/routes/`) provides single-document operations for delete (`DELETE /api/v1/documents/{id}`) and tag management (`PUT /api/v1/documents/{id}/tags`). The MCP server (`mcp/server.py`) wraps these and adds a "collection" abstraction via `collection:`-prefixed tags — ~70 lines of helpers and translation logic that only the MCP layer understands.
 The database is SQLite with WAL mode, FTS5 for full-text search, and sqlite-vec for embeddings. Foreign keys with `ON DELETE CASCADE` handle chunk cleanup when documents are deleted. Stored files on disk must be cleaned up separately.
 ## Goals / Non-Goals
 **Goals:**
 - Bulk delete, bulk tag add/remove, and bulk set-tags (replace) via engine API, MCP tools, and CLI
 - Filter-based selection: by tag, doc_type, ID list, and ID range
 - Safety threshold to prevent accidental mass operations
 - Audit trail via jobs table
 - Remove collection abstraction from MCP server
 **Non-Goals:**
 - Async/queued bulk operations (SQLite handles thousands of rows synchronously in <1s)
 - Bulk document retrieval or bulk note creation
 - Undo/recycle bin for bulk deletes
 - Adding collection concept to engine or CLI (collections are being removed, not moved)
 ## Decisions
 ### 1. Common selection filter for all bulk endpoints
 All three bulk endpoints accept the same selection body:
 ```json
 {
  "document_ids": [1, 5, 12],
  "tags": ["agent:mybot", "draft"],
  "doc_type": "note",
  "from_id": 10,
  "to_id": 50
 }
 ```
 Filters combine with AND logic. At least one filter is required — the engine rejects requests with no selection criteria (400).
 **Selection SQL generation**: A shared helper in `database.py` builds the WHERE clause from the filter. The `tags` filter uses the same JOIN pattern as `list_documents` (all specified tags must match). The `document_ids` filter uses `IN (?)`. The `from_id`/`to_id` filter uses `id >= ? AND id <= ?`.
 **Alternative considered**: Separate endpoints per filter type. Rejected — combinable filters are more powerful and the SQL generation is straightforward.
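A minimal sketch of how the shared selection helper might build its WHERE clause. The function name, the `d` table alias, and the `tags`/`document_tags` column names are assumptions. The all-tags-must-match rule is written here as a `GROUP BY ... HAVING` subquery, which may differ from the JOIN pattern `list_documents` actually uses.

```python
from typing import Any


def build_selection_where(filters: dict[str, Any]) -> tuple[str, list[Any]]:
    """Return (where_sql, params) for a query over `documents d`.

    Raises ValueError when no selection criteria are given
    (the engine would answer 400 in that case).
    """
    clauses: list[str] = []
    params: list[Any] = []

    ids = filters.get("document_ids")
    if ids:
        # One placeholder per ID.
        placeholders = ", ".join("?" for _ in ids)
        clauses.append(f"d.id IN ({placeholders})")
        params.extend(ids)

    tags = filters.get("tags")
    if tags:
        # Every listed tag must be attached to the document.
        # Assumed schema: tags(id, name), document_tags(document_id, tag_id).
        placeholders = ", ".join("?" for _ in tags)
        clauses.append(
            "d.id IN (SELECT dt.document_id FROM document_tags dt "
            "JOIN tags t ON t.id = dt.tag_id "
            f"WHERE t.name IN ({placeholders}) "
            "GROUP BY dt.document_id "
            "HAVING COUNT(DISTINCT t.name) = ?)"
        )
        params.extend(tags)
        params.append(len(set(tags)))

    if filters.get("doc_type"):
        clauses.append("d.doc_type = ?")
        params.append(filters["doc_type"])

    if filters.get("from_id") is not None:
        clauses.append("d.id >= ?")
        params.append(filters["from_id"])

    if filters.get("to_id") is not None:
        clauses.append("d.id <= ?")
        params.append(filters["to_id"])

    if not clauses:
        raise ValueError("at least one selection filter is required")

    # Filters combine with AND logic.
    return " AND ".join(clauses), params
```

The result would be used as `SELECT d.id FROM documents d WHERE {where_sql}` with `params` bound positionally, so no filter value is ever interpolated into the SQL text.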
 ### 2. Safety threshold with configurable percentage
 Before executing, the engine counts matched documents and total documents. If the matched share (`matched / total`, expressed as a percentage) exceeds the threshold, the request is rejected:
 ```
 HTTP 409 Conflict
 {
  "error": "safety_threshold_exceeded",
  "message": "Operation would affect 750 of 1000 documents (75.0%). Exceeds safety threshold of 70%. Use force: true to proceed.",
  "matched": 750,
  "total": 1000,
  "percent": 75.0,
  "threshold": 70
 }
 ```
 - Default threshold: 70% (env var `KB_BULK_SAFETY_PERCENT`, integer 0-100)
 - Override per-request: `"force": true` in the request body
 - Threshold of 0 effectively disables the safety check
 - CLI maps this to `--force` / `-f` flag
 The check is a SELECT COUNT before the operation — minimal overhead.
 **Alternative considered**: Dry-run mode (preview what would be affected, then confirm). Rejected — adds a two-step flow that doesn't help LLM callers (they'd just always confirm) and the safety threshold covers the dangerous case.
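The check could look roughly like the sketch below. The function name and return convention are hypothetical (the route would turn a non-`None` result into the 409 response), and a threshold of 0 is special-cased as "disabled" to follow the note above.

```python
from typing import Optional


def check_safety_threshold(
    matched: int, total: int, threshold_percent: int, force: bool = False
) -> Optional[dict]:
    """Return an error payload if the operation should be rejected, else None."""
    # force overrides the check; a threshold of 0 is treated as "disabled".
    if force or threshold_percent == 0 or total == 0:
        return None

    # Integer comparison avoids floating-point edge cases at the boundary.
    if matched * 100 <= threshold_percent * total:
        return None

    percent = round(matched * 100 / total, 1)
    return {
        "error": "safety_threshold_exceeded",
        "message": (
            f"Operation would affect {matched} of {total} documents "
            f"({percent}%). Exceeds safety threshold of {threshold_percent}%. "
            "Use force: true to proceed."
        ),
        "matched": matched,
        "total": total,
        "percent": percent,
        "threshold": threshold_percent,
    }
```

An operation that lands exactly on the threshold (700 of 1000 at 70%) passes, because the rule is "greater than", not "greater than or equal".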
 ### 3. Synchronous execution with audit logging
 Bulk operations execute synchronously and return a summary response:
 ```json
 {
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
 }
 ```
 A job record is created in the `jobs` table with a new job type (`bulk_delete`, `bulk_tags`, or `bulk_set_tags`). This requires extending the jobs table:
 - Add `job_type` column: `"ingest"` (default, for existing jobs) or `"bulk_delete"` / `"bulk_tags"` / `"bulk_set_tags"`
 - The job's `filename` field stores a JSON summary of the selection filter for auditability
 - `document_id` field stores the count of affected documents
 - `error` field stores JSON array of individual errors if any
 **Alternative considered**: Full async with job polling. Rejected — SQLite bulk operations are fast enough synchronously and async would require extra polling calls (defeating the purpose of reducing token usage).
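A sketch of the schema change and the audit insert, under assumed column names for the existing `jobs` table (`status`, `filename`, `document_id`, `error`). Both function names are hypothetical, and storing the succeeded count in `document_id` is one reading of "count of affected documents".

```python
import json
import sqlite3


def add_job_type_column(conn: sqlite3.Connection) -> None:
    """Add job_type; existing rows pick up the 'ingest' default."""
    conn.execute(
        "ALTER TABLE jobs ADD COLUMN job_type TEXT NOT NULL DEFAULT 'ingest'"
    )


def log_bulk_job(conn, job_type, filters, affected, errors):
    """Insert one audit row for a bulk operation and return its job id."""
    cur = conn.execute(
        "INSERT INTO jobs (job_type, status, filename, document_id, error) "
        "VALUES (?, 'done', ?, ?, ?)",
        (
            job_type,
            json.dumps(filters, sort_keys=True),  # selection filter summary
            affected,  # count of affected documents, not a document id
            json.dumps(errors) if errors else None,
        ),
    )
    return cur.lastrowid
```

Reusing `filename` and `document_id` keeps the migration to a single new column, at the cost of those fields meaning different things depending on `job_type`.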
 ### 4. Bulk delete implementation
 For each matched document:
 1. Collect chunk IDs
 2. Delete embeddings from `chunks_vec`
 3. Delete the document row (cascades to chunks, document_tags)
 4. Delete stored file from disk
 This follows the same logic as the existing `delete_document` endpoint, but all database changes for the matched documents run in a single SQLite transaction so they are atomic. File deletions happen after commit and are best-effort: if a file deletion fails, the document is still counted as succeeded (the DB record is gone) and a warning is logged.
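The four steps can be sketched with the standard `sqlite3` module. This is an illustration under assumptions, not the engine's code: `bulk_delete` and the `stored_path` column are hypothetical, and `chunks_vec` is treated as a table keyed by `chunk_id`, which may not match how the sqlite-vec table is actually addressed.

```python
import logging
import os
import sqlite3

log = logging.getLogger(__name__)


def bulk_delete(conn: sqlite3.Connection, doc_ids: list[int]) -> int:
    """Delete documents in one transaction, then remove their stored files."""
    conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
    paths: list[str] = []
    deleted = 0

    with conn:  # commits on success, rolls back on any exception
        for doc_id in doc_ids:
            row = conn.execute(
                "SELECT stored_path FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            if row is None:
                continue  # unknown ID: nothing to delete
            # Steps 1-2: collect chunk IDs and delete their embeddings.
            chunk_ids = [r[0] for r in conn.execute(
                "SELECT id FROM chunks WHERE document_id = ?", (doc_id,))]
            conn.executemany(
                "DELETE FROM chunks_vec WHERE chunk_id = ?",
                [(cid,) for cid in chunk_ids],
            )
            # Step 3: delete the document row; dependent rows cascade.
            conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
            deleted += 1
            if row[0]:
                paths.append(row[0])

    # Step 4: best-effort file cleanup, only after the commit succeeded.
    for path in paths:
        try:
            os.remove(path)
        except OSError as exc:
            log.warning("could not delete stored file %s: %s", path, exc)

    return deleted
```

Collecting the paths inside the transaction and deleting the files afterwards means a rollback never leaves a document row pointing at a file that has already been removed.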
 ### 5. Bulk tags implementation
 Two distinct operations:
 **`POST /api/v1/bulk/tags`** — Add and/or remove tags:
 ```json
 {
  "add": ["reviewed", "approved"],
  "remove": ["draft"],
  ...selection filters...
 }
 ```
 **`POST /api/v1/bulk/set-tags`** — Replace all tags:
 ```json
 {
  "tags": ["final", "approved"],
  ...selection filters...
 }
 ```
 The `set-tags` operation removes all existing tags from matched documents, then applies the new set. This is useful for cleaning up tag clutter or migrating tagging schemes.
 Both update `updated_at` on affected documents.
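Both tag operations can share one helper, sketched below. The schema is assumed (`tags.name` unique, `document_tags` keyed on `(document_id, tag_id)`), and `apply_tags` with its `replace` switch is a hypothetical shape, not the engine's actual function.

```python
import sqlite3


def _tag_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for a tag name, creating the tag if needed."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT id FROM tags WHERE name = ?", (name,)
    ).fetchone()[0]


def apply_tags(conn, doc_ids, add=(), remove=(), replace=False):
    """Add/remove tags per document; replace=True clears existing tags first."""
    with conn:  # one transaction for the whole batch
        add_ids = [_tag_id(conn, name) for name in add]
        for doc_id in doc_ids:
            if replace:
                # set-tags: drop every existing tag before applying the new set.
                conn.execute(
                    "DELETE FROM document_tags WHERE document_id = ?", (doc_id,)
                )
            for name in remove:
                conn.execute(
                    "DELETE FROM document_tags WHERE document_id = ? "
                    "AND tag_id = (SELECT id FROM tags WHERE name = ?)",
                    (doc_id, name),
                )
            for tag_id in add_ids:
                conn.execute(
                    "INSERT OR IGNORE INTO document_tags (document_id, tag_id) "
                    "VALUES (?, ?)",
                    (doc_id, tag_id),
                )
            conn.execute(
                "UPDATE documents SET updated_at = datetime('now') WHERE id = ?",
                (doc_id,),
            )
```

Under this shape, `/bulk/tags` would call it with `add` and `remove`, while `/bulk/set-tags` would pass the new tag list as `add` with `replace=True`.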
 ### 6. Remove collection abstraction from MCP
 Remove from `mcp/server.py`:
 - Constants: `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`
 - Functions: `_collection_tag`, `_strip_collection_tags`, `_process_document`, `_process_search_results`, `_ensure_exclusive_collection`
 - Tool: `kb_set_collection` (entire tool removed)
 - Parameters: `collection` from `kb_search`, `kb_addnote`, `kb_upload_start`
 The `_process_document` and `_process_search_results` calls in remaining tools are removed — documents are returned as-is from the engine, with all tags visible.
 Users/agents that need namespace isolation use a tag convention (e.g. `agent:claude-code`) communicated via system prompt or tool instructions.
 ### 7. Engine bulk route module
 New file: `engine/kb/routes/bulk.py`
 Three endpoints sharing common infrastructure:
 - `_resolve_selection(conn, filters)` → list of document IDs + count
 - `_check_safety_threshold(matched, total, force)` → raises HTTPException if exceeded
 - `_log_bulk_job(conn, job_type, filters, matched, succeeded, failed, errors)` → job_id
 ### 8. MCP bulk tools
 Three new tools in `mcp/server.py`, thin wrappers calling new `engine.py` methods:
 - `kb_bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?)` → str (JSON)
 - `kb_bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?)` → str (JSON)
 - `kb_bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags?, force?)` → str (JSON)
 Note: The `tags` parameter on bulk tools serves as a **selection filter** (which documents to target), while `add`/`remove` (on bulk_tags) and `new_tags` (on bulk_set_tags) are the **operation** (what to do to the tags). Tool descriptions must make this distinction clear.
 ### 9. CLI bulk commands
 Three new commands under `client/cmd/`:
 ```
 kb bulk-remove --tags "draft,old" --type note --force --yes
 kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
 kb bulk-set-tags --ids "1,5,12" --tags "clean,final" --yes
 ```
 Filter flags (shared): `--tags`, `--type`, `--ids` (comma-separated), `--from-id`, `--to-id`, `--force`
 Confirmation: `--yes` / `-y` to skip interactive prompt.
 Without `--yes`, the CLI first shows the match count and asks for confirmation:
 ```
 This will delete 47 documents matching: tags=[draft,old] type=note
 Proceed? [y/N]
 ```
 ### 10. Engine config for safety threshold
 New env var: `KB_BULK_SAFETY_PERCENT` (integer, default 70). Added to `engine/kb/config.py`.
 ## Risks / Trade-offs
 - **[Bulk delete is irreversible]** → Safety threshold mitigates accidental mass deletion. CLI requires interactive confirmation. No undo mechanism — this is deliberate to keep the system simple.
 - **[Naming collision: `tags` as filter vs operation]** → The `tags` parameter in bulk_tags selects documents, while `add`/`remove` specifies the tag changes. Clear naming and tool descriptions mitigate confusion. Engine request model uses the same field name as the existing list/search filter.
 - **[SQLite lock during large bulk ops]** → A single transaction deleting 5000 documents will hold a write lock. With WAL mode, readers are not blocked. The lock duration should be under a few seconds for typical workloads.
 - **[Breaking change: collection removal]** → Any MCP client relying on `collection` parameters will break. Since collections were only recently added and are not widely deployed, this is acceptable. Existing `collection:*` tags in the database remain as regular tags — they still work as filters, just without special treatment.
 - **[Jobs table overload]** → Bulk operations add a new job type to a table designed for ingestion jobs. The schema change is minimal (one new column) and the audit trail value outweighs the mixing of concerns.
Author	SHA1	Message	Date
steve	574370e8d1	Remove AMD ROCm support — CPU and NVIDIA only BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image build/push from the release pipeline. Remove AMD quick-start and ROCm references from README and DEVELOPER docs. Update docker-deployment and developer-docs specs to reflect CPU + NVIDIA only. The ROCm variant added significant complexity (4.2GB torch wheel, >20GB container) with limited usage. Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:39:37 +01:00
steve	17b19999de	Switch nvidia and rocm Dockerfiles from onnxruntime to torch Nvidia: install torch+torchvision from PyTorch cu130 index, drop onnxruntime-gpu. ROCm: use local torch wheel with rocm6.4 index for torchvision, clean up nvidia remnants from the venv. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:13:41 +01:00
steve	bb78f4ea80	Fix 500 error on notes with slashes in title, bump engine to 3.2.1 Sanitize / and \ in note titles and filenames when writing to the staging directory — a title like "/reset skill" was interpreted as a path separator, causing a FileNotFoundError and a 500 from the jobs endpoint. Also add PRAGMA busy_timeout=5000 to SQLite connections to prevent immediate failure under concurrent write load. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:12:58 +01:00
steve	223ff2cf5d	Latest changes all archived	2026-04-04 22:50:19 +01:00
steve	e9a282ddb1	Document KB_BULK_SAFETY_PERCENT in README, DEVELOPER, MCP, and SKILL docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:43:42 +01:00
steve	b5a203d2aa	Add bulk operations and remove collections abstraction - Add bulk delete, bulk tags, and bulk set-tags engine endpoints (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags) - Filter-based selection: by tags, doc_type, ID list, ID range - Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents accidental mass operations unless force=true - Synchronous execution with audit trail via jobs table - Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools - Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands - Remove collection abstraction from MCP server (use tags instead) - Remove kb_set_collection MCP tool - Update SKILL.md, MCP.md, README.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:34:47 +01:00
steve	0c124c4ab7	Bump engine version to 3.0.1	2026-04-04 12:42:32 +01:00
steve	da5b8435bc	Add configurable allowed hosts for MCP remote access (KB_MCP_ALLOWED_HOSTS) The MCP SDK's DNS rebinding protection rejects remote clients with 421 when the Host header isn't in the allowlist. Add KB_MCP_ALLOWED_HOSTS env var (comma-separated IPs/FQDNs) to configure additional allowed hosts while keeping localhost always permitted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 12:39:43 +01:00
steve	e39e00a2c0	Add MCP auth status to kb_status and update server instructions - kb_status now returns authenticated: true/false so clients can verify auth - Server instructions mention Bearer token auth requirement - Add .env, .venv/, test_mcp_client.py to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 12:04:12 +01:00
steve	d078af9ad3	Split MCP docs into MCP.md with AI tool setup examples Move MCP server documentation from README into dedicated MCP.md. Add configuration examples for Claude Code, VS Code, Cursor, Windsurf, and JetBrains IDEs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 22:03:41 +01:00
steve	b3dce188e1	Fix version check failing on non-200 status responses When the engine returns 401 (auth required) or other non-200 responses, the version check was parsing the error body, getting an empty version string, and fatally exiting. Now skips the check on non-200 responses and lets the actual API call surface the real error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:52:24 +01:00
steve	0dc3065979	Update README for v3.0.0 — add MCP server docs, updatenote, fix version refs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:45:31 +01:00
steve	e7136a4a20	Add MCP server, note mutation endpoint, and updated_at tracking (v3.0.0) New MCP server (mcp/) exposes kb operations as native MCP tools over Streamable HTTP with Bearer token auth. Supports collections via tag conventions, chunked file uploads, and agent-side search patterns. Engine gains PATCH /api/v1/notes/{id} for in-place note updates with transactional re-chunk/re-embed, and updated_at column on documents. Go client adds updatenote command and Patch HTTP method. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:34:55 +01:00
steve	adeba21712	Bump client version to 2.2.1	2026-04-02 16:18:06 +01:00
steve	2d179af557	Fix search human-mode output to match engine API response The Go client struct expected a nested document object and top-level page/section fields, but the engine returns flat results with metadata in chunk_metadata. This caused empty display for title, type, tags, page, and section in human output mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 16:17:35 +01:00
steve	a6bab5e55e	Add CPU-only Docker image and fix release tag naming - Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments - Use sentence-transformers[onnx] + CPU-only torch for ~4x smaller image - Fix release script: separate git tags (engine-v) from Docker tags (v) - Add CPU image to release build/push pipeline - Update README with CPU deployment instructions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 16:02:00 +01:00
steve	c5191df9c0	Bump client version to 2.2.0	2026-03-31 20:50:17 +01:00
steve	afbe270181	Replace implicit note shorthand with explicit addnote command and split README Two changes: 1. structured-add-commands: The implicit note shorthand (kb "text") caused accidental note creation from mistyped commands. Replaced with explicit kb addnote <text> command. Root command reverts to standard Cobra behaviour. Updated examples, tests, SKILL.md, and specs. 2. split-readme-developer-docs: Moved build-from-source instructions, release process, API reference, and ROCm migration notes from README.md into a new DEVELOPER.md. README now links to DEVELOPER.md for dev workflows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 20:48:22 +01:00
steve	9e957f1a9a	Added pycache to gitignore	2026-03-30 07:26:16 +01:00
steve	bbe6a5e909	Add dev-up script and archive kb-title-in-chunks change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 07:25:22 +01:00
steve	743102aee4	Bump client version to 2.1.1	2026-03-29 21:09:55 +01:00
steve	0f3b3be59f	Bump engine version to 2.1.0	2026-03-29 21:06:04 +01:00
steve	2fa2ac1134	Reject single bare word as implicit note shorthand Single unrecognized words now print an error with usage hint instead of being submitted as a note. Prevents typos from creating junk notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 21:03:52 +01:00
steve	b2176c36ea	Chunk enrichment: prepend document title to embeddings Adds enriched_text column to chunks table that prepends document title (and section header when present) to chunk text. Embeddings and FTS now use enriched text for better search relevance. Includes schema migration with backfill for existing data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 21:03:48 +01:00
steve	5f9946efc9	Added manual download to README	2026-03-29 14:03:35 +01:00
steve	ea3d5707e1	Bump client version to 2.1.0	2026-03-29 13:58:42 +01:00
steve	7f4decee26	Reindex command, implicit note shorthand, add→addfile rename - Add `kb reindex` command with confirmation prompt and --yes flag - Add implicit note shorthand: `kb "my note"` submits a note directly - Rename `add` to `addfile`, remove --note/--title/--type flags - Add client-side file extension validation before upload - Add `kb examples` command for common usage patterns - Update README, SKILL.md, and main specs - Archive completed changes and sync delta specs BREAKING: `kb add` renamed to `kb addfile`, `kb add --note` replaced by `kb "text"` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 13:58:04 +01:00
steve	528a09ca90	Independent client/engine versioning with compatibility check Split release.sh into release-client.sh and release-engine.sh for independent release cadences. Client checks engine version on first API call and hard-fails if engine is below MinEngineVersion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 15:59:16 +00:00
steve	b04823e67b	Store original documents for download after ingestion Persist uploaded files to {data_dir}/documents/{content_hash}{ext} after successful ingestion. Add GET /documents/{id}/file endpoint for retrieval, delete stored files on document deletion, and add `kb export` client command. Includes schema migration, tests, and spec updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 15:16:27 +00:00
@@ -1 +1 @@
-from kb.routes import health, search, jobs, documents, tags, status, reindex, auth
+from kb.routes import health, search, jobs, documents, tags, status, reindex, auth, notes