Bump engine version to 3.2.2

Remove AMD ROCm support — CPU and NVIDIA only
BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image build/push from the release pipeline. Remove AMD quick-start and ROCm references from README and DEVELOPER docs. Update docker-deployment and developer-docs specs to reflect CPU + NVIDIA only.

The ROCm variant added significant complexity (4.2GB torch wheel, >20GB container) with limited usage. Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 21:48:55 +01:00 · 2026-04-06 16:39:37 +01:00 · 2026-04-06 16:13:41 +01:00 · 2026-04-06 16:12:58 +01:00 · 2026-04-04 22:50:19 +01:00 · 2026-04-04 22:43:42 +01:00
100 changed files with 5671 additions and 472 deletions
@@ -1,2 +1,9 @@
 examples/
 .claude/
+__pycache__/
+engine/data/
+
+TMP/
+.env
+.venv/
+test_mcp_client.py
@@ -0,0 +1,96 @@
+# Developer Guide
+
+Instructions for building from source, releasing, and contributing to kb.
+
+## Building from source
+
+### Engine
+
+```bash
+cd engine
+
+# NVIDIA GPU
+KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
+```
+
+### Client
+
+```bash
+cd client
+make build    # produces ./kb binary
+make all      # or cross-compile: dist/kb-{os}-{arch}
+```
+
+## Building and releasing
+
+Client and engine are versioned independently via `client/VERSION` and `engine/VERSION`. Each has its own release script and git tag prefix.
+
+### Release client
+
+```bash
+./release-client.sh --gitea              # patch bump, release via Gitea
+./release-client.sh --github --minor     # minor bump, release via GitHub
+./release-client.sh --gitea --no-increment  # release current version as-is
+./release-client.sh --gitea --dry-run    # preview without doing anything
+```
+
+Creates tag `client-vX.Y.Z`, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.
+
+The client embeds a `MinEngineVersion` (from `client/MIN_ENGINE_VERSION`) and will hard-fail if the connected engine is too old.
+
+### Release engine
+
+```bash
+./release-engine.sh --gitea              # patch bump, release via Gitea
+./release-engine.sh --github --minor     # minor bump, release via GitHub
+./release-engine.sh --gitea --no-increment  # release current version as-is
+./release-engine.sh --gitea --dry-run    # preview without doing anything
+```
+
+Creates tag `engine-vX.Y.Z`, builds NVIDIA and CPU Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
+
+### Checking versions
+
+```bash
+# Client
+kb --version
+
+# Engine
+curl http://localhost:8000/api/v1/status | jq .version
+```
+
+### Docker images
+
+Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:
+
+- `engine-v2.0.6-nvidia` / `engine-v2.0.6-cpu` — versioned
+- `latest-nvidia` / `latest-cpu` — latest release
+
+Override the registry and org via environment variables:
+
+```bash
+REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github
+```
+
+## API reference
+
+All endpoints are under `/api/v1/`. Requires `Authorization: Bearer <key>` header when `KB_API_KEY` is set.
+
+| Method | Endpoint | Description |
+|---|---|---|
+| `GET` | `/health` | Health check (bypasses auth) |
+| `POST` | `/search` | Hybrid search (JSON body) |
+| `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
+| `GET` | `/jobs` | List ingestion jobs |
+| `GET` | `/jobs/{id}` | Job details |
+| `GET` | `/documents` | List documents |
+| `GET` | `/documents/{id}` | Document details with chunks |
+| `GET` | `/documents/{id}/file` | Download original file |
+| `DELETE` | `/documents/{id}` | Remove a document (and stored file) |
+| `PUT` | `/documents/{id}/tags` | Add/remove tags |
+| `GET` | `/tags` | List all tags |
+| `GET` | `/status` | Engine status, GPU info, DB stats |
+| `POST` | `/reindex` | Re-embed all chunks |
+| `POST` | `/bulk/delete` | Bulk delete documents by filter |
+| `POST` | `/bulk/tags` | Bulk add/remove tags by filter |
+| `POST` | `/bulk/set-tags` | Bulk replace tags by filter |
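To make the auth rule in the API reference concrete, here is a minimal Go sketch that builds a `POST /api/v1/search` request and attaches the `Authorization: Bearer <key>` header only when a key is configured. The request body shape is not shown in this diff, so the `query` field name and the `newSearchRequest` helper are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// searchRequest is a hypothetical body for POST /api/v1/search.
// The "query" field name is an assumption; check the engine's schema.
type searchRequest struct {
	Query string `json:"query"`
}

// newSearchRequest builds the HTTP request without sending it.
// The Bearer header is added only when apiKey is non-empty, mirroring
// "required when KB_API_KEY is set".
func newSearchRequest(baseURL, apiKey, query string) (*http.Request, error) {
	payload, err := json.Marshal(searchRequest{Query: query})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/search", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newSearchRequest("http://localhost:8000", "your-secret-key", "how to install git")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do(req)` would send it. Building and sending are kept separate here so the header logic can be checked without a running engine.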
@@ -0,0 +1,174 @@
+# MCP Server (Agent Integration)
+
+The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.
+
+## Start the MCP server
+
+The compose files include a `kb-mcp` service alongside the engine. Set `KB_MCP_API_KEY` to require Bearer token auth from connecting agents:
+
+```bash
+KB_API_KEY=your-engine-key KB_MCP_API_KEY=your-agent-key \
+  docker compose -f engine/compose.nvidia.yaml up -d
+```
+
+Or run the MCP server standalone:
+
+```bash
+docker run -d --name kb-mcp \
+  -p 3000:3000 \
+  -e KB_ENGINE_URL=http://your-engine-host:8000 \
+  -e KB_API_KEY=your-engine-key \
+  -e KB_MCP_API_KEY=your-agent-key \
+  --restart unless-stopped \
+  docker.dcglab.co.uk/dcg/kb/mcp:latest
+```
+
+## MCP tools
+
+| Tool | Description |
+|---|---|
+| `kb_search` | Hybrid search with optional tag/type filters |
+| `kb_addnote` | Add a text note (queued for async ingestion) |
+| `kb_update_note` | Update an existing note in place |
+| `kb_get` | Get document details by ID or source path |
+| `kb_delete` | Permanently delete a document by ID |
+| `kb_status` | Engine health and statistics |
+| `kb_jobs` | Ingestion queue status |
+| `kb_upload_start` | Start a chunked file upload |
+| `kb_upload_chunk` | Upload a base64-encoded file chunk |
+| `kb_upload_finish` | Finish upload and submit for ingestion |
+| `kb_bulk_delete` | Delete multiple documents matching a filter |
+| `kb_bulk_tags` | Add/remove tags on multiple documents |
+| `kb_bulk_set_tags` | Replace all tags on multiple documents |
+
+## Organising with tags
+
+Use tags to separate agent data from user documents. For example, an agent can tag all its notes with `agent:mybot` and filter by that tag when searching. This is a naming convention — configure it in your agent's system prompt. No special server-side enforcement is needed.
+
+Bulk tools accept filter-based selection (by tags, doc_type, ID list, or ID range) so agents can manage thousands of documents in a single call instead of looping. A safety threshold (default 70%, configurable via engine env var `KB_BULK_SAFETY_PERCENT`) prevents accidental mass operations unless `force: true` is set.
+
+## MCP server configuration
+
+| Variable | Default | Description |
+|---|---|---|
+| `KB_ENGINE_URL` | `http://localhost:8000` | Engine API URL |
+| `KB_API_KEY` | (none) | Engine API key |
+| `KB_MCP_API_KEY` | (none) | Bearer token required from agents (disabled if unset) |
+| `KB_MCP_PORT` | `3000` | Port to listen on |
+
+## Connecting AI coding tools
+
+The kb MCP server uses **Streamable HTTP** transport at `http://your-host:3000/mcp`. Below are configuration examples for popular AI coding tools.
+
+### Claude Code (CLI / Desktop / Web)
+
+Add the server to your project or user settings:
+
+```bash
+claude mcp add kb-server --transport http http://localhost:3000/mcp
+```
+
+Or add it manually to `.claude/settings.json` (project) or `~/.claude/settings.json` (global):
+
+```json
+{
+  "mcpServers": {
+    "kb-server": {
+      "type": "http",
+      "url": "http://localhost:3000/mcp",
+      "headers": {
+        "Authorization": "Bearer your-agent-key"
+      }
+    }
+  }
+}
+```
+
+### VS Code (GitHub Copilot)
+
+Add to your `.vscode/settings.json` (workspace) or user settings:
+
+```json
+{
+  "mcp": {
+    "servers": {
+      "kb-server": {
+        "type": "http",
+        "url": "http://localhost:3000/mcp",
+        "headers": {
+          "Authorization": "Bearer your-agent-key"
+        }
+      }
+    }
+  }
+}
+```
+
+Or add to `.vscode/mcp.json` in your workspace:
+
+```json
+{
+  "servers": {
+    "kb-server": {
+      "type": "http",
+      "url": "http://localhost:3000/mcp",
+      "headers": {
+        "Authorization": "Bearer your-agent-key"
+      }
+    }
+  }
+}
+```
+
+### Cursor
+
+Add to `.cursor/mcp.json` in your project root:
+
+```json
+{
+  "mcpServers": {
+    "kb-server": {
+      "type": "streamable-http",
+      "url": "http://localhost:3000/mcp",
+      "headers": {
+        "Authorization": "Bearer your-agent-key"
+      }
+    }
+  }
+}
+```
+
+### Windsurf
+
+Add to `~/.codeium/windsurf/mcp_config.json`:
+
+```json
+{
+  "mcpServers": {
+    "kb-server": {
+      "serverUrl": "http://localhost:3000/mcp",
+      "headers": {
+        "Authorization": "Bearer your-agent-key"
+      }
+    }
+  }
+}
+```
+
+### JetBrains IDEs (IntelliJ, WebStorm, PyCharm, etc.)
+
+Add to `.junie/mcp.json` in your project root, or configure via **Settings > Tools > AI Assistant > MCP Servers**:
+
+```json
+{
+  "servers": {
+    "kb-server": {
+      "type": "http",
+      "url": "http://localhost:3000/mcp",
+      "headers": {
+        "Authorization": "Bearer your-agent-key"
+      }
+    }
+  }
+}
+```
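For agents that are not one of the tools listed above, the same connection details can be used from plain code. The sketch below builds the first JSON-RPC `initialize` request that an MCP client posts to a Streamable HTTP endpoint. The protocol revision string, the client name, and the `newInitializeRequest` helper are assumptions for illustration; use whichever revision the kb MCP server actually negotiates.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest is a generic JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params,omitempty"`
}

// newInitializeRequest builds (but does not send) an MCP initialize call.
// agentKey corresponds to KB_MCP_API_KEY; leave it empty if auth is disabled.
func newInitializeRequest(endpoint, agentKey string) (*http.Request, error) {
	body, err := json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "initialize",
		Params: map[string]interface{}{
			"protocolVersion": "2025-03-26", // assumed revision
			"capabilities":    map[string]interface{}{},
			"clientInfo":      map[string]string{"name": "example-agent", "version": "0.0.1"},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Streamable HTTP servers may answer with plain JSON or an SSE stream.
	req.Header.Set("Accept", "application/json, text/event-stream")
	if agentKey != "" {
		req.Header.Set("Authorization", "Bearer "+agentKey)
	}
	return req, nil
}

func main() {
	req, err := newInitializeRequest("http://localhost:3000/mcp", "your-agent-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Accept"))
}
```

In practice an MCP client library would handle this handshake. The sketch only shows which URL and headers from the configuration tables end up on the wire.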
@@ -2,16 +2,19 @@

 Personal knowledge base with hybrid search (full-text + semantic vector search).

-v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
+Client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration), a lightweight **Go CLI client**, and an **MCP server** for native agent integration.

 ## Architecture

 ```
 Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
+                            ▲
+MCP Agents  ──MCP/HTTP──▶ MCP Server (Docker) ──┘
 ```

- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU or CPU-only support.
 - **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
+- **MCP Server**: Exposes kb operations as native MCP tools over Streamable HTTP. Runs as a separate Docker container alongside the engine. Use tags to scope agent data from user documents.
 - **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

 ## Quick start
@@ -32,70 +35,29 @@ docker run -d --name kb-engine \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

-# AMD GPU (ROCm)
+# CPU only (no GPU required — smaller image)
 docker run -d --name kb-engine \
-  --device /dev/kfd --device /dev/dri \
-  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
-  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
-  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
+  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
 ```

-Or use a compose file — create `compose.yaml`:
-
-```yaml
-services:
-  kb-engine:
-    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
-    runtime: nvidia  # remove for ROCm
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-    # For ROCm, replace the above runtime/deploy block with:
-    # devices:
-    #   - "/dev/kfd"
-    #   - "/dev/dri"
-    # group_add:
-    #   - "video"
-    ports:
-      - "${KB_PORT:-8000}:8000"
-    volumes:
-      - ${KB_DATA_PATH:-./data}:/data
-    environment:
-      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
-      - KB_DEVICE=${KB_DEVICE:-auto}
-      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
-      - KB_API_KEY=${KB_API_KEY:-}
-      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
-      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
-    restart: unless-stopped
-```
+Or use a compose file from the repo:

 ```bash
-KB_DATA_PATH=~/kb-data docker compose up -d
-```
-
-**From source** (for development):
-
-```bash
-cd engine
-
 # NVIDIA GPU
-KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

-# AMD GPU (ROCm)
-KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
+# CPU only
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
 ```

-The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
+See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.
+
+The engine will download the embedding model on first start (~90MB) and load it into memory (GPU or CPU). Check readiness:

 ```bash
 curl http://localhost:8000/api/v1/health
@@ -110,7 +72,7 @@ Check [releases](https://gitea.dcglab.co.uk/steve/kb/releases) for the latest cl

 ```bash
 # Set the version tag
-TAG=client-v2.1.0
+TAG=client-v3.0.0

 # Linux (amd64)
 curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64
@@ -129,13 +91,7 @@ chmod +x kb
 sudo mv kb /usr/local/bin/
 ```

-**From source** (for development):
-
-```bash
-cd client
-make build    # produces ./kb binary
-make all      # or cross-compile: dist/kb-{os}-{arch}
-```
+See [DEVELOPER.md](DEVELOPER.md) to build the client from source.

 ### 3. Configure the client

@@ -152,9 +108,9 @@ Override via environment variables (`KB_ENGINE_URL`, `KB_API_KEY`) or CLI flags
 ### 4. Use it

 ```bash
-# Quick notes (shorthand — no subcommand needed)
-kb "Always restart nginx after config changes"
-kb "Server room is building 3, floor 2" --tags ops
+# Add notes
+kb addnote "Always restart nginx after config changes"
+kb addnote "Server room is building 3, floor 2" --tags ops

 # Add files (async — uploads and exits immediately)
 kb addfile ~/docs/manual.pdf --tags admin
@@ -167,6 +123,9 @@ kb jobs
 kb search "how to install git"
 kb search "deploy process" --tags ops --type pdf

+# Update a note in place
+kb updatenote 42 "revised note content"
+
 # Manage
 kb list
 kb info 1
@@ -175,6 +134,11 @@ kb tag 1 --add important
 kb export 1 -o manual.pdf    # download original file
 kb remove 3 --yes
 kb status
+
+# Bulk operations
+kb bulk-remove --tags "draft,old" --type note --yes
+kb bulk-tag --type note --add "archived" --yes
+kb bulk-set-tags --tags "old-scheme" --set "new-scheme" --yes
 ```

 ## How it works
@@ -195,6 +159,7 @@ The engine is configured via environment variables (set in the compose file or v
 | `KB_INGEST_DEVICE` | `auto` | Docling layout detection device: `auto`, `cpu`, or `cuda` |
 | `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
 | `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
+| `KB_BULK_SAFETY_PERCENT` | `70` | Bulk operations affecting more than this % of documents are rejected unless `force` is set (0 disables) |
 | `KB_PORT` | `8000` | Port to expose |
 | `KB_HOST` | `0.0.0.0` | Host to bind to |
 | `HF_HUB_OFFLINE` | (none) | Set to `1` to prevent model downloads (use cached only) |
@@ -212,83 +177,14 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
 ```

-Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
+Data is device-agnostic — you can ingest on NVIDIA and serve from CPU (or vice versa) with the same data directory.

-## API reference
+## MCP server (agent integration)

-All endpoints are under `/api/v1/`. Requires `Authorization: Bearer <key>` header when `KB_API_KEY` is set.
+The MCP server exposes kb operations as native MCP tools over Streamable HTTP, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI. The documentation includes setup guides for Claude Code, VS Code, Cursor, Windsurf, and JetBrains IDEs.

-| Method | Endpoint | Description |
-|---|---|---|
-| `GET` | `/health` | Health check (bypasses auth) |
-| `POST` | `/search` | Hybrid search (JSON body) |
-| `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
-| `GET` | `/jobs` | List ingestion jobs |
-| `GET` | `/jobs/{id}` | Job details |
-| `GET` | `/documents` | List documents |
-| `GET` | `/documents/{id}` | Document details with chunks |
-| `GET` | `/documents/{id}/file` | Download original file |
-| `DELETE` | `/documents/{id}` | Remove a document (and stored file) |
-| `PUT` | `/documents/{id}/tags` | Add/remove tags |
-| `GET` | `/tags` | List all tags |
-| `GET` | `/status` | Engine status, GPU info, DB stats |
-| `POST` | `/reindex` | Re-embed all chunks |
+See **[MCP.md](MCP.md)** for full details — server setup, available tools, tag-based organisation, configuration, and client examples.

-## Building and releasing
+## Agent skill

-Client and engine are versioned independently via `client/VERSION` and `engine/VERSION`. Each has its own release script and git tag prefix.
-
-### Release client
-
-```bash
-./release-client.sh --gitea              # patch bump, release via Gitea
-./release-client.sh --github --minor     # minor bump, release via GitHub
-./release-client.sh --gitea --no-increment  # release current version as-is
-./release-client.sh --gitea --dry-run    # preview without doing anything
-```
-
-Creates tag `client-vX.Y.Z`, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.
-
-The client embeds a `MinEngineVersion` (from `client/MIN_ENGINE_VERSION`) and will hard-fail if the connected engine is too old.
-
-### Release engine
-
-```bash
-./release-engine.sh --gitea              # patch bump, release via Gitea
-./release-engine.sh --github --minor     # minor bump, release via GitHub
-./release-engine.sh --gitea --no-increment  # release current version as-is
-./release-engine.sh --gitea --dry-run    # preview without doing anything
-```
-
-Creates tag `engine-vX.Y.Z`, builds NVIDIA and ROCm Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
-
-### Checking versions
-
-```bash
-# Client
-kb --version
-
-# Engine
-curl http://localhost:8000/api/v1/status | jq .version
-```
-
-### Docker images
-
-Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:
-
- `engine-v2.0.6-nvidia` / `engine-v2.0.6-rocm` — versioned
- `latest-nvidia` / `latest-rocm` — latest release
-
-Override the registry and org via environment variables:
-
-```bash
-REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github
-```
-
-## Future: ROCm runtime migration
-
-The `onnxruntime-rocm` execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the **MIGraphX execution provider** as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from `onnxruntime-rocm` to `onnxruntime` with the MIGraphX EP and install the `migraphx` runtime libraries instead.
-
-## Claude Code skill
-
-This tool is designed to be wrapped as a Claude Code skill. See `SKILL.md` for the skill definition.
+If you cannot use the MCP server, or simply prefer agent skills, see `SKILL.md` for the skill definition.
@@ -10,14 +10,14 @@ Search, manage, and add to the user's personal knowledge base containing PDFs, W
 - User asks "how do I..." style questions that their knowledge base likely covers
 - User wants to save a note, add a file, or manage their knowledge base

-## Quick notes
+## Adding notes

 ```bash
-kb "remember to update DNS records"                # add a note
-kb "server room is building 3, floor 2" --tags ops # add a tagged note
+kb addnote "remember to update DNS records"                # add a note
+kb addnote "server room is building 3, floor 2" --tags ops # add a tagged note
 ```

-Bare text without a subcommand is treated as a note and submitted for ingestion.
+The note text must be a single quoted argument.

 ## Search (primary use case)

@@ -71,6 +71,37 @@ kb tag <doc_id> --add important,ops      # add tags to a document
 kb tag <doc_id> --remove draft           # remove tags from a document
 ```

+## Bulk operations
+
+Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.
+
+**Filter flags (shared across all bulk commands):**
+- `--tags tag1,tag2` — match documents with ALL specified tags
+- `--type pdf|note|...` — match by document type
+- `--ids 1,5,12` — match specific document IDs
+- `--from-id N` — match documents with id >= N
+- `--to-id N` — match documents with id <= N
+- `--force` / `-f` — override safety threshold (blocks operations affecting >70% of all documents)
+- `--yes` / `-y` — skip confirmation prompt
+
+```bash
+# Bulk delete
+kb bulk-remove --tags "draft,old" --type note --yes             # delete matching docs
+kb bulk-remove --from-id 10 --to-id 50 --yes                   # delete by ID range
+kb bulk-remove --ids "3,7,12" --yes                             # delete specific IDs
+
+# Bulk tag add/remove
+kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
+kb bulk-tag --type note --add "archived" --yes                  # tag all notes
+
+# Bulk replace tags
+kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes
+```
+
+All bulk commands return a summary: matched count, succeeded count, failed count, and errors.
+A safety threshold prevents accidentally affecting more than 70% of documents unless `--force` is used.
+The threshold is configurable on the engine via `KB_BULK_SAFETY_PERCENT` (integer 0-100, default 70; 0 disables).
+
 ## Jobs (ingestion queue)

 ```bash
@@ -79,6 +110,12 @@ kb jobs --status failed --format json    # filter by status
 kb jobs <job_id> --format json           # job details
 ```

+## Examples
+
+```bash
+kb examples                              # show common usage examples
+```
+
 ## Engine status and maintenance

 ```bash
@@ -102,19 +139,15 @@ All commands support:
    {
      "chunk_id": 1423,
      "score": 0.031,
-      "score_breakdown": {"fts": 0.016, "vector": 0.015},
      "text": "To install the latest version of git from source...",
-      "source": {
-        "document_id": 42,
-        "title": "Git Admin Guide",
-        "path": "/home/user/docs/git-admin.pdf",
-        "type": "pdf",
-        "page": 12,
      "chunk_index": 3,
-        "total_chunks": 28,
+      "chunk_metadata": {"page": 12},
+      "title": "Git Admin Guide",
+      "doc_type": "pdf",
+      "source_path": "/home/user/docs/git-admin.pdf",
+      "created_at": "2026-03-15T10:30:00",
      "tags": ["git", "admin"]
    }
-    }
  ],
  "total_matches": 47,
  "returned": 10
@@ -156,11 +189,36 @@ Use filters when the question implies a specific domain:
 - From a specific topic → `--tags <topic>`
 - Check available tags first: `kb tags --format json`

+## Updating notes
+
+```bash
+kb updatenote 42 "revised note content"           # update note by ID
+```
+
+Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.
+
+## MCP server (agent integration)
+
+For agent-to-agent integration, kb provides an MCP server alongside the CLI. The MCP server
+exposes the same operations as native MCP tools over Streamable HTTP transport, which agents
+can connect to directly without subprocess overhead.
+
+**MCP tools:** `kb_search`, `kb_addnote`, `kb_update_note`, `kb_get`, `kb_delete`, `kb_status`,
+`kb_jobs`, `kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`, `kb_bulk_delete`,
+`kb_bulk_tags`, `kb_bulk_set_tags`.
+
+Use tags to separate agent data from user documents (e.g. tag all agent notes with
+`agent:mybot` and filter by that tag when searching). This convention is communicated
+via system prompt — no special server-side enforcement needed.
+
+If the kb engine is already running via Docker Compose, add the MCP server by deploying the
+`kb-mcp` service from the same compose file. Agents connect to it on port 3000 (default).
+
 ## Important notes

 - Always use `--format json` for machine parsing
 - The `score` field is relative, not absolute — compare scores within a result set
- `source.page` is only present for PDF documents
- `source.section_header` is only present for markdown documents with headers
+- `chunk_metadata.page` is only present for PDF documents
+- `chunk_metadata.section_header` is only present for markdown documents with headers
 - Results are already ranked by relevance (hybrid FTS + vector search)
 - Duplicate files are detected at upload time (HTTP 409) — the client handles this gracefully
@@ -1 +1 @@
-2.0.0
+3.2.0
@@ -1 +1 @@
-2.1.1
+3.2.0
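With both VERSION files now at 3.2.0, it may help to recall how the client uses them: DEVELOPER.md states that the client embeds a `MinEngineVersion` and hard-fails if the connected engine is too old. The client's actual comparison code is not part of this diff, so the following is only a sketch of such a check, assuming plain `MAJOR.MINOR.PATCH` strings. The `parseVersion` and `engineTooOld` names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "MAJOR.MINOR.PATCH" into three non-negative integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimSpace(v), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// engineTooOld reports whether engine is strictly older than minimum.
// Components are compared numerically, so 3.10.0 is newer than 3.9.9.
func engineTooOld(engine, minimum string) (bool, error) {
	e, err := parseVersion(engine)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range e {
		if e[i] != m[i] {
			return e[i] < m[i], nil
		}
	}
	return false, nil // equal versions are acceptable
}

func main() {
	old, err := engineTooOld("2.1.1", "3.2.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("engine too old:", old)
}
```

Comparing numerically rather than as strings matters once any component reaches two digits.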
@@ -206,53 +206,3 @@ func uploadFile(client *api.Client, path, tags string) (*uploadResult, error) {
 	return &uploadResult{Raw: result}, nil
 }

-func submitNote(client *api.Client, note, tags string) error {
-	fields := map[string]string{
-		"note": note,
-	}
-	if tags != "" {
-		fields["tags"] = tags
-	}
-
-	resp, err := client.PostMultipart("/api/v1/jobs", fields, nil)
-	if err != nil {
-		fmt.Fprintln(os.Stderr, err)
-		os.Exit(1)
-	}
-
-	if resp.StatusCode == http.StatusConflict {
-		var result interface{}
-		if err := api.DecodeJSON(resp, &result); err != nil {
-			return fmt.Errorf("failed to decode response: %w", err)
-		}
-		if output.IsJSON() {
-			output.PrintJSON(result)
-		} else {
-			if m, ok := result.(map[string]interface{}); ok {
-				if docID, ok := m["document_id"].(float64); ok {
-					fmt.Printf("Already imported: %s (doc ID: %.0f)\n", m["title"], docID)
-				} else if jobID, ok := m["job_id"].(float64); ok {
-					fmt.Printf("Already queued: %s (job ID: %.0f)\n", m["title"], jobID)
-				}
-			}
-		}
-		return nil
-	}
-
-	if err := api.CheckError(resp); err != nil {
-		fmt.Fprintln(os.Stderr, err)
-		os.Exit(1)
-	}
-
-	var result interface{}
-	if err := api.DecodeJSON(resp, &result); err != nil {
-		return fmt.Errorf("failed to decode response: %w", err)
-	}
-
-	if output.IsJSON() {
-		output.PrintJSON(result)
-	} else {
-		fmt.Println("Queued: note")
-	}
-	return nil
-}
@@ -0,0 +1,88 @@
+package cmd
+
+import (
+	"fmt"
+	"net/http"
+	"os"
+
+	"github.com/kb-search/kb/internal/api"
+	"github.com/kb-search/kb/internal/output"
+	"github.com/spf13/cobra"
+)
+
+var addnoteCmd = &cobra.Command{
+	Use:   "addnote <text>",
+	Short: "Add a text note to the knowledge base",
+	Args: func(cmd *cobra.Command, args []string) error {
+		if len(args) == 0 {
+			return fmt.Errorf("requires a note text argument\n\n  Usage: kb addnote \"your note text here\"")
+		}
+		if len(args) > 1 {
+			return fmt.Errorf("accepts 1 arg but received %d — quote your note text, e.g. kb addnote \"your note text here\"", len(args))
+		}
+		return nil
+	},
+	RunE:  runAddnote,
+}
+
+func init() {
+	addnoteCmd.Flags().String("tags", "", "tags (comma-separated)")
+	rootCmd.AddCommand(addnoteCmd)
+}
+
+func runAddnote(cmd *cobra.Command, args []string) error {
+	tags, _ := cmd.Flags().GetString("tags")
+	client := api.NewClient()
+	return submitNote(client, args[0], tags)
+}
+
+func submitNote(client *api.Client, note, tags string) error {
+	fields := map[string]string{
+		"note": note,
+	}
+	if tags != "" {
+		fields["tags"] = tags
+	}
+
+	resp, err := client.PostMultipart("/api/v1/jobs", fields, nil)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	if resp.StatusCode == http.StatusConflict {
+		var result interface{}
+		if err := api.DecodeJSON(resp, &result); err != nil {
+			return fmt.Errorf("failed to decode response: %w", err)
+		}
+		if output.IsJSON() {
+			output.PrintJSON(result)
+		} else {
+			if m, ok := result.(map[string]interface{}); ok {
+				if docID, ok := m["document_id"].(float64); ok {
+					fmt.Printf("Already imported: %s (doc ID: %.0f)\n", m["title"], docID)
+				} else if jobID, ok := m["job_id"].(float64); ok {
+					fmt.Printf("Already queued: %s (job ID: %.0f)\n", m["title"], jobID)
+				}
+			}
+		}
+		return nil
+	}
+
+	if err := api.CheckError(resp); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	var result interface{}
+	if err := api.DecodeJSON(resp, &result); err != nil {
+		return fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	if output.IsJSON() {
+		output.PrintJSON(result)
+	} else {
+		fmt.Println("Queued: note")
+	}
+	return nil
+}
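The 409 branch in `submitNote` depends on `encoding/json` decoding every JSON number into `float64` when the target is `interface{}`, which is why the IDs are asserted as `float64` and printed with `%.0f`. A stdlib-only sketch of that formatting as a pure function follows. The `describeConflict` name and the fallback message are hypothetical and not part of this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeConflict turns a duplicate (HTTP 409) response body into a
// human-readable line. It mirrors the branch in submitNote but returns
// the text instead of printing it, which makes it easy to unit test.
func describeConflict(body []byte) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(body, &m); err != nil {
		return "", fmt.Errorf("failed to decode response: %w", err)
	}
	// A checked assertion avoids printing "%!s(<nil>)" when title is absent.
	title, _ := m["title"].(string)
	if docID, ok := m["document_id"].(float64); ok {
		return fmt.Sprintf("Already imported: %s (doc ID: %.0f)", title, docID), nil
	}
	if jobID, ok := m["job_id"].(float64); ok {
		return fmt.Sprintf("Already queued: %s (job ID: %.0f)", title, jobID), nil
	}
	return "Already present", nil // hypothetical fallback
}

func main() {
	msg, err := describeConflict([]byte(`{"document_id": 42, "title": "note"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```

A helper along these lines could also be shared with `uploadFile` if it formats duplicates the same way.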
@@ -0,0 +1,186 @@
+package cmd
+
+import (
+	"bufio"
+	"fmt"
+	"os"
+	"strconv"
+	"strings"
+
+	"github.com/kb-search/kb/internal/api"
+	"github.com/kb-search/kb/internal/output"
+	"github.com/spf13/cobra"
+)
+
+var bulkRemoveCmd = &cobra.Command{
+	Use:   "bulk-remove",
+	Short: "Delete multiple documents matching a filter",
+	RunE:  runBulkRemove,
+}
+
+func init() {
+	addBulkFilterFlags(bulkRemoveCmd)
+	rootCmd.AddCommand(bulkRemoveCmd)
+}
+
+func runBulkRemove(cmd *cobra.Command, args []string) error {
+	body, err := buildBulkBody(cmd)
+	if err != nil {
+		return err
+	}
+
+	yes, _ := cmd.Flags().GetBool("yes")
+	if !yes {
+		desc := describeBulkFilter(cmd)
+		fmt.Printf("This will delete documents matching: %s\nProceed? [y/N] ", desc)
+		reader := bufio.NewReader(os.Stdin)
+		answer, _ := reader.ReadString('\n')
+		answer = strings.TrimSpace(strings.ToLower(answer))
+		if answer != "y" && answer != "yes" {
+			fmt.Println("Cancelled.")
+			return nil
+		}
+	}
+
+	client := api.NewClient()
+	resp, err := client.Post("/api/v1/bulk/delete", body)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+	if err := api.CheckError(resp); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	var result map[string]interface{}
+	if err := api.DecodeJSON(resp, &result); err != nil {
+		return fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	if output.IsJSON() {
+		output.PrintJSON(result)
+	} else {
+		printBulkResult("Deleted", result)
+	}
+	return nil
+}
+
+// ---------------------------------------------------------------------------
+// Shared helpers for all bulk commands
+// ---------------------------------------------------------------------------
+
+func addBulkFilterFlags(cmd *cobra.Command) {
+	cmd.Flags().String("tags", "", "filter by tags (comma-separated)")
+	cmd.Flags().String("type", "", "filter by document type")
+	cmd.Flags().String("ids", "", "filter by document IDs (comma-separated)")
+	cmd.Flags().Int("from-id", 0, "filter by id >= value")
+	cmd.Flags().Int("to-id", 0, "filter by id <= value")
+	cmd.Flags().BoolP("force", "f", false, "override safety threshold")
+	cmd.Flags().BoolP("yes", "y", false, "skip confirmation prompt")
+}
+
+func buildBulkBody(cmd *cobra.Command) (map[string]interface{}, error) {
+	body := map[string]interface{}{}
+
+	tagsStr, _ := cmd.Flags().GetString("tags")
+	if tagsStr != "" {
+		body["tags"] = splitTags(tagsStr)
+	}
+
+	docType, _ := cmd.Flags().GetString("type")
+	if docType != "" {
+		body["doc_type"] = docType
+	}
+
+	idsStr, _ := cmd.Flags().GetString("ids")
+	if idsStr != "" {
+		ids, err := parseIntList(idsStr)
+		if err != nil {
+			return nil, fmt.Errorf("invalid --ids: %w", err)
+		}
+		body["document_ids"] = ids
+	}
+
+	fromID, _ := cmd.Flags().GetInt("from-id")
+	if fromID > 0 {
+		body["from_id"] = fromID
+	}
+
+	toID, _ := cmd.Flags().GetInt("to-id")
+	if toID > 0 {
+		body["to_id"] = toID
+	}
+
+	force, _ := cmd.Flags().GetBool("force")
+	if force {
+		body["force"] = true
+	}
+
+	// Ensure at least one filter
+	hasFilter := tagsStr != "" || docType != "" || idsStr != "" || fromID > 0 || toID > 0
+	if !hasFilter {
+		return nil, fmt.Errorf("at least one filter is required (--tags, --type, --ids, --from-id, --to-id)")
+	}
+
+	return body, nil
+}
+
+func describeBulkFilter(cmd *cobra.Command) string {
+	var parts []string
+
+	tagsStr, _ := cmd.Flags().GetString("tags")
+	if tagsStr != "" {
+		parts = append(parts, fmt.Sprintf("tags=[%s]", tagsStr))
+	}
+
+	docType, _ := cmd.Flags().GetString("type")
+	if docType != "" {
+		parts = append(parts, fmt.Sprintf("type=%s", docType))
+	}
+
+	idsStr, _ := cmd.Flags().GetString("ids")
+	if idsStr != "" {
+		parts = append(parts, fmt.Sprintf("ids=[%s]", idsStr))
+	}
+
+	fromID, _ := cmd.Flags().GetInt("from-id")
+	if fromID > 0 {
+		parts = append(parts, fmt.Sprintf("from_id=%d", fromID))
+	}
+
+	toID, _ := cmd.Flags().GetInt("to-id")
+	if toID > 0 {
+		parts = append(parts, fmt.Sprintf("to_id=%d", toID))
+	}
+
+	return strings.Join(parts, " ")
+}
+
+func printBulkResult(action string, result map[string]interface{}) {
+	// Comma-ok assertions: a missing or non-numeric field reads as 0 instead of panicking.
+	matched, _ := result["matched"].(float64)
+	succeeded, _ := result["succeeded"].(float64)
+	failed, _ := result["failed"].(float64)
+	fmt.Printf("%s %d of %d documents", action, int(succeeded), int(matched))
+	if failed > 0 {
+		fmt.Printf(" (%d failed)", int(failed))
+	}
+	fmt.Println()
+}
+
+func parseIntList(s string) ([]int, error) {
+	var ids []int
+	for _, part := range strings.Split(s, ",") {
+		part = strings.TrimSpace(part)
+		if part == "" {
+			continue
+		}
+		id, err := strconv.Atoi(part)
+		if err != nil {
+			return nil, fmt.Errorf("invalid ID %q: %w", part, err)
+		}
+		ids = append(ids, id)
+	}
+	return ids, nil
+}
@@ -0,0 +1,73 @@
+package cmd
+
+import (
+	"bufio"
+	"fmt"
+	"os"
+	"strings"
+
+	"github.com/kb-search/kb/internal/api"
+	"github.com/kb-search/kb/internal/output"
+	"github.com/spf13/cobra"
+)
+
+var bulkSetTagsCmd = &cobra.Command{
+	Use:   "bulk-set-tags",
+	Short: "Replace all tags on multiple documents matching a filter",
+	RunE:  runBulkSetTags,
+}
+
+func init() {
+	addBulkFilterFlags(bulkSetTagsCmd)
+	bulkSetTagsCmd.Flags().String("set", "", "replacement tags (comma-separated)")
+	rootCmd.AddCommand(bulkSetTagsCmd)
+}
+
+func runBulkSetTags(cmd *cobra.Command, args []string) error {
+	body, err := buildBulkBody(cmd)
+	if err != nil {
+		return err
+	}
+
+	setStr, _ := cmd.Flags().GetString("set")
+	if setStr == "" {
+		return fmt.Errorf("--set is required (comma-separated list of replacement tags)")
+	}
+	body["new_tags"] = splitTags(setStr)
+
+	yes, _ := cmd.Flags().GetBool("yes")
+	if !yes {
+		desc := describeBulkFilter(cmd)
+		fmt.Printf("This will replace all tags with [%s] on documents matching: %s\nProceed? [y/N] ", setStr, desc)
+		reader := bufio.NewReader(os.Stdin)
+		answer, _ := reader.ReadString('\n')
+		answer = strings.TrimSpace(strings.ToLower(answer))
+		if answer != "y" && answer != "yes" {
+			fmt.Println("Cancelled.")
+			return nil
+		}
+	}
+
+	client := api.NewClient()
+	resp, err := client.Post("/api/v1/bulk/set-tags", body)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+	if err := api.CheckError(resp); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	var result map[string]interface{}
+	if err := api.DecodeJSON(resp, &result); err != nil {
+		return fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	if output.IsJSON() {
+		output.PrintJSON(result)
+	} else {
+		printBulkResult("Set tags on", result)
+	}
+	return nil
+}
@@ -0,0 +1,92 @@
+package cmd
+
+import (
+	"bufio"
+	"fmt"
+	"os"
+	"strings"
+
+	"github.com/kb-search/kb/internal/api"
+	"github.com/kb-search/kb/internal/output"
+	"github.com/spf13/cobra"
+)
+
+var bulkTagCmd = &cobra.Command{
+	Use:   "bulk-tag",
+	Short: "Add or remove tags on multiple documents matching a filter",
+	RunE:  runBulkTag,
+}
+
+func init() {
+	addBulkFilterFlags(bulkTagCmd)
+	bulkTagCmd.Flags().String("add", "", "tags to add (comma-separated)")
+	bulkTagCmd.Flags().String("remove", "", "tags to remove (comma-separated)")
+	rootCmd.AddCommand(bulkTagCmd)
+}
+
+func runBulkTag(cmd *cobra.Command, args []string) error {
+	body, err := buildBulkBody(cmd)
+	if err != nil {
+		return err
+	}
+
+	addStr, _ := cmd.Flags().GetString("add")
+	removeStr, _ := cmd.Flags().GetString("remove")
+
+	if addStr == "" && removeStr == "" {
+		return fmt.Errorf("specify --add and/or --remove")
+	}
+
+	if addStr != "" {
+		body["add"] = splitTags(addStr)
+	}
+	if removeStr != "" {
+		body["remove"] = splitTags(removeStr)
+	}
+
+	yes, _ := cmd.Flags().GetBool("yes")
+	if !yes {
+		desc := describeBulkFilter(cmd)
+		action := ""
+		if addStr != "" {
+			action += fmt.Sprintf("add=[%s]", addStr)
+		}
+		if removeStr != "" {
+			if action != "" {
+				action += " "
+			}
+			action += fmt.Sprintf("remove=[%s]", removeStr)
+		}
+		fmt.Printf("This will update tags (%s) on documents matching: %s\nProceed? [y/N] ", action, desc)
+		reader := bufio.NewReader(os.Stdin)
+		answer, _ := reader.ReadString('\n')
+		answer = strings.TrimSpace(strings.ToLower(answer))
+		if answer != "y" && answer != "yes" {
+			fmt.Println("Cancelled.")
+			return nil
+		}
+	}
+
+	client := api.NewClient()
+	resp, err := client.Post("/api/v1/bulk/tags", body)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+	if err := api.CheckError(resp); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	var result map[string]interface{}
+	if err := api.DecodeJSON(resp, &result); err != nil {
+		return fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	if output.IsJSON() {
+		output.PrintJSON(result)
+	} else {
+		printBulkResult("Tagged", result)
+	}
+	return nil
+}
@@ -11,9 +11,9 @@ var examplesCmd = &cobra.Command{
 	Short: "Show common usage examples",
 	Args:  cobra.NoArgs,
 	Run: func(cmd *cobra.Command, args []string) {
-		fmt.Print(`Quick notes:
-  kb "Remember to update DNS records"
-  kb "Server room is building 3" --tags ops
+		fmt.Print(`Add notes:
+  kb addnote "Remember to update DNS records"
+  kb addnote "Server room is building 3" --tags ops

 Add files:
  kb addfile report.pdf
@@ -23,6 +23,9 @@ Search:
  kb search "how to restart nginx"
  kb search "deploy" --tags ops --top 5

+Update notes:
+  kb updatenote 42 "revised note content"
+
 Manage documents:
  kb list --type pdf
  kb info 3
@@ -3,7 +3,6 @@ package cmd
 import (
 	"fmt"
 	"os"
-	"strings"

 	"github.com/kb-search/kb/internal/api"
 	"github.com/kb-search/kb/internal/config"
@@ -23,10 +22,9 @@ var (
 )

 var rootCmd = &cobra.Command{
-	Use:   "kb [\"note text\" | command]",
+	Use:   "kb [command]",
 	Short: "kb-search CLI client",
 	Long:  "A CLI client for the kb-search v2 engine API.\nRun 'kb examples' for common usage patterns.",
-	Args: cobra.ArbitraryArgs,
 	PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
 		if err := config.Load(); err != nil {
 			return err
@@ -34,44 +32,14 @@ var rootCmd = &cobra.Command{
 		config.ApplyFlags(flagEngine, flagFormat, flagAPIKey)
 		return nil
 	},
-	RunE: func(cmd *cobra.Command, args []string) error {
-		if len(args) == 0 {
-			return cmd.Help()
-		}
-		if len(args) == 1 {
-			return fmt.Errorf("unknown command %q\nTo add a note, use: kb \"%s ...\" or pass multiple words", args[0], args[0])
-		}
-		note := strings.Join(args, " ")
-		tags, _ := cmd.Flags().GetString("tags")
-		client := api.NewClient()
-		return submitNote(client, note, tags)
-	},
 }

 func init() {
 	api.SetVersionInfo(Version, MinEngineVersion)
 	rootCmd.Version = Version
-	rootCmd.SetUsageTemplate(`Quick note taking (must be more than one word):
-  kb "note text here" [flags]
-
-Normal usage:
-  kb [command] [flags]{{if .HasAvailableSubCommands}}
-
-Available Commands:{{range .Commands}}{{if (or .IsAvailableCommand (eq .Name "help"))}}
-  {{rpad .Name .NamePadding }} {{.Short}}{{end}}{{end}}{{end}}{{if .HasAvailableLocalFlags}}
-
-Flags:
-{{.LocalFlags.FlagUsages | trimTrailingWhitespaces}}{{end}}{{if .HasAvailableInheritedFlags}}
-
-Global Flags:
-{{.InheritedFlags.FlagUsages | trimTrailingWhitespaces}}{{end}}
-
-Use "{{.CommandPath}} [command] --help" for more information about a command.
-`)
 	rootCmd.PersistentFlags().StringVar(&flagEngine, "engine", "", "engine API URL")
 	rootCmd.PersistentFlags().StringVar(&flagFormat, "format", "", "output format (human|json)")
 	rootCmd.PersistentFlags().StringVar(&flagAPIKey, "api-key", "", "API key for authentication")
-	rootCmd.Flags().String("tags", "", "tags for note shorthand (comma-separated)")
 }

 // Execute runs the root command.
@@ -6,36 +6,6 @@ import (
 	"testing"
 )

-func TestRootCmd_SingleWordRejected(t *testing.T) {
-	rootCmd.SetArgs([]string{"infow"})
-
-	var stderr bytes.Buffer
-	rootCmd.SetErr(&stderr)
-
-	err := rootCmd.Execute()
-	if err == nil {
-		t.Fatal("expected error for single bare word, got nil")
-	}
-
-	errMsg := err.Error()
-	if !strings.Contains(errMsg, `unknown command "infow"`) {
-		t.Errorf("expected error to mention unknown command, got: %s", errMsg)
-	}
-	if !strings.Contains(errMsg, "multiple words") {
-		t.Errorf("expected error to suggest multiple words, got: %s", errMsg)
-	}
-}
-
-func TestRootCmd_MultipleWordsNotRejected(t *testing.T) {
-	rootCmd.SetArgs([]string{"remember", "to", "update", "dns"})
-
-	err := rootCmd.Execute()
-	// Will fail at API call (no server), but should NOT be the "unknown command" error
-	if err != nil && strings.Contains(err.Error(), "unknown command") {
-		t.Errorf("multi-word input should not be rejected as unknown command, got: %s", err.Error())
-	}
-}
-
 func TestRootCmd_NoArgs_ShowsHelp(t *testing.T) {
 	rootCmd.SetArgs([]string{})

@@ -52,3 +22,48 @@ func TestRootCmd_NoArgs_ShowsHelp(t *testing.T) {
 		t.Errorf("expected help output, got: %s", output)
 	}
 }
+
+func TestRootCmd_UnknownCommand_ReturnsError(t *testing.T) {
+	rootCmd.SetArgs([]string{"notacommand"})
+
+	var stderr bytes.Buffer
+	rootCmd.SetErr(&stderr)
+
+	err := rootCmd.Execute()
+	if err == nil {
+		t.Fatal("expected error for unknown command, got nil")
+	}
+
+	errMsg := err.Error()
+	if !strings.Contains(errMsg, "unknown command") {
+		t.Errorf("expected 'unknown command' error, got: %s", errMsg)
+	}
+}
+
+func TestAddnoteCmd_NoArgs_ReturnsError(t *testing.T) {
+	rootCmd.SetArgs([]string{"addnote"})
+
+	err := rootCmd.Execute()
+	if err == nil {
+		t.Fatal("expected error for addnote with no args, got nil")
+	}
+
+	errMsg := err.Error()
+	if !strings.Contains(errMsg, "requires a note text argument") {
+		t.Errorf("expected 'requires a note text argument' error, got: %s", errMsg)
+	}
+}
+
+func TestAddnoteCmd_TooManyArgs_ReturnsError(t *testing.T) {
+	rootCmd.SetArgs([]string{"addnote", "hello", "world"})
+
+	err := rootCmd.Execute()
+	if err == nil {
+		t.Fatal("expected error for addnote with too many args, got nil")
+	}
+
+	errMsg := err.Error()
+	if !strings.Contains(errMsg, "quote your note text") {
+		t.Errorf("expected 'quote your note text' error, got: %s", errMsg)
+	}
+}
@@ -68,13 +68,10 @@ func runSearch(cmd *cobra.Command, args []string) error {
 	var result struct {
 		Results []struct {
 			Score         float64                `json:"score"`
-			Document struct {
 			Title         string                 `json:"title"`
-				Type  string `json:"doc_type"`
+			DocType       string                 `json:"doc_type"`
 			Tags          []string               `json:"tags"`
-			} `json:"document"`
-			Page    interface{} `json:"page"`
-			Section string      `json:"section"`
+			ChunkMetadata map[string]interface{} `json:"chunk_metadata"`
 			Text          string                 `json:"text"`
 		} `json:"results"`
 	}
@@ -103,26 +100,28 @@ func runSearch(cmd *cobra.Command, args []string) error {
 			snippet = snippet[:200] + "..."
 		}

-		fmt.Printf("\n%d. [%.4f] %s\n", i+1, r.Score, r.Document.Title)
+		fmt.Printf("\n%d. [%.4f] %s\n", i+1, r.Score, r.Title)

 		location := ""
-		if r.Page != nil {
-			location = fmt.Sprintf("Page %v", r.Page)
+		if page, ok := r.ChunkMetadata["page"]; ok && page != nil {
+			location = fmt.Sprintf("Page %v", page)
 		}
-		if r.Section != "" {
+		if section, ok := r.ChunkMetadata["section_header"]; ok && section != nil {
+			if s, ok := section.(string); ok && s != "" {
 				if location != "" {
 					location += " / "
 				}
-			location += r.Section
+				location += s
+			}
 		}
 		if location != "" {
 			fmt.Printf("   Location: %s\n", location)
 		}
-		if r.Document.Type != "" {
-			fmt.Printf("   Type: %s\n", r.Document.Type)
+		if r.DocType != "" {
+			fmt.Printf("   Type: %s\n", r.DocType)
 		}
-		if len(r.Document.Tags) > 0 {
-			fmt.Printf("   Tags: %s\n", joinStrings(r.Document.Tags))
+		if len(r.Tags) > 0 {
+			fmt.Printf("   Tags: %s\n", joinStrings(r.Tags))
 		}
 		fmt.Printf("   %s\n", snippet)
 	}
@@ -0,0 +1,61 @@
+package cmd
+
+import (
+	"fmt"
+	"os"
+	"strconv"
+
+	"github.com/kb-search/kb/internal/api"
+	"github.com/kb-search/kb/internal/output"
+	"github.com/spf13/cobra"
+)
+
+var updatenoteCmd = &cobra.Command{
+	Use:   "updatenote <id> <text>",
+	Short: "Update an existing note's content",
+	Args: func(cmd *cobra.Command, args []string) error {
+		if len(args) != 2 {
+			return fmt.Errorf("requires document ID and text arguments\n\n  Usage: kb updatenote 42 \"updated note text\"")
+		}
+		if _, err := strconv.Atoi(args[0]); err != nil {
+			return fmt.Errorf("document ID must be an integer, got %q", args[0])
+		}
+		return nil
+	},
+	RunE: runUpdatenote,
+}
+
+func init() {
+	rootCmd.AddCommand(updatenoteCmd)
+}
+
+func runUpdatenote(cmd *cobra.Command, args []string) error {
+	docID := args[0]
+	text := args[1]
+
+	client := api.NewClient()
+
+	body := map[string]string{"text": text}
+	resp, err := client.Patch(fmt.Sprintf("/api/v1/notes/%s", docID), body)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	if err := api.CheckError(resp); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		os.Exit(1)
+	}
+
+	var result interface{}
+	if err := api.DecodeJSON(resp, &result); err != nil {
+		return fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	if output.IsJSON() {
+		output.PrintJSON(result)
+	} else {
+		fmt.Printf("Updated note %s\n", docID)
+	}
+	return nil
+}
@@ -94,6 +94,10 @@ func (c *Client) checkEngineVersion() {
 	}
 	defer resp.Body.Close()

+	if resp.StatusCode != http.StatusOK {
+		return // auth error or other issue — let the actual request surface it
+	}
+
 	var status struct {
 		Version string `json:"version"`
 	}
@@ -217,6 +221,20 @@ func (c *Client) Put(path string, body interface{}) (*http.Response, error) {
 	return c.do(req)
 }

+// Patch performs a PATCH request with a JSON body.
+func (c *Client) Patch(path string, body interface{}) (*http.Response, error) {
+	data, err := json.Marshal(body)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal request body: %w", err)
+	}
+	req, err := c.newRequest(http.MethodPatch, path, bytes.NewReader(data))
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Content-Type", "application/json")
+	return c.do(req)
+}
+
 // DecodeJSON reads the response body and decodes it into target.
 func DecodeJSON(resp *http.Response, target interface{}) error {
 	defer resp.Body.Close()
@@ -0,0 +1,36 @@
+FROM ubuntu:24.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3.12 python3.12-venv python3.12-dev python3-pip \
+    libpoppler-cpp-dev poppler-utils \
+    libgl1 libglib2.0-0 \
+    build-essential curl \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+
+WORKDIR /app
+
+COPY pyproject.toml ./
+COPY kb/ kb/
+COPY main.py ./
+COPY VERSION ./
+
+RUN uv venv .venv && \
+    . .venv/bin/activate && \
+    uv pip install -e . && \
+    uv pip install "sentence-transformers[onnx]" && \
+    uv pip install --reinstall torch torchvision --index-url https://download.pytorch.org/whl/cpu
+
+ENV PATH="/app/.venv/bin:$PATH"
+ENV VIRTUAL_ENV="/app/.venv"
+ENV KB_DEVICE=cpu
+ENV KB_INGEST_DEVICE=cpu
+ENV KB_DATA_DIR=/data
+
+EXPOSE 8000
+VOLUME ["/data"]
+
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -20,8 +20,8 @@ COPY VERSION ./

 RUN uv venv .venv && \
    . .venv/bin/activate && \
-    uv pip install -e . && \
-    uv pip install --no-deps onnxruntime-gpu
+    UV_HTTP_TIMEOUT=600 uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130 && \
+    uv pip install -e .

 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
@@ -1,68 +0,0 @@
-# Stage 1: Build — install Python deps with dev tools available
-FROM rocm/dev-ubuntu-24.04:6.4-complete AS builder
-
-ENV DEBIAN_FRONTEND=noninteractive
-
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    python3.12 python3.12-venv python3.12-dev python3-pip \
-    libpoppler-cpp-dev poppler-utils \
-    build-essential curl \
-    && rm -rf /var/lib/apt/lists/*
-
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
-
-WORKDIR /app
-
-COPY pyproject.toml ./
-COPY kb/ kb/
-COPY main.py ./
-COPY VERSION ./
-
-RUN uv venv .venv && \
-    . .venv/bin/activate && \
-    uv pip install -e . && \
-    uv pip install --no-deps onnxruntime-rocm
-
-# Stage 2: Runtime — minimal ROCm runtime libs only
-FROM ubuntu:24.04
-
-ENV DEBIAN_FRONTEND=noninteractive
-
-# Add ROCm apt repository
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    ca-certificates curl gnupg \
-    && mkdir -p /etc/apt/keyrings \
-    && curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key \
-       | gpg --dearmor -o /etc/apt/keyrings/rocm.gpg \
-    && echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 noble main" \
-       > /etc/apt/sources.list.d/rocm.list \
-    && printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
-       > /etc/apt/preferences.d/rocm-pin-600 \
-    && apt-get update && apt-get install -y --no-install-recommends \
-    python3.12 python3.12-venv \
-    libpoppler-cpp0t64 poppler-utils \
-    libgl1 libglib2.0-0 \
-    rocm-hip-runtime \
-    rocm-hip-libraries \
-    miopen-hip \
-    && rm -rf /var/lib/apt/lists/*
-
-WORKDIR /app
-
-# Copy built venv and application from builder
-COPY --from=builder /app/.venv .venv
-COPY --from=builder /app/kb kb
-COPY --from=builder /app/main.py .
-COPY --from=builder /app/pyproject.toml .
-COPY --from=builder /app/VERSION .
-
-ENV PATH="/app/.venv/bin:$PATH"
-ENV VIRTUAL_ENV="/app/.venv"
-ENV KB_DEVICE=auto
-ENV KB_INGEST_DEVICE=auto
-ENV KB_DATA_DIR=/data
-
-EXPOSE 8000
-VOLUME ["/data"]
-
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
@@ -1 +1 @@
-2.1.0
+3.2.2
@@ -0,0 +1,33 @@
+services:
+  kb-engine:
+    build:
+      context: .
+      dockerfile: Dockerfile.cpu
+    ports:
+      - "${KB_PORT:-8000}:8000"
+    volumes:
+      - ${KB_DATA_PATH:-./data}:/data
+    environment:
+      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
+      - KB_DEVICE=cpu
+      - KB_INGEST_DEVICE=cpu
+      - KB_API_KEY=${KB_API_KEY:-}
+      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
+      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
+    restart: unless-stopped
+
+  kb-mcp:
+    build:
+      context: ../mcp
+      dockerfile: Dockerfile
+    ports:
+      - "${KB_MCP_PORT:-3000}:3000"
+    environment:
+      - KB_ENGINE_URL=http://kb-engine:8000
+      - KB_API_KEY=${KB_API_KEY:-}
+      - KB_MCP_API_KEY=${KB_MCP_API_KEY:-}
+      # Comma-separated IPs/FQDNs allowed to connect remotely (e.g. 192.168.1.50,kb.example.com)
+      - KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}
+    depends_on:
+      - kb-engine
+    restart: unless-stopped
@@ -23,3 +23,19 @@ services:
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
+
+  kb-mcp:
+    build:
+      context: ../mcp
+      dockerfile: Dockerfile
+    ports:
+      - "${KB_MCP_PORT:-3000}:3000"
+    environment:
+      - KB_ENGINE_URL=http://kb-engine:8000
+      - KB_API_KEY=${KB_API_KEY:-}
+      - KB_MCP_API_KEY=${KB_MCP_API_KEY:-}
+      # Comma-separated IPs/FQDNs allowed to connect remotely (e.g. 192.168.1.50,kb.example.com)
+      - KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}
+    depends_on:
+      - kb-engine
+    restart: unless-stopped
@@ -1,22 +0,0 @@
-services:
-  kb-engine:
-    build:
-      context: .
-      dockerfile: Dockerfile.rocm
-    devices:
-      - "/dev/kfd"
-      - "/dev/dri"
-    group_add:
-      - "video"
-    ports:
-      - "${KB_PORT:-8000}:8000"
-    volumes:
-      - ${KB_DATA_PATH:-./data}:/data
-    environment:
-      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
-      - KB_DEVICE=${KB_DEVICE:-auto}
-      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
-      - KB_API_KEY=${KB_API_KEY:-}
-      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
-      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
-    restart: unless-stopped
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+docker stop engine-kb-engine-1
+KB_MODEL=BAAI/bge-base-en-v1.5 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d --build
@@ -20,6 +20,7 @@ class Config:
        self.ingest_device = os.environ.get("KB_INGEST_DEVICE", "auto")
        self.api_key = os.environ.get("KB_API_KEY") or None
        self.search_threshold = float(os.environ.get("KB_SEARCH_THRESHOLD", "0.01"))
+        self.bulk_safety_percent = int(os.environ.get("KB_BULK_SAFETY_PERCENT", "70"))
        self.host = os.environ.get("KB_HOST", "0.0.0.0")
        self.port = int(os.environ.get("KB_PORT", "8000"))
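The new `KB_BULK_SAFETY_PERCENT` setting (default 70) feeds the safety check applied before every bulk operation. The sketch below mirrors that rule as a standalone function; `exceeds_safety_threshold` is a hypothetical name used only for illustration, and the engine itself raises an HTTP 409 rather than returning a boolean.

```python
def exceeds_safety_threshold(
    matched: int,
    total: int,
    threshold_percent: int = 70,
    force: bool = False,
) -> bool:
    """Return True when a bulk operation should be refused."""
    # A threshold of 0 or below disables the check, as does force=True
    # or an empty database (which would otherwise divide by zero).
    if threshold_percent <= 0 or force or total == 0:
        return False
    # The comparison is strict: matching exactly the threshold is allowed.
    return (matched / total) * 100 > threshold_percent


print(exceeds_safety_threshold(71, 100))
print(exceeds_safety_threshold(70, 100))
print(exceeds_safety_threshold(100, 100, force=True))
```

Setting `KB_BULK_SAFETY_PERCENT=0` therefore turns the guard off entirely, while `force: true` in the request body bypasses it for a single call.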

@@ -74,6 +74,7 @@ def get_connection(db_path: str) -> sqlite3.Connection:
    conn.enable_load_extension(False)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
+    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn

@@ -185,6 +186,15 @@ def init_schema(conn: sqlite3.Connection, embedding_dim: int) -> None:
        _backfill_enriched_text(conn)
        _rebuild_fts(conn)

+    # Migrate: add updated_at to documents if missing (v3.0.0)
+    if "updated_at" not in doc_cols:
+        conn.execute("ALTER TABLE documents ADD COLUMN updated_at TEXT")
+
+    # Migrate: add job_type to jobs if missing (bulk operations)
+    job_cols = {row[1] for row in conn.execute("PRAGMA table_info(jobs)").fetchall()}
+    if "job_type" not in job_cols:
+        conn.execute("ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'")
+
    conn.commit()


@@ -325,6 +335,92 @@ def untag_document(conn: sqlite3.Connection, document_id: int, tag_names: list[s
    conn.commit()


+# ---------------------------------------------------------------------------
+# Bulk operation helpers
+# ---------------------------------------------------------------------------
+
+def resolve_bulk_selection(
+    conn: sqlite3.Connection,
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+) -> list[int]:
+    """Return document IDs matching the bulk selection filter.
+
+    Filters combine with AND logic. At least one filter must be provided.
+    """
+    sql = "SELECT DISTINCT d.id FROM documents d"
+    joins: list[str] = []
+    where: list[str] = []
+    params: list = []
+
+    if tags:
+        for i, tag in enumerate(tags):
+            joins.append(f"JOIN document_tags dt{i} ON d.id = dt{i}.document_id")
+            joins.append(f"JOIN tags t{i} ON dt{i}.tag_id = t{i}.id")
+            where.append(f"t{i}.name = ?")
+            params.append(tag)
+
+    if doc_type:
+        where.append("d.doc_type = ?")
+        params.append(doc_type)
+
+    if document_ids:
+        placeholders = ",".join("?" for _ in document_ids)
+        where.append(f"d.id IN ({placeholders})")
+        params.extend(document_ids)
+
+    if from_id is not None:
+        where.append("d.id >= ?")
+        params.append(from_id)
+
+    if to_id is not None:
+        where.append("d.id <= ?")
+        params.append(to_id)
+
+    if joins:
+        sql += " " + " ".join(joins)
+    if where:
+        sql += " WHERE " + " AND ".join(where)
+
+    rows = conn.execute(sql, params).fetchall()
+    return [row["id"] for row in rows]
+
+
+def create_bulk_job(
+    conn: sqlite3.Connection,
+    job_type: str,
+    filters_json: str,
+    matched: int,
+    succeeded: int,
+    failed: int,
+    errors_json: str = "[]",
+) -> int:
+    """Create an audit row in jobs for a bulk operation (matched is stored in document_id, succeeded in chunk_count) and return its id."""
+    cur = conn.execute(
+        """INSERT INTO jobs(filename, status, job_type, document_id, chunk_count, error, completed_at)
+           VALUES (?, ?, ?, ?, ?, ?, current_timestamp)""",
+        (
+            filters_json,
+            "done" if failed == 0 else "partial_failure",
+            job_type,
+            matched,
+            succeeded,
+            errors_json if failed > 0 else None,
+        ),
+    )
+    conn.commit()
+    return cur.lastrowid
+
+
+def count_documents(conn: sqlite3.Connection) -> int:
+    """Return total number of documents in the database."""
+    row = conn.execute("SELECT COUNT(*) AS cnt FROM documents").fetchone()
+    return row["cnt"]
+
+
 # ---------------------------------------------------------------------------
 # Vec table management
 # ---------------------------------------------------------------------------
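`resolve_bulk_selection` implements AND semantics across tags by joining `document_tags` and `tags` once per requested tag. A minimal sketch of that pattern against an in-memory database follows; the three-table schema and the `select_ids` helper are assumptions made for illustration (the real tables have more columns), and the `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Assumed minimal schema, not the engine's full one.
conn.executescript("""
    CREATE TABLE documents(id INTEGER PRIMARY KEY, doc_type TEXT);
    CREATE TABLE tags(id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE document_tags(document_id INTEGER, tag_id INTEGER);
    INSERT INTO documents(id, doc_type) VALUES (1, 'note'), (2, 'note'), (3, 'pdf');
    INSERT INTO tags(id, name) VALUES (1, 'ops'), (2, 'dns');
    INSERT INTO document_tags VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);
""")


def select_ids(conn, tags=None, doc_type=None):
    """Hypothetical cut-down version of the selection query builder."""
    sql = "SELECT DISTINCT d.id FROM documents d"
    joins, where, params = [], [], []
    # One pair of joins per tag: a document must satisfy every join to survive.
    for i, tag in enumerate(tags or []):
        joins.append(f"JOIN document_tags dt{i} ON d.id = dt{i}.document_id")
        joins.append(f"JOIN tags t{i} ON dt{i}.tag_id = t{i}.id")
        where.append(f"t{i}.name = ?")
        params.append(tag)
    if doc_type:
        where.append("d.doc_type = ?")
        params.append(doc_type)
    if joins:
        sql += " " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY d.id"
    return [row["id"] for row in conn.execute(sql, params)]


print(select_ids(conn, tags=["ops"]))
print(select_ids(conn, tags=["ops", "dns"]))
print(select_ids(conn, tags=["ops", "dns"], doc_type="note"))
```

Only the table aliases are interpolated into the SQL string; tag names and other values always travel as bound parameters, which is what keeps the dynamically built query safe from injection.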
@@ -1 +1 @@
-from kb.routes import health, search, jobs, documents, tags, status, reindex, auth
+from kb.routes import health, search, jobs, documents, tags, status, reindex, auth, notes
@@ -0,0 +1,281 @@
+"""Bulk operation endpoints — delete, tag, and set-tags on multiple documents."""
+
+import json
+import logging
+from pathlib import Path
+from typing import Optional
+
+from fastapi import HTTPException
+from pydantic import BaseModel, model_validator
+
+from main import app
+from kb.config import cfg
+from kb.database import (
+    get_connection,
+    resolve_bulk_selection,
+    count_documents,
+    create_bulk_job,
+    tag_document,
+    untag_document,
+)
+
+logger = logging.getLogger("kb.routes.bulk")
+
+
+# ---------------------------------------------------------------------------
+# Request models
+# ---------------------------------------------------------------------------
+
+class BulkSelectionRequest(BaseModel):
+    document_ids: Optional[list[int]] = None
+    tags: Optional[list[str]] = None
+    doc_type: Optional[str] = None
+    from_id: Optional[int] = None
+    to_id: Optional[int] = None
+    force: bool = False
+
+    @model_validator(mode="after")
+    def require_at_least_one_filter(self):
+        if not any([self.document_ids, self.tags, self.doc_type,
+                     self.from_id is not None, self.to_id is not None]):
+            raise ValueError("At least one selection filter is required")
+        return self
+
+
+class BulkDeleteRequest(BulkSelectionRequest):
+    pass
+
+
+class BulkTagsRequest(BulkSelectionRequest):
+    add: Optional[list[str]] = None
+    remove: Optional[list[str]] = None
+
+    @model_validator(mode="after")
+    def require_add_or_remove(self):
+        if not self.add and not self.remove:
+            raise ValueError("At least one of 'add' or 'remove' is required")
+        return self
+
+
+class BulkSetTagsRequest(BulkSelectionRequest):
+    new_tags: list[str]
+
+
+# ---------------------------------------------------------------------------
+# Shared helpers
+# ---------------------------------------------------------------------------
+
+def _check_safety_threshold(matched: int, total: int, force: bool) -> None:
+    """Raise 409 if the operation would affect too many documents."""
+    threshold = cfg.bulk_safety_percent
+    if threshold <= 0 or force or total == 0:
+        return
+    percent = (matched / total) * 100
+    if percent > threshold:
+        raise HTTPException(
+            status_code=409,
+            detail={
+                "error": "safety_threshold_exceeded",
+                "message": (
+                    f"Operation would affect {matched} of {total} documents "
+                    f"({percent:.1f}%). Exceeds safety threshold of {threshold}%. "
+                    f"Use force: true to proceed."
+                ),
+                "matched": matched,
+                "total": total,
+                "percent": round(percent, 1),
+                "threshold": threshold,
+            },
+        )
+
+
+def _filters_dict(req: BulkSelectionRequest) -> str:
+    """Build a JSON string of the selection filter for audit logging."""
+    d = {}
+    if req.document_ids:
+        d["document_ids"] = req.document_ids
+    if req.tags:
+        d["tags"] = req.tags
+    if req.doc_type:
+        d["doc_type"] = req.doc_type
+    if req.from_id is not None:
+        d["from_id"] = req.from_id
+    if req.to_id is not None:
+        d["to_id"] = req.to_id
+    return json.dumps(d)
+
+
+# ---------------------------------------------------------------------------
+# Endpoints
+# ---------------------------------------------------------------------------
+
+@app.post("/api/v1/bulk/delete")
+async def bulk_delete(req: BulkDeleteRequest):
+    conn = get_connection(cfg.db_path)
+    try:
+        doc_ids = resolve_bulk_selection(
+            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
+        )
+        total = count_documents(conn)
+        _check_safety_threshold(len(doc_ids), total, req.force)
+
+        succeeded = 0
+        failed = 0
+        errors = []
+        stored_files: list[str] = []
+
+        for doc_id in doc_ids:
+            try:
+                doc = conn.execute(
+                    "SELECT id, stored_path FROM documents WHERE id = ?", (doc_id,)
+                ).fetchone()
+                if not doc:
+                    failed += 1
+                    errors.append({"document_id": doc_id, "error": "not found"})
+                    continue
+
+                if doc["stored_path"]:
+                    stored_files.append(doc["stored_path"])
+
+                # Delete embeddings
+                chunk_ids = conn.execute(
+                    "SELECT id FROM chunks WHERE document_id = ?", (doc_id,)
+                ).fetchall()
+                for row in chunk_ids:
+                    conn.execute("DELETE FROM chunks_vec WHERE chunk_id = ?", (row["id"],))
+
+                # Delete document (cascades to chunks, document_tags)
+                conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
+                succeeded += 1
+            except Exception as exc:
+                failed += 1
+                errors.append({"document_id": doc_id, "error": str(exc)})
+
+        conn.commit()
+
+        # Best-effort file cleanup after commit
+        for path in stored_files:
+            try:
+                f = Path(path)
+                if f.exists():
+                    f.unlink()
+            except OSError as exc:
+                logger.warning("Failed to delete stored file %s: %s", path, exc)
+
+        errors_json = json.dumps(errors) if errors else "[]"
+        job_id = create_bulk_job(
+            conn, "bulk_delete", _filters_dict(req),
+            len(doc_ids), succeeded, failed, errors_json,
+        )
+
+        return {
+            "job_id": job_id,
+            "status": "done" if failed == 0 else "partial_failure",
+            "matched": len(doc_ids),
+            "succeeded": succeeded,
+            "failed": failed,
+            "errors": errors,
+        }
+    finally:
+        conn.close()
+
+
+@app.post("/api/v1/bulk/tags")
+async def bulk_tags(req: BulkTagsRequest):
+    conn = get_connection(cfg.db_path)
+    try:
+        doc_ids = resolve_bulk_selection(
+            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
+        )
+        total = count_documents(conn)
+        _check_safety_threshold(len(doc_ids), total, req.force)
+
+        succeeded = 0
+        failed = 0
+        errors = []
+
+        for doc_id in doc_ids:
+            try:
+                if req.add:
+                    tag_document(conn, doc_id, req.add)
+                if req.remove:
+                    untag_document(conn, doc_id, req.remove)
+                conn.execute(
+                    "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
+                    (doc_id,),
+                )
+                succeeded += 1
+            except Exception as exc:
+                failed += 1
+                errors.append({"document_id": doc_id, "error": str(exc)})
+
+        conn.commit()
+
+        errors_json = json.dumps(errors)  # an empty list already serialises to "[]"
+        job_id = create_bulk_job(
+            conn, "bulk_tags", _filters_dict(req),
+            len(doc_ids), succeeded, failed, errors_json,
+        )
+
+        return {
+            "job_id": job_id,
+            "status": "done" if failed == 0 else "partial_failure",
+            "matched": len(doc_ids),
+            "succeeded": succeeded,
+            "failed": failed,
+            "errors": errors,
+        }
+    finally:
+        conn.close()
+
+
+@app.post("/api/v1/bulk/set-tags")
+async def bulk_set_tags(req: BulkSetTagsRequest):
+    conn = get_connection(cfg.db_path)
+    try:
+        doc_ids = resolve_bulk_selection(
+            conn, req.document_ids, req.tags, req.doc_type, req.from_id, req.to_id,
+        )
+        total = count_documents(conn)
+        _check_safety_threshold(len(doc_ids), total, req.force)
+
+        succeeded = 0
+        failed = 0
+        errors = []
+
+        for doc_id in doc_ids:
+            try:
+                # Remove all existing tags
+                conn.execute(
+                    "DELETE FROM document_tags WHERE document_id = ?", (doc_id,)
+                )
+                # Apply new tag set
+                if req.new_tags:
+                    tag_document(conn, doc_id, req.new_tags)
+                conn.execute(
+                    "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
+                    (doc_id,),
+                )
+                succeeded += 1
+            except Exception as exc:
+                failed += 1
+                errors.append({"document_id": doc_id, "error": str(exc)})
+
+        conn.commit()
+
+        errors_json = json.dumps(errors)  # an empty list already serialises to "[]"
+        job_id = create_bulk_job(
+            conn, "bulk_set_tags", _filters_dict(req),
+            len(doc_ids), succeeded, failed, errors_json,
+        )
+
+        return {
+            "job_id": job_id,
+            "status": "done" if failed == 0 else "partial_failure",
+            "matched": len(doc_ids),
+            "succeeded": succeeded,
+            "failed": failed,
+            "errors": errors,
+        }
+    finally:
+        conn.close()
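
Review note on the bulk endpoints above: each document is handled inside a plain `try`/`except`, so a document that fails midway (for example in `bulk_set_tags`, after its existing tags were deleted but before the new set was applied) may keep its partial writes once `conn.commit()` runs, even though it is counted as failed. The sketch below shows one way to isolate each document with a SQLite savepoint. It assumes the engine's connection is a standard `sqlite3` connection, which this diff does not confirm; `apply_per_document`, `replace_tags`, and the demo table are hypothetical names used only for illustration.

```python
import sqlite3


def apply_per_document(conn, doc_ids, change):
    """Run change(conn, doc_id) per document; undo that document's writes on failure."""
    succeeded = 0
    errors = []
    for doc_id in doc_ids:
        conn.execute("SAVEPOINT bulk_doc")
        try:
            change(conn, doc_id)
            conn.execute("RELEASE SAVEPOINT bulk_doc")
            succeeded += 1
        except Exception as exc:
            # Discard only this document's writes, then close the savepoint.
            conn.execute("ROLLBACK TO SAVEPOINT bulk_doc")
            conn.execute("RELEASE SAVEPOINT bulk_doc")
            errors.append({"document_id": doc_id, "error": str(exc)})
    conn.commit()
    return succeeded, errors


def replace_tags(conn, doc_id):
    """Hypothetical change: wipe tags, then fail for document 2 before re-tagging."""
    conn.execute("DELETE FROM document_tags WHERE document_id = ?", (doc_id,))
    if doc_id == 2:
        raise ValueError("boom")
    conn.execute("INSERT INTO document_tags VALUES (?, ?)", (doc_id, "new"))


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE document_tags (document_id INTEGER, tag TEXT)")
demo.executemany("INSERT INTO document_tags VALUES (?, ?)", [(1, "a"), (2, "b")])
demo.commit()
ok_count, failures = apply_per_document(demo, [1, 2], replace_tags)
remaining = demo.execute(
    "SELECT document_id, tag FROM document_tags ORDER BY document_id"
).fetchall()
```

In this sketch document 2 keeps its original tag because its delete is rolled back, while document 1 receives the new tag. Whether the extra statements are worth it depends on how `tag_document` and `untag_document` can fail in practice.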
@@ -26,7 +26,7 @@ async def list_documents(
        sql = """
            SELECT d.id, d.title, d.doc_type,
                   (SELECT COUNT(*) FROM chunks c WHERE c.document_id = d.id) AS chunk_count,
-                   d.created_at
+                   d.created_at, d.updated_at
            FROM documents d
        """
        joins: list[str] = []
@@ -50,7 +50,7 @@ async def list_documents(
        if where:
            sql += " WHERE " + " AND ".join(where)

-        sql += " ORDER BY d.created_at DESC"
+        sql += " ORDER BY COALESCE(d.updated_at, d.created_at) DESC"

        rows = conn.execute(sql, params).fetchall()

@@ -74,6 +74,7 @@ async def list_documents(
                "tags": [t["name"] for t in tag_rows],
                "chunk_count": row["chunk_count"],
                "created_at": row["created_at"],
+                "updated_at": row["updated_at"],
            })

        return results
@@ -0,0 +1,120 @@
+"""Note mutation endpoint — update existing notes in place."""
+
+import hashlib
+import logging
+
+from fastapi import HTTPException
+from pydantic import BaseModel
+
+from main import app
+from kb.config import cfg
+from kb.database import (
+    get_connection,
+    build_enriched_text,
+    insert_chunk,
+    insert_embedding,
+)
+from kb.embeddings import embed_texts
+from kb.ingest.note import chunk_note
+
+logger = logging.getLogger("kb.routes.notes")
+
+
+class NoteUpdateRequest(BaseModel):
+    text: str
+
+
+@app.patch("/api/v1/notes/{doc_id}")
+async def update_note(doc_id: int, req: NoteUpdateRequest):
+    conn = get_connection(cfg.db_path)
+    try:
+        doc = conn.execute(
+            "SELECT id, title, doc_type FROM documents WHERE id = ?", (doc_id,)
+        ).fetchone()
+        if not doc:
+            raise HTTPException(status_code=404, detail="Document not found.")
+        if doc["doc_type"] != "note":
+            raise HTTPException(
+                status_code=422,
+                detail="Only notes can be updated via this endpoint.",
+            )
+
+        title = doc["title"]
+
+        # Delete existing chunks and their embeddings
+        chunk_ids = conn.execute(
+            "SELECT id FROM chunks WHERE document_id = ?", (doc_id,)
+        ).fetchall()
+        for row in chunk_ids:
+            conn.execute("DELETE FROM chunks_vec WHERE chunk_id = ?", (row["id"],))
+        conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
+
+        # Run note chunking pipeline on new text
+        chunks = chunk_note(req.text)
+        chunk_texts = [c["text"] for c in chunks]
+        chunk_metas = [
+            {k: v for k, v in c.items() if k != "text"} or None for c in chunks
+        ]
+
+        enriched_texts = [
+            build_enriched_text(title, ct, cm)
+            for ct, cm in zip(chunk_texts, chunk_metas)
+        ]
+
+        # Embed — if this fails, the transaction rolls back
+        vectors = embed_texts(enriched_texts)
+
+        for idx, (chunk_text, enriched, vector) in enumerate(
+            zip(chunk_texts, enriched_texts, vectors)
+        ):
+            chunk_id = insert_chunk(
+                conn,
+                document_id=doc_id,
+                chunk_index=idx,
+                text=chunk_text,
+                enriched_text=enriched,
+                metadata=chunk_metas[idx],
+            )
+            insert_embedding(conn, chunk_id, vector)
+
+        # Update content_hash and updated_at
+        content_hash = hashlib.sha256(req.text.encode("utf-8")).hexdigest()
+        conn.execute(
+            "UPDATE documents SET content_hash = ?, updated_at = current_timestamp WHERE id = ?",
+            (content_hash, doc_id),
+        )
+        conn.commit()
+
+        # Return updated document
+        updated_doc = conn.execute(
+            "SELECT * FROM documents WHERE id = ?", (doc_id,)
+        ).fetchone()
+
+        new_chunks = conn.execute(
+            "SELECT * FROM chunks WHERE document_id = ? ORDER BY chunk_index",
+            (doc_id,),
+        ).fetchall()
+
+        tag_rows = conn.execute(
+            """
+            SELECT t.name FROM tags t
+            JOIN document_tags dt ON t.id = dt.tag_id
+            WHERE dt.document_id = ?
+            ORDER BY t.name
+            """,
+            (doc_id,),
+        ).fetchall()
+
+        return {
+            **dict(updated_doc),
+            "tags": [t["name"] for t in tag_rows],
+            "chunks": [dict(c) for c in new_chunks],
+        }
+    except HTTPException:
+        raise
+    except Exception:
+        conn.rollback()
+        logger.exception("Failed to update note %d", doc_id)
+        raise HTTPException(status_code=500, detail="Failed to update note.")
+    finally:
+        conn.close()
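
Review note on `update_note` above: the endpoint always deletes, re-chunks, and re-embeds, even when the submitted text is identical to what is stored. Since `content_hash` is already written as a SHA-256 of the UTF-8 text, a caller or the endpoint itself could compare hashes first. This is a minimal sketch under that assumption; `is_unchanged` is a hypothetical helper, not part of the diff.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 text, matching the expression in update_note."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unchanged(stored_hash, new_text: str) -> bool:
    """Hypothetical helper: True when new_text hashes to the stored value."""
    return stored_hash is not None and stored_hash == content_hash(new_text)
```

A check like this would let the endpoint skip the embedding call for no-op updates, at the cost of not refreshing `updated_at` in that case.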
@@ -48,6 +48,13 @@ async def update_document_tags(doc_id: int, req: TagUpdateRequest):
        if req.remove:
            untag_document(conn, doc_id, req.remove)

+        if req.add or req.remove:
+            conn.execute(
+                "UPDATE documents SET updated_at = current_timestamp WHERE id = ?",
+                (doc_id,),
+            )
+            conn.commit()
+
        tag_rows = conn.execute(
            """
            SELECT t.name FROM tags t
@@ -16,7 +16,8 @@ def stage_file(staging_dir: Path, filename: str, content: bytes) -> Path:
        The path to the newly created staged file.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{filename}"
+    safe_filename = filename.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_filename}"
    dest.write_bytes(content)
    logger.debug("Staged file: %s (%d bytes)", dest, len(content))
    return dest
@@ -31,7 +32,8 @@ def stage_note(staging_dir: Path, title: str, text: str) -> Path:
        The path to the newly created staged note file.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{title}.note"
+    safe_title = title.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_title}.note"
    dest.write_text(text, encoding="utf-8")
    logger.debug("Staged note: %s (%d chars)", dest, len(text))
    return dest
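
Review note on the staging change above: replacing both separator characters keeps a client-supplied name from escaping the staging directory. The sketch below reproduces the same replacement in isolation to show its effect on a traversal-style name; `staged_path` is a hypothetical stand-in for the path-building part of `stage_file`, and `PurePosixPath` is used only so the example behaves the same on any platform.

```python
import uuid
from pathlib import PurePosixPath


def staged_path(staging_dir: PurePosixPath, filename: str) -> PurePosixPath:
    """Build the destination path using the same separator replacement as stage_file."""
    safe_filename = filename.replace("/", "_").replace("\\", "_")
    return staging_dir / f"{uuid.uuid4()}_{safe_filename}"


staging = PurePosixPath("/tmp/staging")
hostile = staged_path(staging, "../../etc/passwd")
windows_style = staged_path(staging, "..\\..\\secret.txt")
```

Both results stay directly inside the staging directory because no separator survives in the final name component.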
@@ -62,7 +62,7 @@ async def lifespan(app: FastAPI):
 app = FastAPI(title="kb-engine", version=__version__, lifespan=lifespan)

 # Import routes after app is created
-from kb.routes import health, search, jobs, documents, tags, status, reindex, auth  # noqa: E402, F401
+from kb.routes import health, search, jobs, documents, tags, status, reindex, auth, notes, bulk  # noqa: E402, F401

 if __name__ == "__main__":
    import uvicorn
@@ -0,0 +1,17 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+COPY requirements.txt ./
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY *.py ./
+
+ENV KB_ENGINE_URL=http://engine:8000
+ENV KB_API_KEY=
+ENV KB_MCP_API_KEY=
+ENV KB_MCP_PORT=3000
+
+EXPOSE 3000
+
+CMD ["python", "server.py"]
@@ -0,0 +1,17 @@
+"""Configuration from environment variables."""
+
+import os
+
+
+KB_ENGINE_URL = os.environ.get("KB_ENGINE_URL", "http://localhost:8000")
+KB_API_KEY = os.environ.get("KB_API_KEY", "")
+KB_MCP_API_KEY = os.environ.get("KB_MCP_API_KEY", "")
+KB_MCP_PORT = int(os.environ.get("KB_MCP_PORT", "3000"))
+KB_MCP_ALLOWED_HOSTS = os.environ.get("KB_MCP_ALLOWED_HOSTS", "")
+
+
+def parse_allowed_hosts() -> list[str]:
+    """Parse KB_MCP_ALLOWED_HOSTS into a list of host strings."""
+    if not KB_MCP_ALLOWED_HOSTS:
+        return []
+    return [h.strip() for h in KB_MCP_ALLOWED_HOSTS.split(",") if h.strip()]
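
Review note on `parse_allowed_hosts` above: the value is a comma-separated list, and `server.py` later in this diff turns each entry into a wildcard-port host pattern and an `http://` origin pattern. The sketch below restates both steps as pure functions so the mapping is easy to see. It takes the raw string as a parameter instead of reading the environment, and `host_patterns` is a hypothetical name for the list comprehensions in `server.py`.

```python
def parse_allowed_hosts(raw: str) -> list[str]:
    """Split a comma-separated host list, dropping blanks and surrounding whitespace."""
    if not raw:
        return []
    return [h.strip() for h in raw.split(",") if h.strip()]


def host_patterns(extra_hosts: list[str]) -> tuple[list[str], list[str]]:
    """Derive the host and origin patterns the way server.py does for extra hosts."""
    hosts = [f"{h}:*" for h in extra_hosts]
    origins = [f"http://{h}:*" for h in extra_hosts]
    return hosts, origins


parsed = parse_allowed_hosts(" kb.example.com, ,10.0.0.5 ")
patterns = host_patterns(parsed)
```

Note that only `http://` origins are generated for extra hosts, so a deployment behind a TLS-terminating proxy may need additional handling.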
@@ -0,0 +1,208 @@
+"""HTTP client for the kb engine API."""
+
+import httpx
+
+from config import KB_ENGINE_URL, KB_API_KEY
+
+
+def _auth_headers() -> dict[str, str]:
+    h: dict[str, str] = {}
+    if KB_API_KEY:
+        h["Authorization"] = f"Bearer {KB_API_KEY}"
+    return h
+
+
+def _client() -> httpx.Client:
+    return httpx.Client(base_url=KB_ENGINE_URL, headers=_auth_headers(), timeout=60.0)
+
+
+def search(query: str, top: int = 10, tags: list[str] | None = None,
+           doc_type: str | None = None, fts_only: bool = False,
+           vec_only: bool = False, threshold: float | None = None) -> dict:
+    body: dict = {"query": query, "top": top}
+    if tags:
+        body["tags"] = tags
+    if doc_type:
+        body["doc_type"] = doc_type
+    if fts_only:
+        body["fts_only"] = True
+    if vec_only:
+        body["vec_only"] = True
+    if threshold is not None:
+        body["threshold"] = threshold
+    with _client() as c:
+        r = c.post("/api/v1/search", json=body)
+        r.raise_for_status()
+        return r.json()
+
+
+def add_note(text: str, tags: list[str] | None = None,
+             title: str | None = None) -> dict:
+    fields = {"note": text}
+    if tags:
+        fields["tags"] = ",".join(tags)
+    if title:
+        fields["title"] = title
+    with _client() as c:
+        r = c.post("/api/v1/jobs", data=fields)
+        r.raise_for_status()
+        return r.json()
+
+
+def update_note(doc_id: int, text: str) -> dict:
+    with _client() as c:
+        r = c.patch(f"/api/v1/notes/{doc_id}", json={"text": text})
+        r.raise_for_status()
+        return r.json()
+
+
+def get_document(doc_id: int) -> dict:
+    with _client() as c:
+        r = c.get(f"/api/v1/documents/{doc_id}")
+        r.raise_for_status()
+        return r.json()
+
+
+def list_documents(doc_type: str | None = None,
+                   tags: str | None = None) -> list[dict]:
+    params: dict = {}
+    if doc_type:
+        params["type"] = doc_type
+    if tags:
+        params["tags"] = tags
+    with _client() as c:
+        r = c.get("/api/v1/documents", params=params)
+        r.raise_for_status()
+        return r.json()
+
+
+def get_status() -> dict:
+    with _client() as c:
+        r = c.get("/api/v1/status")
+        r.raise_for_status()
+        return r.json()
+
+
+def list_jobs(status: str | None = None) -> list[dict]:
+    params: dict = {}
+    if status:
+        params["status"] = status
+    with _client() as c:
+        r = c.get("/api/v1/jobs", params=params)
+        r.raise_for_status()
+        return r.json()
+
+
+def update_tags(doc_id: int, add: list[str] | None = None,
+                remove: list[str] | None = None) -> dict:
+    body: dict = {}
+    if add:
+        body["add"] = add
+    if remove:
+        body["remove"] = remove
+    with _client() as c:
+        r = c.put(f"/api/v1/documents/{doc_id}/tags", json=body)
+        r.raise_for_status()
+        return r.json()
+
+
+def delete_document(doc_id: int) -> dict:
+    with _client() as c:
+        r = c.delete(f"/api/v1/documents/{doc_id}")
+        r.raise_for_status()
+        return r.json()
+
+
+def _bulk_body(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    force: bool = False,
+    **extra,
+) -> dict:
+    body: dict = {}
+    if document_ids:
+        body["document_ids"] = document_ids
+    if tags:
+        body["tags"] = tags
+    if doc_type:
+        body["doc_type"] = doc_type
+    if from_id is not None:
+        body["from_id"] = from_id
+    if to_id is not None:
+        body["to_id"] = to_id
+    if force:
+        body["force"] = True
+    body.update(extra)
+    return body
+
+
+def bulk_delete(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    force: bool = False,
+) -> dict:
+    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force)
+    with _client() as c:
+        r = c.post("/api/v1/bulk/delete", json=body)
+        r.raise_for_status()
+        return r.json()
+
+
+def bulk_tags(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    add: list[str] | None = None,
+    remove: list[str] | None = None,
+    force: bool = False,
+) -> dict:
+    extra = {}
+    if add:
+        extra["add"] = add
+    if remove:
+        extra["remove"] = remove
+    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force, **extra)
+    with _client() as c:
+        r = c.post("/api/v1/bulk/tags", json=body)
+        r.raise_for_status()
+        return r.json()
+
+
+def bulk_set_tags(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    new_tags: list[str] | None = None,
+    force: bool = False,
+) -> dict:
+    extra = {"new_tags": new_tags or []}
+    body = _bulk_body(document_ids, tags, doc_type, from_id, to_id, force, **extra)
+    with _client() as c:
+        r = c.post("/api/v1/bulk/set-tags", json=body)
+        r.raise_for_status()
+        return r.json()
+
+
+def upload_file(filename: str, file_bytes: bytes,
+                tags: list[str] | None = None) -> dict:
+    fields: dict = {}
+    if tags:
+        fields["tags"] = ",".join(tags)
+    with _client() as c:
+        r = c.post(
+            "/api/v1/jobs",
+            data=fields,
+            files={"file": (filename, file_bytes)},
+        )
+        r.raise_for_status()
+        return r.json()
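
Review note on the bulk client helpers above: `tags` selects which documents to target, while `add`, `remove`, and `new_tags` describe the change. The sketch below copies the body-building logic of `_bulk_body` into a standalone function (named `bulk_body` here purely for illustration) to show the JSON payload that would be sent for a typical `bulk_tags` call.

```python
def bulk_body(document_ids=None, tags=None, doc_type=None,
              from_id=None, to_id=None, force=False, **extra) -> dict:
    """Standalone copy of _bulk_body: include only the filters that were supplied."""
    body: dict = {}
    if document_ids:
        body["document_ids"] = document_ids
    if tags:
        body["tags"] = tags
    if doc_type:
        body["doc_type"] = doc_type
    if from_id is not None:
        body["from_id"] = from_id
    if to_id is not None:
        body["to_id"] = to_id
    if force:
        body["force"] = True
    body.update(extra)
    return body


# Move every note tagged "inbox" to "reviewed".
payload = bulk_body(tags=["inbox"], doc_type="note",
                    add=["reviewed"], remove=["inbox"])
# from_id=0 is kept because the check is "is not None", not truthiness.
ranged = bulk_body(from_id=0, to_id=10)
```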
@@ -0,0 +1,4 @@
+mcp>=1.9.0
+httpx>=0.27
+uvicorn>=0.30
+starlette>=0.38
@@ -0,0 +1,446 @@
+"""kb MCP server — exposes knowledge base operations as MCP tools."""
+
+import asyncio
+import json
+import logging
+
+from mcp.server.fastmcp import FastMCP
+from mcp.server.transport_security import TransportSecuritySettings
+from starlette.applications import Starlette
+from starlette.middleware import Middleware
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.requests import Request
+from starlette.responses import JSONResponse
+from starlette.routing import Mount
+
+import config
+import engine
+import uploads
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
+logger = logging.getLogger("kb.mcp")
+
+# ---------------------------------------------------------------------------
+# Transport security — DNS rebinding protection with configurable allowed hosts
+# ---------------------------------------------------------------------------
+
+_LOCALHOST_HOSTS = ["127.0.0.1:*", "localhost:*", "[::1]:*"]
+_LOCALHOST_ORIGINS = ["http://127.0.0.1:*", "http://localhost:*", "http://[::1]:*"]
+
+_extra_hosts = config.parse_allowed_hosts()
+_allowed_hosts = _LOCALHOST_HOSTS + [f"{h}:*" for h in _extra_hosts]
+_allowed_origins = _LOCALHOST_ORIGINS + [f"http://{h}:*" for h in _extra_hosts]
+
+_transport_security = TransportSecuritySettings(
+    enable_dns_rebinding_protection=True,
+    allowed_hosts=_allowed_hosts,
+    allowed_origins=_allowed_origins,
+)
+
+# ---------------------------------------------------------------------------
+# FastMCP server
+# ---------------------------------------------------------------------------
+
+mcp = FastMCP(
+    "kb",
+    instructions=(
+        "Knowledge base MCP server. Provides tools for searching, adding, and "
+        "managing documents and notes. Use tags to organise and filter documents "
+        "(e.g. tag notes with 'agent:mybot' and filter searches by that tag). "
+        "When KB_MCP_API_KEY is set, every request must carry a matching Bearer "
+        "token in the Authorization header, checked at the HTTP transport layer."
+    ),
+    transport_security=_transport_security,
+)
+
+
+@mcp.tool()
+async def kb_search(
+    query: str,
+    top: int = 10,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    fts_only: bool = False,
+) -> str:
+    """Search the knowledge base for relevant documents and notes.
+
+    Returns ranked chunks matching the query, with text content, relevance scores,
+    and document metadata.
+
+    Args:
+        query: The search query. Can be a natural language question or keywords.
+        top: Maximum number of results to return (default 10).
+        tags: Filter results to documents with ALL of these tags.
+        doc_type: Filter by document type (e.g. "note", "pdf", "markdown", "code").
+        fts_only: If true, use only full-text search (no vector similarity).
+
+    Tips for complex queries:
+    - Consider expanding into 2-3 variant phrasings and calling this tool multiple
+      times, then deduplicating results by chunk_id. For example, search for both
+      "pension revaluation rules" and "how are pensions revalued" to cast a wider net.
+    - For precision, rerank the returned results using your own judgement based on
+      relevance to the original question.
+    """
+    result = engine.search(
+        query=query,
+        top=top,
+        tags=tags or None,
+        doc_type=doc_type,
+        fts_only=fts_only,
+    )
+
+    results_list = result if isinstance(result, list) else result.get("results", [])
+    return json.dumps(results_list, indent=2)
+
+
+@mcp.tool()
+async def kb_addnote(
+    text: str,
+    tags: list[str] | None = None,
+    title: str | None = None,
+) -> str:
+    """Add a text note to the knowledge base for indexing and search.
+
+    The note is queued for ingestion — it will be chunked, embedded, and made
+    searchable. Use kb_jobs to check ingestion status.
+
+    Args:
+        text: The note text content.
+        tags: Tags to apply to the note.
+        title: Optional title (auto-derived from first line if omitted).
+    """
+    result = engine.add_note(text=text, tags=tags or None, title=title)
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_update_note(
+    document_id: int,
+    text: str,
+) -> str:
+    """Update an existing note's content in place.
+
+    Replaces the note text, re-chunks, and re-embeds while preserving the
+    document ID, creation timestamp, and tags. Only works on documents with
+    doc_type "note".
+
+    Args:
+        document_id: The ID of the note document to update.
+        text: The new text content for the note.
+    """
+    result = engine.update_note(document_id, text)
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_get(
+    document_id: int | None = None,
+    source_path: str | None = None,
+) -> str:
+    """Retrieve document details from the knowledge base.
+
+    Look up a document by its ID or source path. Returns full document metadata,
+    tags, and chunk contents.
+
+    Args:
+        document_id: The numeric document ID.
+        source_path: The document's source path (alternative to document_id).
+    """
+    if document_id is not None:
+        result = engine.get_document(document_id)
+        return json.dumps(result, indent=2)
+    elif source_path is not None:
+        # The list endpoint's SELECT has no source_path column, so compare full documents.
+        for summary in engine.list_documents():
+            doc = engine.get_document(summary["id"])
+            if doc.get("source_path") == source_path:
+                return json.dumps(doc, indent=2)
+        return json.dumps({"error": "No document found with that source_path"})
+    else:
+        return json.dumps({"error": "Provide either document_id or source_path"})
+
+
+@mcp.tool()
+async def kb_status() -> str:
+    """Get knowledge base engine status.
+
+    Returns engine version, embedding model info, device info, document counts,
+    database size, and ingestion queue state.
+    """
+    result = engine.get_status()
+    result["authenticated"] = bool(config.KB_MCP_API_KEY)
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_jobs(
+    status: str | None = None,
+) -> str:
+    """List ingestion jobs and their status.
+
+    Returns recent jobs showing what has been queued, is processing, completed,
+    or failed.
+
+    Args:
+        status: Filter by job status ("queued", "processing", "done", "failed", "skipped").
+    """
+    result = engine.list_jobs(status=status)
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_delete(
+    document_id: int,
+) -> str:
+    """Permanently delete a document from the knowledge base.
+
+    Removes the document and all associated data (chunks, embeddings, tags,
+    stored files). This action cannot be undone.
+
+    Args:
+        document_id: The ID of the document to delete.
+    """
+    result = engine.delete_document(document_id)
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_upload_start(
+    filename: str,
+    total_size: int,
+    tags: list[str] | None = None,
+) -> str:
+    """Start a chunked file upload to the knowledge base.
+
+    Use this for uploading files from a remote agent. The upload process is:
+    1. Call kb_upload_start to get an upload_id
+    2. Call kb_upload_chunk repeatedly with base64-encoded file chunks (recommended ~1MB each)
+    3. Call kb_upload_finish to submit the file for ingestion
+
+    Example for a 3MB file:
+        upload = kb_upload_start(filename="report.pdf", total_size=3145728, tags=["project:x"])
+        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 0>", chunk_index=0)
+        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 1>", chunk_index=1)
+        kb_upload_chunk(upload_id=upload["upload_id"], data="<base64 chunk 2>", chunk_index=2)
+        result = kb_upload_finish(upload_id=upload["upload_id"])
+
+    Args:
+        filename: Original filename (used for type detection).
+        total_size: Total file size in bytes.
+        tags: Tags to apply to the uploaded document.
+    """
+    upload_id = uploads.start_upload(filename, total_size, tags or [])
+    return json.dumps({"upload_id": upload_id})
+
+
+@mcp.tool()
+async def kb_upload_chunk(
+    upload_id: str,
+    data: str,
+    chunk_index: int,
+) -> str:
+    """Upload a base64-encoded chunk of a file.
+
+    Part of the chunked upload flow started by kb_upload_start.
+
+    Args:
+        upload_id: The upload ID from kb_upload_start.
+        data: Base64-encoded file data for this chunk.
+        chunk_index: Zero-based index of this chunk.
+    """
+    try:
+        uploads.add_chunk(upload_id, data, chunk_index)
+        return json.dumps({"status": "ok", "chunk_index": chunk_index})
+    except (KeyError, ValueError) as e:  # unknown upload_id or malformed base64
+        return json.dumps({"error": str(e)})
+
+
+@mcp.tool()
+async def kb_upload_finish(
+    upload_id: str,
+) -> str:
+    """Finish a chunked upload and submit the file for ingestion.
+
+    Reassembles all uploaded chunks and forwards the complete file to the
+    engine for processing. Returns the ingestion job ID.
+
+    Args:
+        upload_id: The upload ID from kb_upload_start.
+    """
+    try:
+        filename, file_bytes, tags = uploads.finish_upload(upload_id)
+        result = engine.upload_file(filename, file_bytes, tags)
+        return json.dumps(result, indent=2)
+    except KeyError as e:
+        return json.dumps({"error": str(e)})
+
+
+# ---------------------------------------------------------------------------
+# Bulk operation tools
+# ---------------------------------------------------------------------------
+
+
+@mcp.tool()
+async def kb_bulk_delete(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    force: bool = False,
+) -> str:
+    """Permanently delete multiple documents matching a filter.
+
+    Removes matched documents and all associated data (chunks, embeddings, tags,
+    stored files). This action cannot be undone.
+
+    Selection filters combine with AND logic — at least one is required.
+
+    A safety threshold applies: if the operation would affect more than 70% of
+    all documents, it is rejected unless force=true.
+
+    Args:
+        document_ids: Delete documents with these specific IDs.
+        tags: Delete documents that have ALL of these tags (selection filter).
+        doc_type: Delete documents of this type (e.g. "note", "pdf").
+        from_id: Delete documents with id >= this value.
+        to_id: Delete documents with id <= this value.
+        force: Override the safety threshold if it would block the operation.
+    """
+    result = engine.bulk_delete(
+        document_ids=document_ids, tags=tags, doc_type=doc_type,
+        from_id=from_id, to_id=to_id, force=force,
+    )
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_bulk_tags(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    add: list[str] | None = None,
+    remove: list[str] | None = None,
+    force: bool = False,
+) -> str:
+    """Add and/or remove tags on multiple documents matching a filter.
+
+    Selection filters combine with AND logic — at least one is required.
+    Note: the 'tags' parameter is a SELECTION FILTER (which documents to target),
+    while 'add' and 'remove' specify the TAG CHANGES to apply to those documents.
+
+    Args:
+        document_ids: Target documents with these specific IDs.
+        tags: Target documents that have ALL of these tags (selection filter).
+        doc_type: Target documents of this type.
+        from_id: Target documents with id >= this value.
+        to_id: Target documents with id <= this value.
+        add: Tags to add to matched documents.
+        remove: Tags to remove from matched documents.
+        force: Override the safety threshold if it would block the operation.
+    """
+    result = engine.bulk_tags(
+        document_ids=document_ids, tags=tags, doc_type=doc_type,
+        from_id=from_id, to_id=to_id, add=add, remove=remove, force=force,
+    )
+    return json.dumps(result, indent=2)
+
+
+@mcp.tool()
+async def kb_bulk_set_tags(
+    document_ids: list[int] | None = None,
+    tags: list[str] | None = None,
+    doc_type: str | None = None,
+    from_id: int | None = None,
+    to_id: int | None = None,
+    new_tags: list[str] | None = None,
+    force: bool = False,
+) -> str:
+    """Replace all tags on multiple documents with a new set.
+
+    Removes ALL existing tags from matched documents, then applies the new tag set.
+    Selection filters combine with AND logic — at least one is required.
+    Note: the 'tags' parameter is a SELECTION FILTER (which documents to target),
+    while 'new_tags' is the REPLACEMENT tag set to apply.
+
+    Args:
+        document_ids: Target documents with these specific IDs.
+        tags: Target documents that have ALL of these tags (selection filter).
+        doc_type: Target documents of this type.
+        from_id: Target documents with id >= this value.
+        to_id: Target documents with id <= this value.
+        new_tags: The replacement tag set to apply to all matched documents.
+        force: Override the safety threshold if it would block the operation.
+    """
+    result = engine.bulk_set_tags(
+        document_ids=document_ids, tags=tags, doc_type=doc_type,
+        from_id=from_id, to_id=to_id, new_tags=new_tags, force=force,
+    )
+    return json.dumps(result, indent=2)
+
+
+# ---------------------------------------------------------------------------
+# Auth middleware
+# ---------------------------------------------------------------------------
+
+class BearerAuthMiddleware(BaseHTTPMiddleware):
+    async def dispatch(self, request: Request, call_next):
+        import hmac  # local import: constant-time comparison avoids leaking the key via timing
+        if not config.KB_MCP_API_KEY:
+            return await call_next(request)
+        scheme, _, token = request.headers.get("authorization", "").partition(" ")
+        if scheme == "Bearer" and hmac.compare_digest(token.encode(), config.KB_MCP_API_KEY.encode()):
+            return await call_next(request)
+
+        return JSONResponse(
+            status_code=401,
+            content={"error": "Unauthorized"},
+        )
+
+
+# ---------------------------------------------------------------------------
+# ASGI app assembly
+# ---------------------------------------------------------------------------
+
+def create_app():
+    """Create the ASGI app with auth middleware wrapping the MCP server."""
+    from contextlib import asynccontextmanager
+
+    mcp_app = mcp.streamable_http_app()
+
+    @asynccontextmanager
+    async def lifespan(app):
+        uploads.start_cleanup_task()
+        logger.info("Upload cleanup task started")
+        # Delegate to the MCP app's lifespan if it has one
+        if hasattr(mcp_app, 'router') and hasattr(mcp_app.router, 'lifespan_context'):
+            async with mcp_app.router.lifespan_context(app):
+                yield
+        else:
+            yield
+
+    app = Starlette(
+        routes=[Mount("/", app=mcp_app)],
+        middleware=[Middleware(BearerAuthMiddleware)],
+        lifespan=lifespan,
+    )
+    return app
+
+
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    import uvicorn
+
+    logger.info(
+        "Starting kb MCP server on port %d, engine=%s",
+        config.KB_MCP_PORT,
+        config.KB_ENGINE_URL,
+    )
+
+    app = create_app()
+    uvicorn.run(app, host="0.0.0.0", port=config.KB_MCP_PORT)
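
Review note on the chunked upload tools above: the `kb_upload_start` docstring describes splitting a file into roughly 1 MB base64 chunks, sending each with a zero-based `chunk_index`, and reassembling in index order. The sketch below shows the caller-side split and the matching reassembly as pure functions. `encode_chunks` and `reassemble` are hypothetical helpers, not part of the server; the real tool calls are omitted.

```python
import base64

CHUNK_SIZE = 1024 * 1024  # roughly 1 MB of raw bytes per chunk, per the docstring


def encode_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, str]]:
    """Split data into (chunk_index, base64 text) pairs suitable for kb_upload_chunk."""
    return [
        (offset // chunk_size,
         base64.b64encode(data[offset:offset + chunk_size]).decode("ascii"))
        for offset in range(0, len(data), chunk_size)
    ]


def reassemble(chunks: list[tuple[int, str]]) -> bytes:
    """Decode chunks and join them in index order, regardless of arrival order."""
    decoded = {index: base64.b64decode(b64) for index, b64 in chunks}
    return b"".join(decoded[index] for index in sorted(decoded))


payload_bytes = bytes(range(256)) * 10          # 2560 bytes of sample data
pieces = encode_chunks(payload_bytes, chunk_size=1000)
roundtrip = reassemble(list(reversed(pieces)))  # out-of-order arrival still works
```

Because chunks are keyed by index, resending a chunk simply overwrites the earlier copy, which mirrors how `add_chunk` stores one file per index.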
@@ -0,0 +1,96 @@
+"""Chunked upload staging management."""
+
+import asyncio
+import base64
+import logging
+import shutil
+import tempfile
+import time
+import uuid
+from dataclasses import dataclass, field
+from pathlib import Path
+
+logger = logging.getLogger("kb.mcp.uploads")
+
+UPLOAD_TIMEOUT_SECONDS = 600  # 10 minutes
+
+
+@dataclass
+class StagedUpload:
+    upload_id: str
+    filename: str
+    total_size: int
+    tags: list[str]
+    staging_dir: Path
+    created_at: float = field(default_factory=time.time)
+    chunks: dict[int, Path] = field(default_factory=dict)
+
+
+_uploads: dict[str, StagedUpload] = {}
+_cleanup_task: asyncio.Task | None = None
+
+
+def start_upload(filename: str, total_size: int, tags: list[str]) -> str:
+    upload_id = str(uuid.uuid4())
+    staging_dir = Path(tempfile.mkdtemp(prefix=f"kb_upload_{upload_id[:8]}_"))
+    _uploads[upload_id] = StagedUpload(
+        upload_id=upload_id,
+        filename=filename,
+        total_size=total_size,
+        tags=tags,
+        staging_dir=staging_dir,
+    )
+    logger.info("Started upload %s for %s (%d bytes)", upload_id, filename, total_size)
+    return upload_id
+
+
+def add_chunk(upload_id: str, data_b64: str, chunk_index: int) -> None:
+    upload = _uploads.get(upload_id)
+    if upload is None:
+        raise KeyError(f"Upload ID not found: {upload_id}")
+    chunk_bytes = base64.b64decode(data_b64)
+    chunk_path = upload.staging_dir / f"chunk_{chunk_index:06d}"
+    chunk_path.write_bytes(chunk_bytes)
+    upload.chunks[chunk_index] = chunk_path
+    logger.info("Added chunk %d to upload %s (%d bytes)", chunk_index, upload_id, len(chunk_bytes))
+
+
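+def finish_upload(upload_id: str) -> tuple[str, bytes, list[str]]:
+    """Reassemble chunks in index order and return (filename, file_bytes, tags).
+
+    The staging directory is removed whether or not reassembly succeeds.
+    """
+    upload = _uploads.get(upload_id)
+    if upload is None:
+        raise KeyError(f"Upload ID not found: {upload_id}")
+    try:
+        # Sort by chunk index so out-of-order arrival does not scramble the file.
+        parts = [upload.chunks[idx].read_bytes() for idx in sorted(upload.chunks)]
+        file_bytes = b"".join(parts)
+        if len(file_bytes) != upload.total_size:
+            # total_size is client-declared; a mismatch suggests a missing chunk.
+            logger.warning(
+                "Upload %s reassembled to %d bytes, expected %d",
+                upload_id, len(file_bytes), upload.total_size,
+            )
+        return upload.filename, file_bytes, upload.tags
+    finally:
+        _cleanup_upload(upload_id)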
+
+
+def _cleanup_upload(upload_id: str) -> None:
+    upload = _uploads.pop(upload_id, None)
+    if upload and upload.staging_dir.exists():
+        shutil.rmtree(upload.staging_dir, ignore_errors=True)
+
+
+async def cleanup_abandoned_uploads() -> None:
+    """Background task that removes uploads older than the timeout."""
+    while True:
+        await asyncio.sleep(60)
+        now = time.time()
+        expired = [
+            uid for uid, u in _uploads.items()
+            if now - u.created_at > UPLOAD_TIMEOUT_SECONDS
+        ]
+        for uid in expired:
+            logger.warning("Cleaning up abandoned upload %s", uid)
+            _cleanup_upload(uid)
+
+
+def start_cleanup_task() -> None:
+    global _cleanup_task
+    if _cleanup_task is None or _cleanup_task.done():
+        _cleanup_task = asyncio.create_task(cleanup_abandoned_uploads())
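
The staging module above expects a client to send base64-encoded chunks with explicit indices and reassembles them in index order. A minimal in-memory sketch of that round trip follows. The helper names `split_into_chunks` and `reassemble` are hypothetical and not part of the module, and the contiguity and size checks are stricter than what the module itself enforces.

```python
import base64


def split_into_chunks(data: bytes, chunk_size: int) -> list[tuple[int, str]]:
    """Split data into (chunk_index, base64_text) pairs (hypothetical helper)."""
    return [
        (index, base64.b64encode(data[offset:offset + chunk_size]).decode("ascii"))
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks: list[tuple[int, str]], total_size: int) -> bytes:
    """Join chunks in index order, rejecting gaps and size mismatches."""
    by_index = {index: base64.b64decode(text) for index, text in chunks}
    # Indices must be exactly 0..n-1, otherwise a chunk was lost in transit.
    if sorted(by_index) != list(range(len(by_index))):
        raise ValueError("chunk indices are not contiguous from 0")
    data = b"".join(by_index[index] for index in sorted(by_index))
    if len(data) != total_size:
        raise ValueError(f"expected {total_size} bytes, got {len(data)}")
    return data
```

Because reassembly sorts by index, chunks may arrive in any order; a gap in the indices is the case the size check alone would also catch, but only after reading every chunk.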
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-03-29
@@ -0,0 +1,69 @@
+## Context
+
+When a document is ingested, the worker chunks its content and stores each chunk's text in the `chunks` table. FTS5 triggers index that text, and the embedding model embeds it. The document title is stored only in `documents.title` — it never participates in search. This means short documents (or documents whose content lacks the title keywords) are invisible to queries that match the title.
+
+The reindex endpoint (`POST /api/v1/reindex`) currently reads `chunks.text` and re-embeds it. Any fix must apply consistently at both ingestion and reindex time.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Document titles are searchable via both FTS5 and vector search
+- Section header breadcrumbs (when present in chunk metadata) are also searchable
+- Search results continue to return the original chunk text (no title prefix in the `text` field returned to clients)
+- Existing documents become searchable by title after a `kb reindex`
+- No schema-breaking migration — additive column only
+
+**Non-Goals:**
+- Changing the chunking strategies themselves (note, markdown, code, docling)
+- Adding a separate title-search endpoint or client-side title filtering
+- Changing the search result JSON structure
+
+## Decisions
+
+### 1. Add an `enriched_text` column to the `chunks` table
+
+Store the title-prefixed text in a new `chunks.enriched_text` column alongside the existing `chunks.text`. The `text` column remains the raw chunk content (used for display in search results). The `enriched_text` column holds `"{title}\n\n{chunk_text}"`, or `"{title} > {section_header}\n\n{chunk_text}"` when the chunk metadata has a section header.
+
+**Why not just modify `chunks.text`?** The title would then appear in every search result's text field, which is redundant (title is already a separate field) and would confuse consumers that display results.
+
+**Why not reconstruct enriched text on-the-fly at search time?** FTS5 uses an external content table and triggers — it needs a real column to index. Reconstructing via JOIN at FTS query time would defeat the purpose of the FTS index.
+
+### 2. Point FTS5 at `enriched_text` instead of `text`
+
+Update the FTS5 virtual table definition and its sync triggers to index `enriched_text` rather than `text`. This is the core change that makes titles searchable via keyword search.
+
+Since FTS5 external content tables cannot be ALTERed, existing databases require a rebuild: drop and recreate `chunks_fts` and its triggers, then repopulate. This is handled as a schema migration in `init_schema`.
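
The external-content arrangement described in this decision can be sketched with Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The table layout here is a simplified assumption (the real `chunks` table has more columns), and `fts_search` is a hypothetical helper. The trigger names follow the ones listed in the tasks file.

```python
import sqlite3

SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    enriched_text TEXT
);
-- External-content FTS5 table: the index stores tokens, chunks stores the text.
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
"""


def fts_search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Match against enriched_text but return the raw chunk text."""
    rows = conn.execute(
        "SELECT c.text FROM chunks_fts "
        "JOIN chunks AS c ON c.id = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ?",
        (query,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
    ("Steve = 363", "Suitcase Locks\n\nSteve = 363"),
)
```

Searching for `suitcase locks` matches through the title prefix, while the caller still receives only the raw `text` value, which is the separation the design relies on.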
+
+### 3. Embed `enriched_text` instead of `text`
+
+At ingestion time, pass `enriched_text` values to `embed_texts()` instead of raw chunk text. At reindex time, read `enriched_text` from the database. This makes titles searchable via vector similarity too.
+
+### 4. Build enriched text in the worker, not in the ingest modules
+
+The enrichment format is: `"{title}\n\n{chunk_text}"` or `"{title} > {section_header}\n\n{chunk_text}"` when a section header exists in chunk metadata.
+
+This happens in `worker._process_job()` after chunking and before embedding/insertion. The ingest modules remain unchanged — they continue to return raw chunk text and metadata.
+
+### 5. Schema migration adds `enriched_text` and rebuilds FTS
+
+The `init_schema` function will:
+1. Add `enriched_text TEXT` column to `chunks` if missing
+2. Backfill `enriched_text` from existing data (join with `documents.title` and chunk metadata)
+3. Drop and recreate `chunks_fts` to index `enriched_text` instead of `text`
+4. Recreate the FTS sync triggers
+
+This is safe because the migration only runs when the column is missing (first startup after upgrade). The backfill uses a single UPDATE...FROM query.
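
The column check and backfill can be sketched as below. This is not the engine's `init_schema`: the function name `migrate_enriched_text`, the simplified table layout, and the assumption that `chunks.metadata` holds a JSON object are all illustrative. The sketch also backfills row by row in Python rather than with the single `UPDATE...FROM` statement described above, which keeps it independent of SQLite version and JSON functions. The FTS rebuild step is omitted.

```python
import json
import sqlite3


def migrate_enriched_text(conn: sqlite3.Connection) -> bool:
    """Add and backfill chunks.enriched_text; return True if work was done."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" in columns:
        return False  # Column exists, so the migration already ran.
    conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")
    rows = conn.execute(
        "SELECT c.id, c.text, c.metadata, d.title "
        "FROM chunks AS c JOIN documents AS d ON d.id = c.document_id"
    ).fetchall()
    for chunk_id, text, metadata, title in rows:
        meta = json.loads(metadata) if metadata else {}
        header = (meta or {}).get("section_header")
        prefix = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{prefix}\n\n{text}", chunk_id),
        )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    text TEXT NOT NULL,
    metadata TEXT
);
INSERT INTO documents VALUES (1, 'Suitcase Locks'), (2, 'DCG Lab Hardware');
INSERT INTO chunks VALUES (1, 1, 'Steve = 363', NULL);
INSERT INTO chunks VALUES
    (2, 2, 'MSI X870 Tomahawk', '{"section_header": "GRIMDAWN > motherboard"}');
""")
migrated = migrate_enriched_text(conn)
```

Keying the migration on the presence of the column makes a second startup a no-op, which matches the "Subsequent startup" behaviour the spec asks for.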
+
+## Risks / Trade-offs
+
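+**Larger database**: `enriched_text` repeats each chunk's full text and adds a copy of the title (and section header, when present), so chunk text is effectively stored twice in the `chunks` table. The title prefix itself is small for a typical KB with short titles; the duplicated body text accounts for most of the increase.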
+→ Acceptable for the search quality improvement.
+
+**FTS rebuild on upgrade** — First startup after upgrade will rebuild the FTS index, which takes a few seconds for large KBs.
+→ This is a one-time cost and happens automatically.
+
+**Embedding drift** — Existing vector embeddings won't include title context until `kb reindex` is run. The FTS backfill happens automatically, but vectors require an explicit reindex.
+→ Document this in release notes. The FTS improvement alone is a significant win even without reindexing vectors.
+
+**Title changes not propagated** — If a document's title were ever updated, `enriched_text` would be stale. Currently the engine has no title-update endpoint, so this is not a concern.
+→ No mitigation needed now. If title editing is added later, it should update enriched_text.
@@ -0,0 +1,28 @@
+## Why
+
+Short documents and notes are unsearchable when the user's query matches the document title but not the chunk content. For example, a document titled "Suitcase Locks" containing only "Steve = 1234 / Theresa = 4567" is invisible to both FTS and vector search for the query "suitcase locks". This is because chunk text — the only thing indexed and embedded — does not include the document title. This is a standard RAG deficiency that most pipelines solve by prepending title context to each chunk.
+
+## What Changes
+
+- **Prepend document title to chunk text at ingestion time**: Before embedding and FTS indexing, each chunk's text will be prefixed with the document title (e.g., `"Suitcase Locks\n\nSteve = 1234 / Theresa = 4567"`). This ensures the title participates in both full-text and semantic search.
+- **Include section header context in chunk text**: For chunks that have a `section_header` in their metadata, prepend the header breadcrumb too (e.g., `"DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk..."`). This improves search for queries that reference section names.
+- **Store the raw chunk text separately from the enriched text**: The original chunk text (without title prefix) must remain accessible so that search results don't display the prepended title redundantly — the title is already returned as a separate field.
+- **Reindex command must apply the same enrichment**: When `kb reindex` re-embeds all chunks, it must reconstruct the enriched text (title + section header + chunk text) from stored metadata.
+
+## Capabilities
+
+### New Capabilities
+- `chunk-enrichment`: Prepending document title and section context to chunk text before indexing and embedding, while preserving the original text for display.
+
+### Modified Capabilities
+- `engine-api`: The search endpoint's returned `text` field must continue to show the original chunk text (without the prepended title), so no visible API change, but the internal indexing behaviour changes. The reindex endpoint must apply enrichment consistently.
+
+## Impact
+
+- **Engine ingestion pipeline** (`worker.py`): The `_process_job` function must build enriched text from title + section headers + chunk text before passing to `embed_texts()` and `insert_chunk()`.
+- **Database schema** (`database.py`): Need to store both raw `text` (for display) and enriched `text` (for FTS/embedding), or reconstruct enriched text at index time. Simplest approach: store raw text in `chunks.text`, use enriched text only for FTS content and embedding vectors.
+- **FTS triggers** (`database.py`): The FTS5 external content table currently mirrors `chunks.text`. If we add an `enriched_text` column, the FTS index should be built from that instead.
+- **Reindex flow** (`worker.py` / `database.py`): Must reconstruct enriched text by joining chunk metadata with document title.
+- **Search result enrichment** (`routes/search.py`): No change needed — results already return `chunks.text` (raw) and `documents.title` separately.
+- **All four ingest modules** (`note.py`, `markdown.py`, `code.py`, `docling_pipeline.py`): No changes needed — enrichment happens after chunking, in the worker.
+- **Existing documents**: Require a `reindex` to benefit from the new enrichment. No data migration needed since the original text is preserved.
@@ -0,0 +1,75 @@
+## ADDED Requirements
+
+### Requirement: Chunk text enrichment with document title
+
+The engine SHALL prepend the document title to each chunk's text before FTS indexing and vector embedding. The enriched text SHALL be stored in a dedicated `enriched_text` column on the `chunks` table. The original chunk text SHALL remain in the `text` column for display purposes.
+
+The enrichment format SHALL be:
+- Without section header: `"{title}\n\n{chunk_text}"`
+- With section header: `"{title} > {section_header}\n\n{chunk_text}"`
+
+Where `section_header` is the value from the chunk's metadata `section_header` field, when present.
+
+#### Scenario: Note ingestion with title enrichment
+- **WHEN** a note titled "Suitcase Locks" with content "Steve = 363" is ingested
+- **THEN** the `chunks.text` column SHALL contain "Steve = 363" and the `chunks.enriched_text` column SHALL contain "Suitcase Locks\n\nSteve = 363"
+
+#### Scenario: Markdown chunk with section header enrichment
+- **WHEN** a markdown document titled "DCG Lab Hardware" produces a chunk with section_header "GRIMDAWN > motherboard" and text "MSI X870 Tomahawk"
+- **THEN** the `chunks.enriched_text` SHALL contain "DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk"
+
+#### Scenario: Chunk without section header
+- **WHEN** a document titled "Docker Tips" produces a chunk with no section_header in metadata and text "dbash() { docker exec -it $1 bash; }"
+- **THEN** the `chunks.enriched_text` SHALL contain "Docker Tips\n\ndbash() { docker exec -it $1 bash; }"
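
The two enrichment formats and the three scenarios above can be expressed as one small function. The name and signature follow the helper proposed in the tasks file; the body is a minimal sketch and may differ from the real implementation.

```python
from typing import Optional


def build_enriched_text(
    title: str, chunk_text: str, metadata: Optional[dict] = None
) -> str:
    """Prefix chunk_text with the title and, when present, the section header."""
    section_header = (metadata or {}).get("section_header")
    # An empty or missing header falls back to the title-only format.
    prefix = f"{title} > {section_header}" if section_header else title
    return f"{prefix}\n\n{chunk_text}"
```

Treating an empty `section_header` the same as a missing one avoids a dangling `" > "` in the prefix; the spec only says "when present", so that choice is an assumption.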
+
+---
+
+### Requirement: FTS5 indexes enriched text
+
+The FTS5 virtual table `chunks_fts` SHALL index the `enriched_text` column instead of the `text` column. All FTS sync triggers (insert, update, delete) SHALL operate on `enriched_text`.
+
+#### Scenario: FTS search matches document title
+- **WHEN** a user searches for "suitcase locks" and a document titled "Suitcase Locks" exists with chunk text "Steve = 363"
+- **THEN** the FTS5 search SHALL return that chunk as a match
+
+#### Scenario: FTS search still matches chunk content
+- **WHEN** a user searches for "MSI X870" and a chunk contains that text in its body
+- **THEN** the FTS5 search SHALL return that chunk as a match (enrichment does not break content matching)
+
+---
+
+### Requirement: Vector embeddings use enriched text
+
+The embedding model SHALL receive `enriched_text` (not raw `text`) when generating vectors during both initial ingestion and reindex operations.
+
+#### Scenario: Vector search matches document title
+- **WHEN** a user searches semantically for "luggage combination codes" and a document titled "Suitcase Locks" exists
+- **THEN** the vector search SHALL return that chunk with higher similarity than it would without title enrichment
+
+#### Scenario: Reindex uses enriched text
+- **WHEN** `POST /api/v1/reindex` is called
+- **THEN** the engine SHALL read `enriched_text` from the chunks table and embed that (not `text`)
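
A reindex pass that satisfies the scenario above might look roughly like this. The function `reindex`, its batching, the simplified `chunks` table, and the stand-in embedder are all assumptions for illustration; the spec names `embed_texts()` but not its signature, and the real engine writes vectors to its own store rather than returning a dict.

```python
import sqlite3
from typing import Callable

Embedder = Callable[[list[str]], list[list[float]]]


def reindex(
    conn: sqlite3.Connection, embed_texts: Embedder, batch_size: int = 64
) -> dict[int, list[float]]:
    """Re-embed every chunk from enriched_text, returning {chunk_id: vector}."""
    vectors: dict[int, list[float]] = {}
    cursor = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        ids = [row[0] for row in batch]
        texts = [row[1] for row in batch]
        # The embedder sees the title-prefixed text, never the raw text column.
        for chunk_id, vector in zip(ids, embed_texts(texts)):
            vectors[chunk_id] = vector
    return vectors


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in embedder: one dimension holding the text length."""
    return [[float(len(text))] for text in texts]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, enriched_text TEXT)"
)
conn.executemany(
    "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
    [
        ("Steve = 363", "Suitcase Locks\n\nSteve = 363"),
        ("MSI X870 Tomahawk", "DCG Lab Hardware\n\nMSI X870 Tomahawk"),
        ("tip", "Docker Tips\n\ntip"),
    ],
)
vectors = reindex(conn, fake_embed, batch_size=2)
```

Reading in batches keeps memory bounded on large KBs; the batch size shown is arbitrary.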
+
+---
+
+### Requirement: Schema migration adds enriched_text column
+
+On startup, `init_schema` SHALL add the `enriched_text` column to the `chunks` table if it does not exist. It SHALL then backfill `enriched_text` for all existing chunks by joining with `documents.title` and parsing chunk metadata for section headers. It SHALL rebuild the FTS5 table and triggers to index `enriched_text`.
+
+#### Scenario: First startup after upgrade
+- **WHEN** the engine starts and `chunks.enriched_text` column does not exist
+- **THEN** the engine SHALL add the column, backfill all rows, drop and recreate `chunks_fts` to index `enriched_text`, and recreate the FTS sync triggers
+
+#### Scenario: Subsequent startup
+- **WHEN** the engine starts and `chunks.enriched_text` column already exists
+- **THEN** the engine SHALL NOT perform any migration and SHALL start normally
+
+---
+
+### Requirement: Search results return raw text
+
+Search results SHALL continue to return the original chunk text (from `chunks.text`) in the `text` field, not the enriched text. The document title is already returned as a separate `title` field.
+
+#### Scenario: Search result text field
+- **WHEN** a search returns a chunk from document "Suitcase Locks" with raw text "Steve = 363"
+- **THEN** the result `text` field SHALL be "Steve = 363" (not "Suitcase Locks\n\nSteve = 363")
@@ -0,0 +1,31 @@
+## MODIFIED Requirements
+
+### Requirement: Background ingestion worker
+
+The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), build enriched text by prepending the document title (and section header when present) to each chunk's text, generate embeddings using the enriched text and the resident model, insert chunks (with both raw text and enriched text) and vectors into the database, and move the original file to persistent storage.
+
+#### Scenario: Successful PDF ingestion
+- **WHEN** the background worker picks up a queued PDF job
+- **THEN** it SHALL update the job status to `processing`, run Docling conversion and chunking, build enriched text for each chunk by prepending the document title, embed all chunks using enriched text, insert document and chunks into the database, move the staged file to `{data_dir}/documents/{content_hash}.pdf`, update `documents.stored_path` with the permanent path, store the original filename in `documents.original_filename`, update the job status to `done` with the resulting document_id and chunk count, and clean up the staging entry
+
+#### Scenario: Ingestion failure
+- **WHEN** the background worker encounters an error during processing (e.g., corrupt PDF)
+- **THEN** it SHALL update the job status to `failed` with the error message, delete the staged file, and continue processing the next queued job
+
+#### Scenario: Search during active ingestion
+- **WHEN** a search request arrives while the background worker is processing a job
+- **THEN** the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
+
+---
+
+### Requirement: Engine status and reindex
+
+The engine SHALL provide status information and support re-embedding all chunks. The `version` field in the status response SHALL always be present and SHALL reflect the engine's release version as read from the `VERSION` file. This field is the contract used by clients for compatibility checking.
+
+#### Scenario: Get engine status
+- **WHEN** a client sends `GET /api/v1/status`
+- **THEN** the engine SHALL return JSON with `version` (string, from VERSION file), model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
+
+#### Scenario: Trigger reindex
+- **WHEN** a client sends `POST /api/v1/reindex`
+- **THEN** the engine SHALL re-embed all existing chunks using the `enriched_text` column and the currently loaded model, and return progress information. This operation SHALL NOT block search queries.
@@ -0,0 +1,33 @@
+## 1. Schema Migration
+
+- [x] 1.1 Add `enriched_text TEXT` column to `chunks` table in `database.py:init_schema` (with migration check for existing DBs)
+- [x] 1.2 Write backfill query: `UPDATE chunks SET enriched_text = ... FROM documents` joining title and parsing chunk metadata for section_header
+- [x] 1.3 Drop and recreate `chunks_fts` virtual table to index `enriched_text` instead of `text`
+- [x] 1.4 Update FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
+
+## 2. Enrichment Helper
+
+- [x] 2.1 Create `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"` or `"{title}\n\n{chunk_text}"`
+
+## 3. Ingestion Pipeline
+
+- [x] 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
+- [x] 3.2 Pass enriched text to `embed_texts()` instead of raw chunk text
+- [x] 3.3 Pass enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
+- [x] 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
+
+## 4. Reindex
+
+- [x] 4.1 Update `routes/reindex.py` to read `enriched_text` from chunks table and embed that instead of `text`
+
+## 5. Search Results
+
+- [x] 5.1 Verify `search.py:_enrich()` returns `chunks.text` (raw) not `enriched_text` — no change expected, but confirm
+
+## 6. Testing
+
+- [x] 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
+- [x] 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
+- [x] 6.3 Test: verify search result `text` field does not contain the prepended title
+- [x] 6.4 Test: run `reindex`, verify enriched text is used for new embeddings
+- [x] 6.5 Test: verify schema migration backfills enriched_text for pre-existing chunks on startup
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-03-31
@@ -0,0 +1,52 @@
+## Context
+
+README.md currently serves as a single documentation file for both users and developers. It contains ~290 lines mixing installation/usage instructions with build-from-source steps, release scripts, Docker image internals, and developer notes (e.g., ROCm migration plans). There is no DEVELOPER.md or CONTRIBUTING.md file.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Separate user-facing documentation (README.md) from developer-facing documentation (DEVELOPER.md)
+- README.md should answer: "What is this? How do I install it? How do I use it?"
+- DEVELOPER.md should answer: "How do I build from source? How do I release? How do I contribute?"
+- Provide a clear cross-reference link between the two files
+
+**Non-Goals:**
+- Rewriting or improving documentation content itself (just moving it)
+- Creating additional docs files (CONTRIBUTING.md, architecture docs, etc.)
+- Changing any code, build scripts, or CI configuration
+
+## Decisions
+
+### 1. Single DEVELOPER.md file (not multiple docs files)
+
+All developer content goes into one top-level DEVELOPER.md rather than a `docs/` directory or separate CONTRIBUTING.md / BUILDING.md files. The total developer content is small enough (~80 lines) that splitting further would be unnecessary overhead. A single file at the repo root is immediately discoverable.
+
+**Alternative considered**: `docs/` directory with multiple files. Rejected because the content volume doesn't justify the structure, and root-level DEVELOPER.md is a well-known convention.
+
+### 2. Content split boundary
+
+Content stays in README.md if it's needed by someone who just wants to **run** kb. Content moves to DEVELOPER.md if it's only needed by someone who wants to **build, modify, or release** kb.
+
+Specifically moving to DEVELOPER.md:
+- "From source" subsections under both engine and client install
+- Entire "Building and releasing" section (release scripts, version checking, Docker image tags, registry overrides)
+- "Future: ROCm runtime migration" developer note
+
+Staying in README.md:
+- Architecture overview (helps users understand what they're running)
+- Pre-built image / release install instructions
+- Client configuration
+- Usage examples
+- Engine configuration table
+- Data portability
+- API reference
+- Claude Code skill reference
+
+### 3. Cross-reference approach
+
+A short note in README.md's Quick Start section pointing to DEVELOPER.md for building from source. No back-link needed from DEVELOPER.md since developers will naturally find README.md first.
+
+## Risks / Trade-offs
+
+- **[Stale cross-references]** If DEVELOPER.md sections are renamed, the link from README.md could break. Mitigation: link to the file, not to a specific anchor.
+- **[Discoverability]** Some users who want to build from source might miss DEVELOPER.md. Mitigation: explicit "See DEVELOPER.md" callout in the Quick Start section where "from source" instructions used to be.
@@ -0,0 +1,28 @@
+## Why
+
+README.md currently mixes user-facing content (what kb does, how to install and use it) with developer-facing content (building from source, releasing, Docker image internals, architecture deep-dives). Users looking for quick-start instructions have to scroll past release scripts and build commands. Developers looking for contribution/build info have to hunt through user docs. Splitting these into README.md (users) and DEVELOPER.md (developers/contributors) follows standard open-source convention and makes both audiences' experience cleaner.
+
+## What Changes
+
+- **Trim README.md** to focus on user-facing content: what kb is, how to install (from pre-built images/releases), how to configure, how to use, engine configuration reference, data portability, and API reference.
+- **Remove "from source" build instructions** from README.md (both engine and client sections).
+- **Remove "Building and releasing" section** from README.md entirely.
+- **Remove "Future: ROCm runtime migration"** developer note from README.md.
+- **Create DEVELOPER.md** containing: building engine from source, building client from source, release process (client and engine), Docker image details, version checking, ROCm migration notes, and any other contributor-oriented content.
+- **Add a link** from README.md to DEVELOPER.md for developers who want to build from source or contribute.
+
+## Capabilities
+
+### New Capabilities
+- `developer-docs`: Developer-facing documentation covering building from source, releasing, and contributing.
+
+### Modified Capabilities
+
+(none - no spec-level behavior changes, this is a documentation restructuring)
+
+## Impact
+
+- **Files modified**: `README.md` (trimmed)
+- **Files created**: `DEVELOPER.md` (new)
+- **No code changes**: purely documentation restructuring
+- **No API changes**: no functional impact
@@ -0,0 +1,63 @@
+## ADDED Requirements
+
+### Requirement: DEVELOPER.md exists at repo root
+The repository SHALL have a `DEVELOPER.md` file at the project root containing all developer-facing documentation.
+
+#### Scenario: File exists
+- **WHEN** a developer navigates to the repository root
+- **THEN** a `DEVELOPER.md` file SHALL be present
+
+### Requirement: DEVELOPER.md contains build-from-source instructions
+DEVELOPER.md SHALL contain instructions for building both the engine and client from source.
+
+#### Scenario: Engine build from source
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include instructions for starting the engine from source using compose files (both NVIDIA and ROCm)
+
+#### Scenario: Client build from source
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include instructions for building the client binary from source using `make build` and `make all`
+
+### Requirement: DEVELOPER.md contains release process
+DEVELOPER.md SHALL document the release process for both client and engine, including release scripts, version bumping, and Docker image tagging.
+
+#### Scenario: Client release documentation
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include `release-client.sh` usage with flag options (--gitea, --github, --minor, --no-increment, --dry-run)
+
+#### Scenario: Engine release documentation
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include `release-engine.sh` usage with flag options and Docker image tag conventions
+
+#### Scenario: Version checking
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include how to check client and engine versions
+
+### Requirement: DEVELOPER.md contains developer notes
+DEVELOPER.md SHALL include any forward-looking developer notes such as migration plans or technical debt items.
+
+#### Scenario: ROCm migration note
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include the ROCm runtime migration note about onnxruntime and MIGraphX
+
+### Requirement: README.md excludes developer-only content
+README.md SHALL NOT contain build-from-source instructions, release processes, or developer-only notes.
+
+#### Scenario: No from-source build steps in README
+- **WHEN** a user reads README.md
+- **THEN** there SHALL be no "From source" subsections under engine or client installation
+
+#### Scenario: No release section in README
+- **WHEN** a user reads README.md
+- **THEN** there SHALL be no "Building and releasing" section
+
+#### Scenario: No developer notes in README
+- **WHEN** a user reads README.md
+- **THEN** there SHALL be no "Future: ROCm runtime migration" section
+
+### Requirement: README.md cross-references DEVELOPER.md
+README.md SHALL include a link to DEVELOPER.md for users who want to build from source or contribute.
+
+#### Scenario: Developer link in quick start
+- **WHEN** a user reads the Quick Start section of README.md
+- **THEN** there SHALL be a note pointing to DEVELOPER.md for building from source
@@ -0,0 +1,17 @@
+## 1. Create DEVELOPER.md
+
+- [x] 1.1 Create DEVELOPER.md at repo root with engine build-from-source instructions (compose.nvidia.yaml and compose.rocm.yaml)
+- [x] 1.2 Add client build-from-source instructions (make build, make all)
+- [x] 1.3 Add "Building and releasing" section: release-client.sh and release-engine.sh usage with all flag options
+- [x] 1.4 Add version checking instructions (kb --version, curl status endpoint)
+- [x] 1.5 Add Docker image tag conventions and registry override documentation
+- [x] 1.6 Add "Future: ROCm runtime migration" developer note
+
+## 2. Trim README.md
+
+- [x] 2.1 Remove "From source (for development)" subsection under engine quick start
+- [x] 2.2 Remove "From source (for development)" subsection under client installation
+- [x] 2.3 Remove entire "Building and releasing" section
+- [x] 2.4 Remove "Future: ROCm runtime migration" section
+- [x] 2.5 Add cross-reference note to DEVELOPER.md in the Quick Start section for building from source
+- [x] 2.6 Move API reference section from README.md to DEVELOPER.md
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-03-31
@@ -0,0 +1,51 @@
+## Context
+
+The kb client currently overloads the root Cobra command to handle both command dispatch and implicit note ingestion. Any unrecognized multi-word input is silently submitted as a note via `POST /api/v1/jobs`. This was introduced to reduce friction for note-taking but has proven error-prone — typos in commands create unwanted notes. A single-word guard was added but multi-word typos still slip through.
+
+The root command has: custom `ArbitraryArgs` validation, a `RunE` with arg-count branching, a `--tags` flag for the note shorthand, a custom usage template with `isRootCmd` template function, and `submitNote()` living in `add.go`.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Eliminate accidental note creation from mistyped commands
+- Provide a clean, explicit `addnote` command that pairs with existing `addfile`
+- Revert root command to standard Cobra behaviour (no custom args, no custom template)
+- Keep the same API contract — `POST /api/v1/jobs` with `note` field unchanged
+
+**Non-Goals:**
+- Changing the engine API
+- Modifying `addfile` behaviour
+- Adding new content types (url, bookmark, etc.)
+- Backward compatibility shim for `kb "text"` syntax
+
+## Decisions
+
+### 1. New `addnote` command in its own file
+
+Create `client/cmd/addnote.go` with a `cobra.Command` that takes `ExactArgs(1)` — a single quoted string. This mirrors `addfile` which also takes `ExactArgs(1)`.
+
+**Rationale**: Keeps each command in its own file (consistent with the existing pattern). `ExactArgs(1)` means the user must quote multi-word notes, which is unambiguous and avoids the flag-parsing edge cases that plagued the implicit shorthand.
+
+**Alternative considered**: Joining `ArbitraryArgs` like the old shorthand. Rejected — this is exactly the ambiguity we're removing.
+
+### 2. Move `submitNote()` from `add.go` to `addnote.go`
+
+The function is only used by the addnote command, so it belongs in the same file.
+
+**Rationale**: `add.go` becomes purely about file operations (it already is, aside from hosting `submitNote()`). Clean separation.
+
+### 3. Fully revert root command to Cobra defaults
+
+Remove: `ArbitraryArgs`, custom `RunE` (replace with nil — Cobra shows help by default), `--tags` flag on root, custom usage template, `isRootCmd` template function.
+
+**Rationale**: The root command should do one thing — dispatch to subcommands. All the custom logic was there to support the implicit shorthand which is being removed.
+
+### 4. `addnote` gets its own `--tags` flag
+
+The `--tags` flag moves from the root command to `addnote`, matching how `addfile` already has its own `--tags` flag.
+
+## Risks / Trade-offs
+
+- **Breaking change for existing users** → Mitigated by clear error messaging. If someone types `kb "some text"`, Cobra will say "unknown command". The `examples` command will show the new syntax.
+- **Slightly more typing for notes** (`kb addnote "text"` vs `kb "text"`) → Acceptable trade-off for eliminating accidental ingestion. Tab-completion helps.
+- **Scripts using old syntax will break** → This is intentional. The old syntax was a foot-gun.
@@ -0,0 +1,32 @@
+## Why
+
+The implicit note shorthand (`kb "some text"`) makes it too easy to accidentally add notes when mistyping commands. Despite the single-word guard, any multi-word typo (e.g. `kb lisst --type pdf`) silently creates a note. The root command doing double-duty as both command dispatcher and note ingester undermines user trust. Reverting to explicit, structured add commands eliminates accidental ingestion and gives every content type a clear, discoverable verb.
+
+## What Changes
+
+- **New `addnote` command**: `kb addnote <text>` takes a single quoted positional argument and submits it as a note. Supports `--tags`. The `submitNote()` logic moves from `add.go` to a new `addnote.go` command file.
+- **Remove implicit note shorthand**: The root command reverts to standard Cobra behaviour — no `ArbitraryArgs`, no special arg-count logic, no `--tags` flag on root. Unknown input gets Cobra's default "unknown command" error.
+- **Remove custom usage template**: The root command no longer needs the `isRootCmd` template logic. Standard Cobra usage template for all commands.
+- **Update examples**: `examples.go` updated to show `kb addnote` instead of bare `kb "text"`.
+- **Update tests**: Remove implicit note shorthand tests, add `addnote` command tests.
+- **`addfile` unchanged**: Stays exactly as-is.
+- **BREAKING**: `kb "note text"` no longer works. Users must use `kb addnote "note text"`.
+
+## Capabilities
+
+### New Capabilities
+
+_(none)_
+
+### Modified Capabilities
+
+- `go-client`: The "Implicit note shorthand" requirement is removed entirely and replaced by a new "Add note command" requirement. The "Add command (file and note ingestion)" requirement description is updated to reflect `addnote` / `addfile` as the two ingestion commands. The root command reverts to standard Cobra behaviour with no custom arg handling or usage template.
+
+## Impact
+
+- `client/cmd/root.go` — remove `ArbitraryArgs`, `RunE` note logic, `--tags` flag, custom usage template, `isRootCmd` template func
+- `client/cmd/add.go` — `submitNote()` function moves to new `addnote.go` (or stays in `add.go` alongside `addfile` — design decision)
+- `client/cmd/addnote.go` — new file defining the `addnote` command
+- `client/cmd/examples.go` — update example text
+- `client/cmd/root_test.go` — remove implicit note shorthand tests, add standard Cobra behaviour tests
+- No engine changes — the API contract (`POST /api/v1/jobs` with `note` field) is unchanged
@@ -0,0 +1,87 @@
+## ADDED Requirements
+
+### Requirement: Add note command
+
+The client SHALL provide a `kb addnote <text>` command that submits a text note to the engine for ingestion. The command SHALL take exactly one positional argument (the note text) and support a `--tags` flag for comma-separated tags. The note SHALL be submitted via `POST /api/v1/jobs` with the `note` field in a multipart request.
+
+#### Scenario: Add a note
+- **WHEN** the user runs `kb addnote "remember to update DNS records"`
+- **THEN** the client SHALL submit the text as a note via `POST /api/v1/jobs` and print `Queued: note`
+
+#### Scenario: Add a note with tags
+- **WHEN** the user runs `kb addnote "server room is building 3" --tags ops`
+- **THEN** the client SHALL submit the note with the specified tags
+
+#### Scenario: Add a note with JSON output
+- **WHEN** the user runs `kb addnote "my note" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Duplicate note detection
+- **WHEN** the user runs `kb addnote "my note"` and the engine returns HTTP 409
+- **THEN** the client SHALL display the duplicate information (document ID or job ID) and exit with code 0
+
+#### Scenario: Missing argument
+- **WHEN** the user runs `kb addnote` with no arguments
+- **THEN** the client SHALL display an error indicating that the note text argument is required
+
+#### Scenario: Too many arguments
+- **WHEN** the user runs `kb addnote remember to update dns` (unquoted, multiple args)
+- **THEN** the client SHALL display an error indicating that exactly one argument is required, with a hint to quote the text
+
+## MODIFIED Requirements
+
+### Requirement: Add command (file and note ingestion)
+
+The client SHALL provide a `kb addfile` command that uploads files to the engine for async ingestion. The command SHALL validate file extensions before uploading and reject unsupported types. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document information. Notes are handled by the separate `addnote` command — `addfile` is exclusively for file uploads.
+
+#### Scenario: Add a single file
+- **WHEN** the user runs `kb addfile report.pdf`
+- **THEN** the client SHALL validate the file extension, upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
+
+#### Scenario: Add a file with tags
+- **WHEN** the user runs `kb addfile manual.pdf --tags car,maintenance`
+- **THEN** the client SHALL include the tags in the multipart upload metadata
+
+#### Scenario: Add a directory recursively
+- **WHEN** the user runs `kb addfile ~/documents/ --recursive`
+- **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
+
+#### Scenario: Unsupported file extension
+- **WHEN** the user runs `kb addfile photo.jpg`
+- **THEN** the client SHALL print an error listing supported extensions and exit with a non-zero code without making any API call
+
+#### Scenario: Duplicate file rejected (already ingested)
+- **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "document_id": 42, "title": "report.pdf"}`
+- **THEN** the client SHALL print "Already imported: report.pdf (doc ID: 42)" and exit with code 0
+
+#### Scenario: Duplicate file rejected (in-flight job)
+- **WHEN** the user runs `kb addfile report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "job_id": 7, "title": "report.pdf"}`
+- **THEN** the client SHALL print "Already queued: report.pdf (job ID: 7)" and exit with code 0
+
+#### Scenario: Duplicate file in recursive add
+- **WHEN** the user runs `kb addfile ~/documents/ --recursive` and some files are rejected as duplicates
+- **THEN** the client SHALL print the duplicate message for each rejected file, continue uploading remaining files, and include a summary (e.g., "Queued: 5 files, 2 duplicates skipped")
+
+#### Scenario: Duplicate with JSON output
+- **WHEN** the user runs `kb addfile report.pdf --format json` and the engine returns HTTP 409
+- **THEN** the client SHALL output the raw JSON response from the engine including the document_id and title
+
+#### Scenario: Add with JSON output
+- **WHEN** the user runs `kb addfile report.pdf --format json`
+- **THEN** the client SHALL output the JSON response from the engine including the job_id
+
+#### Scenario: File not found
+- **WHEN** the user runs `kb addfile nonexistent.pdf`
+- **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
+
+#### Scenario: Upload failure
+- **WHEN** the upload fails (network error, engine returns 4xx/5xx other than 409)
+- **THEN** the client SHALL print the error and exit with a non-zero code
+
+## REMOVED Requirements
+
+### Requirement: Implicit note shorthand
+
+**Reason**: The implicit shorthand caused accidental note creation from mistyped commands. Any unrecognized multi-word input was silently ingested as a note. Replaced by the explicit `addnote` command.
+
+**Migration**: Replace `kb "note text"` with `kb addnote "note text"`. Replace `kb "note text" --tags foo` with `kb addnote "note text" --tags foo`.
@@ -0,0 +1,29 @@
+## 1. Create addnote command
+
+- [x] 1.1 Create `client/cmd/addnote.go` with `addnoteCmd` using `ExactArgs(1)`, `--tags` flag, and `RunE` calling `submitNote()`
+- [x] 1.2 Move `submitNote()` function from `client/cmd/add.go` to `client/cmd/addnote.go`
+
+## 2. Revert root command to standard Cobra behaviour
+
+- [x] 2.1 Remove `ArbitraryArgs`, custom `RunE` logic, and `--tags` flag from root command in `client/cmd/root.go`
+- [x] 2.2 Remove custom usage template and `isRootCmd` template function — let Cobra use its default template
+- [x] 2.3 Set root command to show help when called with no args (standard Cobra `RunE` returning `cmd.Help()` or nil)
+
+## 3. Update examples and help text
+
+- [x] 3.1 Update `client/cmd/examples.go` to show `kb addnote` syntax instead of `kb "text"` shorthand
+- [x] 3.2 Update root command `Long` description to remove reference to note shorthand
+
+## 4. Update tests
+
+- [x] 4.1 Remove implicit note shorthand tests from `client/cmd/root_test.go` (`TestRootCmd_SingleWordRejected`, `TestRootCmd_MultipleWordsNotRejected`)
+- [x] 4.2 Add test for `addnote` command (verify it wires up correctly, takes exactly one arg)
+- [x] 4.3 Add test that root command with unknown args returns an error (standard Cobra behaviour)
+- [x] 4.4 Verify `addfile` tests still pass (no changes expected)
+
+## 5. Build and verify
+
+- [x] 5.1 Run `go build` and verify all commands appear in `kb --help`
+- [x] 5.2 Run `go test ./...` and verify all tests pass
+- [x] 5.3 Verify `kb addnote --help` shows correct usage line and flags
+- [x] 5.4 Verify `kb addfile --help` is unchanged
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-02
@@ -0,0 +1,41 @@
+## Context
+
+The engine's `/api/v1/search` endpoint returns flat result objects:
+
+```json
+{
+  "chunk_id": 123,
+  "score": 0.031,
+  "text": "...",
+  "chunk_index": 3,
+  "chunk_metadata": {"page": 12, "section_header": "Installation"},
+  "title": "Git Admin Guide",
+  "doc_type": "pdf",
+  "source_path": "/home/user/docs/git-admin.pdf",
+  "created_at": "2026-03-15T10:30:00",
+  "tags": ["git", "admin"]
+}
+```
+
+The Go client's human-mode struct in `client/cmd/search.go` incorrectly expects a nested `document` object and top-level `page`/`section` fields. This causes all metadata to display as zero values.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Fix the search result struct to match the flat engine response
+- Extract `page` and `section_header` from `chunk_metadata` for human display
+- Maintain identical JSON output (already passes through raw response)
+
+**Non-Goals:**
+- Changing the engine API response format
+- Adding new display fields beyond what was originally intended
+
+## Decisions
+
+**Flatten the struct to match API response.** The result struct will have `Title`, `DocType`, `Tags` as top-level fields (matching `title`, `doc_type`, `tags` JSON keys). `ChunkMetadata` will be decoded as `map[string]interface{}` to extract `page` and `section_header` dynamically, since its contents vary by document type.
+
+**Why not a typed ChunkMetadata struct?** The metadata keys depend on the ingestion pipeline (PDFs have `page`, markdown has `section_header`, code may have others in future). A map is more resilient to engine-side additions.
+
+## Risks / Trade-offs
+
+- [Minimal risk] If the engine adds new top-level fields, the Go struct silently ignores them — this is existing behavior and acceptable for human-mode display.
@@ -0,0 +1,24 @@
+## Why
+
+The Go client's human-mode search output struct expects a nested `document` object and top-level `page`/`section` fields, but the engine API returns flat results with `title`, `doc_type`, `tags` at the result level and `page`/`section_header` inside `chunk_metadata`. This means human-mode display shows empty values for title, type, tags, page, and section.
+
+## What Changes
+
+- Fix the Go client search result struct to match the flat engine API response format
+- Extract `page` and `section_header` from the `chunk_metadata` map instead of expecting them as top-level fields
+- Human-mode output will correctly display document title, type, tags, page number, and section header
+
+## Capabilities
+
+### New Capabilities
+
+(none)
+
+### Modified Capabilities
+
+- `go-client`: Fix search result parsing to match actual engine API response shape
+
+## Impact
+
+- `client/cmd/search.go` — struct definition and display logic
+- No API changes, no breaking changes — this is a bug fix aligning the client with the existing API contract
@@ -0,0 +1,40 @@
+## MODIFIED Requirements
+
+### Requirement: Search command
+
+The client SHALL provide a `kb search <query>` command that sends the query to the engine and displays results.
+
+#### Scenario: Human-readable search output
+- **WHEN** the user runs `kb search "how to change oil"`
+- **THEN** the client SHALL POST to `/api/v1/search`, and display results in a human-readable format showing rank, score, document title, page/section, doc type, tags, and a text snippet
+- **THEN** the client SHALL parse search results as flat objects with top-level `title`, `doc_type`, `tags`, `score`, `text`, `chunk_index` fields
+- **THEN** the client SHALL extract `page` from `chunk_metadata` when present (PDF documents)
+- **THEN** the client SHALL extract `section_header` from `chunk_metadata` when present (markdown documents)
+
+#### Scenario: JSON search output
+- **WHEN** the user runs `kb search "query" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Search with filters
+- **WHEN** the user runs `kb search "brakes" --tags maintenance --type pdf --top 3`
+- **THEN** the client SHALL include the filters in the API request body
+
+#### Scenario: Search mode flags
+- **WHEN** the user runs `kb search "error" --fts-only`
+- **THEN** the client SHALL set `fts_only: true` in the request body
+
+#### Scenario: PDF result with page number
+- **WHEN** a search result has `chunk_metadata` containing `{"page": 12}`
+- **THEN** the human output SHALL display "Page 12" in the location line
+
+#### Scenario: Markdown result with section header
+- **WHEN** a search result has `chunk_metadata` containing `{"section_header": "Installation > Prerequisites"}`
+- **THEN** the human output SHALL display "Installation > Prerequisites" in the location line
+
+#### Scenario: Result with both page and section
+- **WHEN** a search result has `chunk_metadata` containing both `page` and `section_header`
+- **THEN** the human output SHALL display both separated by " / "
+
+#### Scenario: Result with no location metadata
+- **WHEN** a search result has empty `chunk_metadata` or no page/section keys
+- **THEN** the human output SHALL omit the location line entirely
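The four location scenarios above reduce to one small rule. The client itself is written in Go; the sketch below only illustrates the rule in Python. The function name `location_line` is hypothetical, and putting the page before the section header is an assumption, since the spec fixes the separator but not the order.

```python
def location_line(chunk_metadata):
    """Build the human-mode location line, or None when there is nothing to show."""
    meta = chunk_metadata or {}
    parts = []

    page = meta.get("page")
    if page is not None:
        # JSON numbers can decode as floats (12.0); show whole pages as integers.
        if isinstance(page, float) and page.is_integer():
            page = int(page)
        parts.append(f"Page {page}")

    section = meta.get("section_header")
    if section:
        parts.append(str(section))

    # Assumption: page comes first when both keys are present.
    return " / ".join(parts) if parts else None


both = location_line({"page": 12, "section_header": "Installation"})
print(both)
```

Returning `None` rather than an empty string lets the caller skip the location line entirely, which matches the "no location metadata" scenario.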
@@ -0,0 +1,14 @@
+## 1. Fix search result struct
+
+- [x] 1.1 Replace nested `Document` struct with flat fields (`Title`, `DocType`, `Tags`) matching engine JSON keys
+- [x] 1.2 Add `ChunkMetadata map[string]interface{}` field to capture `chunk_metadata`
+
+## 2. Fix display logic
+
+- [x] 2.1 Update title/type/tags references in the display loop to use the new flat fields
+- [x] 2.2 Extract `page` from `ChunkMetadata` map (replacing top-level `Page` field)
+- [x] 2.3 Extract `section_header` from `ChunkMetadata` map (replacing top-level `Section` field)
+
+## 3. Verify
+
+- [x] 3.1 Build the client and verify it compiles cleanly
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-04
@@ -0,0 +1,194 @@
+## Context
+
+The engine API (`engine/kb/routes/`) provides single-document operations for delete (`DELETE /api/v1/documents/{id}`) and tag management (`PUT /api/v1/documents/{id}/tags`). The MCP server (`mcp/server.py`) wraps these and adds a "collection" abstraction via `collection:`-prefixed tags — ~70 lines of helpers and translation logic that only the MCP layer understands.
+
+The database is SQLite with WAL mode, FTS5 for full-text search, and sqlite-vec for embeddings. Foreign keys with `ON DELETE CASCADE` handle chunk cleanup when documents are deleted. Stored files on disk must be cleaned up separately.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Bulk delete, bulk tag add/remove, and bulk set-tags (replace) via engine API, MCP tools, and CLI
+- Filter-based selection: by tag, doc_type, ID list, and ID range
+- Safety threshold to prevent accidental mass operations
+- Audit trail via jobs table
+- Remove collection abstraction from MCP server
+
+**Non-Goals:**
+- Async/queued bulk operations (SQLite handles thousands of rows synchronously in <1s)
+- Bulk document retrieval or bulk note creation
+- Undo/recycle bin for bulk deletes
+- Adding collection concept to engine or CLI (collections are being removed, not moved)
+
+## Decisions
+
+### 1. Common selection filter for all bulk endpoints
+
+All three bulk endpoints accept the same selection body:
+
+```json
+{
+  "document_ids": [1, 5, 12],
+  "tags": ["agent:mybot", "draft"],
+  "doc_type": "note",
+  "from_id": 10,
+  "to_id": 50
+}
+```
+
+Filters combine with AND logic. At least one filter is required — the engine rejects requests with no selection criteria (400).
+
+**Selection SQL generation**: A shared helper in `database.py` builds the WHERE clause from the filter. The `tags` filter uses the same JOIN pattern as `list_documents` (all specified tags must match). The `document_ids` filter uses `IN (?, ?, ...)` with one bound placeholder per ID. The `from_id`/`to_id` filter uses `id >= ?` and/or `id <= ?`, depending on which bounds are supplied.
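A minimal sketch of such a WHERE-clause builder, under stated assumptions: the helper name `build_selection`, the `documents(id, doc_type)` and `document_tags(document_id, tag)` table shapes, and the subquery used for the "all tags must match" rule are illustrative stand-ins, since the real `list_documents` JOIN pattern and schema are not shown here. Empty lists are treated as "filter not supplied", which is also an assumption.

```python
import sqlite3


def build_selection(filters):
    """Return (where_sql, params) for a bulk selection filter (AND logic)."""
    clauses, params = [], []

    ids = filters.get("document_ids")
    if ids:
        # One placeholder per ID; a single "?" cannot bind a list.
        clauses.append(f"d.id IN ({', '.join('?' for _ in ids)})")
        params.extend(ids)

    tags = filters.get("tags")
    if tags:
        # A document matches only if it carries every requested tag.
        clauses.append(
            "d.id IN (SELECT document_id FROM document_tags "
            f"WHERE tag IN ({', '.join('?' for _ in tags)}) "
            "GROUP BY document_id HAVING COUNT(DISTINCT tag) = ?)"
        )
        params.extend(tags)
        params.append(len(set(tags)))

    if filters.get("doc_type") is not None:
        clauses.append("d.doc_type = ?")
        params.append(filters["doc_type"])
    if filters.get("from_id") is not None:
        clauses.append("d.id >= ?")
        params.append(filters["from_id"])
    if filters.get("to_id") is not None:
        clauses.append("d.id <= ?")
        params.append(filters["to_id"])

    if not clauses:
        # The endpoint would translate this into 400 Bad Request.
        raise ValueError("at least one selection filter is required")
    return " AND ".join(clauses), params


# Demo against a throwaway in-memory schema (hypothetical, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT);
    CREATE TABLE document_tags (document_id INTEGER, tag TEXT);
    INSERT INTO documents VALUES (1, 'note'), (2, 'note'), (3, 'pdf');
    INSERT INTO document_tags VALUES (1, 'draft'), (1, 'old'), (2, 'draft'), (3, 'draft');
""")
where, params = build_selection({"tags": ["draft", "old"], "doc_type": "note"})
matched = [r[0] for r in conn.execute(
    f"SELECT d.id FROM documents d WHERE {where} ORDER BY d.id", params)]
print(matched)
```

Only values are bound through placeholders; the SQL text itself is assembled from fixed fragments, so user input never reaches the statement string.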
+
+**Alternative considered**: Separate endpoints per filter type. Rejected — combinable filters are more powerful and the SQL generation is straightforward.
+
+### 2. Safety threshold with configurable percentage
+
+Before executing, the engine counts matched documents and total documents. If `matched / total * 100 > threshold`, the request is rejected:
+
+```
+HTTP 409 Conflict
+{
+  "error": "safety_threshold_exceeded",
+  "message": "Operation would affect 750 of 1000 documents (75.0%). Exceeds safety threshold of 70%. Use force: true to proceed.",
+  "matched": 750,
+  "total": 1000,
+  "percent": 75.0,
+  "threshold": 70
+}
+```
+
+- Default threshold: 70% (env var `KB_BULK_SAFETY_PERCENT`, integer 0-100)
+- Override per-request: `"force": true` in the request body
+- Threshold of 0 effectively disables the safety check
+- CLI maps this to `--force` / `-f` flag
+
+The check is a SELECT COUNT before the operation — minimal overhead.
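The threshold rule can be sketched as a pure function. This is an illustration, not the engine's code: the name `check_safety_threshold` and the choice to return a payload instead of raising an HTTP error are assumptions. Note that a threshold of 0 has to be special-cased to disable the check, because a literal 0% limit would otherwise reject every non-empty operation.

```python
def check_safety_threshold(matched, total, threshold=70, force=False):
    """Return None if the operation may proceed, else a 409-style error payload."""
    # force overrides the check; threshold 0 disables it; an empty DB has nothing to protect.
    if force or threshold == 0 or total == 0:
        return None

    percent = matched * 100 / total
    if percent <= threshold:
        return None

    return {
        "error": "safety_threshold_exceeded",
        "message": (
            f"Operation would affect {matched} of {total} documents "
            f"({percent:.1f}%). Exceeds safety threshold of {threshold}%. "
            "Use force: true to proceed."
        ),
        "matched": matched,
        "total": total,
        "percent": round(percent, 1),
        "threshold": threshold,
    }


rejected = check_safety_threshold(750, 1000)
print(rejected["message"])
```

An operation that lands exactly on the threshold is allowed in this sketch, following the "exceeds" wording.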
+
+**Alternative considered**: Dry-run mode (preview what would be affected, then confirm). Rejected — adds a two-step flow that doesn't help LLM callers (they'd just always confirm) and the safety threshold covers the dangerous case.
+
+### 3. Synchronous execution with audit logging
+
+Bulk operations execute synchronously and return a summary response:
+
+```json
+{
+  "job_id": 42,
+  "status": "partial_failure",
+  "matched": 750,
+  "succeeded": 748,
+  "failed": 2,
+  "errors": [
+    {"document_id": 42, "error": "file locked"},
+    {"document_id": 99, "error": "not found"}
+  ]
+}
+```
+
+A job record is created in the `jobs` table with a new job type (`bulk_delete`, `bulk_tags`, or `bulk_set_tags`). This requires extending the jobs table:
+
+- Add `job_type` column: `"ingest"` (default, for existing jobs) or `"bulk_delete"` / `"bulk_tags"` / `"bulk_set_tags"`
+- The job's `filename` field stores a JSON summary of the selection filter for auditability
+- `document_id` field stores the count of affected documents
+- `error` field stores JSON array of individual errors if any
+
+**Alternative considered**: Full async with job polling. Rejected — SQLite bulk operations are fast enough synchronously and async would require extra polling calls (defeating the purpose of reducing token usage).
+
+### 4. Bulk delete implementation
+
+For each matched document:
+1. Collect chunk IDs
+2. Delete embeddings from `chunks_vec`
+3. Delete the document row (cascades to chunks, document_tags)
+4. Delete stored file from disk
+
+This follows the same logic as the existing `delete_document` endpoint, but all DB changes for the matched documents run in a single SQLite transaction so they are atomic. File deletions happen after commit and are best-effort: if a file deletion fails, the document is still counted as succeeded (the DB record is gone) and a warning is logged.
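The ordering described above (DB changes in one transaction, file cleanup after commit) can be sketched with the standard `sqlite3` module. The table shapes, the `stored_path` column, and the function name `bulk_delete` are assumptions made for illustration, and a plain table stands in for the sqlite-vec `chunks_vec` table.

```python
import os
import sqlite3


def bulk_delete(conn, doc_ids):
    """Delete documents in one transaction, then remove their files best-effort."""
    paths = []
    with conn:  # commits on success, rolls back if any statement raises
        for doc_id in doc_ids:
            row = conn.execute(
                "SELECT stored_path FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            if row is None:
                continue
            # Steps 1-2: embeddings are not covered by the FK cascade, so delete them by chunk ID.
            chunk_ids = [r[0] for r in conn.execute(
                "SELECT id FROM chunks WHERE document_id = ?", (doc_id,))]
            conn.executemany(
                "DELETE FROM chunks_vec WHERE chunk_id = ?",
                [(cid,) for cid in chunk_ids])
            # Step 3: deleting the document row cascades to its chunks.
            conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
            paths.append(row[0])

    # Step 4: file cleanup happens only after the commit and never fails the operation.
    warnings = []
    for path in paths:
        try:
            os.remove(path)
        except OSError as exc:
            warnings.append(f"{path}: {exc}")
    return len(paths), warnings


# Demo schema (hypothetical); the stored paths deliberately do not exist.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, stored_path TEXT);
    CREATE TABLE chunks (id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE);
    CREATE TABLE chunks_vec (chunk_id INTEGER);
    INSERT INTO documents VALUES (1, '/nonexistent/a.pdf'), (2, '/nonexistent/b.pdf');
    INSERT INTO chunks VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO chunks_vec VALUES (10), (11), (20);
""")
deleted, warnings = bulk_delete(conn, [1])
print(deleted, len(warnings))
```

In the demo the file is missing, so the document still counts as deleted and the failure surfaces only as a warning, mirroring the best-effort rule.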
+
+### 5. Bulk tags implementation
+
+Two distinct operations:
+
+**`POST /api/v1/bulk/tags`** — Add and/or remove tags:
+```json
+{
+  "add": ["reviewed", "approved"],
+  "remove": ["draft"],
+  ...selection filters...
+}
+```
+
+**`POST /api/v1/bulk/set-tags`** — Replace all tags:
+```json
+{
+  "tags": ["final", "approved"],
+  ...selection filters...
+}
+```
+
+The `set-tags` operation removes all existing tags from matched documents, then applies the new set. This is useful for cleaning up tag clutter or migrating tagging schemes.
+
+Both update `updated_at` on affected documents.
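The difference between the two operations is easiest to see on a single document's tag set. The sketch below uses hypothetical helper names and one assumption not stated above: when a tag appears in both `add` and `remove`, the add wins because removals are applied first.

```python
def apply_bulk_tags(existing, add=(), remove=()):
    """bulk/tags semantics: drop the `remove` tags, then add the `add` tags."""
    return (set(existing) - set(remove)) | set(add)


def apply_bulk_set_tags(existing, tags):
    """bulk/set-tags semantics: existing tags are discarded and replaced."""
    return set(tags)


before = {"draft", "agent:mybot"}
after_tags = apply_bulk_tags(before, add=["reviewed", "approved"], remove=["draft"])
after_set = apply_bulk_set_tags(before, ["final", "approved"])
print(sorted(after_tags), sorted(after_set))
```

With `bulk/tags`, untouched tags such as `agent:mybot` survive; with `bulk/set-tags` they do not, which is why the replace form suits tag clean-ups and migrations.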
+
+### 6. Remove collection abstraction from MCP
+
+Remove from `mcp/server.py`:
+- Constants: `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`
+- Functions: `_collection_tag`, `_strip_collection_tags`, `_process_document`, `_process_search_results`, `_ensure_exclusive_collection`
+- Tool: `kb_set_collection` (entire tool removed)
+- Parameters: `collection` from `kb_search`, `kb_addnote`, `kb_upload_start`
+
+The `_process_document` and `_process_search_results` calls in remaining tools are removed — documents are returned as-is from the engine, with all tags visible.
+
+Users/agents that need namespace isolation use a tag convention (e.g. `agent:claude-code`) communicated via system prompt or tool instructions.
+
+### 7. Engine bulk route module
+
+New file: `engine/kb/routes/bulk.py`
+
+Three endpoints sharing common infrastructure:
+- `_resolve_selection(conn, filters)` → list of document IDs + count
+- `_check_safety_threshold(matched, total, force)` → raises HTTPException if exceeded
+- `_log_bulk_job(conn, job_type, filters, matched, succeeded, failed, errors)` → job_id
+
+### 8. MCP bulk tools
+
+Three new tools in `mcp/server.py`, thin wrappers calling new `engine.py` methods:
+
+- `kb_bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?)` → str (JSON)
+- `kb_bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?)` → str (JSON)
+- `kb_bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags?, force?)` → str (JSON)
+
+Note: The `tags` parameter on bulk tools serves as a **selection filter** (which documents to target), while `add`/`remove` (on bulk_tags) and `new_tags` (on bulk_set_tags) are the **operation** (what to do to the tags). Tool descriptions must make this distinction clear.
+
+### 9. CLI bulk commands
+
+Three new commands under `client/cmd/`:
+
+```
+kb bulk-remove --tags "draft,old" --type note --force --yes
+kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
+kb bulk-set-tags --ids "1,5,12" --tags "clean,final" --yes
+```
+
+- Filter flags (shared): `--tags`, `--type`, `--ids` (comma-separated), `--from-id`, `--to-id`
+- Safety override: `--force` / `-f`
+- Confirmation: `--yes` / `-y` to skip the interactive prompt
+
+Without `--yes`, the CLI first shows the match count and asks for confirmation:
+
+```
+This will delete 47 documents matching: tags=[draft,old] type=note
+Proceed? [y/N]
+```
+
+### 10. Engine config for safety threshold
+
+New env var: `KB_BULK_SAFETY_PERCENT` (integer, default 70). Added to `engine/kb/config.py`.
+
+## Risks / Trade-offs
+
+- **[Bulk delete is irreversible]** → Safety threshold mitigates accidental mass deletion. CLI requires interactive confirmation unless `--yes` is passed. There is no undo mechanism; this is deliberate to keep the system simple.
+- **[Naming collision: `tags` as filter vs operation]** → The `tags` parameter in bulk_tags selects documents, while `add`/`remove` specifies the tag changes. Clear naming and tool descriptions mitigate confusion. Engine request model uses the same field name as the existing list/search filter.
+- **[SQLite lock during large bulk ops]** → A single transaction deleting 5000 documents will hold a write lock. With WAL mode, readers are not blocked. The lock duration should be under a few seconds for typical workloads.
+- **[Breaking change: collection removal]** → Any MCP client relying on `collection` parameters will break. Since collections were only recently added and are not widely deployed, this is acceptable. Existing `collection:*` tags in the database remain as regular tags — they still work as filters, just without special treatment.
+- **[Jobs table overload]** → Bulk operations add a new job type to a table designed for ingestion jobs. The schema change is minimal (one new column) and the audit trail value outweighs the mixing of concerns.
@@ -0,0 +1,91 @@
+## Why
+
+Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, this means hundreds of tool calls — burning tokens, adding latency, and creating fragile multi-step flows that can fail partway through.
+
+Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as `collection:`-prefixed tags, but this convention is only enforced in the MCP layer — the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via system prompt or configuration, achieve the same namespace isolation more simply and uniformly.
+
+## What Changes
+
+### 1. Remove collections from MCP server
+
+Strip all collection logic from `mcp/server.py`:
+- Remove `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, and all collection helper functions
+- Remove `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start`
+- Remove `kb_set_collection` tool entirely
+- Remove `_process_document` / `_process_search_results` collection-tag stripping
+- Update MCP server instructions to explain tag-based namespace convention
+
+### 2. Add bulk engine endpoints
+
+Three new endpoints in the engine API:
+
+- **POST /api/v1/bulk/delete** — Delete multiple documents matching a filter
+- **POST /api/v1/bulk/tags** — Add/remove tags on multiple documents matching a filter
+- **POST /api/v1/bulk/set-tags** — Replace all tags on multiple documents matching a filter
+
+All accept a common **selection filter** (combinable with AND logic):
+- `document_ids` — explicit list of IDs
+- `tags` — documents matching ALL specified tags
+- `doc_type` — documents of this type
+- `from_id` / `to_id` — ID range (inclusive)
+
+At least one selection criterion is required.
+
+**Safety threshold**: If the operation would affect more than N% of all documents (default 70%, configurable via `KB_BULK_SAFETY_PERCENT` env var), the request is rejected with a 409 response showing what would be affected. The caller must re-send with `force: true` to proceed.
+
+**Response model**: Synchronous execution with summary response. The operation is logged to the jobs table for audit trail:
+
+```json
+{
+  "job_id": 42,
+  "status": "partial_failure",
+  "matched": 750,
+  "succeeded": 748,
+  "failed": 2,
+  "errors": [
+    {"document_id": 42, "error": "file locked"},
+    {"document_id": 99, "error": "not found"}
+  ]
+}
+```
+
+### 3. Add bulk MCP tools
+
+Expose the bulk engine endpoints as MCP tools:
+- `kb_bulk_delete` — bulk delete with filter selection
+- `kb_bulk_tags` — bulk add/remove tags with filter selection
+- `kb_bulk_set_tags` — bulk replace tags with filter selection
+
+These are thin wrappers around the engine bulk endpoints — no collection translation, no special logic.
+
+### 4. Add bulk CLI commands
+
+- `kb bulk-remove` — bulk delete with `--tags`, `--type`, `--ids`, `--from-id`, `--to-id`, `--force` flags
+- `kb bulk-tag` — bulk tag/untag with `--add`, `--remove`, and the same filter flags
+- `kb bulk-set-tags` — bulk replace tags with `--tags` (new tags) and the same filter flags
+
+All show a confirmation prompt with match count before executing (unless `--yes`).
+
+## Capabilities
+
+### New Capabilities
+
+- `bulk-operations`: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and safety threshold.
+
+### Modified Capabilities
+
+- `mcp-document-management`: Remove `kb_set_collection` tool. Remove `collection` parameter from all tools.
+
+### Removed Capabilities
+
+- `mcp-collections`: The collection abstraction (collection helpers, collection parameters, collection tag stripping) is removed from the MCP server entirely.
+
+## Impact
+
+- **Engine API** (`engine/kb/routes/`): New `bulk.py` route module with 3 endpoints. New `job_type` column in the jobs table, with `bulk_delete`, `bulk_tags`, and `bulk_set_tags` values alongside the default `ingest`.
+- **Engine database** (`engine/kb/database.py`): Helper functions for bulk selection queries and bulk delete/tag operations.
+- **MCP server** (`mcp/server.py`): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove `collection` param from `kb_search`, `kb_addnote`, `kb_upload_start`. Remove `kb_set_collection`.
+- **MCP engine client** (`mcp/engine.py`): Add bulk operation methods. Remove code that is no longer needed.
+- **CLI** (`client/cmd/`): New `bulk_remove.go`, `bulk_tag.go`, `bulk_set_tags.go` command files.
+- **CLI API client** (`client/internal/api/`): Add `Post` with JSON body support if not present.
+- **Breaking changes**: `kb_set_collection` MCP tool removed. `collection` parameter removed from `kb_search`, `kb_addnote`, `kb_upload_start` MCP tools. Any MCP clients using collections will need to switch to tags.
@@ -0,0 +1,230 @@
+## ADDED Requirements
+
+### Requirement: Common selection filter
+
+All bulk engine endpoints SHALL accept a JSON body with the following optional selection fields, combined with AND logic:
+
+- `document_ids` (list of int) — match documents with these specific IDs
+- `tags` (list of str) — match documents that have ALL specified tags
+- `doc_type` (str) — match documents with this document type
+- `from_id` (int) — match documents with id >= this value
+- `to_id` (int) — match documents with id <= this value
+
+At least one selection field MUST be present. If no selection fields are provided, the endpoint SHALL return 400 Bad Request.
+
+#### Scenario: Filter by tags and doc_type
+
+- **WHEN** a bulk endpoint receives `{"tags": ["draft"], "doc_type": "note"}`
+- **THEN** it SHALL match only documents that have the tag "draft" AND have doc_type "note"
+
+#### Scenario: Filter by ID range
+
+- **WHEN** a bulk endpoint receives `{"from_id": 10, "to_id": 50}`
+- **THEN** it SHALL match documents with id >= 10 AND id <= 50
+
+#### Scenario: Filter by explicit IDs
+
+- **WHEN** a bulk endpoint receives `{"document_ids": [1, 5, 12]}`
+- **THEN** it SHALL match only documents with those specific IDs
+
+#### Scenario: Combined filters
+
+- **WHEN** a bulk endpoint receives `{"tags": ["agent:mybot"], "doc_type": "note", "from_id": 100}`
+- **THEN** it SHALL match documents satisfying ALL three criteria
+
+#### Scenario: No selection fields provided
+
+- **WHEN** a bulk endpoint receives `{}` or `{"force": true}` with no selection fields
+- **THEN** it SHALL return 400 Bad Request
+
+### Requirement: Safety threshold
+
+All bulk endpoints SHALL enforce a safety threshold. Before executing, the engine SHALL count the matched documents and the total documents in the database. If `matched / total * 100` exceeds the configured threshold, the request SHALL be rejected with 409 Conflict.
+
+The response SHALL include: `error` ("safety_threshold_exceeded"), `message` (human-readable), `matched` (int), `total` (int), `percent` (float), and `threshold` (int).
+
+The threshold SHALL default to 70 and be configurable via the `KB_BULK_SAFETY_PERCENT` environment variable (integer 0-100). A value of 0 disables the check.
+
+The caller MAY override the threshold by including `"force": true` in the request body.
+
+#### Scenario: Threshold exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 750 documents (75%) without `force: true`
+- **THEN** it SHALL return 409 with `matched: 750`, `total: 1000`, `percent: 75.0`, `threshold: 70`
+
+#### Scenario: Threshold not exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 500 documents (50%) without `force: true`
+- **THEN** the operation SHALL proceed normally
+
+#### Scenario: Force override
+
+- **GIVEN** 1000 total documents and a match of 900 (90%)
+- **WHEN** the request includes `"force": true`
+- **THEN** the operation SHALL proceed regardless of threshold
+
+#### Scenario: Zero threshold
+
+- **GIVEN** `KB_BULK_SAFETY_PERCENT` is 0
+- **THEN** the safety check SHALL be effectively disabled for all operations
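
A minimal sketch of the threshold check described above. The function name and the message wording are illustrative, and skipping the check for an empty database (`total == 0`) is an assumption the requirement does not state. The percentage is computed as `matched * 100 / total`, which is arithmetically the same as `matched / total * 100` but avoids an intermediate rounding step.

```python
def check_safety_threshold(matched, total, threshold=70, force=False):
    """Return a 409-style error body if the match is too broad, else None."""
    # force overrides the check; a threshold of 0 disables it.
    # Skipping the check when total == 0 is an assumption of this sketch.
    if force or threshold == 0 or total == 0:
        return None
    percent = matched * 100 / total
    if percent <= threshold:
        return None
    return {
        "error": "safety_threshold_exceeded",
        "message": (
            f"{matched} of {total} documents matched ({percent:.1f}%), "
            f"which exceeds the {threshold}% safety threshold"
        ),
        "matched": matched,
        "total": total,
        "percent": percent,
        "threshold": threshold,
    }
```

A caller would return the dictionary with status 409 when it is not `None`, and proceed with the operation otherwise.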
+
+### Requirement: Synchronous response with audit log
+
+All bulk endpoints SHALL execute synchronously and return a JSON response with:
+
+- `job_id` (int) — ID of the audit log entry in the jobs table
+- `status` (str) — "done" or "partial_failure"
+- `matched` (int) — number of documents that matched the selection
+- `succeeded` (int) — number of documents successfully processed
+- `failed` (int) — number of documents that failed
+- `errors` (list) — array of `{"document_id": int, "error": str}` for each failure (empty on full success)
+
+A job record SHALL be created in the jobs table with `job_type` set to the operation type. The `filename` field SHALL store a JSON representation of the selection filter. The `error` field SHALL store a JSON array of individual errors if any occurred.
+
+#### Scenario: Full success
+
+- **WHEN** a bulk operation matches 50 documents and all succeed
+- **THEN** the response SHALL have `status: "done"`, `matched: 50`, `succeeded: 50`, `failed: 0`, `errors: []`
+
+#### Scenario: Partial failure
+
+- **WHEN** a bulk operation matches 50 documents but 2 fail
+- **THEN** the response SHALL have `status: "partial_failure"`, `matched: 50`, `succeeded: 48`, `failed: 2`, and `errors` listing the 2 failures
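
As an illustration of the response shape, the summary can be derived from the matched IDs and the collected per-document errors. The helper name `summarize_bulk_result` is hypothetical.

```python
def summarize_bulk_result(job_id, matched_ids, errors):
    """Build the synchronous bulk response from matched IDs and failures.

    errors is a list of {"document_id": int, "error": str} entries.
    """
    failed = len(errors)
    return {
        "job_id": job_id,
        "status": "partial_failure" if failed else "done",
        "matched": len(matched_ids),
        "succeeded": len(matched_ids) - failed,
        "failed": failed,
        "errors": errors,
    }
```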
+
+### Requirement: Bulk delete endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/delete` which permanently deletes all documents matching the selection filter. For each matched document, it SHALL delete embeddings from `chunks_vec`, delete the document row (cascading to chunks and document_tags), and delete any stored file from disk.
+
+Database deletions SHALL be performed within a single transaction. File deletions SHALL occur after the transaction commits and SHALL be best-effort (failures logged but not counted as document failures).
+
+#### Scenario: Bulk delete by tag
+
+- **WHEN** `POST /api/v1/bulk/delete` receives `{"tags": ["old", "draft"]}`
+- **THEN** all documents with both tags "old" and "draft" SHALL be deleted
+- **AND** their chunks, embeddings, tag associations, and stored files SHALL be removed
+
+#### Scenario: Bulk delete with no matches
+
+- **WHEN** `POST /api/v1/bulk/delete` receives a filter that matches 0 documents
+- **THEN** the response SHALL have `matched: 0`, `succeeded: 0`, `failed: 0`
+
+### Requirement: Bulk tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/tags` which adds and/or removes tags on all documents matching the selection filter. The request body SHALL include the selection filter plus:
+
+- `add` (list of str, optional) — tags to add
+- `remove` (list of str, optional) — tags to remove
+
+At least one of `add` or `remove` MUST be present. The endpoint SHALL return 400 if neither is provided.
+
+The endpoint SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Add and remove tags in one call
+
+- **WHEN** `POST /api/v1/bulk/tags` receives `{"tags": ["agent:mybot"], "add": ["reviewed"], "remove": ["pending"]}`
+- **THEN** all documents tagged "agent:mybot" SHALL have "reviewed" added and "pending" removed
+
+### Requirement: Bulk set-tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/set-tags` which replaces all tags on matched documents with a new set. The request body SHALL include the selection filter plus:
+
+- `new_tags` (list of str) — the replacement tag set
+
+The endpoint SHALL remove all existing tag associations from matched documents, then apply the new set. It SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Replace all tags
+
+- **WHEN** `POST /api/v1/bulk/set-tags` receives `{"doc_type": "note", "new_tags": ["clean", "final"]}`
+- **THEN** all notes SHALL have their existing tags removed and replaced with "clean" and "final"
+
+### Requirement: Jobs table extension
+
+The jobs table SHALL be extended with a `job_type` column (TEXT, default "ingest") to distinguish ingestion jobs from bulk operation audit entries. Valid values: "ingest", "bulk_delete", "bulk_tags", "bulk_set_tags".
+
+Existing jobs SHALL default to `job_type = "ingest"`. The existing jobs list endpoint and CLI `kb jobs` command SHALL continue to work unchanged.
+
+#### Scenario: Migration adds column
+
+- **GIVEN** an existing database without the `job_type` column
+- **WHEN** the engine starts
+- **THEN** the column SHALL be added with default value "ingest"
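
A sketch of an idempotent migration for this scenario, using SQLite's `PRAGMA table_info` to detect the column. The function name and the minimal `jobs` table used in the demonstration are assumptions; the real table has more columns.

```python
import sqlite3


def ensure_job_type_column(conn):
    """Add jobs.job_type with default 'ingest' if it is not there yet."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "job_type" not in columns:
        conn.execute(
            "ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'"
        )
        conn.commit()


# Demonstration against a simplified, pre-migration jobs table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("INSERT INTO jobs (filename) VALUES ('report.pdf')")
ensure_job_type_column(conn)
ensure_job_type_column(conn)  # second call is a no-op
job_types = [row[0] for row in conn.execute("SELECT job_type FROM jobs")]
```

Rows that existed before the migration read back the column default, so existing jobs report `job_type = "ingest"` without a backfill.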
+
+### Requirement: Engine config for safety threshold
+
+The engine `Config` class SHALL read `KB_BULK_SAFETY_PERCENT` from the environment as an integer (default 70, range 0-100). This value SHALL be used as the default safety threshold for all bulk endpoints.
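
A minimal sketch of reading the setting. The standalone function is illustrative (the requirement places this on the `Config` class), and raising on out-of-range values is an assumption; the requirement does not say how invalid values are handled.

```python
import os


def read_bulk_safety_percent(env=os.environ):
    """Read KB_BULK_SAFETY_PERCENT as an integer in 0-100 (default 70)."""
    value = int(env.get("KB_BULK_SAFETY_PERCENT", "70"))
    if not 0 <= value <= 100:
        # Assumed behaviour: fail fast rather than clamp silently.
        raise ValueError("KB_BULK_SAFETY_PERCENT must be between 0 and 100")
    return value
```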
+
+### Requirement: MCP bulk delete tool
+
+The MCP server SHALL expose a `kb_bulk_delete` tool with parameters: `document_ids` (optional list of int), `tags` (optional list of str), `doc_type` (optional str), `from_id` (optional int), `to_id` (optional int), `force` (optional bool).
+
+The tool SHALL call `POST /api/v1/bulk/delete` on the engine via the engine client and return the JSON response.
+
+The tool description SHALL clearly state that `tags` is a selection filter (which documents to delete), not tags to delete.
+
+#### Scenario: MCP bulk delete by tag
+
+- **WHEN** `kb_bulk_delete(tags=["old"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/delete` with `{"tags": ["old"]}`
+- **AND** the tool SHALL return the engine's JSON response
+
+### Requirement: MCP bulk tags tool
+
+The MCP server SHALL expose a `kb_bulk_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `add` (optional list of str), `remove` (optional list of str), and `force` (optional bool).
+
+The tool description SHALL clearly distinguish `tags` (selection filter) from `add`/`remove` (tag changes to apply).
+
+#### Scenario: MCP bulk tag update
+
+- **WHEN** `kb_bulk_tags(tags=["agent:mybot"], add=["reviewed"], remove=["draft"])` is called
+- **THEN** the engine client SHALL send the appropriate `POST /api/v1/bulk/tags` request
+
+### Requirement: MCP bulk set-tags tool
+
+The MCP server SHALL expose a `kb_bulk_set_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `new_tags` (list of str) and `force` (optional bool).
+
+#### Scenario: MCP bulk set tags
+
+- **WHEN** `kb_bulk_set_tags(doc_type="note", new_tags=["clean"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/set-tags` with `{"doc_type": "note", "new_tags": ["clean"]}`
+
+### Requirement: MCP engine client bulk methods
+
+The MCP engine client (`mcp/engine.py`) SHALL provide three new methods:
+
+- `bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?)` → dict
+- `bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?)` → dict
+- `bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags, force?)` → dict
+
+Each SHALL send a POST request to the corresponding `/api/v1/bulk/*` endpoint with the parameters as a JSON body. Each SHALL raise on non-2xx status codes, consistent with existing methods.
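
The MCP scenarios above show unset parameters being left out of the request body (for example, `kb_bulk_delete(tags=["old"])` sends `{"tags": ["old"]}`). One way to get that behaviour is to drop `None` values before serialising. The helper name `build_bulk_payload` is hypothetical, and the HTTP call itself is omitted.

```python
import json


def build_bulk_payload(**params):
    """Keep only the parameters the caller actually supplied."""
    return {key: value for key, value in params.items() if value is not None}


# Body for a bulk set-tags request with only doc_type and new_tags supplied.
body = json.dumps(
    build_bulk_payload(
        document_ids=None, tags=None, doc_type="note",
        from_id=None, to_id=None, new_tags=["clean"], force=None,
    )
)
```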
+
+### Requirement: CLI bulk-remove command
+
+The CLI SHALL expose a `kb bulk-remove` command with flags: `--tags` (comma-separated), `--type`, `--ids` (comma-separated), `--from-id`, `--to-id`, `--force`/`-f`, `--yes`/`-y`.
+
+Without `--yes`, the CLI SHALL first display the match count and ask for interactive confirmation before proceeding.
+
+The command SHALL call `POST /api/v1/bulk/delete` with the constructed filter.
+
+#### Scenario: CLI bulk remove with confirmation
+
+- **WHEN** `kb bulk-remove --tags "draft,old" --type note` is run without `--yes`
+- **THEN** the CLI SHALL display "This will delete N documents matching: tags=[draft,old] type=note" and prompt "Proceed? [y/N]"
+
+#### Scenario: CLI bulk remove with --yes
+
+- **WHEN** `kb bulk-remove --tags "draft" --yes` is run
+- **THEN** the CLI SHALL proceed without prompting
+
+### Requirement: CLI bulk-tag command
+
+The CLI SHALL expose a `kb bulk-tag` command with the same filter flags as `bulk-remove`, plus `--add` and `--remove` (comma-separated tag lists).
+
+The command SHALL call `POST /api/v1/bulk/tags` with the constructed filter and tag changes.
+
+### Requirement: CLI bulk-set-tags command
+
+The CLI SHALL expose a `kb bulk-set-tags` command with the filter flags, plus `--set` (comma-separated list of replacement tags).
+
+The command SHALL call `POST /api/v1/bulk/set-tags` with the constructed filter and `new_tags`.
@@ -0,0 +1,55 @@
+## REMOVED Requirements
+
+### Requirement: Collection abstraction in MCP server
+
+The MCP server SHALL NOT maintain any collection abstraction. The following SHALL be removed:
+
+- Constants: `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`
+- Functions: `_collection_tag`, `_strip_collection_tags`, `_process_document`, `_process_search_results`, `_ensure_exclusive_collection`
+- Tool: `kb_set_collection` (entire tool)
+- Parameters: `collection` from `kb_search`, `kb_addnote`, `kb_upload_start`
+
+Documents SHALL be returned as-is from the engine with all tags visible. No tag stripping or collection field injection SHALL occur.
+
+#### Scenario: Search results show all tags
+
+- **WHEN** `kb_search` is called and a result has tags `["agent:mybot", "collection:documents", "draft"]`
+- **THEN** all three tags SHALL be returned as-is — no stripping of `collection:*` tags
+
+#### Scenario: kb_set_collection no longer exists
+
+- **WHEN** an MCP client attempts to call `kb_set_collection`
+- **THEN** the tool SHALL not be found (removed)
+
+## MODIFIED Requirements
+
+### Requirement: kb_search without collection parameter
+
+The `kb_search` MCP tool SHALL accept `tags` (optional list of str) for filtering but SHALL NOT accept a `collection` parameter. Callers that previously used `collection="memory"` SHALL instead use `tags=["collection:memory"]` or whatever tag convention they prefer.
+
+#### Scenario: Filter by tag instead of collection
+
+- **WHEN** `kb_search(query="test", tags=["agent:mybot"])` is called
+- **THEN** results SHALL be filtered to documents tagged "agent:mybot"
+- **AND** no collection field SHALL be present in the response
+
+### Requirement: kb_addnote without collection parameter
+
+The `kb_addnote` MCP tool SHALL accept `tags` (optional list of str) but SHALL NOT accept a `collection` parameter. The tool SHALL NOT automatically apply any default collection tag — only explicitly provided tags are applied.
+
+#### Scenario: Add note with explicit tags
+
+- **WHEN** `kb_addnote(text="hello", tags=["agent:mybot", "memory"])` is called
+- **THEN** the note SHALL be created with exactly those two tags — no `collection:documents` tag added
+
+### Requirement: kb_upload_start without collection parameter
+
+The `kb_upload_start` MCP tool SHALL accept `tags` (optional list of str) but SHALL NOT accept a `collection` parameter. The tool SHALL NOT automatically apply any default collection tag.
+
+### Requirement: kb_update_note without collection processing
+
+The `kb_update_note` MCP tool SHALL return the document as-is from the engine without passing it through `_process_document`. All tags SHALL be visible in the response.
+
+### Requirement: kb_get without collection processing
+
+The `kb_get` MCP tool SHALL return documents as-is from the engine without passing through `_process_document`. All tags SHALL be visible in the response. No `collection` field SHALL be injected.
@@ -0,0 +1,45 @@
+## 1. Remove collections from MCP server
+
+- [x] 1.1 Remove collection constants and helper functions from `mcp/server.py` (`COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, `_collection_tag`, `_strip_collection_tags`, `_process_document`, `_process_search_results`, `_ensure_exclusive_collection`)
+- [x] 1.2 Remove `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start` tools
+- [x] 1.3 Remove `kb_set_collection` tool entirely
+- [x] 1.4 Remove `_process_document` / `_process_search_results` calls from `kb_get`, `kb_update_note`, `kb_search`
+- [x] 1.5 Update MCP server instructions text to reflect tags-only approach
+
+## 2. Engine bulk infrastructure
+
+- [x] 2.1 Add `bulk_safety_percent` to `Config` class in `engine/kb/config.py` (env var `KB_BULK_SAFETY_PERCENT`, default 70)
+- [x] 2.2 Add `job_type` column migration to `database.py` `init_schema` (TEXT, default "ingest")
+- [x] 2.3 Add `resolve_bulk_selection(conn, document_ids, tags, doc_type, from_id, to_id)` helper to `database.py` — returns list of matching document IDs
+- [x] 2.4 Add `create_bulk_job(conn, job_type, filters_json, matched, succeeded, failed, errors_json)` helper to `database.py`
+
+## 3. Engine bulk endpoints
+
+- [x] 3.1 Create `engine/kb/routes/bulk.py` with shared Pydantic request model (`BulkSelectionRequest` with selection fields + `force` bool)
+- [x] 3.2 Add `_check_safety_threshold` helper that returns 409 if threshold exceeded
+- [x] 3.3 Implement `POST /api/v1/bulk/delete` — resolve selection, check threshold, delete documents in transaction, clean up files, log job, return summary
+- [x] 3.4 Implement `POST /api/v1/bulk/tags` — resolve selection, check threshold, add/remove tags on matched docs, log job, return summary
+- [x] 3.5 Implement `POST /api/v1/bulk/set-tags` — resolve selection, check threshold, clear and replace tags on matched docs, log job, return summary
+- [x] 3.6 Import bulk routes in engine app startup (add to `engine/kb/routes/__init__.py` or `main.py`)
+
+## 4. MCP bulk tools
+
+- [x] 4.1 Add `bulk_delete`, `bulk_tags`, `bulk_set_tags` methods to `mcp/engine.py`
+- [x] 4.2 Add `kb_bulk_delete` tool to `mcp/server.py`
+- [x] 4.3 Add `kb_bulk_tags` tool to `mcp/server.py`
+- [x] 4.4 Add `kb_bulk_set_tags` tool to `mcp/server.py`
+
+## 5. CLI bulk commands
+
+- [x] 5.1 Create `client/cmd/bulk_remove.go` — `kb bulk-remove` with filter flags, confirmation prompt, JSON output support
+- [x] 5.2 Create `client/cmd/bulk_tag.go` — `kb bulk-tag` with filter flags + `--add`/`--remove`, confirmation prompt
+- [x] 5.3 Create `client/cmd/bulk_set_tags.go` — `kb bulk-set-tags` with filter flags + `--set`, confirmation prompt
+
+## 6. Verification
+
+- [x] 6.1 Test collection removal: verify `kb_search`, `kb_addnote`, `kb_get`, `kb_update_note`, `kb_upload_start` work without collection params
+- [x] 6.2 Test bulk delete via engine API: filter by tags, by IDs, by range, safety threshold trigger and force override
+- [x] 6.3 Test bulk tags and bulk set-tags via engine API
+- [x] 6.4 Test MCP bulk tools against running engine
+- [x] 6.5 Test CLI bulk commands against running engine
+- [x] 6.6 Test audit trail: verify bulk jobs appear in `kb jobs` output
@@ -0,0 +1,145 @@
+## Context
+
+kb v2 is a client-server knowledge base: a Python FastAPI engine (SQLite + FTS5 + sqlite-vec, sentence-transformers embeddings) serving a Go CLI client over HTTP. Agent integration currently works via a Claude Code skill that shells out to the Go binary and parses JSON output.
+
+The engine runs in Docker (NVIDIA/ROCm/CPU variants), keeps the embedding model warm in memory, and handles async ingestion via a background worker. The data model has documents, chunks, embeddings, tags, and jobs — but no concept of collections or note mutation.
+
+This design covers three changes: adding an MCP server as a new integration surface, adding collection-scoped search via tag conventions, and adding in-place note updates.
+
+## Goals / Non-Goals
+
+**Goals:**
+
+- Expose kb as native MCP tools so agents interact with it directly, not via shell subprocess
+- Separate agent memory from user documents via collection tags
+- Allow notes to be updated in place, preserving document identity
+- Support file upload from remote agents via the MCP server
+- Keep the engine fully local — no cloud API dependencies
+- Maintain backward compatibility: existing CLI, API, and data all continue to work
+
+**Non-Goals:**
+
+- Query expansion or LLM reranking inside the engine (agent-side responsibility)
+- File-watching / inotify for auto-reindexing (useful but separate concern)
+- Collection-level access control or permissions
+- New schema columns for collections (use existing tags)
+- Stdio MCP transport (Streamable HTTP only)
+
+## Decisions
+
+### D1 — MCP server as a separate container, Streamable HTTP transport, with its own auth
+
+The MCP server runs as its own Docker container alongside the engine, exposed via Streamable HTTP. It is not embedded into the FastAPI engine app. It requires its own Bearer token (`KB_MCP_API_KEY`) from calling agents.
+
+**Why:** The engine and MCP server have different concerns — the engine manages embeddings, search, and ingestion; the MCP server translates MCP protocol to engine API calls. Keeping them separate means either can be updated independently. Both run as long-lived containers in a Docker Compose stack.
+
+Streamable HTTP (not stdio) because the MCP server is a network service that remote agents connect to, not a subprocess spawned by a local agent. This matches the deployment model: engine + MCP server run on an infrastructure host, agents connect over the network.
+
+The MCP server must have its own authentication because it is HTTP-exposed. Without it, anyone who discovers the endpoint has a direct pipe to the engine via `KB_API_KEY`. The MCP server validates the agent's Bearer token (`KB_MCP_API_KEY`) before proxying requests to the engine.
+
+**Alternative considered:** Embedding MCP into the FastAPI app as additional routes. Rejected — it couples the MCP SDK lifecycle to the engine, and the engine shouldn't need to know about MCP protocol details. Also considered stdio transport, rejected because it requires the agent and MCP server to share a host. Also considered relying solely on the engine's `KB_API_KEY` for auth. Rejected — the MCP server is a separate network surface and must authenticate its own callers.
+
+**Implementation:** Separate Python package/directory (`mcp/` at repo root). Uses the `mcp` Python SDK with Streamable HTTP transport. Reads engine URL and engine API key from environment variables (`KB_ENGINE_URL`, `KB_API_KEY`). Reads its own auth token from `KB_MCP_API_KEY`. Makes HTTP calls to the engine using `httpx`. Docker Compose file adds the MCP server as a service alongside the engine.
+
+### D2 — Collections via tag conventions, with MCP-enforced exclusivity
+
+Collections are implemented using the existing tag system with a naming convention: `collection:documents`, `collection:memory`, `collection:workspace`.
+
+**Why:** Tags already exist, already filter search, and are already mutable via the API. A dedicated `collection` column would add a schema migration, new API parameters, and new CLI flags — all duplicating what tags can do.
+
+**Exclusive membership:** The MCP server enforces one collection per document. When adding a document to a collection, the MCP server first removes any existing `collection:*` tags via the engine's tag API, then applies the new one. This prevents a document from appearing in multiple collections and keeps search results clean.
+
+**Tag stripping in MCP responses:** The MCP server strips `collection:*` tags from the `tags` array in search results and presents the collection as a separate `collection` field. Agents see a clean interface: `{"collection": "memory", "tags": ["feedback", "email"]}` rather than raw `collection:memory` mixed in with user tags.
+
+**Implementation:** The MCP tools accept a `collection` parameter (e.g. `"memory"`). The MCP server translates this to tag operations:
+
+- On search: adds `collection:<name>` to the tag filter
+- On addnote/upload: removes any existing `collection:*` tags, then applies `collection:<name>`
+- On results: strips `collection:*` from tags, adds a `collection` field
+
+The engine is unchanged. The Go CLI can use the same convention manually via `--tags collection:memory`.
+
+**Convention:** `collection:documents` is the default. Standard names: `documents`, `memory`, `workspace`. The MCP tool descriptions document these.
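
The translation described in D2 can be sketched as two pure functions over tag lists. The function names are hypothetical, the real MCP server would apply the changes through the engine's tag API rather than in memory, and falling back to the default collection when a document has no `collection:*` tag is an assumption.

```python
COLLECTION_TAG_PREFIX = "collection:"
DEFAULT_COLLECTION = "documents"


def with_exclusive_collection(tags, collection=DEFAULT_COLLECTION):
    """Drop any existing collection:* tag, then apply exactly one."""
    kept = [t for t in tags if not t.startswith(COLLECTION_TAG_PREFIX)]
    return kept + [COLLECTION_TAG_PREFIX + collection]


def present_document(doc):
    """Strip collection:* from tags and expose it as a separate field."""
    collection = DEFAULT_COLLECTION  # assumed fallback for untagged docs
    user_tags = []
    for tag in doc.get("tags", []):
        if tag.startswith(COLLECTION_TAG_PREFIX):
            collection = tag[len(COLLECTION_TAG_PREFIX):]
        else:
            user_tags.append(tag)
    return {**doc, "collection": collection, "tags": user_tags}
```

For example, `present_document({"tags": ["collection:memory", "feedback", "email"]})` yields `{"collection": "memory", "tags": ["feedback", "email"]}`, matching the interface shown above.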
+
+### D3 — Note mutation via dedicated PATCH endpoint, with full chunking support
+
+Note updates go through a new synchronous `PATCH /api/v1/notes/{id}` endpoint, not through the async job queue. The endpoint uses the same chunking logic as the ingestion pipeline, not a hardcoded single-chunk assumption.
+
+**Why:** Most notes are short and produce a single chunk. But if an agent updates a note with text that exceeds the embedding model's token window (~256 tokens for MiniLM), a single-chunk approach would silently embed only a portion of the text. Using the standard note chunking pipeline (which today produces one chunk for typical notes) means the endpoint naturally handles longer notes without silent data loss.
+
+**Alternative considered:** Truncating long notes and returning a warning. Rejected — silent data loss or warnings that the agent might ignore are worse than just doing the right thing. Also considered reusing the job queue for consistency. Rejected — the queue's value is async processing of heavy workloads. Notes don't need it.
+
+**Implementation:** The PATCH endpoint:
+
+1. Validates the document exists and is `doc_type = 'note'`
+2. Deletes existing chunks, FTS entries, and vector embeddings for that document
+3. Runs the new text through the note chunking pipeline (same as ingestion)
+4. Embeds each chunk and inserts into chunks_vec
+5. Updates the document's `content_hash` and `updated_at`
+6. Returns the updated document
+
+All within a single transaction. FTS5 triggers keep the full-text index in sync automatically (existing `chunks_au` and `chunks_ad` triggers handle this). If embedding fails, the transaction rolls back and the old note is preserved.
+
+### D4 — `updated_at` column on documents, set only on mutation
+
+A new `updated_at TEXT` column on `documents`, initially NULL for all existing documents. Set to `current_timestamp` only when a document is modified (note update, tag change).
+
+**Why:** Distinguishes "created" from "last modified". The agent memory use case needs to know when a memory was last updated, not just when it was first created. NULL means "never updated" — cleaner than duplicating `created_at`.
+
+**Date sorting:** Any query that sorts or filters by "most recent" must use `COALESCE(updated_at, created_at)` to ensure un-mutated documents don't disappear from recent lists. This applies to the documents list endpoint and any future "recent" views.
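
A small demonstration of the `COALESCE` ordering, using an in-memory SQLite database. The table is reduced to the three relevant columns and the timestamps are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        updated_at TEXT
    );
    -- Documents 1 and 2 were never mutated; document 3 is old but
    -- was updated most recently.
    INSERT INTO documents VALUES
        (1, '2026-01-01 10:00:00', NULL),
        (2, '2026-01-02 10:00:00', NULL),
        (3, '2025-12-01 10:00:00', '2026-01-03 09:00:00');
""")

recent = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM documents "
        "ORDER BY COALESCE(updated_at, created_at) DESC"
    )
]
```

Here `recent` is `[3, 2, 1]`: the updated document sorts first, and the un-mutated documents still take their place by creation time.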
+
+### D5 — File upload via chunked base64, proxied to engine's existing upload API
+
+The MCP server supports file uploads from remote agents using a three-step chunked upload pattern:
+
+1. `kb_upload_start(filename, total_size, tags, collection)` — creates a temporary staging entry on the MCP server, returns a server-generated UUID `upload_id`
+2. `kb_upload_chunk(upload_id, data, chunk_index)` — appends a base64-encoded chunk to the staging entry. Called N times.
+3. `kb_upload_finish(upload_id)` — reassembles chunks, decodes from base64, and forwards the complete file as a multipart upload to the engine's existing `POST /api/v1/jobs` endpoint. Returns the job ID.
+
+**Why:** The MCP server is remote from the calling agent, so file paths are meaningless. The agent reads the file locally, splits it into chunks, base64-encodes each chunk, and sends them as individual tool calls. No single MCP message needs to carry the entire file, avoiding message size limits regardless of file size.
+
+The engine's existing upload pipeline handles everything from there: staging, type detection, chunking, embedding. No new engine code needed for file transfer.
+
+**Alternative considered:** Single-message base64 upload (`kb_addfile` with full file content). Rejected — works for small files but hits practical MCP message size limits on larger PDFs. Also considered a separate file transfer service (SFTP container). Rejected — adds operational complexity for no benefit over the chunked approach. Also considered a plain HTTP upload endpoint on the MCP server. Rejected — adds a second protocol surface the agent needs to interact with. Also considered a single-call shortcut for small files. Rejected — one path for all files is simpler for agents to learn, and the overhead of 3 calls vs 1 is negligible for an LLM.
+
+**Upload ID:** Server-generated UUID, returned by `kb_upload_start`. Prevents collision and is unpredictable (important since the MCP server is network-exposed).
+
+**Chunk size:** Recommended 1MB raw per chunk (~1.33MB after base64 encoding). A 10MB PDF takes about 10 `kb_upload_chunk` calls, plus one `kb_upload_start` and one `kb_upload_finish`. The MCP server holds chunks in a temporary directory and cleans up on finish or after a timeout (e.g. 10 minutes for abandoned uploads).
+
+**Staging cleanup:** The MCP server tracks active uploads in memory. Chunks are written to a temporary directory. On `kb_upload_finish`, chunks are assembled and forwarded. On timeout or error, the temporary files are cleaned up. No persistent state needed — abandoned uploads are simply garbage collected. The temp directory does not need to survive container restarts; if the MCP server restarts mid-upload, the agent retries from `kb_upload_start`.
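
The data path of the chunked upload can be sketched with two helpers: one for the agent side (split and encode) and one for the server side (decode and reassemble). The function names are hypothetical, chunks are held in a dictionary rather than a temporary directory, and rejecting an upload with missing chunk indices is an assumption of this sketch.

```python
import base64

CHUNK_SIZE = 1024 * 1024  # 1MB of raw bytes per chunk, as recommended


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Agent side: split raw bytes and base64-encode each piece."""
    return [
        base64.b64encode(data[start:start + chunk_size]).decode("ascii")
        for start in range(0, len(data), chunk_size)
    ]


def reassemble(chunks_by_index):
    """Server side: decode chunks in index order and join them.

    chunks_by_index maps chunk_index to the base64 string received for it,
    so chunks may arrive in any order.
    """
    expected = range(len(chunks_by_index))
    missing = [i for i in expected if i not in chunks_by_index]
    if missing:
        raise ValueError(f"missing chunk indices: {missing}")
    return b"".join(base64.b64decode(chunks_by_index[i]) for i in expected)
```

Encoding each chunk separately means every chunk decodes on its own, so the server can validate or write a chunk to disk as soon as it arrives.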
+
+### D6 — MCP tool descriptions include agent-side search patterns
+
+The MCP tool descriptions for `kb_search` include guidance on query expansion and reranking as documented patterns, not as engine parameters.
+
+**Why:** The calling agent has an LLM. Expanding queries (call search N times with variant phrasings, merge results) and reranking (read top results, reorder by relevance) are better done in the agent's context. This keeps the engine deterministic and local.
+
+**Implementation:** The `kb_search` tool description includes a note like: *"For complex queries, consider expanding into 2-3 variant phrasings and calling this tool multiple times, then deduplicating results by chunk_id. For precision, rerank the returned results using your own judgement."*
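
On the agent side, the merge step might look like the sketch below. It assumes each search hit is a dictionary with a `chunk_id` and a numeric `score` where higher is better; the `score` field and the function name are assumptions, not part of the documented response.

```python
def merge_search_results(result_lists):
    """Deduplicate hits from several kb_search calls by chunk_id.

    When the same chunk appears in more than one result list, the
    highest-scoring occurrence is kept.
    """
    best = {}
    for results in result_lists:
        for hit in results:
            chunk_id = hit["chunk_id"]
            if chunk_id not in best or hit["score"] > best[chunk_id]["score"]:
                best[chunk_id] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)
```

Reranking by the agent's own judgement would then operate on the merged list.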
+
+### D7 — Version bump to 3.0.0 for both engine and client
+
+Engine and client both bump to v3.0.0. MIN_ENGINE_VERSION updates to v3.0.0.
+
+**Why:** The `updated_at` column is a schema addition and the new `PATCH /api/v1/notes/{id}` endpoint is a new API surface. The new client command (`updatenote`) requires the new engine. A major version bump signals this clearly. The clean break is worth it given the MCP server is a new integration paradigm.
+
+## Risks / Trade-offs
+
+**MCP SDK maturity** — The `mcp` Python SDK is relatively new. Breaking changes in the SDK could require MCP server updates. Mitigation: the MCP server is a thin adapter, so updating it is low cost. Pin the SDK version.
+
+**Tag convention enforcement** — Collection tags are a convention, not a constraint at the engine level. Typos create new collections silently (e.g. `collection:memeory`). Mitigation: the MCP server enforces exclusivity (removes old `collection:*` tags before applying new) and validates collection names against a known list. The Go CLI does not enforce this — it's a convention for manual users. Direct engine API users can still create arbitrary tags.
+
+**Note mutation with long text** — The PATCH endpoint uses the standard note chunking pipeline, so long notes are chunked correctly. However, a note that grows very large (thousands of tokens) will produce many chunks and embeddings, making the synchronous PATCH slower. Mitigation: for the agent memory use case, notes are typically short. If a note grows large enough for this to matter, the agent should consider splitting it into multiple notes.
+
+**Chunked upload complexity** — The three-step upload pattern (start/chunk/finish) is more complex than a single tool call. An agent must make N+2 calls to upload a file. Mitigation: the pattern is deterministic and easily scripted by agents. The MCP tool descriptions will include a clear usage example. Abandoned uploads (agent crashes mid-upload) are cleaned up by a timeout on the MCP server — no permanent state leaks.
+
+**MCP server as HTTP client** — The MCP server calls the engine over HTTP, adding a network hop. For a compose deployment (both containers on the same Docker network) this adds sub-millisecond latency per call. Acceptable.
+
+## Migration Plan
+
+1. **Engine schema migration** — runs automatically on startup (same pattern as existing migrations in `init_schema`):
+   - `ALTER TABLE documents ADD COLUMN updated_at TEXT`
+2. **New engine endpoint** — `PATCH /api/v1/notes/{id}` for note mutation
+3. **Engine version bump** — update `engine/VERSION` to `3.0.0`
+4. **Client updates** — new `updatenote` command, version bump to `3.0.0`, `MIN_ENGINE_VERSION` to `3.0.0`
+5. **MCP server** — new `mcp/` directory, Dockerfile, added to Docker Compose
+6. **Rollback** — the schema change is additive (one new column). Rolling back to v2 engine code works fine — v2 ignores `updated_at`. Rolling back the client is a binary swap. Removing the MCP server container has no effect on engine or CLI.
@@ -0,0 +1,81 @@
+## Why
+
+The kb engine exposes a well-structured REST API, but agent integration today goes through a Claude Code skill that shells out to the Go CLI binary, parses JSON output, and re-synthesises results. This works but is indirect: subprocess overhead on every call, fragile output parsing, no streaming, and no composability with other MCP tools in the same session. As agents increasingly rely on kb for both document retrieval and memory storage, this friction compounds.
+
+At the same time, there is no way to scope searches to "agent memory" vs "user documents" without careful manual tagging, and no way to update an existing note in place without delete + re-add. These gaps cause agents to accumulate stale duplicates and pollute the user's document index with internal memory notes.
+
+kb v3 adds an MCP server as a new integration surface alongside the existing CLI, establishes collection tag conventions for scoped search, and adds note mutation to support the agent memory use case natively.
+
+## What Changes
+
+### 1. MCP Server (new component)
+
+A Model Context Protocol server that exposes kb operations as native MCP tools. Runs as a separate Docker container alongside the engine, using Streamable HTTP transport. Translates MCP tool calls into engine HTTP API calls.
+
+**MCP tool surface:**
+
+| MCP Tool | Maps to Engine API | Notes |
+|---|---|---|
+| `kb_search` | `POST /api/v1/search` | Query, top_n, tags, doc_type, collection, mode |
+| `kb_addnote` | `POST /api/v1/jobs` | Body text, tags, collection (default: `documents`) |
+| `kb_upload_start` | _(MCP server internal)_ | Start chunked upload: filename, size, tags, collection → returns upload_id |
+| `kb_upload_chunk` | _(MCP server internal)_ | Append base64 chunk to staging: upload_id, data, chunk_index |
+| `kb_upload_finish` | `POST /api/v1/jobs` | Reassemble chunks, decode, proxy as multipart upload → returns job_id |
+| `kb_update_note` | `PATCH /api/v1/notes/{id}` | Replace note text, re-chunk and re-embed in place |
+| `kb_get` | `GET /api/v1/documents/{id}` or `GET /api/v1/documents` | Retrieve by document ID, or list documents filtered by source_path |
+| `kb_status` | `GET /api/v1/status` | Index health, doc counts, model info, queue state |
+| `kb_jobs` | `GET /api/v1/jobs` | Check ingestion queue status |
+
+The `collection` parameter on the search, addnote, and upload tools is translated by the MCP server into tags using the convention `collection:<name>` (e.g. `collection:memory`): a tag filter on search, an applied tag on addnote and upload. No engine changes are required for collections.
+
+### 2. Collection Tag Conventions (no engine changes)
+
+Scoped document organisation using existing tags with a naming convention.
+
+- Convention: `collection:documents` (default), `collection:memory`, `collection:workspace`
+- MCP tools accept a `collection` parameter and translate to tag operations
+- The Go CLI can use the same convention via `--tags collection:memory`
+- No new schema, no new API parameters on the engine — uses existing tag infrastructure
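
A minimal sketch of the tag translation described above, assuming the MCP server keeps it in two small helpers. The names `to_engine_tags` and `split_collection` are hypothetical and do not come from the kb codebase; only the `collection:<name>` convention and the `documents` default are taken from the text.

```python
COLLECTION_PREFIX = "collection:"
DEFAULT_COLLECTION = "documents"


def to_engine_tags(tags=None, collection=None, default=None):
    """Merge explicit tags with at most one collection:<name> tag.

    Any collection:* entries already present in `tags` are dropped, so the
    result never carries more than one collection tag.
    """
    plain = [t for t in (tags or []) if not t.startswith(COLLECTION_PREFIX)]
    name = collection or default
    if name is None:
        return plain  # e.g. a search that is not scoped to any collection
    return [COLLECTION_PREFIX + name] + plain


def split_collection(tags):
    """Split engine tags into (collection, remaining_tags) for tool responses."""
    collection = None
    remaining = []
    for tag in tags:
        if tag.startswith(COLLECTION_PREFIX):
            collection = tag[len(COLLECTION_PREFIX):]
        else:
            remaining.append(tag)
    return collection, remaining
```

In this sketch a write path such as addnote would pass `default=DEFAULT_COLLECTION`, while search would leave `default` unset so that an unscoped query is not silently restricted to one collection.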
+
+### 3. Note Mutation (engine extension)
+
+Allow existing notes to be updated in place without delete + re-add.
+
+- `PATCH /api/v1/notes/{id}` endpoint — accepts new text, re-chunks and re-embeds
+- Preserves original `created_at`, updates `updated_at`
+- `kb updatenote <id> "new text"` CLI command
+- `kb_update_note` MCP tool
+
+### 4. Agent-Side Search Patterns (no engine changes)
+
+Query expansion and reranking are **caller responsibilities**, not engine features. The calling agent already has an LLM — adding one inside the engine would duplicate capability, introduce a cloud API dependency into a fully local system, and complicate testing.
+
+**Query expansion** — the agent expands its query into 2-3 variant phrasings, makes multiple `kb_search` calls, and merges/deduplicates results in its own context. The MCP tool descriptions should document this as a recommended pattern for complex natural-language questions.
+
+**Reranking** — the agent reads the top N search results and applies its own judgement to reorder by relevance. This is what agents already do when synthesising answers from retrieved chunks.
+
+These patterns should be documented in the MCP tool descriptions and the kb skill guidance, not implemented as engine features.
+
+## Capabilities
+
+### New Capabilities
+
+- `mcp-server`: MCP protocol server exposing kb tools (search, addnote, chunked file upload, update_note, get, status, jobs) for native agent integration. Runs as a Docker container with Streamable HTTP transport. Calls engine HTTP API internally. File uploads use a three-step chunked pattern (start → chunk × N → finish) to avoid message size limits, then proxy to the engine's existing upload endpoint.
+- `note-mutation`: In-place update of existing notes. New PATCH endpoint re-chunks and re-embeds while preserving document identity and creation timestamp.
+- `agent-search-patterns`: Documented patterns for agent-side query expansion (multi-query + merge) and reranking (LLM-based result reordering). No engine changes — these are caller responsibilities, documented in MCP tool descriptions and skill guidance.
+
+### Modified Capabilities
+
+- `engine-api`: New endpoint for note mutation (`PATCH /api/v1/notes/{id}`). `documents` table gains `updated_at` column.
+- `go-client`: New `updatenote` command.
+
+## Impact
+
+- **Code — new**: `mcp/` directory — MCP server package. Thin adapter translating MCP tool calls to engine HTTP API calls, with base64 file upload decoding.
+- **Code — engine**: `kb/database.py` — add `updated_at` column, migration logic. New `kb/routes/notes.py` for PATCH endpoint.
+- **Code — client**: New `cmd/updatenote.go`. `internal/api/client.go` for new endpoint.
+- **APIs**: New `PATCH /api/v1/notes/{id}`.
+- **Dependencies**: MCP Python SDK (`mcp` package) and `httpx` for the MCP server.
+- **Systems**: MCP server added to Docker Compose stack. Agents connect to it via Streamable HTTP.
+- **Data**: SQLite schema migration — `updated_at TEXT` column on `documents` table. Non-destructive.
+- **Versioning**: Engine bumps to v3.0.0 (new endpoint + schema). Client bumps to v3.0.0 (new command). MIN_ENGINE_VERSION updated to v3.0.0.
@@ -0,0 +1,35 @@
+# Agent-Side Search Patterns
+
+## Purpose
+
+Documents recommended patterns for agent-side query expansion and reranking, which are caller responsibilities rather than engine features. These patterns are communicated via MCP tool descriptions.
+
+## Requirements
+
+### Requirement: Query expansion guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on query expansion as a recommended pattern for complex queries.
+
+#### Scenario: Tool description includes expansion pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For complex queries, consider expanding into 2-3 variant phrasings and calling this tool multiple times, then deduplicating results by chunk_id"
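
The merge step in that guidance could look roughly like the sketch below. It assumes each search hit is a dict with `chunk_id` and `score` keys; the `score` field name and the `merge_results` helper are illustrative assumptions, not part of the documented response schema.

```python
def merge_results(result_lists, top_n=5):
    """Merge hits from several kb_search calls, deduplicating by chunk_id.

    When the same chunk appears in more than one result list, the copy with
    the highest score is kept.
    """
    best = {}
    for results in result_lists:
        for hit in results:
            seen = best.get(hit["chunk_id"])
            if seen is None or hit["score"] > seen["score"]:
                best[hit["chunk_id"]] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:top_n]
```

Scores from differently phrased queries are not necessarily comparable, so an agent may prefer to treat the merged list as a candidate pool and apply its own reranking afterwards.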
+
+---
+
+### Requirement: Reranking guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on agent-side reranking as a recommended pattern for improving precision.
+
+#### Scenario: Tool description includes reranking pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For precision, rerank the returned results using your own judgement based on relevance to the original question"
+
+---
+
+### Requirement: No engine-side LLM dependency
+
+The engine SHALL NOT require or use any external LLM API for search operations. Query expansion and reranking SHALL remain entirely agent-side concerns.
+
+#### Scenario: Engine has no LLM dependency
+- **WHEN** the engine is deployed without any `ANTHROPIC_API_KEY` or similar LLM API configuration
+- **THEN** all search operations SHALL function fully, with no degraded results or missing features
@@ -0,0 +1,79 @@
+# Engine API (Delta)
+
+## ADDED Requirements
+
+### Requirement: Note mutation endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint for updating existing notes in place. See the `note-mutation` spec for full details.
+
+#### Scenario: Note update endpoint exists
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "new content"}`
+- **THEN** the engine SHALL process the update synchronously and return the updated document
+
+---
+
+### Requirement: Document updated_at tracking
+
+The engine SHALL track when documents are modified via an `updated_at` column. This column SHALL be NULL for documents that have never been updated.
+
+#### Scenario: New document has no updated_at
+- **WHEN** a document is first ingested
+- **THEN** `updated_at` SHALL be NULL and `created_at` SHALL be set to the ingestion timestamp
+
+#### Scenario: Note update sets updated_at
+- **WHEN** a note is updated via `PATCH /api/v1/notes/{id}`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Tag change sets updated_at
+- **WHEN** tags are modified via `PUT /api/v1/documents/{id}/tags`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Schema migration for updated_at
+- **WHEN** the engine starts against a v2 database without an `updated_at` column
+- **THEN** the engine SHALL automatically run `ALTER TABLE documents ADD COLUMN updated_at TEXT`, and all existing documents SHALL have `updated_at = NULL`
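
One way to make this migration safe to run on every startup is to check for the column first. The sketch below uses only the standard `sqlite3` module against a simplified, hypothetical `documents` table; `ensure_updated_at` is an illustrative name, not the engine's actual `init_schema()` code.

```python
import sqlite3


def ensure_updated_at(conn):
    """Add documents.updated_at if missing; return True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
    if "updated_at" in columns:
        return False
    conn.execute("ALTER TABLE documents ADD COLUMN updated_at TEXT")
    conn.commit()
    return True


# Simulate a v2 database that predates the column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT)")
conn.execute("INSERT INTO documents (title, created_at) VALUES ('old note', '2026-01-01T00:00:00')")

first_run = ensure_updated_at(conn)   # adds the column
second_run = ensure_updated_at(conn)  # no-op on an already migrated database
existing = conn.execute("SELECT updated_at FROM documents").fetchall()
```

Because `ADD COLUMN` without a default leaves existing rows as NULL, pre-existing documents satisfy the "never updated" state without a backfill.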
+
+## MODIFIED Requirements
+
+### Requirement: Document management
+
+The engine SHALL provide endpoints to list, inspect, remove, and download original files for ingested documents.
+
+#### Scenario: List documents
+- **WHEN** a client sends `GET /api/v1/documents`
+- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, created_at, and updated_at
+
+#### Scenario: List documents with filters
+- **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
+- **THEN** the engine SHALL return only documents matching all specified filters
+
+#### Scenario: List documents sorted by most recent
+- **WHEN** a client requests documents sorted by date
+- **THEN** the engine SHALL use `COALESCE(updated_at, created_at)` for ordering, so un-mutated documents sort by creation time and mutated documents sort by their last update
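
The ordering rule can be checked with a small in-memory example. The table layout and the ISO-8601 text timestamps are assumptions made for illustration; only the `COALESCE(updated_at, created_at)` expression comes from the scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, created_at TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO documents (id, created_at, updated_at) VALUES (?, ?, ?)",
    [
        (1, "2026-04-01T10:00:00", None),                   # never updated
        (2, "2026-04-02T10:00:00", None),                   # never updated
        (3, "2026-03-01T10:00:00", "2026-04-03T10:00:00"),  # old note, recently updated
    ],
)

# Most recent activity first: an update counts as activity, creation otherwise.
rows = conn.execute(
    "SELECT id FROM documents ORDER BY COALESCE(updated_at, created_at) DESC"
).fetchall()
ordered_ids = [row[0] for row in rows]  # [3, 2, 1]
```

Document 3 was created first but sorts to the top because its update is the most recent event. This relies on timestamps being stored in a format that sorts lexicographically.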
+
+#### Scenario: Get document details
+- **WHEN** a client sends `GET /api/v1/documents/{id}`
+- **THEN** the engine SHALL return the full document record including all chunks, their text content, `updated_at`, and whether the original file is available (`has_file: true/false`)
+
+#### Scenario: Download original file
+- **WHEN** a client sends `GET /api/v1/documents/{id}/file`
+- **THEN** the engine SHALL return the original file with appropriate Content-Type and `Content-Disposition: attachment; filename="{original_filename}"` headers, or HTTP 404 if the file is not available
+
+#### Scenario: Remove a document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}`
+- **THEN** the engine SHALL delete the document, all its chunks, associated embeddings, tag associations, and the stored original file from disk, and return HTTP 200 with a confirmation
+
+#### Scenario: Remove non-existent document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}` with a non-existent ID
+- **THEN** the engine SHALL return HTTP 404
+
+### Requirement: Engine status and reindex
+
+The engine SHALL provide status information and support re-embedding all chunks. The `version` field in the status response SHALL always be present and SHALL reflect the engine's release version as read from the `VERSION` file. This field is the contract used by clients for compatibility checking.
+
+#### Scenario: Get engine status
+- **WHEN** a client sends `GET /api/v1/status`
+- **THEN** the engine SHALL return JSON with `version` (string, from VERSION file), model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
+
+#### Scenario: Trigger reindex
+- **WHEN** a client sends `POST /api/v1/reindex`
+- **THEN** the engine SHALL re-embed all existing chunks using the `enriched_text` column and the currently loaded model, and return progress information. This operation SHALL NOT block search queries.
@@ -0,0 +1,57 @@
+# Go Client (Delta)
+
+## ADDED Requirements
+
+### Requirement: Update note command
+
+The client SHALL provide a `kb updatenote <id> <text>` command that updates an existing note's content via the engine's `PATCH /api/v1/notes/{id}` endpoint.
+
+#### Scenario: Update a note
+- **WHEN** the user runs `kb updatenote 42 "Updated note content"`
+- **THEN** the client SHALL send `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}` and display the result
+
+#### Scenario: Update a note with JSON output
+- **WHEN** the user runs `kb updatenote 42 "new content" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Update a non-existent document
+- **WHEN** the user runs `kb updatenote 999 "text"` and the engine returns HTTP 404
+- **THEN** the client SHALL display an error indicating the document was not found and exit with a non-zero code
+
+#### Scenario: Update a non-note document
+- **WHEN** the user runs `kb updatenote 42 "text"` and the engine returns HTTP 422
+- **THEN** the client SHALL display an error indicating that only notes can be updated and exit with a non-zero code
+
+#### Scenario: Missing arguments
+- **WHEN** the user runs `kb updatenote` or `kb updatenote 42` with insufficient arguments
+- **THEN** the client SHALL display usage help indicating that both document ID and text are required
+
+## MODIFIED Requirements
+
+### Requirement: Engine version compatibility check
+
+The client SHALL verify that the connected engine meets a minimum version requirement before executing any API command. The minimum required engine version SHALL be embedded in the client binary at build time. If the engine version is below the minimum, the client SHALL print an error message and exit with a non-zero code. There SHALL be no flag to skip or suppress this check.
+
+#### Scenario: Compatible engine version
+- **WHEN** the client connects to an engine reporting version `3.0.0` and `MinEngineVersion` is `3.0.0`
+- **THEN** the client SHALL proceed with the command normally
+
+#### Scenario: Incompatible engine version
+- **WHEN** the client connects to an engine reporting version `2.1.0` and `MinEngineVersion` is `3.0.0`
+- **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v3.0.0+ (connected engine is v2.1.0)` followed by an upgrade hint, and exit with code 1
+
+#### Scenario: Engine unreachable during version check
+- **WHEN** the client cannot reach the engine's `/api/v1/status` endpoint
+- **THEN** the client SHALL skip the version check and proceed with the original command (the actual API call will surface the connectivity error)
+
+#### Scenario: Version check is cached per invocation
+- **WHEN** the client has already verified engine compatibility during the current invocation
+- **THEN** subsequent API calls within the same invocation SHALL NOT repeat the version check
+
+#### Scenario: Client version command does not check engine
+- **WHEN** the user runs `kb --version`
+- **THEN** the client SHALL print the client version without contacting the engine
+
+#### Scenario: MinEngineVersion not set
+- **WHEN** the client binary has `MinEngineVersion` set to empty string or `dev`
+- **THEN** the client SHALL skip the version check entirely (development builds)
@@ -0,0 +1,205 @@
+# MCP Server
+
+## Purpose
+
+The MCP server provides a Model Context Protocol interface to the kb engine, exposing knowledge base operations as native MCP tools over Streamable HTTP transport. It runs as a separate Docker container alongside the engine, translating MCP tool calls into engine HTTP API calls.
+
+## Requirements
+
+### Requirement: MCP server transport and deployment
+
+The MCP server SHALL expose tools via Streamable HTTP transport. It SHALL run as a Docker container, configured to connect to the kb engine's HTTP API. It SHALL read `KB_ENGINE_URL` and `KB_API_KEY` from environment variables to connect to the engine.
+
+#### Scenario: MCP server starts and connects to engine
+- **WHEN** the MCP server container starts with `KB_ENGINE_URL=http://engine:8000` and `KB_API_KEY=secret`
+- **THEN** it SHALL begin accepting MCP connections over Streamable HTTP and use the configured URL and API key for all engine API calls
+
+#### Scenario: Engine unreachable at startup
+- **WHEN** the MCP server starts but cannot reach the engine at `KB_ENGINE_URL`
+- **THEN** it SHALL start and accept connections, but tool calls SHALL return errors indicating the engine is unreachable
+
+#### Scenario: Docker Compose deployment
+- **WHEN** the MCP server is deployed via Docker Compose alongside the engine
+- **THEN** it SHALL connect to the engine via the Docker network using the service name (e.g. `http://engine:8000`)
+
+---
+
+### Requirement: MCP server authentication
+
+The MCP server SHALL require Bearer token authentication from calling agents whenever the `KB_MCP_API_KEY` environment variable is set. This key is independent of the engine's `KB_API_KEY`.
+
+#### Scenario: Valid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a matching Bearer token
+- **THEN** the MCP server SHALL process the request normally
+
+#### Scenario: Missing Bearer token when auth is enabled
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent connects without a Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: Invalid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a non-matching Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: MCP auth disabled
+- **WHEN** `KB_MCP_API_KEY` is not set
+- **THEN** the MCP server SHALL accept all connections without authentication
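
The four authentication scenarios reduce to a single decision function. The sketch below is framework-independent and uses only the standard library; `is_authorized` is a hypothetical helper, and treating an empty `KB_MCP_API_KEY` the same as an unset one is an assumption.

```python
import hmac


def is_authorized(authorization_header, expected_key):
    """Return True when the request may proceed.

    `expected_key` is the value of KB_MCP_API_KEY, or None/"" when unset.
    """
    if not expected_key:
        return True  # auth disabled: accept all connections
    if not authorization_header:
        return False  # key configured but no Authorization header sent
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

A middleware wrapper would call this with the incoming `Authorization` header and return an authentication error when it yields `False`.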
+
+---
+
+### Requirement: Search tool
+
+The MCP server SHALL expose a `kb_search` tool that queries the knowledge base via the engine's search API.
+
+#### Scenario: Basic search
+- **WHEN** an agent calls `kb_search` with `{"query": "pension revaluation", "top": 5}`
+- **THEN** the MCP server SHALL POST to the engine's `/api/v1/search` endpoint and return the results with chunk text, scores, document metadata, and tags
+
+#### Scenario: Search with collection filter
+- **WHEN** an agent calls `kb_search` with `{"query": "email preferences", "collection": "memory"}`
+- **THEN** the MCP server SHALL add `collection:memory` to the tags filter and POST to the engine's search endpoint
+
+#### Scenario: Search with tags and collection
+- **WHEN** an agent calls `kb_search` with `{"query": "feedback", "tags": ["email"], "collection": "memory"}`
+- **THEN** the MCP server SHALL combine the explicit tags with `collection:memory` in the tag filter
+
+#### Scenario: Search results strip collection tags
+- **WHEN** the engine returns search results containing tags `["collection:memory", "feedback", "email"]`
+- **THEN** the MCP server SHALL strip `collection:*` tags from the `tags` array and add a separate `collection` field, returning `{"collection": "memory", "tags": ["feedback", "email"], ...}`
+
+#### Scenario: Search with mode override
+- **WHEN** an agent calls `kb_search` with `{"query": "error log", "fts_only": true}`
+- **THEN** the MCP server SHALL pass `fts_only: true` to the engine search endpoint
+
+---
+
+### Requirement: Add note tool
+
+The MCP server SHALL expose a `kb_addnote` tool that submits a text note to the engine for ingestion.
+
+#### Scenario: Add a note with default collection
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses"}`
+- **THEN** the MCP server SHALL submit the note to the engine's `POST /api/v1/jobs` endpoint with the tag `collection:documents` and return the job ID
+
+#### Scenario: Add a note to a specific collection
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses", "collection": "memory", "tags": ["feedback"]}`
+- **THEN** the MCP server SHALL submit the note with tags `["collection:memory", "feedback"]` to the engine
+
+#### Scenario: Adding a note to a collection yields exactly one collection tag
+- **WHEN** an agent calls `kb_addnote` with `{"text": "some note", "collection": "memory"}` and the note is ingested
+- **THEN** the resulting document SHALL have exactly one `collection:*` tag: `collection:memory`
+
+---
+
+### Requirement: Chunked file upload tools
+
+The MCP server SHALL expose a three-step chunked file upload pattern for transferring files from remote agents to the engine.
+
+#### Scenario: Start an upload
+- **WHEN** an agent calls `kb_upload_start` with `{"filename": "report.pdf", "total_size": 5242880, "tags": ["insurance"], "collection": "documents"}`
+- **THEN** the MCP server SHALL create a staging entry, generate a UUID `upload_id`, and return `{"upload_id": "<uuid>"}`
+
+#### Scenario: Upload a chunk
+- **WHEN** an agent calls `kb_upload_chunk` with `{"upload_id": "<uuid>", "data": "<base64-encoded-data>", "chunk_index": 0}`
+- **THEN** the MCP server SHALL decode the base64 data and write it to the staging area for the given upload
+
+#### Scenario: Upload multiple chunks in sequence
+- **WHEN** an agent calls `kb_upload_chunk` multiple times with sequential `chunk_index` values for the same `upload_id`
+- **THEN** the MCP server SHALL store each chunk and track the sequence
+
+#### Scenario: Finish an upload
+- **WHEN** an agent calls `kb_upload_finish` with `{"upload_id": "<uuid>"}`
+- **THEN** the MCP server SHALL reassemble the chunks in order, forward the complete file as a multipart upload to the engine's `POST /api/v1/jobs` endpoint with the tags from `kb_upload_start` (including `collection:<name>`), and return the job ID
+
+#### Scenario: Upload with invalid upload_id
+- **WHEN** an agent calls `kb_upload_chunk` or `kb_upload_finish` with an `upload_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the upload ID is not found
+
+#### Scenario: Abandoned upload cleanup
+- **WHEN** an agent starts an upload but does not call `kb_upload_finish` within 10 minutes
+- **THEN** the MCP server SHALL clean up the staged chunks and remove the upload tracking entry
+
+#### Scenario: MCP server restart during upload
+- **WHEN** the MCP server container restarts while an upload is in progress
+- **THEN** the in-progress upload SHALL be lost and the agent SHALL need to restart from `kb_upload_start`
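
The upload lifecycle in these scenarios can be sketched as a small staging class. This is a simplification under stated assumptions: chunks are held in memory rather than in a staging directory, the class and method names are hypothetical, and it does not check for missing chunk indices before reassembly.

```python
import base64
import time
import uuid

UPLOAD_TTL_SECONDS = 600  # 10 minutes, per the abandoned-upload scenario


class UploadStaging:
    """In-memory staging; contents are lost when the process restarts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._uploads = {}

    def start(self, filename, tags):
        upload_id = str(uuid.uuid4())
        self._uploads[upload_id] = {
            "filename": filename,
            "tags": list(tags),
            "chunks": {},
            "started": self._clock(),
        }
        return upload_id

    def add_chunk(self, upload_id, data, chunk_index):
        """Decode one base64 chunk and store it under its index."""
        upload = self._get(upload_id)
        upload["chunks"][chunk_index] = base64.b64decode(data)

    def finish(self, upload_id):
        """Return (filename, tags, file_bytes) with chunks joined by index."""
        upload = self._get(upload_id)
        del self._uploads[upload_id]
        chunks = upload["chunks"]
        payload = b"".join(chunks[i] for i in sorted(chunks))
        return upload["filename"], upload["tags"], payload

    def cleanup(self):
        """Drop uploads that were started more than the TTL ago."""
        now = self._clock()
        expired = [uid for uid, u in self._uploads.items()
                   if now - u["started"] > UPLOAD_TTL_SECONDS]
        for uid in expired:
            del self._uploads[uid]

    def pending(self):
        """Return the IDs of uploads still being staged."""
        return sorted(self._uploads)

    def _get(self, upload_id):
        if upload_id not in self._uploads:
            raise KeyError(f"upload ID not found: {upload_id}")
        return self._uploads[upload_id]
```

Keying chunks by `chunk_index` means they can arrive in any order and a resent chunk overwrites the earlier copy. The bytes returned by `finish` would then be forwarded to `POST /api/v1/jobs` as a multipart upload.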
+
+---
+
+### Requirement: Update note tool
+
+The MCP server SHALL expose a `kb_update_note` tool that updates an existing note in place via the engine's note mutation endpoint.
+
+#### Scenario: Update an existing note
+- **WHEN** an agent calls `kb_update_note` with `{"document_id": 42, "text": "Updated preference: user prefers bullet points"}`
+- **THEN** the MCP server SHALL send `PATCH /api/v1/notes/42` to the engine and return the updated document
+
+#### Scenario: Update a non-existent document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the document was not found
+
+#### Scenario: Update a non-note document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that refers to a PDF
+- **THEN** the MCP server SHALL return an error indicating that only notes can be updated
+
+---
+
+### Requirement: Get document tool
+
+The MCP server SHALL expose a `kb_get` tool that retrieves document details from the engine.
+
+#### Scenario: Get by document ID
+- **WHEN** an agent calls `kb_get` with `{"document_id": 42}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/documents/42` and return the document details with chunks
+
+#### Scenario: Get by source path
+- **WHEN** an agent calls `kb_get` with `{"source_path": "memory/feedback_testing.md"}`
+- **THEN** the MCP server SHALL query the engine's documents endpoint filtered by source path and return matching documents
+
+#### Scenario: Get results strip collection tags
+- **WHEN** the engine returns document details with tags including `collection:memory`
+- **THEN** the MCP server SHALL strip `collection:*` from tags and present a separate `collection` field
+
+---
+
+### Requirement: Status tool
+
+The MCP server SHALL expose a `kb_status` tool that returns engine health and statistics.
+
+#### Scenario: Get engine status
+- **WHEN** an agent calls `kb_status` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/status` and return engine version, model info, device info, document counts, and queue state
+
+---
+
+### Requirement: Jobs tool
+
+The MCP server SHALL expose a `kb_jobs` tool that returns ingestion job status.
+
+#### Scenario: List recent jobs
+- **WHEN** an agent calls `kb_jobs` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs` and return the list of recent jobs
+
+#### Scenario: Filter jobs by status
+- **WHEN** an agent calls `kb_jobs` with `{"status": "failed"}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs?status=failed` and return matching jobs
+
+---
+
+### Requirement: Collection management via tags
+
+The MCP server SHALL manage collections using tag conventions. The MCP server SHALL enforce exclusive collection membership — a document SHALL belong to exactly one collection.
+
+#### Scenario: Default collection on addnote
+- **WHEN** an agent calls `kb_addnote` without specifying a collection
+- **THEN** the MCP server SHALL apply the tag `collection:documents`
+
+#### Scenario: Explicit collection on addnote
+- **WHEN** an agent calls `kb_addnote` with `{"collection": "memory"}`
+- **THEN** the MCP server SHALL apply the tag `collection:memory`
+
+#### Scenario: Exclusive collection enforcement
+- **WHEN** a document already has the tag `collection:documents` and an operation changes its collection to `memory`
+- **THEN** the MCP server SHALL first remove `collection:documents` via the engine's tag API, then add `collection:memory`
+
+#### Scenario: Collection field in search results
+- **WHEN** search results include documents with `collection:*` tags
+- **THEN** the MCP server SHALL present the collection as a top-level `collection` field and exclude `collection:*` from the `tags` array
@@ -0,0 +1,43 @@
+# Note Mutation
+
+## Purpose
+
+Note mutation allows existing notes to be updated in place without requiring delete and re-add, preserving document identity (ID, creation timestamp) while updating content, embeddings, and the full-text index.
+
+## Requirements
+
+### Requirement: Note update endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint that accepts new text for an existing note, re-chunks and re-embeds it, and returns the updated document.
+
+#### Scenario: Update an existing note
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}`
+- **THEN** the engine SHALL delete existing chunks and embeddings for document 42, run the new text through the note chunking pipeline, generate embeddings for each chunk, insert new chunks and embeddings, update the document's `content_hash` and `updated_at`, and return the updated document with HTTP 200
+
+#### Scenario: Update preserves document identity
+- **WHEN** a note is updated via PATCH
+- **THEN** the document SHALL retain its original `id` and `created_at` values, and `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Update with long text that produces multiple chunks
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with text longer than the embedding model's token window
+- **THEN** the engine SHALL chunk the text using the same note chunking pipeline as ingestion, producing multiple chunks, and embed each chunk separately
+
+#### Scenario: Update a non-existent document
+- **WHEN** a client sends `PATCH /api/v1/notes/999` and document 999 does not exist
+- **THEN** the engine SHALL return HTTP 404
+
+#### Scenario: Update a non-note document
+- **WHEN** a client sends `PATCH /api/v1/notes/42` and document 42 has `doc_type = 'pdf'`
+- **THEN** the engine SHALL return HTTP 422 with an error indicating that only notes can be updated via this endpoint
+
+#### Scenario: Embedding failure during update
+- **WHEN** a client sends `PATCH /api/v1/notes/42` but the embedding step fails
+- **THEN** the engine SHALL roll back the entire transaction, preserving the original note content, chunks, and embeddings, and return HTTP 500
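
The rollback requirement maps naturally onto a single SQLite transaction. The sketch below assumes a deliberately simplified, hypothetical schema (embeddings stored as a BLOB column on `chunks`) and a caller-supplied `embed` function; the engine's real tables and pipeline will differ.

```python
import sqlite3


def update_note(conn, doc_id, chunks, embed):
    """Replace a note's chunks atomically; return an HTTP-style status code."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
            for text in chunks:
                conn.execute(
                    "INSERT INTO chunks (document_id, text, embedding) VALUES (?, ?, ?)",
                    (doc_id, text, embed(text)),
                )
            conn.execute(
                "UPDATE documents SET updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (doc_id,),
            )
    except Exception:
        return 500
    return 200


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.execute("CREATE TABLE chunks (document_id INTEGER, text TEXT, embedding BLOB)")
conn.execute("INSERT INTO documents (id) VALUES (42)")
conn.execute("INSERT INTO chunks VALUES (42, 'original', x'00')")
conn.commit()


def failing_embed(text):
    raise RuntimeError("embedding backend unavailable")


status_failed = update_note(conn, 42, ["new text"], failing_embed)
after_failure = conn.execute("SELECT text FROM chunks WHERE document_id = 42").fetchall()
status_ok = update_note(conn, 42, ["new text"], lambda text: b"\x01")
after_success = conn.execute("SELECT text FROM chunks WHERE document_id = 42").fetchall()
```

Computing all embeddings before opening the transaction would shorten the time the write lock is held. Either ordering satisfies the scenario as long as the delete and the inserts commit together.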
+
+#### Scenario: FTS5 index updated on note mutation
+- **WHEN** a note is updated via PATCH
+- **THEN** the FTS5 virtual table SHALL be updated via the existing chunk triggers (`chunks_ad` for deletes, `chunks_ai` for inserts), keeping the full-text index consistent with the new content
+
+#### Scenario: Tags preserved on update
+- **WHEN** a note with tags `["feedback", "collection:memory"]` is updated via PATCH
+- **THEN** the document's tags SHALL be unchanged — only the text content, chunks, and embeddings are replaced
@@ -0,0 +1,90 @@
+## 1. Engine: Schema Migration & updated_at
+
+- [x] 1.1 Add `updated_at TEXT` column migration to `init_schema()` in `kb/database.py` (same pattern as existing `ALTER TABLE` migrations)
+- [x] 1.2 Update `insert_document()` to include `updated_at` in returned/stored fields
+- [x] 1.3 Update document list endpoint (`GET /api/v1/documents`) to include `updated_at` in response and use `COALESCE(updated_at, created_at)` for date sorting
+- [x] 1.4 Update document detail endpoint (`GET /api/v1/documents/{id}`) to include `updated_at` in response
+- [x] 1.5 Update tag management endpoint (`PUT /api/v1/documents/{id}/tags`) to set `updated_at = current_timestamp` on tag changes
+
+## 2. Engine: Note Mutation Endpoint
+
+- [x] 2.1 Create `kb/routes/notes.py` with `PATCH /api/v1/notes/{id}` endpoint
+- [x] 2.2 Implement validation: document must exist and have `doc_type = 'note'` (404 / 422 on failure)
+- [x] 2.3 Implement note update logic: delete old chunks/embeddings, run note chunking pipeline, re-embed, insert new chunks, update `content_hash` and `updated_at` — all in a single transaction
+- [x] 2.4 Register the notes router in `engine/main.py`
+- [x] 2.5 Test: update a note and verify chunks, embeddings, FTS index, and `updated_at` are all correctly updated
+- [x] 2.6 Test: verify rollback on embedding failure preserves original note
+
+## 3. Engine: Version Bump
+
+- [x] 3.1 Update `engine/VERSION` to `3.0.0`
+
+## 4. Go Client: Update Note Command
+
+- [x] 4.1 Add `PATCH /api/v1/notes/{id}` method to `internal/api/client.go`
+- [x] 4.2 Create `cmd/updatenote.go` — takes document ID and text as positional args, calls PATCH endpoint, formats output (human/json)
+- [x] 4.3 Handle error cases: 404 (not found), 422 (not a note), missing arguments
+- [x] 4.4 Update `cmd/examples.go` to include `updatenote` usage
+
+## 5. Go Client: Version Bump
+
+- [x] 5.1 Update `client/VERSION` to `3.0.0`
+- [x] 5.2 Update `client/MIN_ENGINE_VERSION` to `3.0.0`
+
+## 6. MCP Server: Project Setup
+
+- [x] 6.1 Create `mcp/` directory at repo root with Python package structure
+- [x] 6.2 Add `mcp` SDK and `httpx` as dependencies (requirements.txt or pyproject.toml)
+- [x] 6.3 Implement config: read `KB_ENGINE_URL`, `KB_API_KEY`, `KB_MCP_API_KEY` from environment
+- [x] 6.4 Implement Streamable HTTP transport setup using `mcp` SDK
+- [x] 6.5 Implement Bearer token authentication for incoming agent connections (`KB_MCP_API_KEY`)
+
+## 7. MCP Server: Core Tools
+
+- [x] 7.1 Implement `kb_search` tool — proxy to engine search API, translate `collection` param to `collection:*` tag filter, strip `collection:*` tags from results and add `collection` field
+- [x] 7.2 Implement `kb_addnote` tool — proxy to engine jobs API, apply `collection:<name>` tag (default `collection:documents`)
+- [x] 7.3 Implement `kb_update_note` tool — proxy to engine `PATCH /api/v1/notes/{id}`
+- [x] 7.4 Implement `kb_get` tool — proxy to engine documents API, support lookup by ID or source_path, strip collection tags from response
+- [x] 7.5 Implement `kb_status` tool — proxy to engine status API
+- [x] 7.6 Implement `kb_jobs` tool — proxy to engine jobs API with optional status filter
+
+## 8. MCP Server: Chunked File Upload
+
+- [x] 8.1 Implement `kb_upload_start` tool — generate UUID, create temp staging directory, store upload metadata (filename, tags, collection) in memory
+- [x] 8.2 Implement `kb_upload_chunk` tool — validate upload_id exists, decode base64, write chunk to staging directory by chunk_index
+- [x] 8.3 Implement `kb_upload_finish` tool — reassemble chunks in order, forward as multipart upload to engine `POST /api/v1/jobs` with tags (including `collection:*`), return job ID, clean up staging
+- [x] 8.4 Implement abandoned upload cleanup — background task that removes uploads older than 10 minutes
+- [x] 8.5 Test: upload a multi-chunk file and verify it arrives at the engine correctly
+
+## 9. MCP Server: Collection Management
+
+- [x] 9.1 Implement exclusive collection enforcement: on addnote and file upload, query document tags and remove any existing `collection:*` tags via the engine tag API before applying the new one
+- [x] 9.2 Implement collection tag stripping in all tool responses (search results, document details)
+
+## 10. MCP Server: Tool Descriptions
+
+- [x] 10.1 Write `kb_search` tool description including query expansion and reranking guidance
+- [x] 10.2 Write descriptions for all other tools with parameter documentation and usage examples
+- [x] 10.3 Include chunked upload usage example in `kb_upload_start` description
+
+## 11. MCP Server: Docker & Compose
+
+- [x] 11.1 Create `mcp/Dockerfile` — Python base image, install dependencies, run MCP server
+- [x] 11.2 Add MCP server service to Docker Compose file(s) — connect to engine via Docker network, expose Streamable HTTP port
+- [x] 11.3 Document environment variables (`KB_ENGINE_URL`, `KB_API_KEY`, `KB_MCP_API_KEY`) in compose file
+
+## 12. Integration Testing
+
+- [x] 12.1 Test: MCP search with collection filter returns only matching documents
+- [x] 12.2 Test: MCP addnote with collection applies correct tag and enforces exclusivity
+- [x] 12.3 Test: MCP update note preserves document ID and tags, updates content and `updated_at`
+- [x] 12.4 Test: chunked file upload end-to-end (start → chunk × N → finish → verify job created)
+- [x] 12.5 Test: MCP server rejects unauthenticated connections when `KB_MCP_API_KEY` is set
+
+## 13. Release
+
+- [x] 13.1 Build and tag engine Docker images (`engine-v3.0.0-*`)
+- [x] 13.2 Build and tag MCP server Docker image
+- [x] 13.3 Build Go client binaries for all platforms
+- [x] 13.4 Create git tags: `engine-v3.0.0`, `client-v3.0.0`
+- [x] 13.5 Update SKILL.md to reference MCP server as primary agent integration path
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-04
@@ -0,0 +1,59 @@
+## Context
+
+The MCP Python SDK includes DNS rebinding protection via `TransportSecuritySettings` in `mcp.server.transport_security`. When enabled, it validates the `Host` header against an allowlist and returns 421 for unrecognised hosts.
+
+`FastMCP` auto-enables this protection when `host` is `127.0.0.1`, `localhost`, or `::1`, with a default allowlist of those three values (wildcard port). The kb MCP server does not pass a `host` to `FastMCP()` and does not pass `transport_security`, so the behaviour depends on the SDK's defaults — which have changed between versions and will likely keep changing.
+
+Currently the server calls `mcp.streamable_http_app()` to get a Starlette sub-app and wraps it in its own Starlette app with `BearerAuthMiddleware`. The `transport_security` settings flow through `FastMCP()` → `Settings` → `StreamableHTTPSessionManager`, so they must be set at `FastMCP` construction time.
+
+When a remote client connects (e.g. `Host: 192.168.1.50:3000`), the SDK rejects the request with 421 before our auth middleware even runs.
+
+## Goals / Non-Goals
+
+**Goals:**
+
+- Allow operators to configure additional allowed hosts via environment variable so remote clients can connect.
+- Support both IP addresses and FQDNs, with or without port.
+- Preserve DNS rebinding protection (keep it enabled, just widen the allowlist).
+- Maintain backward compatibility — unset variable means localhost-only, same as today.
+
+**Non-Goals:**
+
+- Disabling DNS rebinding protection entirely.
+- Configuring allowed origins separately from allowed hosts (derive origins automatically from hosts).
+- TLS termination or HTTPS — that belongs to a reverse proxy in front of the MCP container.
+
+## Decisions
+
+### 1. Environment variable format
+
+`KB_MCP_ALLOWED_HOSTS` is a comma-separated list of hosts. Each entry is an IP address or FQDN without port and without scheme.
+
+Examples: `192.168.1.50`, `kb.example.com`, `192.168.1.50,kb.example.com,10.0.0.1`
+
+**Rationale:** Comma-separated is the simplest format that doesn't require quoting in Docker Compose YAML or shell environments. Ports are omitted because the wildcard-port pattern (`host:*`) covers all ports — operators shouldn't need to know the internal port.
+
+**Alternative considered:** JSON array — rejected, awkward in env vars and Compose files.
+
+### 2. Merge with localhost defaults
+
+The parsed hosts are merged with the hardcoded localhost set (`127.0.0.1`, `localhost`, `[::1]`). Localhost is always allowed regardless of the env var value.
+
+**Rationale:** Removing localhost would break local development and health checks. There's no reason to ever disallow it.
+
+### 3. Auto-derive allowed origins from allowed hosts
+
+For each allowed host, generate `http://<host>:*` as an allowed origin. No separate env var for origins.
+
+**Rationale:** The MCP server doesn't serve HTTPS (TLS is terminated by a reverse proxy), so `http://` is always correct at the container level. If HTTPS origins are needed in future, a separate env var can be added then.
+
+### 4. Pass TransportSecuritySettings explicitly to FastMCP
+
+Always construct a `TransportSecuritySettings` with `enable_dns_rebinding_protection=True` and the merged allowlist, and pass it as `transport_security=` to `FastMCP()`. This makes the behaviour explicit rather than depending on SDK defaults.
+
+**Rationale:** The SDK's auto-detection logic depends on the `host` parameter which we don't set, and the defaults may change between SDK versions. Being explicit removes the ambiguity.
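
Decisions 1 to 3 can be sketched as a pair of small helpers. This is an illustrative sketch only: `parse_allowed_hosts` and `build_allowlists` are hypothetical names, and the final hand-off to the SDK is shown as a comment because the `allowed_hosts` / `allowed_origins` field names are an assumption about `TransportSecuritySettings`.

```python
LOCALHOST_HOSTS = ("127.0.0.1", "localhost", "[::1]")


def parse_allowed_hosts(raw: str) -> list[str]:
    """Split the comma-separated value, strip whitespace, drop empty entries."""
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def build_allowlists(raw: str) -> tuple[list[str], list[str]]:
    """Merge localhost defaults with configured hosts and derive origins."""
    hosts: list[str] = []
    for host in (*LOCALHOST_HOSTS, *parse_allowed_hosts(raw)):
        if host not in hosts:  # keep order, avoid duplicates
            hosts.append(host)
    allowed_hosts = [f"{host}:*" for host in hosts]
    allowed_origins = [f"http://{host}:*" for host in hosts]
    return allowed_hosts, allowed_origins


# Assumed hand-off (field names not confirmed by this document):
#   settings = TransportSecuritySettings(
#       enable_dns_rebinding_protection=True,
#       allowed_hosts=allowed_hosts,
#       allowed_origins=allowed_origins,
#   )
#   mcp = FastMCP(..., transport_security=settings)
```

With `KB_MCP_ALLOWED_HOSTS` unset, `build_allowlists("")` yields only the three localhost patterns, which matches the backward-compatibility goal.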
+
+## Risks / Trade-offs
+
+- **Operator must know their Host header value** — If a reverse proxy rewrites the Host header, the operator needs to allowlist the rewritten value, not the original. → Mitigation: document this in Compose file comments.
+- **No HTTPS origin support** — If a client sends `Origin: https://...`, it will be rejected. → Mitigation: acceptable for now; the MCP server sits behind a proxy that terminates TLS. Can add `KB_MCP_ALLOWED_ORIGINS` later if needed.
@@ -0,0 +1,26 @@
+## Why
+
+The MCP server uses the Python MCP SDK's built-in DNS rebinding protection, which validates the `Host` header on every request. By default it only allows `localhost`, `127.0.0.1`, and `[::1]`. When clients connect remotely — using an IP address or FQDN — the server returns 421 "Invalid Host header" and the connection fails. There is no way to configure allowed hosts without changing code.
+
+## What Changes
+
+- Add a new environment variable `KB_MCP_ALLOWED_HOSTS` that accepts a comma-separated list of additional allowed hosts (IPs and/or FQDNs).
+- The MCP server passes these hosts (plus the existing localhost defaults) to the MCP SDK's `TransportSecuritySettings` when constructing the ASGI app.
+- Entries are bare hosts (e.g. `192.168.1.50` or `kb.example.com`). Each entry is expanded to a wildcard-port pattern, so it matches on any port.
+- When `KB_MCP_ALLOWED_HOSTS` is empty or unset, behaviour is unchanged (localhost-only).
+
+## Capabilities
+
+### New Capabilities
+
+_None — this is configuration of an existing component, not a new capability._
+
+### Modified Capabilities
+
+- `docker-deployment`: Add `KB_MCP_ALLOWED_HOSTS` to the MCP container's environment variables in Compose files and document its usage.
+
+## Impact
+
+- **mcp/config.py** — new `KB_MCP_ALLOWED_HOSTS` env var.
+- **mcp/server.py** — construct `TransportSecuritySettings` with merged allowed hosts/origins and pass to the FastMCP app.
+- **engine/compose.\*.yaml** — add `KB_MCP_ALLOWED_HOSTS` to the kb-mcp service environment block.
@@ -0,0 +1,65 @@
+## ADDED Requirements
+
+### Requirement: Configurable MCP allowed hosts
+
+The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable containing a comma-separated list of additional hosts (IP addresses or FQDNs) that are permitted to connect. The server SHALL always allow `127.0.0.1`, `localhost`, and `[::1]` regardless of this setting. DNS rebinding protection SHALL always be enabled.
+
+#### Scenario: Remote client connects with allowed host
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 192.168.1.50:3000`
+- **THEN** the server SHALL accept the request and process it normally
+
+#### Scenario: Remote client connects with disallowed host
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 10.0.0.99:3000`
+- **THEN** the server SHALL return HTTP 421 "Invalid Host header"
+
+#### Scenario: Multiple allowed hosts
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50,kb.example.com`
+- **THEN** the server SHALL accept requests with `Host` matching either `192.168.1.50` or `kb.example.com` on any port
+
+#### Scenario: Variable unset or empty
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is unset or empty
+- **THEN** the server SHALL allow only localhost addresses (`127.0.0.1`, `localhost`, `[::1]`) with any port
+
+#### Scenario: Localhost always allowed
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50`
+- **THEN** the server SHALL still accept requests with `Host: localhost:3000` or `Host: 127.0.0.1:3000`
+
+#### Scenario: Allowed origins derived from allowed hosts
+
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` includes `192.168.1.50`
+- **THEN** the server SHALL accept `Origin: http://192.168.1.50:3000` (and any port) in addition to localhost origins
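
The Host scenarios above can be checked against a small stand-in matcher. This is a simplified illustration of wildcard-port matching written for this example, not the MCP SDK's own validation code; `host_allowed` is a hypothetical helper.

```python
def host_allowed(host_header: str, allowed_hosts: list[str]) -> bool:
    """Return True if the Host header matches an exact entry or a `host:*` pattern."""
    for pattern in allowed_hosts:
        if pattern.endswith(":*"):
            base = pattern[:-2]
            # Require the separator so "192.168.1.5:*" does not match "192.168.1.50:80".
            if host_header.startswith(base + ":"):
                return True
        elif host_header == pattern:
            return True
    return False


# Allowlist as it would look with KB_MCP_ALLOWED_HOSTS=192.168.1.50
ALLOWLIST = ["127.0.0.1:*", "localhost:*", "[::1]:*", "192.168.1.50:*"]
```

Under this sketch, `Host: 192.168.1.50:3000` and `Host: localhost:3000` pass, while `Host: 10.0.0.99:3000` is the case that would map to the 421 response.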
+
+## MODIFIED Requirements
+
+### Requirement: Compose files for deployment
+
+The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use `build:` context for local development. Release notes SHALL document the versioned image tag for users pulling pre-built images.
+
+#### Scenario: Start NVIDIA deployment
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
+- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Start ROCm deployment
+- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
+- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Automatic restart
+- **WHEN** the engine process crashes or the host reboots
+- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
+
+#### Scenario: Configure via environment
+- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
+- **THEN** the engine and MCP server SHALL use those values
+
+#### Scenario: Pre-built image deployment
+- **WHEN** an admin wants to use a pre-built engine image without building from source
+- **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)
+
+#### Scenario: MCP allowed hosts in Compose
+- **WHEN** the kb-mcp service is defined in a Compose file
+- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose
@@ -0,0 +1,18 @@
+## 1. Configuration
+
+- [x] 1.1 Add `KB_MCP_ALLOWED_HOSTS` to `mcp/config.py` — read from env, default empty string
+- [x] 1.2 Add host-parsing helper that splits the comma-separated value, strips whitespace, and filters empty entries
+
+## 2. Transport security
+
+- [x] 2.1 Build `TransportSecuritySettings` in `mcp/server.py` — merge localhost defaults with parsed `KB_MCP_ALLOWED_HOSTS`, derive allowed origins from allowed hosts
+- [x] 2.2 Pass `transport_security=` to the `FastMCP()` constructor
+
+## 3. Compose files
+
+- [x] 3.1 Add `KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}` to the kb-mcp environment block in `compose.cpu.yaml`, `compose.nvidia.yaml`, and `compose.rocm.yaml` with a comment explaining the format
+
+## 4. Verification
+
+- [x] 4.1 Test: unset `KB_MCP_ALLOWED_HOSTS` — confirm localhost connects, remote host gets 421
+- [x] 4.2 Test: set `KB_MCP_ALLOWED_HOSTS` to the server IP — confirm remote host connects successfully
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-04
@@ -0,0 +1,39 @@
+## Context
+
+The MCP server (`mcp/server.py`) exposes KB operations as tools for LLM clients. Collections are an abstraction over tags — internally stored with a `collection:` prefix. The server already has helpers for managing collection tags (`_collection_tag`, `_ensure_exclusive_collection`, `_strip_collection_tags`) and the engine client (`mcp/engine.py`) already has an `update_tags()` method.
+
+Document deletion is supported by the engine API at `DELETE /api/v1/documents/{doc_id}` but has no corresponding engine client method or MCP tool.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Expose collection assignment for existing documents via MCP (`kb_set_collection`)
+- Expose document deletion via MCP (`kb_delete`)
+- Follow existing patterns in `server.py` and `engine.py`
+
+**Non-Goals:**
+- Bulk operations (multi-document collection assignment or deletion)
+- Tag management beyond collections (direct tag add/remove via MCP)
+- Undo/recycle bin for deleted documents
+- Changes to the engine API layer — all endpoints already exist
+
+## Decisions
+
+### 1. Reuse `_ensure_exclusive_collection` for kb_set_collection
+
+The server already has `_ensure_exclusive_collection(doc_id, collection)` which removes any existing `collection:*` tags and applies the new one. The `kb_set_collection` tool will use this directly when a collection is provided, and manually remove collection tags when clearing.
+
+**Alternative considered**: Exposing raw tag add/remove to the LLM. Rejected because it leaks the `collection:` prefix implementation detail and the LLM could create inconsistent state (multiple collections on one document).
+
+### 2. New `engine.delete_document()` method for kb_delete
+
+Add a simple `delete_document(doc_id)` to `mcp/engine.py` that calls `DELETE /api/v1/documents/{doc_id}`. This follows the same pattern as all other engine client methods.
+
+### 3. Return confirmation with document metadata on delete
+
+`kb_delete` will return the response from the engine API, which includes `{"status": "deleted", "document_id": ..., "title": ...}`. This gives the LLM confirmation of what was deleted without needing a separate get call.
+
+## Risks / Trade-offs
+
+- **[Accidental deletion]** → The LLM could delete the wrong document. Mitigation: the tool requires an explicit `document_id`, and the response includes the title so the LLM can verify. No bulk delete is exposed.
+- **[Collection cleared unexpectedly]** → Passing `collection=None` to `kb_set_collection` removes collection assignment. Mitigation: the parameter description will make this behavior explicit.
@@ -0,0 +1,25 @@
+## Why
+
+LLMs using the KB MCP server can create notes in collections and search by collection, but cannot assign existing documents to a collection or delete documents. This forces users to drop out to the HTTP API for routine document management. Both operations are fully supported at the database and HTTP API layers but aren't wired through to MCP tools.
+
+## What Changes
+
+- Add `kb_set_collection` MCP tool — assigns, changes, or removes the collection on an existing document by manipulating `collection:` prefixed tags via the existing `engine.update_tags()` method.
+- Add `kb_delete` MCP tool — deletes a document by ID, calling the existing `DELETE /api/v1/documents/{doc_id}` endpoint via a new `engine.delete_document()` method.
+
+## Capabilities
+
+### New Capabilities
+
+- `mcp-document-management`: MCP tools for modifying and deleting existing documents (kb_set_collection, kb_delete).
+
+### Modified Capabilities
+
+_(none — the engine API endpoints already exist; this change only adds MCP tool wrappers)_
+
+## Impact
+
+- **MCP server** (`mcp/server.py`): Two new tool registrations.
+- **MCP engine client** (`mcp/engine.py`): One new method (`delete_document`). The `update_tags` method already exists and will be reused.
+- **Engine API**: No changes — `DELETE /api/v1/documents/{doc_id}` and `PUT /api/v1/documents/{doc_id}/tags` already exist.
+- **Breaking changes**: None. Additive only.
@@ -0,0 +1,61 @@
+## ADDED Requirements
+
+### Requirement: Set collection on existing document via MCP
+
+The MCP server SHALL expose a `kb_set_collection` tool that assigns or changes the collection of an existing document. The tool SHALL accept a `document_id` (required) and `collection` (optional string). When `collection` is provided, the tool SHALL ensure the document belongs to exactly that collection by removing any existing `collection:*` tags and adding the new one. When `collection` is omitted or null, the tool SHALL remove all `collection:*` tags from the document, leaving it unassigned.
+
+The tool SHALL return the updated document with the `collection` field and cleaned tags (collection tags stripped), consistent with other MCP tool responses.
+
+#### Scenario: Assign untagged document to a collection
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and `collection="workspace"`
+- **THEN** the document SHALL have the tag `collection:workspace` added
+- **AND** the response SHALL include `"collection": "workspace"`
+
+#### Scenario: Change document from one collection to another
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and `collection="memory"` on a document currently in collection "documents"
+- **THEN** the tag `collection:documents` SHALL be removed and `collection:memory` SHALL be added
+- **AND** the response SHALL include `"collection": "memory"`
+
+#### Scenario: Remove document from all collections
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and no `collection` parameter
+- **THEN** all `collection:*` tags SHALL be removed from the document
+- **AND** the response SHALL include `"collection": null`
+
+#### Scenario: Document not found
+
+- **WHEN** `kb_set_collection` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
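
The tag manipulation behind `kb_set_collection` can be sketched as two pure functions. These are hypothetical helpers written for illustration; the real server uses `_ensure_exclusive_collection` and `_strip_collection_tags`, whose exact signatures and behaviour are not reproduced here.

```python
from typing import Optional

COLLECTION_PREFIX = "collection:"


def apply_collection(tags: list[str], collection: Optional[str]) -> list[str]:
    """Drop every collection:* tag, then add the new one if a collection is given."""
    kept = [tag for tag in tags if not tag.startswith(COLLECTION_PREFIX)]
    if collection is not None:
        kept.append(f"{COLLECTION_PREFIX}{collection}")
    return kept


def present(tags: list[str]) -> dict:
    """Shape tags for a tool response: collection as a field, collection tags stripped."""
    collection = None
    cleaned = []
    for tag in tags:
        if tag.startswith(COLLECTION_PREFIX):
            collection = tag[len(COLLECTION_PREFIX):]
        else:
            cleaned.append(tag)
    return {"collection": collection, "tags": cleaned}
```

Removing all `collection:*` tags before adding the new one is what keeps collections exclusive, and passing `None` falls out as the "unassign" case without a separate code path.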
+
+### Requirement: Delete document via MCP
+
+The MCP server SHALL expose a `kb_delete` tool that permanently deletes a document from the knowledge base. The tool SHALL accept a `document_id` (required integer). Deletion SHALL remove the document, its chunks, embeddings, tags, and any stored file on disk.
+
+The tool SHALL return a confirmation response including the deleted document's ID and title.
+
+#### Scenario: Successful deletion
+
+- **WHEN** `kb_delete` is called with `document_id=42`
+- **THEN** the document, its chunks, embeddings, tag associations, and stored file SHALL be deleted
+- **AND** the response SHALL include `"status": "deleted"`, the `document_id`, and the document `title`
+
+#### Scenario: Document not found
+
+- **WHEN** `kb_delete` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
+
+### Requirement: Engine client delete method
+
+The MCP engine client (`mcp/engine.py`) SHALL provide a `delete_document(doc_id)` method that sends a `DELETE` request to `/api/v1/documents/{doc_id}` and returns the JSON response. The method SHALL raise on non-2xx status codes, consistent with other engine client methods.
+
+#### Scenario: Successful engine client delete call
+
+- **WHEN** `delete_document(42)` is called and the engine API returns 200
+- **THEN** the method SHALL return the parsed JSON response
+
+#### Scenario: Engine client delete for missing document
+
+- **WHEN** `delete_document(999)` is called and the engine API returns 404
+- **THEN** the method SHALL raise an `httpx.HTTPStatusError`
@@ -0,0 +1,12 @@
+## 1. Engine Client
+
+- [x] 1.1 Add `delete_document(doc_id)` method to `mcp/engine.py`
+
+## 2. MCP Tools
+
+- [x] 2.1 Add `kb_set_collection` tool to `mcp/server.py`
+- [x] 2.2 Add `kb_delete` tool to `mcp/server.py`
+
+## 3. Verification
+
+- [x] 3.1 Test kb_set_collection and kb_delete against running engine
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-06
@@ -0,0 +1,37 @@
+## Context
+
+The project currently ships three Docker image variants: CPU, NVIDIA, and AMD ROCm. The ROCm variant requires a 4.2GB pre-built torch wheel, a multi-stage Dockerfile with ROCm-specific runtime libraries, and additional build/push steps in the release pipeline. ROCm support is less tested and adds disproportionate complexity relative to its usage.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Remove all ROCm-specific files (Dockerfile, compose file, torch wheel)
+- Remove ROCm build/push from the release pipeline
+- Update all documentation to reflect CPU + NVIDIA only
+- Update the docker-deployment spec to remove ROCm requirements
+
+**Non-Goals:**
+- Changing any engine application code (it is already GPU-vendor-agnostic via PyTorch)
+- Modifying the CPU or NVIDIA Dockerfiles (beyond what's already in-flight)
+- Providing a migration path for ROCm users (they can stay on 3.2.x or use CPU mode)
+
+## Decisions
+
+**1. Delete ROCm files outright rather than deprecating**
+
+Remove `Dockerfile.rocm`, `compose.rocm.yaml`, and `assets/` immediately rather than marking them deprecated. There are no downstream consumers that depend on automated ROCm builds — anyone needing AMD support can pin to the last ROCm-supporting release.
+
+*Alternative considered*: Keep files but stop publishing images. Rejected — dead code is confusing and still requires maintenance awareness.
+
+**2. Leave archived openspec changes untouched**
+
+Archived changes under `openspec/changes/archive/` still contain ROCm references. They are historical records and should not be modified.
+
+**3. Update GPU-vendor-agnostic requirement to reflect CPU + NVIDIA scope**
+
+The existing spec requirement "Application code is GPU-vendor-agnostic" remains true at the code level (PyTorch abstracts GPU vendors), but the project no longer provides or tests ROCm images. The spec should be simplified to reflect that only NVIDIA and CPU are supported deployment targets.
+
+## Risks / Trade-offs
+
+- **[Breaking change for AMD users]** → Users on AMD GPUs must stay on 3.2.x or use CPU mode. Mitigated by the fact that ROCm support was already "less tested" per the original design risk assessment.
+- **[Future re-addition harder]** → If ROCm support is needed later, the Dockerfile and compose file would need to be recreated. Mitigated by git history preserving the removed files.
@@ -0,0 +1,29 @@
+## Why
+
+AMD ROCm support adds significant complexity and maintenance burden to the project: the ROCm torch wheel alone is 4.2GB, the Dockerfile requires a multi-stage build with ROCm-specific runtime libraries, and the release pipeline must build and push additional images. The final container is >20GB. ROCm support is less tested and less commonly used than CPU or NVIDIA. Removing it keeps the project focused and manageable.
+
+## What Changes
+
+- **BREAKING**: Remove AMD ROCm Docker image (`Dockerfile.rocm`) and compose file (`compose.rocm.yaml`)
+- **BREAKING**: Remove ROCm image build/push/release-notes from the engine release script
+- Remove pre-built ROCm torch wheel from `assets/`
+- Remove all AMD/ROCm references from user-facing docs (README, DEVELOPER)
+- Update docker-deployment spec to reflect CPU + NVIDIA only
+
+## Capabilities
+
+### New Capabilities
+
+_(none)_
+
+### Modified Capabilities
+
+- `docker-deployment`: Remove AMD ROCm Docker image requirement and all ROCm-specific scenarios. Deployment now covers CPU and NVIDIA only.
+
+## Impact
+
+- **Docker images**: ROCm image variant no longer published
+- **Users**: Anyone running KB on AMD GPUs will need to stay on the last version with ROCm support (3.2.x) or switch to CPU mode
+- **Release pipeline**: `release-engine.sh` simplified — only CPU and NVIDIA images
+- **Repository size**: ~4.2GB reduction by removing the torch wheel from `assets/`
+- **Docs**: README and DEVELOPER updated to remove AMD quick-start and build instructions
@@ -0,0 +1,76 @@
+## REMOVED Requirements
+
+### Requirement: AMD ROCm Docker image
+
+**Reason**: AMD ROCm support removed to reduce project complexity and binary size. The ROCm torch wheel is 4.2GB and the variant is less tested than CPU or NVIDIA.
+
+**Migration**: Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode (`KB_DEVICE=cpu`).
+
+---
+
+## MODIFIED Requirements
+
+### Requirement: Application code is GPU-vendor-agnostic
+
+The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.
+
+#### Scenario: Same engine code on both platforms
+- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
+- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
+
+---
+
+### Requirement: Compose files for deployment
+
+The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use `build:` context for local development. Release notes SHALL document the versioned image tag for users pulling pre-built images.
+
+#### Scenario: Start NVIDIA deployment
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
+- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Automatic restart
+- **WHEN** the engine process crashes or the host reboots
+- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
+
+#### Scenario: Configure via environment
+- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
+- **THEN** the engine and MCP server SHALL use those values
+
+#### Scenario: Pre-built image deployment
+- **WHEN** an admin wants to use a pre-built engine image without building from source
+- **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)
+
+#### Scenario: MCP allowed hosts in Compose
+- **WHEN** the kb-mcp service is defined in a Compose file
+- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose
+
+---
+
+### Requirement: Bind-mount data directory
+
+The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
+
+#### Scenario: Data directory structure
+- **WHEN** the engine starts for the first time
+- **THEN** it SHALL create the following structure under the data directory:
+  - `kb.db` — SQLite database
+  - `hf_cache/` — HuggingFace model cache
+  - `staging/` — temporary files for queued ingestion jobs
+
+#### Scenario: Portable data across hosts
+- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
+- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
+
+---
+
+### Requirement: CPU-only fallback
+
+The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
+
+#### Scenario: No GPU available
+- **WHEN** the container starts without GPU passthrough (no `--gpus`)
+- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
+
+#### Scenario: Explicit CPU mode
+- **WHEN** `KB_DEVICE=cpu` and `KB_INGEST_DEVICE=cpu` are set in the environment
+- **THEN** the engine SHALL use CPU regardless of GPU availability
@@ -0,0 +1,20 @@
+## 1. Delete ROCm files
+
+- [x] 1.1 Delete `engine/Dockerfile.rocm`
+- [x] 1.2 Delete `engine/compose.rocm.yaml`
+- [x] 1.3 Delete `assets/` directory (ROCm torch wheel)
+
+## 2. Update release pipeline
+
+- [x] 2.1 Remove ROCm image build, tag, and push from `release-engine.sh`
+- [x] 2.2 Remove ROCm entries from release notes output in `release-engine.sh`
+
+## 3. Update documentation
+
+- [x] 3.1 Remove AMD GPU quick-start section and ROCm references from `README.md`
+- [x] 3.2 Remove ROCm build instructions and `compose.rocm.yaml` references from `DEVELOPER.md`
+- [x] 3.3 Remove `onnxruntime-rocm` migration note from `DEVELOPER.md`
+
+## 4. Update specs
+
+- [x] 4.1 Update `openspec/specs/docker-deployment/spec.md` — remove AMD ROCm requirement, remove ROCm scenarios, update GPU-agnostic requirement to CPU + NVIDIA scope
@@ -0,0 +1,35 @@
+# Agent-Side Search Patterns
+
+## Purpose
+
+Documents recommended patterns for agent-side query expansion and reranking, which are caller responsibilities rather than engine features. These patterns are communicated via MCP tool descriptions.
+
+## Requirements
+
+### Requirement: Query expansion guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on query expansion as a recommended pattern for complex queries.
+
+#### Scenario: Tool description includes expansion pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For complex queries, consider expanding into 2-3 variant phrasings and calling this tool multiple times, then deduplicating results by chunk_id"
+
+---
+
+### Requirement: Reranking guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on agent-side reranking as a recommended pattern for improving precision.
+
+#### Scenario: Tool description includes reranking pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For precision, rerank the returned results using your own judgement based on relevance to the original question"
+
+---
+
+### Requirement: No engine-side LLM dependency
+
+The engine SHALL NOT require or use any external LLM API for search operations. Query expansion and reranking SHALL remain entirely agent-side concerns.
+
+#### Scenario: Engine has no LLM dependency
+- **WHEN** the engine is deployed without any `ANTHROPIC_API_KEY` or similar LLM API configuration
+- **THEN** all search operations SHALL function fully, with no degraded results or missing features
@@ -0,0 +1,230 @@
+## ADDED Requirements
+
+### Requirement: Common selection filter
+
+All bulk engine endpoints SHALL accept a JSON body with the following optional selection fields, combined with AND logic:
+
+- `document_ids` (list of int) — match documents with these specific IDs
+- `tags` (list of str) — match documents that have ALL specified tags
+- `doc_type` (str) — match documents with this document type
+- `from_id` (int) — match documents with id >= this value
+- `to_id` (int) — match documents with id <= this value
+
+At least one selection field MUST be present. If no selection fields are provided, the endpoint SHALL return 400 Bad Request.
+
+#### Scenario: Filter by tags and doc_type
+
+- **WHEN** a bulk endpoint receives `{"tags": ["draft"], "doc_type": "note"}`
+- **THEN** it SHALL match only documents that have the tag "draft" AND have doc_type "note"
+
+#### Scenario: Filter by ID range
+
+- **WHEN** a bulk endpoint receives `{"from_id": 10, "to_id": 50}`
+- **THEN** it SHALL match documents with id >= 10 AND id <= 50
+
+#### Scenario: Filter by explicit IDs
+
+- **WHEN** a bulk endpoint receives `{"document_ids": [1, 5, 12]}`
+- **THEN** it SHALL match only documents with those specific IDs
+
+#### Scenario: Combined filters
+
+- **WHEN** a bulk endpoint receives `{"tags": ["agent:mybot"], "doc_type": "note", "from_id": 100}`
+- **THEN** it SHALL match documents satisfying ALL three criteria
+
+#### Scenario: No selection fields provided
+
+- **WHEN** a bulk endpoint receives `{}` or `{"force": true}` with no selection fields
+- **THEN** it SHALL return 400 Bad Request
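
One way to turn the selection filter into a query is sketched below. The table and column names (`documents`, `document_tags`, `tag`) are assumptions made for the example and are not taken from the engine's schema; empty lists are treated as "field not present", which is also an assumption.

```python
import sqlite3


def build_selection(body: dict) -> tuple[str, list]:
    """Build a parameterised WHERE clause; all present fields are ANDed together."""
    clauses: list[str] = []
    params: list = []
    ids = body.get("document_ids")
    if ids:
        clauses.append(f"d.id IN ({','.join('?' * len(ids))})")
        params.extend(ids)
    # One EXISTS per tag, so a document must carry ALL listed tags.
    for tag in body.get("tags") or []:
        clauses.append(
            "EXISTS (SELECT 1 FROM document_tags dt "
            "WHERE dt.document_id = d.id AND dt.tag = ?)"
        )
        params.append(tag)
    if body.get("doc_type") is not None:
        clauses.append("d.doc_type = ?")
        params.append(body["doc_type"])
    if body.get("from_id") is not None:
        clauses.append("d.id >= ?")
        params.append(body["from_id"])
    if body.get("to_id") is not None:
        clauses.append("d.id <= ?")
        params.append(body["to_id"])
    if not clauses:
        # The endpoint would map this to 400 Bad Request.
        raise ValueError("at least one selection field is required")
    return " AND ".join(clauses), params


def matching_ids(conn: sqlite3.Connection, body: dict) -> list[int]:
    """Return the IDs of documents matched by the selection, in ascending order."""
    where, params = build_selection(body)
    rows = conn.execute(f"SELECT d.id FROM documents d WHERE {where} ORDER BY d.id", params)
    return [row[0] for row in rows]
```

Only placeholders are interpolated into the SQL string; every caller-supplied value travels as a bound parameter.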
+
+### Requirement: Safety threshold
+
+All bulk endpoints SHALL enforce a safety threshold. Before executing, the engine SHALL count the matched documents and the total documents in the database. If `matched / total * 100` exceeds the configured threshold, the request SHALL be rejected with 409 Conflict.
+
+The response SHALL include: `error` ("safety_threshold_exceeded"), `message` (human-readable), `matched` (int), `total` (int), `percent` (float), and `threshold` (int).
+
+The threshold SHALL default to 70 and be configurable via the `KB_BULK_SAFETY_PERCENT` environment variable (integer 0-100). A value of 0 disables the check.
+
+The caller MAY override the threshold by including `"force": true` in the request body.
+
+#### Scenario: Threshold exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 750 documents (75%) without `force: true`
+- **THEN** it SHALL return 409 with `matched: 750`, `total: 1000`, `percent: 75.0`, `threshold: 70`
+
+#### Scenario: Threshold not exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 500 documents (50%) without `force: true`
+- **THEN** the operation SHALL proceed normally
+
+#### Scenario: Force override
+
+- **GIVEN** 1000 total documents and a match of 900 (90%)
+- **WHEN** the request includes `"force": true`
+- **THEN** the operation SHALL proceed regardless of threshold
+
+#### Scenario: Zero threshold
+
+- **GIVEN** `KB_BULK_SAFETY_PERCENT` is 0
+- **THEN** the safety check SHALL be effectively disabled for all operations
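
A minimal sketch of the threshold check, assuming the counts are already known. `check_safety` is a hypothetical helper, and skipping the check when the database is empty (`total == 0`) is an assumption; the spec does not cover that case.

```python
def check_safety(matched: int, total: int, threshold: int = 70, force: bool = False):
    """Return None if the operation may proceed, else a 409-style error payload."""
    # A threshold of 0 disables the check and force=True overrides it.
    # Assumption: an empty database (total == 0) is also allowed through.
    if threshold == 0 or force or total == 0:
        return None
    percent = matched / total * 100
    if percent <= threshold:
        return None
    return {
        "error": "safety_threshold_exceeded",
        "message": f"{matched} of {total} documents ({percent:.1f}%) "
                   f"exceeds the {threshold}% safety threshold",
        "matched": matched,
        "total": total,
        "percent": percent,
        "threshold": threshold,
    }
```

With 750 of 1000 documents matched and a threshold of 70, the helper returns a payload with `percent` equal to 75.0; with 500 matched it returns `None`.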
+
+### Requirement: Synchronous response with audit log
+
+All bulk endpoints SHALL execute synchronously and return a JSON response with:
+
+- `job_id` (int) — ID of the audit log entry in the jobs table
+- `status` (str) — "done" or "partial_failure"
+- `matched` (int) — number of documents that matched the selection
+- `succeeded` (int) — number of documents successfully processed
+- `failed` (int) — number of documents that failed
+- `errors` (list) — array of `{"document_id": int, "error": str}` for each failure (empty on full success)
+
+A job record SHALL be created in the jobs table with `job_type` set to the operation type. The `filename` field SHALL store a JSON representation of the selection filter. The `error` field SHALL store a JSON array of individual errors if any occurred.
+
+#### Scenario: Full success
+
+- **WHEN** a bulk operation matches 50 documents and all succeed
+- **THEN** the response SHALL have `status: "done"`, `matched: 50`, `succeeded: 50`, `failed: 0`, `errors: []`
+
+#### Scenario: Partial failure
+
+- **WHEN** a bulk operation matches 50 documents but 2 fail
+- **THEN** the response SHALL have `status: "partial_failure"`, `matched: 50`, `succeeded: 48`, `failed: 2`, and `errors` listing the 2 failures
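
The response fields can be derived from the match count and the error list. The sketch below assumes one error entry per failed document; `summarize` is a hypothetical name, not a function from the engine.

```python
def summarize(job_id: int, matched: int, errors: list[dict]) -> dict:
    """Build the bulk response from the match count and per-document errors."""
    failed = len(errors)  # assumption: exactly one entry per failed document
    return {
        "job_id": job_id,
        "status": "partial_failure" if failed else "done",
        "matched": matched,
        "succeeded": matched - failed,
        "failed": failed,
        "errors": errors,
    }
```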
+
+### Requirement: Bulk delete endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/delete` which permanently deletes all documents matching the selection filter. For each matched document, it SHALL delete embeddings from `chunks_vec`, delete the document row (cascading to chunks and document_tags), and delete any stored file from disk.
+
+Database deletions SHALL be performed within a single transaction. File deletions SHALL occur after the transaction commits and SHALL be best-effort (failures logged but not counted as document failures).
+
+#### Scenario: Bulk delete by tag
+
+- **WHEN** `POST /api/v1/bulk/delete` receives `{"tags": ["old", "draft"]}`
+- **THEN** all documents with both tags "old" and "draft" SHALL be deleted
+- **AND** their chunks, embeddings, tag associations, and stored files SHALL be removed
+
+#### Scenario: Bulk delete with no matches
+
+- **WHEN** `POST /api/v1/bulk/delete` receives a filter that matches 0 documents
+- **THEN** the response SHALL have `matched: 0`, `succeeded: 0`, `failed: 0`
+
+### Requirement: Bulk tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/tags` which adds and/or removes tags on all documents matching the selection filter. The request body SHALL include the selection filter plus:
+
+- `add` (list of str, optional) — tags to add
+- `remove` (list of str, optional) — tags to remove
+
+At least one of `add` or `remove` MUST be present. The endpoint SHALL return 400 if neither is provided.
+
+The endpoint SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Add and remove tags in one call
+
+- **WHEN** `POST /api/v1/bulk/tags` receives `{"tags": ["agent:mybot"], "add": ["reviewed"], "remove": ["pending"]}`
+- **THEN** all documents tagged "agent:mybot" SHALL have "reviewed" added and "pending" removed
+
+### Requirement: Bulk set-tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/set-tags` which replaces all tags on matched documents with a new set. The request body SHALL include the selection filter plus:
+
+- `new_tags` (list of str) — the replacement tag set
+
+The endpoint SHALL remove all existing tag associations from matched documents, then apply the new set. It SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Replace all tags
+
+- **WHEN** `POST /api/v1/bulk/set-tags` receives `{"doc_type": "note", "new_tags": ["clean", "final"]}`
+- **THEN** all notes SHALL have their existing tags removed and replaced with "clean" and "final"
+
+### Requirement: Jobs table extension
+
+The jobs table SHALL be extended with a `job_type` column (TEXT, default "ingest") to distinguish ingestion jobs from bulk operation audit entries. Valid values: "ingest", "bulk_delete", "bulk_tags", "bulk_set_tags".
+
+Existing jobs SHALL default to `job_type = "ingest"`. The existing jobs list endpoint and CLI `kb jobs` command SHALL continue to work unchanged.
+
+#### Scenario: Migration adds column
+
+- **GIVEN** an existing database without the `job_type` column
+- **WHEN** the engine starts
+- **THEN** the column SHALL be added with default value "ingest"
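
One way to make this migration idempotent in SQLite is to inspect the table before altering it. The sketch below is an assumption about the approach, not the engine's migration code; `ensure_job_type_column` is a hypothetical name and the `jobs` table here is a stand-in with only a few columns.

```python
import sqlite3


def ensure_job_type_column(conn: sqlite3.Connection) -> bool:
    """Add jobs.job_type if it is missing; return True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "job_type" in columns:
        return False
    conn.execute("ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'")
    conn.commit()
    return True


# Usage against a throwaway in-memory database with a stand-in jobs table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, filename TEXT, error TEXT)")
conn.execute("INSERT INTO jobs (filename) VALUES ('report.pdf')")
added = ensure_job_type_column(conn)
existing = conn.execute("SELECT job_type FROM jobs").fetchone()[0]
```

Rows that existed before the migration report the column default, so old ingestion jobs read back as `ingest` without a separate backfill.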
+
+### Requirement: Engine config for safety threshold
+
+The engine `Config` class SHALL read `KB_BULK_SAFETY_PERCENT` from the environment as an integer (default 70, range 0-100). This value SHALL be used as the default safety threshold for all bulk endpoints.
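
A sketch of the parsing this requirement implies, written as a standalone function rather than as part of the `Config` class. `read_bulk_safety_percent` is a hypothetical name, and raising `ValueError` for out-of-range values is an assumption; the spec gives the range but not the failure behaviour.

```python
import os


def read_bulk_safety_percent(env=os.environ) -> int:
    """Read KB_BULK_SAFETY_PERCENT as an integer in 0-100, defaulting to 70."""
    raw = env.get("KB_BULK_SAFETY_PERCENT", "").strip()
    if not raw:
        return 70
    value = int(raw)  # raises ValueError on non-integer input
    if not 0 <= value <= 100:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError(f"KB_BULK_SAFETY_PERCENT must be 0-100, got {value}")
    return value
```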
+
+### Requirement: MCP bulk delete tool
+
+The MCP server SHALL expose a `kb_bulk_delete` tool with parameters: `document_ids` (optional list of int), `tags` (optional list of str), `doc_type` (optional str), `from_id` (optional int), `to_id` (optional int), `force` (optional bool).
+
+The tool SHALL call `POST /api/v1/bulk/delete` on the engine via the engine client and return the JSON response.
+
+The tool description SHALL clearly state that `tags` is a selection filter (which documents to delete), not tags to delete.
+
+#### Scenario: MCP bulk delete by tag
+
+- **WHEN** `kb_bulk_delete(tags=["old"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/delete` with `{"tags": ["old"]}`
+- **AND** the tool SHALL return the engine's JSON response
+
+### Requirement: MCP bulk tags tool
+
+The MCP server SHALL expose a `kb_bulk_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `add` (optional list of str), `remove` (optional list of str), and `force` (optional bool).
+
+The tool description SHALL clearly distinguish `tags` (selection filter) from `add`/`remove` (tag changes to apply).
+
+#### Scenario: MCP bulk tag update
+
+- **WHEN** `kb_bulk_tags(tags=["agent:mybot"], add=["reviewed"], remove=["draft"])` is called
+- **THEN** the engine client SHALL send the appropriate `POST /api/v1/bulk/tags` request
+
+### Requirement: MCP bulk set-tags tool
+
+The MCP server SHALL expose a `kb_bulk_set_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `new_tags` (list of str) and `force` (optional bool).
+
+#### Scenario: MCP bulk set tags
+
+- **WHEN** `kb_bulk_set_tags(doc_type="note", new_tags=["clean"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/set-tags` with `{"doc_type": "note", "new_tags": ["clean"]}`
+
+### Requirement: MCP engine client bulk methods
+
+The MCP engine client (`mcp/engine.py`) SHALL provide three new methods:
+
+- `bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?)` → dict
+- `bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?)` → dict
+- `bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags, force?)` → dict
+
+Each SHALL send a POST request to the corresponding `/api/v1/bulk/*` endpoint with the parameters as a JSON body. Each SHALL raise on non-2xx status codes, consistent with existing methods.
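
The `kb_bulk_delete` scenario shows only the supplied field on the wire (`{"tags": ["old"]}`), which suggests that unset parameters are left out of the JSON body. A minimal sketch of that step, with the HTTP call itself omitted; `build_bulk_body` is a hypothetical helper name.

```python
def build_bulk_body(**params) -> dict:
    """Keep only the parameters the caller actually supplied (not None)."""
    return {key: value for key, value in params.items() if value is not None}
```

An explicit `force=False` is kept in the body under this sketch, since only `None` is treated as unset.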
+
+### Requirement: CLI bulk-remove command
+
+The CLI SHALL expose a `kb bulk-remove` command with flags: `--tags` (comma-separated), `--type`, `--ids` (comma-separated), `--from-id`, `--to-id`, `--force`/`-f`, `--yes`/`-y`.
+
+Without `--yes`, the CLI SHALL first display the match count and ask for interactive confirmation before proceeding.
+
+The command SHALL call `POST /api/v1/bulk/delete` with the constructed filter.
+
+#### Scenario: CLI bulk remove with confirmation
+
+- **WHEN** `kb bulk-remove --tags "draft,old" --type note` is run without `--yes`
+- **THEN** the CLI SHALL display "This will delete N documents matching: tags=[draft,old] type=note" and prompt "Proceed? [y/N]"
+
+#### Scenario: CLI bulk remove with --yes
+
+- **WHEN** `kb bulk-remove --tags "draft" --yes` is run
+- **THEN** the CLI SHALL proceed without prompting
+
+### Requirement: CLI bulk-tag command
+
+The CLI SHALL expose a `kb bulk-tag` command with the same filter flags as `bulk-remove`, plus `--add` and `--remove` (comma-separated tag lists).
+
+The command SHALL call `POST /api/v1/bulk/tags` with the constructed filter and tag changes.
+
+### Requirement: CLI bulk-set-tags command
+
+The CLI SHALL expose a `kb bulk-set-tags` command with the filter flags, plus `--set` (comma-separated list of replacement tags).
+
+The command SHALL call `POST /api/v1/bulk/set-tags` with the constructed filter and `new_tags`.
@@ -0,0 +1,50 @@
+### Requirement: DEVELOPER.md exists at repo root
+The repository SHALL have a `DEVELOPER.md` file at the project root containing all developer-facing documentation.
+
+#### Scenario: File exists
+- **WHEN** a developer navigates to the repository root
+- **THEN** a `DEVELOPER.md` file SHALL be present
+
+### Requirement: DEVELOPER.md contains build-from-source instructions
+DEVELOPER.md SHALL contain instructions for building both the engine and client from source.
+
+#### Scenario: Engine build from source
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include instructions for starting the engine from source using compose files (NVIDIA and CPU)
+
+#### Scenario: Client build from source
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include instructions for building the client binary from source using `make build` and `make all`
+
+### Requirement: DEVELOPER.md contains release process
+DEVELOPER.md SHALL document the release process for both client and engine, including release scripts, version bumping, and Docker image tagging.
+
+#### Scenario: Client release documentation
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include `release-client.sh` usage with flag options (--gitea, --github, --minor, --no-increment, --dry-run)
+
+#### Scenario: Engine release documentation
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include `release-engine.sh` usage with flag options and Docker image tag conventions
+
+#### Scenario: Version checking
+- **WHEN** a developer reads DEVELOPER.md
+- **THEN** it SHALL include how to check client and engine versions
+
+### Requirement: README.md excludes developer-only content
+README.md SHALL NOT contain build-from-source instructions, release processes, or developer-only notes.
+
+#### Scenario: No from-source build steps in README
+- **WHEN** a user reads README.md
+- **THEN** there SHALL be no "From source" subsections under engine or client installation
+
+#### Scenario: No release section in README
+- **WHEN** a user reads README.md
+- **THEN** there SHALL be no "Building and releasing" section
+
+### Requirement: README.md cross-references DEVELOPER.md
+README.md SHALL include a link to DEVELOPER.md for users who want to build from source or contribute.
+
+#### Scenario: Developer link in quick start
+- **WHEN** a user reads the Quick Start section of README.md
+- **THEN** there SHALL be a note pointing to DEVELOPER.md for building from source
@@ -2,7 +2,7 @@

 ## Purpose

-Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.
+Docker deployment provides containerized packaging of the knowledge base engine with NVIDIA GPU support, along with Compose files for single-command deployment.

 ## Requirements

@@ -20,26 +20,12 @@ The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVI

 ---

-### Requirement: AMD ROCm Docker image
-
-The project SHALL provide a `Dockerfile.rocm` that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
-
-#### Scenario: Build ROCm image
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml build`
-- **THEN** the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
-
-#### Scenario: GPU access in ROCm container
-- **WHEN** the ROCm container starts with `--device=/dev/kfd --device=/dev/dri`
-- **THEN** `torch.cuda.is_available()` SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU
-
----
-
 ### Requirement: Application code is GPU-vendor-agnostic

-The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
+The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.

 #### Scenario: Same engine code on both platforms
-- **WHEN** the engine starts on an NVIDIA image and an AMD image with identical configuration
+- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
 - **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data

 ---
@@ -59,10 +45,6 @@ The engine SHALL store all persistent state (SQLite database, HF model cache, st
 - **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
 - **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing

-#### Scenario: Portable data across GPU vendors
-- **WHEN** an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
-- **THEN** the engine SHALL start successfully. Embeddings in the database remain valid (they are model-specific, not GPU-vendor-specific)
-
 ---

 ### Requirement: Compose files for deployment
@@ -73,22 +55,52 @@ The project SHALL provide Docker Compose files for single-command deployment. Co
 - **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
 - **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port

-#### Scenario: Start ROCm deployment
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
-- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
-
 #### Scenario: Automatic restart
 - **WHEN** the engine process crashes or the host reboots
 - **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)

 #### Scenario: Configure via environment
-- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, etc.)
-- **THEN** the engine SHALL use those values
+- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
+- **THEN** the engine and MCP server SHALL use those values

 #### Scenario: Pre-built image deployment
 - **WHEN** an admin wants to use a pre-built engine image without building from source
 - **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)

+#### Scenario: MCP allowed hosts in Compose
+- **WHEN** the kb-mcp service is defined in a Compose file
+- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose
+
+---
+
+### Requirement: Configurable MCP allowed hosts
+
+The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable containing a comma-separated list of additional hosts (IP addresses or FQDNs) that are permitted to connect. The server SHALL always allow `127.0.0.1`, `localhost`, and `[::1]` regardless of this setting. DNS rebinding protection SHALL always be enabled.
+
+#### Scenario: Remote client connects with allowed host
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 192.168.1.50:3000`
+- **THEN** the server SHALL accept the request and process it normally
+
+#### Scenario: Remote client connects with disallowed host
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 10.0.0.99:3000`
+- **THEN** the server SHALL return HTTP 421 "Invalid Host header"
+
+#### Scenario: Multiple allowed hosts
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50,kb.example.com`
+- **THEN** the server SHALL accept requests with `Host` matching either `192.168.1.50` or `kb.example.com` on any port
+
+#### Scenario: Variable unset or empty
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is unset or empty
+- **THEN** the server SHALL allow only localhost addresses (`127.0.0.1`, `localhost`, `[::1]`) with any port
+
+#### Scenario: Localhost always allowed
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50`
+- **THEN** the server SHALL still accept requests with `Host: localhost:3000` or `Host: 127.0.0.1:3000`
+
+#### Scenario: Allowed origins derived from allowed hosts
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` includes `192.168.1.50`
+- **THEN** the server SHALL accept `Origin: http://192.168.1.50:3000` (and any port) in addition to localhost origins
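
The host rules above can be sketched as a parse step plus a port-insensitive match. This is an illustration under assumptions, not the MCP server's implementation: all function names are hypothetical, and case-insensitive matching is an assumption.

```python
# Hosts the spec says are always allowed, regardless of configuration.
ALWAYS_ALLOWED = ("127.0.0.1", "localhost", "[::1]")


def parse_allowed_hosts(raw) -> set:
    """Turn the comma-separated KB_MCP_ALLOWED_HOSTS value into a host set."""
    extra = [item.strip().lower() for item in (raw or "").split(",") if item.strip()]
    return set(ALWAYS_ALLOWED) | set(extra)


def host_without_port(host_header: str) -> str:
    """Strip an optional :port, keeping bracketed IPv6 literals intact."""
    if host_header.startswith("["):
        end = host_header.find("]")
        return host_header[: end + 1] if end != -1 else host_header
    return host_header.rsplit(":", 1)[0] if ":" in host_header else host_header


def is_host_allowed(host_header: str, allowed: set) -> bool:
    """Match the Host header against the allowed set on any port."""
    return host_without_port(host_header).lower() in allowed
```

A request that fails this check would receive the HTTP 421 response described in the disallowed-host scenario.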
+
 ---

 ### Requirement: CPU-only fallback
@@ -96,7 +108,7 @@ The project SHALL provide Docker Compose files for single-command deployment. Co
 The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.

 #### Scenario: No GPU available
-- **WHEN** the container starts without GPU passthrough (no `--gpus`, no `/dev/kfd`)
+- **WHEN** the container starts without GPU passthrough (no `--gpus`)
 - **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable

 #### Scenario: Explicit CPU mode
@@ -150,15 +150,19 @@ The engine SHALL provide endpoints to list, inspect, remove, and download origin

 #### Scenario: List documents
 - **WHEN** a client sends `GET /api/v1/documents`
-- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, and created_at
+- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, created_at, and updated_at

 #### Scenario: List documents with filters
 - **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
 - **THEN** the engine SHALL return only documents matching all specified filters

+#### Scenario: List documents sorted by most recent
+- **WHEN** a client requests documents sorted by date
+- **THEN** the engine SHALL use `COALESCE(updated_at, created_at)` for ordering, so un-mutated documents sort by creation time and mutated documents sort by their last update
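
The ordering rule can be checked against a throwaway SQLite database. The sketch assumes ISO-8601 text timestamps (which sort lexicographically) and descending order for "most recent"; the table is a stand-in with only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, title TEXT, created_at TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO documents (id, title, created_at, updated_at) VALUES (?, ?, ?, ?)",
    [
        (1, "old note, edited recently", "2026-01-01T00:00:00", "2026-04-01T00:00:00"),
        (2, "newer file, never edited", "2026-03-01T00:00:00", None),
        (3, "oldest file, never edited", "2025-12-01T00:00:00", None),
    ],
)
# Un-mutated rows fall back to created_at; mutated rows use updated_at.
rows = conn.execute(
    "SELECT id FROM documents ORDER BY COALESCE(updated_at, created_at) DESC"
).fetchall()
order = [row[0] for row in rows]
```

Document 1 was created first but edited last, so it sorts ahead of the two documents that were never updated.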
+
 #### Scenario: Get document details
 - **WHEN** a client sends `GET /api/v1/documents/{id}`
-- **THEN** the engine SHALL return the full document record including all chunks, their text content, and whether the original file is available (`has_file: true/false`)
+- **THEN** the engine SHALL return the full document record including all chunks, their text content, `updated_at`, and whether the original file is available (`has_file: true/false`)

 #### Scenario: Download original file
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file`
@@ -174,6 +178,38 @@ The engine SHALL provide endpoints to list, inspect, remove, and download origin

 ---

+### Requirement: Note mutation endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint for updating existing notes in place. See the `note-mutation` spec for full details.
+
+#### Scenario: Note update endpoint exists
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "new content"}`
+- **THEN** the engine SHALL process the update synchronously and return the updated document
+
+---
+
+### Requirement: Document updated_at tracking
+
+The engine SHALL track when documents are modified via an `updated_at` column. This column SHALL be NULL for documents that have never been updated.
+
+#### Scenario: New document has no updated_at
+- **WHEN** a document is first ingested
+- **THEN** `updated_at` SHALL be NULL and `created_at` SHALL be set to the ingestion timestamp
+
+#### Scenario: Note update sets updated_at
+- **WHEN** a note is updated via `PATCH /api/v1/notes/{id}`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Tag change sets updated_at
+- **WHEN** tags are modified via `PUT /api/v1/documents/{id}/tags`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Schema migration for updated_at
+- **WHEN** the engine starts against a v2 database without an `updated_at` column
+- **THEN** the engine SHALL automatically run `ALTER TABLE documents ADD COLUMN updated_at TEXT`, and all existing documents SHALL have `updated_at = NULL`
+
+---
+
 ### Requirement: Tag management

 The engine SHALL provide endpoints to list all tags and manage tags on documents.
@@ -53,6 +53,9 @@ The client SHALL provide a `kb search <query>` command that sends the query to t
 #### Scenario: Human-readable search output
 - **WHEN** the user runs `kb search "how to change oil"`
 - **THEN** the client SHALL POST to `/api/v1/search`, and display results in a human-readable format showing rank, score, document title, page/section, doc type, tags, and a text snippet
+- **AND** the client SHALL parse search results as flat objects with top-level `title`, `doc_type`, `tags`, `score`, `text`, `chunk_index` fields
+- **AND** the client SHALL extract `page` from `chunk_metadata` when present (PDF documents)
+- **AND** the client SHALL extract `section_header` from `chunk_metadata` when present (markdown documents)

 #### Scenario: JSON search output
 - **WHEN** the user runs `kb search "query" --format json`
@@ -66,49 +69,57 @@ The client SHALL provide a `kb search <query>` command that sends the query to t
 - **WHEN** the user runs `kb search "error" --fts-only`
 - **THEN** the client SHALL set `fts_only: true` in the request body

+#### Scenario: PDF result with page number
+- **WHEN** a search result has `chunk_metadata` containing `{"page": 12}`
+- **THEN** the human output SHALL display "Page 12" in the location line
+
+#### Scenario: Markdown result with section header
+- **WHEN** a search result has `chunk_metadata` containing `{"section_header": "Installation > Prerequisites"}`
+- **THEN** the human output SHALL display "Installation > Prerequisites" in the location line
+
+#### Scenario: Result with both page and section
+- **WHEN** a search result has `chunk_metadata` containing both `page` and `section_header`
+- **THEN** the human output SHALL display both separated by " / "
+
+#### Scenario: Result with no location metadata
+- **WHEN** a search result has empty `chunk_metadata` or no page/section keys
+- **THEN** the human output SHALL omit the location line entirely
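
The four location scenarios reduce to one small formatting rule. The client itself is written in Go; the Python below only sketches the logic. `location_line` is a hypothetical name, and putting the page before the section header is an assumption, since the spec fixes the separator but not the order.

```python
def location_line(chunk_metadata):
    """Return the location text for a result, or None to omit the line."""
    meta = chunk_metadata or {}
    parts = []
    if meta.get("page") is not None:
        parts.append(f"Page {meta['page']}")
    if meta.get("section_header"):
        parts.append(meta["section_header"])
    # Assumption: page first, then section header.
    return " / ".join(parts) if parts else None
```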
+
 ---

-### Requirement: Implicit note shorthand
+### Requirement: Add note command

-The client SHALL treat bare string arguments (with no subcommand) as an implicit note only when **more than one argument** is provided. `kb "my note"` SHALL behave identically to submitting a note via `POST /api/v1/jobs`. All persistent flags (`--format`, `--engine`, `--api-key`) and the root `--tags` flag SHALL work with the shorthand form. A single bare word SHALL be rejected with an error message.
+The client SHALL provide a `kb addnote <text>` command that submits a text note to the engine for ingestion. The command SHALL take exactly one positional argument (the note text) and support a `--tags` flag for comma-separated tags. The note SHALL be submitted via `POST /api/v1/jobs` with the `note` field in a multipart request.

-#### Scenario: Quick note via bare argument
-- **WHEN** the user runs `kb "remember to update DNS"`
+#### Scenario: Add a note
+- **WHEN** the user runs `kb addnote "remember to update DNS records"`
 - **THEN** the client SHALL submit the text as a note via `POST /api/v1/jobs` and print `Queued: note`

-#### Scenario: Bare argument with tags
-- **WHEN** the user runs `kb "server room is building 3" --tags ops`
+#### Scenario: Add a note with tags
+- **WHEN** the user runs `kb addnote "server room is building 3" --tags ops`
 - **THEN** the client SHALL submit the note with the specified tags

-#### Scenario: Bare argument with JSON output
-- **WHEN** the user runs `kb "my note" --format json`
+#### Scenario: Add a note with JSON output
+- **WHEN** the user runs `kb addnote "my note" --format json`
 - **THEN** the client SHALL output the raw JSON response from the engine

-#### Scenario: Bare argument duplicate detection
-- **WHEN** the user runs `kb "my note"` and the engine returns HTTP 409
-- **THEN** the client SHALL handle the duplicate response identically to the previous `kb add --note` behaviour
+#### Scenario: Duplicate note detection
+- **WHEN** the user runs `kb addnote "my note"` and the engine returns HTTP 409
+- **THEN** the client SHALL display the duplicate information (document ID or job ID) and exit with code 0

-#### Scenario: Multiple unquoted words
-- **WHEN** the user runs `kb remember to update dns` (without quotes)
-- **THEN** the client SHALL join all arguments into a single note string and submit it
+#### Scenario: Missing argument
+- **WHEN** the user runs `kb addnote` with no arguments
+- **THEN** the client SHALL display an error indicating that the note text argument is required

-#### Scenario: Single bare word rejected
-- **WHEN** the user runs `kb infow` (a single unrecognized word)
-- **THEN** the client SHALL print to stderr: `Unknown command "infow". Run 'kb --help' for available commands.` followed by a hint about note usage, and exit with a non-zero code
-
-#### Scenario: No interference with subcommands
-- **WHEN** the user runs `kb search "query"` or any other existing subcommand
-- **THEN** the client SHALL route to the subcommand as before — the implicit note shorthand SHALL NOT interfere
-
-#### Scenario: No arguments
-- **WHEN** the user runs `kb` with no arguments
-- **THEN** the client SHALL display the help text
+#### Scenario: Too many arguments
+- **WHEN** the user runs `kb addnote remember to update dns` (unquoted, multiple args)
+- **THEN** the client SHALL display an error indicating that exactly one argument is required, with a hint to quote the text

 ---

 ### Requirement: Add command (file and note ingestion)

-The client SHALL provide a `kb addfile` command that uploads files to the engine for async ingestion. The command SHALL validate file extensions before uploading and reject unsupported types. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document information. The command SHALL NOT handle notes — notes are submitted via the implicit note shorthand (`kb "text"`).
+The client SHALL provide a `kb addfile` command that uploads files to the engine for async ingestion. The command SHALL validate file extensions before uploading and reject unsupported types. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document information. Notes are handled by the separate `addnote` command — `addfile` is exclusively for file uploads.

 #### Scenario: Add a single file
 - **WHEN** the user runs `kb addfile report.pdf`
@@ -254,17 +265,43 @@ The client SHALL provide a `kb reindex` command that triggers re-embedding of al

 ---

+### Requirement: Update note command
+
+The client SHALL provide a `kb updatenote <id> <text>` command that updates an existing note's content via the engine's `PATCH /api/v1/notes/{id}` endpoint.
+
+#### Scenario: Update a note
+- **WHEN** the user runs `kb updatenote 42 "Updated note content"`
+- **THEN** the client SHALL send `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}` and display the result
+
+#### Scenario: Update a note with JSON output
+- **WHEN** the user runs `kb updatenote 42 "new content" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Update a non-existent document
+- **WHEN** the user runs `kb updatenote 999 "text"` and the engine returns HTTP 404
+- **THEN** the client SHALL display an error indicating the document was not found and exit with a non-zero code
+
+#### Scenario: Update a non-note document
+- **WHEN** the user runs `kb updatenote 42 "text"` and the engine returns HTTP 422
+- **THEN** the client SHALL display an error indicating that only notes can be updated and exit with a non-zero code
+
+#### Scenario: Missing arguments
+- **WHEN** the user runs `kb updatenote` or `kb updatenote 42` with insufficient arguments
+- **THEN** the client SHALL display usage help indicating that both document ID and text are required
+
+---
+
 ### Requirement: Engine version compatibility check

 The client SHALL verify that the connected engine meets a minimum version requirement before executing any API command. The minimum required engine version SHALL be embedded in the client binary at build time. If the engine version is below the minimum, the client SHALL print an error message and exit with a non-zero code. There SHALL be no flag to skip or suppress this check.

 #### Scenario: Compatible engine version
-- **WHEN** the client connects to an engine reporting version `2.1.5` and `MinEngineVersion` is `2.1.0`
+- **WHEN** the client connects to an engine reporting version `3.0.0` and `MinEngineVersion` is `3.0.0`
 - **THEN** the client SHALL proceed with the command normally

 #### Scenario: Incompatible engine version
-- **WHEN** the client connects to an engine reporting version `2.0.3` and `MinEngineVersion` is `2.1.0`
-- **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v2.1.0+ (connected engine is v2.0.3)` followed by an upgrade hint, and exit with code 1
+- **WHEN** the client connects to an engine reporting version `2.1.0` and `MinEngineVersion` is `3.0.0`
+- **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v3.0.0+ (connected engine is v2.1.0)` followed by an upgrade hint, and exit with code 1
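
The comparison behind these scenarios has to be numeric per component, not a string comparison. The client is written in Go; the Python below only sketches the rule, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix. Both function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Parse 'v3.0.0' or '3.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def engine_is_compatible(engine_version: str, min_engine_version: str) -> bool:
    """True when the engine version is at or above the embedded minimum."""
    return parse_version(engine_version) >= parse_version(min_engine_version)
```

Comparing tuples of integers keeps `3.10.0` above `3.9.0`, which a plain string comparison would get wrong.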

 #### Scenario: Engine unreachable during version check
 - **WHEN** the client cannot reach the engine's `/api/v1/status` endpoint
@@ -0,0 +1,198 @@
+# MCP Server
+
+## Purpose
+
+The MCP server provides a Model Context Protocol interface to the kb engine, exposing knowledge base operations as native MCP tools over Streamable HTTP transport. It runs as a separate Docker container alongside the engine, translating MCP tool calls into engine HTTP API calls.
+
+## Requirements
+
+### Requirement: MCP server transport and deployment
+
+The MCP server SHALL expose tools via Streamable HTTP transport. It SHALL run as a Docker container, configured to connect to the kb engine's HTTP API. It SHALL read `KB_ENGINE_URL` and `KB_API_KEY` from environment variables to connect to the engine.
+
+#### Scenario: MCP server starts and connects to engine
+- **WHEN** the MCP server container starts with `KB_ENGINE_URL=http://engine:8000` and `KB_API_KEY=secret`
+- **THEN** it SHALL begin accepting MCP connections over Streamable HTTP and use the configured URL and API key for all engine API calls
+
+#### Scenario: Engine unreachable at startup
+- **WHEN** the MCP server starts but cannot reach the engine at `KB_ENGINE_URL`
+- **THEN** it SHALL start and accept connections, but tool calls SHALL return errors indicating the engine is unreachable
+
+#### Scenario: Docker Compose deployment
+- **WHEN** the MCP server is deployed via Docker Compose alongside the engine
+- **THEN** it SHALL connect to the engine via the Docker network using the service name (e.g. `http://engine:8000`)
+
+---
+
+### Requirement: MCP server authentication
+
+When the `KB_MCP_API_KEY` environment variable is set, the MCP server SHALL require calling agents to present a matching Bearer token. This key is independent of the engine's `KB_API_KEY`.
+
+#### Scenario: Valid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a matching Bearer token
+- **THEN** the MCP server SHALL process the request normally
+
+#### Scenario: Missing Bearer token when authentication is enabled
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent connects without a Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: Invalid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a non-matching Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: MCP auth disabled
+- **WHEN** `KB_MCP_API_KEY` is not set
+- **THEN** the MCP server SHALL accept all connections without authentication
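
The four authentication scenarios above reduce to one decision function. The sketch below is an assumption about how such a check could look, not the MCP server's actual code: `is_authorized` is a hypothetical name, the header is assumed to arrive as a standard `Authorization: Bearer <token>` value, and an empty `KB_MCP_API_KEY` is treated the same as an unset one.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], configured_key: Optional[str]) -> bool:
    """Decide whether a request may proceed.

    configured_key is the value of KB_MCP_API_KEY (None or '' when unset).
    """
    if not configured_key:
        return True  # auth disabled: accept all connections
    if not authorization_header:
        return False  # key configured but no credentials supplied
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.strip().encode(), configured_key.encode())
```

Using `hmac.compare_digest` rather than `==` is a design choice for this sketch; the spec itself only requires that non-matching tokens are rejected.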
+
+---
+
+### Requirement: Search tool
+
+The MCP server SHALL expose a `kb_search` tool that queries the knowledge base via the engine's search API.
+
+#### Scenario: Basic search
+- **WHEN** an agent calls `kb_search` with `{"query": "pension revaluation", "top": 5}`
+- **THEN** the MCP server SHALL POST to the engine's `/api/v1/search` endpoint and return the results with chunk text, scores, document metadata, and tags
+
+#### Scenario: Search with tag filter
+- **WHEN** an agent calls `kb_search` with `{"query": "email preferences", "tags": ["agent:mybot"]}`
+- **THEN** the MCP server SHALL include the tags in the filter and POST to the engine's search endpoint
+
+#### Scenario: Search with mode override
+- **WHEN** an agent calls `kb_search` with `{"query": "error log", "fts_only": true}`
+- **THEN** the MCP server SHALL pass `fts_only: true` to the engine search endpoint
+
+---
+
+### Requirement: Add note tool
+
+The MCP server SHALL expose a `kb_addnote` tool that submits a text note to the engine for ingestion.
+
+#### Scenario: Add a note
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses"}`
+- **THEN** the MCP server SHALL submit the note to the engine's `POST /api/v1/jobs` endpoint and return the job ID
+
+#### Scenario: Add a note with tags
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses", "tags": ["agent:mybot", "feedback"]}`
+- **THEN** the MCP server SHALL submit the note with exactly those tags to the engine
+
+---
+
+### Requirement: Chunked file upload tools
+
+The MCP server SHALL expose a three-step chunked file upload pattern for transferring files from remote agents to the engine.
+
+#### Scenario: Start an upload
+- **WHEN** an agent calls `kb_upload_start` with `{"filename": "report.pdf", "total_size": 5242880, "tags": ["insurance"]}`
+- **THEN** the MCP server SHALL create a staging entry, generate a UUID `upload_id`, and return `{"upload_id": "<uuid>"}`
+
+#### Scenario: Upload a chunk
+- **WHEN** an agent calls `kb_upload_chunk` with `{"upload_id": "<uuid>", "data": "<base64-encoded-data>", "chunk_index": 0}`
+- **THEN** the MCP server SHALL decode the base64 data and write it to the staging area for the given upload
+
+#### Scenario: Upload multiple chunks in sequence
+- **WHEN** an agent calls `kb_upload_chunk` multiple times with sequential `chunk_index` values for the same `upload_id`
+- **THEN** the MCP server SHALL store each chunk under its `chunk_index` so that the file can be reassembled in order
+
+#### Scenario: Finish an upload
+- **WHEN** an agent calls `kb_upload_finish` with `{"upload_id": "<uuid>"}`
+- **THEN** the MCP server SHALL reassemble the chunks in order, forward the complete file as a multipart upload to the engine's `POST /api/v1/jobs` endpoint with the tags from `kb_upload_start`, and return the job ID
+
+#### Scenario: Upload with invalid upload_id
+- **WHEN** an agent calls `kb_upload_chunk` or `kb_upload_finish` with an `upload_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the upload ID is not found
+
+#### Scenario: Abandoned upload cleanup
+- **WHEN** an agent starts an upload but does not call `kb_upload_finish` within 10 minutes
+- **THEN** the MCP server SHALL clean up the staged chunks and remove the upload tracking entry
+
+#### Scenario: MCP server restart during upload
+- **WHEN** the MCP server container restarts while an upload is in progress
+- **THEN** the in-progress upload SHALL be lost and the agent SHALL need to restart from `kb_upload_start`
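
The start/chunk/finish flow and the 10-minute cleanup can be sketched with an in-memory staging table. Everything here is illustrative: the class `UploadStaging` and its method names are hypothetical, staging in memory is an assumption (the spec only says "staging area"), and `finish` returns the reassembled bytes instead of forwarding them to the engine's `POST /api/v1/jobs` endpoint.

```python
import base64
import time
import uuid

UPLOAD_TTL_SECONDS = 10 * 60  # abandoned uploads expire after 10 minutes


class UploadStaging:
    """In-memory staging; all state is lost when the process restarts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._uploads = {}

    def start(self, filename, total_size, tags=None):
        upload_id = str(uuid.uuid4())
        self._uploads[upload_id] = {
            "filename": filename,
            "total_size": total_size,
            "tags": list(tags or []),
            "chunks": {},
            "started": self._clock(),
        }
        return {"upload_id": upload_id}

    def chunk(self, upload_id, data, chunk_index):
        # Decode the base64 payload and store it under its index.
        self._get(upload_id)["chunks"][chunk_index] = base64.b64decode(data)

    def finish(self, upload_id):
        entry = self._get(upload_id)
        del self._uploads[upload_id]
        chunks = entry["chunks"]
        # Reassemble by chunk_index, regardless of arrival order.
        payload = b"".join(chunks[i] for i in sorted(chunks))
        return entry["filename"], payload, entry["tags"]

    def cleanup(self):
        """Drop uploads started at least UPLOAD_TTL_SECONDS ago; return the count."""
        now = self._clock()
        expired = [
            uid for uid, entry in self._uploads.items()
            if now - entry["started"] >= UPLOAD_TTL_SECONDS
        ]
        for uid in expired:
            del self._uploads[uid]
        return len(expired)

    def _get(self, upload_id):
        try:
            return self._uploads[upload_id]
        except KeyError:
            raise KeyError(f"upload_id not found: {upload_id}") from None
```

Keeping chunks keyed by index means out-of-order arrival still reassembles correctly, and the lack of persistence matches the restart scenario, where the agent must begin again from `kb_upload_start`.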
+
+---
+
+### Requirement: Update note tool
+
+The MCP server SHALL expose a `kb_update_note` tool that updates an existing note in place via the engine's note mutation endpoint.
+
+#### Scenario: Update an existing note
+- **WHEN** an agent calls `kb_update_note` with `{"document_id": 42, "text": "Updated preference: user prefers bullet points"}`
+- **THEN** the MCP server SHALL send `PATCH /api/v1/notes/42` to the engine and return the updated document
+
+#### Scenario: Update a non-existent document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the document was not found
+
+#### Scenario: Update a non-note document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that refers to a PDF
+- **THEN** the MCP server SHALL return an error indicating that only notes can be updated
+
+---
+
+### Requirement: Get document tool
+
+The MCP server SHALL expose a `kb_get` tool that retrieves document details from the engine.
+
+#### Scenario: Get by document ID
+- **WHEN** an agent calls `kb_get` with `{"document_id": 42}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/documents/42` and return the document details with chunks
+
+#### Scenario: Get by source path
+- **WHEN** an agent calls `kb_get` with `{"source_path": "memory/feedback_testing.md"}`
+- **THEN** the MCP server SHALL query the engine's documents endpoint filtered by source path and return matching documents
+
+---
+
+### Requirement: Status tool
+
+The MCP server SHALL expose a `kb_status` tool that returns engine health and statistics.
+
+#### Scenario: Get engine status
+- **WHEN** an agent calls `kb_status` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/status` and return engine version, model info, device info, document counts, and queue state
+
+---
+
+### Requirement: Jobs tool
+
+The MCP server SHALL expose a `kb_jobs` tool that returns ingestion job status.
+
+#### Scenario: List recent jobs
+- **WHEN** an agent calls `kb_jobs` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs` and return the list of recent jobs
+
+#### Scenario: Filter jobs by status
+- **WHEN** an agent calls `kb_jobs` with `{"status": "failed"}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs?status=failed` and return matching jobs
+
+---
+
+### Requirement: Delete document tool
+
+The MCP server SHALL expose a `kb_delete` tool that permanently deletes a document from the knowledge base. The tool SHALL accept a `document_id` (required integer). Deletion SHALL remove the document, its chunks, embeddings, tags, and any stored file on disk.
+
+The tool SHALL return a confirmation response including the deleted document's ID and title.
+
+#### Scenario: Successful deletion
+- **WHEN** `kb_delete` is called with `document_id=42`
+- **THEN** the document, its chunks, embeddings, tag associations, and stored file SHALL be deleted
+- **AND** the response SHALL include `"status": "deleted"`, the `document_id`, and the document `title`
+
+#### Scenario: Document not found
+- **WHEN** `kb_delete` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
+
+---
+
+### Requirement: Tags-only document organisation
+
+The MCP server SHALL NOT maintain any collection abstraction. Documents SHALL be returned as-is from the engine with all tags visible. No tag stripping or collection field injection SHALL occur. Namespace isolation (e.g. separating agent memory from user documents) is achieved via tag conventions communicated through system prompts or tool descriptions.
+
+#### Scenario: Search results show all tags
+- **WHEN** `kb_search` is called and a result has tags `["agent:mybot", "collection:documents", "draft"]`
+- **THEN** all three tags SHALL be returned as-is — no stripping of `collection:*` tags
+
+#### Scenario: Add note with explicit tags only
+- **WHEN** `kb_addnote(text="hello", tags=["agent:mybot", "memory"])` is called
+- **THEN** the note SHALL be created with exactly those two tags — no default tags added
@@ -0,0 +1,43 @@
+# Note Mutation
+
+## Purpose
+
+Note mutation allows existing notes to be updated in place without requiring delete and re-add, preserving document identity (ID, creation timestamp) while updating content, embeddings, and the full-text index.
+
+## Requirements
+
+### Requirement: Note update endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint that accepts new text for an existing note, re-chunks and re-embeds it, and returns the updated document.
+
+#### Scenario: Update an existing note
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}`
+- **THEN** the engine SHALL delete existing chunks and embeddings for document 42, run the new text through the note chunking pipeline, generate embeddings for each chunk, insert new chunks and embeddings, update the document's `content_hash` and `updated_at`, and return the updated document with HTTP 200
+
+#### Scenario: Update preserves document identity
+- **WHEN** a note is updated via PATCH
+- **THEN** the document SHALL retain its original `id` and `created_at` values, and `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Update with long text that produces multiple chunks
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with text longer than the embedding model's token window
+- **THEN** the engine SHALL chunk the text using the same note chunking pipeline as ingestion, producing multiple chunks, and embed each chunk separately
+
+#### Scenario: Update a non-existent document
+- **WHEN** a client sends `PATCH /api/v1/notes/999` and document 999 does not exist
+- **THEN** the engine SHALL return HTTP 404
+
+#### Scenario: Update a non-note document
+- **WHEN** a client sends `PATCH /api/v1/notes/42` and document 42 has `doc_type = 'pdf'`
+- **THEN** the engine SHALL return HTTP 422 with an error indicating that only notes can be updated via this endpoint
+
+#### Scenario: Embedding failure during update
+- **WHEN** a client sends `PATCH /api/v1/notes/42` but the embedding step fails
+- **THEN** the engine SHALL roll back the entire transaction, preserving the original note content, chunks, and embeddings, and return HTTP 500
+
+#### Scenario: FTS5 index updated on note mutation
+- **WHEN** a note is updated via PATCH
+- **THEN** the FTS5 virtual table SHALL be updated via the existing chunk triggers (`chunks_ad` for deletes, `chunks_ai` for inserts), keeping the full-text index consistent with the new content
+
+#### Scenario: Tags preserved on update
+- **WHEN** a note with tags `["feedback", "collection:memory"]` is updated via PATCH
+- **THEN** the document's tags SHALL be unchanged — only the text content, chunks, and embeddings are replaced
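
The transactional behaviour described in the note mutation scenarios (replace chunks, roll back on embedding failure, leave tags alone) can be sketched with Python's `sqlite3` module. The schema, the function names `create_schema` and `update_note`, the SHA-256 content hash, and the injected `chunk` and `embed` callables are all assumptions made for illustration; the engine's real tables, FTS5 triggers, and pipeline are not shown in this diff.

```python
import hashlib
import sqlite3


def create_schema(conn):
    """Hypothetical minimal schema, just enough for the sketch."""
    conn.executescript(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY, doc_type TEXT, content_hash TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP, updated_at TEXT
        );
        CREATE TABLE chunks (
            document_id INTEGER, chunk_index INTEGER, text TEXT, embedding BLOB
        );
        """
    )


def update_note(conn, doc_id, text, chunk, embed):
    """Return the HTTP status the endpoint would answer with."""
    row = conn.execute(
        "SELECT doc_type FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return 404
    if row[0] != "note":
        return 422
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
            for index, piece in enumerate(chunk(text)):
                conn.execute(
                    "INSERT INTO chunks (document_id, chunk_index, text, embedding) "
                    "VALUES (?, ?, ?, ?)",
                    (doc_id, index, piece, embed(piece)),
                )
            conn.execute(
                "UPDATE documents SET content_hash = ?, "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (hashlib.sha256(text.encode()).hexdigest(), doc_id),
            )
    except Exception:
        return 500  # original chunks and embeddings are still in place
    return 200
```

Because the delete, inserts, and document update share one transaction, a failure in `embed` leaves the original chunks untouched. The `id` and `created_at` columns are never written, and no tag table is touched, which is how the identity and tag-preservation scenarios would hold in this sketch.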
@@ -111,9 +111,11 @@ else
    echo "==> Engine version: $VERSION (no increment)"
 fi

-TAG="engine-v${VERSION}"
+GIT_TAG="engine-v${VERSION}"
+DOCKER_TAG="v${VERSION}"

-echo "    Tag:       $TAG"
+echo "    Git tag:   $GIT_TAG"
+echo "    Image tag: $DOCKER_TAG"
 echo "    Registry:  $IMAGE_BASE"
 echo "    Forge CLI: $FORGE"
 echo "    Dry run:   $DRY_RUN"
@@ -125,8 +127,8 @@ echo ""
 echo "==> Pre-flight checks"

 if [[ "$DRY_RUN" == false ]]; then
-    if git -C "$SCRIPT_DIR" rev-parse "$TAG" &>/dev/null; then
-        echo "Error: tag $TAG already exists"
+    if git -C "$SCRIPT_DIR" rev-parse "$GIT_TAG" &>/dev/null; then
+        echo "Error: tag $GIT_TAG already exists"
        exit 1
    fi
 fi
@@ -148,29 +150,45 @@ fi
 #──────────────────────────────────────────────────────────────────────
 echo "==> Building Docker engine images ($VERSION)"

-NVIDIA_IMAGE="${IMAGE_BASE}/engine:${TAG}-nvidia"
-ROCM_IMAGE="${IMAGE_BASE}/engine:${TAG}-rocm"
+NVIDIA_IMAGE="${IMAGE_BASE}/engine:${DOCKER_TAG}-nvidia"
+CPU_IMAGE="${IMAGE_BASE}/engine:${DOCKER_TAG}-cpu"
 NVIDIA_LATEST="${IMAGE_BASE}/engine:latest-nvidia"
-ROCM_LATEST="${IMAGE_BASE}/engine:latest-rocm"
+CPU_LATEST="${IMAGE_BASE}/engine:latest-cpu"

 run docker build -t "$NVIDIA_IMAGE" -t "$NVIDIA_LATEST" -f "$ENGINE_DIR/Dockerfile.nvidia" "$ENGINE_DIR"
-run docker build -t "$ROCM_IMAGE" -t "$ROCM_LATEST" -f "$ENGINE_DIR/Dockerfile.rocm" "$ENGINE_DIR"
+run docker build -t "$CPU_IMAGE" -t "$CPU_LATEST" -f "$ENGINE_DIR/Dockerfile.cpu" "$ENGINE_DIR"

 echo ""

+#──────────────────────────────────────────────────────────────────────
+# 3b. Build Docker MCP server image
+#──────────────────────────────────────────────────────────────────────
+MCP_DIR="$SCRIPT_DIR/mcp"
+
+if [[ -f "$MCP_DIR/Dockerfile" ]]; then
+    echo "==> Building Docker MCP server image ($VERSION)"
+
+    MCP_IMAGE="${IMAGE_BASE}/mcp:${DOCKER_TAG}"
+    MCP_LATEST="${IMAGE_BASE}/mcp:latest"
+
+    run docker build -t "$MCP_IMAGE" -t "$MCP_LATEST" -f "$MCP_DIR/Dockerfile" "$MCP_DIR"
+
+    echo ""
+fi
+
 #──────────────────────────────────────────────────────────────────────
 # 4. Commit, tag, and push
 #──────────────────────────────────────────────────────────────────────
-echo "==> Committing and tagging $TAG"
+echo "==> Committing and tagging $GIT_TAG"

 if [[ "$INCREMENT" == true ]]; then
    run git -C "$SCRIPT_DIR" add "$VERSION_FILE"
    run git -C "$SCRIPT_DIR" commit -m "Bump engine version to $VERSION"
 fi

-run git -C "$SCRIPT_DIR" tag -a "$TAG" -m "Release $TAG"
+run git -C "$SCRIPT_DIR" tag -a "$GIT_TAG" -m "Release $GIT_TAG"
 run git -C "$SCRIPT_DIR" push origin HEAD
-run git -C "$SCRIPT_DIR" push origin "$TAG"
+run git -C "$SCRIPT_DIR" push origin "$GIT_TAG"

 echo ""

@@ -179,25 +197,31 @@ echo ""
 #──────────────────────────────────────────────────────────────────────
 echo "==> Creating release via $FORGE"

-RELEASE_TITLE="Engine $TAG"
+RELEASE_TITLE="Engine $GIT_TAG"
 RELEASE_NOTES="## Docker images

 \`\`\`bash
 # NVIDIA GPU
 docker pull ${NVIDIA_IMAGE}

-# AMD GPU (ROCm)
-docker pull ${ROCM_IMAGE}
+# CPU only
+docker pull ${CPU_IMAGE}
+\`\`\`
+
+## MCP server
+
+\`\`\`bash
+docker pull ${MCP_IMAGE:-${IMAGE_BASE}/mcp:${DOCKER_TAG}}
 \`\`\`"

 if [[ "$FORGE" == "gh" ]]; then
-    run gh release create "$TAG" \
+    run gh release create "$GIT_TAG" \
        --title "$RELEASE_TITLE" \
        --notes "$RELEASE_NOTES"

 elif [[ "$FORGE" == "tea" ]]; then
    run tea release create \
-        --tag "$TAG" \
+        --tag "$GIT_TAG" \
        --title "$RELEASE_TITLE" \
        --note "$RELEASE_NOTES"
 fi
@@ -211,12 +235,20 @@ echo "==> Pushing Docker images to $REGISTRY"

 run docker push "$NVIDIA_IMAGE"
 run docker push "$NVIDIA_LATEST"
-run docker push "$ROCM_IMAGE"
-run docker push "$ROCM_LATEST"
+run docker push "$CPU_IMAGE"
+run docker push "$CPU_LATEST"
+
+if [[ -n "${MCP_IMAGE:-}" ]]; then
+    run docker push "$MCP_IMAGE"
+    run docker push "$MCP_LATEST"
+fi

 echo ""
-echo "==> Release $TAG complete!"
+echo "==> Release $GIT_TAG complete!"
 echo ""
 echo "    Images:"
 echo "      $NVIDIA_IMAGE"
-echo "      $ROCM_IMAGE"
+echo "      $CPU_IMAGE"
+if [[ -n "${MCP_IMAGE:-}" ]]; then
+    echo "      $MCP_IMAGE"
+fi
Author	SHA1	Message	Date
steve	d44d11e4fe	Bump engine version to 3.2.2	2026-04-14 21:48:55 +01:00
steve	574370e8d1	Remove AMD ROCm support — CPU and NVIDIA only BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image build/push from the release pipeline. Remove AMD quick-start and ROCm references from README and DEVELOPER docs. Update docker-deployment and developer-docs specs to reflect CPU + NVIDIA only. The ROCm variant added significant complexity (4.2GB torch wheel, >20GB container) with limited usage. Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:39:37 +01:00
steve	17b19999de	Switch nvidia and rocm Dockerfiles from onnxruntime to torch Nvidia: install torch+torchvision from PyTorch cu130 index, drop onnxruntime-gpu. ROCm: use local torch wheel with rocm6.4 index for torchvision, clean up nvidia remnants from the venv. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:13:41 +01:00
steve	bb78f4ea80	Fix 500 error on notes with slashes in title, bump engine to 3.2.1 Sanitize / and \ in note titles and filenames when writing to the staging directory — a title like "/reset skill" was interpreted as a path separator, causing a FileNotFoundError and a 500 from the jobs endpoint. Also add PRAGMA busy_timeout=5000 to SQLite connections to prevent immediate failure under concurrent write load. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:12:58 +01:00
steve	223ff2cf5d	Latest changes all archived	2026-04-04 22:50:19 +01:00
steve	e9a282ddb1	Document KB_BULK_SAFETY_PERCENT in README, DEVELOPER, MCP, and SKILL docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:43:42 +01:00
steve	b5a203d2aa	Add bulk operations and remove collections abstraction - Add bulk delete, bulk tags, and bulk set-tags engine endpoints (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags) - Filter-based selection: by tags, doc_type, ID list, ID range - Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents accidental mass operations unless force=true - Synchronous execution with audit trail via jobs table - Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools - Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands - Remove collection abstraction from MCP server (use tags instead) - Remove kb_set_collection MCP tool - Update SKILL.md, MCP.md, README.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:34:47 +01:00
steve	0c124c4ab7	Bump engine version to 3.0.1	2026-04-04 12:42:32 +01:00
steve	da5b8435bc	Add configurable allowed hosts for MCP remote access (KB_MCP_ALLOWED_HOSTS) The MCP SDK's DNS rebinding protection rejects remote clients with 421 when the Host header isn't in the allowlist. Add KB_MCP_ALLOWED_HOSTS env var (comma-separated IPs/FQDNs) to configure additional allowed hosts while keeping localhost always permitted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 12:39:43 +01:00
steve	e39e00a2c0	Add MCP auth status to kb_status and update server instructions - kb_status now returns authenticated: true/false so clients can verify auth - Server instructions mention Bearer token auth requirement - Add .env, .venv/, test_mcp_client.py to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 12:04:12 +01:00
steve	d078af9ad3	Split MCP docs into MCP.md with AI tool setup examples Move MCP server documentation from README into dedicated MCP.md. Add configuration examples for Claude Code, VS Code, Cursor, Windsurf, and JetBrains IDEs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 22:03:41 +01:00
steve	b3dce188e1	Fix version check failing on non-200 status responses When the engine returns 401 (auth required) or other non-200 responses, the version check was parsing the error body, getting an empty version string, and fatally exiting. Now skips the check on non-200 responses and lets the actual API call surface the real error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:52:24 +01:00
steve	0dc3065979	Update README for v3.0.0 — add MCP server docs, updatenote, fix version refs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:45:31 +01:00
steve	e7136a4a20	Add MCP server, note mutation endpoint, and updated_at tracking (v3.0.0) New MCP server (mcp/) exposes kb operations as native MCP tools over Streamable HTTP with Bearer token auth. Supports collections via tag conventions, chunked file uploads, and agent-side search patterns. Engine gains PATCH /api/v1/notes/{id} for in-place note updates with transactional re-chunk/re-embed, and updated_at column on documents. Go client adds updatenote command and Patch HTTP method. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:34:55 +01:00
steve	adeba21712	Bump client version to 2.2.1	2026-04-02 16:18:06 +01:00
steve	2d179af557	Fix search human-mode output to match engine API response The Go client struct expected a nested document object and top-level page/section fields, but the engine returns flat results with metadata in chunk_metadata. This caused empty display for title, type, tags, page, and section in human output mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 16:17:35 +01:00
steve	a6bab5e55e	Add CPU-only Docker image and fix release tag naming - Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments - Use sentence-transformers[onnx] + CPU-only torch for ~4x smaller image - Fix release script: separate git tags (engine-v) from Docker tags (v) - Add CPU image to release build/push pipeline - Update README with CPU deployment instructions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 16:02:00 +01:00
steve	c5191df9c0	Bump client version to 2.2.0	2026-03-31 20:50:17 +01:00
steve	afbe270181	Replace implicit note shorthand with explicit addnote command and split README Two changes: 1. structured-add-commands: The implicit note shorthand (kb "text") caused accidental note creation from mistyped commands. Replaced with explicit kb addnote <text> command. Root command reverts to standard Cobra behaviour. Updated examples, tests, SKILL.md, and specs. 2. split-readme-developer-docs: Moved build-from-source instructions, release process, API reference, and ROCm migration notes from README.md into a new DEVELOPER.md. README now links to DEVELOPER.md for dev workflows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 20:48:22 +01:00
steve	9e957f1a9a	Added pycache to gitignore	2026-03-30 07:26:16 +01:00
steve	bbe6a5e909	Add dev-up script and archive kb-title-in-chunks change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 07:25:22 +01:00
@@ -1 +1 @@
 .0.0
 .2.0
@@ -1 +1 @@
 .1.1
 .2.0
@@ -1 +1 @@
 .1.0
 .2.2