Latest changes all archived

2026-04-04 22:50:19 +01:00
parent e9a282ddb1
commit 223ff2cf5d
31 changed files with 748 additions and 7 deletions
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-04
@@ -0,0 +1,39 @@
+## Context
+
+The MCP server (`mcp/server.py`) exposes KB operations as tools for LLM clients. Collections are an abstraction over tags — internally stored with a `collection:` prefix. The server already has helpers for managing collection tags (`_collection_tag`, `_ensure_exclusive_collection`, `_strip_collection_tags`) and the engine client (`mcp/engine.py`) already has an `update_tags()` method.
+
+Document deletion is supported by the engine API at `DELETE /api/v1/documents/{doc_id}` but has no corresponding engine client method or MCP tool.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Expose collection assignment for existing documents via MCP (`kb_set_collection`)
+- Expose document deletion via MCP (`kb_delete`)
+- Follow existing patterns in `server.py` and `engine.py`
+
+**Non-Goals:**
+- Bulk operations (multi-document collection assignment or deletion)
+- Tag management beyond collections (direct tag add/remove via MCP)
+- Undo/recycle bin for deleted documents
+- Changes to the engine API layer — all endpoints already exist
+
+## Decisions
+
+### 1. Reuse `_ensure_exclusive_collection` for kb_set_collection
+
+The server already has `_ensure_exclusive_collection(doc_id, collection)`, which removes any existing `collection:*` tags and applies the new one. The `kb_set_collection` tool will call it directly when a collection is provided, and will remove the collection tags itself when clearing.
+
+**Alternative considered**: Exposing raw tag add/remove to the LLM. Rejected because it leaks the `collection:` prefix implementation detail and the LLM could create inconsistent state (multiple collections on one document).
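
The exclusive-collection rule can be illustrated on plain tag lists. This is a minimal sketch, not the server's code: `collection_tag`, `strip_collection_tags`, and `with_exclusive_collection` are hypothetical stand-ins for the existing helpers, whose real signatures and engine calls are not shown in this change.

```python
from typing import Optional

COLLECTION_PREFIX = "collection:"


def collection_tag(name: str) -> str:
    """Return the internal tag that represents a collection."""
    return f"{COLLECTION_PREFIX}{name}"


def strip_collection_tags(tags: list) -> list:
    """Drop every collection:* tag, keeping the other tags in order."""
    return [t for t in tags if not t.startswith(COLLECTION_PREFIX)]


def with_exclusive_collection(tags: list, collection: Optional[str]) -> list:
    """Compute the tag list a document should end up with.

    None clears the collection; any other value leaves exactly one
    collection:* tag on the document.
    """
    cleaned = strip_collection_tags(tags)
    if collection is None:
        return cleaned
    return cleaned + [collection_tag(collection)]
```

Because the prefix handling stays inside these helpers, the LLM only ever sees a plain `collection` value, which is the point of rejecting raw tag access.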
+
+### 2. New `engine.delete_document()` method for kb_delete
+
+Add a simple `delete_document(doc_id)` to `mcp/engine.py` that calls `DELETE /api/v1/documents/{doc_id}`. This follows the same pattern as all other engine client methods.
+
+### 3. Return confirmation with document metadata on delete
+
+`kb_delete` will return the engine API response, which includes `{"status": "deleted", "document_id": ..., "title": ...}`. This gives the LLM confirmation of what was deleted without needing a separate get call.
+
+## Risks / Trade-offs
+
+- **[Accidental deletion]** → The LLM could delete the wrong document. Mitigation: the tool requires an explicit `document_id`, and the response includes the title so the LLM can verify. No bulk delete is exposed.
+- **[Collection cleared unexpectedly]** → Passing `collection=None` to `kb_set_collection` removes collection assignment. Mitigation: the parameter description will make this behavior explicit.
@@ -0,0 +1,25 @@
+## Why
+
+LLMs using the KB MCP server can create notes in collections and search by collection, but cannot assign existing documents to a collection or delete documents. This forces users to drop out to the HTTP API for routine document management. Both operations are fully supported at the database and HTTP API layers but aren't wired through to MCP tools.
+
+## What Changes
+
+- Add `kb_set_collection` MCP tool — assigns, changes, or removes the collection on an existing document by manipulating `collection:` prefixed tags via the existing `engine.update_tags()` method.
+- Add `kb_delete` MCP tool — deletes a document by ID, calling the existing `DELETE /api/v1/documents/{doc_id}` endpoint via a new `engine.delete_document()` method.
+
+## Capabilities
+
+### New Capabilities
+
+- `mcp-document-management`: MCP tools for modifying and deleting existing documents (kb_set_collection, kb_delete).
+
+### Modified Capabilities
+
+_(none — the engine API endpoints already exist; this change only adds MCP tool wrappers)_
+
+## Impact
+
+- **MCP server** (`mcp/server.py`): Two new tool registrations.
+- **MCP engine client** (`mcp/engine.py`): One new method (`delete_document`). The `update_tags` method already exists and will be reused.
+- **Engine API**: No changes — `DELETE /api/v1/documents/{doc_id}` and `PUT /api/v1/documents/{doc_id}/tags` already exist.
+- **Breaking changes**: None. Additive only.
@@ -0,0 +1,61 @@
+## ADDED Requirements
+
+### Requirement: Set collection on existing document via MCP
+
+The MCP server SHALL expose a `kb_set_collection` tool that assigns, changes, or clears the collection of an existing document. The tool SHALL accept a `document_id` (required) and `collection` (optional string). When `collection` is provided, the tool SHALL ensure the document belongs to exactly that collection by removing any existing `collection:*` tags and adding the new one. When `collection` is omitted or null, the tool SHALL remove all `collection:*` tags from the document, leaving it unassigned.
+
+The tool SHALL return the updated document with the `collection` field and cleaned tags (collection tags stripped), consistent with other MCP tool responses.
+
+#### Scenario: Assign untagged document to a collection
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and `collection="workspace"`
+- **THEN** the document SHALL have the tag `collection:workspace` added
+- **AND** the response SHALL include `"collection": "workspace"`
+
+#### Scenario: Change document from one collection to another
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and `collection="memory"` on a document currently in collection "documents"
+- **THEN** the tag `collection:documents` SHALL be removed and `collection:memory` SHALL be added
+- **AND** the response SHALL include `"collection": "memory"`
+
+#### Scenario: Remove document from all collections
+
+- **WHEN** `kb_set_collection` is called with `document_id=42` and no `collection` parameter
+- **THEN** all `collection:*` tags SHALL be removed from the document
+- **AND** the response SHALL include `"collection": null`
+
+#### Scenario: Document not found
+
+- **WHEN** `kb_set_collection` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
+
+### Requirement: Delete document via MCP
+
+The MCP server SHALL expose a `kb_delete` tool that permanently deletes a document from the knowledge base. The tool SHALL accept a `document_id` (required integer). Deletion SHALL remove the document, its chunks, embeddings, tag associations, and any stored file on disk.
+
+The tool SHALL return a confirmation response including the deleted document's ID and title.
+
+#### Scenario: Successful deletion
+
+- **WHEN** `kb_delete` is called with `document_id=42`
+- **THEN** the document, its chunks, embeddings, tag associations, and stored file SHALL be deleted
+- **AND** the response SHALL include `"status": "deleted"`, the `document_id`, and the document `title`
+
+#### Scenario: Document not found
+
+- **WHEN** `kb_delete` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
+
+### Requirement: Engine client delete method
+
+The MCP engine client (`mcp/engine.py`) SHALL provide a `delete_document(doc_id)` method that sends a `DELETE` request to `/api/v1/documents/{doc_id}` and returns the JSON response. The method SHALL raise on non-2xx status codes, consistent with other engine client methods.
+
+#### Scenario: Successful engine client delete call
+
+- **WHEN** `delete_document(42)` is called and the engine API returns 200
+- **THEN** the method SHALL return the parsed JSON response
+
+#### Scenario: Engine client delete for missing document
+
+- **WHEN** `delete_document(999)` is called and the engine API returns 404
+- **THEN** the method SHALL raise an `httpx.HTTPStatusError`
@@ -0,0 +1,12 @@
+## 1. Engine Client
+
+- [x] 1.1 Add `delete_document(doc_id)` method to `mcp/engine.py`
+
+## 2. MCP Tools
+
+- [x] 2.1 Add `kb_set_collection` tool to `mcp/server.py`
+- [x] 2.2 Add `kb_delete` tool to `mcp/server.py`
+
+## 3. Verification
+
+- [x] 3.1 Test kb_set_collection and kb_delete against running engine
@@ -0,0 +1,35 @@
+# Agent-Side Search Patterns
+
+## Purpose
+
+Documents recommended patterns for agent-side query expansion and reranking, which are caller responsibilities rather than engine features. These patterns are communicated via MCP tool descriptions.
+
+## Requirements
+
+### Requirement: Query expansion guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on query expansion as a recommended pattern for complex queries.
+
+#### Scenario: Tool description includes expansion pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For complex queries, consider expanding into 2-3 variant phrasings and calling this tool multiple times, then deduplicating results by chunk_id"
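
On the agent side, the expansion pattern might look like the sketch below. It assumes each hit is a dict with `chunk_id` and `score` keys and that a higher score is better; `search` is a hypothetical callable standing in for one `kb_search` tool call.

```python
from typing import Callable, Iterable


def expanded_search(
    search: Callable[[str], Iterable[dict]],
    variants: list,
) -> list:
    """Run each query variant and keep the best-scoring hit per chunk_id."""
    best = {}
    for query in variants:
        for hit in search(query):
            key = hit["chunk_id"]
            # Keep only the strongest occurrence of each chunk.
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

The engine is not involved in any of this: it just answers several independent searches.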
+
+---
+
+### Requirement: Reranking guidance in tool description
+
+The `kb_search` MCP tool description SHALL include guidance on agent-side reranking as a recommended pattern for improving precision.
+
+#### Scenario: Tool description includes reranking pattern
+- **WHEN** an agent reads the `kb_search` tool description
+- **THEN** the description SHALL include guidance such as: "For precision, rerank the returned results using your own judgement based on relevance to the original question"
+
+---
+
+### Requirement: No engine-side LLM dependency
+
+The engine SHALL NOT require or use any external LLM API for search operations. Query expansion and reranking SHALL remain entirely agent-side concerns.
+
+#### Scenario: Engine has no LLM dependency
+- **WHEN** the engine is deployed without any `ANTHROPIC_API_KEY` or similar LLM API configuration
+- **THEN** all search operations SHALL function fully, with no degraded results or missing features
@@ -0,0 +1,230 @@
+## ADDED Requirements
+
+### Requirement: Common selection filter
+
+All bulk engine endpoints SHALL accept a JSON body with the following optional selection fields, combined with AND logic:
+
+- `document_ids` (list of int) — match documents with these specific IDs
+- `tags` (list of str) — match documents that have ALL specified tags
+- `doc_type` (str) — match documents with this document type
+- `from_id` (int) — match documents with id >= this value
+- `to_id` (int) — match documents with id <= this value
+
+At least one selection field MUST be present. If no selection fields are provided, the endpoint SHALL return 400 Bad Request.
+
+#### Scenario: Filter by tags and doc_type
+
+- **WHEN** a bulk endpoint receives `{"tags": ["draft"], "doc_type": "note"}`
+- **THEN** it SHALL match only documents that have the tag "draft" AND have doc_type "note"
+
+#### Scenario: Filter by ID range
+
+- **WHEN** a bulk endpoint receives `{"from_id": 10, "to_id": 50}`
+- **THEN** it SHALL match documents with id >= 10 AND id <= 50
+
+#### Scenario: Filter by explicit IDs
+
+- **WHEN** a bulk endpoint receives `{"document_ids": [1, 5, 12]}`
+- **THEN** it SHALL match only documents with those specific IDs
+
+#### Scenario: Combined filters
+
+- **WHEN** a bulk endpoint receives `{"tags": ["agent:mybot"], "doc_type": "note", "from_id": 100}`
+- **THEN** it SHALL match documents satisfying ALL three criteria
+
+#### Scenario: No selection fields provided
+
+- **WHEN** a bulk endpoint receives `{}` or `{"force": true}` with no selection fields
+- **THEN** it SHALL return 400 Bad Request
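
The AND semantics of the selection filter can be sketched in memory as follows. The helper names are hypothetical, and treating an empty list the same as an absent field is an assumption the requirement does not settle. A real engine would presumably translate the filter into a SQL `WHERE` clause instead.

```python
SELECTION_FIELDS = ("document_ids", "tags", "doc_type", "from_id", "to_id")


def _given(body: dict, field: str) -> bool:
    """A field counts as supplied if it is neither None nor an empty list (assumption)."""
    value = body.get(field)
    return value is not None and value != []


def has_selection(body: dict) -> bool:
    """False means the endpoint should answer 400 Bad Request."""
    return any(_given(body, field) for field in SELECTION_FIELDS)


def matches(doc: dict, body: dict) -> bool:
    """Apply every supplied selection field with AND logic."""
    if _given(body, "document_ids") and doc["id"] not in body["document_ids"]:
        return False
    if _given(body, "tags") and not set(body["tags"]) <= set(doc["tags"]):
        return False
    if _given(body, "doc_type") and doc["doc_type"] != body["doc_type"]:
        return False
    if _given(body, "from_id") and doc["id"] < body["from_id"]:
        return False
    if _given(body, "to_id") and doc["id"] > body["to_id"]:
        return False
    return True
```

Note that `force` is deliberately not a selection field, so `{"force": true}` on its own is still rejected.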
+
+### Requirement: Safety threshold
+
+All bulk endpoints SHALL enforce a safety threshold. Before executing, the engine SHALL count the matched documents and the total documents in the database. If `matched / total * 100` exceeds the configured threshold, the request SHALL be rejected with 409 Conflict.
+
+The response SHALL include: `error` ("safety_threshold_exceeded"), `message` (human-readable), `matched` (int), `total` (int), `percent` (float), and `threshold` (int).
+
+The threshold SHALL default to 70 and be configurable via the `KB_BULK_SAFETY_PERCENT` environment variable (integer 0-100). A value of 0 disables the check.
+
+The caller MAY override the threshold by including `"force": true` in the request body.
+
+#### Scenario: Threshold exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 750 documents (75%) without `force: true`
+- **THEN** it SHALL return 409 with `matched: 750`, `total: 1000`, `percent: 75.0`, `threshold: 70`
+
+#### Scenario: Threshold not exceeded
+
+- **GIVEN** 1000 total documents and `KB_BULK_SAFETY_PERCENT` is 70
+- **WHEN** a bulk endpoint matches 500 documents (50%) without `force: true`
+- **THEN** the operation SHALL proceed normally
+
+#### Scenario: Force override
+
+- **GIVEN** 1000 total documents and a match of 900 (90%)
+- **WHEN** the request includes `"force": true`
+- **THEN** the operation SHALL proceed regardless of threshold
+
+#### Scenario: Zero threshold
+
+- **GIVEN** `KB_BULK_SAFETY_PERCENT` is 0
+- **WHEN** a bulk endpoint matches any share of the documents, with or without `force: true`
+- **THEN** the safety check SHALL be disabled and the operation SHALL proceed
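
A minimal sketch of the threshold check, assuming a hypothetical `check_safety` helper that returns the 409 payload or `None`. Letting an empty database (`total == 0`) pass is an assumption; the requirement does not cover that case. The `message` wording is illustrative only.

```python
from typing import Optional


def check_safety(matched: int, total: int, threshold: int, force: bool = False) -> Optional[dict]:
    """Return a 409-style error payload if the match is too broad, else None."""
    # force overrides the check; a threshold of 0 disables it.
    if force or threshold == 0 or total == 0:
        return None
    percent = matched / total * 100
    if percent <= threshold:
        return None
    return {
        "error": "safety_threshold_exceeded",
        "message": f"{matched} of {total} documents matched ({percent:.1f}% > {threshold}%)",
        "matched": matched,
        "total": total,
        "percent": percent,
        "threshold": threshold,
    }
```

With 1000 documents and a threshold of 70, a match of 750 yields a payload with `percent` 75.0, while a match of 500 returns `None`.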
+
+### Requirement: Synchronous response with audit log
+
+All bulk endpoints SHALL execute synchronously and return a JSON response with:
+
+- `job_id` (int) — ID of the audit log entry in the jobs table
+- `status` (str) — "done" or "partial_failure"
+- `matched` (int) — number of documents that matched the selection
+- `succeeded` (int) — number of documents successfully processed
+- `failed` (int) — number of documents that failed
+- `errors` (list) — array of `{"document_id": int, "error": str}` for each failure (empty on full success)
+
+A job record SHALL be created in the jobs table with `job_type` set to the operation type. The `filename` field SHALL store a JSON representation of the selection filter. The `error` field SHALL store a JSON array of individual errors if any occurred.
+
+#### Scenario: Full success
+
+- **WHEN** a bulk operation matches 50 documents and all succeed
+- **THEN** the response SHALL have `status: "done"`, `matched: 50`, `succeeded: 50`, `failed: 0`, `errors: []`
+
+#### Scenario: Partial failure
+
+- **WHEN** a bulk operation matches 50 documents but 2 fail
+- **THEN** the response SHALL have `status: "partial_failure"`, `matched: 50`, `succeeded: 48`, `failed: 2`, and `errors` listing the 2 failures
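
The response shape can be derived from per-document outcomes as in this sketch. `summarize` and its `{document_id: error-or-None}` input are hypothetical; how the engine actually collects failures is not specified here.

```python
def summarize(job_id: int, results: dict) -> dict:
    """Build the bulk response from a {document_id: error-or-None} mapping."""
    errors = [
        {"document_id": doc_id, "error": err}
        for doc_id, err in results.items()
        if err is not None
    ]
    return {
        "job_id": job_id,
        "status": "partial_failure" if errors else "done",
        "matched": len(results),
        "succeeded": len(results) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }
```

The same `errors` list is what would be serialized into the job record's `error` field.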
+
+### Requirement: Bulk delete endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/delete` which permanently deletes all documents matching the selection filter. For each matched document, it SHALL delete embeddings from `chunks_vec`, delete the document row (cascading to chunks and document_tags), and delete any stored file from disk.
+
+Database deletions SHALL be performed within a single transaction. File deletions SHALL occur after the transaction commits and SHALL be best-effort (failures logged but not counted as document failures).
+
+#### Scenario: Bulk delete by tag
+
+- **WHEN** `POST /api/v1/bulk/delete` receives `{"tags": ["old", "draft"]}`
+- **THEN** all documents with both tags "old" and "draft" SHALL be deleted
+- **AND** their chunks, embeddings, tag associations, and stored files SHALL be removed
+
+#### Scenario: Bulk delete with no matches
+
+- **WHEN** `POST /api/v1/bulk/delete` receives a filter that matches 0 documents
+- **THEN** the response SHALL have `matched: 0`, `succeeded: 0`, `failed: 0`
+
+### Requirement: Bulk tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/tags` which adds and/or removes tags on all documents matching the selection filter. The request body SHALL include the selection filter plus:
+
+- `add` (list of str, optional) — tags to add
+- `remove` (list of str, optional) — tags to remove
+
+At least one of `add` or `remove` MUST be present. The endpoint SHALL return 400 if neither is provided.
+
+The endpoint SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Add and remove tags in one call
+
+- **WHEN** `POST /api/v1/bulk/tags` receives `{"tags": ["agent:mybot"], "add": ["reviewed"], "remove": ["pending"]}`
+- **THEN** all documents tagged "agent:mybot" SHALL have "reviewed" added and "pending" removed
+
+### Requirement: Bulk set-tags endpoint
+
+The engine SHALL expose `POST /api/v1/bulk/set-tags` which replaces all tags on matched documents with a new set. The request body SHALL include the selection filter plus:
+
+- `new_tags` (list of str) — the replacement tag set
+
+The endpoint SHALL remove all existing tag associations from matched documents, then apply the new set. It SHALL update `updated_at` on all affected documents.
+
+#### Scenario: Replace all tags
+
+- **WHEN** `POST /api/v1/bulk/set-tags` receives `{"doc_type": "note", "new_tags": ["clean", "final"]}`
+- **THEN** all notes SHALL have their existing tags removed and replaced with "clean" and "final"
+
+### Requirement: Jobs table extension
+
+The jobs table SHALL be extended with a `job_type` column (TEXT, default "ingest") to distinguish ingestion jobs from bulk operation audit entries. Valid values: "ingest", "bulk_delete", "bulk_tags", "bulk_set_tags".
+
+Existing jobs SHALL default to `job_type = "ingest"`. The existing jobs list endpoint and CLI `kb jobs` command SHALL continue to work unchanged.
+
+#### Scenario: Migration adds column
+
+- **GIVEN** an existing database without the `job_type` column
+- **WHEN** the engine starts
+- **THEN** the column SHALL be added with default value "ingest"
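
One way to make such a migration idempotent is to inspect the table before altering it. The sketch below assumes the database is SQLite (this spec does not say so explicitly) and uses a hypothetical `ensure_job_type_column` helper.

```python
import sqlite3


def ensure_job_type_column(conn: sqlite3.Connection) -> bool:
    """Add jobs.job_type if it is missing; return True when the column was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "job_type" in columns:
        return False
    conn.execute("ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'")
    conn.commit()
    return True
```

Existing rows pick up the column default, so older jobs read back as `"ingest"` without a separate backfill.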
+
+### Requirement: Engine config for safety threshold
+
+The engine `Config` class SHALL read `KB_BULK_SAFETY_PERCENT` from the environment as an integer (default 70, range 0-100). This value SHALL be used as the default safety threshold for all bulk endpoints.
+
+### Requirement: MCP bulk delete tool
+
+The MCP server SHALL expose a `kb_bulk_delete` tool with parameters: `document_ids` (optional list of int), `tags` (optional list of str), `doc_type` (optional str), `from_id` (optional int), `to_id` (optional int), `force` (optional bool).
+
+The tool SHALL call `POST /api/v1/bulk/delete` on the engine via the engine client and return the JSON response.
+
+The tool description SHALL clearly state that `tags` is a selection filter (which documents to delete), not tags to delete.
+
+#### Scenario: MCP bulk delete by tag
+
+- **WHEN** `kb_bulk_delete(tags=["old"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/delete` with `{"tags": ["old"]}`
+- **AND** the tool SHALL return the engine's JSON response
+
+### Requirement: MCP bulk tags tool
+
+The MCP server SHALL expose a `kb_bulk_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `add` (optional list of str), `remove` (optional list of str), and `force` (optional bool).
+
+The tool description SHALL clearly distinguish `tags` (selection filter) from `add`/`remove` (tag changes to apply).
+
+#### Scenario: MCP bulk tag update
+
+- **WHEN** `kb_bulk_tags(tags=["agent:mybot"], add=["reviewed"], remove=["draft"])` is called
+- **THEN** the engine client SHALL send the appropriate `POST /api/v1/bulk/tags` request
+
+### Requirement: MCP bulk set-tags tool
+
+The MCP server SHALL expose a `kb_bulk_set_tags` tool with parameters: `document_ids`, `tags`, `doc_type`, `from_id`, `to_id` (selection filters), plus `new_tags` (list of str) and `force` (optional bool).
+
+#### Scenario: MCP bulk set tags
+
+- **WHEN** `kb_bulk_set_tags(doc_type="note", new_tags=["clean"])` is called
+- **THEN** the engine client SHALL send `POST /api/v1/bulk/set-tags` with `{"doc_type": "note", "new_tags": ["clean"]}`
+
+### Requirement: MCP engine client bulk methods
+
+The MCP engine client (`mcp/engine.py`) SHALL provide three new methods:
+
+- `bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?)` → dict
+- `bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?)` → dict
+- `bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags, force?)` → dict
+
+Each SHALL send a POST request to the corresponding `/api/v1/bulk/*` endpoint with the parameters as a JSON body. Each SHALL raise on non-2xx status codes, consistent with existing methods.
+
+### Requirement: CLI bulk-remove command
+
+The CLI SHALL expose a `kb bulk-remove` command with flags: `--tags` (comma-separated), `--type`, `--ids` (comma-separated), `--from-id`, `--to-id`, `--force`/`-f`, `--yes`/`-y`.
+
+Without `--yes`, the CLI SHALL first display the match count and ask for interactive confirmation before proceeding.
+
+The command SHALL call `POST /api/v1/bulk/delete` with the constructed filter.
+
+#### Scenario: CLI bulk remove with confirmation
+
+- **WHEN** `kb bulk-remove --tags "draft,old" --type note` is run without `--yes`
+- **THEN** the CLI SHALL display "This will delete N documents matching: tags=[draft,old] type=note" and prompt "Proceed? [y/N]"
+
+#### Scenario: CLI bulk remove with --yes
+
+- **WHEN** `kb bulk-remove --tags "draft" --yes` is run
+- **THEN** the CLI SHALL proceed without prompting
+
+### Requirement: CLI bulk-tag command
+
+The CLI SHALL expose a `kb bulk-tag` command with the same filter flags as `bulk-remove`, plus `--add` and `--remove` (comma-separated tag lists).
+
+The command SHALL call `POST /api/v1/bulk/tags` with the constructed filter and tag changes.
+
+### Requirement: CLI bulk-set-tags command
+
+The CLI SHALL expose a `kb bulk-set-tags` command with the same filter flags as `bulk-remove`, plus `--set` (comma-separated list of replacement tags).
+
+The command SHALL call `POST /api/v1/bulk/set-tags` with the constructed filter and `new_tags`.
@@ -82,13 +82,47 @@ The project SHALL provide Docker Compose files for single-command deployment. Co
 - **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)

 #### Scenario: Configure via environment
- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, etc.)
- **THEN** the engine SHALL use those values
+- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
+- **THEN** the engine and MCP server SHALL use those values

 #### Scenario: Pre-built image deployment
 - **WHEN** an admin wants to use a pre-built engine image without building from source
 - **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)

+#### Scenario: MCP allowed hosts in Compose
+- **WHEN** the kb-mcp service is defined in a Compose file
+- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose
+
+---
+
+### Requirement: Configurable MCP allowed hosts
+
+The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable containing a comma-separated list of additional hosts (IP addresses or FQDNs) that are permitted to connect. The server SHALL always allow `127.0.0.1`, `localhost`, and `[::1]` regardless of this setting. DNS rebinding protection SHALL always be enabled.
+
+#### Scenario: Remote client connects with allowed host
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 192.168.1.50:3000`
+- **THEN** the server SHALL accept the request and process it normally
+
+#### Scenario: Remote client connects with disallowed host
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 10.0.0.99:3000`
+- **THEN** the server SHALL return HTTP 421 "Invalid Host header"
+
+#### Scenario: Multiple allowed hosts
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50,kb.example.com`
+- **THEN** the server SHALL accept requests with `Host` matching either `192.168.1.50` or `kb.example.com` on any port
+
+#### Scenario: Variable unset or empty
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is unset or empty
+- **THEN** the server SHALL allow only localhost addresses (`127.0.0.1`, `localhost`, `[::1]`) with any port
+
+#### Scenario: Localhost always allowed
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50`
+- **THEN** the server SHALL still accept requests with `Host: localhost:3000` or `Host: 127.0.0.1:3000`
+
+#### Scenario: Allowed origins derived from allowed hosts
+- **WHEN** `KB_MCP_ALLOWED_HOSTS` includes `192.168.1.50`
+- **THEN** the server SHALL accept `Origin: http://192.168.1.50:3000` (and any port) in addition to localhost origins
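
The host-matching rules in these scenarios can be sketched as a pure function. The helper names are hypothetical and the case-insensitive comparison is an assumption. A real server would hand the resulting list to its transport layer rather than check headers itself.

```python
from typing import Optional

ALWAYS_ALLOWED = ("127.0.0.1", "localhost", "[::1]")


def allowed_hosts(env_value: Optional[str]) -> list:
    """Combine the built-in localhost entries with KB_MCP_ALLOWED_HOSTS."""
    extra = [h.strip().lower() for h in (env_value or "").split(",") if h.strip()]
    return list(ALWAYS_ALLOWED) + [h for h in extra if h not in ALWAYS_ALLOWED]


def host_without_port(header: str) -> str:
    """Strip an optional :port, leaving a bracketed IPv6 literal intact."""
    if header.startswith("["):
        return header[: header.find("]") + 1]
    return header.rsplit(":", 1)[0] if ":" in header else header


def is_allowed(host_header: str, env_value: Optional[str]) -> bool:
    """True if the Host header names an allowed host, on any port."""
    return host_without_port(host_header).lower() in allowed_hosts(env_value)
```

A rejected host would map to the HTTP 421 response described above.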
+
 ---

 ### Requirement: CPU-only fallback
@@ -150,15 +150,19 @@ The engine SHALL provide endpoints to list, inspect, remove, and download origin

 #### Scenario: List documents
 - **WHEN** a client sends `GET /api/v1/documents`
- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, and created_at
+- **THEN** the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, created_at, and updated_at

 #### Scenario: List documents with filters
 - **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
 - **THEN** the engine SHALL return only documents matching all specified filters

+#### Scenario: List documents sorted by most recent
+- **WHEN** a client requests documents sorted by date
+- **THEN** the engine SHALL use `COALESCE(updated_at, created_at)` for ordering, so un-mutated documents sort by creation time and mutated documents sort by their last update
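
The ordering rule can be demonstrated with an in-memory database. This sketch assumes SQLite and ISO-8601 timestamp strings; the table is reduced to the columns needed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO documents (id, title, created_at, updated_at) VALUES (?, ?, ?, ?)",
    [
        (1, "old but edited", "2026-01-01T00:00:00", "2026-04-01T00:00:00"),
        (2, "newer, untouched", "2026-03-01T00:00:00", None),
        (3, "oldest, untouched", "2025-12-01T00:00:00", None),
    ],
)
# updated_at wins when present; otherwise created_at is used for ordering.
rows = conn.execute(
    "SELECT id FROM documents ORDER BY COALESCE(updated_at, created_at) DESC"
).fetchall()
recent_first = [row[0] for row in rows]
```

Document 1 sorts first because its update is more recent than the creation time of document 2, even though it was created earlier.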
+
 #### Scenario: Get document details
 - **WHEN** a client sends `GET /api/v1/documents/{id}`
- **THEN** the engine SHALL return the full document record including all chunks, their text content, and whether the original file is available (`has_file: true/false`)
+- **THEN** the engine SHALL return the full document record including all chunks, their text content, `updated_at`, and whether the original file is available (`has_file: true/false`)

 #### Scenario: Download original file
 - **WHEN** a client sends `GET /api/v1/documents/{id}/file`
@@ -174,6 +178,38 @@ The engine SHALL provide endpoints to list, inspect, remove, and download origin

 ---

+### Requirement: Note mutation endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint for updating existing notes in place. See the `note-mutation` spec for full details.
+
+#### Scenario: Note update endpoint exists
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "new content"}`
+- **THEN** the engine SHALL process the update synchronously and return the updated document
+
+---
+
+### Requirement: Document updated_at tracking
+
+The engine SHALL track when documents are modified via an `updated_at` column. This column SHALL be NULL for documents that have never been updated.
+
+#### Scenario: New document has no updated_at
+- **WHEN** a document is first ingested
+- **THEN** `updated_at` SHALL be NULL and `created_at` SHALL be set to the ingestion timestamp
+
+#### Scenario: Note update sets updated_at
+- **WHEN** a note is updated via `PATCH /api/v1/notes/{id}`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Tag change sets updated_at
+- **WHEN** tags are modified via `PUT /api/v1/documents/{id}/tags`
+- **THEN** `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Schema migration for updated_at
+- **WHEN** the engine starts against a v2 database without an `updated_at` column
+- **THEN** the engine SHALL automatically run `ALTER TABLE documents ADD COLUMN updated_at TEXT` and all existing documents SHALL have `updated_at = NULL`
+
+---
+
 ### Requirement: Tag management

 The engine SHALL provide endpoints to list all tags and manage tags on documents.
@@ -265,17 +265,43 @@ The client SHALL provide a `kb reindex` command that triggers re-embedding of al

 ---

+### Requirement: Update note command
+
+The client SHALL provide a `kb updatenote <id> <text>` command that updates an existing note's content via the engine's `PATCH /api/v1/notes/{id}` endpoint.
+
+#### Scenario: Update a note
+- **WHEN** the user runs `kb updatenote 42 "Updated note content"`
+- **THEN** the client SHALL send `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}` and display the result
+
+#### Scenario: Update a note with JSON output
+- **WHEN** the user runs `kb updatenote 42 "new content" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Update a non-existent document
+- **WHEN** the user runs `kb updatenote 999 "text"` and the engine returns HTTP 404
+- **THEN** the client SHALL display an error indicating the document was not found and exit with a non-zero code
+
+#### Scenario: Update a non-note document
+- **WHEN** the user runs `kb updatenote 42 "text"` and the engine returns HTTP 422
+- **THEN** the client SHALL display an error indicating that only notes can be updated and exit with a non-zero code
+
+#### Scenario: Missing arguments
+- **WHEN** the user runs `kb updatenote` or `kb updatenote 42` with insufficient arguments
+- **THEN** the client SHALL display usage help indicating that both document ID and text are required
+
+---
+
 ### Requirement: Engine version compatibility check

 The client SHALL verify that the connected engine meets a minimum version requirement before executing any API command. The minimum required engine version SHALL be embedded in the client binary at build time. If the engine version is below the minimum, the client SHALL print an error message and exit with a non-zero code. There SHALL be no flag to skip or suppress this check.

 #### Scenario: Compatible engine version
- **WHEN** the client connects to an engine reporting version `2.1.5` and `MinEngineVersion` is `2.1.0`
+- **WHEN** the client connects to an engine reporting version `3.0.0` and `MinEngineVersion` is `3.0.0`
 - **THEN** the client SHALL proceed with the command normally

 #### Scenario: Incompatible engine version
- **WHEN** the client connects to an engine reporting version `2.0.3` and `MinEngineVersion` is `2.1.0`
- **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v2.1.0+ (connected engine is v2.0.3)` followed by an upgrade hint, and exit with code 1
+- **WHEN** the client connects to an engine reporting version `2.1.0` and `MinEngineVersion` is `3.0.0`
+- **THEN** the client SHALL print to stderr: `Error: kb client vX.Y.Z requires engine v3.0.0+ (connected engine is v2.1.0)` followed by an upgrade hint, and exit with code 1

 #### Scenario: Engine unreachable during version check
 - **WHEN** the client cannot reach the engine's `/api/v1/status` endpoint
@@ -0,0 +1,198 @@
+# MCP Server
+
+## Purpose
+
+The MCP server provides a Model Context Protocol interface to the kb engine, exposing knowledge base operations as native MCP tools over Streamable HTTP transport. It runs as a separate Docker container alongside the engine, translating MCP tool calls into engine HTTP API calls.
+
+## Requirements
+
+### Requirement: MCP server transport and deployment
+
+The MCP server SHALL expose tools via Streamable HTTP transport. It SHALL run as a Docker container, configured to connect to the kb engine's HTTP API. It SHALL read `KB_ENGINE_URL` and `KB_API_KEY` from environment variables to connect to the engine.
+
+#### Scenario: MCP server starts and connects to engine
+- **WHEN** the MCP server container starts with `KB_ENGINE_URL=http://engine:8000` and `KB_API_KEY=secret`
+- **THEN** it SHALL begin accepting MCP connections over Streamable HTTP and use the configured URL and API key for all engine API calls
+
+#### Scenario: Engine unreachable at startup
+- **WHEN** the MCP server starts but cannot reach the engine at `KB_ENGINE_URL`
+- **THEN** it SHALL start and accept connections, but tool calls SHALL return errors indicating the engine is unreachable
+
+#### Scenario: Docker Compose deployment
+- **WHEN** the MCP server is deployed via Docker Compose alongside the engine
+- **THEN** it SHALL connect to the engine via the Docker network using the service name (e.g. `http://engine:8000`)
+
+---
+
+### Requirement: MCP server authentication
+
+When the `KB_MCP_API_KEY` environment variable is set, the MCP server SHALL require calling agents to present a matching Bearer token. This key is independent of the engine's `KB_API_KEY`.
+
+#### Scenario: Valid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a matching Bearer token
+- **THEN** the MCP server SHALL process the request normally
+
+#### Scenario: Missing Bearer token when auth is enabled
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent connects without a Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: Invalid MCP API key
+- **WHEN** `KB_MCP_API_KEY` is set and a calling agent provides a non-matching Bearer token
+- **THEN** the MCP server SHALL reject the connection with an authentication error
+
+#### Scenario: MCP auth disabled
+- **WHEN** `KB_MCP_API_KEY` is not set
+- **THEN** the MCP server SHALL accept all connections without authentication
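
The four authentication scenarios above can be sketched as a single check. This is a minimal illustration, not the server's actual code: the function name `is_authorized` is hypothetical, and treating an empty `KB_MCP_API_KEY` the same as an unset one is an assumption.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: Optional[str]) -> bool:
    """Decide whether a request may proceed (hypothetical helper)."""
    # Auth disabled: KB_MCP_API_KEY is not set (assumption: empty counts as unset).
    if not expected_key:
        return True
    # Auth enabled, but the agent sent no Authorization header.
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.encode(), expected_key.encode())
```

A caller would read the expected key from the environment, for example `is_authorized(header, os.environ.get("KB_MCP_API_KEY"))`, and reject the connection when the result is `False`.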
+
+---
+
+### Requirement: Search tool
+
+The MCP server SHALL expose a `kb_search` tool that queries the knowledge base via the engine's search API.
+
+#### Scenario: Basic search
+- **WHEN** an agent calls `kb_search` with `{"query": "pension revaluation", "top": 5}`
+- **THEN** the MCP server SHALL POST to the engine's `/api/v1/search` endpoint and return the results with chunk text, scores, document metadata, and tags
+
+#### Scenario: Search with tag filter
+- **WHEN** an agent calls `kb_search` with `{"query": "email preferences", "tags": ["agent:mybot"]}`
+- **THEN** the MCP server SHALL include the tags in the filter and POST to the engine's search endpoint
+
+#### Scenario: Search with mode override
+- **WHEN** an agent calls `kb_search` with `{"query": "error log", "fts_only": true}`
+- **THEN** the MCP server SHALL pass `fts_only: true` to the engine search endpoint
+
+---
+
+### Requirement: Add note tool
+
+The MCP server SHALL expose a `kb_addnote` tool that submits a text note to the engine for ingestion.
+
+#### Scenario: Add a note
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses"}`
+- **THEN** the MCP server SHALL submit the note to the engine's `POST /api/v1/jobs` endpoint and return the job ID
+
+#### Scenario: Add a note with tags
+- **WHEN** an agent calls `kb_addnote` with `{"text": "User prefers concise responses", "tags": ["agent:mybot", "feedback"]}`
+- **THEN** the MCP server SHALL submit the note with exactly those tags to the engine
+
+---
+
+### Requirement: Chunked file upload tools
+
+The MCP server SHALL expose three tools (`kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`) that together implement a chunked upload for transferring files from remote agents to the engine.
+
+#### Scenario: Start an upload
+- **WHEN** an agent calls `kb_upload_start` with `{"filename": "report.pdf", "total_size": 5242880, "tags": ["insurance"]}`
+- **THEN** the MCP server SHALL create a staging entry, generate a UUID `upload_id`, and return `{"upload_id": "<uuid>"}`
+
+#### Scenario: Upload a chunk
+- **WHEN** an agent calls `kb_upload_chunk` with `{"upload_id": "<uuid>", "data": "<base64-encoded-data>", "chunk_index": 0}`
+- **THEN** the MCP server SHALL decode the base64 data and write it to the staging area for the given upload
+
+#### Scenario: Upload multiple chunks in sequence
+- **WHEN** an agent calls `kb_upload_chunk` multiple times with sequential `chunk_index` values for the same `upload_id`
+- **THEN** the MCP server SHALL store each chunk under its `chunk_index` so that the file can be reassembled in order
+
+#### Scenario: Finish an upload
+- **WHEN** an agent calls `kb_upload_finish` with `{"upload_id": "<uuid>"}`
+- **THEN** the MCP server SHALL reassemble the chunks in order, forward the complete file as a multipart upload to the engine's `POST /api/v1/jobs` endpoint with the tags from `kb_upload_start`, and return the job ID
+
+#### Scenario: Upload with invalid upload_id
+- **WHEN** an agent calls `kb_upload_chunk` or `kb_upload_finish` with an `upload_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the upload ID is not found
+
+#### Scenario: Abandoned upload cleanup
+- **WHEN** an agent starts an upload but does not call `kb_upload_finish` within 10 minutes
+- **THEN** the MCP server SHALL clean up the staged chunks and remove the upload tracking entry
+
+#### Scenario: MCP server restart during upload
+- **WHEN** the MCP server container restarts while an upload is in progress
+- **THEN** the in-progress upload SHALL be lost and the agent SHALL need to restart from `kb_upload_start`
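
The upload scenarios above can be sketched as a small staging store. This is an illustration under stated assumptions: the class `UploadStaging` and its method names are hypothetical, chunks are held in memory (which matches the restart scenario, but the real staging area may be on disk), and forwarding the file to the engine is left out.

```python
import base64
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple

UPLOAD_TTL_SECONDS = 600  # 10 minutes, per the abandoned-upload scenario


class UploadStaging:
    """In-memory staging for chunked uploads (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._uploads: Dict[str, dict] = {}

    def start(self, filename: str, total_size: int, tags: Optional[List[str]] = None) -> dict:
        upload_id = str(uuid.uuid4())
        self._uploads[upload_id] = {
            "filename": filename,
            "total_size": total_size,
            "tags": list(tags or []),
            "chunks": {},  # chunk_index -> decoded bytes
            "started_at": self._clock(),
        }
        return {"upload_id": upload_id}

    def add_chunk(self, upload_id: str, data_b64: str, chunk_index: int) -> None:
        entry = self._require(upload_id)
        entry["chunks"][chunk_index] = base64.b64decode(data_b64)

    def finish(self, upload_id: str) -> Tuple[str, bytes, List[str]]:
        """Reassemble chunks by index and drop the tracking entry."""
        entry = self._require(upload_id)
        del self._uploads[upload_id]
        payload = b"".join(entry["chunks"][i] for i in sorted(entry["chunks"]))
        return entry["filename"], payload, entry["tags"]

    def purge_expired(self) -> int:
        """Remove uploads older than the TTL; returns how many were removed."""
        now = self._clock()
        expired = [
            uid for uid, entry in self._uploads.items()
            if now - entry["started_at"] >= UPLOAD_TTL_SECONDS
        ]
        for uid in expired:
            del self._uploads[uid]
        return len(expired)

    def _require(self, upload_id: str) -> dict:
        if upload_id not in self._uploads:
            raise KeyError(f"upload ID not found: {upload_id}")
        return self._uploads[upload_id]
```

Keying chunks by `chunk_index` means out-of-order arrival still reassembles correctly, and a periodic call to `purge_expired()` would cover the cleanup scenario.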
+
+---
+
+### Requirement: Update note tool
+
+The MCP server SHALL expose a `kb_update_note` tool that updates an existing note in place via the engine's note mutation endpoint.
+
+#### Scenario: Update an existing note
+- **WHEN** an agent calls `kb_update_note` with `{"document_id": 42, "text": "Updated preference: user prefers bullet points"}`
+- **THEN** the MCP server SHALL send `PATCH /api/v1/notes/42` to the engine and return the updated document
+
+#### Scenario: Update a non-existent document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that does not exist
+- **THEN** the MCP server SHALL return an error indicating the document was not found
+
+#### Scenario: Update a non-note document
+- **WHEN** an agent calls `kb_update_note` with a `document_id` that refers to a PDF
+- **THEN** the MCP server SHALL return an error indicating that only notes can be updated
+
+---
+
+### Requirement: Get document tool
+
+The MCP server SHALL expose a `kb_get` tool that retrieves document details from the engine.
+
+#### Scenario: Get by document ID
+- **WHEN** an agent calls `kb_get` with `{"document_id": 42}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/documents/42` and return the document details with chunks
+
+#### Scenario: Get by source path
+- **WHEN** an agent calls `kb_get` with `{"source_path": "memory/feedback_testing.md"}`
+- **THEN** the MCP server SHALL query the engine's documents endpoint filtered by source path and return matching documents
+
+---
+
+### Requirement: Status tool
+
+The MCP server SHALL expose a `kb_status` tool that returns engine health and statistics.
+
+#### Scenario: Get engine status
+- **WHEN** an agent calls `kb_status` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/status` and return engine version, model info, device info, document counts, and queue state
+
+---
+
+### Requirement: Jobs tool
+
+The MCP server SHALL expose a `kb_jobs` tool that returns ingestion job status.
+
+#### Scenario: List recent jobs
+- **WHEN** an agent calls `kb_jobs` with no parameters
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs` and return the list of recent jobs
+
+#### Scenario: Filter jobs by status
+- **WHEN** an agent calls `kb_jobs` with `{"status": "failed"}`
+- **THEN** the MCP server SHALL fetch `GET /api/v1/jobs?status=failed` and return matching jobs
+
+---
+
+### Requirement: Delete document tool
+
+The MCP server SHALL expose a `kb_delete` tool that permanently deletes a document from the knowledge base. The tool SHALL accept a `document_id` (required integer). Deletion SHALL remove the document, its chunks, embeddings, tags, and any stored file on disk.
+
+The tool SHALL return a confirmation response including the deleted document's ID and title.
+
+#### Scenario: Successful deletion
+- **WHEN** `kb_delete` is called with `document_id=42`
+- **THEN** the document, its chunks, embeddings, tag associations, and stored file SHALL be deleted
+- **AND** the response SHALL include `"status": "deleted"`, the `document_id`, and the document `title`
+
+#### Scenario: Document not found
+- **WHEN** `kb_delete` is called with a `document_id` that does not exist
+- **THEN** the tool SHALL return an error response indicating the document was not found
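
The two deletion scenarios above might map onto a tool handler like the following. This is a sketch only: the `EngineClient` protocol, its `get_document` method, the `FakeEngine` stand-in, and the shape of the error response are all assumptions made for illustration. Fetching the document first is one possible way to obtain the title for the confirmation.

```python
from typing import Dict, Optional, Protocol


class EngineClient(Protocol):
    """Hypothetical subset of the engine client used by this sketch."""

    def get_document(self, doc_id: int) -> Optional[dict]: ...
    def delete_document(self, doc_id: int) -> None: ...


def kb_delete(engine: EngineClient, document_id: int) -> dict:
    # Look the document up first so the confirmation can include its title.
    doc = engine.get_document(document_id)
    if doc is None:
        # Error shape is an assumption; the spec only requires "not found".
        return {"error": f"Document {document_id} not found"}
    engine.delete_document(document_id)
    return {"status": "deleted", "document_id": document_id, "title": doc["title"]}


class FakeEngine:
    """In-memory stand-in for the engine, for demonstration only."""

    def __init__(self, docs: Dict[int, dict]):
        self.docs = dict(docs)

    def get_document(self, doc_id: int) -> Optional[dict]:
        return self.docs.get(doc_id)

    def delete_document(self, doc_id: int) -> None:
        del self.docs[doc_id]
```

With `FakeEngine({42: {"title": "Pension letter"}})`, calling `kb_delete(engine, 42)` returns the confirmation and a second call returns the not-found error.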
+
+---
+
+### Requirement: Tags-only document organisation
+
+The MCP server SHALL NOT maintain any collection abstraction. Documents SHALL be returned as-is from the engine with all tags visible. No tag stripping or collection field injection SHALL occur. Namespace isolation (e.g. separating agent memory from user documents) is achieved via tag conventions communicated through system prompts or tool descriptions.
+
+#### Scenario: Search results show all tags
+- **WHEN** `kb_search` is called and a result has tags `["agent:mybot", "collection:documents", "draft"]`
+- **THEN** all three tags SHALL be returned as-is — no stripping of `collection:*` tags
+
+#### Scenario: Add note with explicit tags only
+- **WHEN** `kb_addnote(text="hello", tags=["agent:mybot", "memory"])` is called
+- **THEN** the note SHALL be created with exactly those two tags — no default tags added
@@ -0,0 +1,43 @@
+# Note Mutation
+
+## Purpose
+
+Note mutation allows existing notes to be updated in place without requiring delete and re-add, preserving document identity (ID, creation timestamp) while updating content, embeddings, and the full-text index.
+
+## Requirements
+
+### Requirement: Note update endpoint
+
+The engine SHALL provide a `PATCH /api/v1/notes/{id}` endpoint that accepts new text for an existing note, re-chunks and re-embeds it, and returns the updated document.
+
+#### Scenario: Update an existing note
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with body `{"text": "Updated note content"}`
+- **THEN** the engine SHALL delete existing chunks and embeddings for document 42, run the new text through the note chunking pipeline, generate embeddings for each chunk, insert new chunks and embeddings, update the document's `content_hash` and `updated_at`, and return the updated document with HTTP 200
+
+#### Scenario: Update preserves document identity
+- **WHEN** a note is updated via PATCH
+- **THEN** the document SHALL retain its original `id` and `created_at` values, and `updated_at` SHALL be set to the current timestamp
+
+#### Scenario: Update with long text that produces multiple chunks
+- **WHEN** a client sends `PATCH /api/v1/notes/42` with text longer than the embedding model's token window
+- **THEN** the engine SHALL chunk the text using the same note chunking pipeline as ingestion, producing multiple chunks, and embed each chunk separately
+
+#### Scenario: Update a non-existent document
+- **WHEN** a client sends `PATCH /api/v1/notes/999` and document 999 does not exist
+- **THEN** the engine SHALL return HTTP 404
+
+#### Scenario: Update a non-note document
+- **WHEN** a client sends `PATCH /api/v1/notes/42` and document 42 has `doc_type = 'pdf'`
+- **THEN** the engine SHALL return HTTP 422 with an error indicating that only notes can be updated via this endpoint
+
+#### Scenario: Embedding failure during update
+- **WHEN** a client sends `PATCH /api/v1/notes/42` but the embedding step fails
+- **THEN** the engine SHALL roll back the entire transaction, preserving the original note content, chunks, and embeddings, and return HTTP 500
+
+#### Scenario: FTS5 index updated on note mutation
+- **WHEN** a note is updated via PATCH
+- **THEN** the FTS5 virtual table SHALL be updated via the existing chunk triggers (`chunks_ad` for deletes, `chunks_ai` for inserts), keeping the full-text index consistent with the new content
+
+#### Scenario: Tags preserved on update
+- **WHEN** a note with tags `["feedback", "collection:memory"]` is updated via PATCH
+- **THEN** the document's tags SHALL be unchanged — only the text content, chunks, and embeddings are replaced
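
The update, 404, 422, and rollback scenarios above can be sketched with SQLite. The schema here is a simplified assumption (embeddings stored as a column on `chunks`, no FTS5 table or tags), and `update_note`, `chunk`, and `embed` are hypothetical names; the return value stands in for the HTTP status.

```python
import hashlib
import sqlite3
from typing import Callable, List

# Simplified, hypothetical schema; the real engine schema may differ.
SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT,
                        content_hash TEXT, created_at TEXT, updated_at TEXT);
CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER,
                     chunk_index INTEGER, text TEXT, embedding BLOB);
"""


def update_note(conn: sqlite3.Connection, doc_id: int, text: str,
                chunk: Callable[[str], List[str]],
                embed: Callable[[str], bytes]) -> int:
    """Replace a note's chunks atomically; returns an HTTP-style status code."""
    row = conn.execute("SELECT doc_type FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        return 404
    if row[0] != "note":
        return 422
    try:
        # The connection context manager commits on success and rolls back
        # on any exception, so a failed embedding leaves the note untouched.
        with conn:
            conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
            for index, piece in enumerate(chunk(text)):
                conn.execute(
                    "INSERT INTO chunks (document_id, chunk_index, text, embedding) "
                    "VALUES (?, ?, ?, ?)",
                    (doc_id, index, piece, embed(piece)),
                )
            # id and created_at are never written, preserving document identity.
            conn.execute(
                "UPDATE documents SET content_hash = ?, updated_at = CURRENT_TIMESTAMP "
                "WHERE id = ?",
                (hashlib.sha256(text.encode("utf-8")).hexdigest(), doc_id),
            )
    except Exception:
        return 500
    return 200
```

Because tags are not touched anywhere in the transaction, they survive the update, which matches the final scenario.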