7 Commits

Author SHA1 Message Date
steve 45e2c5ce91 Bump engine version to 3.2.3 2026-05-15 18:22:08 +01:00
steve e6e91f1d5c Clarify hybrid semantic + full-text search in MCP descriptions
Agents were misreading kb_search as keyword-only because the vector/semantic
component was only mentioned in the negative ("fts_only: no vector similarity").
Lead with hybrid semantic + BM25 + RRF in the server instructions, kb_search
docstring, and MCP.md so agents recognise it as a vector search tool.
2026-05-15 18:19:42 +01:00
steve 9eccc527ae Add next-steps.md with UX improvement ideas for kb CLI
Captures pain points found while trying to locate an uploaded PDF: kb
list silently ignores positional args, kb search results lack
document_id, kb info dumps all chunks with no summary mode, and
scan-heavy PDFs produce noisy single-char chunk hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 20:34:55 +01:00
steve d44d11e4fe Bump engine version to 3.2.2 2026-04-14 21:48:55 +01:00
steve 574370e8d1 Remove AMD ROCm support — CPU and NVIDIA only
BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image
build/push from the release pipeline. Remove AMD quick-start and ROCm
references from README and DEVELOPER docs. Update docker-deployment
and developer-docs specs to reflect CPU + NVIDIA only.

The ROCm variant added significant complexity (4.2GB torch wheel,
>20GB container) with limited usage. Users on AMD GPUs should stay
on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:39:37 +01:00
steve 17b19999de Switch nvidia and rocm Dockerfiles from onnxruntime to torch
Nvidia: install torch+torchvision from PyTorch cu130 index, drop
onnxruntime-gpu. ROCm: use local torch wheel with rocm6.4 index for
torchvision, clean up nvidia remnants from the venv.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:13:41 +01:00
steve bb78f4ea80 Fix 500 error on notes with slashes in title, bump engine to 3.2.1
Sanitize / and \ in note titles and filenames when writing to the
staging directory — a title like "/reset skill" was interpreted as a
path separator, causing a FileNotFoundError and a 500 from the jobs
endpoint. Also add PRAGMA busy_timeout=5000 to SQLite connections to
prevent immediate failure under concurrent write load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:12:58 +01:00
19 changed files with 266 additions and 197 deletions
+3 -10
@@ -11,9 +11,6 @@ cd engine
 # NVIDIA GPU
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
-# AMD GPU (ROCm)
-KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
 ```
 ### Client
@@ -50,7 +47,7 @@ The client embeds a `MinEngineVersion` (from `client/MIN_ENGINE_VERSION`) and wi
 ./release-engine.sh --gitea --dry-run # preview without doing anything
 ```
-Creates tag `engine-vX.Y.Z`, builds NVIDIA and ROCm Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
+Creates tag `engine-vX.Y.Z`, builds NVIDIA and CPU Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
 ### Checking versions
@@ -66,8 +63,8 @@ curl http://localhost:8000/api/v1/status | jq .version
 Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:
-- `engine-v2.0.6-nvidia` / `engine-v2.0.6-rocm` — versioned
-- `latest-nvidia` / `latest-rocm` — latest release
+- `engine-v2.0.6-nvidia` / `engine-v2.0.6-cpu` — versioned
+- `latest-nvidia` / `latest-cpu` — latest release
 Override the registry and org via environment variables:
@@ -97,7 +94,3 @@ All endpoints are under `/api/v1/`. Requires `Authorization: Bearer <key>` heade
 | `POST` | `/bulk/delete` | Bulk delete documents by filter |
 | `POST` | `/bulk/tags` | Bulk add/remove tags by filter |
 | `POST` | `/bulk/set-tags` | Bulk replace tags by filter |
-## Future: ROCm runtime migration
-The `onnxruntime-rocm` execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the **MIGraphX execution provider** as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from `onnxruntime-rocm` to `onnxruntime` with the MIGraphX EP and install the `migraphx` runtime libraries instead.
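The registry tag scheme in the first file diff (a versioned `engine-vX.Y.Z-<variant>` tag plus a moving `latest-<variant>` tag, now for `nvidia` and `cpu` only) can be captured in a small helper. This is a hypothetical sketch: `image_tags`, `VARIANTS`, and `DEFAULT_REGISTRY` are illustrative names, not code from `release-engine.sh`, which the diff does not show.

```python
from typing import List

# Variants still published after the ROCm removal (assumption based on the updated docs).
VARIANTS = ("nvidia", "cpu")
DEFAULT_REGISTRY = "docker.dcglab.co.uk/dcg/kb/engine"


def image_tags(version: str, registry: str = DEFAULT_REGISTRY) -> List[str]:
    """Return the image references published for one engine release, e.g. "3.2.3"."""
    refs: List[str] = []
    for variant in VARIANTS:
        refs.append(f"{registry}:engine-v{version}-{variant}")  # versioned tag
        refs.append(f"{registry}:latest-{variant}")  # moving "latest" tag
    return refs


if __name__ == "__main__":
    for ref in image_tags("3.2.3"):
        print(ref)
```

The docs also say the registry and org can be overridden via environment variables; the `registry` parameter stands in for that here, but the actual variable names are not visible in this diff.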
+2 -2
@@ -1,6 +1,6 @@
 # MCP Server (Agent Integration)
-The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.
+The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI. `kb_search` is hybrid: dense vector embeddings (semantic similarity) fused with BM25 full-text ranking via Reciprocal Rank Fusion, so agents can ask natural-language questions and find conceptually related content even when the exact words don't match.
 ## Start the MCP server
@@ -27,7 +27,7 @@ docker run -d --name kb-mcp \
 | Tool | Description |
 |---|---|
-| `kb_search` | Hybrid search with optional tag/type filters |
+| `kb_search` | Hybrid semantic (vector) + full-text search with tag/type filters |
 | `kb_addnote` | Add a text note (queued for async ingestion) |
 | `kb_update_note` | Update an existing note in place |
 | `kb_get` | Get document details by ID or source path |
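Reciprocal Rank Fusion, which the updated description names as the way `kb_search` combines its vector and BM25 result lists, scores each chunk by summing `1 / (k + rank)` over every list it appears in. The following is a minimal sketch, assuming each retriever returns chunk IDs best-first. `k = 60` is the value commonly used since the original RRF paper; whether the kb engine uses the same constant or any per-list weighting is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Chunks missing from a list simply get nothing from it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    vector_hits = ["c7", "c2", "c9"]  # nearest neighbours by embedding similarity
    bm25_hits = ["c2", "c4", "c7"]    # best full-text matches
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['c2', 'c7', 'c4', 'c9']
```

`c2` wins because it ranks highly in both lists, even though `c7` is the top vector hit. In this sketch, passing only the BM25 list (the `fts_only=True` case) leaves the BM25 order unchanged.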
+2 -17
@@ -12,7 +12,7 @@ Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
 MCP Agents ──MCP/HTTP──▶ MCP Server (Docker) ──┘
 ```
-- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU or CPU-only support.
 - **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
 - **MCP Server**: Exposes kb operations as native MCP tools over Streamable HTTP. Runs as a separate Docker container alongside the engine. Use tags to scope agent data from user documents.
 - **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
@@ -35,18 +35,6 @@ docker run -d --name kb-engine \
   --restart unless-stopped \
   docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia
-# AMD GPU (ROCm)
-docker run -d --name kb-engine \
-  --device /dev/kfd --device /dev/dri \
-  --group-add video \
-  -p 8000:8000 \
-  -v ~/kb-data:/data \
-  -e KB_MODEL=all-MiniLM-L6-v2 \
-  -e KB_DEVICE=auto \
-  -e KB_API_KEY=your-secret-key \
-  --restart unless-stopped \
-  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
 # CPU only (no GPU required — smaller image)
 docker run -d --name kb-engine \
   -p 8000:8000 \
@@ -63,9 +51,6 @@ Or use a compose file from the repo:
 # NVIDIA GPU
 KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
-# AMD GPU (ROCm)
-KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d
 # CPU only
 KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
 ```
@@ -192,7 +177,7 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
``` ```
Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory. Data is device-agnostic — you can ingest on NVIDIA and serve from CPU (or vice versa) with the same data directory.
## MCP server (agent integration) ## MCP server (agent integration)
+2 -2
@@ -20,8 +20,8 @@ COPY VERSION ./
 RUN uv venv .venv && \
     . .venv/bin/activate && \
-    uv pip install -e . && \
-    uv pip install --no-deps onnxruntime-gpu
+    UV_HTTP_TIMEOUT=600 uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130 && \
+    uv pip install -e .
 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
-68
@@ -1,68 +0,0 @@
# Stage 1: Build — install Python deps with dev tools available
FROM rocm/dev-ubuntu-24.04:6.4-complete AS builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.12 python3.12-venv python3.12-dev python3-pip \
libpoppler-cpp-dev poppler-utils \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml ./
COPY kb/ kb/
COPY main.py ./
COPY VERSION ./
RUN uv venv .venv && \
. .venv/bin/activate && \
uv pip install -e . && \
uv pip install --no-deps onnxruntime-rocm
# Stage 2: Runtime — minimal ROCm runtime libs only
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
# Add ROCm apt repository
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates curl gnupg \
&& mkdir -p /etc/apt/keyrings \
&& curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key \
| gpg --dearmor -o /etc/apt/keyrings/rocm.gpg \
&& echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 noble main" \
> /etc/apt/sources.list.d/rocm.list \
&& printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
> /etc/apt/preferences.d/rocm-pin-600 \
&& apt-get update && apt-get install -y --no-install-recommends \
python3.12 python3.12-venv \
libpoppler-cpp0t64 poppler-utils \
libgl1 libglib2.0-0 \
rocm-hip-runtime \
rocm-hip-libraries \
miopen-hip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy built venv and application from builder
COPY --from=builder /app/.venv .venv
COPY --from=builder /app/kb kb
COPY --from=builder /app/main.py .
COPY --from=builder /app/pyproject.toml .
COPY --from=builder /app/VERSION .
ENV PATH="/app/.venv/bin:$PATH"
ENV VIRTUAL_ENV="/app/.venv"
ENV KB_DEVICE=auto
ENV KB_INGEST_DEVICE=auto
ENV KB_DATA_DIR=/data
EXPOSE 8000
VOLUME ["/data"]
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
+1 -1
@@ -1 +1 @@
-3.2.0
+3.2.3
-38
@@ -1,38 +0,0 @@
services:
  kb-engine:
    build:
      context: .
      dockerfile: Dockerfile.rocm
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    group_add:
      - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
  kb-mcp:
    build:
      context: ../mcp
      dockerfile: Dockerfile
    ports:
      - "${KB_MCP_PORT:-3000}:3000"
    environment:
      - KB_ENGINE_URL=http://kb-engine:8000
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_MCP_API_KEY=${KB_MCP_API_KEY:-}
      # Comma-separated IPs/FQDNs allowed to connect remotely (e.g. 192.168.1.50,kb.example.com)
      - KB_MCP_ALLOWED_HOSTS=${KB_MCP_ALLOWED_HOSTS:-}
    depends_on:
      - kb-engine
    restart: unless-stopped
+1
@@ -74,6 +74,7 @@ def get_connection(db_path: str) -> sqlite3.Connection:
     conn.enable_load_extension(False)
     conn.row_factory = sqlite3.Row
     conn.execute("PRAGMA journal_mode=WAL")
+    conn.execute("PRAGMA busy_timeout=5000")
     conn.execute("PRAGMA foreign_keys=ON")
     return conn
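The new `PRAGMA busy_timeout=5000` makes a connection wait up to 5000 ms for a competing writer's lock instead of failing at once with `database is locked`. The sketch below is a simplified stand-in for `get_connection`: the sqlite-vec extension loading visible in the hunk is omitted. It reads the pragma back to confirm the setting took effect.

```python
import sqlite3


def get_connection(db_path: str) -> sqlite3.Connection:
    """Simplified sketch of the engine's connection setup (no sqlite-vec loading)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    # Block for up to 5000 ms when another connection holds the write lock,
    # rather than raising "database is locked" immediately.
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


if __name__ == "__main__":
    conn = get_connection(":memory:")
    # Reading the pragma back returns the active timeout in milliseconds.
    print(conn.execute("PRAGMA busy_timeout").fetchone()[0])
    conn.close()
```

Python's `sqlite3.connect()` also installs a busy handler from its `timeout` argument, which is in seconds and defaults to 5.0. The explicit pragma therefore makes the wait visible in the code rather than necessarily changing it. The hunk does not show which `timeout`, if any, the engine passes to `connect()`.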
+4 -2
@@ -16,7 +16,8 @@ def stage_file(staging_dir: Path, filename: str, content: bytes) -> Path:
         The path to the newly created staged file.
     """
     staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{filename}"
+    safe_filename = filename.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_filename}"
     dest.write_bytes(content)
     logger.debug("Staged file: %s (%d bytes)", dest, len(content))
     return dest
@@ -31,7 +32,8 @@ def stage_note(staging_dir: Path, title: str, text: str) -> Path:
         The path to the newly created staged note file.
     """
     staging_dir.mkdir(parents=True, exist_ok=True)
-    dest = staging_dir / f"{uuid.uuid4()}_{title}.note"
+    safe_title = title.replace("/", "_").replace("\\", "_")
+    dest = staging_dir / f"{uuid.uuid4()}_{safe_title}.note"
     dest.write_text(text, encoding="utf-8")
     logger.debug("Staged note: %s (%d chars)", dest, len(text))
     return dest
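Why the slash broke staging: without sanitising, a title like `/reset skill` produces the name `<uuid>_/reset skill.note`. That is a file inside a nonexistent `<uuid>_` directory, so the write raises `FileNotFoundError`. The sketch below reproduces the patched behaviour. `safe_name` is a hypothetical helper; the commit inlines the two `replace` calls instead. Logging is omitted.

```python
import tempfile
import uuid
from pathlib import Path


def safe_name(name: str) -> str:
    """Replace both path separators so a title can never create or escape into subdirectories."""
    return name.replace("/", "_").replace("\\", "_")


def stage_note(staging_dir: Path, title: str, text: str) -> Path:
    """Simplified version of the patched stage_note."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    dest = staging_dir / f"{uuid.uuid4()}_{safe_name(title)}.note"
    dest.write_text(text, encoding="utf-8")
    return dest


if __name__ == "__main__":
    staging = Path(tempfile.mkdtemp()) / "staging"
    staged = stage_note(staging, "/reset skill", "note body")
    # The file lands directly in the staging directory, not in a "<uuid>_" subdirectory.
    print(staged.parent == staging, staged.name.endswith("_reset skill.note"))
```

The replacement only handles separators. It does not address other filename limits, such as a title longer than the filesystem's maximum name length.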
+24 -10
@@ -44,11 +44,16 @@ _transport_security = TransportSecuritySettings(
 mcp = FastMCP(
     "kb",
     instructions=(
-        "Knowledge base MCP server. Provides tools for searching, adding, and "
-        "managing documents and notes. Use tags to organise and filter documents "
-        "(e.g. tag notes with 'agent:mybot' and filter searches by that tag). "
-        "This server requires Bearer token authentication — all requests are "
-        "authenticated via the Authorization header at the HTTP transport layer."
+        "Knowledge base MCP server with hybrid semantic + full-text search. "
+        "kb_search uses dense vector embeddings (semantic similarity) fused with "
+        "BM25 full-text ranking, so it finds conceptually related content even "
+        "when the exact words don't match — agents can ask natural-language "
+        "questions rather than guessing keywords. Also provides tools for adding "
+        "notes, uploading files, and managing documents and tags. Use tags to "
+        "organise and filter documents (e.g. tag notes with 'agent:mybot' and "
+        "filter searches by that tag). This server requires Bearer token "
+        "authentication — all requests are authenticated via the Authorization "
+        "header at the HTTP transport layer."
     ),
     transport_security=_transport_security,
 )
@@ -62,17 +67,25 @@ async def kb_search(
     doc_type: str | None = None,
     fts_only: bool = False,
 ) -> str:
-    """Search the knowledge base for relevant documents and notes.
-    Returns ranked chunks matching the query, with text content, relevance scores,
-    and document metadata.
+    """Hybrid semantic (vector) + full-text search over the knowledge base.
+    Combines dense vector embeddings (semantic similarity — finds conceptually
+    related content even when the wording differs) with BM25 keyword ranking,
+    fused via reciprocal rank fusion. Because the search is semantic, you can
+    ask natural-language questions ("what did we decide about X?") rather than
+    guessing the exact keywords used in the source documents.
+    Returns ranked chunks matching the query, with text content, relevance
+    scores, and document metadata.
     Args:
-        query: The search query. Can be a natural language question or keywords.
+        query: The search query a natural language question or keywords.
         top: Maximum number of results to return (default 10).
         tags: Filter results to documents with ALL of these tags.
         doc_type: Filter by document type (e.g. "note", "pdf", "markdown", "code").
-        fts_only: If true, use only full-text search (no vector similarity).
+        fts_only: Disable the vector/semantic component and use only BM25
+            keyword matching. Default false (hybrid mode). Set true only when
+            you need exact-string matching (e.g. an error code, identifier).
     Tips for complex queries:
     - Consider expanding into 2-3 variant phrasings and calling this tool multiple
@@ -80,6 +93,7 @@ async def kb_search(
       "pension revaluation rules" and "how are pensions revalued" to cast a wider net.
     - For precision, rerank the returned results using your own judgement based on
       relevance to the original question.
+    - Call kb_status to see which embedding model is in use.
     """
     result = engine.search(
         query=query,
+58
@@ -0,0 +1,58 @@
# kb — Next Steps
UX improvements to make documents easier to find and inspect, prompted by a session where searching for an uploaded PDF (`M38T_PHEV_RHD_OM_EN_UK_20251209.pdf`, doc id 2077, 1801 chunks) surfaced lots of chunk hits but no obvious path back to the original document.
## Problems observed
### 1. `kb list` silently ignores positional arguments
```
kb list --type pdf "M38T_PHEV_RHD_OM_EN_UK_20251209"
```
The quoted term is dropped without warning; user gets the default newest-first listing and assumes the document is missing. `kb list` currently only supports `--tags` and `--type` filters.
### 2. `kb search` returns chunks with no `document_id`
Result objects expose `chunk_id`, `title`, `source_path`, `tags` — but not `document_id`. To get from a search hit back to the owning document you have to title-match against `kb list` output or call an undocumented endpoint. The skill docs even claim a `source.document_id` field that isn't actually present in the CLI output.
### 3. `kb info` dumps every chunk with no summary mode
`kb info 2077` returns ~1801 chunk objects. The document-level metadata (`id`, `title`, `original_filename`, `source_path`, `stored_path`, `doc_type`, `language`, `content_hash`, `has_file`, `tags`, `created_at`, `updated_at`) **is** present at the top level of the JSON, but in practice it's invisible — human format presumably dumps the chunk list and the user sees only chunks.
There's no way to ask for "just tell me about this document."
### 4. Search hits can look like noise on image-heavy PDFs
Top chunks for the M38T search were single characters (`"1"`, `"B"`, `"\""`). Almost certainly an FTS artefact on short tokens from a scan/image-heavy PDF — but it makes the result set look broken. Worth considering a minimum-text-length filter on indexed chunks, or down-weighting very short chunks in ranking.
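One way to try the down-weighting idea is to scale each hit's score by how much real text its chunk contains. The sketch below assumes hits arrive as `(text, score)` pairs where a higher score is better. `length_penalty`, `rerank`, and the 20-character threshold are hypothetical and not part of kb.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (chunk text, relevance score; higher is better)


def length_penalty(text: str, full_weight_at: int = 20) -> float:
    """Return a factor in [0, 1] that reaches 1.0 once the chunk has `full_weight_at` alphanumeric chars."""
    alnum = sum(ch.isalnum() for ch in text)
    return min(1.0, alnum / full_weight_at)


def rerank(hits: List[Hit]) -> List[Hit]:
    """Sort hits by score multiplied by the length penalty, best first."""
    return sorted(hits, key=lambda hit: hit[1] * length_penalty(hit[0]), reverse=True)


if __name__ == "__main__":
    hits = [("1", 0.95), ("B", 0.90), ("Charging the high-voltage battery", 0.70)]
    for text, score in rerank(hits):
        print(f"{score:.2f}  {text}")
```

Single-character chunks keep only 5% of their score, so the real sentence moves to the top. The sketch does not fit raw SQLite FTS5 `bm25()` values, which are lower for better matches, without flipping their sign first.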
## Proposed changes
### Small / high-value
- **`kb info --no-chunks`** (or make `--chunks` opt-in): default to metadata + chunk count, only include chunks when asked. Human format should always lead with the metadata block.
- **`kb list --title <substring>`** (or accept a positional query) for filename / title search. At minimum, error or warn when positional args are passed and ignored.
- **Include `document_id` in `kb search` result objects.** Either at the top of each result or under `source.document_id` (matching the skill docs).
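The proposed summary mode for `kb info` is essentially the `jq 'del(.chunks)'` workaround from the quick-reference section, plus a chunk count. The sketch below works over the JSON shape described above. `summarize_document` and the `chunk_count` field are hypothetical names, and the chunk entries in the demo are placeholders.

```python
import json
from typing import Any, Dict


def summarize_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Keep document-level metadata and replace the chunk list with its length."""
    summary = {key: value for key, value in doc.items() if key != "chunks"}
    summary["chunk_count"] = len(doc.get("chunks", []))
    return summary


if __name__ == "__main__":
    # Shape loosely modelled on `kb info --format json`; chunk entries are placeholders.
    doc = {
        "id": 2077,
        "title": "M38T_PHEV_RHD_OM_EN_UK_20251209.pdf",
        "doc_type": "pdf",
        "chunks": [{"chunk_id": 1}, {"chunk_id": 2}],
    }
    print(json.dumps(summarize_document(doc), indent=2))
```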
### Medium
- **`kb find <query>`** as a doc-level search that aggregates chunk hits per document and returns ranked *documents* (with hit count, top chunk preview). This is what users usually want when they say "find my PDF about X."
- **Update the `kb` skill docs** to match actual CLI output shape, and to steer users toward `kb list | jq` for filename lookups until proper filtering lands.
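A document-level `kb find` could group chunk hits by their owning document. The sketch below assumes each hit carries `document_id`, which problem 2 notes `kb search` results currently lack, along with `title`, `score`, and `text`. The field names, the `DocHit` type, and the ranking rule (best chunk score first, then hit count) are illustrative choices, not an existing kb API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class DocHit:
    document_id: int
    title: str
    hit_count: int = 0
    best_score: float = float("-inf")
    preview: str = ""


def aggregate_by_document(chunk_hits: Iterable[dict], preview_len: int = 80) -> List[DocHit]:
    """Group chunk hits per document and rank documents by best chunk score, then hit count."""
    docs: Dict[int, DocHit] = {}
    for hit in chunk_hits:
        doc = docs.setdefault(hit["document_id"], DocHit(hit["document_id"], hit["title"]))
        doc.hit_count += 1
        if hit["score"] > doc.best_score:
            # The best-scoring chunk becomes the document's preview.
            doc.best_score = hit["score"]
            doc.preview = hit["text"][:preview_len]
    return sorted(docs.values(), key=lambda d: (-d.best_score, -d.hit_count))


if __name__ == "__main__":
    hits = [
        {"document_id": 2077, "title": "M38T manual", "score": 0.9, "text": "Charging the battery"},
        {"document_id": 5, "title": "Other doc", "score": 0.5, "text": "Unrelated"},
        {"document_id": 2077, "title": "M38T manual", "score": 0.4, "text": "Battery warranty"},
    ]
    for doc in aggregate_by_document(hits):
        print(doc.document_id, doc.hit_count, doc.preview)
```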
### Larger
- **Quality filter for short chunks** during ingestion (e.g. drop chunks with < N alphanumeric chars, or fold them into neighbours). Stops scanned/image-heavy PDFs from polluting search.
- **OCR path for scan-heavy PDFs.** The M38T manual extracted enough real text to be useful, but other "scan" docs likely don't. Detect low text density per page and route through OCR.
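The "detect low text density per page" step could start as a simple per-page character count. The sketch below assumes page text has already been extracted as one string per page; how kb extracts pages is not shown here. `pages_needing_ocr` and the 200-character threshold are hypothetical, and the OCR step itself is out of scope.

```python
from typing import List, Sequence


def pages_needing_ocr(page_texts: Sequence[str], min_alnum: int = 200) -> List[int]:
    """Return 0-based indices of pages whose extracted text looks too sparse to trust."""
    flagged: List[int] = []
    for index, text in enumerate(page_texts):
        # Count only letters and digits so whitespace and stray symbols don't inflate density.
        if sum(ch.isalnum() for ch in text) < min_alnum:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    pages = ["Charging the high-voltage battery " * 20, "1 B", ""]
    print(pages_needing_ocr(pages))
```

Only the flagged pages would then be routed through OCR, which keeps text-rich pages like most of the M38T manual on the cheaper extraction path.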
## Quick reference (current workarounds)
```bash
# Find a doc by filename
kb list --type pdf --format json | jq '.[] | select(.title | contains("M38T"))'
# Get just metadata for a doc
kb info 2077 --format json | jq 'del(.chunks)'
# Download the original
kb export 2077 -o manual.pdf
```
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-06
@@ -0,0 +1,37 @@
## Context
The project currently ships three Docker image variants: CPU, NVIDIA, and AMD ROCm. The ROCm variant requires a 4.2GB pre-built torch wheel, a multi-stage Dockerfile with ROCm-specific runtime libraries, and additional build/push steps in the release pipeline. ROCm support is less tested and adds disproportionate complexity relative to its usage.
## Goals / Non-Goals
**Goals:**
- Remove all ROCm-specific files (Dockerfile, compose file, torch wheel)
- Remove ROCm build/push from the release pipeline
- Update all documentation to reflect CPU + NVIDIA only
- Update the docker-deployment spec to remove ROCm requirements
**Non-Goals:**
- Changing any engine application code (it is already GPU-vendor-agnostic via PyTorch)
- Modifying the CPU or NVIDIA Dockerfiles (beyond what's already in-flight)
- Providing a migration path for ROCm users (they can stay on 3.2.x or use CPU mode)
## Decisions
**1. Delete ROCm files outright rather than deprecating**
Remove `Dockerfile.rocm`, `compose.rocm.yaml`, and `assets/` immediately rather than marking them deprecated. There are no downstream consumers that depend on automated ROCm builds — anyone needing AMD support can pin to the last ROCm-supporting release.
*Alternative considered*: Keep files but stop publishing images. Rejected — dead code is confusing and still requires maintenance awareness.
**2. Leave archived openspec changes untouched**
Archived changes under `openspec/changes/archive/` contain historical ROCm references. These are historical records and should not be modified.
**3. Update GPU-vendor-agnostic requirement to reflect NVIDIA-only scope**
The existing spec requirement "Application code is GPU-vendor-agnostic" remains true at the code level (PyTorch abstracts GPU vendors), but the project no longer provides or tests ROCm images. The spec should be simplified to reflect that only NVIDIA and CPU are supported deployment targets.
## Risks / Trade-offs
- **[Breaking change for AMD users]** → Users on AMD GPUs must stay on 3.2.x or use CPU mode. Mitigated by the fact that ROCm support was already "less tested" per the original design risk assessment.
- **[Future re-addition harder]** → If ROCm support is needed later, the Dockerfile and compose file would need to be recreated. Mitigated by git history preserving the removed files.
@@ -0,0 +1,29 @@
## Why
AMD ROCm support adds significant complexity and maintenance burden to the project — the ROCm torch wheel alone is 4.2GB, the Dockerfile requires a multi-stage build with ROCm-specific runtime libraries, and the release pipeline must build/push additional images. The final container is >20GB. ROCm support is less tested and less commonly used than CPU or NVIDIA. Removing it keeps the project focused and manageable.
## What Changes
- **BREAKING**: Remove AMD ROCm Docker image (`Dockerfile.rocm`) and compose file (`compose.rocm.yaml`)
- **BREAKING**: Remove ROCm image build/push/release-notes from the engine release script
- Remove pre-built ROCm torch wheel from `assets/`
- Remove all AMD/ROCm references from user-facing docs (README, DEVELOPER)
- Update docker-deployment spec to reflect CPU + NVIDIA only
## Capabilities
### New Capabilities
_(none)_
### Modified Capabilities
- `docker-deployment`: Remove AMD ROCm Docker image requirement and all ROCm-specific scenarios. Deployment now covers CPU and NVIDIA only.
## Impact
- **Docker images**: ROCm image variant no longer published
- **Users**: Anyone running KB on AMD GPUs will need to stay on the last version with ROCm support (3.2.x) or switch to CPU mode
- **Release pipeline**: `release-engine.sh` simplified — only CPU and NVIDIA images
- **Repository size**: ~4.2GB reduction by removing the torch wheel from `assets/`
- **Docs**: README and DEVELOPER updated to remove AMD quick-start and build instructions
@@ -0,0 +1,76 @@
## REMOVED Requirements
### Requirement: AMD ROCm Docker image
**Reason**: AMD ROCm support removed to reduce project complexity and binary size. The ROCm torch wheel is 4.2GB and the variant is less tested than CPU or NVIDIA.
**Migration**: Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode (`KB_DEVICE=cpu`).
---
## MODIFIED Requirements
### Requirement: Application code is GPU-vendor-agnostic
The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.
#### Scenario: Same engine code on both platforms
- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
---
### Requirement: Compose files for deployment
The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use `build:` context for local development. Release notes SHALL document the versioned image tag for users pulling pre-built images.
#### Scenario: Start NVIDIA deployment
- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
#### Scenario: Automatic restart
- **WHEN** the engine process crashes or the host reboots
- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
#### Scenario: Configure via environment
- **WHEN** an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
- **THEN** the engine and MCP server SHALL use those values
#### Scenario: Pre-built image deployment
- **WHEN** an admin wants to use a pre-built engine image without building from source
- **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)
#### Scenario: MCP allowed hosts in Compose
- **WHEN** the kb-mcp service is defined in a Compose file
- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose
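The compose comment (visible in the deleted ROCm compose file above) describes `KB_MCP_ALLOWED_HOSTS` as comma-separated IPs/FQDNs, e.g. `192.168.1.50,kb.example.com`. The MCP server's own parsing code is not part of this diff. The sketch below is a hypothetical tolerant parser for that format, not the server's implementation.

```python
import os
from typing import List


def parse_allowed_hosts(value: str) -> List[str]:
    """Split a comma-separated host list, trimming whitespace and dropping empty entries."""
    return [host.strip() for host in value.split(",") if host.strip()]


if __name__ == "__main__":
    raw = os.environ.get("KB_MCP_ALLOWED_HOSTS", "192.168.1.50,kb.example.com")
    print(parse_allowed_hosts(raw))
```

Dropping empty entries means the compose default (`${KB_MCP_ALLOWED_HOSTS:-}`, an empty string) yields an empty list rather than a single blank host.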
---
### Requirement: Bind-mount data directory
The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
#### Scenario: Data directory structure
- **WHEN** the engine starts for the first time
- **THEN** it SHALL create the following structure under the data directory:
- `kb.db` — SQLite database
- `hf_cache/` — HuggingFace model cache
- `staging/` — temporary files for queued ingestion jobs
#### Scenario: Portable data across hosts
- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
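The required layout (`kb.db`, `hf_cache/`, `staging/` under a single data directory) can be sketched as an idempotent initialiser. `init_data_dir` is a hypothetical name, and the sketch only creates the directories. It assumes `kb.db` is created when the engine first opens the database.

```python
import tempfile
from pathlib import Path
from typing import Dict


def init_data_dir(data_dir: Path) -> Dict[str, Path]:
    """Create the documented layout under one data directory; safe to call repeatedly."""
    data_dir.mkdir(parents=True, exist_ok=True)
    layout = {
        "db": data_dir / "kb.db",           # SQLite database (created on first open)
        "hf_cache": data_dir / "hf_cache",  # HuggingFace model cache
        "staging": data_dir / "staging",    # files waiting for queued ingestion jobs
    }
    layout["hf_cache"].mkdir(exist_ok=True)
    layout["staging"].mkdir(exist_ok=True)
    return layout


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp()) / "kb-data"
    for name, path in init_data_dir(root).items():
        print(name, path.relative_to(root))
```

Every path is derived from `data_dir`, so copying that one directory to another host carries the database, model cache, and pending staged files together. This matches the portability scenario.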
---
### Requirement: CPU-only fallback
The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
#### Scenario: No GPU available
- **WHEN** the container starts without GPU passthrough (no `--gpus`)
- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
#### Scenario: Explicit CPU mode
- **WHEN** `KB_DEVICE=cpu` and `KB_INGEST_DEVICE=cpu` are set in the environment
- **THEN** the engine SHALL use CPU regardless of GPU availability
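The two CPU-fallback scenarios reduce to a small decision function. In this sketch, `cuda_available` is passed in so the logic runs without a GPU; a real engine would presumably obtain it from `torch.cuda.is_available()`. `resolve_device`, the logger name, and the treatment of values other than `"cpu"` as `"auto"` are assumptions.

```python
import logging

logger = logging.getLogger("kb.device.sketch")


def resolve_device(setting: str, cuda_available: bool) -> str:
    """Map a KB_DEVICE-style value to "cuda" or "cpu".

    Any value other than "cpu" is treated like "auto" in this sketch.
    """
    setting = setting.strip().lower()
    if setting == "cpu":
        # Explicit CPU mode: use CPU even if a GPU is present.
        return "cpu"
    if cuda_available:
        return "cuda"
    # No GPU visible (e.g. container started without --gpus): fall back and warn.
    logger.warning("GPU acceleration unavailable; running on CPU")
    return "cpu"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for setting, gpu in [("auto", True), ("auto", False), ("cpu", True)]:
        print(setting, gpu, "->", resolve_device(setting, gpu))
```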
@@ -0,0 +1,20 @@
## 1. Delete ROCm files
- [x] 1.1 Delete `engine/Dockerfile.rocm`
- [x] 1.2 Delete `engine/compose.rocm.yaml`
- [x] 1.3 Delete `assets/` directory (ROCm torch wheel)
## 2. Update release pipeline
- [x] 2.1 Remove ROCm image build, tag, and push from `release-engine.sh`
- [x] 2.2 Remove ROCm entries from release notes output in `release-engine.sh`
## 3. Update documentation
- [x] 3.1 Remove AMD GPU quick-start section and ROCm references from `README.md`
- [x] 3.2 Remove ROCm build instructions and `compose.rocm.yaml` references from `DEVELOPER.md`
- [x] 3.3 Remove `onnxruntime-rocm` migration note from `DEVELOPER.md`
## 4. Update specs
- [x] 4.1 Update `openspec/specs/docker-deployment/spec.md` — remove AMD ROCm requirement, remove ROCm scenarios, update GPU-agnostic requirement to CPU + NVIDIA scope
+1 -12
@@ -10,7 +10,7 @@ DEVELOPER.md SHALL contain instructions for building both the engine and client
 #### Scenario: Engine build from source
 - **WHEN** a developer reads DEVELOPER.md
-- **THEN** it SHALL include instructions for starting the engine from source using compose files (both NVIDIA and ROCm)
+- **THEN** it SHALL include instructions for starting the engine from source using compose files (NVIDIA and CPU)
 #### Scenario: Client build from source
 - **WHEN** a developer reads DEVELOPER.md
@@ -31,13 +31,6 @@ DEVELOPER.md SHALL document the release process for both client and engine, incl
 - **WHEN** a developer reads DEVELOPER.md
 - **THEN** it SHALL include how to check client and engine versions
-### Requirement: DEVELOPER.md contains developer notes
-DEVELOPER.md SHALL include any forward-looking developer notes such as migration plans or technical debt items.
-#### Scenario: ROCm migration note
-- **WHEN** a developer reads DEVELOPER.md
-- **THEN** it SHALL include the ROCm runtime migration note about onnxruntime and MIGraphX
 ### Requirement: README.md excludes developer-only content
 README.md SHALL NOT contain build-from-source instructions, release processes, or developer-only notes.
@@ -49,10 +42,6 @@ README.md SHALL NOT contain build-from-source instructions, release processes, o
 - **WHEN** a user reads README.md
 - **THEN** there SHALL be no "Building and releasing" section
-#### Scenario: No developer notes in README
-- **WHEN** a user reads README.md
-- **THEN** there SHALL be no "Future: ROCm runtime migration" section
 ### Requirement: README.md cross-references DEVELOPER.md
 README.md SHALL include a link to DEVELOPER.md for users who want to build from source or contribute.
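The two README.md requirements left after this change (no "Building and releasing" section, and a link to DEVELOPER.md) are simple enough to check mechanically. A minimal bash sketch, assuming the forbidden section appears as a Markdown heading at any level; `check_readme` is a hypothetical helper, not something in the repository:

```shell
#!/usr/bin/env bash
# Hypothetical check of a README against the README.md requirements above.
set -euo pipefail

check_readme() {
  local readme="$1"
  # Scenario: no "Building and releasing" section (assumed to be a heading).
  if grep -Eq '^#+[[:space:]]+Building and releasing' "$readme"; then
    echo "FAIL: $readme has a 'Building and releasing' section" >&2
    return 1
  fi
  # Requirement: README.md cross-references DEVELOPER.md.
  if ! grep -q 'DEVELOPER\.md' "$readme"; then
    echo "FAIL: $readme does not mention DEVELOPER.md" >&2
    return 1
  fi
  return 0
}

# Usage: ./check_readme.sh README.md
if [[ $# -gt 0 ]]; then
  check_readme "$1"
fi
```

A plain `grep` only proves the text is present, not that it is a working link; a stricter check would parse the Markdown.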
+4 -26
@@ -2,7 +2,7 @@
 ## Purpose
-Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.
+Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA, along with Compose files for single-command deployment.
 ## Requirements
@@ -20,26 +20,12 @@ The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVI
 ---
-### Requirement: AMD ROCm Docker image
-The project SHALL provide a `Dockerfile.rocm` that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
-#### Scenario: Build ROCm image
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml build`
-- **THEN** the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
-#### Scenario: GPU access in ROCm container
-- **WHEN** the ROCm container starts with `--device=/dev/kfd --device=/dev/dri`
-- **THEN** `torch.cuda.is_available()` SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU
----
 ### Requirement: Application code is GPU-vendor-agnostic
-The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
+The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.
 #### Scenario: Same engine code on both platforms
-- **WHEN** the engine starts on an NVIDIA image and an AMD image with identical configuration
+- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
 - **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
 ---
@@ -59,10 +45,6 @@ The engine SHALL store all persistent state (SQLite database, HF model cache, st
 - **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
 - **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
-#### Scenario: Portable data across GPU vendors
-- **WHEN** an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
-- **THEN** the engine SHALL start successfully. Embeddings in the database remain valid (they are model-specific, not GPU-vendor-specific)
 ---
 ### Requirement: Compose files for deployment
@@ -73,10 +55,6 @@ The project SHALL provide Docker Compose files for single-command deployment. Co
 - **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
 - **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
-#### Scenario: Start ROCm deployment
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
-- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
 #### Scenario: Automatic restart
 - **WHEN** the engine process crashes or the host reboots
 - **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
@@ -130,7 +108,7 @@ The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable contai
 The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
 #### Scenario: No GPU available
-- **WHEN** the container starts without GPU passthrough (no `--gpus`, no `/dev/kfd`)
+- **WHEN** the container starts without GPU passthrough (no `--gpus`)
 - **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
 #### Scenario: Explicit CPU mode
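With the ROCm variant gone, a host runs either the `-nvidia` or the `-cpu` engine image. A minimal bash sketch of choosing between them, assuming that `nvidia-smi` being installed and listing at least one GPU is a good enough signal. `IMAGE_BASE` is a placeholder, the `<tag>-<variant>` naming mirrors the release script's image names, and `has_nvidia_gpu` / `engine_image` are hypothetical helpers. The spec already requires every image to fall back to CPU, so this only avoids pulling the GPU variant on hosts that cannot use it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the engine image variant that suits this host.
set -euo pipefail

IMAGE_BASE="${IMAGE_BASE:-registry.example.com/kb}"  # placeholder registry path

# Succeeds only if nvidia-smi exists and can list a GPU.
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

# Usage: engine_image [tag]   (tag defaults to "latest")
engine_image() {
  local tag="${1:-latest}"
  if has_nvidia_gpu; then
    echo "${IMAGE_BASE}/engine:${tag}-nvidia"
  else
    echo "${IMAGE_BASE}/engine:${tag}-cpu"
  fi
}

engine_image "${1:-latest}"
```

Detecting the GPU on the host says nothing about whether the NVIDIA container runtime is configured; the `--gpus` flag still has to be passed for the container to see it.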
-9
@@ -151,14 +151,11 @@ fi
 echo "==> Building Docker engine images ($VERSION)"
 NVIDIA_IMAGE="${IMAGE_BASE}/engine:${DOCKER_TAG}-nvidia"
-ROCM_IMAGE="${IMAGE_BASE}/engine:${DOCKER_TAG}-rocm"
 CPU_IMAGE="${IMAGE_BASE}/engine:${DOCKER_TAG}-cpu"
 NVIDIA_LATEST="${IMAGE_BASE}/engine:latest-nvidia"
-ROCM_LATEST="${IMAGE_BASE}/engine:latest-rocm"
 CPU_LATEST="${IMAGE_BASE}/engine:latest-cpu"
 run docker build -t "$NVIDIA_IMAGE" -t "$NVIDIA_LATEST" -f "$ENGINE_DIR/Dockerfile.nvidia" "$ENGINE_DIR"
-run docker build -t "$ROCM_IMAGE" -t "$ROCM_LATEST" -f "$ENGINE_DIR/Dockerfile.rocm" "$ENGINE_DIR"
 run docker build -t "$CPU_IMAGE" -t "$CPU_LATEST" -f "$ENGINE_DIR/Dockerfile.cpu" "$ENGINE_DIR"
 echo ""
@@ -207,9 +204,6 @@ RELEASE_NOTES="## Docker images
 # NVIDIA GPU
 docker pull ${NVIDIA_IMAGE}
-# AMD GPU (ROCm)
-docker pull ${ROCM_IMAGE}
 # CPU only
 docker pull ${CPU_IMAGE}
 \`\`\`
@@ -241,8 +235,6 @@ echo "==> Pushing Docker images to $REGISTRY"
 run docker push "$NVIDIA_IMAGE"
 run docker push "$NVIDIA_LATEST"
-run docker push "$ROCM_IMAGE"
-run docker push "$ROCM_LATEST"
 run docker push "$CPU_IMAGE"
 run docker push "$CPU_LATEST"
@@ -256,7 +248,6 @@ echo "==> Release $GIT_TAG complete!"
 echo ""
 echo " Images:"
 echo " $NVIDIA_IMAGE"
-echo " $ROCM_IMAGE"
 echo " $CPU_IMAGE"
 if [[ -n "${MCP_IMAGE:-}" ]]; then
 echo " $MCP_IMAGE"
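Dropping the ROCm variant took edits in four separate hunks of the release script (image names, `docker build`, release notes, `docker push`, summary). One way to keep the variant set in a single place is a loop over a list. This is a dry-run bash sketch: `run` here only prints its arguments, because the real script's `run` helper is not shown in this diff, and the `IMAGE_BASE`, `DOCKER_TAG` and `ENGINE_DIR` defaults are placeholders. It assumes every variant has a matching `Dockerfile.<variant>`, which holds for `nvidia` and `cpu` here:

```shell
#!/usr/bin/env bash
# Dry-run sketch: build and push every engine variant from one list.
set -euo pipefail

IMAGE_BASE="${IMAGE_BASE:-registry.example.com/kb}"  # placeholder
DOCKER_TAG="${DOCKER_TAG:-3.2.3}"                    # placeholder
ENGINE_DIR="${ENGINE_DIR:-engine}"                   # placeholder
VARIANTS=(nvidia cpu)  # each entry needs $ENGINE_DIR/Dockerfile.<entry>

# Stand-in for the release script's run helper: print instead of executing.
run() { printf '%s\n' "$*"; }

IMAGES=()
for v in "${VARIANTS[@]}"; do
  image="${IMAGE_BASE}/engine:${DOCKER_TAG}-${v}"
  latest="${IMAGE_BASE}/engine:latest-${v}"
  run docker build -t "$image" -t "$latest" -f "$ENGINE_DIR/Dockerfile.${v}" "$ENGINE_DIR"
  IMAGES+=("$image" "$latest")
done

# Push in the same order as the original script: versioned tag, then latest.
for img in "${IMAGES[@]}"; do
  run docker push "$img"
done
```

The release-notes block and the final summary could iterate over the same `VARIANTS` list, so removing a variant becomes a one-word change.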