Remove AMD ROCm support — CPU and NVIDIA only

BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image
build/push from the release pipeline. Remove AMD quick-start and ROCm
references from README and DEVELOPER docs. Update docker-deployment
and developer-docs specs to reflect CPU + NVIDIA only.

The ROCm variant added significant complexity (4.2GB torch wheel,
>20GB container) with limited usage. Users on AMD GPUs should stay
on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:39:37 +01:00
parent 17b19999de
commit 574370e8d1
12 changed files with 174 additions and 185 deletions
+1 -12
@@ -10,7 +10,7 @@ DEVELOPER.md SHALL contain instructions for building both the engine and client
#### Scenario: Engine build from source
- **WHEN** a developer reads DEVELOPER.md
-- **THEN** it SHALL include instructions for starting the engine from source using compose files (both NVIDIA and ROCm)
+- **THEN** it SHALL include instructions for starting the engine from source using compose files (NVIDIA and CPU)
#### Scenario: Client build from source
- **WHEN** a developer reads DEVELOPER.md
@@ -31,13 +31,6 @@ DEVELOPER.md SHALL document the release process for both client and engine, incl
- **WHEN** a developer reads DEVELOPER.md
- **THEN** it SHALL include how to check client and engine versions
-### Requirement: DEVELOPER.md contains developer notes
-DEVELOPER.md SHALL include any forward-looking developer notes such as migration plans or technical debt items.
-#### Scenario: ROCm migration note
-- **WHEN** a developer reads DEVELOPER.md
-- **THEN** it SHALL include the ROCm runtime migration note about onnxruntime and MIGraphX
### Requirement: README.md excludes developer-only content
README.md SHALL NOT contain build-from-source instructions, release processes, or developer-only notes.
@@ -49,10 +42,6 @@ README.md SHALL NOT contain build-from-source instructions, release processes, o
- **WHEN** a user reads README.md
- **THEN** there SHALL be no "Building and releasing" section
-#### Scenario: No developer notes in README
-- **WHEN** a user reads README.md
-- **THEN** there SHALL be no "Future: ROCm runtime migration" section
### Requirement: README.md cross-references DEVELOPER.md
README.md SHALL include a link to DEVELOPER.md for users who want to build from source or contribute.
+4 -26
@@ -2,7 +2,7 @@
## Purpose
-Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.
+Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA, along with Compose files for single-command deployment.
## Requirements
@@ -20,26 +20,12 @@ The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVI
---
-### Requirement: AMD ROCm Docker image
-The project SHALL provide a `Dockerfile.rocm` that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
-#### Scenario: Build ROCm image
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml build`
-- **THEN** the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
-#### Scenario: GPU access in ROCm container
-- **WHEN** the ROCm container starts with `--device=/dev/kfd --device=/dev/dri`
-- **THEN** `torch.cuda.is_available()` SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU
----
### Requirement: Application code is GPU-vendor-agnostic
-The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
+The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.
#### Scenario: Same engine code on both platforms
-- **WHEN** the engine starts on an NVIDIA image and an AMD image with identical configuration
+- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
---
@@ -59,10 +45,6 @@ The engine SHALL store all persistent state (SQLite database, HF model cache, st
- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
-#### Scenario: Portable data across GPU vendors
-- **WHEN** an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
-- **THEN** the engine SHALL start successfully. Embeddings in the database remain valid (they are model-specific, not GPU-vendor-specific)
---
### Requirement: Compose files for deployment
@@ -73,10 +55,6 @@ The project SHALL provide Docker Compose files for single-command deployment. Co
- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
-#### Scenario: Start ROCm deployment
-- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
-- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
#### Scenario: Automatic restart
- **WHEN** the engine process crashes or the host reboots
- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
@@ -130,7 +108,7 @@ The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable contai
The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
#### Scenario: No GPU available
-- **WHEN** the container starts without GPU passthrough (no `--gpus`, no `/dev/kfd`)
+- **WHEN** the container starts without GPU passthrough (no `--gpus`)
- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
#### Scenario: Explicit CPU mode