BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image build/push from the release pipeline. Remove AMD quick-start and ROCm references from README and DEVELOPER docs. Update docker-deployment and developer-docs specs to reflect CPU + NVIDIA only.

The ROCm variant added significant complexity (4.2GB torch wheel, >20GB container) with limited usage. Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
REMOVED Requirements
Requirement: AMD ROCm Docker image
Reason: AMD ROCm support removed to reduce project complexity and binary size. The ROCm torch wheel is 4.2GB and the variant is less tested than CPU or NVIDIA.
Migration: Users on AMD GPUs should stay on engine v3.2.x or switch to CPU mode (KB_DEVICE=cpu).
MODIFIED Requirements
Requirement: Application code is GPU-vendor-agnostic
The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.
Scenario: Same engine code on both platforms
- WHEN the engine starts on an NVIDIA image and a CPU image with identical configuration
- THEN both SHALL load the model, accept requests, and return identical search results for the same query and data
Requirement: Compose files for deployment
The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use build: context for local development. Release notes SHALL document the versioned image tag for users pulling pre-built images.
Scenario: Start NVIDIA deployment
- WHEN an admin runs docker compose -f compose.nvidia.yaml up -d
- THEN the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
Scenario: Automatic restart
- WHEN the engine process crashes or the host reboots
- THEN Docker SHALL automatically restart the container (restart policy unless-stopped)
Scenario: Configure via environment
- WHEN an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
- THEN the engine and MCP server SHALL use those values
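As an illustration of this scenario, the sketch below shows one way the engine side might read those variables. The `EngineSettings` class and `load_settings` function are hypothetical names, not part of the spec, and the comma-separated format for KB_MCP_ALLOWED_HOSTS is an assumption: the spec only says the Compose file documents its format.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical container for the KB_* variables named in the spec."""
    model: Optional[str]
    api_key: Optional[str]
    device: Optional[str]
    mcp_allowed_hosts: Tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> EngineSettings:
    """Read settings from an environment mapping (os.environ by default)."""
    # ASSUMPTION: hosts are comma-separated; the real format may differ.
    raw_hosts = env.get("KB_MCP_ALLOWED_HOSTS", "")
    hosts = tuple(h.strip() for h in raw_hosts.split(",") if h.strip())
    return EngineSettings(
        model=env.get("KB_MODEL"),
        api_key=env.get("KB_API_KEY"),
        device=env.get("KB_DEVICE"),
        mcp_allowed_hosts=hosts,
    )
```

Passing the mapping as a parameter keeps the function testable without mutating the process environment; unset variables come back as None so the caller can apply its own defaults.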
Scenario: Pre-built image deployment
- WHEN an admin wants to use a pre-built engine image without building from source
- THEN the engine release notes SHALL include the exact docker pull command with the versioned tag (e.g. docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia)
Scenario: MCP allowed hosts in Compose
- WHEN the kb-mcp service is defined in a Compose file
- THEN the environment block SHALL include KB_MCP_ALLOWED_HOSTS with a comment explaining its format and purpose
Requirement: Bind-mount data directory
The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
Scenario: Data directory structure
- WHEN the engine starts for the first time
- THEN it SHALL create the following structure under the data directory:
  - kb.db: SQLite database
  - hf_cache/: HuggingFace model cache
  - staging/: temporary files for queued ingestion jobs
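The first-start layout could be prepared along these lines. This is a minimal sketch, not the engine's actual startup code: `ensure_data_layout` is a hypothetical helper, and it only creates the two directories, leaving kb.db to be created by whatever opens the SQLite database.

```python
from pathlib import Path
from typing import Dict


def ensure_data_layout(data_dir: Path) -> Dict[str, Path]:
    """Create the directory layout under data_dir and return its paths."""
    layout = {
        "db": data_dir / "kb.db",            # SQLite file, not created here
        "hf_cache": data_dir / "hf_cache",   # HuggingFace model cache
        "staging": data_dir / "staging",     # queued ingestion temp files
    }
    for key in ("hf_cache", "staging"):
        # exist_ok makes the call safe on every start, not just the first.
        layout[key].mkdir(parents=True, exist_ok=True)
    return layout
```

Because every path is derived from the single `data_dir` argument, copying that directory to another host and pointing the bind mount at it preserves the whole layout.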
Scenario: Portable data across hosts
- WHEN an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
- THEN the engine SHALL start successfully and serve all previously ingested documents without reprocessing
Requirement: CPU-only fallback
The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
Scenario: No GPU available
- WHEN the container starts without GPU passthrough (no --gpus)
- THEN the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
Scenario: Explicit CPU mode
- WHEN KB_DEVICE=cpu and KB_INGEST_DEVICE=cpu are set in the environment
- THEN the engine SHALL use CPU regardless of GPU availability
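The two fallback scenarios can be sketched as a single decision function. This is an illustrative sketch under stated assumptions: `resolve_device` is a hypothetical name, GPU detection is passed in as a boolean rather than probed here, and the returned "gpu" label is a placeholder that the caller would map to whatever device name its framework expects.

```python
import logging
from typing import Mapping

logger = logging.getLogger("kb.device")  # hypothetical logger name


def resolve_device(env: Mapping[str, str], gpu_available: bool,
                   var: str = "KB_DEVICE") -> str:
    """Return "cpu" or "gpu" for the given environment variable."""
    requested = env.get(var, "").strip().lower()
    if requested == "cpu":
        # Explicit CPU mode wins regardless of GPU availability.
        return "cpu"
    if not gpu_available:
        # No GPU passthrough: fall back and warn, as the spec requires.
        logger.warning("GPU acceleration is unavailable; loading the model on CPU")
        return "cpu"
    return "gpu"
```

Calling it once with `var="KB_DEVICE"` and once with `var="KB_INGEST_DEVICE"` would cover both variables from the explicit CPU scenario.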