steve/kb

steve 223ff2cf5d Latest changes all archived

2026-04-04 22:50:19 +01:00

Docker Deployment

Purpose

Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.

Requirements

Requirement: NVIDIA CUDA Docker image

The project SHALL provide a Dockerfile.nvidia that builds the engine on an NVIDIA CUDA runtime base image with GPU support for PyTorch and ONNX Runtime.

Scenario: Build NVIDIA image

WHEN an admin runs docker compose -f compose.nvidia.yaml build
THEN the build SHALL produce a working image with CUDA runtime, PyTorch with CUDA support, onnxruntime-gpu, and all engine dependencies

Scenario: GPU access in NVIDIA container

WHEN the NVIDIA container starts with --gpus all or the NVIDIA runtime
THEN torch.cuda.is_available() SHALL return True and the engine SHALL load the embedding model on GPU

Requirement: AMD ROCm Docker image

The project SHALL provide a Dockerfile.rocm that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.

Scenario: Build ROCm image

WHEN an admin runs docker compose -f compose.rocm.yaml build
THEN the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies

Scenario: GPU access in ROCm container

WHEN the ROCm container starts with --device=/dev/kfd --device=/dev/dri
THEN torch.cuda.is_available() SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU

Requirement: Application code is GPU-vendor-agnostic

The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
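
One way to keep the engine vendor-agnostic is to ask PyTorch a single question and never branch on the vendor. The sketch below is illustrative only: detect_device and its probe parameter are hypothetical names, not taken from the engine. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface and the same "cuda" device string as NVIDIA builds.

```python
def detect_device(probe=None):
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu".

    `probe` is a hypothetical hook (a zero-argument callable returning bool)
    so the logic can be exercised without a GPU or without torch installed.
    """
    if probe is None:
        try:
            # The image decides whether this is a CUDA or a ROCm build.
            import torch
        except ImportError:
            return "cpu"
        probe = torch.cuda.is_available
    # On ROCm builds the HIP backend also reports itself as "cuda".
    return "cuda" if probe() else "cpu"
```

Because the vendor choice lives in the base image and the installed wheel, the same function body would run unchanged in both images.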

Scenario: Same engine code on both platforms

WHEN the engine starts on an NVIDIA image and an AMD image with identical configuration
THEN both SHALL load the model, accept requests, and return identical search results for the same query and data

Requirement: Bind-mount data directory

The engine SHALL store all persistent state (SQLite database, Hugging Face model cache, staging directory) under a single configurable data directory. This directory SHALL be bind-mounted from the host.

Scenario: Data directory structure

WHEN the engine starts for the first time
THEN it SHALL create the following structure under the data directory:
- kb.db — SQLite database
- hf_cache/ — HuggingFace model cache
- staging/ — temporary files for queued ingestion jobs
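
The first-start layout above could be prepared roughly as follows. This is a minimal sketch, not the engine's actual code: ensure_data_dir is a hypothetical helper, and it assumes the SQLite layer creates kb.db itself on first connection, so only the two directories are created here.

```python
from pathlib import Path


def ensure_data_dir(root):
    """Create the expected sub-directories and return the resolved paths.

    Hypothetical helper: kb.db is only named here, on the assumption that
    the database layer creates the file when it first connects.
    """
    root = Path(root)
    layout = {
        "db": root / "kb.db",
        "hf_cache": root / "hf_cache",
        "staging": root / "staging",
    }
    for key in ("hf_cache", "staging"):
        # exist_ok makes the call safe to repeat on every start.
        layout[key].mkdir(parents=True, exist_ok=True)
    return layout
```

Keeping every path relative to one root is what makes the directory portable: copying the root to another host carries the database, the model cache, and any staged files together.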

Scenario: Portable data across hosts

WHEN an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
THEN the engine SHALL start successfully and serve all previously ingested documents without reprocessing

Scenario: Portable data across GPU vendors

WHEN an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
THEN the engine SHALL start successfully and the embeddings in the database SHALL remain valid, because embeddings are specific to the model, not to the GPU vendor

Requirement: Compose files for deployment

The project SHALL provide Docker Compose files for single-command deployment. Compose files SHALL use a build: context entry so that images are built from local source during development. Release notes SHALL document the versioned image tag for users pulling pre-built images.

Scenario: Start NVIDIA deployment

WHEN an admin runs docker compose -f compose.nvidia.yaml up -d
THEN the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port

Scenario: Start ROCm deployment

WHEN an admin runs docker compose -f compose.rocm.yaml up -d
THEN the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port

Scenario: Automatic restart

WHEN the engine process crashes or the host reboots
THEN Docker SHALL restart the container automatically under the unless-stopped restart policy, except where an admin stopped the container manually

Scenario: Configure via environment

WHEN an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, KB_MCP_ALLOWED_HOSTS, etc.)
THEN the engine and MCP server SHALL use those values
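
A sketch of how a service might read these variables at startup is shown below. The Settings class and load_settings function are hypothetical, and treating an unset or blank variable as None is an assumption; the spec names the variables but does not define their defaults.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: Optional[str]
    api_key: Optional[str]
    device: Optional[str]
    mcp_allowed_hosts: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build settings from an environment mapping such as os.environ."""

    def get(name: str) -> Optional[str]:
        # Assumption: blank values are treated the same as unset ones.
        value = env.get(name, "").strip()
        return value or None

    return Settings(
        model=get("KB_MODEL"),
        api_key=get("KB_API_KEY"),
        device=get("KB_DEVICE"),
        mcp_allowed_hosts=env.get("KB_MCP_ALLOWED_HOSTS", ""),
    )
```

In a container this would typically be called as load_settings(os.environ), so whatever the Compose environment block sets is what the process sees.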

Scenario: Pre-built image deployment

WHEN an admin wants to use a pre-built engine image without building from source
THEN the engine release notes SHALL include the exact docker pull command with the versioned tag (e.g. docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia)

Scenario: MCP allowed hosts in Compose

WHEN the kb-mcp service is defined in a Compose file
THEN the environment block SHALL include KB_MCP_ALLOWED_HOSTS with a comment explaining its format and purpose

Requirement: Configurable MCP allowed hosts

The MCP server SHALL accept a KB_MCP_ALLOWED_HOSTS environment variable containing a comma-separated list of additional hosts (IP addresses or FQDNs) that are permitted to connect. The server SHALL always allow 127.0.0.1, localhost, and [::1] regardless of this setting. DNS rebinding protection SHALL always be enabled.
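
The rule above can be sketched as a small pure function. Everything named here (parse_allowed_hosts, split_host, host_allowed, REJECTION) is hypothetical and only illustrates the described behaviour: localhost entries are always present, configured hosts are added to them, and the port in the Host header is ignored when matching.

```python
LOCAL_HOSTS = ("127.0.0.1", "localhost", "[::1]")

# Status and message the spec requires for a rejected Host header.
REJECTION = (421, "Invalid Host header")


def parse_allowed_hosts(raw):
    """Turn the comma-separated variable into a set that includes localhost."""
    extra = [h.strip().lower() for h in (raw or "").split(",") if h.strip()]
    return frozenset(LOCAL_HOSTS) | frozenset(extra)


def split_host(header):
    """Drop an optional :port from a Host header, keeping IPv6 brackets."""
    value = header.strip().lower()
    if value.startswith("["):
        end = value.find("]")
        return value[: end + 1] if end != -1 else value
    return value.rsplit(":", 1)[0] if ":" in value else value


def host_allowed(header, allowed):
    """True when the Host header names an allowed host on any port."""
    return bool(header) and split_host(header) in allowed
```

A request handler would answer with REJECTION whenever host_allowed returns False, which corresponds to the disallowed-host scenario in this requirement.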

Scenario: Remote client connects with allowed host

WHEN KB_MCP_ALLOWED_HOSTS is set to 192.168.1.50 and a client connects with Host: 192.168.1.50:3000
THEN the server SHALL accept the request and process it normally

Scenario: Remote client connects with disallowed host

WHEN KB_MCP_ALLOWED_HOSTS is set to 192.168.1.50 and a client connects with Host: 10.0.0.99:3000
THEN the server SHALL return HTTP 421 "Invalid Host header"

Scenario: Multiple allowed hosts

WHEN KB_MCP_ALLOWED_HOSTS is set to 192.168.1.50,kb.example.com
THEN the server SHALL accept requests with Host matching either 192.168.1.50 or kb.example.com on any port

Scenario: Variable unset or empty

WHEN KB_MCP_ALLOWED_HOSTS is unset or empty
THEN the server SHALL allow only localhost addresses (127.0.0.1, localhost, [::1]) with any port

Scenario: Localhost always allowed

WHEN KB_MCP_ALLOWED_HOSTS is set to 192.168.1.50
THEN the server SHALL still accept requests with Host: localhost:3000 or Host: 127.0.0.1:3000

Scenario: Allowed origins derived from allowed hosts

WHEN KB_MCP_ALLOWED_HOSTS includes 192.168.1.50
THEN the server SHALL accept an Origin of http://192.168.1.50 on any port (for example Origin: http://192.168.1.50:3000) in addition to localhost origins
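
Deriving the origin check from the host list might look like the sketch below. origin_allowed is a hypothetical name, and accepting https as well as http is an assumption, since the scenario only shows an http origin. The allowed_hosts argument is expected to be a set of host names that already contains the localhost entries.

```python
from urllib.parse import urlsplit


def origin_allowed(origin, allowed_hosts):
    """True when the Origin's host is in allowed_hosts, whatever the port."""
    try:
        parts = urlsplit(origin)
    except ValueError:
        # Malformed origins (for example a broken IPv6 literal) are rejected.
        return False
    # Assumption: both http and https origins are acceptable.
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False
    host = parts.hostname
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them so
        # the value compares equal to the "[::1]" entry in the host list.
        host = f"[{host}]"
    return host in allowed_hosts
```

Only the host part is compared, which is what lets a single allowed host cover every port it might be served on.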

Requirement: CPU-only fallback

The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.

Scenario: No GPU available

WHEN the container starts without GPU passthrough (no --gpus, no /dev/kfd)
THEN the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable

Scenario: Explicit CPU mode

WHEN KB_DEVICE=cpu and KB_INGEST_DEVICE=cpu are set in the environment
THEN the engine SHALL use CPU regardless of GPU availability
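
The two scenarios combine into one small decision, sketched here under stated assumptions: resolve_device and the logger name are hypothetical, the GPU check is passed in as a boolean rather than queried from PyTorch, and the only KB_DEVICE value given special meaning is "cpu" because that is the only one the spec shows.

```python
import logging

log = logging.getLogger("kb.device")


def resolve_device(requested, gpu_available):
    """Pick the device for a workload.

    requested: the value of KB_DEVICE or KB_INGEST_DEVICE, or None if unset.
    gpu_available: result of the runtime GPU check.
    """
    choice = (requested or "").strip().lower()
    if choice == "cpu":
        # Explicit CPU mode wins even when a GPU is present.
        return "cpu"
    if gpu_available:
        # Assumption: any other requested value is passed through as-is.
        return choice or "cuda"
    log.warning("GPU acceleration is unavailable; falling back to CPU")
    return "cpu"
```

The same function could be called once per variable, so search and ingestion can be pinned to different devices independently.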
