# Docker Deployment

## Purpose

Docker deployment packages the knowledge base engine as a container image with NVIDIA GPU support and a CPU fallback, and provides Docker Compose files for single-command deployment.

## Requirements

### Requirement: NVIDIA CUDA Docker image

The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVIDIA CUDA runtime base image with GPU support for PyTorch and ONNX Runtime.

#### Scenario: Build NVIDIA image
- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml build`
- **THEN** the build SHALL produce a working image with CUDA runtime, PyTorch with CUDA support, onnxruntime-gpu, and all engine dependencies

#### Scenario: GPU access in NVIDIA container
- **WHEN** the NVIDIA container starts with `--gpus all` or the NVIDIA runtime
- **THEN** `torch.cuda.is_available()` SHALL return `True` and the engine SHALL load the embedding model on the GPU
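Both scenarios can be checked from inside a running container with a standalone diagnostic script. The sketch below is an assumption and is not part of the engine, which does not reference CUDA itself. It imports PyTorch and ONNX Runtime inside `try` blocks, so it still runs and reports `False` when either library is missing. `onnxruntime.get_available_providers()` reflects the installed build. If `CUDAExecutionProvider` appears in that list, it confirms that the onnxruntime-gpu package is installed, not that a GPU is visible.

```python
def gpu_report() -> dict[str, bool]:
    """Report whether PyTorch and ONNX Runtime can see GPU support."""
    report = {"torch_cuda": False, "onnx_cuda_provider": False}

    try:
        import torch
        # True only when a CUDA-capable device is visible to this process.
        report["torch_cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed: leave as False

    try:
        import onnxruntime as ort
        # Lists providers compiled into the installed build (onnxruntime-gpu
        # includes CUDAExecutionProvider); it does not probe for a device.
        report["onnx_cuda_provider"] = (
            "CUDAExecutionProvider" in ort.get_available_providers()
        )
    except ImportError:
        pass  # ONNX Runtime not installed: leave as False

    return report


if __name__ == "__main__":
    print(gpu_report())
```

Run it in the NVIDIA container with `docker exec`. If `torch_cuda` is `False`, the usual causes are missing GPU passthrough or a CPU-only PyTorch wheel.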

---

### Requirement: Application code is GPU-vendor-agnostic

The Python engine code SHALL NOT reference CUDA directly. GPU abstraction SHALL be handled at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and CPU images without modification.

#### Scenario: Same engine code on both platforms
- **WHEN** the engine starts on an NVIDIA image and a CPU image with identical configuration
- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data

---

### Requirement: Bind-mount data directory

The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory, which SHALL be bind-mounted from the host.

#### Scenario: Data directory structure
- **WHEN** the engine starts for the first time
- **THEN** it SHALL create the following structure under the data directory:
  - `kb.db` — SQLite database
  - `hf_cache/` — HuggingFace model cache
  - `staging/` — temporary files for queued ingestion jobs
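The following is a minimal sketch of first-start initialization. It assumes the engine receives the data directory as a path (this spec does not name the setting that supplies it). It also assumes the engine uses the `HF_HOME` environment variable to redirect the Hugging Face cache into `hf_cache/`. The function name `init_data_dir` is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def init_data_dir(data_dir: str | os.PathLike) -> dict[str, Path]:
    """Create the data directory layout (hypothetical helper)."""
    root = Path(data_dir)
    layout = {
        "db": root / "kb.db",
        "hf_cache": root / "hf_cache",
        "staging": root / "staging",
    }
    # Directories are created idempotently; kb.db itself is left to SQLite,
    # which creates the file on the first connect.
    layout["hf_cache"].mkdir(parents=True, exist_ok=True)
    layout["staging"].mkdir(parents=True, exist_ok=True)
    # Hugging Face libraries read HF_HOME when they are imported, so this
    # must run before they are imported for the cache to land in hf_cache/.
    os.environ["HF_HOME"] = str(layout["hf_cache"])
    return layout
```

All paths derive from the single root, so copying that root to another host carries the database, model cache, and staged files together.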

#### Scenario: Portable data across hosts
- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the copy bind-mounted at the same container path
- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing

---

### Requirement: Compose files for deployment

The project SHALL provide Docker Compose files for single-command deployment. The Compose files SHALL use a `build:` context for local development, and release notes SHALL document the versioned image tag for users who pull pre-built images.

#### Scenario: Start NVIDIA deployment
- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port

#### Scenario: Automatic restart
- **WHEN** the engine process crashes or the host reboots
- **THEN** Docker SHALL automatically restart the container via the `unless-stopped` restart policy (after a host reboot, this requires the Docker daemon to start at boot)

#### Scenario: Configure via environment
- **WHEN** an admin sets environment variables in the Compose file (`KB_MODEL`, `KB_API_KEY`, `KB_DEVICE`, `KB_MCP_ALLOWED_HOSTS`, etc.)
- **THEN** the engine and MCP server SHALL use those values
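For illustration, the Compose scenarios above can be written as a Python dict that mirrors the YAML structure of one service entry. Every detail here is an assumption: the service name `kb-engine`, the container path `/data`, and port `8000` are hypothetical. The GPU request uses the generic Compose `deploy.resources.reservations.devices` syntax, not the contents of the project's actual `compose.nvidia.yaml`. The environment variables are listed without values, so Compose passes them through from the shell in which it runs.

```python
import json


def nvidia_engine_service(data_dir: str = "./data", port: int = 8000) -> dict:
    """Hypothetical Compose service entry for the NVIDIA engine image."""
    return {
        # Local development builds from source.
        "build": {"context": ".", "dockerfile": "Dockerfile.nvidia"},
        # Restart after crashes and when the Docker daemon comes back up.
        "restart": "unless-stopped",
        "ports": [f"{port}:{port}"],
        # Short-syntax bind mount of the host data directory (container path assumed).
        "volumes": [f"{data_dir}:/data"],
        # Names without values are passed through from the shell running Compose.
        "environment": ["KB_MODEL", "KB_API_KEY", "KB_DEVICE", "KB_INGEST_DEVICE"],
        # Reserve all NVIDIA GPUs for this service.
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]}
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps({"services": {"kb-engine": nvidia_engine_service()}}, indent=2))
```

The JSON dump is only a way to inspect the structure. The real file would be YAML with the same keys.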

#### Scenario: Pre-built image deployment
- **WHEN** an admin wants to use a pre-built engine image without building from source
- **THEN** the engine release notes SHALL include the exact `docker pull` command with the versioned tag (e.g. `docker.dcglab.co.uk/dcg/kb/engine:engine-v2.1.0-nvidia`)

#### Scenario: MCP allowed hosts in Compose
- **WHEN** the kb-mcp service is defined in a Compose file
- **THEN** the environment block SHALL include `KB_MCP_ALLOWED_HOSTS` with a comment explaining its format and purpose

---

### Requirement: Configurable MCP allowed hosts

The MCP server SHALL accept a `KB_MCP_ALLOWED_HOSTS` environment variable containing a comma-separated list of additional host names (IP addresses or FQDNs) that clients may use to reach the server, i.e. values accepted in the `Host` header. The server SHALL always allow `127.0.0.1`, `localhost`, and `[::1]` regardless of this setting. DNS rebinding protection SHALL always be enabled.
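The sketch below shows one way to turn the variable into allowed lists. It assumes the downstream Host and Origin checks accept wildcard-port patterns of the form `host:*`, so verify this against the server actually in use. The function names are hypothetical. IPv6 entries are assumed to be written in brackets, like the built-in `[::1]`.

```python
from __future__ import annotations

LOCALHOST_NAMES = ("127.0.0.1", "localhost", "[::1]")


def allowed_hostnames(raw: str | None) -> list[str]:
    """Localhost names first, then extra hosts from KB_MCP_ALLOWED_HOSTS."""
    names = list(LOCALHOST_NAMES)
    for item in (raw or "").split(","):
        # Host names are case-insensitive; blank entries are ignored.
        item = item.strip().lower()
        if item and item not in names:
            names.append(item)
    return names


def to_patterns(names: list[str]) -> tuple[list[str], list[str]]:
    """Expand host names into wildcard-port Host and Origin patterns."""
    hosts = [f"{n}:*" for n in names]
    # Only http origins, mirroring the scenario below; https is not covered.
    origins = [f"http://{n}:*" for n in names]
    return hosts, origins
```

An unset or empty variable yields just the three localhost names, which matches the "Variable unset or empty" scenario.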

#### Scenario: Remote client connects with allowed host
- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 192.168.1.50:3000`
- **THEN** the server SHALL accept the request and process it normally

#### Scenario: Remote client connects with disallowed host
- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50` and a client connects with `Host: 10.0.0.99:3000`
- **THEN** the server SHALL return HTTP 421 "Invalid Host header"

#### Scenario: Multiple allowed hosts
- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50,kb.example.com`
- **THEN** the server SHALL accept requests with `Host` matching either `192.168.1.50` or `kb.example.com` on any port

#### Scenario: Variable unset or empty
- **WHEN** `KB_MCP_ALLOWED_HOSTS` is unset or empty
- **THEN** the server SHALL allow only the localhost addresses (`127.0.0.1`, `localhost`, `[::1]`) on any port

#### Scenario: Localhost always allowed
- **WHEN** `KB_MCP_ALLOWED_HOSTS` is set to `192.168.1.50`
- **THEN** the server SHALL still accept requests with `Host: localhost:3000` or `Host: 127.0.0.1:3000`

#### Scenario: Allowed origins derived from allowed hosts
- **WHEN** `KB_MCP_ALLOWED_HOSTS` includes `192.168.1.50`
- **THEN** the server SHALL accept `Origin: http://192.168.1.50:3000` (or that host on any other port) in addition to localhost origins
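The scenarios above reduce to an exact host-name match with the port ignored. The sketch below expresses them as plain functions with hypothetical names; they are not the server's API. Each function takes the full allowed list, with the localhost names included, and a `Host` mismatch maps to the HTTP 421 response required above. Accepting `https` origins as well as `http` is an assumption.

```python
from urllib.parse import urlsplit


def _hostname(authority: str) -> str:
    """Strip the port from a host[:port] value, keeping IPv6 brackets."""
    authority = authority.strip().lower()
    if authority.startswith("["):
        end = authority.find("]")
        return authority[: end + 1] if end != -1 else authority
    return authority.split(":", 1)[0]


def host_status(host_header: str, allowed: list[str]) -> int:
    """200 if the Host header names an allowed host on any port, else 421."""
    return 200 if _hostname(host_header) in allowed else 421


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """True if the Origin's host (any port) is in the allowed list."""
    parts = urlsplit(origin)
    # Assumption: https is accepted alongside the http shown in the scenarios.
    if parts.scheme not in ("http", "https"):
        return False
    return _hostname(parts.netloc) in allowed
```

Matching is exact on the host name. For example, `192.168.1.50.evil.example:3000` is rejected even though it begins with an allowed address.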

---

### Requirement: CPU-only fallback

The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.

#### Scenario: No GPU available
- **WHEN** the container starts without GPU passthrough (no `--gpus`)
- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable

#### Scenario: Explicit CPU mode
- **WHEN** `KB_DEVICE=cpu` and `KB_INGEST_DEVICE=cpu` are set in the environment
- **THEN** the engine SHALL use CPU regardless of GPU availability
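The following sketches the fallback rules. The GPU probe is passed in as a parameter, so the function itself contains no vendor-specific code. The caller supplies whatever device name the ML framework reports as usable, or `None` if no GPU is available. Treating an unset variable as `auto` and the logger name `kb.device` are both assumptions.

```python
from __future__ import annotations

import logging
from collections.abc import Mapping

log = logging.getLogger("kb.device")  # hypothetical logger name


def resolve_device(requested: str | None, accelerator: str | None) -> str:
    """Pick the device for one workload.

    requested:   value of KB_DEVICE or KB_INGEST_DEVICE; unset means "auto".
    accelerator: device name the framework reports as usable, or None.
    """
    choice = (requested or "auto").strip().lower()
    if choice == "cpu":
        return "cpu"  # explicit CPU wins regardless of GPU availability
    if accelerator is None:
        log.warning("GPU acceleration unavailable; falling back to CPU")
        return "cpu"
    return accelerator if choice == "auto" else choice


def resolve_devices(env: Mapping[str, str], accelerator: str | None) -> dict[str, str]:
    """Resolve the query and ingestion devices independently."""
    return {
        name: resolve_device(env.get(name), accelerator)
        for name in ("KB_DEVICE", "KB_INGEST_DEVICE")
    }
```

Resolving the two variables separately allows, for example, ingestion on the GPU while queries run on the CPU.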