Add CPU-only Docker image and fix release tag naming

- Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments
- Use sentence-transformers[onnx] + CPU-only torch for a ~4x smaller image
- Fix release script: separate git tags (engine-v*) from Docker tags (v*)
- Add CPU image to release build/push pipeline
- Update README with CPU deployment instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
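The release-script fix keeps git tags and Docker tags in separate namespaces. The sketch below shows one way to do that mapping. It assumes the git tag differs from the Docker tag only by the `engine-` prefix, and that versioned images use the same `-nvidia`/`-rocm`/`-cpu` suffixes as the `latest-*` tags in the README. The function names `git_tag_to_docker_tag` and `image_refs` are hypothetical and do not come from the actual release script:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real release script: derive Docker tags (v*)
# from engine release git tags (engine-v*). The per-variant versioned suffix
# scheme (e.g. v1.2.3-cpu) is an assumption modelled on the latest-* tags.

# engine-v1.2.3 -> v1.2.3; reject anything that is not an engine release tag.
git_tag_to_docker_tag() {
  local git_tag="$1"
  case "$git_tag" in
    engine-v*) printf '%s\n' "${git_tag#engine-}" ;;
    *) echo "not an engine release tag: $git_tag" >&2; return 1 ;;
  esac
}

# Print one image reference per build variant (nvidia, rocm, cpu).
image_refs() {
  local docker_tag variant
  docker_tag="$(git_tag_to_docker_tag "$1")" || return 1
  local registry="docker.dcglab.co.uk/dcg/kb/engine"
  for variant in nvidia rocm cpu; do
    printf '%s:%s-%s\n' "$registry" "$docker_tag" "$variant"
  done
}
```

For example, `image_refs engine-v1.2.3` prints `docker.dcglab.co.uk/dcg/kb/engine:v1.2.3-nvidia`, then the `-rocm` and `-cpu` references, one per line. A bare `v1.2.3` git tag is rejected, so Docker-style tags cannot accidentally trigger an engine release.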
2026-04-02 16:02:00 +01:00
parent c5191df9c0
commit a6bab5e55e
4 changed files with 99 additions and 51 deletions
@@ -2,7 +2,7 @@

 Personal knowledge base with hybrid search (full-text + semantic vector search).

-v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
+v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.

 ## Architecture

@@ -10,7 +10,7 @@ v2 uses a client-server architecture: a **FastAPI engine** running in Docker (wi
 Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
 ```

- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
 - **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
 - **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

@@ -43,49 +43,33 @@ docker run -d --name kb-engine \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
+
+# CPU only (no GPU required — smaller image)
+docker run -d --name kb-engine \
+  -p 8000:8000 \
+  -v ~/kb-data:/data \
+  -e KB_MODEL=all-MiniLM-L6-v2 \
+  -e KB_API_KEY=your-secret-key \
+  --restart unless-stopped \
+  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
 ```

-Or use a compose file — create `compose.yaml`:
-
-```yaml
-services:
-  kb-engine:
-    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
-    runtime: nvidia  # remove for ROCm
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-    # For ROCm, replace the above runtime/deploy block with:
-    # devices:
-    #   - "/dev/kfd"
-    #   - "/dev/dri"
-    # group_add:
-    #   - "video"
-    ports:
-      - "${KB_PORT:-8000}:8000"
-    volumes:
-      - ${KB_DATA_PATH:-./data}:/data
-    environment:
-      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
-      - KB_DEVICE=${KB_DEVICE:-auto}
-      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
-      - KB_API_KEY=${KB_API_KEY:-}
-      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
-      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
-    restart: unless-stopped
-```
+Or use a compose file from the repo:

 ```bash
-KB_DATA_PATH=~/kb-data docker compose up -d
+# NVIDIA GPU
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
+
+# AMD GPU (ROCm)
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d
+
+# CPU only
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
 ```

 See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.

-The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
+The engine will download the embedding model on first start (~90MB) and load it into memory (GPU or CPU). Check readiness:

 ```bash
 curl http://localhost:8000/api/v1/health
@@ -196,7 +180,7 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
 ```

-Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
+Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.

 ## Claude Code skill