Add CPU-only Docker image and fix release tag naming

- Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments
- Use sentence-transformers[onnx] + CPU-only torch for ~4x smaller image
- Fix release script: separate git tags (engine-v*) from Docker tags (v*)
- Add CPU image to release build/push pipeline
- Update README with CPU deployment instructions
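The git/Docker tag split can be pictured with a small shell sketch. It is not the project's release script: it assumes git tags of the form `engine-v1.2.3`, assumes the Docker tag is the same version with the `engine-` prefix dropped, and assumes a `-nvidia`/`-rocm`/`-cpu` suffix per image by analogy with the `latest-*` tags. The function name `docker_tags_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the project's release script.
# Assumes git tags look like "engine-v1.2.3" and Docker tags are the same
# version with the "engine-" prefix dropped. The per-variant suffixes mirror
# the latest-nvidia / latest-rocm / latest-cpu image tags and are assumptions.
set -euo pipefail

docker_tags_for() {
  local git_tag="$1"
  # Only engine release tags are accepted.
  case "$git_tag" in
    engine-v*) ;;
    *)
      echo "not an engine tag: $git_tag" >&2
      return 1
      ;;
  esac

  local version="${git_tag#engine-}" # engine-v1.2.3 -> v1.2.3
  local variant
  for variant in nvidia rocm cpu; do
    echo "${version}-${variant}"
  done
}

docker_tags_for "${1:-engine-v1.2.3}"
```

Passing `engine-v1.4.0` prints `v1.4.0-nvidia`, `v1.4.0-rocm` and `v1.4.0-cpu`, one per line, while a bare `v1.4.0` is rejected. Keeping the prefix only on the git side lets engine tags coexist with tags for other components (such as the Go client) in the same repository.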

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:02:00 +01:00
parent c5191df9c0
commit a6bab5e55e
4 changed files with 99 additions and 51 deletions
+22 -38
@@ -2,7 +2,7 @@
Personal knowledge base with hybrid search (full-text + semantic vector search).
-v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
+v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
## Architecture
@@ -10,7 +10,7 @@ v2 uses a client-server architecture: a **FastAPI engine** running in Docker (wi
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
```
-- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
- **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
@@ -43,49 +43,33 @@ docker run -d --name kb-engine \
-e KB_API_KEY=your-secret-key \
--restart unless-stopped \
docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
+# CPU only (no GPU required — smaller image)
+docker run -d --name kb-engine \
+-p 8000:8000 \
+-v ~/kb-data:/data \
+-e KB_MODEL=all-MiniLM-L6-v2 \
+-e KB_API_KEY=your-secret-key \
+--restart unless-stopped \
+docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
```
-Or use a compose file — create `compose.yaml`:
-```yaml
-services:
-  kb-engine:
-    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia # or latest-rocm
-    runtime: nvidia # remove for ROCm
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-    # For ROCm, replace the above runtime/deploy block with:
-    # devices:
-    #   - "/dev/kfd"
-    #   - "/dev/dri"
-    # group_add:
-    #   - "video"
-    ports:
-      - "${KB_PORT:-8000}:8000"
-    volumes:
-      - ${KB_DATA_PATH:-./data}:/data
-    environment:
-      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
-      - KB_DEVICE=${KB_DEVICE:-auto}
-      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
-      - KB_API_KEY=${KB_API_KEY:-}
-      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
-      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
-    restart: unless-stopped
-```
Or use a compose file from the repo:
```bash
-KB_DATA_PATH=~/kb-data docker compose up -d
+# NVIDIA GPU
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
+# AMD GPU (ROCm)
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d
+# CPU only
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
```
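To have a script choose among the three compose files, a rough shell sketch follows. The helper name `pick_compose_file` and its detection rules (an `nvidia-smi` binary on `PATH` for NVIDIA, the `/dev/kfd` device node for ROCm) are assumptions, not something kb provides. The sketch only prints its choice; the actual `docker compose` call is left commented out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose one of the repo's compose files for this host.
# The detection rules below are assumptions, not part of kb itself.
set -euo pipefail

pick_compose_file() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # NVIDIA driver tools are present; Docker still needs the
    # NVIDIA Container Toolkit installed separately to expose the GPU.
    echo "engine/compose.nvidia.yaml"
  elif [ -e /dev/kfd ]; then
    # /dev/kfd is the AMD ROCm compute device node.
    echo "engine/compose.rocm.yaml"
  else
    # No GPU detected: fall back to the CPU-only image.
    echo "engine/compose.cpu.yaml"
  fi
}

compose_file="$(pick_compose_file)"
echo "Using ${compose_file}"
# Uncomment to start the engine with the chosen file:
# KB_DATA_PATH=~/kb-data docker compose -f "$compose_file" up -d
```

Falling back to CPU keeps the script usable on any host, which works because the data directory is device-agnostic.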
See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.
-The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
+The engine will download the embedding model on first start (~90MB) and load it into memory (GPU or CPU). Check readiness:
```bash
curl http://localhost:8000/api/v1/health
@@ -196,7 +180,7 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
```
-Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
+Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.
## Claude Code skill