Add CPU-only Docker image and fix release tag naming

- Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments
- Use sentence-transformers[onnx] + CPU-only torch for ~4x smaller image
- Fix release script: separate git tags (engine-v*) from Docker tags (v*)
- Add CPU image to release build/push pipeline
- Update README with CPU deployment instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
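
The release script is not shown here, so the following is only a rough sketch of the tag split the commit message describes. It assumes a git tag of the form `engine-vX.Y.Z` maps to a Docker tag `vX.Y.Z-<variant>`, with the variant suffix mirroring the `latest-nvidia`, `latest-rocm`, and `latest-cpu` tags in the README. The function name `docker_tag_for` and the versioned suffix scheme are hypothetical.

```bash
# Hypothetical helper: derive a Docker image tag from a git release tag.
# Assumes git tags look like engine-v1.2.3 and Docker tags like v1.2.3-cpu.
docker_tag_for() {
    git_tag=$1
    variant=$2
    case $git_tag in
        engine-v*) ;;  # accepted release tag shape
        *)
            echo "not an engine release tag: $git_tag" >&2
            return 1
            ;;
    esac
    # Strip the "engine-" prefix and append the image variant.
    printf '%s-%s\n' "${git_tag#engine-}" "$variant"
}
```

Called as `docker_tag_for engine-v2.1.0 cpu`, this keeps the git namespace (`engine-v*`) and the registry namespace (`v*`) separate while deriving one from the other.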
@@ -2,7 +2,7 @@
 
 Personal knowledge base with hybrid search (full-text + semantic vector search).
 
-v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
+v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.
 
 ## Architecture
 
@@ -10,7 +10,7 @@ v2 uses a client-server architecture: a **FastAPI engine** running in Docker (wi
 Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
 ```
 
-- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
+- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
 - **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
 - **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
 
@@ -43,49 +43,33 @@ docker run -d --name kb-engine \
   -e KB_API_KEY=your-secret-key \
   --restart unless-stopped \
   docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
+
+# CPU only (no GPU required — smaller image)
+docker run -d --name kb-engine \
+  -p 8000:8000 \
+  -v ~/kb-data:/data \
+  -e KB_MODEL=all-MiniLM-L6-v2 \
+  -e KB_API_KEY=your-secret-key \
+  --restart unless-stopped \
+  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
 ```
 
-Or use a compose file — create `compose.yaml`:
-
-```yaml
-services:
-  kb-engine:
-    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia # or latest-rocm
-    runtime: nvidia # remove for ROCm
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-    # For ROCm, replace the above runtime/deploy block with:
-    # devices:
-    #   - "/dev/kfd"
-    #   - "/dev/dri"
-    # group_add:
-    #   - "video"
-    ports:
-      - "${KB_PORT:-8000}:8000"
-    volumes:
-      - ${KB_DATA_PATH:-./data}:/data
-    environment:
-      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
-      - KB_DEVICE=${KB_DEVICE:-auto}
-      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
-      - KB_API_KEY=${KB_API_KEY:-}
-      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
-      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
-    restart: unless-stopped
-```
+Or use a compose file from the repo:
 
 ```bash
-KB_DATA_PATH=~/kb-data docker compose up -d
+# NVIDIA GPU
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
+
+# AMD GPU (ROCm)
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d
+
+# CPU only
+KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
 ```
 
 See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.
 
-The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
+The engine will download the embedding model on first start (~90MB) and load it into memory (GPU or CPU). Check readiness:
 
 ```bash
 curl http://localhost:8000/api/v1/health
@@ -196,7 +180,7 @@ rsync -a ~/kb-data/ user@target:/home/user/kb-data/
 KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
 ```
 
-Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
+Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.
 
 ## Claude Code skill
 
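
For scripted deployments, the readiness check from the README hunk above could be wrapped in a small polling loop, since the first start includes a model download. This is a minimal sketch, assuming the health endpoint refuses connections or returns a non-2xx status until the model is loaded, which the diff does not confirm. The function name `wait_for_engine` and its arguments are hypothetical.

```bash
# Poll the engine health endpoint until it answers successfully.
# Usage: wait_for_engine URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_engine() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP error statuses; -s silences progress output.
        if curl -fs "$url" >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "engine not ready after $attempts attempts: $url" >&2
    return 1
}
```

A typical call after `docker compose up -d` might be `wait_for_engine http://localhost:8000/api/v1/health 60 2`, which gives up after roughly two minutes.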