
kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with optional GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
  • Engine: Keeps the embedding model warm in memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
  • Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
  • Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
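The client-to-engine hop is ordinary HTTP, so any language can talk to the engine. The following is a minimal Python sketch of how a request might be built, assuming the API key is sent as a standard `Authorization: Bearer` header (the engine configuration describes KB_API_KEY as a Bearer token). The helper name `build_request` is hypothetical, and the real client is written in Go.

```python
import urllib.request


def build_request(path, engine_url="http://localhost:8000", api_key=""):
    """Build (but do not send) an HTTP request for the kb engine.

    `build_request` is an illustrative helper, not part of kb itself.
    """
    # Join base URL and path without doubling or dropping the slash.
    url = engine_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the engine expects a standard Bearer token header.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("/api/v1/health", api_key="your-secret-key")
    print(req.full_url)
    print(req.header_items())
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without an engine.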

Quick start

1. Start the engine

From pre-built images (recommended):

# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

# CPU only (no GPU required — smaller image)
docker run -d --name kb-engine \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu

Or use a compose file from the repo:

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d

# CPU only
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d

See DEVELOPER.md to run the engine from source.

On first start the engine downloads the embedding model (about 90 MB) and loads it into GPU or CPU memory. To check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
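Because the first start includes a model download, a script that launches the engine may want to wait until the health endpoint reports healthy. A minimal polling sketch follows; the function names, timeout, and interval are illustrative assumptions, and only the URL and the `{"status": "healthy"}` response come from the example above.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body):
    """Return True when the body is JSON with status == "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_engine(url="http://localhost:8000/api/v1/health",
                    timeout_s=120.0, interval_s=2.0,
                    fetch=None, sleep=time.sleep):
    """Poll the health endpoint until it is healthy or the timeout expires.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a running engine.
    """
    def default_fetch(u):
        with urllib.request.urlopen(u, timeout=5) as resp:
            return resp.read()

    fetch = fetch or default_fetch
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_healthy(fetch(url)):
                return True
        except (urllib.error.URLError, OSError):
            pass  # engine is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    print("ready" if wait_for_engine(timeout_s=10) else "not ready")
```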

2. Install the client

From a release (recommended):

Check releases for the latest client tag, then:

# Set the version tag
TAG=client-v2.1.0

# Linux (amd64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64

# Linux (arm64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64

# macOS (Apple Silicon)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64

# macOS (Intel)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64

# Then install
chmod +x kb
sudo mv kb /usr/local/bin/

See DEVELOPER.md to build the client from source.

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
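The layering of config file, environment variables, and flags can be sketched as below. This is an illustration in Python, not the Go client's actual code, and the precedence order shown (flags over environment over file over defaults) is an assumption based on common CLI convention rather than something this README states. `resolve_config` is a hypothetical name.

```python
# Defaults mirror the example client.yaml above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README, mapped to config keys.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg, env, flags):
    """Merge settings; later sources override earlier ones.

    Assumed order: defaults < client.yaml < environment < CLI flags.
    `flags` uses config key names, with None meaning "flag not given".
    """
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    for key, var in ENV_VARS.items():
        if env.get(var):
            cfg[key] = env[var]
    for key, value in flags.items():
        if value is not None and key in DEFAULTS:
            cfg[key] = value
    return cfg


if __name__ == "__main__":
    print(resolve_config(
        {"engine_url": "http://nas:8000"},
        {"KB_API_KEY": "your-secret-key"},
        {"default_format": "json"},
    ))
```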

4. Use it

# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops

# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf    # download original file
kb remove 3 --yes
kb status
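For scripting, the same commands can be driven from another program by adding --format json. The wrapper below is a small sketch of that. `kb_json` is a hypothetical helper, and the shape of the JSON each command returns is not documented here, so the sketch only parses the output and leaves interpretation to the caller.

```python
import json
import subprocess


def kb_json(args, run=subprocess.run):
    """Run a kb command with --format json and return the parsed output.

    `run` is injectable so the wrapper can be tested without kb installed.
    """
    cmd = ["kb", *args, "--format", "json"]
    # check=True raises CalledProcessError on a non-zero exit status.
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Requires the kb client on PATH and a reachable engine.
    print(kb_json(["search", "deploy process", "--tags", "ops"]))
```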

How it works

  • Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on the configured device (GPU or CPU), and stores everything in SQLite.
  • Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Queries return in under 100 ms once the model is warm.
  • Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
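Reciprocal Rank Fusion, mentioned under Search, merges ranked lists using only each document's position in each list. A minimal sketch of the general technique follows. The constant k=60 is the value commonly used in the literature and is an assumption here; the engine's actual constant and tie-breaking are not stated in this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one scored ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with rank starting at 1. Returns (doc_id, score) pairs sorted by
    descending score; ties are broken by id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # e.g. BM25 order from FTS5
    vector_hits = ["b", "d", "a"]   # e.g. similarity order from sqlite-vec
    for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
        print(doc_id, round(score, 5))
```

In this example "b" ranks first because it sits near the top of both lists, while "c" and "d" each appear in only one list and fall to the bottom.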

Engine configuration

The engine is configured via environment variables, set in the compose file, on the docker compose command line, or with -e on docker run:

| Variable | Default | Description |
|---|---|---|
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding/search device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device: auto, cpu, or cuda |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_HOST | 0.0.0.0 | Host to bind to |
| HF_HUB_OFFLINE | (none) | Set to 1 to prevent model downloads (use cached only) |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable, not used by engine) |
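The variables and defaults listed above could be read at startup roughly as follows. This is a sketch under assumptions, not the engine's actual settings code: `EngineSettings` and `load_settings` are hypothetical names, and the device validation is an illustrative addition based on the allowed values in the descriptions.

```python
import os
from dataclasses import dataclass
from typing import Optional

VALID_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]
    search_threshold: float
    port: int
    host: str


def load_settings(env=None):
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    settings = EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        api_key=env.get("KB_API_KEY") or None,  # empty string means no auth
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(env.get("KB_PORT", "8000")),
        host=env.get("KB_HOST", "0.0.0.0"),
    )
    # Reject device names outside the documented set.
    for name in ("device", "ingest_device"):
        value = getattr(settings, name)
        if value not in VALID_DEVICES:
            raise ValueError(f"{name} must be one of {sorted(VALID_DEVICES)}, got {value!r}")
    return settings


if __name__ == "__main__":
    print(load_settings())
```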

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.
