kb
Personal knowledge base with hybrid search (full-text + semantic vector search).
v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.
Architecture
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
- Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
- Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
Quick start
1. Start the engine
From pre-built images (recommended):
# NVIDIA GPU
docker run -d --name kb-engine \
--gpus all \
-p 8000:8000 \
-v ~/kb-data:/data \
-e KB_MODEL=all-MiniLM-L6-v2 \
-e KB_DEVICE=auto \
-e KB_API_KEY=your-secret-key \
--restart unless-stopped \
docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia
# AMD GPU (ROCm)
docker run -d --name kb-engine \
--device /dev/kfd --device /dev/dri \
--group-add video \
-p 8000:8000 \
-v ~/kb-data:/data \
-e KB_MODEL=all-MiniLM-L6-v2 \
-e KB_DEVICE=auto \
-e KB_API_KEY=your-secret-key \
--restart unless-stopped \
docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
Or use a compose file — create compose.yaml:
services:
kb-engine:
image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia # or latest-rocm
runtime: nvidia # remove for ROCm
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
# For ROCm, replace the above runtime/deploy block with:
# devices:
# - "/dev/kfd"
# - "/dev/dri"
# group_add:
# - "video"
ports:
- "${KB_PORT:-8000}:8000"
volumes:
- ${KB_DATA_PATH:-./data}:/data
environment:
- KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
- KB_DEVICE=${KB_DEVICE:-auto}
- KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
- KB_API_KEY=${KB_API_KEY:-}
- KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
- HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
restart: unless-stopped
KB_DATA_PATH=~/kb-data docker compose up -d
From source (for development):
cd engine
# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
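Because the first start includes a model download, a script that depends on the engine may want to wait for the health endpoint rather than check it once. The following is a minimal Python sketch of that idea, not part of the project: the function names, retry count, and delay are illustrative assumptions, and only the /api/v1/health path and the "healthy" status value come from the example above.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET /api/v1/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=5) as resp:
        return json.load(resp)


def wait_for_engine(base_url="http://localhost:8000", attempts=30, delay=2.0,
                    fetch=fetch_health, sleep=time.sleep):
    """Poll the health endpoint until it reports "healthy" or attempts run out.

    The attempts/delay defaults are arbitrary illustrative values.
    """
    for _ in range(attempts):
        try:
            if fetch(base_url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: engine not up yet.
            pass
        sleep(delay)
    return False
```

Calling wait_for_engine() with no arguments polls localhost:8000; the fetch and sleep parameters exist only so the loop can be exercised without a running engine.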
2. Install the client
From a release (recommended):
Check releases for the latest client tag, then:
# Set the version tag
TAG=client-v2.1.0
# Linux (amd64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64
# Linux (arm64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64
# macOS (Apple Silicon)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64
# macOS (Intel)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64
# Then install
chmod +x kb
sudo mv kb /usr/local/bin/
From source (for development):
cd client
make build # produces ./kb binary
make all # or cross-compile: dist/kb-{os}-{arch}
3. Configure the client
The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:
engine_url: http://localhost:8000
api_key: ""
default_format: human
Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
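The layering described above can be sketched in a few lines of Python. This is an illustration only: the real client is written in Go, and the precedence order used here (flags over environment variables over the config file over built-in defaults) is an assumption, since the text only says that environment variables and flags override the file. The resolve_config function and its dictionary arguments are hypothetical.

```python
import os

# Defaults mirror the example client.yaml shown above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README; there is none for the format.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg=None, env=None, flags=None):
    """Merge settings; assumed precedence: flags > env > file > defaults."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    # Values from ~/.kb/client.yaml (already parsed into a dict by the caller).
    cfg.update({k: v for k, v in (file_cfg or {}).items() if k in DEFAULTS})
    # Non-empty environment variables override the file.
    for key, var in ENV_VARS.items():
        if env.get(var):
            cfg[key] = env[var]
    # Flags (--engine, --api-key, --format) win when actually supplied.
    cfg.update({k: v for k, v in (flags or {}).items() if v is not None})
    return cfg
```

For example, resolve_config(file_cfg={"default_format": "json"}, env={"KB_ENGINE_URL": "http://gpu-box:8000"}) yields the engine URL from the environment and the format from the file.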
4. Use it
# Quick notes (shorthand — no subcommand needed)
kb "Always restart nginx after config changes"
kb "Server room is building 3, floor 2" --tags ops
# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive
# Check ingestion progress
kb jobs
# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf
# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf # download original file
kb remove 3 --yes
kb status
How it works
- Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
- Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
- Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
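To make the search step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists. It is not the engine's code: the chunk IDs are made up, and the constant k=60 is the value commonly used for RRF, assumed here because the engine's actual setting is not documented above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


keyword_hits = ["c3", "c1", "c7"]  # hypothetical BM25 (FTS5) order
vector_hits = ["c1", "c9", "c3"]   # hypothetical vector-similarity order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

With these inputs the fused order is c1, c3, c9, c7: chunks that appear in both lists outrank chunks that appear in only one, even when the latter rank higher in their own list.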
Engine configuration
The engine is configured via environment variables, set in the compose file, on the docker compose command line, or with -e when using docker run:
| Variable | Default | Description |
|---|---|---|
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding/search device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device: auto, cpu, or cuda |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_HOST | 0.0.0.0 | Host to bind to |
| HF_HUB_OFFLINE | (none) | Set to 1 to prevent model downloads (use cached only) |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable, not used by engine) |
Data portability
The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/
# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
API reference
All endpoints are under /api/v1/. When KB_API_KEY is set, every request except the health check must include an Authorization: Bearer <key> header.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check (bypasses auth) |
| POST | /search | Hybrid search (JSON body) |
| POST | /jobs | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
| GET | /jobs | List ingestion jobs |
| GET | /jobs/{id} | Job details |
| GET | /documents | List documents |
| GET | /documents/{id} | Document details with chunks |
| GET | /documents/{id}/file | Download original file |
| DELETE | /documents/{id} | Remove a document (and stored file) |
| PUT | /documents/{id}/tags | Add/remove tags |
| GET | /tags | List all tags |
| GET | /status | Engine status, GPU info, DB stats |
| POST | /reindex | Re-embed all chunks |
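As an illustration of calling the API without the Go client, the Python sketch below builds an authenticated POST to /search using only the standard library. The JSON field name "query" is an assumption, since the request schema is not documented here, and build_search_request is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(query, base_url="http://localhost:8000", api_key=""):
    """Build (but do not send) a POST /api/v1/search request."""
    # ASSUMPTION: the body field is called "query"; check the engine's schema.
    body = json.dumps({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/v1/search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if api_key:
        # Needed only when the engine was started with KB_API_KEY set.
        req.add_header("Authorization", f"Bearer {api_key}")
    return req
```

Passing the returned object to urllib.request.urlopen sends the request; the shape of the response body is likewise not specified above, so inspect it before relying on particular fields.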
Building and releasing
Client and engine are versioned independently via client/VERSION and engine/VERSION. Each has its own release script and git tag prefix.
Release client
./release-client.sh --gitea # patch bump, release via Gitea
./release-client.sh --github --minor # minor bump, release via GitHub
./release-client.sh --gitea --no-increment # release current version as-is
./release-client.sh --gitea --dry-run # preview without doing anything
Creates tag client-vX.Y.Z, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.
The client embeds a MinEngineVersion (from client/MIN_ENGINE_VERSION) and will hard-fail if the connected engine is too old.
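The compatibility rule amounts to a numeric comparison of X.Y.Z versions. The real check lives in the Go client and is not shown here; the Python sketch below only illustrates the comparison, and both function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract the X.Y.Z triple from strings like "2.0.6" or "engine-v2.0.6"."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not an X.Y.Z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def engine_is_compatible(engine_version, min_engine_version):
    """True when the engine is at least the client's minimum version."""
    # Tuples compare component by component, so 2.0.10 sorts above 2.0.9.
    return parse_version(engine_version) >= parse_version(min_engine_version)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits, where string comparison would order 2.0.10 before 2.0.9.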
Release engine
./release-engine.sh --gitea # patch bump, release via Gitea
./release-engine.sh --github --minor # minor bump, release via GitHub
./release-engine.sh --gitea --no-increment # release current version as-is
./release-engine.sh --gitea --dry-run # preview without doing anything
Creates tag engine-vX.Y.Z, builds NVIDIA and ROCm Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
Checking versions
# Client
kb --version
# Engine
curl http://localhost:8000/api/v1/status | jq .version
Docker images
Images are pushed to docker.dcglab.co.uk/dcg/kb/engine with tags:
- engine-v2.0.6-nvidia / engine-v2.0.6-rocm: versioned
- latest-nvidia / latest-rocm: latest release
Override the registry and org via environment variables:
REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github
Future: ROCm runtime migration
The onnxruntime-rocm execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the MIGraphX execution provider as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from onnxruntime-rocm to onnxruntime with the MIGraphX EP and install the migraphx runtime libraries instead.
Claude Code skill
This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.