kb
Personal knowledge base with hybrid search (full-text + semantic vector search).
v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.
Architecture
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
- Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via a REST API. Runs in Docker with NVIDIA or AMD GPU support.
- Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
Quick start
1. Start the engine
From pre-built images (recommended):
# NVIDIA GPU
docker run -d --name kb-engine \
--gpus all \
-p 8000:8000 \
-v ~/kb-data:/data \
-e KB_MODEL=all-MiniLM-L6-v2 \
-e KB_DEVICE=auto \
-e KB_API_KEY=your-secret-key \
--restart unless-stopped \
docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia
# AMD GPU (ROCm)
docker run -d --name kb-engine \
--device /dev/kfd --device /dev/dri \
--group-add video \
-p 8000:8000 \
-v ~/kb-data:/data \
-e KB_MODEL=all-MiniLM-L6-v2 \
-e KB_DEVICE=auto \
-e KB_API_KEY=your-secret-key \
--restart unless-stopped \
docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
Or use a compose file — create compose.yaml:
services:
  kb-engine:
    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
    runtime: nvidia  # remove for ROCm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # For ROCm, replace the above runtime/deploy block with:
    # devices:
    #   - "/dev/kfd"
    #   - "/dev/dri"
    # group_add:
    #   - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
KB_DATA_PATH=~/kb-data docker compose up -d
See DEVELOPER.md to run the engine from source.
The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
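For scripted setups, the same readiness check can be polled until the model has finished loading. The following is a minimal sketch, not part of kb itself: the helper names (`parse_health`, `engine_is_healthy`, `wait_for_engine`) are hypothetical, and sending the API key as an `Authorization: Bearer` header is an assumption based on the engine's description of `KB_API_KEY` as a Bearer token.

```python
import json
import time
import urllib.request


def parse_health(raw):
    """Return True if the raw response body decodes to {"status": "healthy"}."""
    try:
        body = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    return isinstance(body, dict) and body.get("status") == "healthy"


def engine_is_healthy(base_url="http://localhost:8000", api_key="", timeout=2.0):
    """Call GET /api/v1/health once; any network error counts as not ready."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/v1/health")
    if api_key:
        # Assumption: the key is sent as a standard Bearer token header.
        request.add_header("Authorization", "Bearer " + api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_health(response.read())
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError
        return False


def wait_for_engine(base_url="http://localhost:8000", api_key="", attempts=30, delay=2.0):
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        if engine_is_healthy(base_url, api_key):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_engine()` before the first `kb` command avoids racing the initial model download; the attempt count and delay are arbitrary and should be tuned to the host.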
2. Install the client
From a release (recommended):
Check releases for the latest client tag, then:
# Set the version tag
TAG=client-v2.1.0
# Linux (amd64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64
# Linux (arm64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64
# macOS (Apple Silicon)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64
# macOS (Intel)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64
# Then install
chmod +x kb
sudo mv kb /usr/local/bin/
See DEVELOPER.md to build the client from source.
3. Configure the client
The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:
engine_url: http://localhost:8000
api_key: ""
default_format: human
Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
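The layering described above can be pictured as a small merge function. This is an illustrative sketch only, not the Go client's code: the function name `resolve_config` is hypothetical, the defaults are taken from the example `client.yaml`, and the precedence order (flags over environment variables over the config file over defaults) is an assumption, since the README only states that both mechanisms override the file.

```python
import os

# Defaults mirror the example client.yaml shown above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README, mapped to config keys.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}

# CLI flags named in the README (--engine, --api-key, --format), keyed here
# by hypothetical Python-friendly names.
FLAG_KEYS = {"engine": "engine_url", "api_key": "api_key", "format": "default_format"}


def resolve_config(file_values=None, env=None, flags=None):
    """Merge defaults, file values, environment, and flags (assumed order)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)

    # 1. Values from ~/.kb/client.yaml (already parsed into a dict).
    for key, value in (file_values or {}).items():
        if key in DEFAULTS:
            config[key] = value

    # 2. Environment variables; empty values are treated as unset.
    for key, var in ENV_VARS.items():
        if env.get(var):
            config[key] = env[var]

    # 3. CLI flags; None means the flag was not passed.
    for flag, key in FLAG_KEYS.items():
        value = (flags or {}).get(flag)
        if value is not None:
            config[key] = value

    return config
```

For example, `resolve_config({"engine_url": "http://nas:8000"}, env={}, flags={"format": "json"})` keeps the file's engine URL and switches the output format to JSON for that one invocation.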
4. Use it
# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops
# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive
# Check ingestion progress
kb jobs
# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf
# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf # download original file
kb remove 3 --yes
kb status
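Because every command accepts `--format json`, the CLI can be driven from a script. The sketch below wraps it with Python's `subprocess` module; the helper name `run_kb_json` is hypothetical, and the shape of the JSON that each command returns is not documented here, so the function returns whatever the CLI prints without assuming any fields.

```python
import json
import subprocess


def run_kb_json(args, kb_bin="kb"):
    """Run a kb subcommand with --format json and return the decoded output.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    ValueError if the output is not valid JSON.
    """
    completed = subprocess.run(
        [kb_bin, *args, "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Example call (requires kb on PATH and a running engine):
#   results = run_kb_json(["search", "deploy process", "--tags", "ops"])
```

Passing arguments as a list avoids shell quoting problems with multi-word queries.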
How it works
- Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
- Search: Hybrid retrieval that combines BM25 keyword scoring (FTS5) with vector similarity (sqlite-vec) and merges the two rankings via Reciprocal Rank Fusion. Queries return in under 100 ms with a warm model.
- Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
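To make the search step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. It is not the engine's implementation: the function names are hypothetical, the constant `k = 60` is a commonly used default rather than a value taken from this project, and applying the `KB_SEARCH_THRESHOLD` cutoff to the fused score is an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over lists.

    `rankings` is a list of lists of document IDs, best match first.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Ties are broken by document ID so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


def fuse_and_filter(keyword_ids, vector_ids, threshold=0.01, k=60):
    """Fuse keyword and vector rankings, then drop low-scoring results.

    Assumption: the minimum-score threshold is applied to the fused score.
    """
    fused = reciprocal_rank_fusion([keyword_ids, vector_ids], k=k)
    return [(doc_id, score) for doc_id, score in fused if score >= threshold]
```

A document that appears near the top of both lists outranks one that tops only a single list, which is why hybrid search tends to surface results that match both the wording and the meaning of a query.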
Engine configuration
The engine is configured via environment variables, set with -e flags on docker run, in the compose file, or in the shell when invoking docker compose:
| Variable | Default | Description |
|---|---|---|
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding/search device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device: auto, cpu, or cuda |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_HOST | 0.0.0.0 | Host to bind to |
| HF_HUB_OFFLINE | (none) | Set to 1 to prevent model downloads (use cached only) |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable, not used by engine) |
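As an illustration of how these variables and defaults fit together, the sketch below loads them into a typed settings object. It is a hypothetical reconstruction, not the engine's actual code: the `EngineSettings` class and `load_settings` function are invented names, and the validation of device values is an added assumption. Empty strings are treated as unset because the compose file passes variables such as `KB_API_KEY=${KB_API_KEY:-}`, which expands to an empty value when the host variable is missing.

```python
import os
from dataclasses import dataclass

VALID_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: str
    search_threshold: float
    port: int
    host: str
    hf_hub_offline: bool


def load_settings(env=None):
    """Build settings from environment variables, using the table's defaults."""
    env = os.environ if env is None else env

    def get(name, default):
        # An empty string counts as unset (see the compose file's ${VAR:-}).
        value = env.get(name, "")
        return value if value else default

    settings = EngineSettings(
        data_dir=get("KB_DATA_DIR", "/data"),
        model=get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=get("KB_DEVICE", "auto"),
        ingest_device=get("KB_INGEST_DEVICE", "auto"),
        api_key=get("KB_API_KEY", ""),
        search_threshold=float(get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(get("KB_PORT", "8000")),
        host=get("KB_HOST", "0.0.0.0"),
        hf_hub_offline=get("HF_HUB_OFFLINE", "") == "1",
    )

    # Assumption: reject device names outside the documented set.
    for name, device in (("KB_DEVICE", settings.device),
                         ("KB_INGEST_DEVICE", settings.ingest_device)):
        if device not in VALID_DEVICES:
            raise ValueError(f"{name} must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    return settings
```

KB_DATA_PATH is deliberately absent: it is only read by Docker Compose to choose the host side of the bind mount.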
Data portability
The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/
# On target host
KB_DATA_PATH=~/kb-data docker compose up -d
Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
Claude Code skill
This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.