steve/kb


steve a6bab5e55e Add CPU-only Docker image and fix release tag naming

- Add Dockerfile.cpu and compose.cpu.yaml for CPU-only deployments
- Use sentence-transformers[onnx] + CPU-only torch for ~4x smaller image
- Fix release script: separate git tags (engine-v*) from Docker tags (v*)
- Add CPU image to release build/push pipeline
- Update README with CPU deployment instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-02 16:02:00 +01:00

README.md

kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with optional GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU

Engine: Keeps the embedding model warm in memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

Quick start

1. Start the engine

From pre-built images (recommended):

# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

# CPU only (no GPU required — smaller image)
docker run -d --name kb-engine \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu

Or use a compose file from the repo:

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d

# CPU only
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d

See DEVELOPER.md to run the engine from source.

On first start, the engine downloads the embedding model (~90MB) and loads it into memory (GPU or CPU). Check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
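
Because the first start includes a model download, a script that launches the engine may want to wait for readiness rather than check once. The following is a minimal Python sketch of such a poll, assuming only the health URL and the `{"status": "healthy"}` response shown above. The function names, timeout, and polling interval are illustrative and not part of kb.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(url: str) -> str:
    """Return the "status" field from the health endpoint, or "" on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return ""
    return body.get("status", "") if isinstance(body, dict) else ""


def wait_until_healthy(
    url: str = "http://localhost:8000/api/v1/health",
    timeout_s: float = 120.0,   # assumed budget for the first-start model download
    interval_s: float = 2.0,    # assumed polling interval
    fetch: Callable[[str], str] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it reports "healthy" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_healthy()` after `docker run` returns `True` once the engine answers, or `False` if the timeout expires first. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running engine.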

2. Install the client

From a release (recommended):

Check releases for the latest client tag, then:

# Set the version tag
TAG=client-v2.1.0

# Linux (amd64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64

# Linux (arm64)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64

# macOS (Apple Silicon)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64

# macOS (Intel)
curl -L -o kb https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64

# Then install
chmod +x kb
sudo mv kb /usr/local/bin/

See DEVELOPER.md to build the client from source.

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override these settings with environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
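
The layering of config file, environment variables, and flags can be sketched as below. This is an illustration in Python, not the Go client's actual code. The precedence order (flags over environment over file over defaults) is an assumption, since the README does not state it, and so is treating the sample values above as the built-in defaults. `resolve_config` and `auth_headers` are hypothetical names.

```python
from typing import Dict, Mapping, Optional

# Assumed defaults, taken from the sample client.yaml above.
DEFAULTS: Dict[str, str] = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README, mapped to the settings they override.
ENV_KEYS: Dict[str, str] = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(
    file_cfg: Mapping[str, str],
    env: Mapping[str, str],
    flags: Mapping[str, Optional[str]],
) -> Dict[str, str]:
    """Merge settings, assuming precedence: flags > environment > file > defaults."""
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    for key, var in ENV_KEYS.items():
        if env.get(var):
            cfg[key] = env[var]
    for key, value in flags.items():
        # A flag that was not passed is represented as None and ignored.
        if key in DEFAULTS and value is not None:
            cfg[key] = value
    return cfg


def auth_headers(cfg: Mapping[str, str]) -> Dict[str, str]:
    """Build the Bearer header the engine expects when an API key is set."""
    return {"Authorization": f"Bearer {cfg['api_key']}"} if cfg["api_key"] else {}
```

For example, with `engine_url` set in the file, `KB_ENGINE_URL` set in the environment, and only `--format json` passed, this sketch resolves the engine URL from the environment and the format from the flag.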

4. Use it

# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops

# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf    # download original file
kb remove 3 --yes
kb status

How it works

Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on the configured device (GPU or CPU), and stores everything in SQLite.
Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Queries return in under 100ms once the model is warm.
Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
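
To make the merge step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. It shows the general technique rather than the engine's own code. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, as are the function name and the tie-breaking rule.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,  # assumed smoothing constant; the engine's value is not documented
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


keyword_hits = [3, 1, 2]  # hypothetical BM25 order from FTS5
vector_hits = [1, 4, 3]   # hypothetical similarity order from sqlite-vec
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # [1, 3, 4, 2]
```

Documents 1 and 3 appear in both lists, so they outrank documents 2 and 4, which each appear in only one. RRF uses ranks rather than raw scores, which is why BM25 and vector distances can be combined without normalising either.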

Engine configuration

The engine is configured via environment variables, set in the compose file, on the docker compose command line, or with -e flags to docker run:

| Variable | Default | Description |
| --- | --- | --- |
| `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
| `KB_MODEL` | `all-MiniLM-L6-v2` | HuggingFace embedding model name |
| `KB_DEVICE` | `auto` | Embedding/search device: `auto`, `cpu`, or `cuda` |
| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device: `auto`, `cpu`, or `cuda` |
| `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
| `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
| `KB_PORT` | `8000` | Port to expose |
| `KB_HOST` | `0.0.0.0` | Host to bind to |
| `HF_HUB_OFFLINE` | (none) | Set to `1` to prevent model downloads (use cached only) |
| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable, not used by engine) |
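
As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. It is a hypothetical reading of the table, not the engine's actual loader: the `EngineSettings` and `load_settings` names and the device validation are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"auto", "cpu", "cuda"}  # values listed for KB_DEVICE and KB_INGEST_DEVICE


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]
    search_threshold: float
    port: int
    host: str


def load_settings(env: Mapping[str, str] = os.environ) -> EngineSettings:
    """Read engine settings from the environment, applying the documented defaults."""
    device = env.get("KB_DEVICE", "auto")
    ingest_device = env.get("KB_INGEST_DEVICE", "auto")
    for name, value in (("KB_DEVICE", device), ("KB_INGEST_DEVICE", ingest_device)):
        if value not in VALID_DEVICES:
            raise ValueError(f"{name} must be one of {sorted(VALID_DEVICES)}, got {value!r}")
    return EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=device,
        ingest_device=ingest_device,
        api_key=env.get("KB_API_KEY") or None,  # unset or empty means no authentication
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(env.get("KB_PORT", "8000")),
        host=env.get("KB_HOST", "0.0.0.0"),
    )
```

With an empty environment, `load_settings({})` yields the defaults from the table. `KB_DATA_PATH` is left out on purpose because it is consumed by compose, not by the engine.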

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.

Releases 18

Engine engine-v3.2.4 Latest

2026-05-15 18:37:12 +01:00
