kb-search
Personal knowledge base with hybrid search (full-text + semantic vector search).
v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.
Architecture
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
- Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
- Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
Quick start
1. Start the engine
cd engine
# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
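Scripts that start the engine and then ingest documents may want to wait for readiness rather than check once. The following is a minimal sketch of polling the health endpoint, assuming the response shape shown above; the names `fetch_health` and `wait_until_healthy`, the retry counts, and the injectable `fetch` parameter are illustrative and not part of kb-search.

```python
import json
import time
import urllib.request


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(url="http://localhost:8000/api/v1/health",
                       attempts=30, delay=2.0, fetch=fetch_health) -> bool:
    """Poll until the engine reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Engine not listening yet, or the body was not valid JSON.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The first start can take longer than later ones because the model is downloaded, so a generous `attempts` value is probably sensible there.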
2. Install the client
Build from source:
cd client
make build # produces ./kb binary
Or cross-compile for all platforms:
make all # produces dist/kb-{os}-{arch} binaries
3. Configure the client
The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:
engine_url: http://localhost:8000
api_key: ""
default_format: human
Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
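The layering of defaults, config file, environment variables, and flags can be sketched as a simple merge. This is an illustration only: the precedence order (file, then environment, then flags) is an assumption, the helper name `resolve_config` is hypothetical, and the real client is written in Go. The flags dictionary is keyed by config name here, so `--engine` would arrive as `engine_url`.

```python
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variable names from the README; none is listed for the format.
ENV_KEYS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg: dict, env: dict, flags: dict) -> dict:
    """Merge settings; later layers win (assumed order: file < env < flags)."""
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    for key, var in ENV_KEYS.items():
        if env.get(var):
            cfg[key] = env[var]
    # A flag value of None means the flag was not given on the command line.
    cfg.update({k: v for k, v in flags.items()
                if k in DEFAULTS and v is not None})
    return cfg


cfg = resolve_config(
    file_cfg={"engine_url": "http://kb.internal:8000"},
    env={"KB_API_KEY": "secret"},
    flags={"default_format": "json"},
)
print(cfg)
```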
4. Use it
# Add documents (async — uploads and exits immediately)
kb add ~/docs/manual.pdf --tags admin
kb add ~/notes/ --recursive
kb add --note "Always restart nginx after config changes" --tags ops
# Check ingestion progress
kb jobs
# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf
# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb remove 3 --yes
kb status
How it works
- Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
- Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
- Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
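The fusion step in the Search bullet can be illustrated with a short sketch of Reciprocal Rank Fusion: each result list contributes 1 / (k + rank) for every chunk it contains, and the contributions are summed. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the function name `rrf_merge` and the sample ids; the engine's actual implementation may differ.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked lists of chunk ids; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = [3, 1, 7]    # hypothetical FTS5 result ids, best first
vector_hits = [1, 9, 3]  # hypothetical sqlite-vec result ids, best first
fused = rrf_merge([bm25_hits, vector_hits])
print([chunk_id for chunk_id, _ in fused])  # [1, 3, 9, 7]
```

Chunks found by both retrievers (1 and 3) rise above chunks found by only one, which is the point of the fusion. Because RRF works on ranks rather than raw scores, the BM25 and vector distance scales never need to be normalised against each other.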
Engine configuration
The engine is configured via environment variables, set either in the compose file or on the docker compose command line:
| Variable | Default | Description |
|---|---|---|
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable) |
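As an illustration of how these variables and defaults fit together, here is a minimal sketch of loading them into a settings object. The class and function names are hypothetical, treating an empty KB_API_KEY as "no authentication" is an assumption, and KB_PORT and KB_DATA_PATH are left out on the assumption that the compose file consumes them rather than the engine process.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]
    search_threshold: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> EngineSettings:
    """Read KB_* variables, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    device = env.get("KB_DEVICE", "auto")
    if device not in ("auto", "cpu", "cuda"):
        raise ValueError(f"KB_DEVICE must be auto, cpu, or cuda, got {device!r}")
    return EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=device,
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        # Assumption: an unset or empty key disables authentication.
        api_key=env.get("KB_API_KEY") or None,
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to test, for example `load_settings({"KB_DEVICE": "cpu"})`.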
Data portability
The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/
# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
API reference
All endpoints are under /api/v1/. When KB_API_KEY is set, every request except /health must send an Authorization: Bearer <key> header.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check (bypasses auth) |
| POST | /search | Hybrid search (JSON body) |
| POST | /jobs | Upload file/note for ingestion (multipart, returns 202) |
| GET | /jobs | List ingestion jobs |
| GET | /jobs/{id} | Job details |
| GET | /documents | List documents |
| GET | /documents/{id} | Document details with chunks |
| DELETE | /documents/{id} | Remove a document |
| PUT | /documents/{id}/tags | Add/remove tags |
| GET | /tags | List all tags |
| GET | /status | Engine status, GPU info, DB stats |
| POST | /reindex | Re-embed all chunks |
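For callers that talk to the engine without the Go client, a request to the search endpoint could be built roughly as follows. This is a sketch under assumptions: the README only says the endpoint takes a JSON body, so the `query` field name is a guess, and `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"


def build_search_request(engine_url: str, query: str, api_key: str = ""):
    """Build (but do not send) a POST request for the search endpoint."""
    # Assumption: the body schema is not documented; "query" is a placeholder.
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the engine was started with KB_API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        engine_url.rstrip("/") + API_PREFIX + "/search",
        data=body,
        headers=headers,
        method="POST",
    )


req = build_search_request("http://localhost:8000", "how to install git",
                           api_key="secret")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(req)` and decoding the JSON response.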
Building and releasing
Versioning is managed via client/VERSION and engine/VERSION files. The release script bumps these, builds all artifacts, tags, and publishes in one step.
Release
./release.sh --gitea # patch bump (e.g. 2.0.0 → 2.0.1), release via Gitea
./release.sh --github --minor # minor bump (e.g. 2.0.1 → 2.1.0), release via GitHub
./release.sh --gitea --major # major bump (e.g. 2.1.0 → 3.0.0)
./release.sh --gitea --no-increment # release current version as-is
./release.sh --gitea --dry-run # preview without doing anything
The script will:
- Bump the version in both client/VERSION and engine/VERSION (unless --no-increment)
- Build Go client binaries for all platforms (linux/darwin/windows, amd64/arm64)
- Build Docker engine images for NVIDIA and ROCm
- Commit the version bump, create an annotated git tag, and push
- Create a release (with client binaries attached) via tea or gh
- Push Docker images to the registry
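The bump rules behind the --major, --minor, and default patch options follow the usual semantic-versioning pattern: the bumped part goes up by one and every lower part resets to zero. A small sketch, using a hypothetical `bump_version` helper (release.sh itself is a shell script and its internals are not shown here):

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given bump type."""
    major, minor, patch = (int(n) for n in version.strip().split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {part!r}")


print(bump_version("2.0.0"))           # 2.0.1
print(bump_version("2.0.1", "minor"))  # 2.1.0
print(bump_version("2.1.0", "major"))  # 3.0.0
```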
Checking versions
# Client
kb --version
# Engine
curl http://localhost:8000/api/v1/status | jq .version
Docker images
Images are pushed to docker.dcglab.co.uk/dcg/kb/engine with tags:
- v2.1.0-nvidia / v2.1.0-rocm: versioned
- latest-nvidia / latest-rocm: latest release
Override the registry and org via environment variables:
REGISTRY=ghcr.io IMAGE_ORG=myorg ./release.sh --github
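The tag scheme can be written out explicitly. In this sketch the way REGISTRY and IMAGE_ORG combine into the image path is an assumption inferred from the default docker.dcglab.co.uk/dcg/kb/engine, and `image_tags` is a hypothetical helper rather than something release.sh provides.

```python
def image_tags(version: str, registry: str = "docker.dcglab.co.uk",
               org: str = "dcg") -> list:
    """List the full image references pushed for one release."""
    # Assumed layout: <registry>/<org>/kb/engine
    repo = f"{registry}/{org}/kb/engine"
    return [
        f"{repo}:{tag}-{variant}"
        for variant in ("nvidia", "rocm")
        for tag in (f"v{version}", "latest")
    ]


for ref in image_tags("2.1.0", registry="ghcr.io", org="myorg"):
    print(ref)
```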
Future: ROCm runtime migration
The onnxruntime-rocm execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the MIGraphX execution provider as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from onnxruntime-rocm to onnxruntime with the MIGraphX EP and install the migraphx runtime libraries instead.
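One way to make such a migration less abrupt is to choose the execution provider from whatever the installed onnxruntime build reports. The sketch below is an assumption about how that could look, not a description of the engine's code: `pick_providers` is a hypothetical helper, and the `available` list would come from `onnxruntime.get_available_providers()` in a real setup.

```python
# Provider names as used by onnxruntime; the preference order is an assumption.
PREFERRED = [
    "MIGraphXExecutionProvider",
    "ROCMExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Return the available providers in preference order, ending with CPU."""
    chosen = [name for name in PREFERRED if name in available]
    if "CPUExecutionProvider" not in chosen:
        # Always keep a CPU fallback so inference still works without a GPU.
        chosen.append("CPUExecutionProvider")
    return chosen


# Example: a newer AMD build that ships MIGraphX but no ROCm provider.
print(pick_providers(["MIGraphXExecutionProvider", "CPUExecutionProvider"]))
```

The resulting list could then be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.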
Claude Code skill
This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.