steve 9aab79d49b v2 restructure: Go client, Docker engine, release tooling

- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 21:52:25 +00:00


kb-search

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU

Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

Quick start

1. Start the engine

cd engine

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d

On first start the engine downloads the embedding model (about 90 MB) and loads it onto the GPU. To check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
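
Because the first start includes a model download, a script that launches the engine may want to wait for the health endpoint instead of calling it once. The following is a minimal sketch, assuming the response shape shown above; the function name `wait_until_healthy` and the timeout values are illustrative and not part of the project.

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://localhost:8000", timeout=120.0, interval=2.0):
    """Poll GET /api/v1/health until it reports healthy or the timeout expires.

    Hypothetical helper: the name and default timings are assumptions.
    """
    url = base_url.rstrip("/") + "/api/v1/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if json.load(resp).get("status") == "healthy":
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Engine not listening yet, or the body was not valid JSON: retry.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_until_healthy()` after `docker compose up -d` returns `True` once the engine answers, or `False` if it never becomes healthy within the timeout.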

2. Install the client

Build from source:

cd client
make build    # produces ./kb binary

Or cross-compile for all platforms:

make all      # produces dist/kb-{os}-{arch} binaries

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
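
The layering described above can be sketched as a small resolver. This is an illustration only: the precedence order (defaults, then file, then environment, then flags) is an assumption, since the README does not say whether flags beat environment variables, and `resolve_config` is a hypothetical name, not the Go client's actual code.

```python
# Defaults taken from the sample client.yaml above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README, mapped to config keys.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg, env, flags):
    """Merge config sources; later sources win (assumed order)."""
    cfg = dict(DEFAULTS)
    # Values from ~/.kb/client.yaml, ignoring unknown keys.
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    # Environment overrides.
    for key, var in ENV_VARS.items():
        if var in env:
            cfg[key] = env[var]
    # CLI flags (--engine, --api-key, --format); None means "not given".
    cfg.update({k: v for k, v in flags.items() if k in DEFAULTS and v is not None})
    return cfg
```

For example, `resolve_config({}, {"KB_ENGINE_URL": "http://gpu-box:8000"}, {"default_format": "json"})` yields the remote engine URL with JSON output and an empty API key.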

4. Use it

# Add documents (async — uploads and exits immediately)
kb add ~/docs/manual.pdf --tags admin
kb add ~/notes/ --recursive
kb add --note "Always restart nginx after config changes" --tags ops

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb remove 3 --yes
kb status

How it works

Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
Search: Hybrid retrieval that combines BM25 keyword scoring (FTS5) with vector similarity (sqlite-vec) and merges the two rankings via Reciprocal Rank Fusion. Queries return in under 100 ms once the model is warm.
Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
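
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It is not the engine's code: the constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, as are the function name and the tie-break on ID.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score.

    k=60 is a conventional default, assumed here; the engine may differ.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = [3, 1, 7]    # hypothetical FTS5 ranking, best first
vector_hits = [1, 9, 3]  # hypothetical sqlite-vec ranking, best first
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Items that appear in both lists (here 1 and 3) accumulate two contributions and rise above items found by only one retriever, which is the point of the technique. With `k = 60` a single first-place hit scores 1/61, roughly 0.016.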

Engine configuration

The engine is configured via environment variables (set in the compose file or via docker compose CLI):

| Variable | Default | Description |
|---|---|---|
| `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
| `KB_MODEL` | `all-MiniLM-L6-v2` | HuggingFace embedding model name |
| `KB_DEVICE` | `auto` | Embedding device: `auto`, `cpu`, or `cuda` |
| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device |
| `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
| `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
| `KB_PORT` | `8000` | Port to expose |
| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable) |
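
As an illustration of how the variables and defaults in the table above could be read inside the container, here is a minimal sketch. The `EngineSettings` class and `load_settings` function are hypothetical names, and the validation of `KB_DEVICE` is an assumption based on the three values the table lists; the engine's real settings code may look different.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical container-side settings; defaults mirror the table."""
    data_dir: str = "/data"
    model: str = "all-MiniLM-L6-v2"
    device: str = "auto"
    ingest_device: str = "auto"
    api_key: Optional[str] = None
    search_threshold: float = 0.01


def load_settings(env=None):
    """Build settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    device = env.get("KB_DEVICE", "auto")
    if device not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"unsupported KB_DEVICE: {device!r}")
    return EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=device,
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        api_key=env.get("KB_API_KEY") or None,  # empty string means "no auth"
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
    )
```

`KB_PORT` and `KB_DATA_PATH` are left out of the sketch on the assumption that they are consumed by the compose file on the host rather than by the engine process.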

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.

API reference

All endpoints are under /api/v1/. When KB_API_KEY is set, every request except /health must include an Authorization: Bearer <key> header.

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check (bypasses auth) |
| `POST` | `/search` | Hybrid search (JSON body) |
| `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202) |
| `GET` | `/jobs` | List ingestion jobs |
| `GET` | `/jobs/{id}` | Job details |
| `GET` | `/documents` | List documents |
| `GET` | `/documents/{id}` | Document details with chunks |
| `DELETE` | `/documents/{id}` | Remove a document |
| `PUT` | `/documents/{id}/tags` | Add/remove tags |
| `GET` | `/tags` | List all tags |
| `GET` | `/status` | Engine status, GPU info, DB stats |
| `POST` | `/reindex` | Re-embed all chunks |
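
For scripts that call the API directly instead of going through the CLI, a search request can be built as in the sketch below. Note the assumptions: the README only says that /search takes a JSON body, so the `query` field name is a guess at the schema, and `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(query, base_url="http://localhost:8000", api_key=None):
    """Build (but do not send) a POST /api/v1/search request.

    The {"query": ...} body shape is an assumption, not a documented schema.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the engine was started with KB_API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/search",
        data=body,
        headers=headers,
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)` and the response decoded with `json.load`; check the engine's OpenAPI page or source for the real request fields before relying on this.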

Building and releasing

Versioning is managed via client/VERSION and engine/VERSION files. The release script bumps these, builds all artifacts, tags, and publishes in one step.

Release

./release.sh --gitea              # patch bump (e.g. 2.0.0 → 2.0.1), release via Gitea
./release.sh --github --minor     # minor bump (e.g. 2.0.1 → 2.1.0), release via GitHub
./release.sh --gitea --major      # major bump (e.g. 2.1.0 → 3.0.0)
./release.sh --gitea --no-increment  # release current version as-is
./release.sh --gitea --dry-run    # preview without doing anything

The script will:

1. Bump the version in both client/VERSION and engine/VERSION (unless --no-increment)
2. Build Go client binaries for all platforms (linux/darwin/windows, amd64/arm64)
3. Build Docker engine images for NVIDIA and ROCm
4. Commit the version bump, create an annotated git tag, and push
5. Create a release (with client binaries attached) via tea or gh
6. Push Docker images to the registry
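
The bump rules implied by the examples above (patch by default, --minor and --major reset the lower parts, --no-increment leaves the version alone) can be sketched as follows. This is an illustration in Python, not the logic in release.sh itself, and `bump_version` is a hypothetical name.

```python
def bump_version(version, part="patch"):
    """Return the next MAJOR.MINOR.PATCH version for the given bump type.

    part is one of "patch", "minor", "major", or "none" (--no-increment).
    """
    major, minor, patch = (int(x) for x in version.strip().split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "none":
        return f"{major}.{minor}.{patch}"
    raise ValueError(f"unknown bump type: {part!r}")
```

The `strip()` call tolerates the trailing newline that a VERSION file would normally contain.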

Checking versions

# Client
kb --version

# Engine
curl http://localhost:8000/api/v1/status | jq .version

Docker images

Images are pushed to docker.dcglab.co.uk/dcg/kb/engine with tags:

v2.1.0-nvidia / v2.1.0-rocm — versioned
latest-nvidia / latest-rocm — latest release

Override the registry and org via environment variables:

REGISTRY=ghcr.io IMAGE_ORG=myorg ./release.sh --github
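
The tag scheme above can be written out as a small function. This sketch assumes that REGISTRY and IMAGE_ORG replace the first two components of the image path and that the rest (`kb/engine`) stays fixed; the README does not spell this out, and `image_tags` is a hypothetical name.

```python
def image_tags(version, registry="docker.dcglab.co.uk", org="dcg"):
    """List the image references a release would push (assumed layout)."""
    repo = f"{registry}/{org}/kb/engine"
    tags = []
    for variant in ("nvidia", "rocm"):
        tags.append(f"{repo}:v{version}-{variant}")  # versioned tag
        tags.append(f"{repo}:latest-{variant}")      # moving "latest" tag
    return tags
```

With the defaults, `image_tags("2.1.0")` starts with `docker.dcglab.co.uk/dcg/kb/engine:v2.1.0-nvidia`; passing `registry="ghcr.io", org="myorg"` mirrors the override shown above.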

Future: ROCm runtime migration

The ROCm execution provider (shipped as onnxruntime-rocm) was removed from onnxruntime as of v1.23, and AMD is steering users toward the MIGraphX execution provider as its replacement for GPU inference on ROCm. Upgrading onnxruntime beyond v1.22 will therefore require the ROCm Dockerfile to switch from onnxruntime-rocm to onnxruntime with the MIGraphX EP, and to install the migraphx runtime libraries instead.
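
One way to keep the engine code indifferent to this migration is to choose execution providers from whatever the installed onnxruntime build reports. The sketch below is an assumption about how that could look, not the engine's current code: the provider names are the standard onnxruntime identifiers, while the preference order and the `pick_providers` name are illustrative.

```python
# Preference order is an assumption: GPU providers first, CPU as fallback.
PREFERRED = (
    "CUDAExecutionProvider",
    "MIGraphXExecutionProvider",
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available):
    """Order the available providers by preference, always ending on CPU."""
    chosen = [name for name in PREFERRED if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

In a real session the input would come from `onnxruntime.get_available_providers()` and the result would be passed as the `providers` argument of `onnxruntime.InferenceSession`, so an image built with the MIGraphX EP would pick it up without code changes.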

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.
