kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
  • Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
  • Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
  • Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

Quick start

1. Start the engine

cd engine

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d

The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
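
Because the first start includes a model download, a script that depends on the engine may want to wait for readiness rather than check once. The following is a minimal sketch, not part of the project: the function name, timeout values, and the injectable `fetch` parameter are all illustrative assumptions. Only the `/api/v1/health` path and the `{"status": "healthy"}` response come from the text above.

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://localhost:8000", timeout_s=300.0,
                       interval_s=2.0, fetch=None):
    """Poll GET /api/v1/health until it reports "healthy" or the timeout expires.

    `fetch` is a hypothetical hook (url -> dict) so the loop can be exercised
    without a running engine; by default a real HTTP request is made.
    """
    url = base_url.rstrip("/") + "/api/v1/health"

    def http_fetch(u):
        with urllib.request.urlopen(u, timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))

    fetch = fetch or http_fetch
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # engine not reachable yet, or the body was not valid JSON
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_healthy()` with no arguments polls localhost:8000 every two seconds for up to five minutes and returns False if the engine never reports healthy.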

2. Install the client

Build from source:

cd client
make build    # produces ./kb binary

Or cross-compile for all platforms:

make all      # produces dist/kb-{os}-{arch} binaries

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
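
The layering described above can be illustrated with a short sketch. The real client is written in Go and its source is not shown here, so this Python version is only a model of the idea. In particular, the precedence order (defaults, then file, then environment, then flags) is an assumption, as are the function and dictionary names; the README names the settings and the variables but does not state which override wins.

```python
import os

# Defaults mirror the example client.yaml shown above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README; none is listed for the format.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_values=None, env=None, flags=None):
    """Merge settings, later layers winning.

    ASSUMED order: defaults < client.yaml < environment < CLI flags.
    `flags` is keyed by setting name; a value of None means "flag not given".
    """
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    config.update({k: v for k, v in (file_values or {}).items() if k in DEFAULTS})
    for key, var in ENV_VARS.items():
        if var in env:
            config[key] = env[var]
    config.update({k: v for k, v in (flags or {}).items()
                   if k in DEFAULTS and v is not None})
    return config
```

Passing `env` and `flags` explicitly keeps the function free of global state, which makes the assumed precedence easy to check in isolation.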

4. Use it

# Add documents (async — uploads and exits immediately)
kb add ~/docs/manual.pdf --tags admin
kb add ~/notes/ --recursive
kb add --note "Always restart nginx after config changes" --tags ops

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb remove 3 --yes
kb status

How it works

  • Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
  • Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
  • Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
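
To make "markdown by headers" from the Ingestion bullet concrete, here is a minimal sketch of header-based chunking. It is not the engine's chunker: the function name, the returned dictionary shape, and the handling of fenced code are illustrative assumptions, and the real implementation may also enforce size limits or carry parent headings.

```python
import re

# ATX headings only ("# Title" through "###### Title").
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example fence-safe


def chunk_markdown(text):
    """Split markdown into one chunk per heading.

    Text before the first heading becomes a chunk with heading None.
    Lines inside fenced code are never treated as headings.
    """
    chunks = []
    heading, lines = None, []
    in_fence = False

    def flush():
        body = "\n".join(lines).strip()
        if heading is not None or body:
            chunks.append({"heading": heading, "text": body})

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous chunk before starting a new one
            heading, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading alongside the body, so a search hit can be shown with the section it came from.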

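The Reciprocal Rank Fusion step named in the Search bullet can be sketched as follows. RRF scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 is the value commonly used in the literature; whether the engine uses it is an assumption, as are the function name and the tie-breaking rule.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list of (id, score) pairs.

    Each input list is ordered best-first; ranks start at 1.
    k=60 is a conventional default, ASSUMED here rather than taken from the engine.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Example: one list from keyword search, one from vector search.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF uses only rank positions, so the BM25 and vector-similarity scores never need to be put on a common scale.
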
Engine configuration

The engine is configured via environment variables (set in the compose file or via docker compose CLI):

| Variable | Default | Description |
|---|---|---|
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable) |
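
As an illustration of how these variables and defaults fit together, the sketch below loads them into a typed settings object. It is not the engine's configuration code: the class name, field names, and validation are assumptions. KB_PORT and KB_DATA_PATH are left out on the assumption that they are consumed by Docker Compose rather than by the engine process.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical container for the engine variables listed above."""
    data_dir: str = "/data"
    model: str = "all-MiniLM-L6-v2"
    device: str = "auto"
    ingest_device: str = "auto"
    api_key: Optional[str] = None
    search_threshold: float = 0.01

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        device = env.get("KB_DEVICE", cls.device)
        if device not in ("auto", "cpu", "cuda"):
            raise ValueError(f"KB_DEVICE must be auto, cpu, or cuda, got {device!r}")
        return cls(
            data_dir=env.get("KB_DATA_DIR", cls.data_dir),
            model=env.get("KB_MODEL", cls.model),
            device=device,
            ingest_device=env.get("KB_INGEST_DEVICE", cls.ingest_device),
            api_key=env.get("KB_API_KEY") or None,  # empty string means "no auth"
            search_threshold=float(env.get("KB_SEARCH_THRESHOLD", cls.search_threshold)),
        )
```

Rejecting an unknown KB_DEVICE value at load time surfaces a typo at startup instead of at the first embedding call.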

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.

API reference

All endpoints are under /api/v1/. When KB_API_KEY is set, every request except /health must include an Authorization: Bearer <key> header.

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check (bypasses auth) |
| POST | /search | Hybrid search (JSON body) |
| POST | /jobs | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
| GET | /jobs | List ingestion jobs |
| GET | /jobs/{id} | Job details |
| GET | /documents | List documents |
| GET | /documents/{id} | Document details with chunks |
| DELETE | /documents/{id} | Remove a document |
| PUT | /documents/{id}/tags | Add/remove tags |
| GET | /tags | List all tags |
| GET | /status | Engine status, GPU info, DB stats |
| POST | /reindex | Re-embed all chunks |
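
For callers that talk to the API directly instead of going through the CLI, the sketch below builds an authenticated search request. The `/api/v1/search` path, the POST method, the JSON body, and the Bearer header come from this section. The body field name `"query"` and the function name are assumptions, since the request schema is not documented here.

```python
import json
import urllib.request


def build_search_request(query, base_url="http://localhost:8000", api_key=None):
    """Build (but do not send) a POST /api/v1/search request."""
    # "query" is an ASSUMED field name; check the engine's schema before relying on it.
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the engine was started with KB_API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/search",
        data=body,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending lets the URL and headers be inspected without a running engine.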

Building and releasing

Versioning is managed via client/VERSION and engine/VERSION files. The release script bumps these, builds all artifacts, tags, and publishes in one step.

Release

./release.sh --gitea              # patch bump (e.g. 2.0.0 → 2.0.1), release via Gitea
./release.sh --github --minor     # minor bump (e.g. 2.0.1 → 2.1.0), release via GitHub
./release.sh --gitea --major      # major bump (e.g. 2.1.0 → 3.0.0)
./release.sh --gitea --no-increment  # release current version as-is
./release.sh --gitea --dry-run    # preview without doing anything

The script will:

  1. Bump the version in both client/VERSION and engine/VERSION (unless --no-increment)
  2. Build Go client binaries for all platforms (linux/darwin/windows, amd64/arm64)
  3. Build Docker engine images for NVIDIA and ROCm
  4. Commit the version bump, create an annotated git tag, and push
  5. Create a release (with client binaries attached) via tea or gh
  6. Push Docker images to the registry
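
The version bump in step 1 follows the usual major/minor/patch rules shown in the examples above. A minimal sketch of that arithmetic is given below. It is not taken from release.sh: the function name is hypothetical, and it assumes the VERSION files hold a plain MAJOR.MINOR.PATCH string with no prefix.

```python
def bump_version(version, part="patch"):
    """Return the next version string for the given bump type.

    ASSUMES `version` looks like "2.0.1" (three dot-separated integers).
    part="none" models --no-increment and returns the version unchanged.
    """
    major, minor, patch = (int(p) for p in version.strip().split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "none":
        return f"{major}.{minor}.{patch}"
    raise ValueError(f"unknown bump type: {part!r}")
```

A minor bump resets the patch number and a major bump resets both lower fields, which matches the 2.0.1 → 2.1.0 and 2.1.0 → 3.0.0 examples.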

Checking versions

# Client
kb --version

# Engine
curl http://localhost:8000/api/v1/status | jq .version

Docker images

Images are pushed to docker.dcglab.co.uk/dcg/kb/engine with tags:

  • v2.1.0-nvidia / v2.1.0-rocm — versioned
  • latest-nvidia / latest-rocm — latest release

Override the registry and org via environment variables:

REGISTRY=ghcr.io IMAGE_ORG=myorg ./release.sh --github

Future: ROCm runtime migration

The onnxruntime-rocm execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the MIGraphX execution provider as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from onnxruntime-rocm to onnxruntime with the MIGraphX EP and install the migraphx runtime libraries instead.

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.
