# kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.

## Architecture

```
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
```

- **Engine**: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
- **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount: just copy the data directory between hosts.

## Quick start

### 1. Start the engine

**From pre-built images** (recommended):

```bash
# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm
```

Or use a compose file. Create `compose.yaml`:

```yaml
services:
  kb-engine:
    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
    runtime: nvidia  # remove for ROCm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # For ROCm, replace the above runtime/deploy block with:
    # devices:
    #   - "/dev/kfd"
    #   - "/dev/dri"
    # group_add:
    #   - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped
```

```bash
KB_DATA_PATH=~/kb-data docker compose up -d
```

**From source** (for development):

```bash
cd engine

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
```

The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:

```bash
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
```

### 2. Install the client

Build from source:

```bash
cd client
make build    # produces ./kb binary
```

Or cross-compile for all platforms:

```bash
make all      # produces dist/kb-{os}-{arch} binaries
```

### 3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create `~/.kb/client.yaml`:

```yaml
engine_url: http://localhost:8000
api_key: ""
default_format: human
```

Override via environment variables (`KB_ENGINE_URL`, `KB_API_KEY`) or CLI flags (`--engine`, `--api-key`, `--format`).

### 4. Use it

```bash
# Quick notes (shorthand, no subcommand needed)
kb "Always restart nginx after config changes"
kb "Server room is building 3, floor 2" --tags ops

# Add files (async: uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf   # download original file
kb remove 3 --yes
kb status
```

## How it works

- **Ingestion**: Files are uploaded to the engine and queued for async processing.
  The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
- **Search**: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
- **Output**: JSON (for scripts/LLM tool use) or human-readable terminal format. Use `--format json` on any command.

## Engine configuration

The engine is configured via environment variables (set in the compose file or via `docker compose` CLI):

| Variable | Default | Description |
|---|---|---|
| `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
| `KB_MODEL` | `all-MiniLM-L6-v2` | HuggingFace embedding model name |
| `KB_DEVICE` | `auto` | Embedding/search device: `auto`, `cpu`, or `cuda` |
| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device: `auto`, `cpu`, or `cuda` |
| `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
| `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
| `KB_PORT` | `8000` | Port to expose |
| `KB_HOST` | `0.0.0.0` | Host to bind to |
| `HF_HUB_OFFLINE` | (none) | Set to `1` to prevent model downloads (use cached only) |
| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable, not used by engine) |

## Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

```bash
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
```

Data is GPU-vendor-agnostic: you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.

## API reference

All endpoints are under `/api/v1/`. Requests require an `Authorization: Bearer <token>` header when `KB_API_KEY` is set.
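
For scripts that talk to the engine directly instead of going through the Go client, the auth rule above might look like the following. This is a minimal sketch using only the Python standard library; `build_request` and `fetch_json` are hypothetical helper names, not part of kb, and the shape of the JSON responses is not specified here.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"  # all endpoints live under this prefix


def build_request(engine_url, endpoint, api_key=""):
    """Build a GET request for an engine endpoint.

    The Authorization header is only added when a key is supplied,
    mirroring an engine started without KB_API_KEY.
    """
    url = engine_url.rstrip("/") + API_PREFIX + endpoint
    request = urllib.request.Request(url, method="GET")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


def fetch_json(request, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an engine on localhost:8000, `fetch_json(build_request("http://localhost:8000", "/status", "your-secret-key"))` would return the decoded status document. `/health` can be called without a key because it bypasses auth.
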
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check (bypasses auth) |
| `POST` | `/search` | Hybrid search (JSON body) |
| `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202 or 409 if duplicate) |
| `GET` | `/jobs` | List ingestion jobs |
| `GET` | `/jobs/{id}` | Job details |
| `GET` | `/documents` | List documents |
| `GET` | `/documents/{id}` | Document details with chunks |
| `GET` | `/documents/{id}/file` | Download original file |
| `DELETE` | `/documents/{id}` | Remove a document (and stored file) |
| `PUT` | `/documents/{id}/tags` | Add/remove tags |
| `GET` | `/tags` | List all tags |
| `GET` | `/status` | Engine status, GPU info, DB stats |
| `POST` | `/reindex` | Re-embed all chunks |

## Building and releasing

Client and engine are versioned independently via `client/VERSION` and `engine/VERSION`. Each has its own release script and git tag prefix.

### Release client

```bash
./release-client.sh --gitea                  # patch bump, release via Gitea
./release-client.sh --github --minor         # minor bump, release via GitHub
./release-client.sh --gitea --no-increment   # release current version as-is
./release-client.sh --gitea --dry-run        # preview without doing anything
```

Creates tag `client-vX.Y.Z`, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.

The client embeds a `MinEngineVersion` (from `client/MIN_ENGINE_VERSION`) and will hard-fail if the connected engine is too old.

### Release engine

```bash
./release-engine.sh --gitea                  # patch bump, release via Gitea
./release-engine.sh --github --minor         # minor bump, release via GitHub
./release-engine.sh --gitea --no-increment   # release current version as-is
./release-engine.sh --gitea --dry-run        # preview without doing anything
```

Creates tag `engine-vX.Y.Z`, builds NVIDIA and ROCm Docker images, creates a Gitea/GitHub release, and pushes images to the registry.
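
Because client and engine versions move independently, the `MinEngineVersion` gate amounts to comparing two `X.Y.Z` versions. The sketch below illustrates that comparison only; the real check lives in the Go client and may differ. `parse_version` and `engine_is_compatible` are hypothetical names, and the version format is assumed to be three dot-separated integers with an optional tag prefix.

```python
def parse_version(text):
    """Turn 'X.Y.Z' (optionally prefixed 'client-v', 'engine-v' or 'v') into a tuple of ints."""
    text = text.strip()
    for prefix in ("client-v", "engine-v", "v"):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def engine_is_compatible(engine_version, min_engine_version):
    """Return True when the engine is at least as new as the client's minimum."""
    # Tuples compare element by element, so 2.0.10 correctly ranks above 2.0.6,
    # which a plain string comparison would get wrong.
    return parse_version(engine_version) >= parse_version(min_engine_version)
```

Comparing parsed integers rather than raw strings is the one design point worth noting: it keeps multi-digit components such as `2.0.10` ordered correctly.
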

### Checking versions

```bash
# Client
kb --version

# Engine
curl http://localhost:8000/api/v1/status | jq .version
```

### Docker images

Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:

- `engine-v2.0.6-nvidia` / `engine-v2.0.6-rocm`: versioned
- `latest-nvidia` / `latest-rocm`: latest release

Override the registry and org via environment variables:

```bash
REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github
```

## Future: ROCm runtime migration

The `onnxruntime-rocm` execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the **MIGraphX execution provider** as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from `onnxruntime-rocm` to `onnxruntime` with the MIGraphX EP and install the `migraphx` runtime libraries instead.

## Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See `SKILL.md` for the skill definition.
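
Any wrapper, skill or otherwise, mostly needs to invoke the CLI with `--format json` and parse the result. The following is a rough sketch of such a wrapper, not the contents of `SKILL.md`. `build_search_command` and `run_search` are hypothetical names, the `kb` binary is assumed to be on `PATH`, and the structure of the JSON output is not specified here, so the result is returned as parsed without further interpretation.

```python
import json
import subprocess


def build_search_command(query, tags=None, kb_binary="kb"):
    """Assemble the argv for a JSON-formatted search, optionally filtered by tags."""
    command = [kb_binary, "search", query, "--format", "json"]
    if tags:
        command += ["--tags", tags]
    return command


def run_search(query, tags=None, kb_binary="kb"):
    """Run the search through the CLI and return the parsed JSON output."""
    completed = subprocess.run(
        build_search_command(query, tags, kb_binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the query contains spaces or quotes.
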