steve 7f4decee26 Reindex command, implicit note shorthand, add→addfile rename
- Add `kb reindex` command with confirmation prompt and --yes flag
- Add implicit note shorthand: `kb "my note"` submits a note directly
- Rename `add` to `addfile`, remove --note/--title/--type flags
- Add client-side file extension validation before upload
- Add `kb examples` command for common usage patterns
- Update README, SKILL.md, and main specs
- Archive completed changes and sync delta specs

BREAKING: `kb add` renamed to `kb addfile`, `kb add --note` replaced by `kb "text"`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 13:58:04 +01:00

kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
  • Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
  • Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
  • Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

Quick start

1. Start the engine

From pre-built images (recommended):

# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

Or use a compose file — create compose.yaml:

services:
  kb-engine:
    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
    runtime: nvidia  # remove for ROCm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # For ROCm, replace the above runtime/deploy block with:
    # devices:
    #   - "/dev/kfd"
    #   - "/dev/dri"
    # group_add:
    #   - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped

Then start it:

KB_DATA_PATH=~/kb-data docker compose up -d

From source (for development):

cd engine

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d

On first start the engine downloads the embedding model (~90 MB) and loads it onto the GPU. Check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
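
Because the first start includes a model download, a script that depends on the engine may want to wait for the health check rather than call it once. The following is a minimal sketch of such a wait loop, assuming the endpoint and response shown above; the function names, attempt count, and delay are illustrative and not part of kb.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """GET /api/v1/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(base_url="http://localhost:8000", attempts=30, delay=2.0,
                       fetch=fetch_health, sleep=time.sleep) -> bool:
    """Poll the health endpoint until it reports "healthy" or attempts run out.

    The retry policy (attempts, delay) is an assumption for illustration.
    """
    for _ in range(attempts):
        try:
            if fetch(base_url).get("status") == "healthy":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # engine not listening yet, or the body was not valid JSON
        sleep(delay)
    return False
```

Calling wait_until_healthy() before the first kb command keeps a provisioning script from racing the model download.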

2. Install the client

Build from source:

cd client
make build    # produces ./kb binary

Or cross-compile for all platforms:

make all      # produces dist/kb-{os}-{arch} binaries

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
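
The layering of these sources can be sketched as follows. This is an illustration only: the README does not state the precedence, so the order used here (CLI flags over environment variables over client.yaml over built-in defaults) is an assumption based on common CLI convention, and resolve_config is a hypothetical helper, not the client's actual code.

```python
import os

# Defaults taken from the sample client.yaml above.
DEFAULTS = {"engine_url": "http://localhost:8000", "api_key": "", "default_format": "human"}

# Environment variables named in the README; none is listed for the format.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg=None, env=None, flags=None) -> dict:
    """Merge settings; later sources win (assumed order: file < env < flags)."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    # Values parsed from ~/.kb/client.yaml (parsing itself is out of scope here).
    cfg.update({k: v for k, v in (file_cfg or {}).items() if k in DEFAULTS})
    # Non-empty environment variables override the file.
    for key, var in ENV_VARS.items():
        if env.get(var):
            cfg[key] = env[var]
    # Flags such as --engine, --api-key, --format; None means "not given".
    cfg.update({k: v for k, v in (flags or {}).items()
                if v is not None and k in DEFAULTS})
    return cfg
```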

4. Use it

# Quick notes (shorthand — no subcommand needed)
kb "Always restart nginx after config changes"
kb "Server room is building 3, floor 2" --tags ops

# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf    # download original file
kb remove 3 --yes
kb status

How it works

  • Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
  • Search: Hybrid retrieval combines BM25 keyword scoring (FTS5) with vector similarity (sqlite-vec), and the two rankings are merged via Reciprocal Rank Fusion. Queries return in under 100 ms with a warm model.
  • Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
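
To make the merge step in the search bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion. Each result list contributes 1 / (k + rank) to an item's score, and items are ordered by the summed score. The constant k = 60 is the value commonly used for RRF and is an assumption here, as are the chunk IDs; the engine's actual constant and data shapes are not documented in this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score), best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["c7", "c2", "c9"]  # hypothetical BM25 order from FTS5
vector_hits = ["c2", "c4", "c7"]   # hypothetical similarity order from sqlite-vec
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c2', 'c7', 'c4', 'c9']
```

Chunks that appear in both lists (c2, c7) outrank those found by only one retriever, which is the point of fusing by rank rather than by raw score: BM25 and vector distances are on different scales and cannot be added directly.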

Engine configuration

The engine is configured via environment variables, set either in the compose file or in the shell when invoking docker compose:

| Variable | Default | Description |
| --- | --- | --- |
| KB_DATA_DIR | /data | Data directory inside the container (bind-mounted) |
| KB_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model name |
| KB_DEVICE | auto | Embedding/search device: auto, cpu, or cuda |
| KB_INGEST_DEVICE | auto | Docling layout detection device: auto, cpu, or cuda |
| KB_API_KEY | (none) | Optional Bearer token for API authentication |
| KB_SEARCH_THRESHOLD | 0.01 | Minimum score for search results (filters noise) |
| KB_PORT | 8000 | Port to expose |
| KB_HOST | 0.0.0.0 | Host to bind to |
| HF_HUB_OFFLINE | (none) | Set to 1 to prevent model downloads (use cached only) |
| KB_DATA_PATH | ./data | Host path for bind mount (compose variable, not used by engine) |
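
As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. It is not the engine's actual code: EngineSettings and load_settings are hypothetical names, and treating an empty KB_API_KEY as "no authentication" is an assumption (the compose file above passes an empty string when the variable is unset). KB_DATA_PATH is omitted because it is consumed by compose, not by the engine.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]
    search_threshold: float
    port: int
    host: str
    hf_hub_offline: bool


def load_settings(env=None) -> EngineSettings:
    """Read engine variables from a mapping, falling back to documented defaults."""
    env = os.environ if env is None else env
    return EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        api_key=env.get("KB_API_KEY") or None,  # assumption: empty means no auth
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(env.get("KB_PORT", "8000")),
        host=env.get("KB_HOST", "0.0.0.0"),
        hf_hub_offline=env.get("HF_HUB_OFFLINE", "") == "1",
    )
```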

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.

API reference

All endpoints are under /api/v1/. When KB_API_KEY is set, every request except the health check must include an Authorization: Bearer <key> header.

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check (bypasses auth) |
| POST | /search | Hybrid search (JSON body) |
| POST | /jobs | Upload file/note for ingestion (multipart; returns 202, or 409 if duplicate) |
| GET | /jobs | List ingestion jobs |
| GET | /jobs/{id} | Job details |
| GET | /documents | List documents |
| GET | /documents/{id} | Document details with chunks |
| GET | /documents/{id}/file | Download original file |
| DELETE | /documents/{id} | Remove a document (and stored file) |
| PUT | /documents/{id}/tags | Add/remove tags |
| GET | /tags | List all tags |
| GET | /status | Engine status, GPU info, DB stats |
| POST | /reindex | Re-embed all chunks |
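
For callers that do not use the Go client, a request against these endpoints can be assembled with the standard library alone. The sketch below builds (but does not send) a search request with the Bearer header. The build_request helper is hypothetical, and the JSON field name "query" is an assumption: the README says /search takes a JSON body but does not document its schema.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"


def build_request(base_url, method, path, api_key="", body=None):
    """Build a urllib Request for an engine endpoint, adding auth if a key is set."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    url = base_url.rstrip("/") + API_PREFIX + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_request(
    "http://localhost:8000", "POST", "/search",
    api_key="your-secret-key",
    body={"query": "how to install git"},  # "query" is an assumed field name
)
# Sending it needs a running engine:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```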

Building and releasing

Client and engine are versioned independently via client/VERSION and engine/VERSION. Each has its own release script and git tag prefix.

Release client

./release-client.sh --gitea              # patch bump, release via Gitea
./release-client.sh --github --minor     # minor bump, release via GitHub
./release-client.sh --gitea --no-increment  # release current version as-is
./release-client.sh --gitea --dry-run    # preview without doing anything

Creates tag client-vX.Y.Z, builds Go binaries for all platforms, and creates a Gitea/GitHub release with binaries attached.

The client embeds a MinEngineVersion (from client/MIN_ENGINE_VERSION) and will hard-fail if the connected engine is too old.
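
The compatibility gate amounts to a numeric comparison of version components. The sketch below shows the idea in Python for illustration; the real check lives in the Go client, the function names here are hypothetical, and plain X.Y.Z versions (optionally prefixed with "v") are assumed.

```python
def parse_version(text):
    """Turn "v2.0.6" or "2.0.6" into (2, 0, 6)."""
    parts = text.strip().lstrip("v").split(".")
    return tuple(int(p) for p in parts)


def engine_is_compatible(engine_version, min_engine_version):
    """True when the engine is at least the minimum version the client needs."""
    return parse_version(engine_version) >= parse_version(min_engine_version)
```

Comparing integer tuples rather than strings matters: as strings, "2.0.10" sorts before "2.0.6", while as tuples (2, 0, 10) is correctly newer than (2, 0, 6). Pre-release suffixes such as "-rc1" are not handled by this sketch and would raise ValueError.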

Release engine

./release-engine.sh --gitea              # patch bump, release via Gitea
./release-engine.sh --github --minor     # minor bump, release via GitHub
./release-engine.sh --gitea --no-increment  # release current version as-is
./release-engine.sh --gitea --dry-run    # preview without doing anything

Creates tag engine-vX.Y.Z, builds NVIDIA and ROCm Docker images, creates a Gitea/GitHub release, and pushes images to the registry.

Checking versions

# Client
kb --version

# Engine
curl http://localhost:8000/api/v1/status | jq .version

Docker images

Images are pushed to docker.dcglab.co.uk/dcg/kb/engine with tags:

  • engine-v2.0.6-nvidia / engine-v2.0.6-rocm — versioned
  • latest-nvidia / latest-rocm — latest release

Override the registry and org via environment variables:

REGISTRY=ghcr.io IMAGE_ORG=myorg ./release-engine.sh --github

Future: ROCm runtime migration

The onnxruntime-rocm execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the MIGraphX execution provider as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from onnxruntime-rocm to onnxruntime with the MIGraphX EP and install the migraphx runtime libraries instead.

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.