steve/kb

steve afbe270181 Replace implicit note shorthand with explicit addnote command and split README

Two changes:

1. structured-add-commands: The implicit note shorthand (kb "text") caused
   accidental note creation from mistyped commands. Replaced with explicit
   kb addnote <text> command. Root command reverts to standard Cobra
   behaviour. Updated examples, tests, SKILL.md, and specs.

2. split-readme-developer-docs: Moved build-from-source instructions, release
   process, API reference, and ROCm migration notes from README.md into a
   new DEVELOPER.md. README now links to DEVELOPER.md for dev workflows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-31 20:48:22 +01:00

Path	Last commit	Date
.claude	Add GPU device control, Docker support, and v2 client-server design	2026-03-25 20:17:31 +00:00
client	Replace implicit note shorthand with explicit addnote command and split README	2026-03-31 20:48:22 +01:00
engine	Add dev-up script and archive kb-title-in-chunks change	2026-03-30 07:25:22 +01:00
openspec	Replace implicit note shorthand with explicit addnote command and split README	2026-03-31 20:48:22 +01:00
.gitignore	Added pycache to gitignore	2026-03-30 07:26:16 +01:00
DEVELOPER.md	Replace implicit note shorthand with explicit addnote command and split README	2026-03-31 20:48:22 +01:00
README.md	Replace implicit note shorthand with explicit addnote command and split README	2026-03-31 20:48:22 +01:00
release-client.sh	Independent client/engine versioning with compatibility check	2026-03-28 15:59:16 +00:00
release-engine.sh	Independent client/engine versioning with compatibility check	2026-03-28 15:59:16 +00:00
SKILL.md	Replace implicit note shorthand with explicit addnote command and split README	2026-03-31 20:48:22 +01:00

README.md

kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a FastAPI engine running in Docker (with GPU acceleration) and a lightweight Go CLI client that talks to it over HTTP.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU

Engine: Keeps the embedding model warm in GPU memory. Handles search, ingestion, and document management via REST API. Runs in Docker with NVIDIA or AMD GPU support.
Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.
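
To make the keyword half of the storage layer concrete, here is a minimal sketch of an FTS5 table queried with SQLite's built-in BM25 ranking. The table name `chunks`, its columns, and the `keyword_search` helper are hypothetical; the engine's real schema is not shown in this README, and the sqlite-vec side is omitted. It assumes your Python's SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical schema for illustration only; not the engine's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(title, body)")
conn.executemany(
    "INSERT INTO chunks (title, body) VALUES (?, ?)",
    [
        ("nginx", "Always restart nginx after config changes"),
        ("ops", "Server room is building 3, floor 2"),
    ],
)


def keyword_search(conn, query, limit=5):
    """Return (title, body) rows ranked by FTS5's built-in BM25 score."""
    # bm25() returns smaller (more negative) values for better matches,
    # so ascending order puts the best match first.
    sql = (
        "SELECT title, body FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


hits = keyword_search(conn, "nginx")
```

Because the whole index lives in one SQLite file, copying that file is enough to move the keyword index to another host.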

Quick start

1. Start the engine

From pre-built images (recommended):

# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

Or use a compose file — create compose.yaml:

services:
  kb-engine:
    image: docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia  # or latest-rocm
    runtime: nvidia  # remove for ROCm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # For ROCm, replace the above runtime/deploy block with:
    # devices:
    #   - "/dev/kfd"
    #   - "/dev/dri"
    # group_add:
    #   - "video"
    ports:
      - "${KB_PORT:-8000}:8000"
    volumes:
      - ${KB_DATA_PATH:-./data}:/data
    environment:
      - KB_MODEL=${KB_MODEL:-all-MiniLM-L6-v2}
      - KB_DEVICE=${KB_DEVICE:-auto}
      - KB_INGEST_DEVICE=${KB_INGEST_DEVICE:-auto}
      - KB_API_KEY=${KB_API_KEY:-}
      - KB_SEARCH_THRESHOLD=${KB_SEARCH_THRESHOLD:-0.01}
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-}
    restart: unless-stopped

KB_DATA_PATH=~/kb-data docker compose up -d

See DEVELOPER.md to run the engine from source.

The engine will download the embedding model on first start (~90MB) and load it onto the GPU. Check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
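
If you script the setup, you may want to wait for the engine instead of checking by hand. The sketch below polls the health endpoint shown above using only the Python standard library. The helper names, timeout, and polling interval are illustrative assumptions, and it assumes the health endpoint does not require the API key.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True if the body matches the {"status": "healthy"} shape shown above."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_engine(base_url="http://localhost:8000", timeout_s=120.0, interval_s=2.0):
    """Poll /api/v1/health until the engine reports healthy or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/api/v1/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            # Engine is not accepting connections yet, for example while
            # it is still downloading the model on first start.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_engine()` before the first `kb` command avoids connection errors during the initial model download.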

2. Install the client

From a release (recommended):

Check releases for the latest client tag, then:

# Set the version tag
TAG=client-v2.1.0

# Download the binary for your platform (run only one of the curl commands below).
# -f makes curl fail on an HTTP error instead of saving the error page as "kb".

# Linux (amd64)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64"

# Linux (arm64)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64"

# macOS (Apple Silicon)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64"

# macOS (Intel)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64"

# Then install
chmod +x kb
sudo mv kb /usr/local/bin/

See DEVELOPER.md to build the client from source.

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
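
This README does not state which source wins when a setting appears in more than one place. The sketch below assumes the common convention of defaults, then config file, then environment, then CLI flags, with later sources overriding earlier ones. The key names and environment variables come from the text above; the default values, precedence order, and the `resolve_config` helper are assumptions, not the client's actual implementation.

```python
# Assumed defaults, taken from the example client.yaml above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in this README, mapped to config keys.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_cfg, env, flags):
    """Merge settings; assumed order: defaults < file < environment < flags."""
    cfg = dict(DEFAULTS)
    # Values from ~/.kb/client.yaml (already parsed into a dict).
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    # Non-empty environment variables override the file.
    for key, var in ENV_VARS.items():
        if env.get(var):
            cfg[key] = env[var]
    # Flags that were actually passed (not None) override everything.
    cfg.update({k: v for k, v in flags.items() if k in DEFAULTS and v is not None})
    return cfg
```

For example, `resolve_config({}, {"KB_ENGINE_URL": "http://gpu-box:8000"}, {})` keeps the default format and API key but points the client at another host.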

4. Use it

# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops

# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf    # download original file
kb remove 3 --yes
kb status
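
For scripting, the same commands can be driven from Python. This is a minimal sketch that builds a `kb search` invocation using only the flags shown in this README (`--tags`, `--type`, `--format`). The helper names are hypothetical, and the structure of the JSON the CLI prints is not documented here, so the result is returned as parsed but uninterpreted data.

```python
import json
import subprocess


def search_argv(query, tags=None, doc_type=None):
    """Build the argv for `kb search` with JSON output."""
    argv = ["kb", "search", query, "--format", "json"]
    if tags:
        argv += ["--tags", tags]
    if doc_type:
        argv += ["--type", doc_type]
    return argv


def run_search(query, **kwargs):
    """Run the search and parse stdout as JSON (requires kb on PATH)."""
    proc = subprocess.run(
        search_argv(query, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Passing the query as a single list element avoids shell quoting problems with multi-word searches.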

How it works

Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on GPU, and stores everything in SQLite.
Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
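
Reciprocal Rank Fusion is simple enough to sketch in a few lines. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The version below is a generic illustration, not the engine's code: the constant k = 60 is the value commonly used in the literature, and the chunk IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = [3, 1, 7]    # hypothetical chunk IDs from the keyword query
vector_hits = [1, 9, 3]  # hypothetical chunk IDs from the vector query
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because only ranks are used, the BM25 and vector-similarity scores never need to be normalised onto a common scale, which is the main reason to prefer this kind of fusion over a weighted sum.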

Engine configuration

The engine is configured via environment variables. Set them in the compose file, in the shell environment when invoking docker compose (as with KB_DATA_PATH above), or with -e flags on docker run:

Variable	Default	Description
`KB_DATA_DIR`	`/data`	Data directory inside the container (bind-mounted)
`KB_MODEL`	`all-MiniLM-L6-v2`	HuggingFace embedding model name
`KB_DEVICE`	`auto`	Embedding/search device: `auto`, `cpu`, or `cuda`
`KB_INGEST_DEVICE`	`auto`	Docling layout detection device: `auto`, `cpu`, or `cuda`
`KB_API_KEY`	(none)	Optional Bearer token for API authentication
`KB_SEARCH_THRESHOLD`	`0.01`	Minimum score for search results (filters noise)
`KB_PORT`	`8000`	Port to expose
`KB_HOST`	`0.0.0.0`	Host to bind to
`HF_HUB_OFFLINE`	(none)	Set to `1` to prevent model downloads (use cached only)
`KB_DATA_PATH`	`./data`	Host path for bind mount (compose variable, not used by engine)
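
As an illustration of how the table maps onto code, the sketch below reads these variables into a typed settings object with the documented defaults. The `EngineSettings` class and `load_settings` function are hypothetical and are not the engine's actual configuration loader. `KB_DATA_PATH` is left out because it is only used by compose.

```python
import os
from dataclasses import dataclass

ALLOWED_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: str
    search_threshold: float
    port: int
    host: str
    hf_hub_offline: bool


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default) using the table's defaults."""
    env = os.environ if env is None else env
    settings = EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        api_key=env.get("KB_API_KEY", ""),
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(env.get("KB_PORT", "8000")),
        host=env.get("KB_HOST", "0.0.0.0"),
        hf_hub_offline=env.get("HF_HUB_OFFLINE", "") == "1",
    )
    # Reject device names outside the values listed in the table.
    for name, value in (
        ("KB_DEVICE", settings.device),
        ("KB_INGEST_DEVICE", settings.ingest_device),
    ):
        if value not in ALLOWED_DEVICES:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED_DEVICES)}, got {value!r}")
    return settings
```

Validating the device names at startup turns a typo such as `KB_DEVICE=gpu` into an immediate error rather than a silent fallback.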

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose up -d

Data is GPU-vendor-agnostic — you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.

Releases 18

Engine engine-v3.2.4 Latest

2026-05-15 18:37:12 +01:00
