steve/kb

kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

Client-server architecture: a FastAPI engine running in Docker (with optional GPU acceleration), a lightweight Go CLI client, and an MCP server for native agent integration.

Architecture

Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
                            ▲
MCP Agents  ──MCP/HTTP──▶ MCP Server (Docker) ──┘

Engine: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
Client: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
MCP Server: Exposes kb operations as native MCP tools over Streamable HTTP. Runs as a separate Docker container alongside the engine. Supports collections for scoping agent memory vs user documents.
Storage: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount — just copy the data directory between hosts.

Quick start

1. Start the engine

From pre-built images (recommended):

# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

# CPU only (no GPU required — smaller image)
docker run -d --name kb-engine \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu

Or use a compose file from the repo:

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d

# CPU only
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d

See DEVELOPER.md to run the engine from source.

On first start, the engine downloads the embedding model (~90 MB) and loads it into memory (GPU or CPU). Check readiness:

curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
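When scripting a deployment, it can help to wait for the health endpoint instead of checking by hand. The following is a minimal Python sketch, not part of kb itself: the function names, retry counts, and the idea of sending the API key as a Bearer token to the health endpoint are assumptions. Only the /api/v1/health path and the {"status": "healthy"} body come from the example above.

```python
import json
import time
import urllib.request


def fetch_health(base_url, api_key=None, timeout=5.0):
    """GET /api/v1/health and return the decoded JSON body."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/health")
    if api_key:
        # Assumption: the health endpoint accepts the same Bearer token as the API.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(fetch, attempts=30, delay=2.0, sleep=time.sleep):
    """Call fetch() until it reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or a non-JSON body:
            # treat all of these as "not ready yet" and retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be wait_until_healthy(lambda: fetch_health("http://localhost:8000")). The fetch function is passed in so the retry loop can be exercised without a running engine.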

2. Install the client

From a release (recommended):

Check releases for the latest client tag, then:

# Set the version tag
TAG=client-v3.0.0

# Download the binary for your platform (run only the matching line).
# -f makes curl fail on an HTTP error instead of saving the error page as kb.

# Linux (amd64)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64"

# Linux (arm64)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64"

# macOS (Apple Silicon)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64"

# macOS (Intel)
curl -fL -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64"

# Then install
chmod +x kb
sudo mv kb /usr/local/bin/

See DEVELOPER.md to build the client from source.

3. Configure the client

The client works with zero configuration if the engine is on localhost:8000. To customise, create ~/.kb/client.yaml:

engine_url: http://localhost:8000
api_key: ""
default_format: human

Override via environment variables (KB_ENGINE_URL, KB_API_KEY) or CLI flags (--engine, --api-key, --format).
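The layering of defaults, config file, environment variables, and flags can be sketched as below. This is an illustration in Python, not the Go client's actual code. The precedence order (flags over environment over file over defaults) is an assumption based on common CLI convention, and resolve_config and its arguments are hypothetical names. The defaults and the variable names KB_ENGINE_URL and KB_API_KEY are taken from the text above.

```python
import os

# Built-in defaults, matching the sample client.yaml above.
DEFAULTS = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables named in the README, mapped to config keys.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(file_values=None, env=None, flags=None):
    """Merge settings; later layers win (assumed order: file, env, flags)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    config.update(file_values or {})  # values parsed from ~/.kb/client.yaml
    for key, var in ENV_VARS.items():
        if var in env:
            config[key] = env[var]
    for key, value in (flags or {}).items():
        if value is not None:  # None means the flag was not given
            config[key] = value
    return config
```

Here flags is keyed by config name, so --engine maps to engine_url, --api-key to api_key, and --format to default_format.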

4. Use it

# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops

# Add files (async — uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Update a note in place
kb updatenote 42 "revised note content"

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf    # download original file
kb remove 3 --yes
kb status

How it works

Ingestion: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on the configured device (GPU or CPU), and stores everything in SQLite.
Search: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Queries return in under 100 ms once the model is warm.
Output: JSON (for scripts/LLM tool use) or human-readable terminal format. Use --format json on any command.
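To illustrate the merge step, here is a minimal Python sketch of Reciprocal Rank Fusion. It is the textbook form of the technique, not necessarily how the engine implements it: each document scores the sum of 1 / (k + rank) over every result list it appears in. The function name and the constant k = 60 (a commonly used value) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs, each ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = [3, 1, 2]  # hypothetical BM25 order
vector_hits = [1, 4, 3]   # hypothetical vector-similarity order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists (1 and 3 here) rise to the top, because RRF uses only rank positions and so never has to compare BM25 scores with vector distances directly.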

Engine configuration

The engine is configured via environment variables (set in the compose file, exported before running docker compose, or passed with -e to docker run):

Variable	Default	Description
`KB_DATA_DIR`	`/data`	Data directory inside the container (bind-mounted)
`KB_MODEL`	`all-MiniLM-L6-v2`	HuggingFace embedding model name
`KB_DEVICE`	`auto`	Embedding/search device: `auto`, `cpu`, or `cuda`
`KB_INGEST_DEVICE`	`auto`	Docling layout detection device: `auto`, `cpu`, or `cuda`
`KB_API_KEY`	(none)	Optional Bearer token for API authentication
`KB_SEARCH_THRESHOLD`	`0.01`	Minimum score for search results (filters noise)
`KB_PORT`	`8000`	Port to expose
`KB_HOST`	`0.0.0.0`	Host to bind to
`HF_HUB_OFFLINE`	(none)	Set to `1` to prevent model downloads (use cached only)
`KB_DATA_PATH`	`./data`	Host path for bind mount (compose variable, not used by engine)
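As a reading aid for the table above, this Python sketch loads the same variables with the same defaults. It is not the engine's actual settings code: the EngineSettings class, the load_settings function, and the device validation are assumptions. KB_DATA_PATH and HF_HUB_OFFLINE are left out because, per the table, the first is a compose variable and the second belongs to the Hugging Face libraries.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = ("auto", "cpu", "cuda")


@dataclass(frozen=True)
class EngineSettings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]
    search_threshold: float
    port: int
    host: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> EngineSettings:
    """Read engine settings from env, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = EngineSettings(
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),
        api_key=env.get("KB_API_KEY") or None,  # empty string means no auth
        search_threshold=float(env.get("KB_SEARCH_THRESHOLD", "0.01")),
        port=int(env.get("KB_PORT", "8000")),
        host=env.get("KB_HOST", "0.0.0.0"),
    )
    # Assumption: reject device names other than the three documented ones.
    for name in (settings.device, settings.ingest_device):
        if name not in VALID_DEVICES:
            raise ValueError(f"unsupported device {name!r}; expected one of {VALID_DEVICES}")
    return settings
```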

Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

Data is device-agnostic — you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.

MCP server (agent integration)

The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.

Start the MCP server

The compose files include a kb-mcp service alongside the engine. Set KB_MCP_API_KEY to require Bearer token auth from connecting agents:

KB_API_KEY=your-engine-key KB_MCP_API_KEY=your-agent-key \
  docker compose -f engine/compose.nvidia.yaml up -d

Or run the MCP server standalone:

docker run -d --name kb-mcp \
  -p 3000:3000 \
  -e KB_ENGINE_URL=http://your-engine-host:8000 \
  -e KB_API_KEY=your-engine-key \
  -e KB_MCP_API_KEY=your-agent-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/mcp:latest

MCP tools

Tool	Description
`kb_search`	Hybrid search with optional collection/tag/type filters
`kb_addnote`	Add a text note (queued for async ingestion)
`kb_update_note`	Update an existing note in place
`kb_get`	Get document details by ID or source path
`kb_status`	Engine health and statistics
`kb_jobs`	Ingestion queue status
`kb_upload_start`	Start a chunked file upload
`kb_upload_chunk`	Upload a base64-encoded file chunk
`kb_upload_finish`	Finish upload and submit for ingestion
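The three upload tools imply a start, chunk, finish sequence with base64-encoded chunks. The Python sketch below shows only the chunking side of that idea. The tool parameters are not documented here, so the function names, the (index, text) pair format, and the 256 KiB chunk size are all hypothetical.

```python
import base64


def iter_base64_chunks(data: bytes, chunk_size: int = 256 * 1024):
    """Yield (index, base64_text) pairs that cover data in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        # Each chunk is encoded on its own, so it can be decoded on its own.
        yield index, base64.b64encode(chunk).decode("ascii")


def reassemble(chunks):
    """Decode (index, base64_text) pairs back into the original bytes."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return b"".join(base64.b64decode(text) for _, text in ordered)
```

An agent would send each yielded pair through kb_upload_chunk between a kb_upload_start and a kb_upload_finish call. The reassemble function stands in for what the receiving side would have to do.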

Collections

The MCP server supports collections — scoped document namespaces implemented via tag conventions. Use these to separate agent memory from user documents:

documents (default) — user-facing documents
memory — agent memory and preferences
workspace — working context

Tools accept a collection parameter. The MCP server translates this to collection:<name> tags on the engine, and strips them from responses so agents see a clean "collection": "memory" field.
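The tag translation described above can be sketched in Python as follows. This is an illustration only, not the MCP server's code: the function names, the document shape (a dict with a "tags" list), and the choice to fall back to documents when no collection tag is present are assumptions.

```python
COLLECTION_PREFIX = "collection:"
DEFAULT_COLLECTION = "documents"


def to_engine_tags(tags, collection=None):
    """Turn an agent's tags plus collection parameter into engine tags."""
    name = collection or DEFAULT_COLLECTION
    # Assumption: drop any collection: tags the agent passed in directly,
    # so the collection parameter is the single source of truth.
    user_tags = [tag for tag in tags if not tag.startswith(COLLECTION_PREFIX)]
    return user_tags + [f"{COLLECTION_PREFIX}{name}"]


def to_agent_view(document):
    """Strip collection: tags from an engine document and expose a clean field."""
    collection = DEFAULT_COLLECTION
    clean_tags = []
    for tag in document.get("tags", []):
        if tag.startswith(COLLECTION_PREFIX):
            collection = tag[len(COLLECTION_PREFIX):]
        else:
            clean_tags.append(tag)
    return {**document, "tags": clean_tags, "collection": collection}
```

For example, to_engine_tags(["ops"], "memory") gives ["ops", "collection:memory"], and to_agent_view reverses that into tags ["ops"] with "collection": "memory".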

MCP server configuration

Variable	Default	Description
`KB_ENGINE_URL`	`http://localhost:8000`	Engine API URL
`KB_API_KEY`	(none)	Engine API key
`KB_MCP_API_KEY`	(none)	Bearer token required from agents (disabled if unset)
`KB_MCP_PORT`	`3000`	Port to listen on

Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See SKILL.md for the skill definition.
