# kb

Personal knowledge base with hybrid search (full-text + semantic vector search).

Client-server architecture: a **FastAPI engine** running in Docker (with optional GPU acceleration), a lightweight **Go CLI client**, and an **MCP server** for native agent integration.

## Architecture

```
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
                                               ▲
MCP Agents ──MCP/HTTP──▶ MCP Server (Docker) ──┘
```

- **Engine**: Keeps the embedding model warm in memory. Handles search, ingestion, document management, and note mutation via REST API. Runs in Docker with NVIDIA GPU, AMD GPU (ROCm), or CPU-only support.
- **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- **MCP Server**: Exposes kb operations as native MCP tools over Streamable HTTP. Runs as a separate Docker container alongside the engine. Supports collections for scoping agent memory vs user documents.
- **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via bind mount: just copy the data directory between hosts.

## Quick start

### 1. Start the engine

**From pre-built images** (recommended):

```bash
# NVIDIA GPU
docker run -d --name kb-engine \
  --gpus all \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-nvidia

# AMD GPU (ROCm)
docker run -d --name kb-engine \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_DEVICE=auto \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-rocm

# CPU only (no GPU required, smaller image)
docker run -d --name kb-engine \
  -p 8000:8000 \
  -v ~/kb-data:/data \
  -e KB_MODEL=all-MiniLM-L6-v2 \
  -e KB_API_KEY=your-secret-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/engine:latest-cpu
```

Or use a compose file from the repo:

```bash
# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.rocm.yaml up -d

# CPU only
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.cpu.yaml up -d
```

See [DEVELOPER.md](DEVELOPER.md) to run the engine from source.

On first start, the engine downloads the embedding model (~90 MB) and loads it into memory (GPU or CPU). Check readiness:

```bash
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
```
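For scripts that must block until the engine is ready, the readiness check can be wrapped in a small polling loop. The sketch below is a hypothetical helper, not part of kb: `wait_until_healthy`, the retry count, and the delay are all assumptions. It relies only on the `/api/v1/health` endpoint and the `{"status": "healthy"}` response shown in the curl example, and it takes the fetch function as a parameter so the loop can be exercised without a running engine.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"  # endpoint from the curl example


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_s: float = 2.0,
) -> bool:
    """Poll until the engine reports healthy; return False if it never does.

    Hypothetical helper: the retry count and delay are arbitrary choices.
    """
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused or invalid JSON while the engine is still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` with no arguments polls the default local engine for up to about a minute, which leaves some room for the first-start model download.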

### 2. Install the client

**From a release** (recommended):

Check [releases](https://gitea.dcglab.co.uk/steve/kb/releases) for the latest client tag, then:

```bash
# Set the version tag
TAG=client-v3.0.0

# Download the binary for your platform (run one of the following)

# Linux (amd64)
curl -L -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-amd64"

# Linux (arm64)
curl -L -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-linux-arm64"

# macOS (Apple Silicon)
curl -L -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-arm64"

# macOS (Intel)
curl -L -o kb "https://gitea.dcglab.co.uk/steve/kb/releases/download/${TAG}/kb-darwin-amd64"

# Then install
chmod +x kb
sudo mv kb /usr/local/bin/
```

See [DEVELOPER.md](DEVELOPER.md) to build the client from source.

### 3. Configure the client

The client works with zero configuration if the engine is on `localhost:8000`. To customise, create `~/.kb/client.yaml`:

```yaml
engine_url: http://localhost:8000
api_key: ""
default_format: human
```

Override via environment variables (`KB_ENGINE_URL`, `KB_API_KEY`) or CLI flags (`--engine`, `--api-key`, `--format`).
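The README does not say which source wins when a setting appears in more than one place. The sketch below illustrates one common convention (defaults, then config file, then environment, then flags), purely as an assumption about how layered overrides usually behave and not as confirmed kb behaviour. `resolve_config` is a hypothetical function, and the `flags` mapping is keyed by config key rather than by the real flag names.

```python
import os
from typing import Mapping, Optional

# Defaults mirror the example client.yaml.
DEFAULTS = {"engine_url": "http://localhost:8000", "api_key": "", "default_format": "human"}

# Environment variables named in the README; none is documented for the format.
ENV_VARS = {"engine_url": "KB_ENGINE_URL", "api_key": "KB_API_KEY"}


def resolve_config(
    file_cfg: Mapping[str, str],
    flags: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Merge settings, later sources winning: defaults < file < env < flags.

    ASSUMPTION: this precedence order is a common convention, not confirmed for kb.
    """
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    for key, var in ENV_VARS.items():
        if env.get(var):
            cfg[key] = env[var]
    for key, value in flags.items():
        if key in DEFAULTS and value is not None:
            cfg[key] = value
    return cfg
```

Under this assumed order, a value passed as a flag would win over `KB_ENGINE_URL`, which in turn would win over `engine_url` in the file.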

### 4. Use it

```bash
# Add notes
kb addnote "Always restart nginx after config changes"
kb addnote "Server room is building 3, floor 2" --tags ops

# Add files (async: uploads and exits immediately)
kb addfile ~/docs/manual.pdf --tags admin
kb addfile ~/notes/ --recursive

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf

# Update a note in place
kb updatenote 42 "revised note content"

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb export 1 -o manual.pdf   # download original file
kb remove 3 --yes
kb status
```

## How it works

- **Ingestion**: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on the configured device (GPU or CPU), and stores everything in SQLite.
- **Search**: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100 ms with a warm model.
- **Output**: JSON (for scripts/LLM tool use) or human-readable terminal format. Use `--format json` on any command.
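To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. It illustrates the general technique, not the engine's actual code: the function name, the document IDs, and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["doc3", "doc1", "doc7"]    # keyword ranking, best first (made-up IDs)
vector_hits = ["doc1", "doc5", "doc3"]  # vector ranking, best first (made-up IDs)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF uses only rank positions, the BM25 and vector scores never need to be normalised onto a common scale. In this example `doc1` comes out first, since it sits near the top of both lists.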

## Engine configuration

The engine is configured via environment variables, set in the compose file, in the environment passed to `docker compose`, or with `-e` on `docker run`:

| Variable | Default | Description |
|---|---|---|
| `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
| `KB_MODEL` | `all-MiniLM-L6-v2` | HuggingFace embedding model name |
| `KB_DEVICE` | `auto` | Embedding/search device: `auto`, `cpu`, or `cuda` |
| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device: `auto`, `cpu`, or `cuda` |
| `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
| `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
| `KB_PORT` | `8000` | Port to expose |
| `KB_HOST` | `0.0.0.0` | Host to bind to |
| `HF_HUB_OFFLINE` | (none) | Set to `1` to prevent model downloads (use cached only) |
| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable, not used by engine) |
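
When `KB_API_KEY` is set, clients authenticate with a Bearer token, which by HTTP convention travels in the `Authorization` header. The sketch below shows how a script might attach it. The helper names are hypothetical, and whether every endpoint (including the health check) requires the token is not stated here, so treat the URL as a placeholder.

```python
import urllib.request
from typing import Dict, Optional


def auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return the Authorization header for a key, or no headers if auth is disabled."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}


def build_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the optional Bearer token."""
    return urllib.request.Request(url, headers=auth_headers(api_key))
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` sends the request. An empty or missing key produces no header, matching an engine run without `KB_API_KEY`.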

## Data portability

The data directory contains everything: SQLite database, model cache, and staging files. To migrate between hosts:

```bash
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f engine/compose.nvidia.yaml up -d
```

Data is device-agnostic: you can ingest on NVIDIA and serve from AMD or CPU (or any combination) with the same data directory.

## MCP server (agent integration)

The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.

### Start the MCP server

The compose files include a `kb-mcp` service alongside the engine. Set `KB_MCP_API_KEY` to require Bearer token auth from connecting agents:

```bash
KB_API_KEY=your-engine-key KB_MCP_API_KEY=your-agent-key \
  docker compose -f engine/compose.nvidia.yaml up -d
```

Or run the MCP server standalone:

```bash
docker run -d --name kb-mcp \
  -p 3000:3000 \
  -e KB_ENGINE_URL=http://your-engine-host:8000 \
  -e KB_API_KEY=your-engine-key \
  -e KB_MCP_API_KEY=your-agent-key \
  --restart unless-stopped \
  docker.dcglab.co.uk/dcg/kb/mcp:latest
```

### MCP tools

| Tool | Description |
|---|---|
| `kb_search` | Hybrid search with optional collection/tag/type filters |
| `kb_addnote` | Add a text note (queued for async ingestion) |
| `kb_update_note` | Update an existing note in place |
| `kb_get` | Get document details by ID or source path |
| `kb_status` | Engine health and statistics |
| `kb_jobs` | Ingestion queue status |
| `kb_upload_start` | Start a chunked file upload |
| `kb_upload_chunk` | Upload a base64-encoded file chunk |
| `kb_upload_finish` | Finish upload and submit for ingestion |
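
The three upload tools imply a start / chunk / finish sequence with base64-encoded payloads. As a rough illustration of the client side, the sketch below splits a byte string into base64 chunks. The function name and the 256 KiB chunk size are assumptions, and the tools' actual parameter names are not documented here, so the tool calls themselves are not shown.

```python
import base64
from typing import Iterator


def iter_base64_chunks(data: bytes, chunk_size: int = 256 * 1024) -> Iterator[str]:
    """Yield consecutive slices of `data`, each base64-encoded as ASCII text.

    ASSUMPTION: 256 KiB is an arbitrary chunk size, not a documented kb limit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield base64.b64encode(data[offset:offset + chunk_size]).decode("ascii")
```

An agent would presumably call `kb_upload_start` once, send each yielded string through `kb_upload_chunk` in order, and then call `kb_upload_finish`. Encoding each slice separately means every chunk decodes on its own, so the receiver can append the decoded bytes as they arrive.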

### Collections

The MCP server supports **collections**: scoped document namespaces implemented via tag conventions. Use these to separate agent memory from user documents:

- `documents` (default): user-facing documents
- `memory`: agent memory and preferences
- `workspace`: working context

Tools accept a `collection` parameter. The MCP server translates this to `collection:<name>` tags on the engine, and strips them from responses so agents see a clean `"collection": "memory"` field.
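
The translation described above can be pictured as two small functions, one for each direction. This is a sketch of the tag convention only, not the MCP server's code: the function names, the document shape (a dict with a `tags` list), and the fallback for untagged documents are assumptions.

```python
from typing import Dict, List

PREFIX = "collection:"
DEFAULT_COLLECTION = "documents"  # the default collection named in the list


def to_engine_tags(tags: List[str], collection: str = DEFAULT_COLLECTION) -> List[str]:
    """Outbound: append the `collection:<name>` tag the engine stores."""
    return [t for t in tags if not t.startswith(PREFIX)] + [PREFIX + collection]


def to_agent_view(doc: Dict) -> Dict:
    """Inbound: strip collection tags and surface them as a `collection` field."""
    tags = doc.get("tags", [])
    names = [t[len(PREFIX):] for t in tags if t.startswith(PREFIX)]
    cleaned = dict(doc)
    cleaned["tags"] = [t for t in tags if not t.startswith(PREFIX)]
    # ASSUMPTION: untagged documents fall back to the default collection.
    cleaned["collection"] = names[0] if names else DEFAULT_COLLECTION
    return cleaned
```

For example, `to_engine_tags(["ops"], "memory")` gives `["ops", "collection:memory"]`, and `to_agent_view` reverses that into `"tags": ["ops"]` plus `"collection": "memory"`. Keeping collections as plain tags means the engine needs no separate namespace concept.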

### MCP server configuration

| Variable | Default | Description |
|---|---|---|
| `KB_ENGINE_URL` | `http://localhost:8000` | Engine API URL |
| `KB_API_KEY` | (none) | Engine API key |
| `KB_MCP_API_KEY` | (none) | Bearer token required from agents (disabled if unset) |
| `KB_MCP_PORT` | `3000` | Port to listen on |

## Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See `SKILL.md` for the skill definition.