v2 restructure: Go client, Docker engine, release tooling

- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# kb-search

Personal knowledge base with hybrid search (full-text + semantic vector search).

v2 uses a client-server architecture: a **FastAPI engine** running in Docker (with GPU acceleration) and a lightweight **Go CLI client** that talks to it over HTTP.

## Architecture

```
Go CLI (kb) ──HTTP──▶ FastAPI Engine (Docker) ──▶ SQLite + GPU
```

- **Engine**: Keeps the embedding model warm in GPU memory and handles search, ingestion, and document management via a REST API. Runs in Docker with NVIDIA or AMD GPU support.
- **Client**: Single static Go binary. No Python, no ML dependencies, instant startup. Talks to the engine over HTTP.
- **Storage**: Single SQLite database with FTS5 (keyword search) and sqlite-vec (vector search). Portable via a bind mount: copy the data directory between hosts.

## Quick start

### 1. Start the engine

```bash
cd engine

# NVIDIA GPU
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d

# AMD GPU (ROCm)
KB_DATA_PATH=~/kb-data docker compose -f compose.rocm.yaml up -d
```

On first start, the engine downloads the embedding model (~90MB) and loads it onto the GPU. Check readiness:

```bash
curl http://localhost:8000/api/v1/health
# {"status": "healthy"}
```
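
Scripts that start the engine and then immediately run `kb` commands may need to wait for the health check to pass, because the first start includes the model download. Below is a minimal Python sketch that relies only on the `/api/v1/health` response shown above. The attempt count, the delay, and the injectable `fetch` parameter (which lets the loop be tested without a running engine) are illustrative choices. They are not part of the engine or client.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"


def http_get(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_until_healthy(
    fetch: Callable[[str], str] = http_get,
    url: str = HEALTH_URL,
    attempts: int = 60,
    delay: float = 2.0,
) -> bool:
    """Poll the health endpoint until it returns {"status": "healthy"}.

    Returns True as soon as the engine reports healthy, or False once
    every attempt has failed.
    """
    for attempt in range(attempts):
        try:
            body = json.loads(fetch(url))
            if isinstance(body, dict) and body.get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error or a non-JSON body:
            # the engine is most likely still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the defaults, `wait_until_healthy()` gives up after 60 attempts spaced 2 seconds apart.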

### 2. Install the client

Build from source:

```bash
cd client
make build # produces ./kb binary
```

Or cross-compile for all platforms:

```bash
make all # produces dist/kb-{os}-{arch} binaries
```

### 3. Configure the client

The client works with zero configuration if the engine is listening on `localhost:8000`. To customise it, create `~/.kb/client.yaml`:

```yaml
engine_url: http://localhost:8000
api_key: ""
default_format: human
```

Settings can be overridden with environment variables (`KB_ENGINE_URL`, `KB_API_KEY`) or CLI flags (`--engine`, `--api-key`, `--format`).
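
The precedence between these sources is not spelled out here. A common convention, assumed below rather than taken from the Go client's source, applies built-in defaults first, then `~/.kb/client.yaml`, then environment variables, then flags, with later layers winning. This is a Python sketch of that layering. The defaults are assumed from the sample file, and the YAML is passed in already parsed so the example needs no YAML library.

```python
from typing import Dict, Mapping, Optional

# Assumed built-in defaults, mirroring the sample client.yaml.
DEFAULTS: Dict[str, str] = {
    "engine_url": "http://localhost:8000",
    "api_key": "",
    "default_format": "human",
}

# Environment variables and CLI flags named in this README, mapped to config keys.
ENV_VARS = {"KB_ENGINE_URL": "engine_url", "KB_API_KEY": "api_key"}
FLAGS = {"--engine": "engine_url", "--api-key": "api_key", "--format": "default_format"}


def resolve_config(
    file_cfg: Mapping[str, str],
    env: Mapping[str, str],
    flags: Mapping[str, Optional[str]],
) -> Dict[str, str]:
    """Merge config layers: defaults < client.yaml < environment < flags."""
    cfg = dict(DEFAULTS)
    # Ignore unknown keys in the file rather than failing.
    cfg.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    for var, key in ENV_VARS.items():
        if var in env:
            cfg[key] = env[var]
    for flag, key in FLAGS.items():
        # A flag that was not passed (None) leaves lower layers untouched.
        if flags.get(flag) is not None:
            cfg[key] = flags[flag]
    return cfg
```

In a real program, `file_cfg` could come from PyYAML's `yaml.safe_load` and `env` from `os.environ`.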

### 4. Use it

```bash
# Add documents (async: uploads and exits immediately)
kb add ~/docs/manual.pdf --tags admin
kb add ~/notes/ --recursive
kb add --note "Always restart nginx after config changes" --tags ops

# Check ingestion progress
kb jobs

# Search
kb search "how to install git"
kb search "deploy process" --tags ops --type pdf
kb search "authentication" --format human

# Manage
kb list
kb info 1
kb tags
kb tag 1 --add important
kb remove 3 --yes
kb status
```

## How it works

- **Ingestion**: Files are uploaded to the engine and queued for async processing. The engine chunks documents (PDFs via Docling, markdown by headers, code by AST/functions, notes as whole text), generates embeddings on the GPU, and stores everything in SQLite.
- **Search**: Hybrid retrieval combining BM25 keyword scoring (FTS5) and vector similarity (sqlite-vec), merged via Reciprocal Rank Fusion. Sub-100ms with a warm model.
- **Output**: JSON (for scripts/LLM tool use) or human-readable terminal format. Use `--format json` on any command.
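
Reciprocal Rank Fusion combines the BM25 and vector result lists using only each chunk's rank in each list. This means the two scoring scales never have to be normalised against each other. Below is a minimal Python sketch of the textbook formula, in which each list contributes 1 / (k + rank) per chunk. `k = 60` is the constant commonly used with RRF; it is an assumption here, not a value read from the engine.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked lists of chunk IDs (best first) into one ranking.

    Every list adds 1 / (k + rank) to each ID it contains, with rank
    starting at 1. Returns (id, fused_score) pairs, highest score first.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["c1", "c2", "c3"]  # e.g. FTS5 results ordered by BM25
vector_hits = ["c3", "c1", "c4"]   # e.g. sqlite-vec nearest neighbours
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Here `c1` comes first because it sits near the top of both lists, narrowly ahead of `c3`. The chunks found by only one method (`c2`, then `c4`) follow.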

## Engine configuration

The engine is configured via environment variables, set either in the compose file or in the shell that runs `docker compose`:

| Variable | Default | Description |
|---|---|---|
| `KB_DATA_DIR` | `/data` | Data directory inside the container (bind-mounted) |
| `KB_MODEL` | `all-MiniLM-L6-v2` | Hugging Face embedding model name |
| `KB_DEVICE` | `auto` | Embedding device: `auto`, `cpu`, or `cuda` |
| `KB_INGEST_DEVICE` | `auto` | Docling layout detection device |
| `KB_API_KEY` | (none) | Optional Bearer token for API authentication |
| `KB_SEARCH_THRESHOLD` | `0.01` | Minimum score for search results (filters noise) |
| `KB_PORT` | `8000` | Port to expose |
| `KB_DATA_PATH` | `./data` | Host path for bind mount (compose variable) |

## Data portability

The data directory contains everything: the SQLite database, the model cache, and staging files. To migrate between hosts:

```bash
# On source host
rsync -a ~/kb-data/ user@target:/home/user/kb-data/

# On target host
KB_DATA_PATH=~/kb-data docker compose -f compose.nvidia.yaml up -d
```

Data is GPU-vendor-agnostic: you can ingest on NVIDIA and serve from AMD (or vice versa) with the same data directory.
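
Copying the database file while the engine is writing to it can capture a half-finished write. If stopping the engine first is inconvenient, SQLite's online backup API can produce a consistent copy instead. The Python sketch below uses the standard `sqlite3` module. This README does not state the database file name inside the data directory, so any paths passed to it are placeholders.

```python
import sqlite3
from pathlib import Path


def snapshot_database(src: Path, dest: Path) -> None:
    """Write a consistent copy of the SQLite database at src to dest.

    The source is opened read-only and the backup runs in a single step,
    so concurrent writers cannot leave the copy half-updated.
    """
    source_uri = Path(src).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()
```

The resulting copy can then be transferred with `rsync` like the rest of the data directory.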

## API reference

All endpoints are under `/api/v1/`. When `KB_API_KEY` is set, requests must include an `Authorization: Bearer <key>` header.

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check (bypasses auth) |
| `POST` | `/search` | Hybrid search (JSON body) |
| `POST` | `/jobs` | Upload file/note for ingestion (multipart, returns 202) |
| `GET` | `/jobs` | List ingestion jobs |
| `GET` | `/jobs/{id}` | Job details |
| `GET` | `/documents` | List documents |
| `GET` | `/documents/{id}` | Document details with chunks |
| `DELETE` | `/documents/{id}` | Remove a document |
| `PUT` | `/documents/{id}/tags` | Add/remove tags |
| `GET` | `/tags` | List all tags |
| `GET` | `/status` | Engine status, GPU info, DB stats |
| `POST` | `/reindex` | Re-embed all chunks |
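
Any HTTP client can call these endpoints. Below is a minimal Python sketch, using only the standard library, that builds requests and attaches the optional Bearer token. The `{"query": ...}` body used for `/search` in the example is a hypothetical field name, because this README does not document the request schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000/api/v1"


def build_request(
    method: str,
    path: str,
    api_key: str = "",
    payload: Optional[Any] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build, but do not send, a request to an engine endpoint."""
    headers: Dict[str, str] = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        # Only required when the engine was started with KB_API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def call(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body, if any."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None
```

For example, `call(build_request("GET", "/status"))` fetches engine status, and `build_request("DELETE", "/documents/3", api_key="...")` targets a single document.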

## Building and releasing

Versioning is managed via the `client/VERSION` and `engine/VERSION` files. The release script bumps these, builds all artifacts, tags, and publishes in one step.

### Release

```bash
./release.sh --gitea # patch bump (e.g. 2.0.0 → 2.0.1), release via Gitea
./release.sh --github --minor # minor bump (e.g. 2.0.1 → 2.1.0), release via GitHub
./release.sh --gitea --major # major bump (e.g. 2.1.0 → 3.0.0)
./release.sh --gitea --no-increment # release current version as-is
./release.sh --gitea --dry-run # preview without doing anything
```

The script will:

1. Bump the version in both `client/VERSION` and `engine/VERSION` (unless `--no-increment`)
2. Build Go client binaries for all platforms (linux/darwin/windows, amd64/arm64)
3. Build Docker engine images for NVIDIA and ROCm
4. Commit the version bump, create an annotated git tag, and push
5. Create a release (with client binaries attached) via `tea` (Gitea) or `gh` (GitHub)
6. Push Docker images to the registry
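
Step 1 is ordinary semantic-version arithmetic. `release.sh` itself is a shell script. The Python sketch below only illustrates the bump rules from the examples above, and it assumes each `VERSION` file holds a bare `MAJOR.MINOR.PATCH` string with no leading `v`. Refusing to bump when the two files disagree is a design choice of this sketch, not documented behaviour of the script.

```python
from pathlib import Path
from typing import Sequence, Union


def bump(version: str, part: str = "patch") -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given part."""
    major, minor, patch = (int(x) for x in version.strip().split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def bump_files(paths: Sequence[Union[str, Path]], part: str = "patch") -> str:
    """Bump every VERSION file to the same new version and return it."""
    current = {Path(p).read_text().strip() for p in paths}
    if len(current) != 1:
        raise ValueError(f"VERSION files disagree: {sorted(current)}")
    new_version = bump(current.pop(), part)
    for p in paths:
        Path(p).write_text(new_version + "\n")
    return new_version
```

Run from the repository root, `bump_files(["client/VERSION", "engine/VERSION"], "minor")` would mirror `./release.sh --minor` for this step only.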

### Checking versions

```bash
# Client
kb --version

# Engine
curl http://localhost:8000/api/v1/status | jq .version
```

### Docker images

Images are pushed to `docker.dcglab.co.uk/dcg/kb/engine` with tags:

- `v2.1.0-nvidia` / `v2.1.0-rocm`: versioned
- `latest-nvidia` / `latest-rocm`: latest release

Override the registry and org via environment variables:

```bash
REGISTRY=ghcr.io IMAGE_ORG=myorg ./release.sh --github
```
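
The tag scheme fits in a small function. This Python sketch assumes that `REGISTRY` and `IMAGE_ORG` replace the `docker.dcglab.co.uk` and `dcg` parts of the default path, and that the `kb/engine` suffix stays fixed. Both are inferences from the example above, not documented behaviour of `release.sh`.

```python
import os
from typing import List, Optional

VENDORS = ("nvidia", "rocm")


def image_refs(
    version: str, registry: Optional[str] = None, org: Optional[str] = None
) -> List[str]:
    """List the image references a release of `version` would push."""
    registry = registry or os.environ.get("REGISTRY", "docker.dcglab.co.uk")
    org = org or os.environ.get("IMAGE_ORG", "dcg")
    repo = f"{registry}/{org}/kb/engine"
    refs: List[str] = []
    for vendor in VENDORS:
        refs.append(f"{repo}:v{version}-{vendor}")  # versioned tag
        refs.append(f"{repo}:latest-{vendor}")      # moving "latest" tag
    return refs
```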

## Future: ROCm runtime migration

The `onnxruntime-rocm` execution provider was removed from onnxruntime as of v1.23. AMD is pushing toward the **MIGraphX execution provider** as the replacement for ROCm GPU inference. When upgrading onnxruntime beyond v1.22, the ROCm Dockerfile will need to switch from `onnxruntime-rocm` to `onnxruntime` with the MIGraphX EP and install the `migraphx` runtime libraries instead.
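
One way to keep the engine working across that transition is to choose execution providers from whatever the installed onnxruntime build reports, preferring MIGraphX when it is available. The Python sketch below shows that selection logic. It is written as a pure function so it runs without onnxruntime installed. The preference order is an assumption, not what the current Dockerfiles do.

```python
from typing import List, Sequence

# Most preferred first; CPU is kept last as a fallback.
PREFERRED_PROVIDERS = (
    "MIGraphXExecutionProvider",  # AMD's path forward after onnxruntime 1.22
    "ROCMExecutionProvider",      # removed upstream as of onnxruntime 1.23
    "CUDAExecutionProvider",      # NVIDIA builds
    "CPUExecutionProvider",
)


def choose_providers(available: Sequence[str]) -> List[str]:
    """Order the providers this build offers by preference, never returning none."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    return chosen or ["CPUExecutionProvider"]
```

With onnxruntime installed, this could be used as `ort.InferenceSession(model_path, providers=choose_providers(ort.get_available_providers()))`, where `model_path` is a placeholder.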

## Claude Code skill

This tool is designed to be wrapped as a Claude Code skill. See `SKILL.md` for the skill definition.