kb/openspec/changes/archive/2026-03-25-kb-v2-client-server/proposal.md at 75e4a0cf730969bdae711a42e54a4b9a533b8126

steve 9aab79d49b v2 restructure: Go client, Docker engine, release tooling

- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 21:52:25 +00:00

Why

Every kb CLI invocation loads the embedding model from scratch (~3-5 seconds) before executing even a simple query. This makes interactive use painfully slow and wastes GPU memory with redundant loads. The monolithic architecture also ties the CLI to heavy Python ML dependencies, prevents multi-client access, and couples GPU vendor choice (NVIDIA vs AMD) to every installation.

What Changes

- Clean-sheet v2 architecture: not a refactor of v1, built from scratch for client-server from day one
  - Engine: FastAPI server running in Docker, keeping the embedding model warm in GPU memory. Handles both ingestion and search via HTTP API
  - Client: Lightweight Go binary that talks to the engine over HTTP. No Python, no ML dependencies, instant startup
- The kb CLI is the Go client only; all operations go through the engine API
- GPU-vendor-agnostic Docker builds (NVIDIA CUDA and AMD ROCm targets)
- Engine exposes a REST API suitable for reverse proxy / HTTPS termination
- Data directory uses bind mounts for portability between hosts (e.g., WSL ingest → production server)
- v1 Python CLI is retired, so there is no dual-CLI maintenance burden

Capabilities

New Capabilities

- engine-api: REST API server (FastAPI) exposing search, ingestion, document management, and status endpoints. Keeps embedding model resident in memory. Handles all DB and GPU operations
- go-client: Lightweight Go CLI that communicates with the engine API over HTTP. Provides the same user-facing commands as v1 (init, add, search, list, info, remove, tags, status, config) without any ML dependencies
- docker-deployment: GPU-vendor-agnostic Docker packaging with separate NVIDIA (CUDA) and AMD (ROCm) build targets. Bind-mount data volumes for host portability. Compose files for single-command deployment

Modified Capabilities

Impact

- Code: v2 is a new codebase. Python engine built fresh around FastAPI (reusing v1's proven core logic for search, embeddings, database, and ingestion where appropriate). Go client is entirely new. v1 cli.py is not carried forward
- APIs: New HTTP REST API (JSON). This is the primary integration surface going forward (replaces direct Python imports for Claude Code skills etc.)
- Dependencies: Go toolchain added for client build. Python side adds fastapi + uvicorn. Heavy ML deps (torch, sentence-transformers, docling) contained entirely within the Docker image
- Systems: Docker + NVIDIA Container Toolkit (or ROCm equivalent) required on engine host. Client machines need only the Go binary and network access to the engine
- Data: SQLite database and HF model cache unchanged in format. Bind-mount directory structure must be documented for cross-host migration
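
Since client machines need only the binary and network access, the client has to learn where the engine lives. A minimal sketch of that lookup follows, under stated assumptions: the `KB_ENGINE_URL` environment variable is a hypothetical name, and the `http://localhost:8000` fallback assumes the engine listens on uvicorn's default port. Neither is specified in the proposal.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// defaultEngineURL is an ASSUMPTION: uvicorn's default port on the local host.
const defaultEngineURL = "http://localhost:8000"

// engineURL resolves the engine base URL via a lookup function
// (os.Getenv in normal use) and validates scheme and host.
func engineURL(getenv func(string) string) (string, error) {
	raw := getenv("KB_ENGINE_URL") // hypothetical variable name
	if raw == "" {
		raw = defaultEngineURL
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid engine URL %q: %w", raw, err)
	}
	// Accept plain HTTP and HTTPS (for engines behind a TLS-terminating proxy).
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", fmt.Errorf("engine URL %q must be http(s)://host[:port]", raw)
	}
	// Drop trailing slashes so callers can append paths like "/search".
	return strings.TrimRight(u.String(), "/"), nil
}

func main() {
	base, err := engineURL(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("engine:", base)
}
```

Passing the lookup function as a parameter keeps `engineURL` testable without touching the process environment.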
