steve 9aab79d49b v2 restructure: Go client, Docker engine, release tooling
- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:52:25 +00:00

## Why
Every `kb` CLI invocation loads the embedding model from scratch (~3-5 seconds) before executing even a simple query. This makes interactive use painfully slow and wastes GPU memory with redundant loads. The monolithic architecture also ties the CLI to heavy Python ML dependencies, prevents multi-client access, and couples GPU vendor choice (NVIDIA vs AMD) to every installation.
## What Changes
- Clean-sheet v2 architecture: a new codebase designed as client-server from day one, not a refactor of v1
- **Engine**: FastAPI server running in Docker, keeping the embedding model warm in GPU memory. Handles both ingestion and search via an HTTP API
- **Client**: Lightweight Go binary that talks to the engine over HTTP. No Python, no ML dependencies, instant startup
- The `kb` CLI is the Go client only — all operations go through the engine API
- GPU-vendor-agnostic Docker builds (NVIDIA CUDA and AMD ROCm targets)
- Engine exposes a REST API suitable for reverse proxy / HTTPS termination
- Data directory uses bind mounts for portability between hosts (e.g., WSL ingest → production server)
- v1 Python CLI is retired — no dual-CLI maintenance burden
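
To make the client/engine split concrete, here is a minimal sketch of a search round-trip from the Go side. The `/search` path, the JSON field names, the `KB_ENGINE_URL` variable, and the `localhost:8000` default are all assumptions for illustration; the actual engine API may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// SearchRequest and SearchResult are hypothetical JSON shapes; the real
// engine API may use different field names.
type SearchRequest struct {
	Query string `json:"query"`
	Limit int    `json:"limit"`
}

type SearchResult struct {
	Document string  `json:"document"`
	Score    float64 `json:"score"`
	Snippet  string  `json:"snippet"`
}

// newSearchRequest builds a JSON POST to the assumed /search endpoint.
func newSearchRequest(baseURL, query string, limit int) (*http.Request, error) {
	body, err := json.Marshal(SearchRequest{Query: query, Limit: limit})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(baseURL, "/") + "/search"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// parseSearchResponse decodes the assumed JSON array of results.
func parseSearchResponse(data []byte) ([]SearchResult, error) {
	var results []SearchResult
	if err := json.Unmarshal(data, &results); err != nil {
		return nil, fmt.Errorf("decode search response: %w", err)
	}
	return results, nil
}

// run sends one query to the engine and prints the results.
func run(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("usage: kb-sketch <query>")
	}
	base := os.Getenv("KB_ENGINE_URL") // hypothetical variable name
	if base == "" {
		base = "http://localhost:8000" // assumed default
	}
	req, err := newSearchRequest(base, strings.Join(args, " "), 5)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("engine unreachable: %w", err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("engine returned %s", resp.Status)
	}
	results, err := parseSearchResponse(data)
	if err != nil {
		return err
	}
	for _, r := range results {
		fmt.Printf("%.3f  %s  %s\n", r.Score, r.Document, r.Snippet)
	}
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The sketch uses only the Go standard library, so the client binary carries no ML dependencies; all model work stays behind the HTTP boundary.
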
## Capabilities
### New Capabilities
- `engine-api`: REST API server (FastAPI) exposing search, ingestion, document management, and status endpoints. Keeps the embedding model resident in GPU memory. Handles all DB and GPU operations
- `go-client`: Lightweight Go CLI that communicates with the engine API over HTTP. Provides the same user-facing commands as v1 (init, add, search, list, info, remove, tags, status, config) without any ML dependencies
- `docker-deployment`: GPU-vendor-agnostic Docker packaging with separate NVIDIA (CUDA) and AMD (ROCm) build targets. Bind-mounted data directories for host portability. Compose files for single-command deployment
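
One way the `go-client` could dispatch the v1 command set onto the `engine-api` is a static command-to-route table. The sketch below is illustrative only: the command names come from the list above, but every HTTP method and path is hypothetical, and some commands (for example `init` or `config`) might map to different or multiple endpoints in practice.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"sort"
	"strings"
)

// route pairs an HTTP method with an engine path.
type route struct {
	Method string
	Path   string
}

// routes maps the v1 command names to HYPOTHETICAL engine endpoints.
// None of these paths are confirmed by the proposal.
var routes = map[string]route{
	"init":   {http.MethodPost, "/init"},
	"add":    {http.MethodPost, "/documents"},
	"search": {http.MethodPost, "/search"},
	"list":   {http.MethodGet, "/documents"},
	"info":   {http.MethodGet, "/documents/{id}"},
	"remove": {http.MethodDelete, "/documents/{id}"},
	"tags":   {http.MethodGet, "/tags"},
	"status": {http.MethodGet, "/status"},
	"config": {http.MethodGet, "/config"},
}

// lookup returns the route for a command, or an error for unknown commands.
func lookup(cmd string) (route, error) {
	r, ok := routes[cmd]
	if !ok {
		return route{}, fmt.Errorf("unknown command %q", cmd)
	}
	return r, nil
}

// commandNames returns the supported commands in sorted order for usage text.
func commandNames() []string {
	names := make([]string, 0, len(routes))
	for name := range routes {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: kb <%s>\n", strings.Join(commandNames(), "|"))
		os.Exit(2)
	}
	r, err := lookup(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real client would send the request here; the sketch only prints the mapping.
	fmt.Printf("%s -> %s %s\n", os.Args[1], r.Method, r.Path)
}
```

Keeping the mapping in one table makes it easy to check that every v1 command still has an engine-side counterpart.
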
### Modified Capabilities
<!-- No existing specs to modify — greenfield OpenSpec setup -->
## Impact
- **Code**: v2 is a new codebase. Python engine built fresh around FastAPI (reusing v1's proven core logic for search, embeddings, database, and ingestion where appropriate). Go client is entirely new. v1 `cli.py` is not carried forward
- **APIs**: New HTTP REST API (JSON). This is the primary integration surface going forward (replaces direct Python imports for Claude Code skills etc.)
- **Dependencies**: Go toolchain added for client build. Python side adds `fastapi` + `uvicorn`. Heavy ML deps (torch, sentence-transformers, docling) contained entirely within the Docker image
- **Systems**: Docker + NVIDIA Container Toolkit (or ROCm equivalent) required on engine host. Client machines need only the Go binary and network access to the engine
- **Data**: SQLite database and HF model cache unchanged in format. Bind-mount directory structure must be documented for cross-host migration
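
Because client machines need only the binary and network access to the engine, possibly through a reverse proxy that terminates HTTPS, the client has to resolve and validate an engine base URL before it does anything else. A minimal sketch follows, assuming a hypothetical `KB_ENGINE_URL` environment variable and an assumed default of `http://localhost:8000`; neither name is taken from the actual project.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// defaultEngineURL is an assumed default, not a documented one.
const defaultEngineURL = "http://localhost:8000"

// engineBaseURL validates a raw engine address and returns it without a
// trailing slash. An empty input falls back to defaultEngineURL.
func engineBaseURL(raw string) (string, error) {
	if raw == "" {
		raw = defaultEngineURL
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse engine URL: %w", err)
	}
	// Accept plain HTTP (local engine) and HTTPS (engine behind a proxy).
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("engine URL %q must use http or https", raw)
	}
	if u.Host == "" {
		return "", fmt.Errorf("engine URL %q has no host", raw)
	}
	return strings.TrimRight(u.String(), "/"), nil
}

func main() {
	// KB_ENGINE_URL is a hypothetical variable name used for illustration.
	base, err := engineBaseURL(os.Getenv("KB_ENGINE_URL"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using engine at", base)
}
```

Rejecting a malformed address up front gives a clearer error than a failed request later.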