## Why

Every `kb` CLI invocation loads the embedding model from scratch (~3-5 seconds) before executing even a simple query. This makes interactive use painfully slow and wastes GPU memory with redundant loads. The monolithic architecture also ties the CLI to heavy Python ML dependencies, prevents multi-client access, and couples GPU vendor choice (NVIDIA vs AMD) to every installation.

## What Changes

- Clean-sheet v2 architecture: not a refactor of v1, but built from scratch for client-server from day one
- **Engine**: FastAPI server running in Docker, keeping the embedding model warm in GPU memory. Handles both ingestion and search via HTTP API
- **Client**: Lightweight Go binary that talks to the engine over HTTP. No Python, no ML dependencies, instant startup
- The `kb` CLI is the Go client only; all operations go through the engine API
- GPU-vendor-agnostic Docker builds (NVIDIA CUDA and AMD ROCm targets)
- Engine exposes a REST API suitable for reverse proxy / HTTPS termination
- Data directory uses bind mounts for portability between hosts (e.g., WSL ingest → production server)
- v1 Python CLI is retired, so there is no dual-CLI maintenance burden

## Capabilities

### New Capabilities

- `engine-api`: REST API server (FastAPI) exposing search, ingestion, document management, and status endpoints. Keeps embedding model resident in memory. Handles all DB and GPU operations
- `go-client`: Lightweight Go CLI that communicates with the engine API over HTTP. Provides the same user-facing commands as v1 (init, add, search, list, info, remove, tags, status, config) without any ML dependencies
- `docker-deployment`: GPU-vendor-agnostic Docker packaging with separate NVIDIA (CUDA) and AMD (ROCm) build targets. Bind-mount data volumes for host portability. Compose files for single-command deployment

### Modified Capabilities

## Impact

- **Code**: v2 is a new codebase. Python engine built fresh around FastAPI (reusing v1's proven core logic for search, embeddings, database, and ingestion where appropriate). Go client is entirely new. v1 `cli.py` is not carried forward
- **APIs**: New HTTP REST API (JSON). This is the primary integration surface going forward (replaces direct Python imports for Claude Code skills etc.)
- **Dependencies**: Go toolchain added for client build. Python side adds `fastapi` + `uvicorn`. Heavy ML deps (torch, sentence-transformers, docling) contained entirely within the Docker image
- **Systems**: Docker + NVIDIA Container Toolkit (or ROCm equivalent) required on engine host. Client machines need only the Go binary and network access to the engine
- **Data**: SQLite database and HF model cache unchanged in format. Bind-mount directory structure must be documented for cross-host migration