9aab79d49b
- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Why
Every kb CLI invocation loads the embedding model from scratch (~3-5 seconds) before executing even a simple query. This makes interactive use painfully slow and wastes GPU memory with redundant loads. The monolithic architecture also ties the CLI to heavy Python ML dependencies, prevents multi-client access, and couples GPU vendor choice (NVIDIA vs AMD) to every installation.
What Changes
- Clean-sheet v2 architecture — not a refactor of v1, built from scratch for client-server from day one
- Engine: FastAPI server running in Docker, keeping the embedding model warm in GPU memory. Handles both ingestion and search via HTTP API
- Client: Lightweight Go binary that talks to the engine over HTTP. No Python, no ML dependencies, instant startup
- The kb CLI is the Go client only; all operations go through the engine API
- GPU-vendor-agnostic Docker builds (NVIDIA CUDA and AMD ROCm targets)
- Engine exposes a REST API suitable for reverse proxy / HTTPS termination
- Data directory uses bind mounts for portability between hosts (e.g., WSL ingest → production server)
- v1 Python CLI is retired — no dual-CLI maintenance burden
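The core idea behind the engine, loading the embedding model once and reusing it for every request, can be sketched in plain Python. This is a minimal illustration rather than the engine's actual code: FakeEmbeddingModel, get_model, and handle_search are hypothetical names, and the fake model stands in for the real one, whose load is what costs roughly 3-5 seconds per invocation in v1.

```python
import functools


class FakeEmbeddingModel:
    """Hypothetical stand-in for the real embedding model; counts loads."""

    loads = 0

    def __init__(self):
        # In v1 this step happens on every CLI invocation.
        FakeEmbeddingModel.loads += 1

    def embed(self, text):
        # Toy embedding: a one-element vector holding the text length.
        return [float(len(text))]


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on first use, then return the same instance."""
    return FakeEmbeddingModel()


def handle_search(query):
    """Hypothetical request handler: reuses the already-loaded model."""
    model = get_model()
    return {"query": query, "embedding": model.embed(query)}


# Three requests served by one long-lived process trigger a single load.
results = [handle_search(q) for q in ["alpha", "beta", "gamma"]]
load_count = FakeEmbeddingModel.loads
```

In a long-running server process the load cost is paid once at startup or on first request, which is why moving the model out of the CLI removes the per-invocation delay.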
Capabilities
New Capabilities
- engine-api: REST API server (FastAPI) exposing search, ingestion, document management, and status endpoints. Keeps embedding model resident in memory. Handles all DB and GPU operations
- go-client: Lightweight Go CLI that communicates with the engine API over HTTP. Provides the same user-facing commands as v1 (init, add, search, list, info, remove, tags, status, config) without any ML dependencies
- docker-deployment: GPU-vendor-agnostic Docker packaging with separate NVIDIA (CUDA) and AMD (ROCm) build targets. Bind-mount data volumes for host portability. Compose files for single-command deployment
Modified Capabilities
Impact
- Code: v2 is a new codebase. Python engine built fresh around FastAPI (reusing v1's proven core logic for search, embeddings, database, and ingestion where appropriate). Go client is entirely new. v1 cli.py is not carried forward
- APIs: New HTTP REST API (JSON). This is the primary integration surface going forward (replaces direct Python imports for Claude Code skills etc.)
- Dependencies: Go toolchain added for client build. Python side adds fastapi + uvicorn. Heavy ML deps (torch, sentence-transformers, docling) contained entirely within the Docker image
- Systems: Docker + NVIDIA Container Toolkit (or ROCm equivalent) required on engine host. Client machines need only the Go binary and network access to the engine
- Data: SQLite database and HF model cache unchanged in format. Bind-mount directory structure must be documented for cross-host migration
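To illustrate what replacing direct Python imports with the HTTP API might look like for an integration, here is a small sketch using only the standard library. The /search path, the request and response JSON shapes, and the names StubEngine and search are assumptions made for this example; the proposal does not specify them. The stub server stands in for the real FastAPI engine so the snippet runs on its own.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubEngine(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the engine; echoes the query back as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"query": body["query"], "results": []}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        # Silence per-request logging for the example.
        pass


def search(base_url, query):
    """POST a query to the (assumed) /search endpoint and decode the reply."""
    request = urllib.request.Request(
        f"{base_url}/search",
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy so the local stub is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.load(response)


# Start the stub on a free local port, make one call, then stop it.
server = HTTPServer(("127.0.0.1", 0), StubEngine)
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = search(f"http://127.0.0.1:{server.server_port}", "bind mounts")
server.shutdown()
server.server_close()
```

The caller needs nothing beyond an HTTP client and JSON handling, which mirrors the point that client machines carry no ML dependencies.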