ADDED Requirements
Requirement: NVIDIA CUDA Docker image
The project SHALL provide a Dockerfile.nvidia that builds the engine on an NVIDIA CUDA runtime base image with GPU support for PyTorch and ONNX Runtime.
Scenario: Build NVIDIA image
- WHEN an admin runs docker compose -f compose.nvidia.yaml build
- THEN the build SHALL produce a working image with the CUDA runtime, PyTorch with CUDA support, onnxruntime-gpu, and all engine dependencies
Scenario: GPU access in NVIDIA container
- WHEN the NVIDIA container starts with --gpus all or the NVIDIA runtime
- THEN torch.cuda.is_available() SHALL return True and the engine SHALL load the embedding model on GPU
Requirement: AMD ROCm Docker image
The project SHALL provide a Dockerfile.rocm that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
Scenario: Build ROCm image
- WHEN an admin runs docker compose -f compose.rocm.yaml build
- THEN the build SHALL produce a working image with the ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
Scenario: GPU access in ROCm container
- WHEN the ROCm container starts with --device=/dev/kfd --device=/dev/dri
- THEN torch.cuda.is_available() SHALL return True (via HIP) and the engine SHALL load the embedding model on GPU
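To check the two GPU-access scenarios by hand, a small diagnostic can be run inside either container. This is a minimal sketch that assumes only PyTorch's public torch.cuda.is_available() and torch.version.hip attributes; the function name describe_accelerator is hypothetical. It is a deployment check rather than engine code, so naming the vendors here does not conflict with the vendor-agnostic requirement for the engine itself.

```python
from typing import Any


def describe_accelerator(torch_module: Any) -> str:
    """Classify the GPU backend exposed by a PyTorch build (hypothetical helper).

    Returns "rocm", "cuda", or "cpu". ROCm builds of PyTorch report AMD GPUs
    through torch.cuda, so torch.cuda.is_available() is True on both vendors;
    torch.version.hip is a version string on ROCm builds and None otherwise.
    """
    if not torch_module.cuda.is_available():
        return "cpu"
    if getattr(torch_module.version, "hip", None):
        return "rocm"
    return "cuda"


if __name__ == "__main__":
    try:
        import torch  # present in the engine images; may be absent elsewhere
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(describe_accelerator(torch))
```

Run inside the containers, this should print rocm on the AMD image with /dev/kfd and /dev/dri passed through, cuda on the NVIDIA image with GPU access, and cpu when no GPU is exposed.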
Requirement: Application code is GPU-vendor-agnostic
The Python engine code SHALL NOT reference CUDA or ROCm directly. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification.
Scenario: Same engine code on both platforms
- WHEN the engine starts on an NVIDIA image and an AMD image with identical configuration
- THEN both SHALL load the model, accept requests, and return identical search results for the same query and data
Requirement: Bind-mount data directory
The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
Scenario: Data directory structure
- WHEN the engine starts for the first time
- THEN it SHALL create the following structure under the data directory:
  - kb.db: SQLite database
  - hf_cache/: HuggingFace model cache
  - staging/: temporary files for queued ingestion jobs
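The first-start layout can be created idempotently, which is also what makes the portability scenarios work: on a copied directory nothing is overwritten. A minimal sketch, assuming the engine points the HuggingFace cache at hf_cache/ through the real HF_HOME environment variable; the function name init_data_dir is hypothetical.

```python
from __future__ import annotations

import os
import sqlite3
from pathlib import Path


def init_data_dir(data_dir: str | os.PathLike[str]) -> dict[str, Path]:
    """Create kb.db, hf_cache/ and staging/ under data_dir if missing (hypothetical helper).

    Idempotent: existing files and directories are left untouched, so it is
    safe to call on every start, including on a directory copied from another host.
    """
    root = Path(data_dir)
    layout = {
        "db": root / "kb.db",           # SQLite database
        "hf_cache": root / "hf_cache",  # HuggingFace model cache
        "staging": root / "staging",    # temporary files for queued ingestion jobs
    }
    layout["hf_cache"].mkdir(parents=True, exist_ok=True)
    layout["staging"].mkdir(parents=True, exist_ok=True)
    # Opening the database creates kb.db if it is missing and leaves an
    # existing (e.g. copied) database unchanged.
    sqlite3.connect(layout["db"]).close()
    # Assumption: the model cache is redirected via HF_HOME; this must happen
    # before any HuggingFace library is imported to take effect.
    os.environ.setdefault("HF_HOME", str(layout["hf_cache"]))
    return layout
```

Because every step is create-if-missing, starting the engine on a directory copied from another host (or another GPU vendor) reuses the existing database and model cache instead of reprocessing.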
Scenario: Portable data across hosts
- WHEN an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
- THEN the engine SHALL start successfully and serve all previously ingested documents without reprocessing
Scenario: Portable data across GPU vendors
- WHEN an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
- THEN the engine SHALL start successfully. Embeddings in the database remain valid (they are model-specific, not GPU-vendor-specific)
Requirement: Compose files for deployment
The project SHALL provide Docker Compose files for single-command deployment.
Scenario: Start NVIDIA deployment
- WHEN an admin runs docker compose -f compose.nvidia.yaml up -d
- THEN the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
Scenario: Start ROCm deployment
- WHEN an admin runs docker compose -f compose.rocm.yaml up -d
- THEN the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
Scenario: Automatic restart
- WHEN the engine process crashes or the host reboots
- THEN Docker SHALL automatically restart the container (restart policy unless-stopped)
Scenario: Configure via environment
- WHEN an admin sets environment variables in the compose file (KB_MODEL, KB_API_KEY, KB_DEVICE, etc.)
- THEN the engine SHALL use those values
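On the engine side, the compose environment could be read once into an immutable settings object. A minimal sketch: KB_MODEL, KB_API_KEY, KB_DEVICE and KB_INGEST_DEVICE come from this spec, while KB_DATA_DIR, the "auto" and "/data" defaults, and the rule that KB_MODEL is mandatory are hypothetical choices for illustration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EngineConfig:
    model: str
    api_key: str | None
    device: str
    ingest_device: str
    data_dir: str


def load_config(env: Mapping[str, str] = os.environ) -> EngineConfig:
    """Build the engine configuration from KB_* environment variables.

    Hypothetical conventions: KB_MODEL is required, an empty KB_API_KEY means
    "no key", KB_DEVICE defaults to "auto", KB_INGEST_DEVICE defaults to
    KB_DEVICE, and KB_DATA_DIR defaults to "/data".
    """
    model = env.get("KB_MODEL", "").strip()
    if not model:
        raise ValueError("KB_MODEL must be set")
    device = env.get("KB_DEVICE", "auto").strip().lower() or "auto"
    ingest_device = env.get("KB_INGEST_DEVICE", device).strip().lower() or device
    return EngineConfig(
        model=model,
        api_key=env.get("KB_API_KEY") or None,
        device=device,
        ingest_device=ingest_device,
        data_dir=env.get("KB_DATA_DIR", "/data"),
    )
```

Taking the mapping as a parameter keeps the loader testable without touching the real process environment.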
Requirement: CPU-only fallback
The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
Scenario: No GPU available
- WHEN the container starts without GPU passthrough (no --gpus, no /dev/kfd)
- THEN the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
Scenario: Explicit CPU mode
- WHEN KB_DEVICE=cpu and KB_INGEST_DEVICE=cpu are set in the environment
- THEN the engine SHALL use CPU regardless of GPU availability
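Both fallback scenarios can be covered by one device-resolution step that stays vendor-agnostic, because PyTorch exposes ROCm GPUs through the same torch.cuda namespace and "cuda" device type. A minimal sketch, assuming the accepted settings are "auto" and "cpu" (a hypothetical convention); resolve_device, torch_gpu_available and the logger name are illustrative, and the GPU probe is injectable so the logic can be exercised without a GPU.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("kb.engine")  # hypothetical logger name


def torch_gpu_available() -> bool:
    """True if PyTorch can see a GPU, on either CUDA or ROCm (HIP) builds."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(
    requested: str = "auto",
    gpu_available: Callable[[], bool] = torch_gpu_available,
) -> str:
    """Map a KB_DEVICE / KB_INGEST_DEVICE value to a PyTorch device string."""
    setting = requested.strip().lower()
    if setting == "cpu":
        # Explicit CPU mode: do not probe the GPU at all.
        return "cpu"
    if setting != "auto":
        raise ValueError(f"unsupported device setting: {requested!r}")
    if gpu_available():
        # PyTorch uses the "cuda" device type on ROCm builds as well.
        return "cuda"
    logger.warning("No GPU detected; GPU acceleration is unavailable, using CPU")
    return "cpu"
```

The engine would resolve the query and ingestion devices separately (one call per setting) and pass each result to whatever loads the embedding model, so setting both variables to cpu forces CPU even when a GPU is present.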