v2 restructure: Go client, Docker engine, release tooling

- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:52:25 +00:00
parent 2030976b85
commit 9aab79d49b
98 changed files with 4526 additions and 7776 deletions
@@ -0,0 +1,100 @@
+# Docker Deployment
+
+## Purpose
+
+Docker deployment provides containerized packaging of the knowledge base engine with GPU support for NVIDIA and AMD platforms, along with Compose files for single-command deployment.
+
+## Requirements
+
+### Requirement: NVIDIA CUDA Docker image
+
+The project SHALL provide a `Dockerfile.nvidia` that builds the engine on an NVIDIA CUDA runtime base image with GPU support for PyTorch and ONNX Runtime.
+
+#### Scenario: Build NVIDIA image
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml build`
+- **THEN** the build SHALL produce a working image with CUDA runtime, PyTorch with CUDA support, onnxruntime-gpu, and all engine dependencies
+
+#### Scenario: GPU access in NVIDIA container
+- **WHEN** the NVIDIA container starts with `--gpus all` or the NVIDIA runtime
+- **THEN** `torch.cuda.is_available()` SHALL return `True` and the engine SHALL load the embedding model on the GPU
+
+---
+
+### Requirement: AMD ROCm Docker image
+
+The project SHALL provide a `Dockerfile.rocm` that builds the engine on an AMD ROCm base image with GPU support for PyTorch and ONNX Runtime.
+
+#### Scenario: Build ROCm image
+- **WHEN** an admin runs `docker compose -f compose.rocm.yaml build`
+- **THEN** the build SHALL produce a working image with ROCm runtime, PyTorch with ROCm support, onnxruntime-rocm, and all engine dependencies
+
+#### Scenario: GPU access in ROCm container
+- **WHEN** the ROCm container starts with `--device=/dev/kfd --device=/dev/dri`
+- **THEN** `torch.cuda.is_available()` SHALL return `True` (ROCm builds of PyTorch expose HIP devices through the `torch.cuda` API) and the engine SHALL load the embedding model on the GPU
+
+---
+
+### Requirement: Application code is GPU-vendor-agnostic
+
+The Python engine code SHALL NOT contain CUDA-specific or ROCm-specific code paths. GPU vendor abstraction SHALL be handled entirely at the Docker image level (base image selection and pip package choice). The same application code SHALL run on both NVIDIA and AMD images without modification. Calls to the `torch.cuda` API are permitted, since it behaves the same way on both images.
+
+#### Scenario: Same engine code on both platforms
+- **WHEN** the engine starts on an NVIDIA image and an AMD image with identical configuration
+- **THEN** both SHALL load the model, accept requests, and return identical search results for the same query and data
+
+---
+
+### Requirement: Bind-mount data directory
+
+The engine SHALL store all persistent state (SQLite database, HF model cache, staging directory) under a single configurable data directory. This directory SHALL be mounted from the host via bind mount.
+
+#### Scenario: Data directory structure
+- **WHEN** the engine starts for the first time
+- **THEN** it SHALL create the following structure under the data directory:
+  - `kb.db` — SQLite database
+  - `hf_cache/` — HuggingFace model cache
+  - `staging/` — temporary files for queued ingestion jobs
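The layout above could be created at startup roughly as follows. This is a minimal sketch, not the engine's actual code: the function name `init_data_dir` and the returned dictionary are hypothetical, and only the three entry names come from the spec.

```python
from pathlib import Path


def init_data_dir(data_dir: Path) -> dict[str, Path]:
    """Create the expected layout under data_dir and return the resolved paths.

    Hypothetical helper: only the names kb.db, hf_cache/ and staging/
    are taken from the spec.
    """
    layout = {
        "db": data_dir / "kb.db",
        "hf_cache": data_dir / "hf_cache",
        "staging": data_dir / "staging",
    }
    # Directories are created idempotently, so a restart on an existing
    # data directory leaves its contents untouched.
    layout["hf_cache"].mkdir(parents=True, exist_ok=True)
    layout["staging"].mkdir(parents=True, exist_ok=True)
    # The database file itself is left for SQLite to create on first connect.
    return layout
```

Because every path is derived from one root, copying that root to another host carries the database, model cache, and staging area together.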
+
+#### Scenario: Portable data across hosts
+- **WHEN** an admin copies the data directory from Host A to Host B and starts the engine with the same bind mount path
+- **THEN** the engine SHALL start successfully and serve all previously ingested documents without reprocessing
+
+#### Scenario: Portable data across GPU vendors
+- **WHEN** an admin moves the data directory from an NVIDIA host to an AMD host (same model name)
+- **THEN** the engine SHALL start successfully, and the embeddings in the database SHALL remain valid because they depend on the model, not on the GPU vendor
+
+---
+
+### Requirement: Compose files for deployment
+
+The project SHALL provide Docker Compose files for single-command deployment.
+
+#### Scenario: Start NVIDIA deployment
+- **WHEN** an admin runs `docker compose -f compose.nvidia.yaml up -d`
+- **THEN** the engine SHALL start with GPU access, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Start ROCm deployment
+- **WHEN** an admin runs `docker compose -f compose.rocm.yaml up -d`
+- **THEN** the engine SHALL start with GPU access via ROCm device passthrough, bind-mount the data directory, and be reachable on the configured port
+
+#### Scenario: Automatic restart
+- **WHEN** the engine process crashes or the host reboots
+- **THEN** Docker SHALL automatically restart the container (restart policy `unless-stopped`)
+
+#### Scenario: Configure via environment
+- **WHEN** an admin sets environment variables in the compose file (`KB_MODEL`, `KB_API_KEY`, `KB_DEVICE`, etc.)
+- **THEN** the engine SHALL use those values instead of its built-in defaults
+
+---
+
+### Requirement: CPU-only fallback
+
+The Dockerfiles SHALL produce images that work without GPU access. If no GPU is available, the engine SHALL fall back to CPU for all operations.
+
+#### Scenario: No GPU available
+- **WHEN** the container starts without GPU passthrough (no `--gpus`, no `/dev/kfd`)
+- **THEN** the engine SHALL detect no GPU, load the model on CPU, and log a warning that GPU acceleration is unavailable
+
+#### Scenario: Explicit CPU mode
+- **WHEN** `KB_DEVICE=cpu` and `KB_INGEST_DEVICE=cpu` are set in the environment
+- **THEN** the engine SHALL use CPU regardless of GPU availability
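The fallback rules in the two scenarios above can be sketched as a small pure function. This is an illustration under assumptions: `resolve_device` and the logger name are hypothetical, and in the real engine the `gpu_available` flag would presumably come from `torch.cuda.is_available()`. The `"cuda"` device string is what PyTorch uses on both CUDA and ROCm builds.

```python
import logging

log = logging.getLogger("kb.engine")  # hypothetical logger name


def resolve_device(requested: str, gpu_available: bool) -> str:
    """Map a KB_DEVICE-style setting to a concrete device string."""
    requested = requested.strip().lower()
    if requested == "cpu":
        # Explicit CPU mode wins regardless of GPU availability.
        return "cpu"
    if gpu_available:
        # PyTorch addresses both CUDA and ROCm (HIP) devices as "cuda".
        return "cuda"
    # No GPU passthrough: warn and fall back instead of failing.
    log.warning("GPU acceleration is unavailable; falling back to CPU")
    return "cpu"
```

Keeping the decision separate from the PyTorch call makes it testable on machines without a GPU.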
@@ -0,0 +1,205 @@
+# Engine API
+
+## Purpose
+
+The engine API provides an HTTP interface for knowledge base operations including search, document ingestion, document management, tag management, and system status.
+
+## Requirements
+
+### Requirement: Engine startup and model loading
+
+The engine SHALL load the embedding model eagerly at startup before accepting HTTP requests. The engine SHALL expose a health endpoint that returns unhealthy until the model is fully loaded and the database is initialised.
+
+#### Scenario: Cold start with model download
+- **WHEN** the engine starts for the first time with no cached model
+- **THEN** it SHALL download the configured embedding model, load it into memory (GPU if available, CPU otherwise), enable WAL mode on the SQLite database, and begin accepting requests only after all initialisation completes
+
+#### Scenario: Health check during startup
+- **WHEN** a client sends `GET /api/v1/health` before the model is loaded
+- **THEN** the engine SHALL respond with HTTP 503 and `{"status": "starting"}`
+
+#### Scenario: Health check after startup
+- **WHEN** a client sends `GET /api/v1/health` after initialisation completes
+- **THEN** the engine SHALL respond with HTTP 200 and `{"status": "healthy"}`
+
+---
+
+### Requirement: Hybrid search
+
+The engine SHALL provide hybrid search combining BM25 full-text search (via FTS5) and vector similarity search (via sqlite-vec), merged using Reciprocal Rank Fusion (RRF). Search SHALL complete in under 100 ms when the model is warm.
+
+#### Scenario: Hybrid search with results
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "how to change oil", "top": 5}`
+- **THEN** the engine SHALL embed the query using the resident model, run both FTS5 and vector searches, merge results via RRF, and return a JSON response with matched chunks including scores, document metadata, and tags
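The merge step can be illustrated with a short Reciprocal Rank Fusion sketch. Each chunk's fused score is the sum of `1 / (k + rank)` over the rankings in which it appears. The function name `rrf_merge`, the chunk IDs, and the constant `k = 60` (a commonly used value) are assumptions here; the spec does not state which constant the engine uses.

```python
from collections import defaultdict


def rrf_merge(
    rankings: list[list[str]], k: int = 60, top: int = 5
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in a list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]


# Hypothetical result lists from the FTS5 and vector searches.
fts_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
merged = rrf_merge([fts_hits, vector_hits])
```

Because RRF uses only rank positions, the BM25 and vector scores never need to be normalised against each other.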
+
+#### Scenario: Search with filters
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "brakes", "tags": ["maintenance"], "doc_type": "pdf", "top": 3}`
+- **THEN** the engine SHALL apply tag and document type filters to both FTS5 and vector results before merging
+
+#### Scenario: Search with mode override
+- **WHEN** a client sends `POST /api/v1/search` with body `{"query": "error log", "fts_only": true}`
+- **THEN** the engine SHALL return only FTS5 results without running vector search
+
+#### Scenario: Empty knowledge base
+- **WHEN** a client searches against an empty database
+- **THEN** the engine SHALL return HTTP 200 with `{"query": "...", "results": [], "total_matches": 0}`
+
+---
+
+### Requirement: Async ingestion via job queue
+
+The engine SHALL accept file uploads and text notes for ingestion asynchronously. Uploaded content SHALL be written to a staging area and a job record created in the database. The engine SHALL return HTTP 202 immediately. A background worker SHALL process queued jobs sequentially.
+
+#### Scenario: Upload a PDF file
+- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a PDF file and optional fields (`tags`, `doc_type`)
+- **THEN** the engine SHALL write the file to the staging directory, create a job record with status `queued`, and return HTTP 202 with `{"job_id": "<id>", "status": "queued", "filename": "report.pdf"}`
+
+#### Scenario: Upload a text note
+- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a `note` text field and optional `title` field
+- **THEN** the engine SHALL write the note content to a staging file, create a job record with status `queued`, and return HTTP 202 with the job ID
+
+#### Scenario: Upload multiple files in sequence
+- **WHEN** a client sends multiple `POST /api/v1/jobs` requests in quick succession
+- **THEN** the engine SHALL queue each job independently and the background worker SHALL process them in FIFO order
+
+#### Scenario: Duplicate content detection
+- **WHEN** a client uploads a file whose content hash matches an already-ingested document
+- **THEN** the engine SHALL return HTTP 202 but the background worker SHALL mark the job as `skipped` with reason `duplicate`
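A content hash for duplicate detection might be computed as below. The spec does not name the hash algorithm, so SHA-256 is an assumption, as are the helper names `content_hash` and `is_duplicate` and the in-memory set standing in for a database lookup.

```python
import hashlib
import io
from typing import BinaryIO


def content_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in fixed-size blocks to bound memory use."""
    digest = hashlib.sha256()  # assumed algorithm; not specified in the spec
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def is_duplicate(stream: BinaryIO, known_hashes: set[str]) -> bool:
    """Return True when the stream's content hash is already known."""
    return content_hash(stream) in known_hashes


# Usage: the set stands in for hashes of already-ingested documents.
seen = {content_hash(io.BytesIO(b"first upload"))}
print(is_duplicate(io.BytesIO(b"first upload"), seen))
```

Hashing the bytes rather than the filename means a renamed copy of the same file is still caught.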
+
+#### Scenario: Upload failure due to unsupported file type
+- **WHEN** a client uploads a file with an unsupported extension
+- **THEN** the engine SHALL return HTTP 422 with an error message listing supported types
+
+---
+
+### Requirement: Job status tracking
+
+The engine SHALL maintain job records in SQLite with status tracking. Jobs SHALL transition through states: `queued` → `processing` → `done` | `failed` | `skipped`.
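The state transitions can be written down as an explicit table. The following is a sketch under assumptions: the names `JobStatus`, `ALLOWED`, `can_transition`, and `transition` are illustrative, and only the five status strings and their order come from the spec.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    DONE = "done"
    FAILED = "failed"
    SKIPPED = "skipped"


# queued -> processing -> done | failed | skipped; terminal states have no exits.
ALLOWED: dict[JobStatus, set[JobStatus]] = {
    JobStatus.QUEUED: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.DONE, JobStatus.FAILED, JobStatus.SKIPPED},
    JobStatus.DONE: set(),
    JobStatus.FAILED: set(),
    JobStatus.SKIPPED: set(),
}


def can_transition(current: JobStatus, new: JobStatus) -> bool:
    """Return True when moving from current to new is a legal transition."""
    return new in ALLOWED[current]


def transition(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise ValueError on an illegal transition."""
    if not can_transition(current, new):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Subclassing `str` lets the enum values be stored in SQLite and serialised to JSON as the plain strings shown in the spec.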
+
+#### Scenario: List all jobs
+- **WHEN** a client sends `GET /api/v1/jobs`
+- **THEN** the engine SHALL return a JSON array of job records ordered by creation time (newest first), each including `job_id`, `filename`, `status`, `created_at`, and `completed_at`
+
+#### Scenario: Filter jobs by status
+- **WHEN** a client sends `GET /api/v1/jobs?status=failed`
+- **THEN** the engine SHALL return only jobs with the specified status
+
+#### Scenario: Get job details
+- **WHEN** a client sends `GET /api/v1/jobs/{id}`
+- **THEN** the engine SHALL return the full job record including status, filename, error message (if failed), `document_id` (if done), chunk count, and timing information
+
+#### Scenario: Job not found
+- **WHEN** a client sends `GET /api/v1/jobs/{id}` with a non-existent ID
+- **THEN** the engine SHALL return HTTP 404
+
+---
+
+### Requirement: Background ingestion worker
+
+The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), generate embeddings using the resident model, and insert chunks and vectors into the database.
+
+#### Scenario: Successful PDF ingestion
+- **WHEN** the background worker picks up a queued PDF job
+- **THEN** it SHALL update the job status to `processing`, run Docling conversion and chunking, embed all chunks, insert the document and its chunks into the database, update the job status to `done` with the resulting `document_id` and chunk count, and delete the staged file
+
+#### Scenario: Ingestion failure
+- **WHEN** the background worker encounters an error during processing (e.g., corrupt PDF)
+- **THEN** it SHALL update the job status to `failed` with the error message, delete the staged file, and continue processing the next queued job
+
+#### Scenario: Search during active ingestion
+- **WHEN** a search request arrives while the background worker is processing a job
+- **THEN** the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
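Enabling WAL mode with Python's standard `sqlite3` module looks roughly like this. The helper name `open_db` is hypothetical. The `PRAGMA journal_mode=WAL` statement is standard SQLite and returns the journal mode actually in effect, which the sketch checks.

```python
import sqlite3
from pathlib import Path


def open_db(path: Path) -> sqlite3.Connection:
    """Open the database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the resulting mode; it is not "wal" when the
    # database cannot use WAL (for example, an in-memory database).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    return conn
```

WAL mode is persistent for a database file, so later connections to the same file also run in WAL mode and readers are not blocked by the single writer.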
+
+---
+
+### Requirement: Document management
+
+The engine SHALL provide endpoints to list, inspect, and remove ingested documents.
+
+#### Scenario: List documents
+- **WHEN** a client sends `GET /api/v1/documents`
+- **THEN** the engine SHALL return a JSON array of documents with `id`, `title`, `doc_type`, `tags`, `chunk_count`, and `created_at`
+
+#### Scenario: List documents with filters
+- **WHEN** a client sends `GET /api/v1/documents?type=pdf&tags=manual`
+- **THEN** the engine SHALL return only documents matching all specified filters
+
+#### Scenario: Get document details
+- **WHEN** a client sends `GET /api/v1/documents/{id}`
+- **THEN** the engine SHALL return the full document record including all chunks and their text content
+
+#### Scenario: Remove a document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}`
+- **THEN** the engine SHALL delete the document, all its chunks, associated embeddings, and tag associations, and return HTTP 200 with a confirmation
+
+#### Scenario: Remove non-existent document
+- **WHEN** a client sends `DELETE /api/v1/documents/{id}` with a non-existent ID
+- **THEN** the engine SHALL return HTTP 404
+
+---
+
+### Requirement: Tag management
+
+The engine SHALL provide endpoints to list all tags and manage tags on documents.
+
+#### Scenario: List all tags
+- **WHEN** a client sends `GET /api/v1/tags`
+- **THEN** the engine SHALL return a JSON array of tags with name and document count
+
+#### Scenario: Add tags to a document
+- **WHEN** a client sends `PUT /api/v1/documents/{id}/tags` with body `{"add": ["manual", "v2"]}`
+- **THEN** the engine SHALL add the specified tags to the document and return the updated tag list
+
+#### Scenario: Remove tags from a document
+- **WHEN** a client sends `PUT /api/v1/documents/{id}/tags` with body `{"remove": ["draft"]}`
+- **THEN** the engine SHALL remove the specified tags from the document and return the updated tag list
+
+---
+
+### Requirement: Engine status and reindex
+
+The engine SHALL provide status information and support re-embedding all chunks.
+
+#### Scenario: Get engine status
+- **WHEN** a client sends `GET /api/v1/status`
+- **THEN** the engine SHALL return JSON with `model_name`, `embedding_dim`, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued and processing job counts)
+
+#### Scenario: Trigger reindex
+- **WHEN** a client sends `POST /api/v1/reindex`
+- **THEN** the engine SHALL re-embed all existing chunks using the currently loaded model and return progress information. This operation SHALL NOT block search queries.
+
+---
+
+### Requirement: API authentication
+
+The engine SHALL support optional API key authentication via Bearer token. When `KB_API_KEY` is set, all requests MUST include a matching `Authorization: Bearer <key>` header. When `KB_API_KEY` is not set, authentication SHALL be disabled.
+
+#### Scenario: Valid API key
+- **WHEN** `KB_API_KEY` is set and a request includes a matching Bearer token
+- **THEN** the engine SHALL process the request normally
+
+#### Scenario: Missing API key when required
+- **WHEN** `KB_API_KEY` is set and a request has no Authorization header
+- **THEN** the engine SHALL return HTTP 401 `{"error": "authentication required"}`
+
+#### Scenario: Invalid API key
+- **WHEN** `KB_API_KEY` is set and a request includes a non-matching Bearer token
+- **THEN** the engine SHALL return HTTP 401 `{"error": "invalid api key"}`
+
+#### Scenario: Auth disabled
+- **WHEN** `KB_API_KEY` is not set
+- **THEN** the engine SHALL process all requests without requiring authentication
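The four scenarios above reduce to a small decision function. This sketch is framework-independent and makes some assumptions: the name `check_auth` is hypothetical, an empty `KB_API_KEY` is treated the same as an unset one, and a header with a non-Bearer scheme is reported as an invalid key. The constant-time comparison via `hmac.compare_digest` is a common hardening choice, not something the spec requires.

```python
import hmac
from typing import Optional


def check_auth(
    configured_key: Optional[str], authorization: Optional[str]
) -> Optional[tuple[int, dict]]:
    """Return None when the request may proceed, else (status, body)."""
    if not configured_key:
        # KB_API_KEY unset (or empty, by assumption): auth is disabled.
        return None
    if authorization is None:
        return 401, {"error": "authentication required"}
    scheme, _, token = authorization.partition(" ")
    # compare_digest avoids leaking the match position through timing.
    matches = hmac.compare_digest(token.encode(), configured_key.encode())
    if scheme.lower() != "bearer" or not matches:
        return 401, {"error": "invalid api key"}
    return None
```

In a FastAPI engine this logic would typically sit in a dependency or middleware, with the header value passed in from the request.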
+
+---
+
+### Requirement: Engine configuration via environment variables
+
+The engine SHALL be configured exclusively via environment variables. The engine reads no config file; all configuration comes from the environment (set in the Compose file or passed to `docker run`).
+
+#### Scenario: Default configuration
+- **WHEN** the engine starts with no environment variables set
+- **THEN** it SHALL use defaults: data directory `/data`, model `all-MiniLM-L6-v2`, device `auto`, no API key required
+
+#### Scenario: Custom model
+- **WHEN** `KB_MODEL` is set to `BAAI/bge-small-en-v1.5`
+- **THEN** the engine SHALL download and load that model instead of the default
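Loading these settings could look like the sketch below. `KB_MODEL`, `KB_DEVICE`, and `KB_API_KEY` appear in the spec. The variable name `KB_DATA_DIR`, the `EngineConfig` class, and the `load_config` function are hypothetical, since the spec gives the data directory default (`/data`) but not the name of its variable.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EngineConfig:
    data_dir: str
    model: str
    device: str
    api_key: Optional[str]


def load_config(env: Mapping[str, str]) -> EngineConfig:
    """Build the engine configuration from an environment mapping."""
    return EngineConfig(
        # KB_DATA_DIR is a hypothetical name; only the default is from the spec.
        data_dir=env.get("KB_DATA_DIR", "/data"),
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),
        # An empty value is treated as "no API key required" (assumption).
        api_key=env.get("KB_API_KEY") or None,
    )
```

Passing the mapping in explicitly (for example `load_config(os.environ)` at startup) keeps the function easy to test with a plain dictionary.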
@@ -0,0 +1,183 @@
+# Go Client
+
+## Purpose
+
+The Go client (`kb`) provides a command-line interface for interacting with the knowledge base engine, supporting search, document ingestion, job tracking, document management, tag management, and status display.
+
+## Requirements
+
+### Requirement: Single static binary with zero runtime dependencies
+
+The Go client SHALL compile to a single static binary with no runtime dependencies. It SHALL support cross-compilation for Linux (amd64, arm64), macOS (amd64, arm64), and Windows (amd64).
+
+#### Scenario: Install on a clean machine
+- **WHEN** a user downloads the `kb` binary for their platform
+- **THEN** they SHALL be able to run it immediately with no additional installs (no Python, no Docker, no shared libraries)
+
+---
+
+### Requirement: Client configuration
+
+The client SHALL read configuration from `~/.kb/client.yaml`. Configuration values SHALL be overridable via environment variables and CLI flags. Precedence: CLI flags > environment variables > config file > defaults.
+
+#### Scenario: Default configuration
+- **WHEN** no config file exists and no env vars or flags are set
+- **THEN** the client SHALL use defaults: engine URL `http://localhost:8000`, no API key, format `human`
+
+#### Scenario: Config file
+- **WHEN** `~/.kb/client.yaml` contains `engine_url: https://kb.example.com`
+- **THEN** the client SHALL use that URL for all API requests
+
+#### Scenario: Environment variable override
+- **WHEN** `KB_ENGINE_URL` is set
+- **THEN** it SHALL override the config file value
+
+#### Scenario: CLI flag override
+- **WHEN** the user passes `--engine https://other.host:8000`
+- **THEN** it SHALL override both the config file and environment variable
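The precedence chain (CLI flags > environment variables > config file > defaults) amounts to a first-match lookup over ordered layers. The sketch below is written in Python purely as a language-neutral illustration, since the client itself is Go. The function name `resolve_setting` and the pre-parsed layer dictionaries are hypothetical.

```python
from typing import Mapping, Optional

DEFAULTS: dict[str, Optional[str]] = {
    "engine_url": "http://localhost:8000",
    "format": "human",
}


def resolve_setting(
    name: str,
    flags: Mapping[str, Optional[str]],
    env: Mapping[str, Optional[str]],
    file_cfg: Mapping[str, Optional[str]],
    defaults: Mapping[str, Optional[str]] = DEFAULTS,
) -> str:
    """Return the value from the highest-precedence layer that sets it."""
    # Layers are listed from highest to lowest precedence.
    for layer in (flags, env, file_cfg, defaults):
        value = layer.get(name)
        if value is not None:
            return value
    raise KeyError(name)
```

For example, with `KB_ENGINE_URL` set and no `--engine` flag, the environment layer supplies the URL and the config file value is ignored.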
+
+#### Scenario: Engine unreachable
+- **WHEN** the client cannot connect to the engine URL
+- **THEN** it SHALL print a clear error message (e.g., "Cannot reach engine at http://localhost:8000 — is it running?") and exit with a non-zero code
+
+---
+
+### Requirement: Search command
+
+The client SHALL provide a `kb search <query>` command that sends the query to the engine and displays results.
+
+#### Scenario: Human-readable search output
+- **WHEN** the user runs `kb search "how to change oil"`
+- **THEN** the client SHALL POST to `/api/v1/search` and display the results in a human-readable format showing rank, score, document title, page/section, doc type, tags, and a text snippet
+
+#### Scenario: JSON search output
+- **WHEN** the user runs `kb search "query" --format json`
+- **THEN** the client SHALL output the raw JSON response from the engine
+
+#### Scenario: Search with filters
+- **WHEN** the user runs `kb search "brakes" --tags maintenance --type pdf --top 3`
+- **THEN** the client SHALL include the filters in the API request body
+
+#### Scenario: Search mode flags
+- **WHEN** the user runs `kb search "error" --fts-only`
+- **THEN** the client SHALL set `fts_only: true` in the request body
+
+---
+
+### Requirement: Add command (file and note ingestion)
+
+The client SHALL provide a `kb add` command that uploads files or notes to the engine for async ingestion. The client SHALL exit immediately after a successful upload.
+
+#### Scenario: Add a single file
+- **WHEN** the user runs `kb add report.pdf`
+- **THEN** the client SHALL upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
+
+#### Scenario: Add a file with tags
+- **WHEN** the user runs `kb add manual.pdf --tags car,maintenance`
+- **THEN** the client SHALL include the tags in the multipart upload metadata
+
+#### Scenario: Add a directory recursively
+- **WHEN** the user runs `kb add ~/documents/ --recursive`
+- **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
+
+#### Scenario: Add a text note
+- **WHEN** the user runs `kb add --note "The server room is in building 3, floor 2"`
+- **THEN** the client SHALL submit the note text via `POST /api/v1/jobs` (multipart with note field), print "Queued: note", and exit
+
+#### Scenario: Add with JSON output
+- **WHEN** the user runs `kb add report.pdf --format json`
+- **THEN** the client SHALL output the JSON response from the engine, including the `job_id`
+
+#### Scenario: File not found
+- **WHEN** the user runs `kb add nonexistent.pdf`
+- **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
+
+#### Scenario: Upload failure
+- **WHEN** the upload fails (network error, engine returns 4xx/5xx)
+- **THEN** the client SHALL print the error and exit with a non-zero code
+
+---
+
+### Requirement: Jobs command
+
+The client SHALL provide a `kb jobs` command to view the ingestion queue.
+
+#### Scenario: List all jobs
+- **WHEN** the user runs `kb jobs`
+- **THEN** the client SHALL fetch `GET /api/v1/jobs` and display a table of recent jobs showing ID, filename, status, and timestamp
+
+#### Scenario: Filter jobs by status
+- **WHEN** the user runs `kb jobs --status failed`
+- **THEN** the client SHALL pass the status filter and display only matching jobs
+
+#### Scenario: Job details
+- **WHEN** the user runs `kb jobs <id>`
+- **THEN** the client SHALL fetch `GET /api/v1/jobs/{id}` and display full job details including error message (if failed), `document_id` (if done), and chunk count
+
+---
+
+### Requirement: Document management commands
+
+The client SHALL provide commands to list, inspect, and remove documents.
+
+#### Scenario: List documents
+- **WHEN** the user runs `kb list`
+- **THEN** the client SHALL fetch `GET /api/v1/documents` and display a table of documents with ID, title, type, tags, chunk count, and date
+
+#### Scenario: List with filters
+- **WHEN** the user runs `kb list --type pdf --tags manual`
+- **THEN** the client SHALL pass filters as query parameters
+
+#### Scenario: Document info
+- **WHEN** the user runs `kb info <id>`
+- **THEN** the client SHALL fetch `GET /api/v1/documents/{id}` and display full document details
+
+#### Scenario: Remove a document
+- **WHEN** the user runs `kb remove <id>`
+- **THEN** the client SHALL prompt for confirmation, then send `DELETE /api/v1/documents/{id}` and display the result
+
+#### Scenario: Remove with skip confirmation
+- **WHEN** the user runs `kb remove <id> --yes`
+- **THEN** the client SHALL skip the confirmation prompt
+
+---
+
+### Requirement: Tag management commands
+
+The client SHALL provide commands to list and manage tags.
+
+#### Scenario: List tags
+- **WHEN** the user runs `kb tags`
+- **THEN** the client SHALL fetch `GET /api/v1/tags` and display tags with document counts
+
+#### Scenario: Add tags to a document
+- **WHEN** the user runs `kb tag <id> --add manual,v2`
+- **THEN** the client SHALL send `PUT /api/v1/documents/{id}/tags` with the add payload
+
+#### Scenario: Remove tags from a document
+- **WHEN** the user runs `kb tag <id> --remove draft`
+- **THEN** the client SHALL send `PUT /api/v1/documents/{id}/tags` with the remove payload
+
+---
+
+### Requirement: Status command
+
+The client SHALL provide a `kb status` command to display engine status.
+
+#### Scenario: Display engine status
+- **WHEN** the user runs `kb status`
+- **THEN** the client SHALL fetch `GET /api/v1/status` and display model name, embedding dimensions, GPU info, document counts by type, total chunks, database size, and queue status
+
+---
+
+### Requirement: Global output format flag
+
+All commands SHALL support a `--format` flag accepting `human` (default) or `json`. The default MAY be changed via the `default_format` config value.
+
+#### Scenario: JSON output on any command
+- **WHEN** the user passes `--format json` to any command
+- **THEN** the client SHALL output the raw JSON response from the engine without human formatting
+
+#### Scenario: Human output (default)
+- **WHEN** the user runs any command without `--format`
+- **THEN** the client SHALL format the response in a human-readable table or structured text output