steve 9aab79d49b v2 restructure: Go client, Docker engine, release tooling
- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:52:25 +00:00


ADDED Requirements

Requirement: Engine startup and model loading

The engine SHALL load the embedding model eagerly at startup before accepting HTTP requests. The engine SHALL expose a health endpoint that returns unhealthy until the model is fully loaded and the database is initialised.

Scenario: Cold start with model download

  • WHEN the engine starts for the first time with no cached model
  • THEN it SHALL download the configured embedding model, load it into memory (GPU if available, CPU otherwise), enable WAL mode on the SQLite database, and begin accepting requests only after all initialisation completes

Scenario: Health check during startup

  • WHEN a client sends GET /api/v1/health before the model is loaded
  • THEN the engine SHALL respond with HTTP 503 and {"status": "starting"}

Scenario: Health check after startup

  • WHEN a client sends GET /api/v1/health after initialisation completes
  • THEN the engine SHALL respond with HTTP 200 and {"status": "healthy"}
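The startup contract above can be modelled as a readiness flag that the health route consults. This is a minimal, framework-agnostic sketch and not the engine's code: `EngineState`, `initialise` and the `load_model` callable are hypothetical names. A FastAPI app would presumably run `initialise` from its lifespan/startup handler and turn the returned tuple into a response.

```python
import sqlite3
import threading


class EngineState:
    """Hypothetical holder for startup state shared with request handlers."""

    def __init__(self):
        self._ready = threading.Event()
        self.model = None
        self.db = None

    def initialise(self, db_path, load_model):
        # 1. Load the embedding model (load_model is a placeholder callable).
        self.model = load_model()
        # 2. Open the database and switch it to WAL journaling.
        self.db = sqlite3.connect(db_path, check_same_thread=False)
        self.db.execute("PRAGMA journal_mode=WAL")
        # 3. Only report healthy once every step above has succeeded.
        self._ready.set()

    def health(self):
        """Return (http_status, body) for GET /api/v1/health."""
        if not self._ready.is_set():
            return 503, {"status": "starting"}
        return 200, {"status": "healthy"}
```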

Requirement: Hybrid search

The engine SHALL provide hybrid search combining BM25 full-text search (via FTS5) and vector similarity search (via sqlite-vec), merged using Reciprocal Rank Fusion (RRF). Search SHALL complete in under 100 ms when the model is warm.
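Reciprocal Rank Fusion scores each chunk by summing 1 / (k + rank) over every ranked list it appears in, so chunks ranked well by both FTS5 and the vector search rise to the top. A minimal sketch, assuming both searches return chunk IDs ordered best-first; the constant k = 60 is the value commonly used for RRF and is an assumption here, as is the default `top`, since the requirement fixes neither.

```python
from collections import defaultdict


def rrf_merge(fts_ids, vec_ids, k=60, top=5):
    """Fuse two best-first lists of chunk IDs; return [(chunk_id, score), ...]."""
    scores = defaultdict(float)
    for ranking in (fts_ids, vec_ids):
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for stable output.
    merged = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top]
```

With `fts_only`, the vector search is skipped and passing an empty `vec_ids` simply preserves the FTS5 order.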

Scenario: Hybrid search with results

  • WHEN a client sends POST /api/v1/search with body {"query": "how to change oil", "top": 5}
  • THEN the engine SHALL embed the query using the resident model, run both FTS5 and vector searches, merge results via RRF, and return a JSON response with matched chunks including scores, document metadata, and tags

Scenario: Search with filters

  • WHEN a client sends POST /api/v1/search with body {"query": "brakes", "tags": ["maintenance"], "doc_type": "pdf", "top": 3}
  • THEN the engine SHALL apply tag and document type filters to both FTS5 and vector results before merging
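One way to apply the same filters to both searches is to build a single SQL predicate that each candidate query appends. A sketch under stated assumptions: the `documents(id, doc_type)` and `document_tags(document_id, tag)` tables and the alias `d` are hypothetical, and a document is assumed to need every requested tag (the requirement does not say whether multiple tags are ANDed or ORed in search).

```python
import sqlite3


def build_filter(tags=None, doc_type=None):
    """Return (sql_predicate, params) over the documents alias `d`."""
    clauses, params = [], []
    if doc_type is not None:
        clauses.append("d.doc_type = ?")
        params.append(doc_type)
    for tag in tags or []:
        # One EXISTS per tag, so the document must carry all requested tags.
        clauses.append(
            "EXISTS (SELECT 1 FROM document_tags t "
            "WHERE t.document_id = d.id AND t.tag = ?)"
        )
        params.append(tag)
    # "1" is an always-true predicate when no filters were supplied.
    return (" AND ".join(clauses) or "1"), params


def demo_db():
    """Tiny in-memory database for trying the predicate out."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT);
        CREATE TABLE document_tags (document_id INTEGER, tag TEXT);
        INSERT INTO documents VALUES (1, 'pdf'), (2, 'pdf'), (3, 'markdown');
        INSERT INTO document_tags VALUES (1, 'maintenance'), (3, 'maintenance');
        """
    )
    return db
```

Appending the predicate to the SQL behind each search means both candidate lists are already filtered when they reach the fusion step.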

Scenario: Search with mode override

  • WHEN a client sends POST /api/v1/search with body {"query": "error log", "fts_only": true}
  • THEN the engine SHALL return only FTS5 results without running vector search

Scenario: Empty knowledge base

  • WHEN a client searches against an empty database
  • THEN the engine SHALL return HTTP 200 with {"query": "...", "results": [], "total_matches": 0}

Requirement: Async ingestion via job queue

The engine SHALL accept file uploads and text notes for ingestion asynchronously. Uploaded content SHALL be written to a staging area and a job record created in the database. The engine SHALL return HTTP 202 immediately. A background worker SHALL process queued jobs sequentially.
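The accept-stage-record-return sequence can be sketched with the standard library alone. This is illustrative, not the engine's implementation: the `jobs` columns, the staging file naming and the uuid4-hex job IDs are assumptions. An HTTP handler would call `enqueue_job` and send the returned body with status 202.

```python
import sqlite3
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    filename TEXT NOT NULL,
    staged_path TEXT NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    completed_at TEXT
)
"""


def enqueue_job(db, staging_dir, filename, content):
    """Stage the upload, record a queued job, and return (202, body)."""
    job_id = uuid.uuid4().hex
    staged = Path(staging_dir) / f"{job_id}-{Path(filename).name}"
    # Write the file before the job row exists, so the worker never
    # picks up a job whose staged file is missing.
    staged.write_bytes(content)
    db.execute(
        "INSERT INTO jobs (job_id, filename, staged_path, status, created_at) "
        "VALUES (?, ?, ?, 'queued', ?)",
        (job_id, filename, str(staged), datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return 202, {"job_id": job_id, "status": "queued", "filename": filename}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        print(enqueue_job(conn, tmp, "report.pdf", b"%PDF-1.7"))
```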

Scenario: Upload a PDF file

  • WHEN a client sends POST /api/v1/jobs with a multipart form containing a PDF file and optional fields (tags, doc_type)
  • THEN the engine SHALL write the file to the staging directory, create a job record with status queued, and return HTTP 202 with {"job_id": "<id>", "status": "queued", "filename": "report.pdf"}

Scenario: Upload a text note

  • WHEN a client sends POST /api/v1/jobs with a multipart form containing a note text field and optional title field
  • THEN the engine SHALL write the note content to a staging file, create a job record with status queued, and return HTTP 202 with the job ID

Scenario: Upload multiple files in sequence

  • WHEN a client sends multiple POST /api/v1/jobs requests in quick succession
  • THEN the engine SHALL queue each job independently and the background worker SHALL process them in FIFO order

Scenario: Duplicate content detection

  • WHEN a client uploads a file whose content hash matches an already-ingested document
  • THEN the engine SHALL return HTTP 202 but the background worker SHALL mark the job as skipped with reason duplicate
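The requirement only says "content hash", so the algorithm below is an assumption: a sketch that uses SHA-256 over the raw bytes and a hypothetical `documents.content_hash` column. The worker would call `is_duplicate` before chunking and, on a match, mark the job skipped with reason duplicate.

```python
import hashlib
import sqlite3


def content_hash(data: bytes) -> str:
    # SHA-256 of the raw upload bytes (assumed algorithm), as 64 hex characters.
    return hashlib.sha256(data).hexdigest()


def is_duplicate(db: sqlite3.Connection, data: bytes) -> bool:
    """True if an already-ingested document has the same content hash."""
    row = db.execute(
        "SELECT 1 FROM documents WHERE content_hash = ? LIMIT 1",
        (content_hash(data),),
    ).fetchone()
    return row is not None
```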

Scenario: Upload failure due to unsupported file type

  • WHEN a client uploads a file with an unsupported extension
  • THEN the engine SHALL return HTTP 422 with an error message listing supported types

Requirement: Job status tracking

The engine SHALL maintain job records in SQLite with status tracking. Jobs SHALL transition through the states queued → processing → done | failed | skipped.
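The state machine can be enforced with a small transition table. A minimal sketch that follows the listed states literally (skipped is reached from processing, after the worker inspects the job); the names `TRANSITIONS` and `transition` are illustrative.

```python
TERMINAL = frozenset({"done", "failed", "skipped"})

# Allowed next states for each state; terminal states have no entry.
TRANSITIONS = {
    "queued": frozenset({"processing"}),
    "processing": TERMINAL,
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise ValueError for an illegal move."""
    if new not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal job transition {current} -> {new}")
    return new
```

In SQL the same guard can be expressed as `UPDATE jobs SET status = ? WHERE job_id = ? AND status = ?`, which also stops two workers from claiming the same job.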

Scenario: List all jobs

  • WHEN a client sends GET /api/v1/jobs
  • THEN the engine SHALL return a JSON array of job records ordered by creation time (newest first), each including job_id, filename, status, created_at, and completed_at

Scenario: Filter jobs by status

  • WHEN a client sends GET /api/v1/jobs?status=failed
  • THEN the engine SHALL return only jobs with the specified status

Scenario: Get job details

  • WHEN a client sends GET /api/v1/jobs/{id}
  • THEN the engine SHALL return the full job record including status, filename, error message (if failed), document_id (if done), chunk count, and timing information

Scenario: Job not found

  • WHEN a client sends GET /api/v1/jobs/{id} with a non-existent ID
  • THEN the engine SHALL return HTTP 404

Requirement: Background ingestion worker

The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), generate embeddings using the resident model, and insert chunks and vectors into the database.
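The requirement names "header-based" chunking for Markdown without further detail. A sketch of one plausible reading, assuming a chunk is one ATX heading (`#` to `######`) plus the text up to the next heading, with text before the first heading as its own chunk; headings inside fenced code blocks are ignored. The function name and chunk shape are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # a Markdown code fence marker


def chunk_markdown(text: str) -> list[dict]:
    """Split Markdown into one chunk per heading section."""
    chunks, heading, lines = [], None, []
    in_fence = False

    def flush():
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"heading": heading, "text": body})

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence never start a new section.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2), [line]
        else:
            lines.append(line)
    flush()
    return chunks
```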

Scenario: Successful PDF ingestion

  • WHEN the background worker picks up a queued PDF job
  • THEN it SHALL update the job status to processing, run Docling conversion and chunking, embed all chunks, insert document and chunks into the database, update the job status to done with the resulting document_id and chunk count, and delete the staged file

Scenario: Ingestion failure

  • WHEN the background worker encounters an error during processing (e.g., corrupt PDF)
  • THEN it SHALL update the job status to failed with the error message, delete the staged file, and continue processing the next queued job
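The success and failure paths can be sketched as one worker step that a background loop calls repeatedly, sleeping when it returns None. Assumptions: the `jobs` columns below are hypothetical, FIFO order is taken as `created_at` then `rowid`, and `ingest` is a placeholder for type detection, chunking, embedding and insertion that returns `(document_id, chunk_count)`.

```python
import os
import sqlite3
from datetime import datetime, timezone

JOBS_SCHEMA = """
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY, staged_path TEXT, status TEXT,
    created_at TEXT, completed_at TEXT, error TEXT,
    document_id INTEGER, chunk_count INTEGER
)
"""


def _now():
    return datetime.now(timezone.utc).isoformat()


def run_one_job(db, ingest):
    """Process the oldest queued job; return its job_id, or None if none is queued."""
    row = db.execute(
        "SELECT job_id, staged_path FROM jobs WHERE status = 'queued' "
        "ORDER BY created_at, rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, path = row
    db.execute("UPDATE jobs SET status = 'processing' WHERE job_id = ?", (job_id,))
    db.commit()
    try:
        document_id, chunk_count = ingest(path)
    except Exception as exc:
        # Record the error; the caller's loop simply moves on to the next job.
        db.execute(
            "UPDATE jobs SET status = 'failed', error = ?, completed_at = ? "
            "WHERE job_id = ?",
            (str(exc), _now(), job_id),
        )
    else:
        db.execute(
            "UPDATE jobs SET status = 'done', document_id = ?, chunk_count = ?, "
            "completed_at = ? WHERE job_id = ?",
            (document_id, chunk_count, _now(), job_id),
        )
    finally:
        db.commit()
        # The staged file is removed whether ingestion succeeded or failed.
        if os.path.exists(path):
            os.remove(path)
    return job_id
```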

Scenario: Search during active ingestion

  • WHEN a search request arrives while the background worker is processing a job
  • THEN the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
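The non-blocking read relies on WAL semantics: a reader sees the last committed snapshot even while a write transaction is open. A small self-contained demonstration with a throwaway database file; the table and function names are illustrative only.

```python
import os
import sqlite3
import tempfile


def wal_read_during_write():
    """Return (journal_mode, rows_seen_by_reader) while a write is in progress."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kb.db")
        # isolation_level=None: autocommit, so BEGIN/COMMIT below are explicit.
        writer = sqlite3.connect(path, isolation_level=None)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT)")
        writer.execute("INSERT INTO chunks (text) VALUES ('already ingested')")

        # Simulate the worker mid-ingestion: an open, uncommitted write.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO chunks (text) VALUES ('still being ingested')")

        # timeout=0: if the read had to wait on a lock it would fail immediately.
        reader = sqlite3.connect(path, timeout=0)
        seen = reader.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]

        writer.execute("COMMIT")
        reader.close()
        writer.close()
    return mode, seen
```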

Requirement: Document management

The engine SHALL provide endpoints to list, inspect, and remove ingested documents.

Scenario: List documents

  • WHEN a client sends GET /api/v1/documents
  • THEN the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, and created_at

Scenario: List documents with filters

  • WHEN a client sends GET /api/v1/documents?type=pdf&tags=manual
  • THEN the engine SHALL return only documents matching all specified filters

Scenario: Get document details

  • WHEN a client sends GET /api/v1/documents/{id}
  • THEN the engine SHALL return the full document record including all chunks and their text content

Scenario: Remove a document

  • WHEN a client sends DELETE /api/v1/documents/{id}
  • THEN the engine SHALL delete the document, all its chunks, associated embeddings, and tag associations, and return HTTP 200 with a confirmation
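A sketch of the delete path, assuming a hypothetical schema where chunks and tag associations reference the document with `ON DELETE CASCADE`. SQLite does not support foreign keys on virtual tables, so vector rows (here a plain `chunk_vectors` stand-in for the sqlite-vec table) are deleted explicitly. The 404 and confirmation bodies are illustrative; the requirement does not define them.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    text TEXT
);
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    tag TEXT NOT NULL
);
-- Stand-in for the vector table, which cannot take part in foreign keys.
CREATE TABLE chunk_vectors (chunk_id INTEGER PRIMARY KEY, embedding BLOB);
"""


def delete_document(db, doc_id):
    """Delete a document and everything attached to it; return (status, body)."""
    if db.execute("SELECT 1 FROM documents WHERE id = ?", (doc_id,)).fetchone() is None:
        return 404, {"error": "document not found"}
    with db:  # one transaction: either every row goes or none do
        # Vectors first, while the chunk IDs can still be looked up.
        db.execute(
            "DELETE FROM chunk_vectors WHERE chunk_id IN "
            "(SELECT id FROM chunks WHERE document_id = ?)",
            (doc_id,),
        )
        # Cascades remove chunks and tag rows (needs foreign_keys = ON).
        db.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    return 200, {"deleted": doc_id}
```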

Scenario: Remove non-existent document

  • WHEN a client sends DELETE /api/v1/documents/{id} with a non-existent ID
  • THEN the engine SHALL return HTTP 404

Requirement: Tag management

The engine SHALL provide endpoints to list all tags and manage tags on documents.

Scenario: List all tags

  • WHEN a client sends GET /api/v1/tags
  • THEN the engine SHALL return a JSON array of tags with name and document count

Scenario: Add tags to a document

  • WHEN a client sends PUT /api/v1/documents/{id}/tags with body {"add": ["manual", "v2"]}
  • THEN the engine SHALL add the specified tags to the document and return the updated tag list

Scenario: Remove tags from a document

  • WHEN a client sends PUT /api/v1/documents/{id}/tags with body {"remove": ["draft"]}
  • THEN the engine SHALL remove the specified tags from the document and return the updated tag list
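The add/remove semantics reduce to set operations. A minimal sketch; returning the list sorted and letting a removal win when the same tag appears in both `add` and `remove` are assumptions, since the requirement covers neither case.

```python
def apply_tag_update(current, add=(), remove=()):
    """Return a document's tag list after a PUT .../tags body is applied."""
    tags = set(current)
    tags.update(add)  # adding an existing tag is a no-op
    tags.difference_update(remove)  # removing an absent tag is a no-op
    return sorted(tags)
```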

Requirement: Engine status and reindex

The engine SHALL provide status information and support re-embedding all chunks.

Scenario: Get engine status

  • WHEN a client sends GET /api/v1/status
  • THEN the engine SHALL return JSON with model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)

Scenario: Trigger reindex

  • WHEN a client sends POST /api/v1/reindex
  • THEN the engine SHALL re-embed all existing chunks using the currently loaded model and return progress information. This operation SHALL NOT block search queries.
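One way to re-embed without starving searches is to work in small batches, compute embeddings outside any write transaction, and commit each batch separately, yielding progress as it goes. A sketch under assumptions: the `embedding` column on `chunks` is a hypothetical stand-in for the sqlite-vec table, `embed` is a placeholder for the resident model, and the batch size is arbitrary.

```python
import sqlite3


def reindex(db, embed, batch_size=64):
    """Re-embed every chunk in committed batches; yield (done, total) after each."""
    total = db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    done, last_id = 0, 0
    while True:
        # Keyset pagination by id, so each batch picks up where the last ended.
        rows = db.execute(
            "SELECT id, text FROM chunks WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        # The slow model call happens before the write transaction starts.
        vectors = embed([text for _, text in rows])
        with db:  # short transaction per batch; WAL readers keep the old snapshot
            db.executemany(
                "UPDATE chunks SET embedding = ? WHERE id = ?",
                [(vec, chunk_id) for (chunk_id, _), vec in zip(rows, vectors)],
            )
        done += len(rows)
        last_id = rows[-1][0]
        yield done, total
```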

Requirement: API authentication

The engine SHALL support optional API key authentication via Bearer token. When KB_API_KEY is set, all requests MUST include a matching Authorization: Bearer <key> header. When KB_API_KEY is not set, authentication SHALL be disabled.

Scenario: Valid API key

  • WHEN KB_API_KEY is set and a request includes a matching Bearer token
  • THEN the engine SHALL process the request normally

Scenario: Missing API key when required

  • WHEN KB_API_KEY is set and a request has no Authorization header
  • THEN the engine SHALL return HTTP 401 with {"error": "authentication required"}

Scenario: Invalid API key

  • WHEN KB_API_KEY is set and a request includes a non-matching Bearer token
  • THEN the engine SHALL return HTTP 401 with {"error": "invalid api key"}

Scenario: Auth disabled

  • WHEN KB_API_KEY is not set
  • THEN the engine SHALL process all requests without requiring authentication
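The four scenarios map onto a single check that a middleware or dependency could run before each handler. A sketch, not the engine's code: treating an empty `KB_API_KEY` as unset and treating a non-Bearer scheme as an invalid key are assumptions. `hmac.compare_digest` avoids leaking the key through comparison timing, and the scheme is compared case-insensitively as HTTP auth schemes are.

```python
import hmac


def check_auth(api_key, authorization):
    """Return None if the request may proceed, else (401, body).

    api_key: value of KB_API_KEY (None or empty disables auth).
    authorization: the raw Authorization header, or None if absent.
    """
    if not api_key:
        return None
    if not authorization:
        return 401, {"error": "authentication required"}
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not hmac.compare_digest(
        token.encode(), api_key.encode()
    ):
        return 401, {"error": "invalid api key"}
    return None
```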

Requirement: Engine configuration via environment variables

The engine SHALL be configured via environment variables. It reads no config file; all configuration comes from the environment (set via compose.yaml or docker run).

Scenario: Default configuration

  • WHEN the engine starts with no environment variables set
  • THEN it SHALL use defaults: data directory /data, model all-MiniLM-L6-v2, device auto, no API key required

Scenario: Custom model

  • WHEN KB_MODEL is set to BAAI/bge-small-en-v1.5
  • THEN the engine SHALL download and load that model instead of the default
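Reading the environment with documented defaults can be sketched as a frozen dataclass. `KB_MODEL` and `KB_API_KEY` come from this spec; `KB_DATA_DIR` and `KB_DEVICE` are hypothetical names for the data-directory and device settings, which the spec gives defaults for but does not name.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineConfig:
    data_dir: str = "/data"
    model: str = "all-MiniLM-L6-v2"
    device: str = "auto"
    api_key: Optional[str] = None


def load_config(env: Mapping = os.environ) -> EngineConfig:
    """Build the config from environment variables, falling back to defaults."""
    defaults = EngineConfig()
    return EngineConfig(
        data_dir=env.get("KB_DATA_DIR", defaults.data_dir),  # hypothetical name
        model=env.get("KB_MODEL", defaults.model),
        device=env.get("KB_DEVICE", defaults.device),  # hypothetical name
        api_key=env.get("KB_API_KEY") or None,  # unset or empty disables auth
    )
```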