Engine API
Purpose
The engine API provides an HTTP interface for knowledge base operations including search, document ingestion, document management, tag management, and system status.
Requirements
Requirement: Engine startup and model loading
The engine SHALL load the embedding model eagerly at startup before accepting HTTP requests. The engine SHALL expose a health endpoint that returns unhealthy until the model is fully loaded and the database is initialised.
Scenario: Cold start with model download
- WHEN the engine starts for the first time with no cached model
- THEN it SHALL download the configured embedding model, load it into memory (GPU if available, CPU otherwise), enable WAL mode on the SQLite database, and begin accepting requests only after all initialisation completes
Scenario: Health check during startup
- WHEN a client sends GET /api/v1/health before the model is loaded
- THEN the engine SHALL respond with HTTP 503 and {"status": "starting"}
Scenario: Health check after startup
- WHEN a client sends GET /api/v1/health after initialisation completes
- THEN the engine SHALL respond with HTTP 200 and {"status": "healthy"}
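The two health scenarios reduce to a single readiness check. The following is a minimal sketch rather than the engine's actual handler; `EngineState` and `health_response` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class EngineState:
    """Hypothetical readiness flags set by the startup routine."""
    model_loaded: bool = False
    db_initialised: bool = False


def health_response(state: EngineState) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, JSON body) for GET /api/v1/health."""
    # Healthy only once BOTH the model and the database are ready.
    if state.model_loaded and state.db_initialised:
        return 200, {"status": "healthy"}
    return 503, {"status": "starting"}
```

A partially initialised engine (for example, model loaded but database not yet ready) still reports 503 under this sketch.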
Requirement: Hybrid search
The engine SHALL provide hybrid search combining BM25 full-text search (via FTS5) and vector similarity search (via sqlite-vec), merged using Reciprocal Rank Fusion. Search SHALL complete in under 100ms when the model is warm. The engine SHALL sanitize user query strings to prevent FTS5 syntax errors for any input.
Scenario: Hybrid search with results
- WHEN a client sends POST /api/v1/search with body {"query": "how to change oil", "top": 5}
- THEN the engine SHALL embed the query using the resident model, run both FTS5 and vector searches, merge results via RRF, and return a JSON response with matched chunks including scores, document metadata, and tags
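The RRF merge step can be sketched as follows. This is an illustration under assumptions, not the engine's code: the function name `rrf_merge` is hypothetical, and the smoothing constant `k = 60` is the value conventionally used with Reciprocal Rank Fusion, which this spec does not fix.

```python
from typing import Dict, List, Sequence, Tuple


def rrf_merge(
    fts_ids: Sequence[int],
    vec_ids: Sequence[int],
    k: int = 60,   # assumed constant; the spec does not state a value
    top: int = 5,
) -> List[Tuple[int, float]]:
    """Fuse two ranked lists of chunk IDs (best first) into one ranking."""
    scores: Dict[int, float] = {}
    for ranking in (fts_ids, vec_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]
```

A chunk that appears in both lists accumulates two contributions, so it tends to outrank chunks found by only one retriever.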
Scenario: Search with filters
- WHEN a client sends POST /api/v1/search with body {"query": "brakes", "tags": ["maintenance"], "doc_type": "pdf", "top": 3}
- THEN the engine SHALL apply tag and document type filters to both FTS5 and vector results before merging
Scenario: Search with mode override
- WHEN a client sends POST /api/v1/search with body {"query": "error log", "fts_only": true}
- THEN the engine SHALL return only FTS5 results without running vector search
Scenario: Empty knowledge base
- WHEN a client searches against an empty database
- THEN the engine SHALL return HTTP 200 with {"query": "...", "results": [], "total_matches": 0}
Scenario: Search with special characters
- WHEN a client sends POST /api/v1/search with body {"query": "what color is grass?"}
- THEN the engine SHALL sanitize the query for FTS5, execute the search successfully, and return results (not a 500 error)
Scenario: Search with FTS5 operators in query
- WHEN a client sends POST /api/v1/search with body {"query": "NOT something OR (other)"}
- THEN the engine SHALL treat the input as literal search terms, not FTS5 operators, and return matching results
Scenario: Search with only special characters
- WHEN a client sends POST /api/v1/search with body {"query": "??!@#"}
- THEN the engine SHALL return HTTP 200 with an empty result set (not a 500 error)
Scenario: Search with quotes in query
- WHEN a client sends POST /api/v1/search with body {"query": "the \"quick\" fox"}
- THEN the engine SHALL sanitize embedded quotes and return results normally
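One way to satisfy the sanitization scenarios is to keep only word characters and wrap every token in double quotes, since FTS5 treats a double-quoted string as a literal rather than as an operator. The sketch below assumes that approach; `sanitize_fts5_query` is a hypothetical name, and the spec does not prescribe this particular technique.

```python
import re
from typing import Optional


def sanitize_fts5_query(raw: str) -> Optional[str]:
    """Turn arbitrary user input into a safe FTS5 MATCH expression.

    Returns None when the input contains no searchable tokens, in which
    case the caller can skip the FTS5 query and return an empty result set.
    """
    # \w+ drops punctuation, parentheses, and embedded quotes.
    tokens = re.findall(r"\w+", raw)
    if not tokens:
        return None
    # Quoting each token makes NOT / OR / AND plain search terms.
    return " ".join(f'"{token}"' for token in tokens)
```

Under this sketch, `NOT something OR (other)` becomes four quoted terms, and a query made only of punctuation yields `None` instead of a malformed MATCH expression.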
Requirement: Async ingestion via job queue
The engine SHALL accept file uploads and text notes for ingestion asynchronously. Uploaded content SHALL be written to a staging area and a job record created in the database. The engine SHALL return HTTP 202 immediately. A background worker SHALL process queued jobs sequentially. Before staging, the engine SHALL compute a SHA256 hash of the uploaded content and reject duplicates immediately.
Scenario: Upload a PDF file
- WHEN a client sends POST /api/v1/jobs with a multipart form containing a PDF file and optional fields (tags, doc_type)
- THEN the engine SHALL compute the SHA256 hash of the file bytes, verify no existing document has the same hash, write the file to the staging directory, create a job record with status queued, and return HTTP 202 with {"job_id": "<id>", "status": "queued", "filename": "report.pdf"}
Scenario: Upload a text note
- WHEN a client sends POST /api/v1/jobs with a multipart form containing a note text field and optional title field
- THEN the engine SHALL compute the SHA256 hash of the note text (UTF-8 encoded), verify no existing document has the same hash, write the note content to a staging file, create a job record with status queued, and return HTTP 202 with the job ID
Scenario: Upload multiple files in sequence
- WHEN a client sends multiple POST /api/v1/jobs requests in quick succession
- THEN the engine SHALL queue each job independently and the background worker SHALL process them in FIFO order
Scenario: Duplicate file detected at upload time (already ingested)
- WHEN a client uploads a file whose SHA256 content hash matches an already-ingested document
- THEN the engine SHALL NOT stage the file or create a job record, and SHALL return HTTP 409 with {"error": "duplicate", "document_id": <id>, "title": "<title>"}
Scenario: Duplicate file detected at upload time (in-flight job)
- WHEN a client uploads a file whose SHA256 content hash matches a queued or processing job
- THEN the engine SHALL NOT stage the file or create a job record, and SHALL return HTTP 409 with {"error": "duplicate", "job_id": <id>, "title": "<filename>"}
Scenario: Duplicate note detected at upload time (already ingested)
- WHEN a client submits a note whose SHA256 content hash matches an already-ingested document
- THEN the engine SHALL NOT stage the note or create a job record, and SHALL return HTTP 409 with {"error": "duplicate", "document_id": <id>, "title": "<title>"}
Scenario: Duplicate note detected at upload time (in-flight job)
- WHEN a client submits a note whose SHA256 content hash matches a queued or processing job
- THEN the engine SHALL NOT stage the note or create a job record, and SHALL return HTTP 409 with {"error": "duplicate", "job_id": <id>, "title": "<filename>"}
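The upload-time duplicate check in the four scenarios above can be sketched as a hash lookup against ingested documents first and in-flight jobs second. This is an illustrative sketch: the in-memory dictionaries stand in for database queries, and `content_hash` and `check_duplicate` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def content_hash(data: bytes) -> str:
    """SHA256 hex digest of file bytes (or of UTF-8 encoded note text)."""
    return hashlib.sha256(data).hexdigest()


def check_duplicate(
    digest: str,
    documents: Dict[str, Tuple[int, str]],    # hash -> (document_id, title)
    active_jobs: Dict[str, Tuple[int, str]],  # hash -> (job_id, filename)
) -> Optional[Tuple[int, dict]]:
    """Return a (409, body) rejection, or None if the upload may be staged."""
    if digest in documents:
        doc_id, title = documents[digest]
        return 409, {"error": "duplicate", "document_id": doc_id, "title": title}
    if digest in active_jobs:
        job_id, filename = active_jobs[digest]
        return 409, {"error": "duplicate", "job_id": job_id, "title": filename}
    return None
```

For a note, the caller would pass `note_text.encode("utf-8")` to `content_hash`, matching the UTF-8 requirement in the note upload scenario.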
Scenario: Duplicate uploaded during concurrent request handling
- WHEN two identical files are uploaded in the same instant, both passing the API hash check before either job is committed
- THEN both jobs SHALL be queued, and the background worker SHALL process the first normally and mark the second as skipped (worker-side safety net via hash_exists() and UNIQUE constraint)
Scenario: Upload failure due to unsupported file type
- WHEN a client uploads a file with an unsupported extension
- THEN the engine SHALL return HTTP 422 with an error message listing supported types
Requirement: Job status tracking
The engine SHALL maintain job records in SQLite with status tracking. Jobs SHALL transition through states: queued → processing → done | failed | skipped.
Scenario: List all jobs
- WHEN a client sends GET /api/v1/jobs
- THEN the engine SHALL return a JSON array of job records ordered by creation time (newest first), each including job_id, filename, status, created_at, and completed_at
Scenario: Filter jobs by status
- WHEN a client sends GET /api/v1/jobs?status=failed
- THEN the engine SHALL return only jobs with the specified status
Scenario: Get job details
- WHEN a client sends GET /api/v1/jobs/{id}
- THEN the engine SHALL return the full job record including status, filename, error message (if failed), document_id (if done), chunk count, and timing information
Scenario: Job not found
- WHEN a client sends GET /api/v1/jobs/{id} with a non-existent ID
- THEN the engine SHALL return HTTP 404
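The job lifecycle stated in this requirement can be enforced with a small transition table. The sketch below reads the state diagram as "every terminal state is reached from processing", which is an interpretation rather than something the spec spells out; `advance` and `ALLOWED_TRANSITIONS` are hypothetical names.

```python
TERMINAL_STATES = frozenset({"done", "failed", "skipped"})

# queued -> processing -> done | failed | skipped
ALLOWED_TRANSITIONS = {
    "queued": frozenset({"processing"}),
    "processing": TERMINAL_STATES,
}


def advance(current: str, new: str) -> str:
    """Validate a status change and return the new status."""
    # Terminal states have no entry in the table, so nothing follows them.
    if new not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal job transition: {current} -> {new}")
    return new
```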
Requirement: Background ingestion worker
The engine SHALL run a background worker that processes queued jobs. The worker SHALL process one job at a time. For each job, it SHALL: detect document type, run the appropriate chunking pipeline (Docling for PDFs, header-based for Markdown, AST-based for code, whole-text for notes), build enriched text by prepending the document title (and section header when present) to each chunk's text, generate embeddings using the enriched text and the resident model, insert chunks (with both raw text and enriched text) and vectors into the database, and move the original file to persistent storage.
Scenario: Successful PDF ingestion
- WHEN the background worker picks up a queued PDF job
- THEN it SHALL update the job status to processing, run Docling conversion and chunking, build enriched text for each chunk by prepending the document title, embed all chunks using enriched text, insert document and chunks into the database, move the staged file to {data_dir}/documents/{content_hash}.pdf, update documents.stored_path with the permanent path, store the original filename in documents.original_filename, update the job status to done with the resulting document_id and chunk count, and clean up the staging entry
Scenario: Ingestion failure
- WHEN the background worker encounters an error during processing (e.g., corrupt PDF)
- THEN it SHALL update the job status to failed with the error message, delete the staged file, and continue processing the next queued job
Scenario: Search during active ingestion
- WHEN a search request arrives while the background worker is processing a job
- THEN the search SHALL execute without blocking (SQLite WAL mode) and return results from already-ingested documents
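The enriched-text step of the background worker requirement, prepending the document title and an optional section header to each chunk, might look like the sketch below. The newline separator is an assumption (the spec does not define how the parts are joined), and `build_enriched_text` is a hypothetical name.

```python
from typing import Optional


def build_enriched_text(
    title: str,
    chunk_text: str,
    section_header: Optional[str] = None,
) -> str:
    """Prefix a chunk with its document title and, if present, section header."""
    parts = [title]
    if section_header:
        parts.append(section_header)
    parts.append(chunk_text)
    # Assumed separator: one part per line.
    return "\n".join(parts)
```

The raw chunk text is kept separately from this enriched form, since the requirement stores both and embeds only the enriched one.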
Requirement: Document management
The engine SHALL provide endpoints to list, inspect, remove, and download original files for ingested documents.
Scenario: List documents
- WHEN a client sends GET /api/v1/documents
- THEN the engine SHALL return a JSON array of documents with id, title, doc_type, tags, chunk_count, created_at, and updated_at
Scenario: List documents with filters
- WHEN a client sends GET /api/v1/documents?type=pdf&tags=manual
- THEN the engine SHALL return only documents matching all specified filters
Scenario: List documents sorted by most recent
- WHEN a client requests documents sorted by date
- THEN the engine SHALL use COALESCE(updated_at, created_at) for ordering, so un-mutated documents sort by creation time and mutated documents sort by their last update
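The effect of the COALESCE ordering can be seen with an in-memory SQLite database. This is a reduced illustration: the table carries only the columns needed for sorting, the sample rows are invented, and the descending direction is an assumption based on "most recent" in the scenario title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " created_at TEXT NOT NULL,"
    " updated_at TEXT)"  # NULL until the document is first modified
)
conn.executemany(
    "INSERT INTO documents (title, created_at, updated_at) VALUES (?, ?, ?)",
    [
        ("old, never updated", "2024-01-01T00:00:00", None),
        ("old, recently updated", "2024-01-02T00:00:00", "2024-03-01T00:00:00"),
        ("new, never updated", "2024-02-01T00:00:00", None),
    ],
)

# Fall back to created_at whenever updated_at is NULL.
rows = conn.execute(
    "SELECT title FROM documents "
    "ORDER BY COALESCE(updated_at, created_at) DESC"
).fetchall()
titles = [row[0] for row in rows]
```

The recently updated document sorts first even though it was created earlier than the newest one, because its effective timestamp is its `updated_at` value.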
Scenario: Get document details
- WHEN a client sends GET /api/v1/documents/{id}
- THEN the engine SHALL return the full document record including all chunks, their text content, updated_at, and whether the original file is available (has_file: true/false)
Scenario: Download original file
- WHEN a client sends GET /api/v1/documents/{id}/file
- THEN the engine SHALL return the original file with appropriate Content-Type and Content-Disposition: attachment; filename="{original_filename}" headers, or HTTP 404 if the file is not available
Scenario: Remove a document
- WHEN a client sends DELETE /api/v1/documents/{id}
- THEN the engine SHALL delete the document, all its chunks, associated embeddings, tag associations, and the stored original file from disk, and return HTTP 200 with a confirmation
Scenario: Remove non-existent document
- WHEN a client sends DELETE /api/v1/documents/{id} with a non-existent ID
- THEN the engine SHALL return HTTP 404
Requirement: Note mutation endpoint
The engine SHALL provide a PATCH /api/v1/notes/{id} endpoint for updating existing notes in place. See the note-mutation spec for full details.
Scenario: Note update endpoint exists
- WHEN a client sends PATCH /api/v1/notes/42 with body {"text": "new content"}
- THEN the engine SHALL process the update synchronously and return the updated document
Requirement: Document updated_at tracking
The engine SHALL track when documents are modified via an updated_at column. This column SHALL be NULL for documents that have never been updated.
Scenario: New document has no updated_at
- WHEN a document is first ingested
- THEN updated_at SHALL be NULL and created_at SHALL be set to the ingestion timestamp
Scenario: Note update sets updated_at
- WHEN a note is updated via PATCH /api/v1/notes/{id}
- THEN updated_at SHALL be set to the current timestamp
Scenario: Tag change sets updated_at
- WHEN tags are modified via PUT /api/v1/documents/{id}/tags
- THEN updated_at SHALL be set to the current timestamp
Scenario: Schema migration for updated_at
- WHEN the engine starts against a v2 database without an updated_at column
- THEN the engine SHALL automatically run ALTER TABLE documents ADD COLUMN updated_at TEXT, and all existing documents SHALL have updated_at = NULL
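An idempotent version of this migration can check the existing columns before altering the table. The sketch below uses SQLite's `PRAGMA table_info`; `ensure_updated_at_column` is a hypothetical name, and how the engine actually detects the schema version is not stated in this spec.

```python
import sqlite3


def ensure_updated_at_column(conn: sqlite3.Connection) -> bool:
    """Add documents.updated_at if it is missing. Returns True if it was added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    if "updated_at" in columns:
        return False
    # Existing rows get NULL for the new column, as the scenario requires.
    conn.execute("ALTER TABLE documents ADD COLUMN updated_at TEXT")
    conn.commit()
    return True
```

Because the function is a no-op when the column already exists, it is safe to call on every startup.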
Requirement: Tag management
The engine SHALL provide endpoints to list all tags and manage tags on documents.
Scenario: List all tags
- WHEN a client sends GET /api/v1/tags
- THEN the engine SHALL return a JSON array of tags with name and document count
Scenario: Add tags to a document
- WHEN a client sends PUT /api/v1/documents/{id}/tags with body {"add": ["manual", "v2"]}
- THEN the engine SHALL add the specified tags to the document and return the updated tag list
Scenario: Remove tags from a document
- WHEN a client sends PUT /api/v1/documents/{id}/tags with body {"remove": ["draft"]}
- THEN the engine SHALL remove the specified tags from the document and return the updated tag list
Requirement: Engine status and reindex
The engine SHALL provide status information and support re-embedding all chunks. The version field in the status response SHALL always be present and SHALL reflect the engine's release version as read from the VERSION file. This field is the contract used by clients for compatibility checking.
Scenario: Get engine status
- WHEN a client sends GET /api/v1/status
- THEN the engine SHALL return JSON with version (string, from VERSION file), model_name, embedding_dim, GPU device info, database stats (document count by type, total chunks, DB size), and queue stats (queued/processing job count)
Scenario: Trigger reindex
- WHEN a client sends POST /api/v1/reindex
- THEN the engine SHALL re-embed all existing chunks using the enriched_text column and the currently loaded model, and return progress information. This operation SHALL NOT block search queries.
Requirement: API authentication
The engine SHALL support optional API key authentication via Bearer token. When KB_API_KEY is set, all requests MUST include a matching Authorization: Bearer <key> header. When KB_API_KEY is not set, authentication SHALL be disabled.
Scenario: Valid API key
- WHEN KB_API_KEY is set and a request includes a matching Bearer token
- THEN the engine SHALL process the request normally
Scenario: Missing API key when required
- WHEN KB_API_KEY is set and a request has no Authorization header
- THEN the engine SHALL return HTTP 401 with {"error": "authentication required"}
Scenario: Invalid API key
- WHEN KB_API_KEY is set and a request includes a non-matching Bearer token
- THEN the engine SHALL return HTTP 401 with {"error": "invalid api key"}
Scenario: Auth disabled
- WHEN KB_API_KEY is not set
- THEN the engine SHALL process all requests without requiring authentication
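The four authentication scenarios can be expressed as one check that runs before request handling. This is a sketch under assumptions: `check_auth` is a hypothetical name, an empty `KB_API_KEY` is treated the same as an unset one, and the constant-time comparison via `hmac.compare_digest` is a design choice the spec does not mandate.

```python
import hmac
from typing import Optional, Tuple


def check_auth(
    configured_key: Optional[str],
    authorization_header: Optional[str],
) -> Optional[Tuple[int, dict]]:
    """Return a (401, body) rejection, or None if the request may proceed."""
    # Auth disabled: KB_API_KEY unset (or, by assumption, empty).
    if not configured_key:
        return None
    if not authorization_header:
        return 401, {"error": "authentication required"}
    scheme, _, token = authorization_header.partition(" ")
    # compare_digest avoids leaking key contents through timing differences.
    token_matches = hmac.compare_digest(
        token.encode("utf-8"), configured_key.encode("utf-8")
    )
    if scheme.lower() != "bearer" or not token_matches:
        return 401, {"error": "invalid api key"}
    return None
```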
Requirement: Engine configuration via environment variables
The engine SHALL be configured via environment variables. No config file is read by the engine — all configuration comes from the environment (set via compose.yaml or Docker run).
Scenario: Default configuration
- WHEN the engine starts with no environment variables set
- THEN it SHALL use defaults: data directory /data, model all-MiniLM-L6-v2, device auto, no API key required. It SHALL create staging/ and documents/ subdirectories under the data directory.
Scenario: Custom model
- WHEN KB_MODEL is set to BAAI/bge-small-en-v1.5
- THEN the engine SHALL download and load that model instead of the default
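Environment-only configuration with the defaults listed above can be sketched as a small loader. `KB_MODEL` and `KB_API_KEY` come from this spec; the variable names `KB_DATA_DIR` and `KB_DEVICE` are assumptions made for illustration, as are `EngineConfig` and `load_config`.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class EngineConfig:
    data_dir: Path
    model: str
    device: str
    api_key: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> EngineConfig:
    """Build the configuration from environment variables only."""
    env = os.environ if env is None else env
    return EngineConfig(
        data_dir=Path(env.get("KB_DATA_DIR", "/data")),  # assumed variable name
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),
        device=env.get("KB_DEVICE", "auto"),             # assumed variable name
        api_key=env.get("KB_API_KEY") or None,           # unset or empty disables auth
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without mutating the real process environment.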