b04823e67b
Persist uploaded files to {data_dir}/documents/{content_hash}{ext} after
successful ingestion. Add GET /documents/{id}/file endpoint for retrieval,
delete stored files on document deletion, and add `kb export` client command.
Includes schema migration, tests, and spec updates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Why
The knowledge base currently discards original files after chunking and embedding. Once a document is ingested, only the extracted text chunks and vectors remain — the original PDF, markdown, or code file is deleted from staging. Users cannot retrieve the source document from the KB, which limits its usefulness as a document store and prevents use cases like re-processing with a different model or serving the original file to downstream tools.
What Changes
- Add a persistent document storage directory (`{data_dir}/documents/`) alongside the SQLite database
- After successful ingestion, copy the original file from staging to permanent storage instead of deleting it
- Store the permanent file path in the `documents` table (`stored_path` column) and the original upload filename (`original_filename` column) so downloads use the correct name
- Add an API endpoint to download the original file by document ID
- Add a CLI command to export/retrieve the original document
- BREAKING: Delete document now also removes the stored file from disk
- Notes (text-only) are stored as `.note` files in the same directory for consistency
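The store-on-ingest step can be sketched roughly as follows. This is a minimal illustration, not the project's actual worker code: the function name `persist_upload`, the use of SHA-256 as the content hash, and copying rather than moving are all assumptions. Only the `{data_dir}/documents/{content_hash}{ext}` layout is taken from the commit message.

```python
import hashlib
import shutil
from pathlib import Path


def persist_upload(staged_file: Path, data_dir: Path) -> Path:
    """Copy a staged upload to {data_dir}/documents/{content_hash}{ext}.

    Hypothetical helper: SHA-256 as the content hash is an assumption.
    """
    documents_dir = data_dir / "documents"
    documents_dir.mkdir(parents=True, exist_ok=True)

    # Hash the file in chunks so large uploads are not read into memory at once.
    digest = hashlib.sha256()
    with staged_file.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)

    stored_path = documents_dir / f"{digest.hexdigest()}{staged_file.suffix}"
    # Identical content maps to the same name, so an existing copy is reused.
    if not stored_path.exists():
        shutil.copy2(staged_file, stored_path)
    return stored_path
```

The returned path is what would be written to the `stored_path` column, while the name the user uploaded the file under would go into `original_filename`.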
Capabilities
New Capabilities
- `document-storage`: Persistent storage of original uploaded files on disk, lifecycle management (store on ingest, delete on document removal), and retrieval via API
Modified Capabilities
- `engine-api`: New endpoint `GET /api/v1/documents/{id}/file` to download the original file; delete endpoint must also clean up stored files; ingestion worker stores files instead of discarding them
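To make the lifecycle concrete, here is a rough sketch of the retrieval and delete sides with the HTTP layer left out. It assumes a `documents` table with an integer `id` primary key plus the `stored_path` and `original_filename` columns. The function names `resolve_download` and `delete_document` are hypothetical, and the real handlers may be structured differently.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple


def resolve_download(conn: sqlite3.Connection, doc_id: int) -> Optional[Tuple[Path, str]]:
    """Return (path on disk, filename to offer the client), or None if unavailable."""
    row = conn.execute(
        "SELECT stored_path, original_filename FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None or row[0] is None:
        return None  # unknown document, or one with no stored file recorded
    path = Path(row[0])
    if not path.is_file():
        return None
    # Fall back to the on-disk name if no upload filename was recorded.
    return path, row[1] or path.name


def delete_document(conn: sqlite3.Connection, doc_id: int) -> bool:
    """Delete the row, then remove its stored file. Returns False if no such row."""
    row = conn.execute(
        "SELECT stored_path FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return False
    conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
    if row[0]:
        # Assumption: content-hash names may be shared by several documents,
        # so only unlink when no remaining row points at the same file.
        still_used = conn.execute(
            "SELECT 1 FROM documents WHERE stored_path = ? LIMIT 1", (row[0],)
        ).fetchone()
        if still_used is None:
            Path(row[0]).unlink(missing_ok=True)
    return True
```

A route handler would presumably map a `None` result from `resolve_download` to a 404 response and otherwise stream the file using the returned filename for the download.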
Impact
- Engine config: New `documents_dir` property on Config, new directory created at startup via `ensure_dirs()`
- Worker: After successful chunking, move/copy file from staging to documents dir; update `source_path` → `stored_path` with permanent location
- Database schema: Add `stored_path` and `original_filename` columns to `documents` table (migration for existing DBs)
- Routes: New file-download endpoint; update delete handler to remove stored file
- Go client: New `export`/`get-file` subcommand to download original documents
- Docker: `documents/` directory lives inside the existing `/data` volume, so no new mounts are needed
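For existing databases, the schema change could be applied idempotently along these lines. This is a sketch that assumes nullable `TEXT` columns and Python's standard `sqlite3` module. The function name `migrate_documents_table` is hypothetical, and the actual migration in the commit may differ.

```python
import sqlite3
from typing import List


def migrate_documents_table(conn: sqlite3.Connection) -> List[str]:
    """Add stored_path / original_filename to `documents` if they are missing.

    Returns the names of the columns that were added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    added = []
    for column in ("stored_path", "original_filename"):
        if column not in existing:
            # Column names come from the fixed tuple above, not from user input.
            conn.execute(f"ALTER TABLE documents ADD COLUMN {column} TEXT")
            added.append(column)
    conn.commit()
    return added
```

Because the function checks for each column before altering the table, it can run on every startup without failing on databases that were already migrated. Rows ingested before the change keep `NULL` in both columns.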