kb/openspec/changes/archive/2026-03-28-store-original-documents/tasks.md
steve b04823e67b Store original documents for download after ingestion
Persist uploaded files to {data_dir}/documents/{content_hash}{ext} after
successful ingestion. Add GET /documents/{id}/file endpoint for retrieval,
delete stored files on document deletion, and add `kb export` client command.
Includes schema migration, tests, and spec updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:16:27 +00:00


1. Config and Schema

  • 1.1 Add documents_dir property to Config in engine/kb/config.py returning {data_dir}/documents
  • 1.2 Add documents_dir.mkdir() to Config.ensure_dirs()
  • 1.3 Add stored_path TEXT and original_filename TEXT columns to the documents table in init_schema(), both in the CREATE TABLE statement and as an ALTER TABLE migration for existing DBs
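A minimal sketch of tasks 1.1 to 1.3. It assumes the engine uses SQLite through Python's sqlite3 module and that Config exposes data_dir as a pathlib.Path. The id, title, and content_hash columns are placeholders for whatever the real documents table already contains. Only stored_path and original_filename come from the tasks.

```python
import sqlite3
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    # Hypothetical minimal Config: only data_dir is modeled here.
    data_dir: Path

    @property
    def documents_dir(self) -> Path:
        # Task 1.1: stored originals live under {data_dir}/documents.
        return self.data_dir / "documents"

    def ensure_dirs(self) -> None:
        # Task 1.2: create the documents directory with the other data dirs.
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.documents_dir.mkdir(parents=True, exist_ok=True)


NEW_COLUMNS = {"stored_path": "TEXT", "original_filename": "TEXT"}


def init_schema(conn: sqlite3.Connection) -> None:
    # Task 1.3, fresh databases: the new columns are part of CREATE TABLE.
    # id, title and content_hash are placeholder columns for this sketch.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT,
            content_hash TEXT,
            stored_path TEXT,
            original_filename TEXT
        )
        """
    )
    # Task 1.3, existing databases: add only the columns that are missing.
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    for name, col_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE documents ADD COLUMN {name} {col_type}")
    conn.commit()
```

Checking PRAGMA table_info before each ALTER TABLE keeps init_schema() safe to run on every startup, since SQLite rejects adding a column that already exists.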

2. Worker — File Persistence

  • 2.1 In worker._process_job(), after successful DB commit, move staged file to {documents_dir}/{content_hash}{ext} using shutil.move()
  • 2.2 Update documents.stored_path and documents.original_filename (from jobs.filename) after moving the file
  • 2.3 Remove staging.cleanup() call for successful jobs (file is moved, not deleted); keep cleanup on failure path
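The success and failure paths of tasks 2.1 to 2.3 can be sketched as follows. finish_job, store_original, the job dict keys, and the ingest callable are hypothetical stand-ins for the real worker code, and Path.unlink() stands in for staging.cleanup(). The sketch also assumes the extension is taken from the uploaded filename (jobs.filename), since the staged file's own name may not carry it.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import Callable, Tuple


def store_original(conn: sqlite3.Connection, doc_id: int, staged_path: Path,
                   content_hash: str, original_filename: str,
                   documents_dir: Path) -> Path:
    # Hypothetical helper for tasks 2.1 and 2.2; call only after the
    # ingestion transaction has committed.
    ext = Path(original_filename).suffix  # assumption: ext from jobs.filename
    dest = documents_dir / f"{content_hash}{ext}"
    documents_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move renames when possible and falls back to copy + delete
    # when staging and documents_dir are on different filesystems.
    shutil.move(str(staged_path), str(dest))
    conn.execute(
        "UPDATE documents SET stored_path = ?, original_filename = ? WHERE id = ?",
        (str(dest), original_filename, doc_id),
    )
    conn.commit()
    return dest


def finish_job(conn: sqlite3.Connection, job: dict, documents_dir: Path,
               ingest: Callable[[sqlite3.Connection, Path], Tuple[int, str]]) -> Path:
    staged = Path(job["staged_path"])  # hypothetical job field
    try:
        doc_id, content_hash = ingest(conn, staged)
        conn.commit()
    except Exception:
        conn.rollback()
        # Task 2.3: the failure path still removes the staged file
        # (stand-in for staging.cleanup()), so nothing reaches documents_dir.
        staged.unlink(missing_ok=True)
        raise
    # Task 2.3: on success the staged file is moved, not cleaned up.
    return store_original(conn, doc_id, staged, content_hash,
                          job["filename"], documents_dir)
```

Moving only after the commit means a failed ingestion never leaves an orphaned file in documents_dir, which is what test 7.2 checks.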

3. API — File Download Endpoint

  • 3.1 Add GET /api/v1/documents/{id}/file route in engine/kb/routes/documents.py using FastAPI FileResponse
  • 3.2 Return a Content-Type derived from the file extension and a Content-Disposition: attachment; filename="{original_filename}" header (fall back to {title}{ext} if original_filename is NULL)
  • 3.3 Handle 404 cases: document not found, stored_path is NULL, file missing from disk
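The three 404 cases and the filename fallback from tasks 3.1 to 3.3 reduce to a small lookup. resolve_download and NotFound are hypothetical names, and the row is assumed to be a dict with stored_path, original_filename, and title keys. In the route, the returned values would feed FastAPI's FileResponse(path, media_type=..., filename=...), which sets Content-Disposition: attachment when a filename is given, and NotFound would become HTTPException(status_code=404).

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


class NotFound(Exception):
    """Hypothetical stand-in for raising HTTPException(status_code=404)."""


def resolve_download(row: Optional[dict]) -> Tuple[Path, str, str]:
    """Return (path, media_type, download_name) or raise NotFound (task 3.3)."""
    if row is None:
        raise NotFound("document not found")
    if not row.get("stored_path"):
        raise NotFound("document has no stored file")
    path = Path(row["stored_path"])
    if not path.is_file():
        raise NotFound("stored file is missing from disk")
    # Task 3.2: Content-Type from the stored file's extension; the
    # octet-stream default for unknown extensions is an assumption.
    media_type, _ = mimetypes.guess_type(path.name)
    # Task 3.2: prefer the uploaded name, else fall back to {title}{ext}.
    download_name = row.get("original_filename") or f"{row['title']}{path.suffix}"
    return path, media_type or "application/octet-stream", download_name
```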

4. API — Delete Cleanup

  • 4.1 Update DELETE /api/v1/documents/{id} in engine/kb/routes/documents.py to also delete the stored file from disk
  • 4.2 Handle a missing file gracefully (log a warning; don't fail the request)
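A sketch of the file cleanup for tasks 4.1 and 4.2, assuming it runs after the documents row has been deleted. remove_stored_file and the logger name are hypothetical.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("kb.documents")  # hypothetical logger name


def remove_stored_file(stored_path: Optional[str]) -> bool:
    """Delete a document's stored original; return True if a file was removed."""
    if not stored_path:
        # Nothing was ever stored (e.g. documents ingested before this change).
        return False
    try:
        Path(stored_path).unlink()
    except FileNotFoundError:
        # Task 4.2: log and carry on so the DELETE request still succeeds.
        logger.warning("stored file already missing: %s", stored_path)
        return False
    return True
```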

5. Document Details Enhancement

  • 5.1 Add has_file boolean to GET /api/v1/documents/{id} response based on stored_path presence and file existence on disk
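Task 5.1 needs both conditions: a recorded path and a file that still exists on disk. A sketch, with has_file as a hypothetical helper and document_details standing in for however the route actually builds its response (the id/title subset is illustrative only):

```python
from pathlib import Path
from typing import Optional


def has_file(stored_path: Optional[str]) -> bool:
    # True only when the DB records a path AND that file is still on disk.
    return bool(stored_path) and Path(stored_path).is_file()


def document_details(row: dict) -> dict:
    # Hypothetical response builder; real fields come from the existing route.
    details = {key: row[key] for key in ("id", "title")}
    details["has_file"] = has_file(row.get("stored_path"))
    return details
```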

6. Go Client

  • 6.1 Add kb export <doc_id> subcommand to the Go client that calls GET /api/v1/documents/{id}/file and writes to stdout or a specified output path

7. Testing

  • 7.1 Test successful ingestion stores file at expected path
  • 7.2 Test failed ingestion does not leave file in documents dir
  • 7.3 Test file download endpoint returns correct content and headers
  • 7.4 Test document deletion removes stored file
  • 7.5 Test download returns 404 for documents without stored files