kb/openspec/changes/archive/2026-03-28-store-original-documents/tasks.md
steve b04823e67b Store original documents for download after ingestion
Persist uploaded files to {data_dir}/documents/{content_hash}{ext} after
successful ingestion. Add GET /documents/{id}/file endpoint for retrieval,
delete stored files on document deletion, and add `kb export` client command.
Includes schema migration, tests, and spec updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:16:27 +00:00


## 1. Config and Schema
- [x] 1.1 Add `documents_dir` property to `Config` in `engine/kb/config.py` returning `{data_dir}/documents`
- [x] 1.2 Add `documents_dir.mkdir()` to `Config.ensure_dirs()`
- [x] 1.3 Add `stored_path TEXT` and `original_filename TEXT` columns to the `documents` table in `init_schema()` (in the CREATE TABLE statement for new databases and as an ALTER TABLE migration for existing ones)
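A minimal sketch of tasks 1.1 to 1.3, assuming the database is SQLite accessed through the stdlib `sqlite3` module. The `Config` dataclass here is a hypothetical stand-in that models only `data_dir`, and the other `documents` columns are illustrative. The migration checks `PRAGMA table_info` first because SQLite has no `ADD COLUMN IF NOT EXISTS`.

```python
import sqlite3
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    # Hypothetical minimal stand-in for kb's Config; only data_dir is modeled.
    data_dir: Path

    @property
    def documents_dir(self) -> Path:
        # Task 1.1: stored originals live under {data_dir}/documents
        return self.data_dir / "documents"

    def ensure_dirs(self) -> None:
        # Task 1.2: create the documents directory (no error if it exists)
        self.documents_dir.mkdir(parents=True, exist_ok=True)


def init_schema(conn: sqlite3.Connection) -> None:
    # New databases get both columns directly in CREATE TABLE (task 1.3).
    # Columns other than the two new ones are illustrative assumptions.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               title TEXT,
               content_hash TEXT,
               stored_path TEXT,
               original_filename TEXT
           )"""
    )
    # Databases created before this change lack the columns; add each one
    # only if missing. PRAGMA table_info rows are (cid, name, type, ...).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    for column in ("stored_path", "original_filename"):  # fixed names, not user input
        if column not in existing:
            conn.execute(f"ALTER TABLE documents ADD COLUMN {column} TEXT")
    conn.commit()
```

Running `init_schema()` repeatedly is safe: once both columns exist, the loop issues no ALTER statements.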
## 2. Worker — File Persistence
- [x] 2.1 In `worker._process_job()`, after successful DB commit, move staged file to `{documents_dir}/{content_hash}{ext}` using `shutil.move()`
- [x] 2.2 Update `documents.stored_path` and `documents.original_filename` (from `jobs.filename`) after moving the file
- [x] 2.3 Remove `staging.cleanup()` call for successful jobs (file is moved, not deleted); keep cleanup on failure path
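The success path of tasks 2.1 to 2.3 might look like the following sketch. `persist_original()` is a hypothetical helper, not the worker's actual code. It assumes SQLite and assumes that `{ext}` is taken from the job's original filename. It must run only after the document row has been committed, and the failure path keeps calling `staging.cleanup()` instead.

```python
import shutil
import sqlite3
from pathlib import Path


def persist_original(
    conn: sqlite3.Connection,
    staged_path: Path,
    documents_dir: Path,
    doc_id: int,
    content_hash: str,
    job_filename: str,
) -> Path:
    """Move a successfully ingested staged file into permanent storage.

    Hypothetical helper; call only after the document row is committed.
    """
    # Assumption: the extension comes from the uploaded filename ("" if none).
    ext = Path(job_filename).suffix
    dest = documents_dir / f"{content_hash}{ext}"
    documents_dir.mkdir(parents=True, exist_ok=True)
    # Task 2.1: shutil.move renames when possible and falls back to
    # copy-then-delete across filesystems, so the staged file is gone
    # afterwards and no separate cleanup is needed (task 2.3).
    shutil.move(str(staged_path), str(dest))
    # Task 2.2: record the stored location and the uploader's filename.
    conn.execute(
        "UPDATE documents SET stored_path = ?, original_filename = ? WHERE id = ?",
        (str(dest), job_filename, doc_id),
    )
    conn.commit()
    return dest
```

Naming the file by content hash means two uploads of identical bytes map to the same stored path.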
## 3. API — File Download Endpoint
- [x] 3.1 Add `GET /api/v1/documents/{id}/file` route in `engine/kb/routes/documents.py` using FastAPI `FileResponse`
- [x] 3.2 Return the `Content-Type` matching the file extension and `Content-Disposition: attachment; filename="{original_filename}"` (falling back to `{title}{ext}` when `original_filename` is NULL)
- [x] 3.3 Handle 404 cases: document not found, `stored_path` is NULL, file missing from disk
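The lookup behind tasks 3.2 and 3.3 can be kept framework-agnostic, as in this sketch. It assumes SQLite. `resolve_download()` and `DocumentFileNotFound` are hypothetical names, and the content type comes from the stdlib `mimetypes` module.

```python
import mimetypes
import sqlite3
from pathlib import Path


class DocumentFileNotFound(Exception):
    """Hypothetical error type; the route would map it to HTTP 404."""


def resolve_download(conn: sqlite3.Connection, doc_id: int) -> tuple[Path, str, str]:
    """Return (path, media_type, download_filename) or raise DocumentFileNotFound."""
    row = conn.execute(
        "SELECT title, stored_path, original_filename FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    # The three 404 cases from task 3.3, checked in order.
    if row is None:
        raise DocumentFileNotFound("document not found")
    title, stored_path, original_filename = row
    if stored_path is None:
        raise DocumentFileNotFound("document has no stored file")
    path = Path(stored_path)
    if not path.is_file():
        raise DocumentFileNotFound("stored file missing from disk")
    # Guess Content-Type from the stored file's extension; default to binary.
    media_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # Prefer the uploader's filename; fall back to {title}{ext} when it is
    # NULL (an empty string also falls back here).
    filename = original_filename or f"{title}{path.suffix}"
    return path, media_type, filename
```

In the FastAPI route, the result could be passed as `FileResponse(path, media_type=media_type, filename=filename)`, which emits an `attachment` `Content-Disposition` header. The exception would become `HTTPException(status_code=404, detail=str(exc))`.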
## 4. API — Delete Cleanup
- [x] 4.1 Update `DELETE /api/v1/documents/{id}` in `engine/kb/routes/documents.py` to also delete the stored file from disk
- [x] 4.2 Handle missing file gracefully (log warning, don't fail the request)
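Tasks 4.1 and 4.2 come down to an unlink that tolerates a missing file. In this sketch, `remove_stored_file()` and the logger name are hypothetical. It catches `FileNotFoundError` instead of using `unlink(missing_ok=True)` so that the warning required by task 4.2 still gets logged.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("kb.documents")  # hypothetical logger name


def remove_stored_file(stored_path: Optional[str]) -> bool:
    """Delete a document's stored original without failing the request.

    Returns True if a file was actually removed.
    """
    if stored_path is None:
        return False  # nothing was stored for this document
    try:
        Path(stored_path).unlink()
    except FileNotFoundError:
        # Task 4.2: log a warning instead of surfacing an error.
        logger.warning("stored file already missing: %s", stored_path)
        return False
    return True
```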
## 5. Document Details Enhancement
- [x] 5.1 Add a `has_file` boolean to the `GET /api/v1/documents/{id}` response that is true only when `stored_path` is set and the file exists on disk
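A minimal sketch of the `has_file` computation, assuming the details payload is a plain dict before serialization. `with_has_file()` is a hypothetical helper. The flag requires both a recorded path and an existing file, so an original deleted out-of-band reports `false` instead of advertising a download that would return 404.

```python
from pathlib import Path
from typing import Any, Optional


def with_has_file(detail: dict[str, Any], stored_path: Optional[str]) -> dict[str, Any]:
    """Return a copy of a document-details payload with a has_file flag added."""
    # True only if a path is recorded AND a regular file exists there.
    has_file = stored_path is not None and Path(stored_path).is_file()
    return {**detail, "has_file": has_file}
```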
## 6. Go Client
- [x] 6.1 Add `kb export <doc_id>` subcommand to the Go client that calls `GET /api/v1/documents/{id}/file` and writes to stdout or a specified output path
## 7. Testing
- [x] 7.1 Test successful ingestion stores file at expected path
- [x] 7.2 Test failed ingestion does not leave file in documents dir
- [x] 7.3 Test file download endpoint returns correct content and headers
- [x] 7.4 Test document deletion removes stored file
- [x] 7.5 Test download returns 404 for documents without stored files