## 1. Database Layer
- [x] 1.1 Add `get_document_by_hash(conn, content_hash)` function to `engine/kb/database.py` that returns `(document_id, title)` or `None`
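
A minimal sketch of what task 1.1 might look like, assuming the store is SQLite and that ingested documents live in a `documents` table with `id`, `title`, and `content_hash` columns. The table and column names are assumptions for illustration; only the function name, its arguments, and its return shape come from the task.

```python
import sqlite3
from typing import Optional, Tuple


def get_document_by_hash(
    conn: sqlite3.Connection, content_hash: str
) -> Optional[Tuple[int, str]]:
    """Return (document_id, title) for a stored hash, or None if absent."""
    # Assumed schema: documents(id, title, content_hash).
    row = conn.execute(
        "SELECT id, title FROM documents WHERE content_hash = ? LIMIT 1",
        (content_hash,),
    ).fetchone()
    # Index access works for plain tuples and for sqlite3.Row alike.
    return (row[0], row[1]) if row is not None else None
```

An index on the hash column would keep this lookup cheap as the table grows, though whether one already exists is not stated here.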
## 2. Upload Endpoint
- [x] 2.1 Update `submit_job()` in `engine/kb/routes/jobs.py` to compute the SHA-256 hash of the uploaded file bytes before staging
- [x] 2.2 Add a duplicate check: call `get_document_by_hash()` and return HTTP 409 with `{"error": "duplicate", "document_id": <id>, "title": "<title>"}` if a match is found
- [x] 2.3 Apply the same hash check to note submissions (hash the UTF-8 encoded note text)
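
The endpoint tasks above can be sketched independently of any web framework. In this sketch `content_hash()`, `check_duplicate()`, and the `Lookup` type are hypothetical helper names, and the hex digest encoding is an assumption; only the SHA-256 step, the 409 status, and the response body shape come from the tasks. The `lookup` argument stands in for a call to `get_document_by_hash()` bound to a connection.

```python
import hashlib
from typing import Any, Callable, Dict, Optional, Tuple

# Maps a hex digest to (document_id, title), or None when the hash is unseen.
Lookup = Callable[[str], Optional[Tuple[int, str]]]


def content_hash(payload: bytes) -> str:
    """SHA-256 of the raw bytes, hex encoded (the encoding is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def check_duplicate(
    payload: bytes, lookup: Lookup
) -> Optional[Tuple[int, Dict[str, Any]]]:
    """Return (409, body) when the payload is already ingested, else None."""
    match = lookup(content_hash(payload))
    if match is None:
        return None  # caller proceeds with normal staging
    document_id, title = match
    body = {"error": "duplicate", "document_id": document_id, "title": title}
    return 409, body
```

For a note submission, the same helper would be called as `check_duplicate(note_text.encode("utf-8"), lookup)`, so files and notes share one code path.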
## 3. Go Client
- [x] 3.1 Update `uploadFile()` in `client/cmd/add.go` to handle HTTP 409 responses — parse the JSON body and print "Already imported: <title> (doc ID: <id>)"
- [x] 3.2 Update recursive directory upload to continue on 409, track the duplicate count, and include it in the summary output
- [x] 3.3 Handle 409 in JSON output mode — pass through the raw engine response
## 4. Testing
- [x] 4.1 Test: upload a file, then upload the same file again — verify 409 with correct document_id and title
- [x] 4.2 Test: upload a note, then upload the same note text — verify 409
- [x] 4.3 Test: upload a file, then upload a different file — verify 202 as normal
- [x] 4.4 Test: verify the worker-side `hash_exists()` safety net still works (direct job insertion bypassing API)
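
Test 4.4 could be approximated along the following lines. This is a sketch only: the `hash_exists()` signature, the `process_job()` stand-in for the worker, and the `documents` schema are all assumptions, since the real worker code is not shown here. The point it illustrates is that a job inserted directly, without passing through the API check, is still caught by the worker.

```python
import hashlib
import sqlite3


def hash_exists(conn: sqlite3.Connection, content_hash: str) -> bool:
    # Hypothetical signature; the real helper may differ.
    row = conn.execute(
        "SELECT 1 FROM documents WHERE content_hash = ? LIMIT 1",
        (content_hash,),
    ).fetchone()
    return row is not None


def process_job(conn: sqlite3.Connection, title: str, payload: bytes) -> str:
    """Stand-in for the worker: ingest unless the hash is already stored."""
    digest = hashlib.sha256(payload).hexdigest()
    if hash_exists(conn, digest):
        return "skipped"
    conn.execute(
        "INSERT INTO documents (title, content_hash) VALUES (?, ?)",
        (title, digest),
    )
    return "ingested"


def test_worker_skips_duplicate_job() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents "
        "(id INTEGER PRIMARY KEY, title TEXT, content_hash TEXT)"
    )
    # Both jobs bypass the API, so only the worker check can catch the repeat.
    assert process_job(conn, "a.txt", b"same bytes") == "ingested"
    assert process_job(conn, "copy.txt", b"same bytes") == "skipped"
    count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    assert count == 1
```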