## Why Duplicate document detection currently happens in the background worker — the upload endpoint always returns HTTP 202, and the user only discovers a duplicate later when the job status is `skipped`. This wastes staging I/O, creates noise in the job list, and gives poor user feedback. Moving the SHA256 content hash check to the upload endpoint allows immediate rejection with a clear error, preventing unnecessary work and giving the user instant feedback. ## What Changes - **Compute content hash at upload time**: The `POST /api/v1/jobs` endpoint will SHA256-hash the uploaded file bytes before staging and check against `documents.content_hash` - **Reject duplicates immediately**: Return HTTP 409 Conflict with the existing document ID when a duplicate is detected, instead of accepting and later skipping - **No job created for duplicates**: Duplicate uploads will not create a job record or stage a file - **Remove worker-side dedup**: The background worker's `hash_exists()` check becomes redundant for the normal flow but should be retained as a safety net (race condition guard) - **Update Go client**: Surface the 409 response with a clear message (e.g., "Already imported: (doc ID: <id>)") - **Note dedup**: Apply the same check to notes — hash the note text content ## Capabilities ### New Capabilities _(none — this modifies existing capabilities)_ ### Modified Capabilities - `engine-api`: The "Async ingestion via job queue" requirement changes — duplicate content is now rejected at upload time (HTTP 409) instead of accepted and later skipped by the worker. The "Duplicate content detection" scenario moves from background to synchronous. - `go-client`: The "Add command" requirement changes — the client must handle HTTP 409 responses and display the duplicate document info to the user. ## Impact - **Engine API** (`engine/kb/routes/jobs.py`): `submit_job()` gains hash computation and DB lookup before staging/job creation - **Engine database** (`engine/kb/database.py`): Need a query to return the existing document ID/title for a given hash (not just boolean exists check) - **Engine worker** (`engine/kb/worker.py`): Dedup check retained as safety net but no longer the primary guard - **Go client** (`client/cmd/add.go`): Handle 409 response, display duplicate info - **API contract**: New HTTP 409 response on `POST /api/v1/jobs` — this is additive, not breaking, since no consumer expects 409 today