## MODIFIED Requirements

### Requirement: Async ingestion via job queue

The engine SHALL accept file uploads and text notes for ingestion asynchronously. Uploaded content SHALL be written to a staging area and a job record created in the database. The engine SHALL return HTTP 202 immediately. A background worker SHALL process queued jobs sequentially. Before staging, the engine SHALL compute a SHA256 hash of the uploaded content and reject duplicates immediately.

#### Scenario: Upload a PDF file

- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a PDF file and optional fields (tags, doc_type)
- **THEN** the engine SHALL compute the SHA256 hash of the file bytes, verify no existing document has the same hash, write the file to the staging directory, create a job record with status `queued`, and return HTTP 202 with `{"job_id": "<id>", "status": "queued", "filename": "report.pdf"}`

#### Scenario: Upload a text note

- **WHEN** a client sends `POST /api/v1/jobs` with a multipart form containing a `note` text field and optional `title` field
- **THEN** the engine SHALL compute the SHA256 hash of the note text (UTF-8 encoded), verify no existing document has the same hash, write the note content to a staging file, create a job record with status `queued`, and return HTTP 202 with the job ID

#### Scenario: Upload multiple files in sequence

- **WHEN** a client sends multiple `POST /api/v1/jobs` requests in quick succession
- **THEN** the engine SHALL queue each job independently and the background worker SHALL process them in FIFO order

#### Scenario: Duplicate file detected at upload time (already ingested)

- **WHEN** a client uploads a file whose SHA256 content hash matches an already-ingested document
- **THEN** the engine SHALL NOT stage the file or create a job record, and SHALL return HTTP 409 with `{"error": "duplicate", "document_id": <id>, "title": "<title>"}`

#### Scenario: Duplicate file detected at upload time (in-flight job)

- **WHEN** a client uploads a file whose SHA256 content hash matches a queued or processing job
- **THEN** the engine SHALL NOT stage the file or create a job record, and SHALL return HTTP 409 with `{"error": "duplicate", "job_id": <id>, "title": "<filename>"}`

#### Scenario: Duplicate note detected at upload time (already ingested)

- **WHEN** a client submits a note whose SHA256 content hash matches an already-ingested document
- **THEN** the engine SHALL NOT stage the note or create a job record, and SHALL return HTTP 409 with `{"error": "duplicate", "document_id": <id>, "title": "<title>"}`

#### Scenario: Duplicate note detected at upload time (in-flight job)

- **WHEN** a client submits a note whose SHA256 content hash matches a queued or processing job
- **THEN** the engine SHALL NOT stage the note or create a job record, and SHALL return HTTP 409 with `{"error": "duplicate", "job_id": <id>, "title": "<filename>"}`

#### Scenario: Duplicate uploaded during concurrent request handling

- **WHEN** two identical files are uploaded at the same instant, both passing the API hash check before either job is committed
- **THEN** both jobs SHALL be queued, and the background worker SHALL process the first normally and mark the second as `skipped` (worker-side safety net via `hash_exists()` and UNIQUE constraint)

#### Scenario: Upload failure due to unsupported file type

- **WHEN** a client uploads a file with an unsupported extension
- **THEN** the engine SHALL return HTTP 422 with an error message listing supported types
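
#### Example: Two-layer duplicate check (illustrative sketch)

The scenarios above describe two layers of deduplication: an API-side hash check before staging, and a worker-side safety net. The following Python sketch shows one way these could fit together. It is not the engine's implementation. The SQLite schema, the `open_db`, `submit`, and `process_next` helpers, the `done` status, and the UUID job IDs are all assumptions made for illustration; only `hash_exists()`, the `queued`, `processing`, and `skipped` statuses, and the response shapes come from the requirement. Staging the content to disk is omitted.

```python
import hashlib
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    # Hypothetical schema: the spec only implies a UNIQUE constraint on the hash.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            content_hash TEXT NOT NULL UNIQUE
        );
        CREATE TABLE jobs (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order, used for FIFO
            id TEXT NOT NULL UNIQUE,
            filename TEXT NOT NULL,
            content_hash TEXT NOT NULL,
            status TEXT NOT NULL
        );
        """
    )
    return conn


def hash_exists(conn, content_hash):
    """Return (document_id, title) for an ingested document with this hash, else None."""
    return conn.execute(
        "SELECT id, title FROM documents WHERE content_hash = ?", (content_hash,)
    ).fetchone()


def submit(conn, content: bytes, filename: str):
    """API-side check. Returns (http_status, response_body)."""
    content_hash = hashlib.sha256(content).hexdigest()
    doc = hash_exists(conn, content_hash)
    if doc is not None:
        return 409, {"error": "duplicate", "document_id": doc[0], "title": doc[1]}
    job = conn.execute(
        "SELECT id, filename FROM jobs WHERE content_hash = ? "
        "AND status IN ('queued', 'processing')",
        (content_hash,),
    ).fetchone()
    if job is not None:
        return 409, {"error": "duplicate", "job_id": job[0], "title": job[1]}
    job_id = str(uuid.uuid4())  # assumed ID format
    conn.execute(
        "INSERT INTO jobs (id, filename, content_hash, status) "
        "VALUES (?, ?, ?, 'queued')",
        (job_id, filename, content_hash),
    )
    conn.commit()
    return 202, {"job_id": job_id, "status": "queued", "filename": filename}


def process_next(conn):
    """Worker side: handle the oldest queued job; return its final status or None."""
    row = conn.execute(
        "SELECT id, filename, content_hash FROM jobs "
        "WHERE status = 'queued' ORDER BY seq LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, filename, content_hash = row
    if hash_exists(conn, content_hash) is not None:
        final = "skipped"  # safety net: an earlier job already ingested this content
    else:
        try:
            conn.execute(
                "INSERT INTO documents (title, content_hash) VALUES (?, ?)",
                (filename, content_hash),
            )
            final = "done"  # assumed name for the success status
        except sqlite3.IntegrityError:
            final = "skipped"  # UNIQUE constraint caught a late duplicate
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (final, job_id))
    conn.commit()
    return final
```

In this sketch, calling `submit` twice with the same bytes returns 202 and then 409. If two identical jobs reach the queue anyway, `process_next` ingests the first and marks the second `skipped`, which mirrors the concurrent-upload scenario.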