steve 6fec627503 Upload-time duplicate detection, FTS5 query sanitization, release guard
- Reject duplicate uploads at the API boundary (HTTP 409) instead of
  silently skipping in the background worker. Checks both ingested
  documents and in-flight jobs via content_hash on the jobs table.
- Go client handles 409 with distinct messages for already-imported
  documents vs already-queued jobs.
- Sanitize FTS5 search queries by quoting each token to prevent syntax
  errors from special characters like ?, *, ", (), AND, OR, NOT.
- Add try/except safety net around FTS5 execute for edge cases.
- Add main branch guard to release.sh to prevent releasing from
  feature branches.
- Update specs and README to reflect new behaviour.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:05:07 +00:00

## MODIFIED Requirements
### Requirement: Add command (file and note ingestion)
The client SHALL provide a `kb add` command that uploads files or notes to the engine for async ingestion. The client SHALL exit immediately after a successful upload. The client SHALL handle duplicate rejection (HTTP 409) and display the existing document or in-flight job information.
#### Scenario: Add a single file
- **WHEN** the user runs `kb add report.pdf`
- **THEN** the client SHALL upload the file via `POST /api/v1/jobs` (multipart), print "Queued: report.pdf", and exit
#### Scenario: Add a file with tags
- **WHEN** the user runs `kb add manual.pdf --tags car,maintenance`
- **THEN** the client SHALL include the tags in the multipart upload metadata
#### Scenario: Add a directory recursively
- **WHEN** the user runs `kb add ~/documents/ --recursive`
- **THEN** the client SHALL discover all supported files in the directory tree, upload each one sequentially, and print "Queued: N files"
#### Scenario: Add a text note
- **WHEN** the user runs `kb add --note "The server room is in building 3, floor 2"`
- **THEN** the client SHALL submit the note text via `POST /api/v1/jobs` (multipart with note field), print "Queued: note", and exit
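A minimal Go sketch of how a note upload body could be assembled with the standard `mime/multipart` package. The `note` field comes from the scenario above; the `tags` field name, the comma-joined tag encoding, and the helper name `buildNoteForm` are assumptions for illustration, not the client's confirmed wire format.

```go
package main

import (
	"bytes"
	"mime/multipart"
	"strings"
)

// buildNoteForm returns a multipart body and its Content-Type header value.
// The "tags" field name and comma-joined encoding are assumptions.
func buildNoteForm(note string, tags []string) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)

	// The note text travels as a plain form field.
	if err := w.WriteField("note", note); err != nil {
		return nil, "", err
	}
	// Tags are optional; omit the field entirely when none were given.
	if len(tags) > 0 {
		if err := w.WriteField("tags", strings.Join(tags, ",")); err != nil {
			return nil, "", err
		}
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}
```

The returned content type carries the boundary string, so it would need to be sent as the `Content-Type` header of the `POST /api/v1/jobs` request.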
#### Scenario: Duplicate file rejected (already ingested)
- **WHEN** the user runs `kb add report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "document_id": 42, "title": "report.pdf"}`
- **THEN** the client SHALL print "Already imported: report.pdf (doc ID: 42)" and exit with code 0
#### Scenario: Duplicate file rejected (in-flight job)
- **WHEN** the user runs `kb add report.pdf` and the engine returns HTTP 409 with `{"error": "duplicate", "job_id": 7, "title": "report.pdf"}`
- **THEN** the client SHALL print "Already queued: report.pdf (job ID: 7)" and exit with code 0
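The two 409 scenarios above differ only in which ID the engine returns. A hedged Go sketch of how the client might tell them apart follows; the type `duplicateResponse` and the function `duplicateMessage` are hypothetical names, and only the JSON keys and message strings are taken from the scenarios.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// duplicateResponse mirrors the 409 bodies shown in the scenarios.
// Pointer fields distinguish "key absent" from a zero ID.
type duplicateResponse struct {
	Error      string `json:"error"`
	DocumentID *int64 `json:"document_id,omitempty"`
	JobID      *int64 `json:"job_id,omitempty"`
	Title      string `json:"title"`
}

// duplicateMessage turns a 409 body into the user-facing message.
func duplicateMessage(body []byte) (string, error) {
	var r duplicateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode 409 body: %w", err)
	}
	switch {
	case r.DocumentID != nil:
		return fmt.Sprintf("Already imported: %s (doc ID: %d)", r.Title, *r.DocumentID), nil
	case r.JobID != nil:
		return fmt.Sprintf("Already queued: %s (job ID: %d)", r.Title, *r.JobID), nil
	default:
		return "", fmt.Errorf("409 body has neither document_id nor job_id")
	}
}
```

Checking `document_id` first is a design choice in this sketch; the scenarios do not say what happens if a response carries both keys.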
#### Scenario: Duplicate file in recursive add
- **WHEN** the user runs `kb add ~/documents/ --recursive` and some files are rejected as duplicates
- **THEN** the client SHALL print the duplicate message for each rejected file (distinguishing "Already imported" from "Already queued"), continue uploading remaining files, and include a summary (e.g., "Queued: 5 files, 2 duplicates skipped")
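One way the recursive summary line could be produced is sketched below in Go. The `addResult` type and the `summarize` function are hypothetical; the output strings follow the "Queued: N files" and "Queued: 5 files, 2 duplicates skipped" wording used in the scenarios.

```go
package main

import "fmt"

// addResult records the outcome of one upload in a recursive add.
type addResult int

const (
	resultQueued    addResult = iota // engine accepted the upload
	resultDuplicate                  // engine answered HTTP 409
)

// summarize builds the final line printed after all files were attempted.
func summarize(results []addResult) string {
	queued, duplicates := 0, 0
	for _, r := range results {
		switch r {
		case resultQueued:
			queued++
		case resultDuplicate:
			duplicates++
		}
	}
	if duplicates == 0 {
		return fmt.Sprintf("Queued: %d files", queued)
	}
	return fmt.Sprintf("Queued: %d files, %d duplicates skipped", queued, duplicates)
}
```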
#### Scenario: Duplicate with JSON output
- **WHEN** the user runs `kb add report.pdf --format json` and the engine returns HTTP 409
- **THEN** the client SHALL output the raw JSON response from the engine, including the title and either the document_id (already ingested) or the job_id (in-flight job)
#### Scenario: Add with JSON output
- **WHEN** the user runs `kb add report.pdf --format json`
- **THEN** the client SHALL output the JSON response from the engine including the job_id
#### Scenario: File not found
- **WHEN** the user runs `kb add nonexistent.pdf`
- **THEN** the client SHALL print an error and exit with a non-zero code without making any API call
#### Scenario: Upload failure
- **WHEN** the upload fails (a network error occurs, or the engine returns a 4xx/5xx status other than 409)
- **THEN** the client SHALL print the error and exit with a non-zero code
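Taken together, the scenarios imply three outcomes for a single upload: queued, duplicate (exit code 0), and failed (non-zero exit). A minimal Go sketch of that mapping follows; the names `outcome`, `classify`, and `exitCode` are hypothetical, and the use of exit code 1 for failures is an assumption, since the requirement only says "non-zero".

```go
package main

import (
	"fmt"
	"net/http"
)

// outcome is the result of one upload attempt.
type outcome int

const (
	outcomeQueued outcome = iota
	outcomeDuplicate
	outcomeFailed
)

// classify maps a transport error or HTTP status to an outcome.
// A 409 is treated as a duplicate, not a failure.
func classify(status int, transportErr error) (outcome, error) {
	if transportErr != nil {
		return outcomeFailed, fmt.Errorf("upload failed: %w", transportErr)
	}
	switch {
	case status >= 200 && status < 300:
		return outcomeQueued, nil
	case status == http.StatusConflict:
		return outcomeDuplicate, nil
	default:
		return outcomeFailed, fmt.Errorf("upload failed: engine returned HTTP %d %s",
			status, http.StatusText(status))
	}
}

// exitCode returns the process exit code for an outcome.
// The value 1 is an assumption; the spec only requires non-zero.
func exitCode(o outcome) int {
	if o == outcomeFailed {
		return 1
	}
	return 0
}
```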