Upload-time dedup, FTS5 query sanitization, release guard #1

Merged
steve merged 1 commit from 2.0.5 into main 2026-03-26 23:06:09 +00:00

Summary

  • Upload-time duplicate detection: Reject duplicate uploads at the API boundary with HTTP 409 instead of silently accepting them and skipping them later in the background worker. Checks both already-ingested documents (document_id) and in-flight jobs (job_id), with distinct response shapes and client messages for each case.
  • FTS5 query sanitization: Escape user search queries for FTS5 by quoting each token individually, preventing 500 errors from special characters (?, *, ", (), etc.) and FTS5 keywords (AND, OR, NOT). Includes a try/except safety net for edge cases.
  • Release branch guard: release.sh now refuses to run from non-main branches.
  • Specs and README updated to reflect all changes.
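
The duplicate check described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the table layout (`documents`, `jobs`), the status values, SHA-256 as the hash, the 202 response for accepted uploads, and the names `make_db` and `check_upload` are all hypothetical. Only the 409 status, the `content_hash` lookup on the jobs table, and the `document_id` / `job_id` distinction come from the description.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, content_hash TEXT UNIQUE);
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, content_hash TEXT, status TEXT);
        """
    )
    return conn


def check_upload(conn: sqlite3.Connection, payload: bytes) -> tuple[int, dict]:
    """Return (HTTP status, response body) for an upload attempt."""
    # Assumption: content is identified by its SHA-256 digest.
    content_hash = hashlib.sha256(payload).hexdigest()

    # Case 1: the same content has already been ingested as a document.
    row = conn.execute(
        "SELECT id FROM documents WHERE content_hash = ?", (content_hash,)
    ).fetchone()
    if row is not None:
        return 409, {"detail": "duplicate", "document_id": row[0]}

    # Case 2: the same content is still waiting in an in-flight job.
    row = conn.execute(
        "SELECT id FROM jobs WHERE content_hash = ? "
        "AND status IN ('queued', 'processing')",
        (content_hash,),
    ).fetchone()
    if row is not None:
        return 409, {"detail": "duplicate", "job_id": row[0]}

    # No duplicate found: enqueue a new job (202 is an assumed status code).
    cur = conn.execute(
        "INSERT INTO jobs (content_hash, status) VALUES (?, 'queued')",
        (content_hash,),
    )
    return 202, {"job_id": cur.lastrowid}
```

A client can tell the two 409 cases apart by which key is present in the body. The check-then-insert sequence here is not atomic, so a concurrent server would also need a transaction or a uniqueness constraint.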

Test plan

  • Rebuild engine with --build, upload a file, upload same file again — verify 409 with "Already imported" message
  • Upload same file rapidly twice before first processes — verify second gets 409 with "Already queued" message
  • kb search "what color is grass?" returns results, not a 500 error
  • kb search "NOT something OR (other)" treats input as literal terms
  • Run ./release.sh --gitea from a non-main branch — verify it exits with error
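
For the two `kb search` cases above, per-token quoting might work along the lines of this sketch. It is a guess at the approach, not the merged code: the function names and the `docs_fts` table are hypothetical, and splitting on whitespace is an assumption. It relies on documented FTS5 behaviour: a double-quoted string is a phrase, and an embedded double quote is escaped by doubling it.

```python
import sqlite3


def sanitize_fts5_query(raw: str) -> str:
    """Quote each whitespace-separated token so FTS5 reads it as a literal phrase."""
    tokens = raw.split()
    # Doubling embedded quotes keeps each token a single well-formed FTS5 string.
    return " ".join('"' + token.replace('"', '""') + '"' for token in tokens)


def search(conn: sqlite3.Connection, raw: str) -> list:
    """Run a sanitized full-text query against a hypothetical docs_fts table."""
    match_expr = sanitize_fts5_query(raw)
    if not match_expr:
        # An empty MATCH expression is not valid, so skip the query entirely.
        return []
    try:
        return conn.execute(
            "SELECT rowid FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank",
            (match_expr,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Safety net: treat any remaining FTS5 syntax error as "no results".
        return []
```

With this quoting, `NOT something OR (other)` becomes `"NOT" "something" "OR" "(other)"`, so the keywords and parentheses are matched as ordinary terms rather than parsed as operators.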

🤖 Generated with Claude Code

steve added 1 commit 2026-03-26 23:05:29 +00:00
- Reject duplicate uploads at the API boundary (HTTP 409) instead of
  silently skipping in the background worker. Checks both ingested
  documents and in-flight jobs via content_hash on the jobs table.
- Go client handles 409 with distinct messages for already-imported
  documents vs already-queued jobs.
- Sanitize FTS5 search queries by quoting each token to prevent syntax
  errors from special characters like ?, *, ", (), AND, OR, NOT.
- Add try/except safety net around FTS5 execute for edge cases.
- Add main branch guard to release.sh to prevent releasing from
  feature branches.
- Update specs and README to reflect new behaviour.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
steve merged commit 4590c124ad into main 2026-03-26 23:06:09 +00:00
Reference: steve/kb#1