kb/openspec/changes/archive/2026-03-26-sanitize-fts5-query/proposal.md
steve 6fec627503 Upload-time duplicate detection, FTS5 query sanitization, release guard
- Reject duplicate uploads at the API boundary (HTTP 409) instead of
  silently skipping in the background worker. Checks both ingested
  documents and in-flight jobs via content_hash on the jobs table.
- Go client handles 409 with distinct messages for already-imported
  documents vs already-queued jobs.
- Sanitize FTS5 search queries by quoting each token to prevent syntax
  errors from special characters like ?, *, ", (), AND, OR, NOT.
- Add try/except safety net around FTS5 execute for edge cases.
- Add main branch guard to release.sh to prevent releasing from
  feature branches.
- Update specs and README to reflect new behaviour.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:05:07 +00:00


Why

Natural-language search queries that contain characters such as ?, ", *, (, ), or -, or FTS5 keywords (AND, OR, NOT, NEAR), cause an HTTP 500 error because the raw query string is passed directly to chunks_fts MATCH ? without escaping. Users should be able to type anything into a search query without triggering a syntax error.
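The failure can be reproduced outside the engine with a small script. This is a minimal sketch, assuming Python's sqlite3 module is built with FTS5 support; the single-column chunks_fts table and the raw_match_fails helper are hypothetical stand-ins, not the engine's real schema or code.

```python
import sqlite3


def raw_match_fails(query: str) -> bool:
    """Return True if passing `query` unescaped to MATCH raises an error."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical one-column stand-in for the real chunks_fts table.
        conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content)")
        conn.execute(
            "INSERT INTO chunks_fts (content) VALUES (?)",
            ("how to reset a password",),
        )
        # The raw string is bound as a parameter, but FTS5 still parses it
        # as query syntax, so special characters are not neutralised.
        conn.execute(
            "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ?",
            (query,),
        ).fetchall()
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()
```

Parameter binding protects against SQL injection but not against FTS5 syntax errors, which is why the query text itself has to be sanitized.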

What Changes

  • Sanitize FTS5 query input: Escape or strip FTS5 special characters from the user's query before passing it to the MATCH operator
  • Graceful fallback: If the sanitized query produces no valid FTS5 terms, return empty results from FTS instead of erroring
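One way to implement the sanitization step is to quote every whitespace-separated token, which is the approach the commit message describes. The sketch below is an illustration under that assumption; the function name sanitize_fts5_query is hypothetical and the real implementation in engine/kb/search.py may differ.

```python
def sanitize_fts5_query(raw: str) -> str:
    """Turn arbitrary user input into a syntactically safe FTS5 expression.

    Each whitespace-separated token becomes an FTS5 string literal, so
    characters such as ? * ( ) - and keywords such as AND, OR, NOT, NEAR
    are treated as plain text rather than query syntax.
    """
    quoted = []
    for token in raw.split():
        # Inside an FTS5 string literal, a double quote is escaped by doubling it.
        quoted.append('"' + token.replace('"', '""') + '"')
    # Space-separated phrases are implicitly ANDed by FTS5.
    # An empty result means there are no terms to search for.
    return " ".join(quoted)
```

An empty return value is the signal for the graceful fallback: the caller can skip the MATCH query and return no FTS results.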

Capabilities

New Capabilities

(none)

Modified Capabilities

  • engine-api: The "Hybrid search" requirement changes — the engine must sanitize user queries to prevent FTS5 syntax errors for any input

Impact

  • Engine search (engine/kb/search.py): _fts_search() needs query sanitization before the MATCH parameter
  • No client changes: The client already displays results or errors correctly
  • No schema changes: No database modifications needed
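Putting the pieces together, the change to the search path might look roughly like the following. This is a hedged sketch, not the actual _fts_search(): the function name fts_search, its signature, the rowid-only result, and the one-column table used in testing are all assumptions made for illustration.

```python
import sqlite3


def fts_search(conn: sqlite3.Connection, raw_query: str, limit: int = 10) -> list[int]:
    """Run a sanitized FTS5 search and return matching rowids (hypothetical API)."""
    # Quote each token so FTS5 treats it as literal text, not query syntax.
    match_expr = " ".join(
        '"' + token.replace('"', '""') + '"' for token in raw_query.split()
    )
    if not match_expr:
        # Graceful fallback: nothing searchable, so skip the query entirely.
        return []
    try:
        rows = conn.execute(
            "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? LIMIT ?",
            (match_expr, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # Safety net for edge cases the sanitizer does not anticipate.
        return []
    return [row[0] for row in rows]
```

Returning an empty list on failure keeps the FTS leg of hybrid search from turning a malformed query into a server error; whether the real engine also logs the exception is not stated in the proposal.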