- Reject duplicate uploads at the API boundary (HTTP 409) instead of silently skipping in the background worker. Checks both ingested documents and in-flight jobs via content_hash on the jobs table.
- Go client handles 409 with distinct messages for already-imported documents vs already-queued jobs.
- Sanitize FTS5 search queries by quoting each token to prevent syntax errors from special characters like ?, *, ", (), AND, OR, NOT.
- Add try/except safety net around FTS5 execute for edge cases.
- Add main branch guard to release.sh to prevent releasing from feature branches.
- Update specs and README to reflect new behaviour.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context
FTS5 has its own query syntax. Characters like ?, *, ", (, ), +, -, ^ and keywords like AND, OR, NOT, NEAR have special meaning. The current code passes the raw user query to chunks_fts MATCH ? — parameterized (safe from SQL injection) but not safe from FTS5 syntax errors.
The fix point is _fts_search() in engine/kb/search.py:92 where params: list = [query].
Goals / Non-Goals
Goals:
- Any user input to the search endpoint produces either valid results or an empty result set — never a 500 error
- Preserve the user's search intent as much as possible (don't over-strip)
Non-Goals:
- Exposing FTS5 advanced syntax to users (they can't use AND/OR/NEAR operators intentionally)
- Changing vector search (it already handles arbitrary strings via the embedding model)
Decisions
1. Quote each token individually
Split the query on whitespace, wrap each token in double quotes ("token"), and join with spaces. FTS5 interprets double-quoted strings as literal phrases, disabling all operator parsing within them. Any embedded double quotes in a token are stripped.
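Example: what color is grass? becomes "what" "color" "is" "grass?". Inside the quotes, ? is string content rather than query syntax, so it cannot trigger a parse error; whether it takes part in matching depends on the table's tokenizer.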
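The quoting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in engine/kb/search.py; the function name `sanitize_fts_query` is hypothetical.

```python
def sanitize_fts_query(query: str) -> str:
    """Quote each whitespace-separated token so FTS5 reads it as a string.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not taken from the project.
    """
    quoted = []
    for raw in query.split():
        # Strip embedded double quotes so the token cannot close its own quoting.
        token = raw.replace('"', "")
        # A token that consisted only of quotes is now empty and is dropped.
        if token:
            quoted.append(f'"{token}"')
    return " ".join(quoted)
```

With this sketch, `sanitize_fts_query("what color is grass?")` yields `"what" "color" "is" "grass?"`, and keywords such as AND or NOT are quoted like any other token, so they lose their operator meaning.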
Alternative considered: Strip all non-alphanumeric characters. Rejected because it would break searches for terms containing hyphens, dots, or other meaningful punctuation (e.g., searching for "v2.0" or "self-hosted").
Alternative considered: Use a try/except to catch FTS5 errors and fall back. Rejected as a primary strategy because it silently degrades — but we'll add it as a safety net.
2. Handle empty/whitespace-only queries
If no tokens remain after sanitization, skip FTS search entirely and return empty results. This avoids sending an empty string to MATCH, which would also raise an error.
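One way to express this guard, assuming the FTS call is passed in as a callable: the names `fts_search_or_empty` and `run_match` are hypothetical and exist only to keep the sketch self-contained.

```python
from typing import Callable


def fts_search_or_empty(query: str, run_match: Callable[[str], list]) -> list:
    """Run the FTS search only when at least one token survives sanitization.

    `run_match` is a stand-in (an assumption) for whatever executes the
    MATCH query against the FTS table.
    """
    # Same quoting rule as in decision 1: strip embedded quotes, drop empties.
    stripped = (raw.replace('"', "") for raw in query.split())
    quoted = " ".join(f'"{token}"' for token in stripped if token)
    if not quoted:
        # Nothing left to search for: never hand an empty string to MATCH.
        return []
    return run_match(quoted)
```

Returning early keeps the empty-query case out of SQLite entirely, so it does not rely on the exception handler described in decision 3.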
3. Try/except safety net
Wrap the FTS5 execute call in a try/except for sqlite3.OperationalError. If an edge case still slips through, return empty FTS results and log a warning rather than crashing with a 500.
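A minimal sketch of such a safety net using the standard sqlite3 module. The helper name `safe_fts_execute` and the logger setup are assumptions for illustration; the actual call site in `_fts_search()` may be shaped differently.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def safe_fts_execute(conn: sqlite3.Connection, sql: str, params: list) -> list:
    """Execute a query, returning no rows if SQLite reports an operational error.

    Hypothetical helper: FTS5 syntax errors surface as
    sqlite3.OperationalError, so catching that class covers them.
    """
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        # Degrade to "no FTS results" and leave a trace in the logs.
        logger.warning("FTS query failed, returning no FTS results: %s", exc)
        return []
```

Note that sqlite3.OperationalError also covers unrelated failures such as a missing table or a locked database, so the logged warning is what keeps this fallback from hiding real problems.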
Risks / Trade-offs
- [Reduced FTS expressiveness] Users cannot use FTS5 operators such as AND and OR, or phrase matching. → Acceptable trade-off for a personal knowledge base tool where natural language queries are the norm. The hybrid search (vector + FTS) compensates.
- [Edge cases] Some Unicode or control characters might still cause issues. → The try/except safety net handles these.