## Context

FTS5 has its own query syntax. Characters such as `*`, `"`, `(`, `)`, `+`, `-`, `^` and keywords such as `AND`, `OR`, `NOT`, `NEAR` have special meaning, and other punctuation such as `?` is not valid in an unquoted term. The current code passes the raw user query to `chunks_fts MATCH ?`. Because the query is parameterized it is safe from SQL injection, but it is not safe from FTS5 syntax errors.

The fix point is `_fts_search()` in `engine/kb/search.py:92`, where `params: list = [query]`.

## Goals / Non-Goals

**Goals:**

- Any user input to the search endpoint produces either valid results or an empty result set, never a 500 error
- Preserve the user's search intent as much as possible (don't over-strip)

**Non-Goals:**

- Exposing FTS5 advanced syntax to users (they can't use AND/OR/NEAR operators intentionally)
- Changing vector search (it already handles arbitrary strings via the embedding model)

## Decisions

### 1. Quote each token individually

Split the query on whitespace, wrap each token in double quotes (`"token"`), and join with spaces. FTS5 interprets double-quoted strings as literal phrases, disabling all operator parsing within them. Any embedded double quotes in a token are stripped.

Example: `what color is grass?` becomes `"what" "color" "is" "grass?"`. Inside quotes, FTS5 no longer parses `?` as query syntax; the text is handed to the tokenizer instead (the default `unicode61` tokenizer treats `?` as a separator, so the term still matches `grass`).

**Alternative considered**: Strip all non-alphanumeric characters. Rejected because it would break searches for terms containing hyphens, dots, or other meaningful punctuation (e.g., searching for "v2.0" or "self-hosted").

**Alternative considered**: Use a try/except to catch FTS5 errors and fall back. Rejected as a primary strategy because it silently degrades, but we'll add it as a safety net.

### 2. Handle empty/whitespace-only queries

If no tokens remain after sanitization, skip FTS search entirely and return empty results. This avoids sending an empty string to `MATCH`, which would also raise an error.

### 3. Try/except safety net

Wrap the FTS5 execute call in a try/except for `sqlite3.OperationalError`.
If an edge case still slips through, return empty FTS results and log a warning rather than crashing with a 500.

## Risks / Trade-offs

- **[Reduced FTS expressiveness]** Users cannot use FTS5 operators such as `AND` and `OR`, or multi-word phrase matching. → Acceptable trade-off for a personal knowledge base tool where natural language queries are the norm. The hybrid search (vector + FTS) compensates.
- **[Edge cases]** Some Unicode or control characters might still cause issues. → The try/except safety net handles these.
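
The three decisions above can be sketched together as follows. This is a minimal illustration, not the actual contents of `engine/kb/search.py`: the helper `sanitize_fts_query`, the standalone `fts_search` function, its `limit` parameter, and the `SELECT rowid` query shape are all assumptions made for the example. Only the `chunks_fts MATCH ?` clause and the `sqlite3.OperationalError` handling come from the design.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def sanitize_fts_query(query: str) -> str:
    """Quote each whitespace-separated token so FTS5 reads it as a string.

    Hypothetical helper name; implements Decision 1.
    """
    tokens = []
    for raw in query.split():
        # Strip embedded double quotes so a token cannot close its own quoting.
        token = raw.replace('"', "")
        if token:  # a token made only of quotes disappears entirely
            tokens.append(f'"{token}"')
    return " ".join(tokens)


def fts_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Illustrative stand-in for _fts_search(); the real signature may differ."""
    match_expr = sanitize_fts_query(query)
    if not match_expr:
        # Decision 2: nothing left to search for, so skip MATCH entirely.
        return []
    try:
        # Assumed query shape; the real code likely selects more columns.
        return conn.execute(
            "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? LIMIT ?",
            (match_expr, limit),
        ).fetchall()
    except sqlite3.OperationalError as exc:
        # Decision 3: degrade to empty FTS results instead of a 500.
        logger.warning("FTS5 rejected query %r: %s", match_expr, exc)
        return []
```

With this sketch, `sanitize_fts_query("what color is grass?")` returns `"what" "color" "is" "grass?"`, and a whitespace-only query returns an empty string, so `fts_search` returns `[]` without touching the database.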