## Context

When a document is ingested, the worker chunks its content and stores each chunk's text in the `chunks` table. FTS5 triggers index that text, and the embedding model embeds it. The document title is stored only in `documents.title` and never participates in search. As a result, short documents (or documents whose content lacks the title keywords) are invisible to queries that match only the title.

The reindex endpoint (`POST /api/v1/reindex`) currently reads `chunks.text` and re-embeds it. Any fix must apply consistently at both ingestion and reindex time.

## Goals / Non-Goals

**Goals:**

- Document titles are searchable via both FTS5 and vector search
- Section header breadcrumbs (when present in chunk metadata) are also searchable
- Search results continue to return the original chunk text (no title prefix in the `text` field returned to clients)
- Existing documents become searchable by title after a `kb reindex`
- No schema-breaking migration: additive column only

**Non-Goals:**

- Changing the chunking strategies themselves (note, markdown, code, docling)
- Adding a separate title-search endpoint or client-side title filtering
- Changing the search result JSON structure

## Decisions

### 1. Add an `enriched_text` column to the `chunks` table

Store the title-prefixed text in a new `chunks.enriched_text` column alongside the existing `chunks.text`. The `text` column remains the raw chunk content (used for display in search results). The `enriched_text` column holds the title, the section header when one is present, and the chunk text; the exact format is defined in Decision 4.

**Why not just modify `chunks.text`?** The title would then appear in every search result's `text` field, which is redundant (the title is already a separate field) and would confuse consumers that display results.

**Why not reconstruct enriched text on-the-fly at search time?** FTS5 uses an external content table and triggers, so it needs a real column to index.
Reconstructing via JOIN at FTS query time would defeat the purpose of the FTS index.

### 2. Point FTS5 at `enriched_text` instead of `text`

Update the FTS5 virtual table definition and its sync triggers to index `enriched_text` rather than `text`. This is the core change that makes titles searchable via keyword search.

Since FTS5 external content tables cannot be ALTERed, existing databases require a rebuild: drop and recreate `chunks_fts` and its triggers, then repopulate. This is handled as a schema migration in `init_schema`.

### 3. Embed `enriched_text` instead of `text`

At ingestion time, pass `enriched_text` values to `embed_texts()` instead of raw chunk text. At reindex time, read `enriched_text` from the database. This makes titles searchable via vector similarity too.

### 4. Build enriched text in the worker, not in the ingest modules

The enrichment format is `"{title}\n\n{chunk_text}"`, or `"{title} > {section_header}\n\n{chunk_text}"` when a section header exists in chunk metadata. This happens in `worker._process_job()` after chunking and before embedding/insertion. The ingest modules remain unchanged: they continue to return raw chunk text and metadata.

### 5. Schema migration adds `enriched_text` and rebuilds FTS

The `init_schema` function will:

1. Add an `enriched_text TEXT` column to `chunks` if missing
2. Backfill `enriched_text` from existing data (join with `documents.title` and chunk metadata)
3. Drop and recreate `chunks_fts` to index `enriched_text` instead of `text`
4. Recreate the FTS sync triggers

This is safe because the migration only runs when the column is missing (first startup after upgrade). The backfill uses a single `UPDATE ... FROM` query.

## Risks / Trade-offs

**Slightly larger database**: The title string is duplicated into every chunk's `enriched_text`, in addition to being stored once in `documents.title`. For a typical KB with short titles this is negligible (< 1% size increase). → Acceptable for the search quality improvement.


**FTS rebuild on upgrade**: The first startup after upgrade will rebuild the FTS index, which takes a few seconds for large KBs. → This is a one-time cost and happens automatically.

**Embedding drift**: Existing vector embeddings won't include title context until `kb reindex` is run. The FTS backfill happens automatically, but vectors require an explicit reindex. → Document this in release notes. The FTS improvement alone is a significant win even without reindexing vectors.

**Title changes not propagated**: If a document's title were ever updated, `enriched_text` would be stale. Currently the engine has no title-update endpoint, so this is not a concern. → No mitigation needed now. If title editing is added later, it should update `enriched_text`.
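
To make Decisions 2 and 4 concrete, here is a minimal sketch of the enrichment format and of an external-content FTS5 table that indexes `enriched_text` while queries still return the raw `text`. The helper names `build_enriched_text` and `demo_search`, the trigger name `chunks_ai`, and the two-column `chunks` schema are assumptions made for illustration; the real worker code and schema are not shown in this document and will differ.

```python
import sqlite3


def build_enriched_text(title, chunk_text, section_header=None):
    """Prefix chunk text with the title, plus the section header when present."""
    prefix = f"{title} > {section_header}" if section_header else title
    return f"{prefix}\n\n{chunk_text}"


# Hypothetical, simplified schema: the real `chunks` table has more columns.
SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,      -- raw chunk text, returned to clients
    enriched_text TEXT       -- title-prefixed text, used only for indexing
);
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
"""


def demo_search(query):
    """Index one chunk via enriched_text and return the raw text of FTS matches."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        raw = "Ship the new importer by March."
        conn.execute(
            "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
            (raw, build_enriched_text("Quarterly Roadmap", raw)),
        )
        rows = conn.execute(
            "SELECT chunks.text FROM chunks_fts "
            "JOIN chunks ON chunks.id = chunks_fts.rowid "
            "WHERE chunks_fts MATCH ?",
            (query,),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

In this sketch, `demo_search("roadmap")` finds the chunk even though the word appears only in the title, and the value returned is the unprefixed chunk text, which is the behaviour the Goals section asks for. It assumes the local SQLite build has FTS5 enabled.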