Chunk enrichment: prepend document title to embeddings

Adds an enriched_text column to the chunks table. It stores the chunk
text with the document title (and section header, when present)
prepended. Embeddings and FTS now use the enriched text, which should
improve search relevance. Includes a schema migration that backfills
existing rows.
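
The enrichment and backfill described above could look roughly like the sketch below. It is illustrative only, not the code from this commit: the documents table, the document_id and section columns, the blank-line separator, and both function names are assumptions made for the example. Only chunks, text, and enriched_text appear in the diff.

```python
import sqlite3
from typing import Optional


def build_enriched_text(text: str, title: Optional[str], section: Optional[str] = None) -> str:
    """Prepend the document title (and section header, if any) to the chunk text.

    Hypothetical helper: the separator and argument names are assumptions.
    """
    parts = [p.strip() for p in (title, section) if p and p.strip()]
    parts.append(text)
    return "\n\n".join(parts)


def migrate_add_enriched_text(conn: sqlite3.Connection) -> int:
    """Add chunks.enriched_text if it is missing, then backfill NULL rows.

    Assumes a hypothetical schema: documents(id, title) and
    chunks(id, document_id, section, text). Returns the number of rows backfilled.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    rows = conn.execute(
        "SELECT c.id, c.text, c.section, d.title "
        "FROM chunks c JOIN documents d ON d.id = c.document_id "
        "WHERE c.enriched_text IS NULL"
    ).fetchall()

    conn.executemany(
        "UPDATE chunks SET enriched_text = ? WHERE id = ?",
        [(build_enriched_text(text, title, section), cid) for cid, text, section, title in rows],
    )
    conn.commit()
    return len(rows)
```

Selecting only rows where enriched_text IS NULL keeps the backfill safe to re-run: a second call finds nothing left to update.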

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 21:03:48 +01:00
parent 5f9946efc9
commit b2176c36ea
10 changed files with 278 additions and 21 deletions
+3 -3
@@ -19,10 +19,10 @@ async def reindex():
     conn = get_connection(cfg.db_path)
     try:
-        # Fetch all chunks
-        rows = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()
+        # Fetch all chunks — use enriched_text for embedding (includes title context)
+        rows = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id").fetchall()
         chunk_ids = [row["id"] for row in rows]
-        chunk_texts = [row["text"] for row in rows]
+        chunk_texts = [row["enriched_text"] or "" for row in rows]
         logger.info("Reindexing %d chunks with model '%s'", len(chunk_ids), cfg.model)