Chunk enrichment: prepend document title to embeddings
Adds an enriched_text column to the chunks table that stores each chunk's text prefixed with its document title (and its section header, when present). Embeddings and FTS now use the enriched text to improve search relevance. Includes a schema migration that backfills existing rows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
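A minimal sketch of the enrichment and the backfill step. It assumes SQLite through Python's sqlite3 module and a schema in which chunks(id, document_id, section_header, text) references documents(id, title). The table layout, the build_enriched_text and migrate_add_enriched_text helpers, and the newline separator are all hypothetical; the commit's actual prefix format and migration code may differ.

```python
import sqlite3
from typing import Optional


def build_enriched_text(title: Optional[str], section_header: Optional[str], text: str) -> str:
    """Prefix chunk text with the document title and, when present, the section header.

    Hypothetical format: one context line per non-empty prefix, then the chunk text.
    """
    parts = [p for p in (title, section_header) if p]
    parts.append(text)
    return "\n".join(parts)


def migrate_add_enriched_text(conn: sqlite3.Connection) -> int:
    """Add chunks.enriched_text if it is missing, then backfill existing rows.

    Assumes hypothetical tables chunks(id, document_id, section_header, text)
    and documents(id, title). Returns the number of rows backfilled.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    # Only touch rows that have not been enriched yet, so re-running is a no-op.
    rows = conn.execute(
        """
        SELECT c.id, d.title, c.section_header, c.text
        FROM chunks AS c
        LEFT JOIN documents AS d ON d.id = c.document_id
        WHERE c.enriched_text IS NULL
        """
    ).fetchall()
    conn.executemany(
        "UPDATE chunks SET enriched_text = ? WHERE id = ?",
        [
            (build_enriched_text(title, header, text), chunk_id)
            for chunk_id, title, header, text in rows
        ],
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER,
                             section_header TEXT, text TEXT);
        INSERT INTO documents VALUES (1, 'Install Guide');
        INSERT INTO chunks VALUES (1, 1, 'Prerequisites', 'Python 3.10+ is required.');
        INSERT INTO chunks VALUES (2, 1, NULL, 'Run the installer.');
        """
    )
    print(migrate_add_enriched_text(demo))  # rows backfilled on first run
```

Filtering on enriched_text IS NULL keeps the backfill idempotent: a second run finds nothing to update. Computing the prefix in Python rather than with SQL string concatenation keeps the "skip empty section header" rule in one place, shared with whatever code enriches newly ingested chunks.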
```diff
@@ -19,10 +19,10 @@ async def reindex():
     conn = get_connection(cfg.db_path)
     try:
-        # Fetch all chunks
-        rows = conn.execute("SELECT id, text FROM chunks ORDER BY id").fetchall()
+        # Fetch all chunks — use enriched_text for embedding (includes title context)
+        rows = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id").fetchall()
         chunk_ids = [row["id"] for row in rows]
-        chunk_texts = [row["text"] for row in rows]
+        chunk_texts = [row["enriched_text"] or "" for row in rows]
 
         logger.info("Reindexing %d chunks with model '%s'", len(chunk_ids), cfg.model)
```
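In the new reindex code, a NULL enriched_text becomes an empty string, so any row the backfill missed would be embedded as empty text. If falling back to the raw chunk text is preferred, SQL's COALESCE can handle that in the query itself. A minimal sketch, assuming the same SQLite chunks table; load_chunk_texts is a hypothetical helper, not part of this commit.

```python
import sqlite3
from typing import List, Tuple


def load_chunk_texts(conn: sqlite3.Connection) -> Tuple[List[int], List[str]]:
    """Return chunk ids and the text to embed, ordered by id.

    Hypothetical variant of the reindex query: prefer enriched_text, fall back
    to the raw text, and use '' only when both columns are NULL.
    """
    rows = conn.execute(
        "SELECT id, COALESCE(enriched_text, text, '') FROM chunks ORDER BY id"
    ).fetchall()
    chunk_ids = [row[0] for row in rows]
    chunk_texts = [row[1] for row in rows]
    return chunk_ids, chunk_texts
```

Indexing rows positionally (row[0], row[1]) means the helper works with or without a sqlite3.Row row factory.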