kb/openspec/changes/archive/2026-03-29-kb-title-in-chunks/tasks.md
2026-03-30 07:25:22 +01:00

1. Schema Migration

  • 1.1 Add an enriched_text TEXT column to the chunks table in database.py:init_schema, with a migration check so existing databases also gain the column
  • 1.2 Write the backfill query: UPDATE chunks SET enriched_text = ... FROM documents, joining each chunk to its document's title and parsing the chunk metadata for section_header
  • 1.3 Drop and recreate the chunks_fts virtual table so it indexes enriched_text instead of text
  • 1.4 Update the FTS sync triggers (chunks_ai, chunks_ad, chunks_au) to use enriched_text

2. Enrichment Helper

  • 2.1 Create a build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str helper in worker.py (or a shared util) that returns "{title} > {section_header}\n\n{chunk_text}" when the chunk metadata has a section_header, and "{title}\n\n{chunk_text}" otherwise
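A sketch of the helper, following the two format strings in task 2.1. Treating an empty `section_header` the same as a missing one is an assumption; the task does not say how empty headers should be handled.

```python
from __future__ import annotations  # allows `dict | None` on Python < 3.10


def build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str:
    # Use "{title} > {section_header}" when metadata carries a non-empty header,
    # otherwise just the title; a blank line then separates it from the chunk.
    section_header = (metadata or {}).get("section_header")
    prefix = f"{title} > {section_header}" if section_header else title
    return f"{prefix}\n\n{chunk_text}"
```

For example, a chunk under a "Pods" heading in a note titled "Kubernetes Notes" becomes `"Kubernetes Notes > Pods\n\n"` followed by the chunk text, while a chunk with no metadata gets only the title line.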

3. Ingestion Pipeline

  • 3.1 Update worker._process_job() to build enriched text for each chunk after chunking
  • 3.2 Pass enriched text to embed_texts() instead of raw chunk text
  • 3.3 Pass enriched text to database.insert_chunk() as the new enriched_text parameter
  • 3.4 Update database.insert_chunk() to accept and store enriched_text
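The steps above can be sketched as one function that a job handler such as `worker._process_job()` might call after chunking. The name `store_document_chunks`, the chunk dict shape (`{"text": ..., "metadata": ...}`), the `embedding` column stored as JSON, and the `insert_chunk` signature are all hypothetical. `embed_texts` is passed in as a callable so the sketch does not depend on a particular embedding model.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable


def build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str:
    # Task 2.1 format: "{title} > {section_header}" or "{title}", a blank line, the chunk.
    header = (metadata or {}).get("section_header")
    prefix = f"{title} > {header}" if header else title
    return f"{prefix}\n\n{chunk_text}"


def insert_chunk(conn: sqlite3.Connection, document_id: int, text: str,
                 metadata: dict | None, embedding: list[float], enriched_text: str) -> int:
    # Hypothetical signature for task 3.4: raw text and enriched_text stored side by side.
    cur = conn.execute(
        "INSERT INTO chunks (document_id, text, metadata, embedding, enriched_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (document_id, text, json.dumps(metadata) if metadata is not None else None,
         json.dumps(embedding), enriched_text),
    )
    return cur.lastrowid


def store_document_chunks(conn: sqlite3.Connection, document_id: int, title: str,
                          chunks: list[dict],
                          embed_texts: Callable[[list[str]], list[list[float]]]) -> list[int]:
    # Task 3.1: build enriched text for every chunk right after chunking.
    enriched = [build_enriched_text(title, c["text"], c.get("metadata")) for c in chunks]
    # Task 3.2: the embedding model sees the enriched text, not the raw chunk.
    vectors = embed_texts(enriched)
    if len(vectors) != len(chunks):
        raise ValueError("embed_texts must return one vector per input text")
    ids = []
    for chunk, enriched_text, vector in zip(chunks, enriched, vectors):
        # Task 3.3: raw text goes to `text`, enriched text to `enriched_text`.
        ids.append(insert_chunk(conn, document_id, chunk["text"], chunk.get("metadata"),
                                vector, enriched_text))
    conn.commit()
    return ids
```

Keeping the raw `text` column untouched is what lets search results stay free of the title prefix, while both the vector and keyword indexes see the enriched form.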

4. Reindex

  • 4.1 Update routes/reindex.py to read enriched_text from chunks table and embed that instead of text
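A sketch of the reindex loop for task 4.1, assuming embeddings live in a hypothetical `chunks.embedding` column as JSON and that `embed_texts` is injected as a callable. `reindex_embeddings` is an illustrative name, and falling back to raw `text` when `enriched_text` is still NULL is an assumption of this sketch rather than part of the task.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable


def reindex_embeddings(conn: sqlite3.Connection,
                       embed_texts: Callable[[list[str]], list[list[float]]],
                       batch_size: int = 64) -> int:
    """Re-embed every chunk from enriched_text; returns the number of chunks updated."""
    # COALESCE is an assumed fallback for rows the backfill left NULL.
    rows = conn.execute(
        "SELECT id, COALESCE(enriched_text, text) FROM chunks ORDER BY id"
    ).fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_texts([text for _, text in batch])
        if len(vectors) != len(batch):
            raise ValueError("embed_texts must return one vector per input text")
        conn.executemany(
            "UPDATE chunks SET embedding = ? WHERE id = ?",
            [(json.dumps(vec), chunk_id) for (chunk_id, _), vec in zip(batch, vectors)],
        )
    conn.commit()
    return len(rows)
```

If the `chunks_au` trigger from task 1.4 is installed, each `UPDATE` here also re-syncs that row in `chunks_fts`. That is harmless but adds work; a `CREATE TRIGGER ... AFTER UPDATE OF text, enriched_text` variant would skip embedding-only updates.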

5. Search Results

  • 5.1 Verify that search.py:_enrich() returns chunks.text (raw), not enriched_text. No change is expected, but confirm it
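One query shape that satisfies task 5.1, assuming the keyword path joins FTS hits back to `chunks`: match against `chunks_fts`, which now indexes `enriched_text`, but select `chunks.text`. `search_chunks` is a hypothetical name; the real `search.py` may also merge vector hits before `_enrich()` runs.

```python
import sqlite3


def search_chunks(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[dict]:
    """Hypothetical keyword search: match the enriched index, return raw chunk text."""
    rows = conn.execute(
        """
        SELECT c.id, c.text
        FROM chunks_fts
        JOIN chunks AS c ON c.id = chunks_fts.rowid
        WHERE chunks_fts MATCH ?
        ORDER BY bm25(chunks_fts)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
    # Select c.text, never c.enriched_text, so callers do not see the title prefix.
    return [{"id": chunk_id, "text": text} for chunk_id, text in rows]
```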

6. Testing

  • 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
  • 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
  • 6.3 Test: verify search result text field does not contain the prepended title
  • 6.4 Test: run reindex, verify enriched text is used for new embeddings
  • 6.5 Test: verify schema migration backfills enriched_text for pre-existing chunks on startup
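Tasks 6.1 to 6.3 can be sketched as pytest-style tests against a throwaway in-memory index. `make_index` and `search` are stand-ins written only for this sketch; the real tests would go through the worker's ingestion path and `search.py`. Tasks 6.4 and 6.5 are not covered here.

```python
import sqlite3


def make_index(docs):
    """Hypothetical fixture: `docs` is a list of (title, section_header or None, text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, enriched_text TEXT)")
    conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5("
                 "enriched_text, content='chunks', content_rowid='id')")
    for title, header, text in docs:
        prefix = f"{title} > {header}" if header else title
        conn.execute("INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
                     (text, f"{prefix}\n\n{text}"))
    conn.execute("INSERT INTO chunks_fts(chunks_fts) VALUES ('rebuild')")
    return conn


def search(conn, query):
    # Match on enriched_text, return raw text, as search.py is expected to do.
    sql = ("SELECT c.text FROM chunks_fts JOIN chunks AS c ON c.id = chunks_fts.rowid "
           "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts)")
    return [row[0] for row in conn.execute(sql, (query,))]


def test_short_note_found_by_title():  # task 6.1
    conn = make_index([("Dentist Appointment Reminder", None, "Tuesday at 3pm.")])
    assert search(conn, "dentist") == ["Tuesday at 3pm."]


def test_markdown_chunk_found_by_section_header():  # task 6.2
    conn = make_index([("Setup Guide", "Installation", "Run the installer."),
                       ("Setup Guide", "Troubleshooting", "Check the logs.")])
    assert search(conn, "troubleshooting") == ["Check the logs."]


def test_result_text_has_no_title_prefix():  # task 6.3
    conn = make_index([("Dentist Appointment Reminder", None, "Tuesday at 3pm.")])
    results = search(conn, "reminder")
    assert results and all("Dentist" not in text for text in results)
```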