bbe6a5e909
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Schema Migration
- 1.1 Add an `enriched_text TEXT` column to the `chunks` table in `database.py:init_schema` (with a migration check for existing DBs)
- 1.2 Write the backfill query: `UPDATE chunks SET enriched_text = ... FROM documents`, joining the document title and parsing chunk metadata for `section_header`
- 1.3 Drop and recreate the `chunks_fts` virtual table to index `enriched_text` instead of `text`
- 1.4 Update the FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
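Tasks 1.1 to 1.4 could be implemented roughly as in the sketch below. It assumes SQLite accessed through Python's `sqlite3` module with FTS5 and the JSON functions compiled in, an external-content FTS5 table, and a hypothetical `documents(id, title)` / `chunks(id, document_id, text, metadata)` layout in which `metadata` holds JSON. `migrate_enriched_text`, `OLD_SCHEMA`, and `FTS_SCHEMA` are illustrative names, not the project's own. The backfill uses a correlated subquery instead of `UPDATE ... FROM`, because the `FROM` form requires SQLite 3.33 or newer. Every step is guarded, so the function can safely run on every startup.

```python
import sqlite3

# Hypothetical pre-migration layout; the real tables are created in database.py.
OLD_SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    text TEXT NOT NULL,
    metadata TEXT  -- assumed JSON, e.g. {"section_header": "Install"}
);
"""

# 1.3 + 1.4: recreate chunks_fts over enriched_text and re-create its sync triggers
# (standard FTS5 external-content pattern).
FTS_SCHEMA = """
DROP TRIGGER IF EXISTS chunks_ai;
DROP TRIGGER IF EXISTS chunks_ad;
DROP TRIGGER IF EXISTS chunks_au;
DROP TABLE IF EXISTS chunks_fts;
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
INSERT INTO chunks_fts(chunks_fts) VALUES ('rebuild');
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text) VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text) VALUES (new.id, new.enriched_text);
END;
"""


def migrate_enriched_text(conn: sqlite3.Connection) -> None:
    """Idempotent migration, intended to be called from init_schema on startup."""
    # 1.1: add the column only when an existing DB predates it.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in cols:
        # Stop the old update trigger from re-syncing the old FTS table during backfill.
        conn.execute("DROP TRIGGER IF EXISTS chunks_au")
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    # 1.2: backfill "{title} > {section_header}\n\n{text}" or "{title}\n\n{text}".
    conn.execute(
        """
        UPDATE chunks SET enriched_text =
            (SELECT d.title FROM documents AS d WHERE d.id = chunks.document_id)
            || COALESCE(' > ' || NULLIF(json_extract(
                   COALESCE(chunks.metadata, '{}'), '$.section_header'), ''), '')
            || char(10) || char(10) || chunks.text
        WHERE enriched_text IS NULL
        """
    )
    conn.commit()

    # 1.3/1.4: rebuild the FTS table only if it does not already index enriched_text.
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'chunks_fts'"
    ).fetchone()
    if row is None or "enriched_text" not in row[0]:
        conn.executescript(FTS_SCHEMA)
```

Detecting `enriched_text` in the stored FTS definition is one way to ensure the drop-and-recreate happens only once. A `PRAGMA user_version` counter would work equally well.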
2. Enrichment Helper
- 2.1 Create a `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"`, or `"{title}\n\n{chunk_text}"` when the chunk has no section header
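One possible shape for the helper is shown below. It assumes `metadata` is a plain dict that stores the header under the `section_header` key, the key named in task 1.2. If the dict is missing, the key is absent, or the header is empty, the helper falls back to the title-only format.

```python
from __future__ import annotations  # lets the `dict | None` hint work before Python 3.10


def build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str:
    """Prefix a chunk with its document title and, when known, its section header."""
    # A missing metadata dict, a missing key, or an empty header all fall back
    # to the title-only format.
    header = (metadata or {}).get("section_header")
    if header:
        return f"{title} > {header}\n\n{chunk_text}"
    return f"{title}\n\n{chunk_text}"


# Example calls:
build_enriched_text("Deploy Guide", "Run make.", {"section_header": "Build"})
# -> 'Deploy Guide > Build\n\nRun make.'
build_enriched_text("Shopping", "Buy oat milk.", None)
# -> 'Shopping\n\nBuy oat milk.'
```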
3. Ingestion Pipeline
- 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
- 3.2 Pass the enriched text to `embed_texts()` instead of the raw chunk text
- 3.3 Pass the enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
- 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
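The ingestion changes in tasks 3.1 to 3.4 can be sketched as follows. This is not the real `worker.py` or `database.py`. `process_chunks` is a hypothetical stand-in for the part of `_process_job()` that runs after chunking. The sketch also assumes that chunks arrive as `(text, metadata)` pairs, that `embed_texts` is injected as a callable, and that embeddings are stored as JSON text in an `embedding` column, which is purely an assumption to keep the example self-contained. The helper is repeated so the block runs on its own.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable

# Hypothetical chunks table; storing embeddings as JSON text is for illustration only.
SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL,
    text TEXT NOT NULL,
    enriched_text TEXT,
    metadata TEXT,
    embedding TEXT
);
"""


def build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str:
    # Same format as task 2.1.
    header = (metadata or {}).get("section_header")
    return f"{title} > {header}\n\n{chunk_text}" if header else f"{title}\n\n{chunk_text}"


def insert_chunk(conn: sqlite3.Connection, document_id: int, text: str,
                 metadata: dict | None, embedding: list[float], enriched_text: str) -> int:
    """3.4: store the raw text and the enriched text side by side."""
    cur = conn.execute(
        "INSERT INTO chunks (document_id, text, enriched_text, metadata, embedding)"
        " VALUES (?, ?, ?, ?, ?)",
        (document_id, text, enriched_text,
         json.dumps(metadata) if metadata is not None else None,
         json.dumps(embedding)),
    )
    return cur.lastrowid


def process_chunks(conn: sqlite3.Connection, document_id: int, title: str,
                   chunks: list[tuple[str, dict | None]],
                   embed_texts: Callable[[list[str]], list[list[float]]]) -> list[int]:
    """Hypothetical slice of worker._process_job() that runs after chunking."""
    # 3.1: build enriched text for every chunk.
    enriched = [build_enriched_text(title, text, meta) for text, meta in chunks]
    # 3.2: embed the enriched text, not the raw chunk text.
    vectors = embed_texts(enriched)
    # 3.3: pass the enriched text through to the insert.
    ids = [
        insert_chunk(conn, document_id, text, meta, vec, enriched_text=enr)
        for (text, meta), enr, vec in zip(chunks, enriched, vectors)
    ]
    conn.commit()
    return ids
```

Injecting `embed_texts` keeps the embedding model out of the sketch and lets a test assert exactly which strings were embedded.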
4. Reindex
- 4.1 Update `routes/reindex.py` to read `enriched_text` from the `chunks` table and embed it instead of `text`
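Here is a sketch of the reindex loop. It assumes a `chunks` table with a hypothetical JSON `embedding` column and an `embed_texts` callable that returns one vector per input string. `reindex_embeddings` and `batch_size` are illustrative names, not taken from `routes/reindex.py`.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable


def reindex_embeddings(conn: sqlite3.Connection,
                       embed_texts: Callable[[list[str]], list[list[float]]],
                       batch_size: int = 64) -> int:
    """Re-embed every chunk from enriched_text; returns the number of chunks updated."""
    # Read enriched_text, falling back to raw text for any row the backfill missed.
    rows = conn.execute(
        "SELECT id, COALESCE(enriched_text, text) FROM chunks ORDER BY id"
    ).fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_texts([source for _, source in batch])
        conn.executemany(
            "UPDATE chunks SET embedding = ? WHERE id = ?",
            [(json.dumps(vec), chunk_id) for (chunk_id, _), vec in zip(batch, vectors)],
        )
    conn.commit()
    return len(rows)
```

The `COALESCE(enriched_text, text)` fallback is a design choice. Rows the backfill has not yet reached still get an embedding, instead of `NULL` being sent to the embedder. Loading every row up front is fine for a sketch, but a large corpus would call for a paginated read.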
5. Search Results
- 5.1 Verify that `search.py:_enrich()` returns `chunks.text` (raw), not `enriched_text`; no change is expected, but confirm it
6. Testing
- 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
- 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
- 6.3 Test: verify that the search result's `text` field does not contain the prepended title
- 6.4 Test: run `reindex` and verify that enriched text is used for the new embeddings
- 6.5 Test: verify that the schema migration backfills `enriched_text` for pre-existing chunks on startup
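Tests 6.1 to 6.3 might look roughly like the pytest-style sketch below. It does not exercise the project's real ingestion or `search.py` code. Instead, it assumes a stripped-down SQLite schema in which an FTS5 table indexes `enriched_text` while `chunks.text` keeps the raw chunk. `make_db` and `keyword_search` are hypothetical helpers, and ranking is omitted. Tests 6.4 and 6.5 would also need the real reindex route and migration.

```python
from __future__ import annotations

import sqlite3

# Hypothetical minimal schema mirroring the migrated layout.
SCHEMA = """
CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT NOT NULL, enriched_text TEXT);
CREATE VIRTUAL TABLE chunks_fts USING fts5(enriched_text, content='chunks', content_rowid='id');
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text) VALUES (new.id, new.enriched_text);
END;
"""


def make_db(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory DB and insert (text, enriched_text) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO chunks (text, enriched_text) VALUES (?, ?)", rows)
    return conn


def keyword_search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Match on enriched_text but return the raw chunk text."""
    return [row[0] for row in conn.execute(
        "SELECT text FROM chunks WHERE id IN"
        " (SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ?) ORDER BY id",
        (query,),
    )]


def test_short_note_found_by_title():  # 6.1 and 6.3
    conn = make_db([("Buy oat milk.", "Grocery List\n\nBuy oat milk.")])
    results = keyword_search(conn, "grocery")
    assert results == ["Buy oat milk."]        # found via a word that is only in the title
    assert "Grocery List" not in results[0]    # prepended title does not leak into results


def test_markdown_chunk_found_by_section_header():  # 6.2
    conn = make_db([
        ("Run make.", "Deploy Guide > Build\n\nRun make."),
        ("Copy the binary.", "Deploy Guide > Release\n\nCopy the binary."),
    ])
    assert keyword_search(conn, "release") == ["Copy the binary."]
```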