## 1. Schema Migration
- [x] 1.1 Add `enriched_text TEXT` column to `chunks` table in `database.py:init_schema` (with migration check for existing DBs)
- [x] 1.2 Write backfill query: `UPDATE chunks SET enriched_text = ... FROM documents`, joining `documents` for the title and parsing chunk metadata for `section_header`
- [x] 1.3 Drop and recreate `chunks_fts` virtual table to index `enriched_text` instead of `text`
- [x] 1.4 Update FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
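
The migration steps above could look roughly like the sketch below. It is not the project's actual `init_schema`: the function name `migrate`, the columns `documents.id`, `chunks.document_id`, and `chunks.metadata` (JSON stored as text), and the external-content FTS5 layout are all assumptions. The backfill is also done row by row in Python here rather than as the single `UPDATE ... FROM` statement named in 1.2, so the sketch does not depend on a recent SQLite version.

```python
import json
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Add and backfill chunks.enriched_text, then index it in chunks_fts."""
    # Drop the old FTS objects first so the backfill UPDATE below does not
    # fire triggers that still point at the old index.
    for trigger in ("chunks_ai", "chunks_ad", "chunks_au"):
        conn.execute(f"DROP TRIGGER IF EXISTS {trigger}")
    conn.execute("DROP TABLE IF EXISTS chunks_fts")

    # Migration check: only add the column when an existing DB lacks it.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    # Backfill rows that predate the column (assumed schema, see lead-in).
    rows = conn.execute(
        "SELECT c.id, d.title, c.text, c.metadata FROM chunks c "
        "JOIN documents d ON d.id = c.document_id "
        "WHERE c.enriched_text IS NULL"
    ).fetchall()
    for chunk_id, title, text, metadata in rows:
        meta = (json.loads(metadata) if metadata else None) or {}
        header = meta.get("section_header")
        prefix = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{prefix}\n\n{text}", chunk_id),
        )

    # Recreate the index and sync triggers on enriched_text, then rebuild
    # the index from the rows already in chunks.
    conn.executescript(
        """
        CREATE VIRTUAL TABLE chunks_fts USING fts5(
            enriched_text, content='chunks', content_rowid='id'
        );
        CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
            INSERT INTO chunks_fts(rowid, enriched_text)
            VALUES (new.id, new.enriched_text);
        END;
        CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
            INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
            VALUES ('delete', old.id, old.enriched_text);
        END;
        CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
            INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
            VALUES ('delete', old.id, old.enriched_text);
            INSERT INTO chunks_fts(rowid, enriched_text)
            VALUES (new.id, new.enriched_text);
        END;
        INSERT INTO chunks_fts(chunks_fts) VALUES ('rebuild');
        """
    )
    conn.commit()
```

Because the triggers and the virtual table are dropped and recreated on every call, running the sketch a second time on an already-migrated database is harmless; it only costs an index rebuild.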
## 2. Enrichment Helper
- [x] 2.1 Create `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"` or `"{title}\n\n{chunk_text}"`
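
A minimal sketch of the helper described in 2.1, assuming the section header lives under a `section_header` key in the chunk metadata (the key named in task 1.2). `Optional[dict]` is used in place of `dict | None` only so the sketch also runs on older Python versions; the two annotations mean the same thing.

```python
from typing import Optional


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    """Prefix a chunk with its document title and, if known, its section header."""
    # Treat missing metadata, a missing key, and an empty header the same way.
    header = (metadata or {}).get("section_header")
    if header:
        return f"{title} > {header}\n\n{chunk_text}"
    return f"{title}\n\n{chunk_text}"
```

For example, `build_enriched_text("Garden Notes", "Water weekly.", {"section_header": "Tomatoes"})` yields `"Garden Notes > Tomatoes\n\nWater weekly."`, while passing `None` as the metadata drops the ` > ...` part.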
## 3. Ingestion Pipeline
- [x] 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
- [x] 3.2 Pass enriched text to `embed_texts()` instead of raw chunk text
- [x] 3.3 Pass enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
- [x] 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
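
The data flow in 3.1 to 3.4 can be sketched as follows. Everything here is illustrative: `process_chunks`, the inline `enrich` stand-in for `build_enriched_text`, and the keyword arguments passed to `insert_chunk` are hypothetical, and the real `embed_texts()` and `database.insert_chunk()` signatures may differ. The point is only that the enriched string goes to the embedder while both the raw and the enriched text are stored.

```python
from typing import Callable, Optional, Sequence


def enrich(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    # Stand-in for build_enriched_text (assumed format from task 2.1).
    header = (metadata or {}).get("section_header")
    prefix = f"{title} > {header}" if header else title
    return f"{prefix}\n\n{chunk_text}"


def process_chunks(
    title: str,
    chunks: Sequence[tuple],  # (chunk_text, metadata) pairs from the chunker
    embed_texts: Callable[[list], list],  # hypothetical: list of texts -> list of vectors
    insert_chunk: Callable[..., None],  # hypothetical keyword-argument interface
) -> None:
    # 3.1: build enriched text for each chunk after chunking.
    enriched = [enrich(title, text, meta) for text, meta in chunks]
    # 3.2: embed the enriched text, not the raw chunk text.
    vectors = embed_texts(enriched)
    # 3.3 / 3.4: store the raw text and the enriched text side by side.
    for (text, meta), enriched_text, vector in zip(chunks, enriched, vectors):
        insert_chunk(
            text=text,
            enriched_text=enriched_text,
            embedding=vector,
            metadata=meta,
        )
```

Keeping the raw `text` alongside `enriched_text` is what lets search results return the original chunk without the prepended title.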
## 4. Reindex
- [x] 4.1 Update `routes/reindex.py` to read `enriched_text` from chunks table and embed that instead of `text`
## 5. Search Results
- [x] 5.1 Verify `search.py:_enrich()` returns `chunks.text` (raw), not `enriched_text`; no change expected, but confirm
## 6. Testing
- [x] 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
- [x] 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
- [x] 6.3 Test: verify search result `text` field does not contain the prepended title
- [x] 6.4 Test: run `reindex`, verify enriched text is used for new embeddings
- [x] 6.5 Test: verify schema migration backfills `enriched_text` for pre-existing chunks on startup
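
Tests 6.1 and 6.3 could be approximated with an in-memory SQLite database, as in the sketch below. The `search` function is a hypothetical stand-in for the project's search path, and the two-column `chunks` table is reduced to what the test needs; only the table, column, and trigger names come from the tasks above.

```python
import sqlite3


def search(conn: sqlite3.Connection, query: str) -> list:
    """Match against the enriched index, but return the raw chunk text."""
    rows = conn.execute(
        "SELECT chunks.text FROM chunks_fts "
        "JOIN chunks ON chunks.id = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ? ORDER BY chunks.id",
        (query,),
    )
    return [row[0] for row in rows]


def test_title_search_returns_raw_text() -> None:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY, text TEXT, enriched_text TEXT
        );
        CREATE VIRTUAL TABLE chunks_fts USING fts5(
            enriched_text, content='chunks', content_rowid='id'
        );
        CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
            INSERT INTO chunks_fts(rowid, enriched_text)
            VALUES (new.id, new.enriched_text);
        END;
        """
    )
    conn.execute(
        "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
        ("Water weekly.", "Tomato Care Guide\n\nWater weekly."),
    )
    results = search(conn, "tomato")
    # 6.1: the chunk is found through a keyword that only appears in the title.
    assert results == ["Water weekly."]
    # 6.3: the returned text does not contain the prepended title.
    assert "Tomato Care Guide" not in results[0]
```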