kb/openspec/changes/archive/2026-03-29-kb-title-in-chunks/tasks.md
2026-03-30 07:25:22 +01:00


## 1. Schema Migration
- [x] 1.1 Add `enriched_text TEXT` column to `chunks` table in `database.py:init_schema` (with migration check for existing DBs)
- [x] 1.2 Write backfill query: `UPDATE chunks SET enriched_text = ... FROM documents` that joins each chunk to its document title and parses the chunk metadata for `section_header`
- [x] 1.3 Drop and recreate `chunks_fts` virtual table to index `enriched_text` instead of `text`
- [x] 1.4 Update FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
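
The migration check and backfill in 1.1 and 1.2 could look roughly like the sketch below. It is illustrative only: the `documents(id, title)` and `chunks(id, document_id, text, metadata)` layouts and the function name `ensure_enriched_text` are assumptions, not taken from `database.py`. The backfill is done with a Python loop rather than the single `UPDATE ... FROM` statement named in 1.2, so the sketch does not depend on a particular SQLite version.

```python
import json
import sqlite3


def ensure_enriched_text(conn: sqlite3.Connection) -> int:
    """Add chunks.enriched_text if missing, then backfill rows where it is NULL.

    Returns the number of rows backfilled. Table layouts are assumed.
    """
    # Migration check: PRAGMA table_info yields one row per column; index 1 is the name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    # Join each chunk to its document title; only rows not yet enriched.
    rows = conn.execute(
        "SELECT c.id, c.text, c.metadata, d.title "
        "FROM chunks AS c JOIN documents AS d ON d.id = c.document_id "
        "WHERE c.enriched_text IS NULL"
    ).fetchall()

    for chunk_id, text, metadata, title in rows:
        # metadata is assumed to be a JSON string (or NULL) holding section_header.
        header = (json.loads(metadata) if metadata else {}).get("section_header")
        prefix = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{prefix}\n\n{text}", chunk_id),
        )
    conn.commit()
    return len(rows)
```

Because the backfill only touches rows where `enriched_text IS NULL`, calling it on every startup is harmless once the data has been migrated.
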
## 2. Enrichment Helper
- [x] 2.1 Create `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"` or `"{title}\n\n{chunk_text}"`
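
A minimal sketch of the helper described in 2.1, assuming the section header lives under a `section_header` key in the metadata dict (the key name comes from task 1.2; treating an empty or missing value as "no header" is an assumption):

```python
from typing import Optional


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    """Prefix a chunk with its document title and, when present, its section header."""
    # Optional[dict] is equivalent to the `dict | None` annotation in the task text.
    section_header = (metadata or {}).get("section_header")
    if section_header:
        return f"{title} > {section_header}\n\n{chunk_text}"
    return f"{title}\n\n{chunk_text}"
```
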
## 3. Ingestion Pipeline
- [x] 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
- [x] 3.2 Pass enriched text to `embed_texts()` instead of raw chunk text
- [x] 3.3 Pass enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
- [x] 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
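
The ingestion steps above can be sketched as follows. This is not the real `worker._process_job()`: the function `process_chunks`, the chunk dict shape, and the keyword arguments passed to `insert_chunk` are all hypothetical, and `embed_texts` and `insert_chunk` are passed in as callables so the sketch stays self-contained.

```python
from typing import Callable, List, Optional


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    """Same formatting rule as task 2.1."""
    header = (metadata or {}).get("section_header")
    return f"{title} > {header}\n\n{chunk_text}" if header else f"{title}\n\n{chunk_text}"


def process_chunks(
    title: str,
    chunks: List[dict],
    embed_texts: Callable[[List[str]], List[List[float]]],
    insert_chunk: Callable[..., None],
) -> int:
    """Enrich, embed, and store chunks; returns the number of chunks stored."""
    # 3.1: build enriched text for each chunk after chunking.
    enriched = [build_enriched_text(title, c["text"], c.get("metadata")) for c in chunks]
    # 3.2: embed the enriched text, not the raw chunk text.
    vectors = embed_texts(enriched)
    # 3.3: store raw text and enriched text side by side.
    for chunk, enriched_text, vector in zip(chunks, enriched, vectors):
        insert_chunk(
            text=chunk["text"],
            enriched_text=enriched_text,
            embedding=vector,
            metadata=chunk.get("metadata"),
        )
    return len(chunks)
```

Keeping the raw `text` alongside `enriched_text` is what lets search results show the original chunk while retrieval benefits from the title.
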
## 4. Reindex
- [x] 4.1 Update `routes/reindex.py` to read `enriched_text` from chunks table and embed that instead of `text`
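
One way the reindex read path might look, as a sketch under assumptions: the function name `reindex_embeddings`, the batching, and the returned mapping are hypothetical, and the real `routes/reindex.py` presumably writes vectors to its own store instead of returning them.

```python
import sqlite3
from typing import Callable, Dict, List


def reindex_embeddings(
    conn: sqlite3.Connection,
    embed_texts: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> Dict[int, List[float]]:
    """Re-embed every chunk from enriched_text; returns {chunk_id: vector}."""
    # Read enriched_text rather than text so new embeddings include the title.
    rows = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id").fetchall()
    vectors: Dict[int, List[float]] = {}
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        embedded = embed_texts([enriched for _, enriched in batch])
        for (chunk_id, _), vector in zip(batch, embedded):
            vectors[chunk_id] = vector
    return vectors
```
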
## 5. Search Results
- [x] 5.1 Verify `search.py:_enrich()` returns `chunks.text` (raw), not `enriched_text`; no change is expected, but confirm
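
The property being verified in 5.1 can be illustrated with a small stand-in for `_enrich()`. The function `enrich_hits`, the `(chunk_id, score)` hit shape, and the result dict keys are assumptions made for this sketch; only the choice of the `text` column reflects the task.

```python
import sqlite3
from typing import List, Tuple


def enrich_hits(conn: sqlite3.Connection, hits: List[Tuple[int, float]]) -> List[dict]:
    """Attach the raw chunk text to each (chunk_id, score) hit."""
    results = []
    for chunk_id, score in hits:
        # Select `text`, not `enriched_text`, so the title prefix never reaches the caller.
        row = conn.execute("SELECT text FROM chunks WHERE id = ?", (chunk_id,)).fetchone()
        if row is not None:
            results.append({"chunk_id": chunk_id, "score": score, "text": row[0]})
    return results
```
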
## 6. Testing
- [x] 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
- [x] 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
- [x] 6.3 Test: verify search result `text` field does not contain the prepended title
- [x] 6.4 Test: run `reindex`, verify enriched text is used for new embeddings
- [x] 6.5 Test: verify schema migration backfills `enriched_text` for pre-existing chunks on startup
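
Tests 6.1 and 6.3 could be exercised along these lines. This is a sketch, not the project's test suite: it assumes an FTS5 external-content table over `chunks` with only the insert trigger `chunks_ai` defined, a simplified `chunks` schema, and a hypothetical `search` helper. It also requires a Python build whose SQLite includes FTS5.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """In-memory DB with a chunks table indexed through FTS5 on enriched_text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            enriched_text TEXT
        );
        CREATE VIRTUAL TABLE chunks_fts USING fts5(
            enriched_text, content='chunks', content_rowid='id'
        );
        CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
            INSERT INTO chunks_fts(rowid, enriched_text)
            VALUES (new.id, new.enriched_text);
        END;
        """
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Match against enriched_text, but return the raw chunk text."""
    rows = conn.execute(
        "SELECT c.text FROM chunks_fts "
        "JOIN chunks AS c ON c.id = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ?",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


def test_title_keywords_find_short_note() -> None:
    conn = make_db()
    raw = "Rotate the keys every 90 days."
    conn.execute(
        "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
        (raw, f"Credential Policy\n\n{raw}"),
    )
    results = search(conn, "credential")
    # 6.1: the note is found by a keyword that appears only in its title.
    assert results == [raw]
    # 6.3: the returned text does not contain the prepended title.
    assert "Credential Policy" not in results[0]
```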