Add dev-up script and archive kb-title-in-chunks change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,33 @@
## 1. Schema Migration
- [x] 1.1 Add `enriched_text TEXT` column to `chunks` table in `database.py:init_schema` (with migration check for existing DBs)
- [x] 1.2 Write backfill query: `UPDATE chunks SET enriched_text = ... FROM documents` joining title and parsing chunk metadata for section_header
- [x] 1.3 Drop and recreate `chunks_fts` virtual table to index `enriched_text` instead of `text`
- [x] 1.4 Update FTS sync triggers (`chunks_ai`, `chunks_ad`, `chunks_au`) to use `enriched_text`
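
The four migration steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `database.py`: it assumes a `documents(id, title)` table, a `chunks(id, document_id, text, metadata)` table whose `metadata` column holds JSON, and an FTS5 external-content index. The function name `migrate_enriched_text` is hypothetical. The backfill is done row by row in Python rather than with a single `UPDATE ... FROM` statement, so the sketch does not depend on the SQLite version.

```python
import json
import sqlite3

# Assumed FTS5 external-content layout; the real schema may differ.
FTS_SCHEMA = """
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
INSERT INTO chunks_fts(chunks_fts) VALUES ('rebuild');
"""


def migrate_enriched_text(conn: sqlite3.Connection) -> None:
    """Add chunks.enriched_text if missing, backfill it, rebuild the index."""
    # Drop the old index and triggers first so the backfill does not fire them.
    for trigger in ("chunks_ai", "chunks_ad", "chunks_au"):
        conn.execute(f"DROP TRIGGER IF EXISTS {trigger}")
    conn.execute("DROP TABLE IF EXISTS chunks_fts")

    # Migration check: only ALTER when the column is absent (existing DBs).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN enriched_text TEXT")

    # Backfill rows that predate the column.
    rows = conn.execute(
        "SELECT c.id, c.text, c.metadata, d.title "
        "FROM chunks c JOIN documents d ON d.id = c.document_id "
        "WHERE c.enriched_text IS NULL"
    ).fetchall()
    for chunk_id, text, metadata, title in rows:
        meta = json.loads(metadata) if metadata else {}
        header = meta.get("section_header") if isinstance(meta, dict) else None
        prefix = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{prefix}\n\n{text}", chunk_id),
        )

    # Recreate the index and sync triggers against enriched_text.
    conn.executescript(FTS_SCHEMA)
    conn.commit()
```

Because the column check and the `IS NULL` filter make each step a no-op on an already migrated database, a function shaped like this could run on every startup.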
## 2. Enrichment Helper
- [x] 2.1 Create `build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str` helper function in `worker.py` (or a shared util) that formats `"{title} > {section_header}\n\n{chunk_text}"` or `"{title}\n\n{chunk_text}"`
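
A minimal sketch of the helper described in 2.1, assuming the section header lives under a `section_header` key in the chunk metadata (as the backfill task in 1.2 suggests). `Optional[dict]` is used here in place of `dict | None` only so the sketch also runs on older Python versions.

```python
from typing import Optional


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    """Prefix a chunk with its document title and, if present, its section header."""
    # Missing metadata and a missing or empty header both fall back to title only.
    header = (metadata or {}).get("section_header")
    prefix = f"{title} > {header}" if header else title
    return f"{prefix}\n\n{chunk_text}"
```

For example, `build_enriched_text("Deploy Guide", "Run the script.", {"section_header": "Setup"})` yields `"Deploy Guide > Setup\n\nRun the script."`, while passing `None` as metadata yields `"Deploy Guide\n\nRun the script."`.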
## 3. Ingestion Pipeline
- [x] 3.1 Update `worker._process_job()` to build enriched text for each chunk after chunking
- [x] 3.2 Pass enriched text to `embed_texts()` instead of raw chunk text
- [x] 3.3 Pass enriched text to `database.insert_chunk()` as the new `enriched_text` parameter
- [x] 3.4 Update `database.insert_chunk()` to accept and store `enriched_text`
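
The data flow in 3.1 to 3.4 might look like the sketch below. The real signatures of `embed_texts()` and `database.insert_chunk()` are not shown in this change, so both are passed in as callables with assumed shapes, and `index_chunks` is a hypothetical stand-in for the relevant part of `worker._process_job()`. The point is only that the enriched text goes to the embedder and to the new column, while the raw text is still stored unchanged.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Chunk = Tuple[str, Optional[dict]]  # (raw chunk text, chunk metadata)


def build_enriched_text(title: str, chunk_text: str, metadata: Optional[dict]) -> str:
    header = (metadata or {}).get("section_header")
    prefix = f"{title} > {header}" if header else title
    return f"{prefix}\n\n{chunk_text}"


def index_chunks(
    title: str,
    chunks: Sequence[Chunk],
    embed_texts: Callable[[List[str]], List[List[float]]],  # assumed shape
    insert_chunk: Callable[..., None],  # assumed keyword-style interface
) -> int:
    """Embed and store chunks using title-enriched text; returns the chunk count."""
    # 3.1: build enriched text for each chunk after chunking.
    enriched = [build_enriched_text(title, text, meta) for text, meta in chunks]
    # 3.2: embed the enriched text instead of the raw chunk text.
    vectors = embed_texts(enriched)
    # 3.3 / 3.4: store raw and enriched text side by side.
    for (text, meta), enriched_text, vector in zip(chunks, enriched, vectors):
        insert_chunk(
            text=text,
            enriched_text=enriched_text,
            embedding=vector,
            metadata=meta,
        )
    return len(enriched)
```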
## 4. Reindex
- [x] 4.1 Update `routes/reindex.py` to read `enriched_text` from chunks table and embed that instead of `text`
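
As a rough illustration of 4.1, a reindex loop could read `enriched_text` in batches and hand it to the embedder. The function name `reindex_embeddings`, the `store_embedding` callback, and the batch size are all hypothetical; the actual handler in `routes/reindex.py` is not shown in this change.

```python
import sqlite3
from typing import Callable, List


def reindex_embeddings(
    conn: sqlite3.Connection,
    embed_texts: Callable[[List[str]], List[List[float]]],  # assumed shape
    store_embedding: Callable[[int, List[float]], None],  # hypothetical sink
    batch_size: int = 64,
) -> int:
    """Re-embed every chunk from enriched_text; returns the number processed."""
    cursor = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id")
    processed = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        ids = [row[0] for row in batch]
        texts = [row[1] for row in batch]
        # Embed the enriched text, not chunks.text, so vectors match ingestion.
        for chunk_id, vector in zip(ids, embed_texts(texts)):
            store_embedding(chunk_id, vector)
        processed += len(batch)
    return processed
```

This assumes the schema migration has already backfilled `enriched_text`, so no row reaches the embedder with a `NULL` value.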
## 5. Search Results
- [x] 5.1 Verify `search.py:_enrich()` returns `chunks.text` (raw) not `enriched_text` — no change expected, but confirm
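
The property being verified in 5.1 is that matching happens against the enriched index while the returned text stays raw. A sketch of a query with that shape follows, assuming an FTS5 index named `chunks_fts` whose rowid maps to `chunks.id`. The function `search_chunks` and the result keys are illustrative and are not the contents of `search.py:_enrich()`.

```python
import sqlite3
from typing import Dict, List


def search_chunks(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[Dict]:
    """Match on the enriched FTS index but return the raw chunk text."""
    rows = conn.execute(
        "SELECT c.id, c.text "  # chunks.text, deliberately not enriched_text
        "FROM chunks_fts "
        "JOIN chunks c ON c.id = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ? "
        "ORDER BY rank "
        "LIMIT ?",
        (query, limit),
    ).fetchall()
    return [{"chunk_id": chunk_id, "text": text} for chunk_id, text in rows]
```

With this shape, a chunk can be found by a keyword that appears only in its document title, yet the `text` field shown to the user does not carry the prepended title.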
## 6. Testing
- [x] 6.1 Test: ingest a short note with a descriptive title, search by title keywords, confirm it is found
- [x] 6.2 Test: ingest a markdown doc, search by section header, confirm chunks are found
- [x] 6.3 Test: verify search result `text` field does not contain the prepended title
- [x] 6.4 Test: run `reindex`, verify enriched text is used for new embeddings
- [x] 6.5 Test: verify schema migration backfills enriched_text for pre-existing chunks on startup