kb/openspec/changes/archive/2026-03-29-kb-title-in-chunks/tasks.md at c5191df9c03f94483208f2acc81eb08ad5a7c203

steve bbe6a5e909 Add dev-up script and archive kb-title-in-chunks change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-30 07:25:22 +01:00

1. Schema Migration

1.1 Add an enriched_text TEXT column to the chunks table in database.py:init_schema, with a migration check for existing databases.
1.2 Write a backfill query (UPDATE chunks SET enriched_text = ... FROM documents) that joins each chunk to its document's title and parses the chunk metadata for section_header.
1.3 Drop and recreate the chunks_fts virtual table so it indexes enriched_text instead of text.
1.4 Update the FTS sync triggers (chunks_ai, chunks_ad, chunks_au) to use enriched_text.

2. Enrichment Helper

2.1 Create a helper build_enriched_text(title: str, chunk_text: str, metadata: dict | None) -> str in worker.py (or a shared util module). It returns "{title} > {section_header}\n\n{chunk_text}" when the metadata includes a section_header, and "{title}\n\n{chunk_text}" otherwise.

3. Ingestion Pipeline

3.1 Update worker._process_job() to build the enriched text for each chunk after chunking.
3.2 Pass the enriched text, rather than the raw chunk text, to embed_texts().
3.3 Pass the enriched text to database.insert_chunk() as the new enriched_text parameter.
3.4 Update database.insert_chunk() to accept and store enriched_text.

4. Reindex

4.1 Update routes/reindex.py to read enriched_text from the chunks table and embed it instead of text.
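A minimal sketch of the reindex step in 4.1. The function name reindex, the batch size, and storing vectors in a JSON-encoded chunks.embedding column are all assumptions; the real route may write to a separate vector store.

```python
import json
import sqlite3
from typing import Callable


def reindex(conn: sqlite3.Connection,
            embed_texts: Callable[[list[str]], list[list[float]]],
            batch_size: int = 64) -> int:
    # 4.1: read enriched_text (not text) and re-embed it in batches.
    rows = conn.execute("SELECT id, enriched_text FROM chunks ORDER BY id").fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_texts([enriched for _, enriched in batch])
        for (chunk_id, _), vector in zip(batch, vectors):
            # Assumption: vectors live in a JSON-encoded chunks.embedding column.
            conn.execute(
                "UPDATE chunks SET embedding = ? WHERE id = ?",
                (json.dumps(vector), chunk_id),
            )
    conn.commit()
    return len(rows)
```

If an AFTER UPDATE trigger such as chunks_au is installed on chunks, each of these UPDATEs also rewrites the chunk's FTS row. That is harmless but adds work, so a vector store kept outside chunks would avoid it.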

5. Search Results

5.1 Verify that search.py:_enrich() returns the raw chunks.text, not enriched_text. No change is expected, but confirm it.

6. Testing

6.1 Test: ingest a short note with a descriptive title, search using keywords from the title, and confirm the note is found.
6.2 Test: ingest a markdown document, search by a section header, and confirm its chunks are found.
6.3 Test: verify that the text field of search results does not contain the prepended title.
6.4 Test: run a reindex and verify that the new embeddings are computed from the enriched text.
6.5 Test: verify that the schema migration backfills enriched_text for pre-existing chunks on startup.
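Tests 6.1 and 6.3 can be sketched against an in-memory database. The minimal schema, make_db, and search are hypothetical stand-ins for the project's database and search.py code; the point illustrated is that matching happens on enriched_text while results return the raw chunks.text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Minimal in-memory schema: FTS5 indexes enriched_text, chunks keeps raw text.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, enriched_text TEXT);
        CREATE VIRTUAL TABLE chunks_fts USING fts5(
            enriched_text, content='chunks', content_rowid='id'
        );
        CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
            INSERT INTO chunks_fts(rowid, enriched_text)
            VALUES (new.id, new.enriched_text);
        END;
    """)
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    # Match on the enriched index but return chunks.text, as task 5.1 expects.
    return [row[0] for row in conn.execute(
        "SELECT chunks.text FROM chunks_fts "
        "JOIN chunks ON chunks.id = chunks_fts.rowid "
        "WHERE chunks_fts MATCH ?",
        (query,),
    )]


def test_title_keywords_find_short_note() -> None:
    conn = make_db()
    title, body = "Quarterly Budget Review", "Approved with minor changes."
    conn.execute(
        "INSERT INTO chunks (text, enriched_text) VALUES (?, ?)",
        (body, f"{title}\n\n{body}"),
    )
    results = search(conn, "budget review")
    assert results == [body]          # 6.1: found by title keywords alone
    assert title not in results[0]    # 6.3: result text has no prepended title


test_title_keywords_find_short_note()
```

The body text never mentions the budget review, so the match can only come from the title prefixed into enriched_text.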
