Add dev-up script and archive kb-title-in-chunks change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 07:25:22 +01:00
parent 743102aee4
commit bbe6a5e909
7 changed files with 242 additions and 0 deletions
@@ -0,0 +1,75 @@
+## ADDED Requirements
+
+### Requirement: Chunk text enrichment with document title
+
+The engine SHALL prepend the document title to each chunk's text before FTS indexing and vector embedding. The enriched text SHALL be stored in a dedicated `enriched_text` column on the `chunks` table. The original chunk text SHALL remain in the `text` column for display purposes.
+
+The enrichment format SHALL be:
+- Without section header: `"{title}\n\n{chunk_text}"`
+- With section header: `"{title} > {section_header}\n\n{chunk_text}"`
+
+Where `section_header` is the value from the chunk's metadata `section_header` field, when present.
+
+#### Scenario: Note ingestion with title enrichment
+- **WHEN** a note titled "Suitcase Locks" with content "Steve = 363" is ingested
+- **THEN** the `chunks.text` column SHALL contain "Steve = 363" and the `chunks.enriched_text` column SHALL contain "Suitcase Locks\n\nSteve = 363"
+
+#### Scenario: Markdown chunk with section header enrichment
+- **WHEN** a markdown document titled "DCG Lab Hardware" produces a chunk with section_header "GRIMDAWN > motherboard" and text "MSI X870 Tomahawk"
+- **THEN** the `chunks.enriched_text` SHALL contain "DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk"
+
+#### Scenario: Chunk without section header
+- **WHEN** a document titled "Docker Tips" produces a chunk with no section_header in metadata and text "dbash() { docker exec -it $1 bash; }"
+- **THEN** the `chunks.enriched_text` SHALL contain "Docker Tips\n\ndbash() { docker exec -it $1 bash; }"
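
The enrichment format and the three scenarios above can be sketched as a single pure function. This is a minimal illustration, not the engine's actual code: the name `enrich_chunk_text` is hypothetical, and treating an empty `section_header` the same as a missing one is an assumption the spec does not state.

```python
from typing import Optional


def enrich_chunk_text(
    title: str, chunk_text: str, section_header: Optional[str] = None
) -> str:
    """Build the text that would be FTS-indexed and embedded for one chunk."""
    # Assumption: an empty header is treated like an absent one.
    heading = f"{title} > {section_header}" if section_header else title
    # The title (and optional header) is separated from the body by a blank line.
    return f"{heading}\n\n{chunk_text}"


if __name__ == "__main__":
    print(repr(enrich_chunk_text("Suitcase Locks", "Steve = 363")))
```

Keeping the function free of database access makes it usable from both the ingestion path and the startup backfill.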
+
+---
+
+### Requirement: FTS5 indexes enriched text
+
+The FTS5 virtual table `chunks_fts` SHALL index the `enriched_text` column instead of the `text` column. All FTS sync triggers (insert, update, delete) SHALL operate on `enriched_text`.
+
+#### Scenario: FTS search matches document title
+- **WHEN** a user searches for "suitcase locks" and a document titled "Suitcase Locks" exists with chunk text "Steve = 363"
+- **THEN** the FTS5 search SHALL return that chunk as a match
+
+#### Scenario: FTS search still matches chunk content
+- **WHEN** a user searches for "MSI X870" and a chunk contains that text in its body
+- **THEN** the FTS5 search SHALL return that chunk as a match (enrichment does not break content matching)
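
One way to satisfy this requirement is an external-content FTS5 table over `chunks`, kept in sync by triggers. The sketch below assumes an integer primary key `chunks.id`, hypothetical trigger names (`chunks_ai`, `chunks_ad`, `chunks_au`), and a Python build whose SQLite includes FTS5; the real schema may differ.

```python
import sqlite3

# Assumed minimal schema; only the columns relevant to FTS are shown.
SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    enriched_text TEXT
);
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
"""


def fts_search(conn: sqlite3.Connection, query: str) -> list:
    """Return the ids of chunks whose enriched text matches the FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO chunks (id, text, enriched_text) VALUES (?, ?, ?)",
    (1, "Steve = 363", "Suitcase Locks\n\nSteve = 363"),
)
conn.execute(
    "INSERT INTO chunks (id, text, enriched_text) VALUES (?, ?, ?)",
    (2, "MSI X870 Tomahawk",
     "DCG Lab Hardware > GRIMDAWN > motherboard\n\nMSI X870 Tomahawk"),
)
conn.commit()
```

With the default tokenizer, matching is case-insensitive, so the query "suitcase locks" hits the title text while "MSI X870" still hits the body text.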
+
+---
+
+### Requirement: Vector embeddings use enriched text
+
+The embedding model SHALL receive `enriched_text` (not raw `text`) when generating vectors during both initial ingestion and reindex operations.
+
+#### Scenario: Vector search matches document title
+- **WHEN** a user searches semantically for "luggage combination codes" and a document titled "Suitcase Locks" exists
+- **THEN** the vector search SHALL return that chunk with higher similarity than it would without title enrichment
+
+#### Scenario: Reindex uses enriched text
+- **WHEN** `POST /api/v1/reindex` is called
+- **THEN** the engine SHALL read `enriched_text` from the chunks table and embed that value, not `text`
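
The reindex path might look roughly like the sketch below. The function name `reindex_embeddings` and the `embed` callable are hypothetical stand-ins for whatever embedding model the engine uses; the only point illustrated is that the model input comes from `enriched_text`.

```python
import sqlite3
from typing import Callable, Dict, List, Sequence

# Hypothetical embedder interface: a batch of texts in, one vector per text out.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def reindex_embeddings(
    conn: sqlite3.Connection, embed: Embedder
) -> Dict[int, List[float]]:
    """Re-embed every chunk from its enriched text, never from the raw text."""
    rows = conn.execute(
        "SELECT id, enriched_text FROM chunks ORDER BY id"
    ).fetchall()
    if not rows:
        return {}
    vectors = embed([enriched for _, enriched in rows])
    # Pair each chunk id with the vector produced for its enriched text.
    return {chunk_id: vector for (chunk_id, _), vector in zip(rows, vectors)}
```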
+
+---
+
+### Requirement: Schema migration adds enriched_text column
+
+On startup, `init_schema` SHALL add the `enriched_text` column to the `chunks` table if it does not exist. It SHALL then backfill `enriched_text` for all existing chunks by joining with `documents.title` and parsing chunk metadata for section headers. It SHALL rebuild the FTS5 table and triggers to index `enriched_text`.
+
+#### Scenario: First startup after upgrade
+- **WHEN** the engine starts and `chunks.enriched_text` column does not exist
+- **THEN** the engine SHALL add the column, backfill all rows, drop and recreate `chunks_fts` to index `enriched_text`, and recreate the FTS sync triggers
+
+#### Scenario: Subsequent startup
+- **WHEN** the engine starts and `chunks.enriched_text` column already exists
+- **THEN** the engine SHALL NOT perform any migration and SHALL start normally
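
The two startup scenarios could be handled by a guard on the column's existence, as in this sketch. It is not the project's `init_schema`: the function name `migrate_enriched_text`, the `chunks.document_id` foreign key, a JSON-object `chunks.metadata` column, and the trigger names are all assumptions made for illustration.

```python
import json
import sqlite3

# Assumed names for the old FTS objects that must be replaced.
DROP_OLD_FTS = """
DROP TRIGGER IF EXISTS chunks_ai;
DROP TRIGGER IF EXISTS chunks_ad;
DROP TRIGGER IF EXISTS chunks_au;
DROP TABLE IF EXISTS chunks_fts;
ALTER TABLE chunks ADD COLUMN enriched_text TEXT;
"""

CREATE_NEW_FTS = """
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    enriched_text, content='chunks', content_rowid='id'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, enriched_text)
    VALUES ('delete', old.id, old.enriched_text);
    INSERT INTO chunks_fts(rowid, enriched_text)
    VALUES (new.id, new.enriched_text);
END;
"""


def migrate_enriched_text(conn: sqlite3.Connection) -> bool:
    """Run the one-time migration; return False if it was already applied."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "enriched_text" in columns:
        return False  # Subsequent startup: nothing to do.

    # Remove the old FTS objects first so stale triggers do not fire
    # during the backfill, then add the new column.
    conn.executescript(DROP_OLD_FTS)

    rows = conn.execute(
        "SELECT c.id, c.text, c.metadata, d.title "
        "FROM chunks AS c JOIN documents AS d ON d.id = c.document_id"
    ).fetchall()
    for chunk_id, text, metadata, title in rows:
        # Assumption: metadata is NULL or a JSON object.
        header = json.loads(metadata).get("section_header") if metadata else None
        heading = f"{title} > {header}" if header else title
        conn.execute(
            "UPDATE chunks SET enriched_text = ? WHERE id = ?",
            (f"{heading}\n\n{text}", chunk_id),
        )

    conn.executescript(CREATE_NEW_FTS)
    # Populate the new external-content index from the backfilled column.
    conn.execute("INSERT INTO chunks_fts(chunks_fts) VALUES ('rebuild')")
    conn.commit()
    return True
```

Because the guard only checks for the column, a crash between adding the column and finishing the backfill would leave rows unenriched on the next start; a production migration would probably need a stronger completion marker or a single transaction around these steps.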
+
+---
+
+### Requirement: Search results return raw text
+
+Search results SHALL continue to return the original chunk text (from `chunks.text`) in the `text` field, not the enriched text. The document title is already returned as a separate `title` field.
+
+#### Scenario: Search result text field
+- **WHEN** a search returns a chunk from document "Suitcase Locks" with raw text "Steve = 363"
+- **THEN** the result `text` field SHALL be "Steve = 363" (not "Suitcase Locks\n\nSteve = 363")
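
To illustrate the split between what is matched and what is returned, the sketch below matches against `chunks_fts` but selects `chunks.text` and `documents.title` for the response. The function name `search_chunks`, the `chunks.document_id` column, and the result shape beyond the `title` and `text` fields are assumptions.

```python
import sqlite3
from typing import Dict, List


def search_chunks(conn: sqlite3.Connection, query: str) -> List[Dict[str, str]]:
    """Match on enriched text, but return the raw chunk text and the title."""
    rows = conn.execute(
        """
        SELECT d.title, c.text
        FROM chunks_fts
        JOIN chunks AS c ON c.id = chunks_fts.rowid
        JOIN documents AS d ON d.id = c.document_id
        WHERE chunks_fts MATCH ?
        """,
        (query,),
    ).fetchall()
    # enriched_text is never selected, so it cannot leak into results.
    return [{"title": title, "text": text} for title, text in rows]
```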