Files
kb/openspec/changes/kb-search/specs/document-management/spec.md
T
2026-03-23 20:38:42 +00:00

81 lines
4.2 KiB
Markdown

## ADDED Requirements
### Requirement: List documents
The system SHALL list all indexed documents via `kb list`. Results SHALL include document ID, title, type, tag count, chunk count, and creation date. Output SHALL support `--format json` and `--format human`.
#### Scenario: List all documents
- **WHEN** user runs `kb list`
- **THEN** all documents are listed with their ID, title, type, tags, chunk count, and creation date
#### Scenario: Filter by type
- **WHEN** user runs `kb list --type pdf`
- **THEN** only PDF documents are listed
#### Scenario: Filter by tags
- **WHEN** user runs `kb list --tags admin,ops`
- **THEN** only documents tagged with BOTH "admin" AND "ops" are listed
#### Scenario: Empty database
- **WHEN** user runs `kb list` with no documents indexed
- **THEN** the system prints "No documents indexed. Run `kb add` to get started." and exits with zero status
### Requirement: Document info
The system SHALL display detailed information about a single document via `kb info <doc_id>`, including all metadata, tags, chunk count, and chunk previews (first 100 characters of each chunk).
#### Scenario: View document info
- **WHEN** user runs `kb info 42`
- **THEN** the system displays: title, source path, type, language (if code), content hash, creation date, tags, total chunks, and a preview of each chunk
#### Scenario: Invalid document ID
- **WHEN** user runs `kb info 9999` and no document with ID 9999 exists
- **THEN** the system prints "Document not found: 9999" and exits with non-zero status
### Requirement: Remove document
The system SHALL remove a document and all its associated chunks, embeddings, and tag associations via `kb remove <doc_id>`. The system SHALL ask for confirmation before deletion unless `--yes` is passed.
#### Scenario: Remove with confirmation
- **WHEN** user runs `kb remove 42`
- **THEN** the system displays the document title and asks "Remove 'Git Admin Guide' and its 28 chunks? [y/N]". On confirmation, the document, its chunks, FTS entries, vector embeddings, and tag associations are deleted.
#### Scenario: Remove with --yes flag
- **WHEN** user runs `kb remove 42 --yes`
- **THEN** the document is removed without confirmation prompt
#### Scenario: Cascading delete
- **WHEN** a document is removed
- **THEN** all rows in `chunks`, `chunks_fts`, `chunks_vec`, and `document_tags` referencing that document SHALL be deleted
### Requirement: Tag management
The system SHALL support adding and removing tags on documents via `kb tag <doc_id> --add tag1,tag2` and `kb tag <doc_id> --remove tag1`. Tags are case-insensitive and stored lowercase. The system SHALL list all tags with document counts via `kb tags`.
#### Scenario: Add tags to a document
- **WHEN** user runs `kb tag 42 --add git,admin`
- **THEN** the tags "git" and "admin" are associated with document 42. Tags are created if they don't exist.
#### Scenario: Remove a tag from a document
- **WHEN** user runs `kb tag 42 --remove admin`
- **THEN** the "admin" tag association is removed from document 42. The tag itself remains in the tags table if other documents use it.
#### Scenario: List all tags
- **WHEN** user runs `kb tags`
- **THEN** the system lists all tags with the count of documents using each tag, sorted by count descending
#### Scenario: Tag on ingestion
- **WHEN** user runs `kb add report.pdf --tags compliance,q1`
- **THEN** the document is ingested and immediately tagged with "compliance" and "q1"
#### Scenario: Tags in JSON format
- **WHEN** user runs `kb tags --format json`
- **THEN** output is a JSON array of objects: `[{"name": "git", "count": 15}, ...]`
### Requirement: Database status
The system SHALL report database statistics via `kb status`, including: total documents (by type), total chunks, database file size, active model name and dimension, and schema version.
#### Scenario: Show status
- **WHEN** user runs `kb status`
- **THEN** the system displays: document counts by type, total chunks, DB file size, model name, embedding dimension, and schema version
#### Scenario: Status before init
- **WHEN** user runs `kb status` before `kb init`
- **THEN** the system prints "Knowledge base not initialised. Run `kb init` first." and exits with non-zero status