Add bulk operations and remove collections abstraction

- Add bulk delete, bulk tags, and bulk set-tags engine endpoints
  (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags)
- Filter-based selection: by tags, doc_type, ID list, ID range
- Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents
  accidental mass operations unless force=true
- Synchronous execution with audit trail via jobs table
- Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools
- Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands
- Remove collection abstraction from MCP server (use tags instead)
- Remove kb_set_collection MCP tool
- Update SKILL.md, MCP.md, README.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:34:47 +01:00
parent 0c124c4ab7
commit b5a203d2aa
21 changed files with 1619 additions and 112 deletions
@@ -0,0 +1,45 @@
## 1. Remove collections from MCP server
- [x] 1.1 Remove collection constants and helper functions from `mcp/server.py` (`COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, `_collection_tag`, `_strip_collection_tags`, `_process_document`, `_process_search_results`, `_ensure_exclusive_collection`)
- [x] 1.2 Remove `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start` tools
- [x] 1.3 Remove `kb_set_collection` tool entirely
- [x] 1.4 Remove `_process_document` / `_process_search_results` calls from `kb_get`, `kb_update_note`, `kb_search`
- [x] 1.5 Update MCP server instructions text to reflect tags-only approach
## 2. Engine bulk infrastructure
- [x] 2.1 Add `bulk_safety_percent` to `Config` class in `engine/kb/config.py` (env var `KB_BULK_SAFETY_PERCENT`, default 70)
- [x] 2.2 Add `job_type` column migration to `database.py` `init_schema` (TEXT, default "ingest")
- [x] 2.3 Add `resolve_bulk_selection(conn, document_ids, tags, doc_type, from_id, to_id)` helper to `database.py` — returns list of matching document IDs
- [x] 2.4 Add `create_bulk_job(conn, job_type, filters_json, matched, succeeded, failed, errors_json)` helper to `database.py`
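
The selection helper from 2.3 could look roughly like the sketch below. It keeps the signature listed above, but everything else is an assumption made for illustration: the `documents(id, doc_type)` and `document_tags(document_id, tag)` tables are hypothetical, filters are combined with AND, a document must carry every listed tag, and an empty filter set is rejected. The real schema and matching rules in `database.py` may differ.

```python
import sqlite3
from typing import List, Optional, Sequence


def resolve_bulk_selection(
    conn: sqlite3.Connection,
    document_ids: Optional[Sequence[int]] = None,
    tags: Optional[Sequence[str]] = None,
    doc_type: Optional[str] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
) -> List[int]:
    """Return sorted IDs of documents matching every supplied filter."""
    clauses: List[str] = []
    params: List[object] = []

    if document_ids:
        placeholders = ",".join("?" for _ in document_ids)
        clauses.append(f"d.id IN ({placeholders})")
        params.extend(document_ids)
    if tags:
        # Assumption: a document must carry all listed tags, not just one.
        placeholders = ",".join("?" for _ in tags)
        clauses.append(
            "(SELECT COUNT(DISTINCT t.tag) FROM document_tags t "
            f"WHERE t.document_id = d.id AND t.tag IN ({placeholders})) = ?"
        )
        params.extend(tags)
        params.append(len(set(tags)))
    if doc_type is not None:
        clauses.append("d.doc_type = ?")
        params.append(doc_type)
    if from_id is not None:
        clauses.append("d.id >= ?")  # ID range is treated as inclusive
        params.append(from_id)
    if to_id is not None:
        clauses.append("d.id <= ?")
        params.append(to_id)

    if not clauses:
        # Assumption: refuse an empty filter instead of selecting everything.
        raise ValueError("at least one selection filter is required")

    sql = (
        "SELECT d.id FROM documents d WHERE "
        + " AND ".join(clauses)
        + " ORDER BY d.id"
    )
    return [row[0] for row in conn.execute(sql, params)]
```

All values are passed as bound parameters, so tag names and document types never get interpolated into the SQL text; only the placeholder count varies.
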
## 3. Engine bulk endpoints
- [x] 3.1 Create `engine/kb/routes/bulk.py` with shared Pydantic request model (`BulkSelectionRequest` with selection fields + `force` bool)
- [x] 3.2 Add `_check_safety_threshold` helper that returns 409 if threshold exceeded
- [x] 3.3 Implement `POST /api/v1/bulk/delete` — resolve selection, check threshold, delete documents in transaction, clean up files, log job, return summary
- [x] 3.4 Implement `POST /api/v1/bulk/tags` — resolve selection, check threshold, add/remove tags on matched docs, log job, return summary
- [x] 3.5 Implement `POST /api/v1/bulk/set-tags` — resolve selection, check threshold, clear and replace tags on matched docs, log job, return summary
- [x] 3.6 Import bulk routes in engine app startup (add to `engine/kb/routes/__init__.py` or `main.py`)
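
The threshold check from 3.2 can be sketched without any web framework. This is an illustration under stated assumptions, not the code in `engine/kb/routes/bulk.py`: it assumes the threshold compares matched documents against the total document count, that "exceeded" means strictly greater than the limit, and that the route layer turns the hypothetical `ThresholdExceeded` exception into the 409 response. Only `KB_BULK_SAFETY_PERCENT`, its default of 70, the 409 status, and the `force` override come from the task list.

```python
import os
from typing import Mapping


class ThresholdExceeded(Exception):
    """Hypothetical error the route layer would translate into HTTP 409."""

    status_code = 409

    def __init__(self, matched: int, total: int, limit_percent: float) -> None:
        super().__init__(
            f"selection matches {matched} of {total} documents, "
            f"above the {limit_percent}% safety threshold; pass force=true to proceed"
        )
        self.matched = matched
        self.total = total
        self.limit_percent = limit_percent


def load_bulk_safety_percent(env: Mapping[str, str] = os.environ) -> float:
    """Read KB_BULK_SAFETY_PERCENT, falling back to the documented default of 70."""
    return float(env.get("KB_BULK_SAFETY_PERCENT", "70"))


def check_safety_threshold(
    matched: int, total: int, limit_percent: float, force: bool = False
) -> None:
    """Raise ThresholdExceeded when the selection is too large and force is unset."""
    if force or total == 0:
        return
    # Cross-multiplied form of matched / total * 100 > limit_percent,
    # which avoids a division and its rounding.
    if matched * 100 > limit_percent * total:
        raise ThresholdExceeded(matched, total, limit_percent)
```

With the default limit, a selection of 7 out of 10 documents passes and 8 out of 10 is rejected unless `force=True` is supplied.
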
## 4. MCP bulk tools
- [x] 4.1 Add `bulk_delete`, `bulk_tags`, `bulk_set_tags` methods to `mcp/engine.py`
- [x] 4.2 Add `kb_bulk_delete` tool to `mcp/server.py`
- [x] 4.3 Add `kb_bulk_tags` tool to `mcp/server.py`
- [x] 4.4 Add `kb_bulk_set_tags` tool to `mcp/server.py`
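
The MCP client methods from 4.1 ultimately have to POST a selection to one of the three bulk endpoints. The sketch below only builds such a request with the standard library and does not send it. The endpoint paths come from the commit message; the JSON field names are assumed to mirror the `resolve_bulk_selection` parameters plus `force`, and `build_bulk_request`, its `extra` parameter, and the base URL in the usage note are hypothetical. The actual `mcp/engine.py` may use a different HTTP client and payload shape.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Sequence

# Endpoint paths as listed in the commit message.
BULK_PATHS = {
    "delete": "/api/v1/bulk/delete",
    "tags": "/api/v1/bulk/tags",
    "set-tags": "/api/v1/bulk/set-tags",
}


def build_bulk_request(
    base_url: str,
    operation: str,
    *,
    document_ids: Optional[Sequence[int]] = None,
    tags: Optional[Sequence[str]] = None,
    doc_type: Optional[str] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
    force: bool = False,
    extra: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one bulk endpoint."""
    filters = {
        "document_ids": list(document_ids) if document_ids is not None else None,
        "tags": list(tags) if tags is not None else None,
        "doc_type": doc_type,
        "from_id": from_id,
        "to_id": to_id,
    }
    # Omit unset filters so the engine sees only what the caller asked for.
    payload: Dict[str, Any] = {k: v for k, v in filters.items() if v is not None}
    payload["force"] = force
    if extra:
        # Hypothetical hook for operation-specific fields (e.g. tags to add).
        payload.update(extra)
    return urllib.request.Request(
        base_url.rstrip("/") + BULK_PATHS[operation],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example `build_bulk_request("http://localhost:8000", "delete", tags=["stale"])`, and should expect a 409 response when the safety threshold applies and `force` is left unset.
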
## 5. CLI bulk commands
- [x] 5.1 Create `client/cmd/bulk_remove.go`: `kb bulk-remove` with filter flags, confirmation prompt, JSON output support
- [x] 5.2 Create `client/cmd/bulk_tag.go`: `kb bulk-tag` with filter flags + `--add`/`--remove`, confirmation prompt
- [x] 5.3 Create `client/cmd/bulk_set_tags.go`: `kb bulk-set-tags` with filter flags + `--set`, confirmation prompt
## 6. Verification
- [x] 6.1 Test collection removal: verify `kb_search`, `kb_addnote`, `kb_get`, `kb_update_note`, `kb_upload_start` work without collection params
- [x] 6.2 Test bulk delete via engine API: filter by tags, by IDs, by range, safety threshold trigger and force override
- [x] 6.3 Test bulk tags and bulk set-tags via engine API
- [x] 6.4 Test MCP bulk tools against running engine
- [x] 6.5 Test CLI bulk commands against running engine
- [x] 6.6 Test audit trail: verify bulk jobs appear in `kb jobs` output