## Why

Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, this means hundreds of tool calls, burning tokens, adding latency, and creating fragile multi-step flows that can fail partway through.

Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as `collection:`-prefixed tags, but this convention is only enforced in the MCP layer; the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via system prompt or configuration, achieve the same namespace isolation more simply and uniformly.

## What Changes

### 1. Remove collections from MCP server

Strip all collection logic from `mcp/server.py`:

- Remove `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, and all collection helper functions
- Remove the `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start`
- Remove the `kb_set_collection` tool entirely
- Remove `_process_document` / `_process_search_results` collection-tag stripping
- Update MCP server instructions to explain the tag-based namespace convention

### 2. Add bulk engine endpoints

Three new endpoints in the engine API:

- **POST /api/v1/bulk/delete**: Delete multiple documents matching a filter
- **POST /api/v1/bulk/tags**: Add/remove tags on multiple documents matching a filter
- **POST /api/v1/bulk/set-tags**: Replace all tags on multiple documents matching a filter

All accept a common **selection filter** (combinable with AND logic):

- `document_ids`: explicit list of IDs
- `tags`: documents matching ALL specified tags
- `doc_type`: documents of this type
- `from_id` / `to_id`: ID range (inclusive)

At least one selection criterion is required.
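One way the engine's database helpers could turn the selection filter into a single query is shown here as a minimal Python sketch against SQLite. It assumes a hypothetical schema of `documents(id, doc_type)` and `document_tags(document_id, tag)`; the real tables in `engine/kb/database.py` may differ, and `select_document_ids` is an illustrative name, not an existing function. Each supplied criterion becomes one `WHERE` clause joined with `AND`, and the "ALL specified tags" rule is enforced with one sub-select per tag.

```python
import sqlite3
from typing import Optional, Sequence


def select_document_ids(
    conn: sqlite3.Connection,
    document_ids: Optional[Sequence[int]] = None,
    tags: Optional[Sequence[str]] = None,
    doc_type: Optional[str] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
) -> list[int]:
    """Return IDs of documents matching every supplied criterion (AND logic).

    Assumes a hypothetical schema: documents(id, doc_type) and
    document_tags(document_id, tag).
    """
    clauses: list[str] = []
    params: list[object] = []

    if document_ids:
        placeholders = ", ".join("?" for _ in document_ids)
        clauses.append(f"d.id IN ({placeholders})")
        params.extend(document_ids)
    if tags:
        # One sub-select per tag, so a document must carry ALL of them.
        for tag in tags:
            clauses.append("d.id IN (SELECT document_id FROM document_tags WHERE tag = ?)")
            params.append(tag)
    if doc_type is not None:
        clauses.append("d.doc_type = ?")
        params.append(doc_type)
    if from_id is not None:
        clauses.append("d.id >= ?")  # inclusive lower bound
        params.append(from_id)
    if to_id is not None:
        clauses.append("d.id <= ?")  # inclusive upper bound
        params.append(to_id)

    if not clauses:
        raise ValueError("at least one selection criterion is required")

    sql = "SELECT d.id FROM documents d WHERE " + " AND ".join(clauses) + " ORDER BY d.id"
    return [row[0] for row in conn.execute(sql, params)]


# Usage against an in-memory database with the hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT);
    CREATE TABLE document_tags (document_id INTEGER, tag TEXT);
    INSERT INTO documents VALUES (1, 'note'), (2, 'pdf'), (3, 'note'), (4, 'note');
    INSERT INTO document_tags VALUES
        (1, 'project:alpha'), (1, 'draft'),
        (3, 'project:alpha'),
        (4, 'project:alpha'), (4, 'draft');
    """
)
print(select_document_ids(conn, tags=["project:alpha", "draft"]))  # [1, 4]
print(select_document_ids(conn, tags=["project:alpha"], from_id=2, to_id=4))  # [3, 4]
```

Keeping every criterion as a parameterized clause lets any combination be optional while still rejecting an empty filter, and the resulting ID list can feed both the safety check and the per-document delete or tag loop.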
**Safety threshold**: If the operation would affect more than N% of all documents (default 70%, configurable via the `KB_BULK_SAFETY_PERCENT` env var), the request is rejected with a 409 response showing what would be affected. The caller must re-send with `force: true` to proceed.

**Response model**: Synchronous execution with a summary response. The operation is logged to the jobs table as an audit trail:

```json
{
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
}
```

### 3. Add bulk MCP tools

Expose the bulk engine endpoints as MCP tools:

- `kb_bulk_delete`: bulk delete with filter selection
- `kb_bulk_tags`: bulk add/remove tags with filter selection
- `kb_bulk_set_tags`: bulk replace tags with filter selection

These are thin wrappers around the engine bulk endpoints, with no collection translation and no special logic.

### 4. Add bulk CLI commands

- `kb bulk-remove`: bulk delete with `--tags`, `--type`, `--ids`, `--from-id`, `--to-id`, `--force` flags
- `kb bulk-tag`: bulk tag/untag with `--add`, `--remove`, and the same filter flags
- `kb bulk-set-tags`: bulk replace tags with `--tags` (new tags) and the same filter flags

All three commands show a confirmation prompt with the match count before executing (unless `--yes` is passed).

## Capabilities

### New Capabilities

- `bulk-operations`: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and a safety threshold.

### Modified Capabilities

- `mcp-document-management`: Remove the `kb_set_collection` tool. Remove the `collection` parameter from all tools.

### Removed Capabilities

- `mcp-collections`: The collection abstraction (collection helpers, collection parameters, collection-tag stripping) is removed from the MCP server entirely.

## Impact

- **Engine API** (`engine/kb/routes/`): New `bulk.py` route module with 3 endpoints. New `bulk` job type in the jobs table.
- **Engine database** (`engine/kb/database.py`): Helper functions for bulk selection queries and bulk delete/tag operations.
- **MCP server** (`mcp/server.py`): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove the `collection` param from `kb_search`, `kb_addnote`, `kb_upload_start`. Remove `kb_set_collection`.
- **MCP engine client** (`mcp/engine.py`): Add bulk operation methods. Remove code that is no longer needed.
- **CLI** (`client/cmd/`): New `bulk_remove.go`, `bulk_tag.go`, `bulk_set_tags.go` command files.
- **CLI API client** (`client/internal/api/`): Add `Post` with JSON body support if it is not already present.
- **Breaking changes**: The `kb_set_collection` MCP tool is removed. The `collection` parameter is removed from the `kb_search`, `kb_addnote`, and `kb_upload_start` MCP tools. Any MCP clients using collections will need to switch to tags.
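The safety threshold and the synchronous summary response described in section 2 can be sketched together. This is a minimal Python sketch, not the engine's implementation: `check_safety`, `run_bulk`, `safety_percent`, and `SafetyThresholdExceeded` are hypothetical names, the exception stands in for the 409 response, and `job_id` is assumed to come from a row the caller has already inserted into the jobs table. Only `KB_BULK_SAFETY_PERCENT`, the 70% default, `force`, and the response fields come from this proposal.

```python
import os
from typing import Callable, Sequence

# Default from this proposal; overridable via KB_BULK_SAFETY_PERCENT.
DEFAULT_SAFETY_PERCENT = 70.0


class SafetyThresholdExceeded(Exception):
    """Hypothetical error that an HTTP layer would translate into a 409 response."""

    def __init__(self, matched: int, total: int, percent_limit: float) -> None:
        self.matched = matched
        self.total = total
        self.percent_limit = percent_limit
        super().__init__(
            f"{matched} of {total} documents matched, more than "
            f"{percent_limit:g}%; re-send with force: true to proceed"
        )


def safety_percent() -> float:
    """Read the threshold from the environment, falling back to the default."""
    raw = os.environ.get("KB_BULK_SAFETY_PERCENT")
    return float(raw) if raw else DEFAULT_SAFETY_PERCENT


def check_safety(matched: int, total: int, force: bool, percent_limit: float) -> None:
    """Reject operations touching MORE than percent_limit% of all documents."""
    if force or total == 0:
        return
    # Division-free form of: matched / total * 100 > percent_limit
    if matched * 100 > percent_limit * total:
        raise SafetyThresholdExceeded(matched, total, percent_limit)


def run_bulk(job_id: int, ids: Sequence[int], action: Callable[[int], None]) -> dict:
    """Apply action to each ID, recording per-document failures instead of aborting."""
    errors = []
    for doc_id in ids:
        try:
            action(doc_id)
        except Exception as exc:  # one failing document must not stop the batch
            errors.append({"document_id": doc_id, "error": str(exc)})
    return {
        "job_id": job_id,
        "status": "done",
        "matched": len(ids),
        "succeeded": len(ids) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }


# Usage: 100 of 120 documents matched (about 83%), so without force the
# request is refused; with force it runs and reports the one failure.
def fake_delete(doc_id: int) -> None:
    if doc_id == 99:
        raise LookupError("not found")


ids = list(range(1, 101))
try:
    check_safety(len(ids), total=120, force=False, percent_limit=safety_percent())
except SafetyThresholdExceeded as exc:
    print("409:", exc)

check_safety(len(ids), total=120, force=True, percent_limit=safety_percent())
summary = run_bulk(job_id=7, ids=ids, action=fake_delete)
print(summary["succeeded"], summary["failed"], summary["errors"])
# 99 1 [{'document_id': 99, 'error': 'not found'}]
```

Continuing past per-document failures mirrors the example response above, where 750 documents matched but 2 failed; an all-or-nothing transaction would be the alternative if partial results are undesirable.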