kb/openspec/changes/bulk-ops-remove-collections/proposal.md
steve b5a203d2aa Add bulk operations and remove collections abstraction
- Add bulk delete, bulk tags, and bulk set-tags engine endpoints
  (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags)
- Filter-based selection: by tags, doc_type, ID list, ID range
- Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents
  accidental mass operations unless force=true
- Synchronous execution with audit trail via jobs table
- Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools
- Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands
- Remove collection abstraction from MCP server (use tags instead)
- Remove kb_set_collection MCP tool
- Update SKILL.md, MCP.md, README.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:34:47 +01:00

## Why
Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, this means hundreds or thousands of tool calls, which burns tokens, adds latency, and creates fragile multi-step flows that can fail partway through.
Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as `collection:`-prefixed tags, but this convention is only enforced in the MCP layer — the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via system prompt or configuration, achieve the same namespace isolation more simply and uniformly.
## What Changes
### 1. Remove collections from MCP server
Strip all collection logic from `mcp/server.py`:
- Remove `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, and all collection helper functions
- Remove `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start`
- Remove `kb_set_collection` tool entirely
- Remove `_process_document` / `_process_search_results` collection-tag stripping
- Update MCP server instructions to explain tag-based namespace convention
### 2. Add bulk engine endpoints
Three new endpoints in the engine API:
- **POST /api/v1/bulk/delete** — Delete multiple documents matching a filter
- **POST /api/v1/bulk/tags** — Add/remove tags on multiple documents matching a filter
- **POST /api/v1/bulk/set-tags** — Replace all tags on multiple documents matching a filter
All three accept a common **selection filter** whose criteria combine with AND logic:
- `document_ids` — explicit list of IDs
- `tags` — documents matching ALL specified tags
- `doc_type` — documents of this type
- `from_id` / `to_id` — ID range (inclusive)
At least one selection criterion is required.
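
The AND semantics of the selection filter can be sketched as follows. This is a minimal in-memory illustration, not the engine's implementation, which would more plausibly translate the filter into a database query. The `Doc` and `SelectionFilter` classes and the `select` function are hypothetical names introduced here; only the filter field names come from the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Doc:
    """Hypothetical in-memory document record, for illustration only."""
    id: int
    doc_type: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class SelectionFilter:
    """Mirrors the proposal's filter fields; None means 'not supplied'."""
    document_ids: Optional[List[int]] = None
    tags: Optional[List[str]] = None
    doc_type: Optional[str] = None
    from_id: Optional[int] = None
    to_id: Optional[int] = None

    def is_empty(self) -> bool:
        return all(
            v is None
            for v in (self.document_ids, self.tags, self.doc_type,
                      self.from_id, self.to_id)
        )


def select(docs: List[Doc], flt: SelectionFilter) -> List[Doc]:
    """Return the documents that satisfy every supplied criterion (AND)."""
    if flt.is_empty():
        raise ValueError("at least one selection criterion is required")
    matched = []
    for doc in docs:
        if flt.document_ids is not None and doc.id not in flt.document_ids:
            continue
        # A document must carry ALL of the requested tags.
        if flt.tags is not None and not set(flt.tags).issubset(doc.tags):
            continue
        if flt.doc_type is not None and doc.doc_type != flt.doc_type:
            continue
        # The ID range is inclusive at both ends.
        if flt.from_id is not None and doc.id < flt.from_id:
            continue
        if flt.to_id is not None and doc.id > flt.to_id:
            continue
        matched.append(doc)
    return matched
```

For example, `select(docs, SelectionFilter(tags=["a"], doc_type="note"))` would return only documents of type `note` that carry the tag `a`.
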
**Safety threshold**: If the operation would affect more than N% of all documents (default 70%, configurable via `KB_BULK_SAFETY_PERCENT` env var), the request is rejected with a 409 response showing what would be affected. The caller must re-send with `force: true` to proceed.
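
A minimal sketch of the safety check described above, assuming "more than N%" is a strict comparison and that an empty knowledge base never trips the threshold. The function names and the fields of the 409 body (`matched`, `total`, `threshold_percent`) are hypothetical; the proposal only specifies the status code, the `KB_BULK_SAFETY_PERCENT` variable, and the `force` flag.

```python
import os
from typing import Optional, Tuple


def exceeds_safety_threshold(matched: int, total: int, percent: float) -> bool:
    """True when matched is strictly more than `percent` percent of total."""
    if total == 0:
        return False  # assumption: nothing to protect in an empty store
    return matched * 100 > percent * total


def check_bulk_request(
    matched: int,
    total: int,
    force: bool = False,
    percent: Optional[float] = None,
) -> Tuple[int, dict]:
    """Return (status_code, body) for the pre-flight safety check."""
    if percent is None:
        # Default of 70 comes from the proposal.
        percent = float(os.environ.get("KB_BULK_SAFETY_PERCENT", "70"))
    if not force and exceeds_safety_threshold(matched, total, percent):
        # Hypothetical body shape for the 409 response.
        return 409, {
            "matched": matched,
            "total": total,
            "threshold_percent": percent,
        }
    return 200, {"matched": matched}
```

Under these assumptions, a request matching 750 of 1000 documents is rejected at the default threshold, while the same request re-sent with `force=True` proceeds.
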
**Response model**: Execution is synchronous and returns a summary response. Each operation is also logged to the jobs table as an audit trail:
```json
{
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
}
```
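
One way such a summary could be assembled is sketched below: apply the operation to each matched document, collect per-document failures, and report the totals. The `run_bulk` function and its `op` callback are hypothetical; the sketch also assumes that `status` is `"done"` even when some documents fail, as the example response suggests.

```python
from typing import Callable, Iterable


def run_bulk(job_id: int, document_ids: Iterable[int],
             op: Callable[[int], None]) -> dict:
    """Apply `op` to each document and build the summary response."""
    ids = list(document_ids)
    errors = []
    for doc_id in ids:
        try:
            op(doc_id)
        except Exception as exc:
            # A failure on one document does not abort the rest.
            errors.append({"document_id": doc_id, "error": str(exc)})
    return {
        "job_id": job_id,
        "status": "done",
        "matched": len(ids),
        "succeeded": len(ids) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }
```
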
### 3. Add bulk MCP tools
Expose the bulk engine endpoints as MCP tools:
- `kb_bulk_delete` — bulk delete with filter selection
- `kb_bulk_tags` — bulk add/remove tags with filter selection
- `kb_bulk_set_tags` — bulk replace tags with filter selection
These are thin wrappers around the engine bulk endpoints, with no collection translation and no special logic.
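
To illustrate what "thin wrapper" could mean in practice, the sketch below forwards the supplied filter fields to the engine unchanged. It is not the code in `mcp/server.py`: the `PostFn` transport type and the `_bulk_payload` helper are hypothetical, and the real tool would be registered with the MCP server rather than called directly.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: any callable that POSTs a JSON body to an engine
# path and returns the decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def _bulk_payload(force: bool = False, **fields: Any) -> Dict[str, Any]:
    """Drop unset (None) fields so only supplied criteria are sent."""
    payload = {k: v for k, v in fields.items() if v is not None}
    if force:
        payload["force"] = True
    return payload


def kb_bulk_delete(
    post: PostFn,
    *,
    document_ids: Optional[List[int]] = None,
    tags: Optional[List[str]] = None,
    doc_type: Optional[str] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
    force: bool = False,
) -> Dict[str, Any]:
    """Forward the filter to the engine as-is and return its response."""
    body = _bulk_payload(
        force,
        document_ids=document_ids,
        tags=tags,
        doc_type=doc_type,
        from_id=from_id,
        to_id=to_id,
    )
    return post("/api/v1/bulk/delete", body)
```

Injecting the transport keeps the wrapper free of engine details and makes it straightforward to exercise with a fake `post` function.
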
### 4. Add bulk CLI commands
- `kb bulk-remove` — bulk delete with `--tags`, `--type`, `--ids`, `--from-id`, `--to-id`, `--force` flags
- `kb bulk-tag` — bulk tag/untag with `--add`, `--remove`, and the same filter flags
- `kb bulk-set-tags` — bulk replace tags with `--tags` (new tags) and the same filter flags
All three show a confirmation prompt with the match count before executing, unless `--yes` is passed.
## Capabilities
### New Capabilities
- `bulk-operations`: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and safety threshold.
### Modified Capabilities
- `mcp-document-management`: Remove `kb_set_collection` tool. Remove `collection` parameter from `kb_search`, `kb_addnote`, and `kb_upload_start`.
### Removed Capabilities
- `mcp-collections`: The collection abstraction (collection helpers, collection parameters, collection tag stripping) is removed from the MCP server entirely.
## Impact
- **Engine API** (`engine/kb/routes/`): New `bulk.py` route module with 3 endpoints. New `bulk` job type in jobs table.
- **Engine database** (`engine/kb/database.py`): Helper functions for bulk selection queries and bulk delete/tag operations.
- **MCP server** (`mcp/server.py`): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove `collection` param from `kb_search`, `kb_addnote`, `kb_upload_start`. Remove `kb_set_collection`.
- **MCP engine client** (`mcp/engine.py`): Add bulk operation methods. Remove code that is no longer needed.
- **CLI** (`client/cmd/`): New `bulk_remove.go`, `bulk_tag.go`, `bulk_set_tags.go` command files.
- **CLI API client** (`client/internal/api/`): Add `Post` with JSON body support if not present.
- **Breaking changes**: `kb_set_collection` MCP tool removed. `collection` parameter removed from `kb_search`, `kb_addnote`, `kb_upload_start` MCP tools. Any MCP clients using collections will need to switch to tags.