Add bulk operations and remove collections abstraction

- Add bulk delete, bulk tags, and bulk set-tags engine endpoints (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags)
- Filter-based selection: by tags, doc_type, ID list, ID range
- Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents accidental mass operations unless force=true
- Synchronous execution with audit trail via jobs table
- Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools
- Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands
- Remove collection abstraction from MCP server (use tags instead)
- Remove kb_set_collection MCP tool
- Update SKILL.md, MCP.md, README.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,91 @@
## Why

Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, this means hundreds of tool calls, which burns tokens, adds latency, and creates fragile multi-step flows that can fail partway through.

Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as `collection:`-prefixed tags, but this convention is enforced only in the MCP layer; the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via system prompt or configuration, achieve the same namespace isolation more simply and uniformly.

## What Changes

### 1. Remove collections from MCP server

Strip all collection logic from `mcp/server.py`:
- Remove `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, and all collection helper functions
- Remove `collection` parameter from `kb_search`, `kb_addnote`, `kb_upload_start`
- Remove `kb_set_collection` tool entirely
- Remove `_process_document` / `_process_search_results` collection-tag stripping
- Update MCP server instructions to explain tag-based namespace convention

### 2. Add bulk engine endpoints

Three new endpoints in the engine API:

- **POST /api/v1/bulk/delete**: Delete multiple documents matching a filter
- **POST /api/v1/bulk/tags**: Add/remove tags on multiple documents matching a filter
- **POST /api/v1/bulk/set-tags**: Replace all tags on multiple documents matching a filter

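The difference between `/bulk/tags` (add/remove) and `/bulk/set-tags` (replace) can be sketched on a single document's tag list. This is a minimal illustration rather than the engine's code: the function names, the remove-before-add ordering, and the de-duplication are assumptions made for the example.

```python
def apply_tag_changes(current, add=(), remove=()):
    """Add/remove semantics: keep existing tags, drop `remove`, append `add`."""
    to_remove = set(remove)
    # Assumed ordering: removals are applied before additions.
    result = [tag for tag in current if tag not in to_remove]
    for tag in add:
        if tag not in result:  # assumed: a tag appears at most once per document
            result.append(tag)
    return result


def set_tags(current, new_tags):
    """Replace semantics: the previous tags are discarded entirely."""
    # `current` is accepted only to make the contrast explicit; it is ignored.
    return list(dict.fromkeys(new_tags))  # de-duplicate, preserving order


existing = ["draft", "project:alpha"]
after_change = apply_tag_changes(existing, add=["reviewed"], remove=["draft"])
after_replace = set_tags(existing, ["archived"])
print(after_change)
print(after_replace)
```

With add/remove, untouched tags such as the hypothetical `project:alpha` survive; with replace, only the new tags remain.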
All three endpoints accept a common **selection filter** (criteria are combined with AND logic):
- `document_ids`: explicit list of IDs
- `tags`: documents matching ALL specified tags
- `doc_type`: documents of this type
- `from_id` / `to_id`: ID range (inclusive)

At least one selection criterion is required.

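The AND semantics of the selection filter can be sketched as an in-memory predicate. This is an illustration under assumptions: the `SelectionFilter` class, the `matches` function, the document dict shape, and the sample `doc_type` values are hypothetical, and the engine presumably performs the selection as a database query instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelectionFilter:
    # Field names follow the proposal; None means "criterion not set".
    document_ids: Optional[List[int]] = None
    tags: Optional[List[str]] = None
    doc_type: Optional[str] = None
    from_id: Optional[int] = None
    to_id: Optional[int] = None

    def is_empty(self) -> bool:
        return all(
            value is None
            for value in (self.document_ids, self.tags, self.doc_type,
                          self.from_id, self.to_id)
        )


def matches(doc: dict, flt: SelectionFilter) -> bool:
    """Return True only if the document satisfies every criterion that is set."""
    if flt.is_empty():
        raise ValueError("at least one selection criterion is required")
    if flt.document_ids is not None and doc["id"] not in flt.document_ids:
        return False
    # A document must carry ALL of the requested tags.
    if flt.tags is not None and not set(flt.tags) <= set(doc["tags"]):
        return False
    if flt.doc_type is not None and doc["doc_type"] != flt.doc_type:
        return False
    # The ID range is inclusive on both ends.
    if flt.from_id is not None and doc["id"] < flt.from_id:
        return False
    if flt.to_id is not None and doc["id"] > flt.to_id:
        return False
    return True


docs = [
    {"id": 1, "tags": ["project:alpha", "draft"], "doc_type": "note"},
    {"id": 2, "tags": ["project:alpha"], "doc_type": "pdf"},
    {"id": 3, "tags": ["project:beta", "draft"], "doc_type": "note"},
]
flt = SelectionFilter(tags=["project:alpha"], from_id=2)
selected = [d["id"] for d in docs if matches(d, flt)]
print(selected)
```

Here document 1 carries the tag but falls outside the ID range, so only document 2 satisfies both criteria.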
**Safety threshold**: If the operation would affect more than N% of all documents (N defaults to 70 and is configurable via the `KB_BULK_SAFETY_PERCENT` env var), the request is rejected with a 409 response showing what would be affected. The caller must re-send the request with `force: true` to proceed.

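As a worked example of the threshold arithmetic: with 1,000 documents and the default of 70%, a filter matching 750 documents (75%) would be rejected, while one matching exactly 700 (70%) would pass, because the rule is "more than". A minimal sketch of that check follows; the function name, the handling of an empty knowledge base, and the parsing of the env var as a plain number are assumptions.

```python
import os


def exceeds_safety_threshold(matched, total, force=False, percent=None):
    """Return True if the request should be rejected (the 409 case)."""
    if percent is None:
        # Assumed format: a plain number such as "70".
        percent = float(os.environ.get("KB_BULK_SAFETY_PERCENT", "70"))
    if force or total == 0:
        return False
    # Strictly greater than: "more than N% of all documents".
    # Cross-multiplied to avoid a division.
    return matched * 100 > percent * total


print(exceeds_safety_threshold(750, 1000, percent=70))
print(exceeds_safety_threshold(700, 1000, percent=70))
print(exceeds_safety_threshold(750, 1000, force=True, percent=70))
```

Passing `force=True` corresponds to the caller re-sending the request with `force: true`.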
**Response model**: Execution is synchronous and returns a summary response. The operation is also logged to the jobs table as an audit trail. Example response:

```json
{
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
}
```

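The counts in the summary are related by `matched = succeeded + failed`, with one `errors` entry per failed document. A small sketch of assembling such a summary from per-document outcomes is shown below; the `summarize` function and its input shape (a list of `(document_id, error_or_None)` pairs) are hypothetical, and reporting `"done"` even when some documents fail simply mirrors the example above.

```python
def summarize(job_id, results):
    """Build a summary dict from (document_id, error_or_None) pairs."""
    errors = [
        {"document_id": doc_id, "error": error}
        for doc_id, error in results
        if error is not None
    ]
    return {
        "job_id": job_id,
        "status": "done",  # as in the example, partial failures still report "done"
        "matched": len(results),
        "succeeded": len(results) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }


outcomes = [(1, None), (42, "file locked"), (7, None), (99, "not found")]
summary = summarize(42, outcomes)
print(summary)
```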
### 3. Add bulk MCP tools

Expose the bulk engine endpoints as MCP tools:
- `kb_bulk_delete`: bulk delete with filter selection
- `kb_bulk_tags`: bulk add/remove tags with filter selection
- `kb_bulk_set_tags`: bulk replace tags with filter selection

These are thin wrappers around the engine bulk endpoints: no collection translation, no special logic.

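To illustrate what "thin wrapper" means here, the sketch below maps each tool name to its engine path and forwards the caller's arguments unchanged as a JSON body. It is not the actual `mcp/server.py` code: the `build_bulk_request` helper and the flat request-body layout are assumptions, and the real tools would send the request through the MCP engine client.

```python
import json

# Tool-to-endpoint mapping, taken from the names listed in this proposal.
BULK_PATHS = {
    "kb_bulk_delete": "/api/v1/bulk/delete",
    "kb_bulk_tags": "/api/v1/bulk/tags",
    "kb_bulk_set_tags": "/api/v1/bulk/set-tags",
}


def build_bulk_request(tool, *, force=False, **fields):
    """Return (path, json_body) for a bulk tool call without altering the fields."""
    if tool not in BULK_PATHS:
        raise ValueError(f"unknown bulk tool: {tool}")
    # Drop unset criteria; pass everything else through as-is.
    body = {key: value for key, value in fields.items() if value is not None}
    if force:
        body["force"] = True
    return BULK_PATHS[tool], json.dumps(body, sort_keys=True)


path, body = build_bulk_request(
    "kb_bulk_delete", tags=["project:alpha"], doc_type=None, force=True
)
print(path, body)
```

The wrapper adds no tag prefixes and performs no result rewriting, which is the point of removing the collection layer.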
### 4. Add bulk CLI commands

- `kb bulk-remove`: bulk delete with `--tags`, `--type`, `--ids`, `--from-id`, `--to-id`, `--force` flags
- `kb bulk-tag`: bulk tag/untag with `--add`, `--remove`, and the same filter flags
- `kb bulk-set-tags`: bulk replace tags with `--tags` (new tags) and the same filter flags

All three commands show a confirmation prompt with the match count before executing (unless `--yes` is passed).

## Capabilities

### New Capabilities

- `bulk-operations`: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and safety threshold.

### Modified Capabilities

- `mcp-document-management`: Remove `kb_set_collection` tool. Remove `collection` parameter from all tools.

### Removed Capabilities

- `mcp-collections`: The collection abstraction (collection helpers, collection parameters, collection tag stripping) is removed from the MCP server entirely.

## Impact

- **Engine API** (`engine/kb/routes/`): New `bulk.py` route module with 3 endpoints. New `bulk` job type in jobs table.
- **Engine database** (`engine/kb/database.py`): Helper functions for bulk selection queries and bulk delete/tag operations.
- **MCP server** (`mcp/server.py`): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove `collection` param from `kb_search`, `kb_addnote`, `kb_upload_start`. Remove `kb_set_collection`.
- **MCP engine client** (`mcp/engine.py`): Add bulk operation methods. Remove code that is no longer needed.
- **CLI** (`client/cmd/`): New `bulk_remove.go`, `bulk_tag.go`, `bulk_set_tags.go` command files.
- **CLI API client** (`client/internal/api/`): Add `Post` with JSON body support if not already present.
- **Breaking changes**: `kb_set_collection` MCP tool removed. `collection` parameter removed from `kb_search`, `kb_addnote`, `kb_upload_start` MCP tools. Any MCP clients using collections will need to switch to tags.