## Why

Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, that means hundreds or thousands of tool calls. These calls burn tokens, add latency, and create fragile multi-step flows that can fail partway through.

Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as `collection:`-prefixed tags, but this convention is enforced only in the MCP layer; the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via the system prompt or configuration, achieve the same namespace isolation more simply and uniformly.

## What Changes

### 1. Remove collections from MCP server

Strip all collection logic from `mcp/server.py`:

- Remove `COLLECTION_TAG_PREFIX`, `DEFAULT_COLLECTION`, and all collection helper functions
- Remove the `collection` parameter from `kb_search`, `kb_addnote`, and `kb_upload_start`
- Remove the `kb_set_collection` tool entirely
- Remove the collection-tag stripping in `_process_document` / `_process_search_results`
- Update the MCP server instructions to explain the tag-based namespace convention

### 2. Add bulk engine endpoints

Three new endpoints in the engine API:

- **POST /api/v1/bulk/delete** — Delete multiple documents matching a filter
- **POST /api/v1/bulk/tags** — Add/remove tags on multiple documents matching a filter
- **POST /api/v1/bulk/set-tags** — Replace all tags on multiple documents matching a filter

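The two tag endpoints differ only in how each matched document's tag set changes. The following is a minimal sketch of those semantics. It assumes tags form an unordered set per document. The `updated_tags` helper and its `add`/`remove`/`replace` parameters are hypothetical names, not the engine's actual request fields. Rejecting a tag that is both added and removed is also an assumption; the proposal does not settle that case.

```python
from __future__ import annotations

from collections.abc import Iterable


def updated_tags(
    current: Iterable[str],
    *,
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
    replace: Iterable[str] | None = None,
) -> set[str]:
    """Return one document's tags after a bulk tag operation (hypothetical helper).

    replace given -> /bulk/set-tags semantics: existing tags are discarded.
    otherwise     -> /bulk/tags semantics: existing tags plus `add`, minus `remove`.
    """
    add_set, remove_set = set(add), set(remove)
    if replace is not None:
        if add_set or remove_set:
            raise ValueError("replace cannot be combined with add/remove")
        return set(replace)
    overlap = add_set & remove_set
    if overlap:
        # Assumption: a tag that is both added and removed is treated as a client error.
        raise ValueError(f"tags both added and removed: {sorted(overlap)}")
    return (set(current) | add_set) - remove_set
```

Keeping the per-document rule as a pure function lets it be tested independently of selection and persistence.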
All three accept a common **selection filter**; the criteria are combined with AND logic:

- `document_ids` — explicit list of IDs
- `tags` — documents that have ALL of the specified tags
- `doc_type` — documents of this type
- `from_id` / `to_id` — ID range (inclusive)

At least one selection criterion is required.

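The following is a minimal sketch of how a bulk selection helper might translate the filter into a single parameterized query. It assumes hypothetical `documents(id, doc_type)` and `document_tags(document_id, tag)` tables. The real schema in `engine/kb/database.py` may differ, and Python's standard `sqlite3` is used only to keep the example runnable.

The `tags` criterion uses `GROUP BY ... HAVING COUNT(DISTINCT tag) = n`. This ensures only documents carrying every requested tag match, not documents carrying any of them.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class SelectionFilter:
    """Selection fields from a bulk request; every field that is set is ANDed."""
    document_ids: list[int] | None = None
    tags: list[str] | None = None
    doc_type: str | None = None
    from_id: int | None = None
    to_id: int | None = None


def _placeholders(n: int) -> str:
    return ", ".join("?" * n)  # e.g. n=3 -> "?, ?, ?"


def build_where(f: SelectionFilter) -> tuple[str, list[object]]:
    """Translate the filter into a parameterized WHERE clause over alias `d`.

    In this sketch, empty lists count as "not given".
    """
    clauses: list[str] = []
    params: list[object] = []
    if f.document_ids:
        clauses.append(f"d.id IN ({_placeholders(len(f.document_ids))})")
        params.extend(f.document_ids)
    if f.tags:
        wanted = sorted(set(f.tags))
        # Document must carry ALL requested tags: count distinct matches per document.
        clauses.append(
            "d.id IN (SELECT document_id FROM document_tags"
            f" WHERE tag IN ({_placeholders(len(wanted))})"
            " GROUP BY document_id HAVING COUNT(DISTINCT tag) = ?)"
        )
        params.extend(wanted)
        params.append(len(wanted))
    if f.doc_type is not None:
        clauses.append("d.doc_type = ?")
        params.append(f.doc_type)
    if f.from_id is not None:  # the ID range is inclusive on both ends
        clauses.append("d.id >= ?")
        params.append(f.from_id)
    if f.to_id is not None:
        clauses.append("d.id <= ?")
        params.append(f.to_id)
    if not clauses:
        raise ValueError("at least one selection criterion is required")
    return " AND ".join(clauses), params


def select_ids(conn: sqlite3.Connection, f: SelectionFilter) -> list[int]:
    """Return the IDs of all documents matched by the filter, in ID order."""
    where, params = build_where(f)
    sql = f"SELECT d.id FROM documents AS d WHERE {where} ORDER BY d.id"
    return [row[0] for row in conn.execute(sql, params)]
```

Building one WHERE clause means the match count for the safety check and the set of documents actually modified come from the same query.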
**Safety threshold**: If the operation would affect more than N% of all documents (default 70%, configurable via the `KB_BULK_SAFETY_PERCENT` environment variable), the request is rejected with a 409 response showing what would be affected. To proceed, the caller must resend the request with `force: true`.

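The threshold check can be sketched as a small framework-independent function. Here `matched` is the number of selected documents and `total` is the number of documents in the knowledge base. The function name and the fields of the returned 409 payload are hypothetical. In the engine, the payload would presumably be turned into an HTTP 409 by whatever web framework the routes use.

```python
from __future__ import annotations

import os
from typing import Any

DEFAULT_SAFETY_PERCENT = 70.0  # documented default


def safety_percent() -> float:
    """Threshold from KB_BULK_SAFETY_PERCENT, else the 70% default."""
    raw = os.environ.get("KB_BULK_SAFETY_PERCENT", "").strip()
    return float(raw) if raw else DEFAULT_SAFETY_PERCENT


def safety_rejection(matched: int, total: int, force: bool,
                     percent: float | None = None) -> dict[str, Any] | None:
    """Return a 409 payload when the bulk op needs `force`, otherwise None."""
    limit = safety_percent() if percent is None else percent
    # "More than N%" is strict; cross-multiplying avoids float division.
    if force or total <= 0 or matched * 100 <= limit * total:
        return None
    return {
        "status_code": 409,
        "matched": matched,
        "total": total,
        "threshold_percent": limit,
        "detail": "operation exceeds the bulk safety threshold; resend with force: true",
    }
```

With the default threshold, 750 of 1,000 documents (75%) is rejected unless `force` is set. Exactly 700 of 1,000 (70%) is allowed, because the rule is "more than N%".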
**Response model**: Operations execute synchronously and return a summary response. Each operation is also logged to the jobs table as an audit trail:

```json
{
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
}
```

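The following sketch shows how the synchronous executor might produce this summary. It assumes that a failure on one document is recorded and the batch continues, which is what the example's `"status": "done"` with two failures suggests. `run_bulk` and its `op` callback are hypothetical. Writing the row to the jobs table is left out, so `job_id` is simply passed in.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import Any


def run_bulk(job_id: int, document_ids: Iterable[int],
             op: Callable[[int], None]) -> dict[str, Any]:
    """Apply `op` to every matched document and build the summary response."""
    ids = list(document_ids)
    errors: list[dict[str, Any]] = []
    for doc_id in ids:
        try:
            op(doc_id)
        except Exception as exc:  # record the failure and keep going
            errors.append({"document_id": doc_id, "error": str(exc)})
    return {
        "job_id": job_id,
        "status": "done",
        "matched": len(ids),
        "succeeded": len(ids) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }
```

Whether a partially failed run should still be reported as `done` is a policy choice; the sketch simply follows the example response.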
### 3. Add bulk MCP tools

Expose the bulk engine endpoints as MCP tools:

- `kb_bulk_delete` — bulk delete with filter selection
- `kb_bulk_tags` — bulk add/remove tags with filter selection
- `kb_bulk_set_tags` — bulk replace tags with filter selection

These are thin wrappers around the engine bulk endpoints, with no collection translation and no special logic.

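The following sketch shows what "thin wrapper" could mean in practice. Each tool copies its arguments into the request body and returns whatever the engine sends back unchanged, including a 409 safety rejection.

Several names here are assumptions. The `post` callable stands in for a JSON POST method on the engine client in `mcp/engine.py`, whose real name is not specified. The `add`, `remove`, and `new_tags` body fields are likewise hypothetical. Registering the functions as MCP tools is omitted.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical: the engine client's JSON POST helper, (path, body) -> parsed JSON.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def _selection_body(
    document_ids: Optional[Iterable[int]] = None,
    tags: Optional[Iterable[str]] = None,
    doc_type: Optional[str] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
    force: bool = False,
) -> Dict[str, Any]:
    """Copy only the selection fields the caller set; the engine validates them."""
    body: Dict[str, Any] = {}
    if document_ids is not None:
        body["document_ids"] = list(document_ids)
    if tags is not None:
        body["tags"] = list(tags)
    if doc_type is not None:
        body["doc_type"] = doc_type
    if from_id is not None:
        body["from_id"] = from_id
    if to_id is not None:
        body["to_id"] = to_id
    if force:
        body["force"] = True
    return body


def kb_bulk_delete(post: PostFn, **selection: Any) -> Dict[str, Any]:
    return post("/api/v1/bulk/delete", _selection_body(**selection))


def kb_bulk_tags(post: PostFn, add: Iterable[str] = (), remove: Iterable[str] = (),
                 **selection: Any) -> Dict[str, Any]:
    body = _selection_body(**selection)
    body["add"] = list(add)        # hypothetical field name
    body["remove"] = list(remove)  # hypothetical field name
    return post("/api/v1/bulk/tags", body)


def kb_bulk_set_tags(post: PostFn, new_tags: Iterable[str],
                     **selection: Any) -> Dict[str, Any]:
    body = _selection_body(**selection)
    body["new_tags"] = list(new_tags)  # hypothetical; `tags` is already the filter
    return post("/api/v1/bulk/set-tags", body)
```

Leaving validation (at least one criterion, the safety threshold) to the engine keeps the MCP tools and CLI commands consistent. Both then get the same rules from a single place.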
### 4. Add bulk CLI commands

- `kb bulk-remove` — bulk delete with `--tags`, `--type`, `--ids`, `--from-id`, `--to-id`, `--force` flags
- `kb bulk-tag` — bulk tag/untag with `--add`, `--remove`, and the same filter flags
- `kb bulk-set-tags` — bulk replace tags with `--tags` (new tags) and the same filter flags; because `--tags` is also a filter flag, one of the two needs a different name in this command

All three commands show a confirmation prompt with the match count before executing, unless `--yes` is given.

## Capabilities

### New Capabilities

- `bulk-operations`: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and safety threshold.

### Modified Capabilities

- `mcp-document-management`: Remove `kb_set_collection` tool. Remove `collection` parameter from all tools.

### Removed Capabilities

- `mcp-collections`: The collection abstraction (collection helpers, collection parameters, collection tag stripping) is removed from the MCP server entirely.

## Impact

- **Engine API** (`engine/kb/routes/`): New `bulk.py` route module with 3 endpoints. New `bulk` job type in the jobs table.
- **Engine database** (`engine/kb/database.py`): Helper functions for bulk selection queries and bulk delete/tag operations.
- **MCP server** (`mcp/server.py`): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove the `collection` param from `kb_search`, `kb_addnote`, `kb_upload_start`. Remove `kb_set_collection`.
- **MCP engine client** (`mcp/engine.py`): Add bulk operation methods. Remove code that is no longer needed.
- **CLI** (`client/cmd/`): New `bulk_remove.go`, `bulk_tag.go`, `bulk_set_tags.go` command files.
- **CLI API client** (`client/internal/api/`): Add a `Post` method with JSON body support, if not already present.
- **Breaking changes**: `kb_set_collection` MCP tool removed. `collection` parameter removed from `kb_search`, `kb_addnote`, `kb_upload_start` MCP tools. Any MCP clients using collections will need to switch to tags.