2026-04-04 22:50:19 +01:00

Why

Bulk operations on documents (delete, tag, retag) currently require one API/MCP call per document. When an LLM manages hundreds or thousands of documents, that means hundreds or thousands of tool calls. This burns tokens, adds latency, and creates fragile multi-step flows that can fail partway through.

Additionally, the "collection" abstraction in the MCP server adds complexity without real benefit. Collections are implemented as collection:-prefixed tags, but this convention is only enforced in the MCP layer — the CLI and engine don't know about it. This creates inconsistency and extra code. Tags alone, with a naming convention communicated via system prompt or configuration, achieve the same namespace isolation more simply and uniformly.

What Changes

1. Remove collections from MCP server

Strip all collection logic from mcp/server.py:

  • Remove COLLECTION_TAG_PREFIX, DEFAULT_COLLECTION, and all collection helper functions
  • Remove collection parameter from kb_search, kb_addnote, kb_upload_start
  • Remove kb_set_collection tool entirely
  • Remove the collection-tag stripping in _process_document / _process_search_results
  • Update MCP server instructions to explain tag-based namespace convention

2. Add bulk engine endpoints

Three new endpoints in the engine API:

  • POST /api/v1/bulk/delete — Delete multiple documents matching a filter
  • POST /api/v1/bulk/tags — Add/remove tags on multiple documents matching a filter
  • POST /api/v1/bulk/set-tags — Replace all tags on multiple documents matching a filter
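bulk/tags and bulk/set-tags differ in how they treat each document's existing tags. The Python sketch below models one document's tags as a set to show that difference. The function names are hypothetical. The precedence when a tag appears in both the add and remove lists is an assumption, because the proposal does not settle it.

```python
from typing import Iterable


def add_remove_tags(current: Iterable[str], add: Iterable[str] = (),
                    remove: Iterable[str] = ()) -> set[str]:
    """bulk/tags semantics: keep existing tags, drop `remove`, then add `add`.

    Assumption: removals are applied first, so a tag in both lists ends up present.
    """
    return (set(current) - set(remove)) | set(add)


def set_tags(current: Iterable[str], new_tags: Iterable[str]) -> set[str]:
    """bulk/set-tags semantics: existing tags are ignored and fully replaced."""
    return set(new_tags)


doc = {"invoice", "2024", "draft"}
print(sorted(add_remove_tags(doc, add=["final"], remove=["draft"])))  # ['2024', 'final', 'invoice']
print(sorted(set_tags(doc, ["archived"])))                            # ['archived']
```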

All three endpoints accept a common selection filter. When several criteria are given, they are combined with AND logic:

  • document_ids — explicit list of IDs
  • tags — documents matching ALL specified tags
  • doc_type — documents of this type
  • from_id / to_id — ID range (inclusive)

At least one selection criterion is required.
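One way to turn the selection filter into a single parameterized query is sketched below. It assumes a SQLite-style database with hypothetical documents(id, doc_type) and document_tags(document_id, tag) tables, and the real schema in engine/kb/database.py may differ. Each criterion that is present adds one clause, and the clauses are joined with AND. The "ALL specified tags" rule is expressed with GROUP BY ... HAVING COUNT(DISTINCT tag). An empty filter is rejected.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Selection:
    """Hypothetical model of the common bulk selection filter."""
    document_ids: list[int] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    doc_type: Optional[str] = None
    from_id: Optional[int] = None
    to_id: Optional[int] = None

    def is_empty(self) -> bool:
        return not (self.document_ids or self.tags or self.doc_type is not None
                    or self.from_id is not None or self.to_id is not None)

    def to_sql(self) -> tuple[str, list]:
        """Build a parameterized SELECT; every criterion present is ANDed."""
        if self.is_empty():
            raise ValueError("at least one selection criterion is required")
        clauses: list[str] = []
        params: list = []
        if self.document_ids:
            clauses.append(f"d.id IN ({', '.join('?' * len(self.document_ids))})")
            params.extend(self.document_ids)
        if self.tags:
            # A document must carry ALL listed tags, not just any one of them.
            clauses.append(
                "d.id IN (SELECT document_id FROM document_tags "
                f"WHERE tag IN ({', '.join('?' * len(self.tags))}) "
                "GROUP BY document_id HAVING COUNT(DISTINCT tag) = ?)"
            )
            params.extend(self.tags)
            params.append(len(set(self.tags)))
        if self.doc_type is not None:
            clauses.append("d.doc_type = ?")
            params.append(self.doc_type)
        if self.from_id is not None:
            clauses.append("d.id >= ?")  # inclusive lower bound
            params.append(self.from_id)
        if self.to_id is not None:
            clauses.append("d.id <= ?")  # inclusive upper bound
            params.append(self.to_id)
        sql = "SELECT d.id FROM documents d WHERE " + " AND ".join(clauses) + " ORDER BY d.id"
        return sql, params


# Usage against a throwaway in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT);
CREATE TABLE document_tags (document_id INTEGER, tag TEXT);
INSERT INTO documents VALUES (1, 'note'), (2, 'pdf'), (3, 'note'), (4, 'note');
INSERT INTO document_tags VALUES (1, 'work'), (1, 'draft'), (3, 'work'), (4, 'work'), (4, 'draft');
""")
sql, params = Selection(tags=["work", "draft"], doc_type="note", from_id=2).to_sql()
print([row[0] for row in conn.execute(sql, params)])  # [4]
```

Keeping the selection as one query lets the safety check below count matches with the same WHERE clause that the operation itself uses.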

Safety threshold: if the operation would affect more than N% of all documents (default 70%, configurable via the KB_BULK_SAFETY_PERCENT environment variable), the request is rejected with a 409 Conflict response showing what would be affected. To proceed, the caller must resend the request with force: true.
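Here is a minimal sketch of the threshold check. The KB_BULK_SAFETY_PERCENT variable and the 70% default come from the proposal. The function names and the fields in the rejection body are hypothetical. Note the comparison: "more than N%" means that an operation touching exactly N% still goes through.

```python
import os
from typing import Mapping, Optional

DEFAULT_SAFETY_PERCENT = 70.0


def safety_percent(env: Mapping[str, str]) -> float:
    """Threshold from KB_BULK_SAFETY_PERCENT, or the 70% default if unset."""
    raw = env.get("KB_BULK_SAFETY_PERCENT", "").strip()
    return float(raw) if raw else DEFAULT_SAFETY_PERCENT


def check_safety(matched: int, total: int, force: bool,
                 env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return a rejection body (to send with HTTP 409) or None if allowed."""
    if force or total == 0:
        return None  # caller forced it, or there is nothing to protect
    percent = 100.0 * matched / total
    limit = safety_percent(env)
    if percent <= limit:
        return None  # exactly N% is not "more than N%"
    # Hypothetical body shape; the proposal only says it shows what would be affected.
    return {
        "matched": matched,
        "total": total,
        "percent": percent,
        "limit_percent": limit,
        "message": "resend with force: true to proceed",
    }
```

For example, 750 matches out of 1,000 documents (75%) are rejected under the default, but pass if KB_BULK_SAFETY_PERCENT is set to 80 or if force is true.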

Response model: the operation executes synchronously and returns a summary response. It is also logged to the jobs table as an audit trail. Example response:

{
  "job_id": 42,
  "status": "done",
  "matched": 750,
  "succeeded": 748,
  "failed": 2,
  "errors": [
    {"document_id": 42, "error": "file locked"},
    {"document_id": 99, "error": "not found"}
  ]
}
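The summary response could be assembled as in the sketch below. It assumes that a failure on one document is recorded and does not abort the rest, which is consistent with the example above reporting "done" alongside 2 failures. The function name and the per-document apply callback are hypothetical. Writing the jobs-table row is left out, so the job_id is simply passed in.

```python
from typing import Callable, Iterable


def run_bulk(job_id: int, document_ids: Iterable[int],
             apply: Callable[[int], None]) -> dict:
    """Apply `apply` to each matched document and build the summary response."""
    ids = list(document_ids)
    errors = []
    for doc_id in ids:
        try:
            apply(doc_id)
        except Exception as exc:  # sketch only: real code would catch narrower errors
            errors.append({"document_id": doc_id, "error": str(exc)})
    return {
        "job_id": job_id,
        "status": "done",
        "matched": len(ids),
        "succeeded": len(ids) - len(errors),
        "failed": len(errors),
        "errors": errors,
    }
```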

3. Add bulk MCP tools

Expose the bulk engine endpoints as MCP tools:

  • kb_bulk_delete — bulk delete with filter selection
  • kb_bulk_tags — bulk add/remove tags with filter selection
  • kb_bulk_set_tags — bulk replace tags with filter selection

These are thin wrappers around the engine bulk endpoints — no collection translation, no special logic.
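Because the tools are thin wrappers, most of the work sits in the bulk methods added to the engine client (mcp/engine.py), and each MCP tool would forward to one of them. The sketch below assumes an injected transport function that POSTs JSON and returns the decoded response. The class and method names are hypothetical. So are the request-body fields "add", "remove", and "new_tags": "add" and "remove" mirror the CLI flags, and "new_tags" is kept distinct from the "tags" selection filter.

```python
from typing import Any, Callable

# Assumed transport: POST a JSON body to an engine path, return decoded JSON.
PostJSON = Callable[[str, dict], dict]


class BulkClient:
    """Hypothetical bulk methods for the MCP engine client."""

    def __init__(self, post: PostJSON) -> None:
        self._post = post

    @staticmethod
    def _selection(document_ids=None, tags=None, doc_type=None,
                   from_id=None, to_id=None) -> dict:
        # Forward only the criteria the caller actually set; unknown keys raise TypeError.
        fields = {"document_ids": document_ids, "tags": tags, "doc_type": doc_type,
                  "from_id": from_id, "to_id": to_id}
        return {k: v for k, v in fields.items() if v is not None}

    def bulk_delete(self, *, force: bool = False, **selection: Any) -> dict:
        body = {**self._selection(**selection), "force": force}
        return self._post("/api/v1/bulk/delete", body)

    def bulk_tags(self, *, add=(), remove=(), force: bool = False, **selection: Any) -> dict:
        body = {**self._selection(**selection), "add": list(add),
                "remove": list(remove), "force": force}
        return self._post("/api/v1/bulk/tags", body)

    def bulk_set_tags(self, new_tags, *, force: bool = False, **selection: Any) -> dict:
        body = {**self._selection(**selection), "new_tags": list(new_tags), "force": force}
        return self._post("/api/v1/bulk/set-tags", body)
```

A 409 from the safety check would pass through unchanged, leaving the model to decide whether to retry with force=True.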

4. Add bulk CLI commands

  • kb bulk-remove — bulk delete with --tags, --type, --ids, --from-id, --to-id, --force flags
  • kb bulk-tag — bulk tag/untag with --add, --remove, and the same filter flags
  • kb bulk-set-tags — bulk replace tags with --tags (new tags) and the same filter flags

All three commands show a confirmation prompt with the match count before executing, unless --yes is passed.
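The real commands are Go files under client/cmd/. The sketch below uses Python only for consistency with the other examples, and it shows how kb bulk-tag's flags could map onto the request body and the confirmation step. Comma-separated flag values, the body field names, and the prompt wording are assumptions. The proposal does not say where the CLI obtains the match count, so confirm() simply takes it as an argument.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the proposal; comma-separated values are an assumption.
    p = argparse.ArgumentParser(prog="kb bulk-tag")
    p.add_argument("--add", default="")
    p.add_argument("--remove", default="")
    p.add_argument("--tags", default="")
    p.add_argument("--type", dest="doc_type")
    p.add_argument("--ids", default="")
    p.add_argument("--from-id", type=int)
    p.add_argument("--to-id", type=int)
    p.add_argument("--force", action="store_true")
    p.add_argument("--yes", action="store_true")
    return p


def split_csv(value: str) -> list[str]:
    return [part.strip() for part in value.split(",") if part.strip()]


def to_request_body(ns: argparse.Namespace) -> dict:
    selection = {
        "document_ids": [int(x) for x in split_csv(ns.ids)] or None,
        "tags": split_csv(ns.tags) or None,
        "doc_type": ns.doc_type,
        "from_id": ns.from_id,
        "to_id": ns.to_id,
    }
    body = {k: v for k, v in selection.items() if v is not None}
    body.update({"add": split_csv(ns.add), "remove": split_csv(ns.remove), "force": ns.force})
    return body


def confirm(matched: int, assume_yes: bool, ask: Callable[[str], str] = input) -> bool:
    """Show the match count and ask before executing, unless --yes was given."""
    if assume_yes:
        return True
    answer = ask(f"This will modify {matched} document(s). Continue? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```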

Capabilities

New Capabilities

  • bulk-operations: Engine endpoints, MCP tools, and CLI commands for bulk delete, tag, and set-tags operations with filter-based selection and safety threshold.

Modified Capabilities

  • mcp-document-management: Remove kb_set_collection tool. Remove collection parameter from all tools.

Removed Capabilities

  • mcp-collections: The collection abstraction (collection helpers, collection parameters, collection tag stripping) is removed from the MCP server entirely.

Impact

  • Engine API (engine/kb/routes/): New bulk.py route module with 3 endpoints. New bulk job type in jobs table.
  • Engine database (engine/kb/database.py): Helper functions for bulk selection queries and bulk delete/tag operations.
  • MCP server (mcp/server.py): Remove ~70 lines of collection logic. Add 3 bulk tool definitions. Remove collection param from kb_search, kb_addnote, kb_upload_start. Remove kb_set_collection.
  • MCP engine client (mcp/engine.py): Add bulk operation methods. Remove code that is no longer needed.
  • CLI (client/cmd/): New bulk_remove.go, bulk_tag.go, bulk_set_tags.go command files.
  • CLI API client (client/internal/api/): Add a Post method with JSON request-body support, if not already present.
  • Breaking changes: kb_set_collection MCP tool removed. collection parameter removed from kb_search, kb_addnote, kb_upload_start MCP tools. Any MCP clients using collections will need to switch to tags.