- Add bulk delete, bulk tags, and bulk set-tags engine endpoints (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags)
- Filter-based selection: by tags, doc_type, ID list, ID range
- Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents accidental mass operations unless force=true
- Synchronous execution with audit trail via jobs table
- Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools
- Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands
- Remove collection abstraction from MCP server (use tags instead)
- Remove kb_set_collection MCP tool
- Update SKILL.md, MCP.md, README.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ADDED Requirements
Requirement: Common selection filter
All bulk engine endpoints SHALL accept a JSON body with the following optional selection fields, combined with AND logic:
- document_ids (list of int): match documents with these specific IDs
- tags (list of str): match documents that have ALL specified tags
- doc_type (str): match documents with this document type
- from_id (int): match documents with id >= this value
- to_id (int): match documents with id <= this value
At least one selection field MUST be present. If no selection fields are provided, the endpoint SHALL return 400 Bad Request.
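One way to read the AND-combined filter is as a SQL WHERE clause built from whichever fields are present. The following is a minimal sketch, not the engine's code: the table alias d, the document_tags(document_id, tag) layout, and the NoSelectionError name are assumptions made for illustration.

```python
from typing import Any


class NoSelectionError(ValueError):
    """No usable selection field was supplied; the endpoint would answer 400."""


def build_where(body: dict[str, Any]) -> tuple[str, list[Any]]:
    """Return a WHERE clause and its parameters for a selection filter."""
    clauses: list[str] = []
    params: list[Any] = []

    ids = body.get("document_ids")
    if ids is not None:
        if ids:
            marks = ", ".join("?" for _ in ids)
            clauses.append(f"d.id IN ({marks})")
            params.extend(ids)
        else:
            # Assumption: an explicitly empty ID list selects nothing.
            clauses.append("1 = 0")

    # One EXISTS per tag, so a document must carry ALL listed tags.
    # Hypothetical schema: document_tags(document_id, tag).
    for tag in body.get("tags") or []:
        clauses.append(
            "EXISTS (SELECT 1 FROM document_tags dt "
            "WHERE dt.document_id = d.id AND dt.tag = ?)"
        )
        params.append(tag)

    if body.get("doc_type") is not None:
        clauses.append("d.doc_type = ?")
        params.append(body["doc_type"])
    if body.get("from_id") is not None:
        clauses.append("d.id >= ?")
        params.append(body["from_id"])
    if body.get("to_id") is not None:
        clauses.append("d.id <= ?")
        params.append(body["to_id"])

    # Keys such as "force" are not selection fields and are ignored here.
    if not clauses:
        raise NoSelectionError("at least one selection field is required")
    return " AND ".join(clauses), params
```

Values are always passed as bound parameters, so tag names and document types never end up interpolated into the SQL text.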
Scenario: Filter by tags and doc_type
- WHEN a bulk endpoint receives {"tags": ["draft"], "doc_type": "note"}
- THEN it SHALL match only documents that have the tag "draft" AND have doc_type "note"
Scenario: Filter by ID range
- WHEN a bulk endpoint receives {"from_id": 10, "to_id": 50}
- THEN it SHALL match documents with id >= 10 AND id <= 50
Scenario: Filter by explicit IDs
- WHEN a bulk endpoint receives {"document_ids": [1, 5, 12]}
- THEN it SHALL match only documents with those specific IDs
Scenario: Combined filters
- WHEN a bulk endpoint receives {"tags": ["agent:mybot"], "doc_type": "note", "from_id": 100}
- THEN it SHALL match documents satisfying ALL three criteria
Scenario: No selection fields provided
- WHEN a bulk endpoint receives {} or {"force": true} with no selection fields
- THEN it SHALL return 400 Bad Request
Requirement: Safety threshold
All bulk endpoints SHALL enforce a safety threshold. Before executing, the engine SHALL count the matched documents and the total documents in the database. If matched / total * 100 exceeds the configured threshold, and the check is neither disabled nor overridden with force, the request SHALL be rejected with 409 Conflict.
The 409 response body SHALL include: error ("safety_threshold_exceeded"), message (human-readable), matched (int), total (int), percent (float), and threshold (int).
The threshold SHALL default to 70 and be configurable via the KB_BULK_SAFETY_PERCENT environment variable (integer 0-100). A value of 0 disables the check.
The caller MAY override the threshold by including "force": true in the request body.
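The rule above amounts to a small pure function. This is a sketch under stated assumptions: the function name is hypothetical, and returning None when the database is empty (total == 0) is a guess, since the requirement does not cover that case.

```python
from typing import Any, Optional


def check_safety(
    matched: int, total: int, threshold: int = 70, force: bool = False
) -> Optional[dict[str, Any]]:
    """Return the 409 body if the request must be rejected, else None."""
    # force overrides the check; a threshold of 0 disables it.
    # Assumption: an empty database (total == 0) is never rejected.
    if force or threshold == 0 or total == 0:
        return None

    percent = matched / total * 100
    if percent <= threshold:
        return None

    return {
        "error": "safety_threshold_exceeded",
        "message": (
            f"Operation matches {matched} of {total} documents "
            f"({percent:.1f}%), above the {threshold}% safety threshold. "
            "Pass force=true to proceed."
        ),
        "matched": matched,
        "total": total,
        "percent": percent,
        "threshold": threshold,
    }
```

Because the comparison is strict, a match of exactly 70% with the default threshold proceeds; only a larger share is rejected.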
Scenario: Threshold exceeded
- GIVEN 1000 total documents and KB_BULK_SAFETY_PERCENT is 70
- WHEN a bulk endpoint matches 750 documents (75%) without force: true
- THEN it SHALL return 409 with matched: 750, total: 1000, percent: 75.0, threshold: 70
Scenario: Threshold not exceeded
- GIVEN 1000 total documents and KB_BULK_SAFETY_PERCENT is 70
- WHEN a bulk endpoint matches 500 documents (50%) without force: true
- THEN the operation SHALL proceed normally
Scenario: Force override
- GIVEN 1000 total documents and a match of 900 (90%)
- WHEN the request includes "force": true
- THEN the operation SHALL proceed regardless of threshold
Scenario: Zero threshold
- GIVEN KB_BULK_SAFETY_PERCENT is 0
- THEN the safety check SHALL be effectively disabled for all operations
Requirement: Synchronous response with audit log
All bulk endpoints SHALL execute synchronously and return a JSON response with:
- job_id (int): ID of the audit log entry in the jobs table
- status (str): "done" or "partial_failure"
- matched (int): number of documents that matched the selection
- succeeded (int): number of documents successfully processed
- failed (int): number of documents that failed
- errors (list): array of {"document_id": int, "error": str} for each failure (empty on full success)
A job record SHALL be created in the jobs table with job_type set to the operation type. The filename field SHALL store a JSON representation of the selection filter. The error field SHALL store a JSON array of individual errors if any occurred.
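The response and the audit row can both be derived from the same three inputs: the selection filter, the match count, and the per-document errors. A minimal sketch follows; the helper names are hypothetical, and storing NULL in the error field when nothing failed is an assumption.

```python
import json
from typing import Any, Optional


def build_response(
    job_id: int, matched: int, errors: list[dict[str, Any]]
) -> dict[str, Any]:
    """Assemble the synchronous JSON response for a bulk operation."""
    failed = len(errors)
    return {
        "job_id": job_id,
        "status": "partial_failure" if failed else "done",
        "matched": matched,
        "succeeded": matched - failed,
        "failed": failed,
        "errors": errors,
    }


def build_job_row(
    job_type: str, selection: dict[str, Any], errors: list[dict[str, Any]]
) -> dict[str, Optional[str]]:
    """Assemble the jobs-table values that serve as the audit entry."""
    return {
        "job_type": job_type,
        # The filename column is reused to hold the selection filter as JSON.
        "filename": json.dumps(selection, sort_keys=True),
        # Assumption: no errors means the column stays NULL.
        "error": json.dumps(errors) if errors else None,
    }
```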
Scenario: Full success
- WHEN a bulk operation matches 50 documents and all succeed
- THEN the response SHALL have status: "done", matched: 50, succeeded: 50, failed: 0, errors: []
Scenario: Partial failure
- WHEN a bulk operation matches 50 documents but 2 fail
- THEN the response SHALL have status: "partial_failure", matched: 50, succeeded: 48, failed: 2, and errors listing the 2 failures
Requirement: Bulk delete endpoint
The engine SHALL expose POST /api/v1/bulk/delete which permanently deletes all documents matching the selection filter. For each matched document, it SHALL delete embeddings from chunks_vec, delete the document row (cascading to chunks and document_tags), and delete any stored file from disk.
Database deletions SHALL be performed within a single transaction. File deletions SHALL occur after the transaction commits and SHALL be best-effort (failures logged but not counted as document failures).
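The ordering described above (embeddings, then the document row with its cascade, then files after commit) might look like the sketch below. It assumes SQLite via the standard sqlite3 module and a hypothetical minimal schema; in particular, chunks_vec is modelled as a plain table and stored_path is an invented column name.

```python
import logging
import sqlite3
from pathlib import Path

log = logging.getLogger(__name__)

# Hypothetical schema for illustration only. The cascade requires
# "PRAGMA foreign_keys = ON" on the connection.
SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, stored_path TEXT);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE
);
CREATE TABLE chunks_vec (chunk_id INTEGER PRIMARY KEY, embedding BLOB);
"""


def bulk_delete(conn: sqlite3.Connection, doc_ids: list[int]) -> int:
    """Delete documents in one transaction, then remove files best-effort."""
    if not doc_ids:
        return 0
    marks = ", ".join("?" for _ in doc_ids)

    # Collect file paths first; they are gone once the rows are deleted.
    paths = [
        row[0]
        for row in conn.execute(
            f"SELECT stored_path FROM documents "
            f"WHERE id IN ({marks}) AND stored_path IS NOT NULL",
            doc_ids,
        )
    ]

    with conn:  # commits on success, rolls back if an exception escapes
        # Embeddings are removed explicitly, before the chunks disappear.
        conn.execute(
            f"DELETE FROM chunks_vec WHERE chunk_id IN "
            f"(SELECT id FROM chunks WHERE document_id IN ({marks}))",
            doc_ids,
        )
        deleted = conn.execute(
            f"DELETE FROM documents WHERE id IN ({marks})", doc_ids
        ).rowcount

    # After commit: file removal failures are logged, not counted.
    for path in paths:
        try:
            Path(path).unlink(missing_ok=True)
        except OSError as exc:
            log.warning("could not remove stored file %s: %s", path, exc)
    return deleted
```

Deleting files only after the commit means a rolled-back transaction never leaves documents pointing at files that no longer exist.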
Scenario: Bulk delete by tag
- WHEN POST /api/v1/bulk/delete receives {"tags": ["old", "draft"]}
- THEN all documents with both tags "old" and "draft" SHALL be deleted
- AND their chunks, embeddings, tag associations, and stored files SHALL be removed
Scenario: Bulk delete with no matches
- WHEN POST /api/v1/bulk/delete receives a filter that matches 0 documents
- THEN the response SHALL have matched: 0, succeeded: 0, failed: 0
Requirement: Bulk tags endpoint
The engine SHALL expose POST /api/v1/bulk/tags which adds and/or removes tags on all documents matching the selection filter. The request body SHALL include the selection filter plus:
- add (list of str, optional): tags to add
- remove (list of str, optional): tags to remove
At least one of add or remove MUST be present. The endpoint SHALL return 400 if neither is provided.
The endpoint SHALL update updated_at on all affected documents.
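Per document, the tag change reduces to set arithmetic. The sketch below uses a hypothetical helper name and makes one assumption the requirement leaves open: removals are applied before additions, so a tag listed in both add and remove ends up present.

```python
from typing import Iterable, Optional


def apply_tag_changes(
    current: set[str],
    add: Optional[Iterable[str]] = None,
    remove: Optional[Iterable[str]] = None,
) -> set[str]:
    """Return the tag set of one document after a bulk tags request."""
    if add is None and remove is None:
        # The endpoint would answer 400 in this case.
        raise ValueError("at least one of 'add' or 'remove' is required")
    # Assumption: remove first, then add.
    return (current - set(remove or ())) | set(add or ())
```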
Scenario: Add and remove tags in one call
- WHEN POST /api/v1/bulk/tags receives {"tags": ["agent:mybot"], "add": ["reviewed"], "remove": ["pending"]}
- THEN all documents tagged "agent:mybot" SHALL have "reviewed" added and "pending" removed
Requirement: Bulk set-tags endpoint
The engine SHALL expose POST /api/v1/bulk/set-tags which replaces all tags on matched documents with a new set. The request body SHALL include the selection filter plus:
- new_tags (list of str): the replacement tag set
The endpoint SHALL remove all existing tag associations from matched documents, then apply the new set. It SHALL update updated_at on all affected documents.
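The remove-then-apply sequence can be sketched against SQLite as follows. The schema is hypothetical (document_tags(document_id, tag) and a text updated_at column), as is the function name; running all three statements in one transaction is a design choice of this sketch rather than something the requirement states.

```python
import sqlite3
from datetime import datetime, timezone


def replace_tags(
    conn: sqlite3.Connection, doc_ids: list[int], new_tags: list[str]
) -> None:
    """Replace every tag on the given documents with new_tags."""
    if not doc_ids:
        return
    marks = ", ".join("?" for _ in doc_ids)
    unique_tags = list(dict.fromkeys(new_tags))  # drop duplicates, keep order
    now = datetime.now(timezone.utc).isoformat()

    with conn:  # one transaction: all three steps apply or none do
        conn.execute(
            f"DELETE FROM document_tags WHERE document_id IN ({marks})",
            doc_ids,
        )
        conn.executemany(
            "INSERT INTO document_tags (document_id, tag) VALUES (?, ?)",
            [(doc_id, tag) for doc_id in doc_ids for tag in unique_tags],
        )
        conn.execute(
            f"UPDATE documents SET updated_at = ? WHERE id IN ({marks})",
            [now, *doc_ids],
        )
```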
Scenario: Replace all tags
- WHEN POST /api/v1/bulk/set-tags receives {"doc_type": "note", "new_tags": ["clean", "final"]}
- THEN all notes SHALL have their existing tags removed and replaced with "clean" and "final"
Requirement: Jobs table extension
The jobs table SHALL be extended with a job_type column (TEXT, default "ingest") to distinguish ingestion jobs from bulk operation audit entries. Valid values: "ingest", "bulk_delete", "bulk_tags", "bulk_set_tags".
Existing jobs SHALL default to job_type = "ingest". The existing jobs list endpoint and CLI kb jobs command SHALL continue to work unchanged.
Scenario: Migration adds column
- GIVEN an existing database without the job_type column
- WHEN the engine starts
- THEN the column SHALL be added with default value "ingest"
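An idempotent startup migration for this column could be as small as the following sketch. It assumes the database is SQLite accessed through the standard sqlite3 module; the function name is hypothetical.

```python
import sqlite3


def ensure_job_type_column(conn: sqlite3.Connection) -> bool:
    """Add jobs.job_type if it is missing. Returns True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "job_type" in columns:
        return False
    # Existing rows read back the default, so old jobs become "ingest".
    conn.execute("ALTER TABLE jobs ADD COLUMN job_type TEXT DEFAULT 'ingest'")
    conn.commit()
    return True
```

Checking for the column first makes the migration safe to run on every engine start.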
Requirement: Engine config for safety threshold
The engine Config class SHALL read KB_BULK_SAFETY_PERCENT from the environment as an integer (default 70, range 0-100). This value SHALL be used as the default safety threshold for all bulk endpoints.
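Reading and validating the variable might look like this sketch. The function name is hypothetical, and raising ValueError on a non-integer or out-of-range value is an assumption; the requirement only states the default and the range.

```python
import os
from typing import Mapping


def read_bulk_safety_percent(env: Mapping[str, str] = os.environ) -> int:
    """Read KB_BULK_SAFETY_PERCENT as an integer in 0..100 (default 70)."""
    raw = env.get("KB_BULK_SAFETY_PERCENT", "70")
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"KB_BULK_SAFETY_PERCENT must be an integer, got {raw!r}"
        ) from None
    # Assumption: out-of-range values are rejected rather than clamped.
    if not 0 <= value <= 100:
        raise ValueError(
            f"KB_BULK_SAFETY_PERCENT must be between 0 and 100, got {value}"
        )
    return value
```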
Requirement: MCP bulk delete tool
The MCP server SHALL expose a kb_bulk_delete tool with parameters: document_ids (optional list of int), tags (optional list of str), doc_type (optional str), from_id (optional int), to_id (optional int), force (optional bool).
The tool SHALL call POST /api/v1/bulk/delete on the engine via the engine client and return the JSON response.
The tool description SHALL clearly state that tags is a selection filter (which documents to delete), not tags to delete.
Scenario: MCP bulk delete by tag
- WHEN kb_bulk_delete(tags=["old"]) is called
- THEN the engine client SHALL send POST /api/v1/bulk/delete with {"tags": ["old"]}
- AND the tool SHALL return the engine's JSON response
Requirement: MCP bulk tags tool
The MCP server SHALL expose a kb_bulk_tags tool with parameters: document_ids, tags, doc_type, from_id, to_id (selection filters), plus add (optional list of str), remove (optional list of str), and force (optional bool).
The tool description SHALL clearly distinguish tags (selection filter) from add/remove (tag changes to apply).
Scenario: MCP bulk tag update
- WHEN kb_bulk_tags(tags=["agent:mybot"], add=["reviewed"], remove=["draft"]) is called
- THEN the engine client SHALL send the appropriate POST /api/v1/bulk/tags request
Requirement: MCP bulk set-tags tool
The MCP server SHALL expose a kb_bulk_set_tags tool with parameters: document_ids, tags, doc_type, from_id, to_id (selection filters), plus new_tags (list of str) and force (optional bool).
Scenario: MCP bulk set tags
- WHEN kb_bulk_set_tags(doc_type="note", new_tags=["clean"]) is called
- THEN the engine client SHALL send POST /api/v1/bulk/set-tags with {"doc_type": "note", "new_tags": ["clean"]}
Requirement: MCP engine client bulk methods
The MCP engine client (mcp/engine.py) SHALL provide three new methods:
- bulk_delete(document_ids?, tags?, doc_type?, from_id?, to_id?, force?) → dict
- bulk_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, add?, remove?, force?) → dict
- bulk_set_tags(document_ids?, tags?, doc_type?, from_id?, to_id?, new_tags?, force?) → dict
Each SHALL send a POST request to the corresponding /api/v1/bulk/* endpoint with the parameters as a JSON body. Each SHALL raise on non-2xx status codes, consistent with existing methods.
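The three methods share one shape: drop the parameters that were not supplied, POST the rest as JSON, and raise on an error status. The sketch below illustrates that shape with the standard library only. It does not reflect the HTTP library or class layout actually used in mcp/engine.py, and the names bulk_payload, post_bulk, and base_url are hypothetical.

```python
import json
import urllib.request
from typing import Any


def bulk_payload(**params: Any) -> dict[str, Any]:
    """Keep only the parameters the caller actually supplied."""
    return {key: value for key, value in params.items() if value is not None}


def post_bulk(base_url: str, operation: str, **params: Any) -> dict[str, Any]:
    """POST to /api/v1/bulk/<operation> and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/bulk/{operation}",
        data=json.dumps(bulk_payload(**params)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    # which includes the 400 and 409 cases of the bulk endpoints.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bulk_delete(base_url: str, **params: Any) -> dict[str, Any]:
    return post_bulk(base_url, "delete", **params)


def bulk_tags(base_url: str, **params: Any) -> dict[str, Any]:
    return post_bulk(base_url, "tags", **params)


def bulk_set_tags(base_url: str, **params: Any) -> dict[str, Any]:
    return post_bulk(base_url, "set-tags", **params)
```

Omitting unset parameters matters because the engine treats the presence of a selection field as meaningful; sending "tags": null alongside a real filter would blur that distinction.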
Requirement: CLI bulk-remove command
The CLI SHALL expose a kb bulk-remove command with flags: --tags (comma-separated), --type, --ids (comma-separated), --from-id, --to-id, --force/-f, --yes/-y.
Without --yes, the CLI SHALL first display the match count and ask for interactive confirmation before proceeding.
The command SHALL call POST /api/v1/bulk/delete with the constructed filter.
Scenario: CLI bulk remove with confirmation
- WHEN kb bulk-remove --tags "draft,old" --type note is run without --yes
- THEN the CLI SHALL display "This will delete N documents matching: tags=[draft,old] type=note" and prompt "Proceed? [y/N]"
Scenario: CLI bulk remove with --yes
- WHEN kb bulk-remove --tags "draft" --yes is run
- THEN the CLI SHALL proceed without prompting
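The flag handling and confirmation step can be sketched with argparse. The CLI framework the kb command really uses is not stated here, so argparse and every helper name below are assumptions. The match count is passed in as a parameter because this section does not say how the CLI obtains it before deleting.

```python
import argparse
from typing import Any, Callable


def comma_list(value: str) -> list[str]:
    """Split a comma-separated flag value, ignoring empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def comma_ints(value: str) -> list[int]:
    return [int(item) for item in comma_list(value)]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kb bulk-remove")
    parser.add_argument("--tags", type=comma_list)
    parser.add_argument("--type", dest="doc_type")
    parser.add_argument("--ids", dest="document_ids", type=comma_ints)
    parser.add_argument("--from-id", type=int)
    parser.add_argument("--to-id", type=int)
    parser.add_argument("--force", "-f", action="store_true")
    parser.add_argument("--yes", "-y", action="store_true")
    return parser


def to_request_body(args: argparse.Namespace) -> dict[str, Any]:
    """Build the JSON body for POST /api/v1/bulk/delete from parsed flags."""
    fields = ("tags", "doc_type", "document_ids", "from_id", "to_id")
    body = {
        name: getattr(args, name)
        for name in fields
        if getattr(args, name) is not None
    }
    if args.force:
        body["force"] = True
    return body


def confirm(
    count: int, description: str, yes: bool, ask: Callable[[str], str] = input
) -> bool:
    """Return True if the deletion should go ahead."""
    if yes:
        return True
    print(f"This will delete {count} documents matching: {description}")
    return ask("Proceed? [y/N] ").strip().lower() in ("y", "yes")
```

Anything other than an explicit yes counts as a refusal, matching the [y/N] default in the prompt.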
Requirement: CLI bulk-tag command
The CLI SHALL expose a kb bulk-tag command with the same filter flags as bulk-remove, plus --add and --remove (comma-separated tag lists).
The command SHALL call POST /api/v1/bulk/tags with the constructed filter and tag changes.
Requirement: CLI bulk-set-tags command
The CLI SHALL expose a kb bulk-set-tags command with the filter flags, plus --set (comma-separated list of replacement tags).
The command SHALL call POST /api/v1/bulk/set-tags with the constructed filter and new_tags.