b5a203d2aa

- Add bulk delete, bulk tags, and bulk set-tags engine endpoints (POST /api/v1/bulk/delete, /bulk/tags, /bulk/set-tags)
- Filter-based selection: by tags, doc_type, ID list, ID range
- Safety threshold (KB_BULK_SAFETY_PERCENT, default 70%) prevents accidental mass operations unless force=true
- Synchronous execution with audit trail via jobs table
- Add kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags MCP tools
- Add kb bulk-remove, bulk-tag, bulk-set-tags CLI commands
- Remove collection abstraction from MCP server (use tags instead)
- Remove kb_set_collection MCP tool
- Update SKILL.md, MCP.md, README.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
224 lines
8.0 KiB
Markdown
# kb-search skill

Search, manage, and add to the user's personal knowledge base containing PDFs, Word docs, HTML, markdown, code files, and text notes.

## When to use

- User asks a question that might be answered by their stored documents, notes, or code
- User explicitly says "check my notes", "search kb", "look in my knowledge base", "what do my docs say about..."
- User references documents or notes they've previously stored
- User asks "how do I..." style questions that their knowledge base likely covers
- User wants to save a note, add a file, or manage their knowledge base

## Adding notes

```bash
kb addnote "remember to update DNS records" # add a note
kb addnote "server room is building 3, floor 2" --tags ops # add a tagged note
```

The note text must be a single quoted argument.
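
Why the quoting matters: the shell splits unquoted words into separate arguments. This sketch uses Python's standard `shlex` module, which follows POSIX shell splitting rules, to show the difference, plus the list-of-arguments form an agent can hand to `subprocess` to avoid shell quoting altogether. It only illustrates argument handling; it does not call `kb`.

```python
import shlex

# Unquoted: POSIX word splitting turns the note into five separate arguments.
unquoted = shlex.split("kb addnote remember to update DNS records")
# Quoted: the whole note arrives as one argument, which is what kb addnote expects.
quoted = shlex.split('kb addnote "remember to update DNS records"')

print(unquoted[2:])  # ['remember', 'to', 'update', 'DNS', 'records']
print(quoted[2:])    # ['remember to update DNS records']

# From a program, build argv as a list (e.g. for subprocess.run) so no shell
# quoting is involved; shlex.join shows the equivalent, safely quoted command.
argv = ["kb", "addnote", "server room is building 3, floor 2", "--tags", "ops"]
print(shlex.join(argv))  # kb addnote 'server room is building 3, floor 2' --tags ops
```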

## Search (primary use case)

```bash
kb search "<query>" --top 10 --format json
```

Returns JSON with ranked results combining full-text and semantic search.

**Flags:**
- `-n, --top N` — number of results (default: 10)
- `--tags tag1,tag2` — filter by tags (AND logic)
- `--type pdf|markdown|code|note` — filter by document type
- `--format json|human` — output format (always use json for parsing)
- `--fts-only` — keyword search only (skip semantic)
- `--vec-only` — semantic search only (skip keyword)
- `--threshold FLOAT` — minimum score cutoff
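
For agents that drive the CLI from code, here is a minimal Python sketch that assembles a `kb search` call from the flags above and parses the JSON output. `build_search_argv` and `run_search` are hypothetical helper names, and refusing to combine `--fts-only` with `--vec-only` is an assumption (the two flags contradict each other), not documented kb behavior. `run_search` needs the `kb` CLI on `PATH` and a reachable engine.

```python
import json
import subprocess

def build_search_argv(query, top=10, tags=None, doc_type=None,
                      fts_only=False, vec_only=False, threshold=None):
    """Assemble a `kb search` command line from the flags documented above."""
    if fts_only and vec_only:
        # Assumption: keyword-only and semantic-only together make no sense.
        raise ValueError("pass at most one of fts_only and vec_only")
    argv = ["kb", "search", query, "--top", str(top), "--format", "json"]
    if tags:
        argv += ["--tags", ",".join(tags)]   # documents must match ALL tags
    if doc_type:
        argv += ["--type", doc_type]
    if fts_only:
        argv.append("--fts-only")
    if vec_only:
        argv.append("--vec-only")
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    return argv

def run_search(query, **options):
    """Run the kb CLI (must be on PATH) and return its parsed JSON output."""
    proc = subprocess.run(build_search_argv(query, **options),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

# Usage (needs a running kb engine):
# response = run_search("git rebase vs merge", top=5, tags=["git"])
```

Passing argv as a list means the query never goes through a shell, so quotes or spaces inside it need no escaping.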

## Adding files

```bash
kb addfile report.pdf # single file
kb addfile report.pdf --tags admin,reference # with tags
kb addfile ~/docs/ --recursive # directory (recursive)
kb addfile ~/docs/ --recursive --tags reference # directory with tags
```

Supported file types: `.pdf`, `.docx`, `.html`, `.md`, `.txt`, `.py`, `.sh`, `.go`. Unsupported extensions are rejected before upload.

**Flags:**
- `--tags tag1,tag2` — tags (comma-separated)
- `-r, --recursive` — recursively add directory contents
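
A client-side preflight that mirrors the supported-types rule can save a round of rejected uploads when passing many paths to `kb addfile`. In this sketch the extension set is copied from the list above; comparing extensions case-insensitively is an assumption, since this skill does not say how `.PDF` is handled. `split_supported` is a hypothetical helper, not part of kb.

```python
from pathlib import Path

# Copied from the supported-types list above.
SUPPORTED = {".pdf", ".docx", ".html", ".md", ".txt", ".py", ".sh", ".go"}

def split_supported(paths):
    """Partition paths into (accepted, rejected) by file extension.

    Assumption: extensions are compared case-insensitively.
    """
    accepted, rejected = [], []
    for path in map(Path, paths):
        bucket = accepted if path.suffix.lower() in SUPPORTED else rejected
        bucket.append(path)
    return accepted, rejected

ok, skipped = split_supported(["report.pdf", "notes.MD", "diagram.png", "Makefile"])
print([p.name for p in ok])       # ['report.pdf', 'notes.MD']
print([p.name for p in skipped])  # ['diagram.png', 'Makefile']
```

`Path.suffix` only looks at the last extension, so a file such as `archive.tar.gz` is judged by `.gz` and a file with no extension lands in the rejected list.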

## Document management

```bash
kb list --format json # list all documents
kb list --type pdf --format json # filter by type
kb list --tags admin --format json # filter by tags
kb info <doc_id> --format json # document details with chunks
kb export <doc_id> -o file.pdf # download original file
kb remove <doc_id> # remove (prompts for confirmation)
kb remove <doc_id> --yes # remove without confirmation
```

## Tag management

```bash
kb tags --format json # list all tags with counts
kb tag <doc_id> --add important,ops # add tags to a document
kb tag <doc_id> --remove draft # remove tags from a document
```

## Bulk operations

Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.

**Filter and control flags (shared across all bulk commands):**
- `--tags tag1,tag2` — match documents with ALL specified tags
- `--type pdf|note|...` — match by document type
- `--ids 1,5,12` — match specific document IDs
- `--from-id N` — match documents with id >= N
- `--to-id N` — match documents with id <= N
- `--force` / `-f` — override the safety threshold, which otherwise blocks any operation that would affect more than 70% of all documents
- `--yes` / `-y` — skip confirmation prompt
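
The selection rules above amount to a single predicate: a document is selected only if it passes every filter that was supplied. This Python sketch models that semantics with a hypothetical `Doc` record; it is an illustration of the documented AND logic, not the engine's implementation. With no filters the sketch matches every document; how the engine treats an empty filter is not documented here.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a stored document."""
    id: int
    doc_type: str
    tags: set = field(default_factory=set)

def matches(doc, tags=None, doc_type=None, ids=None, from_id=None, to_id=None):
    """True if doc passes every filter that was supplied (AND logic)."""
    if tags and not set(tags) <= doc.tags:          # --tags: must carry ALL tags
        return False
    if doc_type is not None and doc.doc_type != doc_type:  # --type
        return False
    if ids is not None and doc.id not in ids:       # --ids
        return False
    if from_id is not None and doc.id < from_id:    # --from-id (inclusive)
        return False
    if to_id is not None and doc.id > to_id:        # --to-id (inclusive)
        return False
    return True

docs = [
    Doc(3, "note", {"draft", "old"}),
    Doc(7, "note", {"draft"}),
    Doc(12, "pdf", {"draft", "old"}),
]
# Same selection as: kb bulk-remove --tags "draft,old" --type note
print([d.id for d in docs if matches(d, tags=["draft", "old"], doc_type="note")])  # [3]
```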

```bash
# Bulk delete
kb bulk-remove --tags "draft,old" --type note --yes # delete matching docs
kb bulk-remove --from-id 10 --to-id 50 --yes # delete by ID range
kb bulk-remove --ids "3,7,12" --yes # delete specific IDs

# Bulk tag add/remove
kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
kb bulk-tag --type note --add "archived" --yes # tag all notes

# Bulk replace tags
kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes
```

All bulk commands return a summary: matched count, succeeded count, failed count, and errors.
A safety threshold, set by `KB_BULK_SAFETY_PERCENT` (default 70%), blocks any bulk operation that would affect more than that share of all documents unless `--force` is used.
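
The threshold check reduces to one comparison. A minimal sketch, assuming the rule is exactly "more than N% of all documents" as described above, with N defaulting to 70; the function names are hypothetical and this is not the engine's code.

```python
def exceeds_safety_threshold(matched, total, percent=70):
    """True if a bulk operation would touch more than `percent`% of all documents."""
    if total <= 0:
        return False
    # Integer comparison avoids float rounding: matched/total > percent/100.
    return matched * 100 > total * percent

def check_bulk(matched, total, force=False, percent=70):
    """Refuse an over-threshold operation unless force is set (like --force)."""
    if exceeds_safety_threshold(matched, total, percent) and not force:
        raise ValueError(
            f"{matched} of {total} documents matched (over {percent}%); "
            "re-run with --force to proceed"
        )

print(exceeds_safety_threshold(70, 100))  # False: exactly 70% is still allowed
print(exceeds_safety_threshold(71, 100))  # True
check_bulk(71, 100, force=True)           # no error when forced
```

Because the rule is "more than", a selection of exactly 70% of the documents goes through without `--force`.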

## Jobs (ingestion queue)

```bash
kb jobs --format json # list recent jobs
kb jobs --status failed --format json # filter by status
kb jobs <job_id> --format json # job details
```

## Examples

```bash
kb examples # show common usage examples
```

## Engine status and maintenance

```bash
kb status --format json # engine status, GPU info, DB stats
kb reindex --yes # re-embed all chunks (skip confirmation)
```

## Global flags

All commands support:
- `--format json|human` — output format (always use `json` for machine parsing)
- `--engine <url>` — engine API URL (default: http://localhost:8000)
- `--api-key <key>` — API key for authentication

## Search output format

```json
{
  "query": "how to install git",
  "results": [
    {
      "chunk_id": 1423,
      "score": 0.031,
      "text": "To install the latest version of git from source...",
      "chunk_index": 3,
      "chunk_metadata": {"page": 12},
      "title": "Git Admin Guide",
      "doc_type": "pdf",
      "source_path": "/home/user/docs/git-admin.pdf",
      "created_at": "2026-03-15T10:30:00",
      "tags": ["git", "admin"]
    }
  ],
  "total_matches": 47,
  "returned": 10
}
```
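
A minimal parser for the shape above, assuming only the fields shown in the example. It lifts the optional `chunk_metadata` keys (`page` for PDFs, `section_header` for markdown with headers, per the notes at the end of this skill) into explicit optional fields. `Hit` and `parse_search` are hypothetical names, and the sample string is a trimmed copy of the example with one result.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hit:
    chunk_id: int
    score: float
    text: str
    title: str
    doc_type: str
    tags: List[str]
    page: Optional[int] = None            # chunk_metadata.page (PDFs only)
    section_header: Optional[str] = None  # chunk_metadata.section_header (markdown)

def parse_search(raw: str) -> List[Hit]:
    """Turn `kb search --format json` output into Hit records, keeping rank order."""
    data = json.loads(raw)
    hits = []
    for r in data.get("results", []):
        meta = r.get("chunk_metadata") or {}
        hits.append(Hit(
            chunk_id=r["chunk_id"],
            score=r["score"],
            text=r["text"],
            title=r["title"],
            doc_type=r["doc_type"],
            tags=r.get("tags", []),
            page=meta.get("page"),
            section_header=meta.get("section_header"),
        ))
    return hits

sample = '''{"query": "how to install git", "results": [{"chunk_id": 1423,
  "score": 0.031, "text": "To install the latest version of git from source...",
  "chunk_index": 3, "chunk_metadata": {"page": 12}, "title": "Git Admin Guide",
  "doc_type": "pdf", "tags": ["git", "admin"]}], "total_matches": 47, "returned": 1}'''
hits = parse_search(sample)
print(hits[0].title, hits[0].page)  # Git Admin Guide 12
```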

## How to answer search queries

1. Run `kb search "<query>" --top 10 --format json`
2. Read the returned chunks
3. Synthesise a natural language answer from the top results
4. **ALWAYS cite sources**: "According to [title] (p.X)..." or "From [title], section [header]..."
5. If results have low scores (all below 0.01) or `returned: 0`, tell the user: "I couldn't find anything in your knowledge base about this"
6. If initial results seem off-target, try refining the query and searching again
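
Steps 4 and 5 can be applied mechanically to the parsed JSON. A sketch under these assumptions: the 0.01 cutoff comes from step 5, the citation wording follows step 4, and `nothing_found` and `cite` are hypothetical helper names operating on plain dictionaries from `json.loads`.

```python
LOW_SCORE = 0.01  # cutoff from step 5

def nothing_found(response: dict) -> bool:
    """Step 5: no results returned, or every result scored below LOW_SCORE."""
    results = response.get("results", [])
    if response.get("returned") == 0 or not results:
        return True
    return all(r["score"] < LOW_SCORE for r in results)

def cite(result: dict) -> str:
    """Step 4: citation prefix using the page (PDFs) or section header (markdown)."""
    meta = result.get("chunk_metadata") or {}
    if "page" in meta:
        return f"According to {result['title']} (p.{meta['page']})"
    if "section_header" in meta:
        return f"From {result['title']}, section {meta['section_header']}"
    return f"According to {result['title']}"

hit = {"score": 0.031, "title": "Git Admin Guide", "chunk_metadata": {"page": 12}}
print(nothing_found({"returned": 1, "results": [hit]}))  # False
print(cite(hit))  # According to Git Admin Guide (p.12)
```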

## Multi-query strategy

For complex questions, search multiple times with different queries:

- Decompose the question into sub-queries
- Run each query separately
- Combine and deduplicate results across queries
- Synthesise a unified answer citing all relevant sources

Example:
```
User: "What's the difference between git rebase and merge?"

Query 1: kb search "git rebase explanation" --top 5 --format json
Query 2: kb search "git merge explanation" --top 5 --format json
Query 3: kb search "git rebase vs merge" --top 5 --format json
```
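
The "combine and deduplicate" step can key results on `chunk_id`. A minimal sketch, assuming each query's output has the JSON shape shown under "Search output format"; `merge_results` is a hypothetical helper, and keeping the highest score per chunk is one reasonable choice rather than documented kb behavior. Since scores are relative to their own result set, ordering across queries by score is only a heuristic.

```python
def merge_results(responses, limit=None):
    """Combine several `kb search --format json` responses, one entry per chunk_id.

    Keeps the highest-scoring copy of each chunk, then sorts by that score.
    """
    best = {}
    for response in responses:
        for result in response.get("results", []):
            cid = result["chunk_id"]
            if cid not in best or result["score"] > best[cid]["score"]:
                best[cid] = result
    merged = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return merged if limit is None else merged[:limit]

rebase_resp = {"results": [{"chunk_id": 1, "score": 0.030}, {"chunk_id": 2, "score": 0.020}]}
merge_resp = {"results": [{"chunk_id": 3, "score": 0.025}, {"chunk_id": 2, "score": 0.028}]}
print([r["chunk_id"] for r in merge_results([rebase_resp, merge_resp])])  # [1, 2, 3]
```

Chunk 2 appears in both result sets but is cited once, with the better of its two scores.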

## Filtering tips

Use filters when the question implies a specific domain:

- Code question → `--type code`
- Question about a specific topic → `--tags <topic>`
- Check available tags first: `kb tags --format json`

## Updating notes

```bash
kb updatenote 42 "revised note content" # update note by ID
```

Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.

## MCP server (agent integration)

For agent-to-agent integration, kb provides an MCP server alongside the CLI. The MCP server
exposes core kb operations as native MCP tools over the Streamable HTTP transport, which agents
can connect to directly without subprocess overhead.

**MCP tools:** `kb_search`, `kb_addnote`, `kb_update_note`, `kb_get`, `kb_delete`, `kb_status`,
`kb_jobs`, `kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`, `kb_bulk_delete`,
`kb_bulk_tags`, `kb_bulk_set_tags`.

Use tags to separate agent data from user documents (e.g. tag all agent notes with
`agent:mybot` and filter by that tag when searching). This convention is communicated
via the system prompt; no special server-side enforcement is needed.

If the kb engine is already running via Docker Compose, add the MCP server by deploying the
`kb-mcp` service from the same compose file. Agents connect to it on port 3000 (default).

## Important notes

- Always use `--format json` for machine parsing
- The `score` field is relative, not absolute — compare scores within a result set
- `chunk_metadata.page` is only present for PDF documents
- `chunk_metadata.section_header` is only present for markdown documents with headers
- Results are already ranked by relevance (hybrid FTS + vector search)
- Duplicate files are detected at upload time (HTTP 409) — the client handles this gracefully
|