# kb-search skill

Search, manage, and add to the user's personal knowledge base containing PDFs, Word docs, HTML, markdown, code files, and text notes.

## When to use

- User asks a question that might be answered by their stored documents, notes, or code
- User explicitly says "check my notes", "search kb", "look in my knowledge base", "what do my docs say about..."
- User references documents or notes they've previously stored
- User asks "how do I..." style questions that their knowledge base likely covers
- User wants to save a note, add a file, or manage their knowledge base

## Adding notes

```bash
kb addnote "remember to update DNS records"                  # add a note
kb addnote "server room is building 3, floor 2" --tags ops   # add a tagged note
```

The note text must be a single quoted argument.

## Search (primary use case)

```bash
kb search "" --top 10 --format json
```

Returns JSON with ranked results combining full-text and semantic search.

**Flags:**

- `-n, --top N`: number of results (default: 10)
- `--tags tag1,tag2`: filter by tags (AND logic)
- `--type pdf|markdown|code|note`: filter by document type
- `--format json|human`: output format (always use json for parsing)
- `--fts-only`: keyword search only (skip semantic)
- `--vec-only`: semantic search only (skip keyword)
- `--threshold FLOAT`: minimum score cutoff

## Adding files

```bash
kb addfile report.pdf                              # single file
kb addfile report.pdf --tags admin,reference       # with tags
kb addfile ~/docs/ --recursive                     # directory (recursive)
kb addfile ~/docs/ --recursive --tags reference    # directory with tags
```

Supported file types: `.pdf`, `.docx`, `.html`, `.md`, `.txt`, `.py`, `.sh`, `.go`. Unsupported extensions are rejected before upload.

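When adding many files, it can help to check extensions locally before calling `kb addfile`. The following is a minimal sketch of such a check, built only from the supported-types list above. The function names `is_supported` and `partition_by_support` are hypothetical helpers and not part of kb. Matching extensions case-insensitively is an assumption about how kb treats them.

```python
from pathlib import Path

# Extensions copied from the "Supported file types" list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".html", ".md", ".txt", ".py", ".sh", ".go"}


def is_supported(path):
    """Return True if the file extension appears in the supported list.

    Assumption: extensions are compared case-insensitively.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_by_support(paths):
    """Split paths into (supported, unsupported) lists, preserving order."""
    supported, unsupported = [], []
    for p in paths:
        (supported if is_supported(p) else unsupported).append(p)
    return supported, unsupported
```

Files in the second list can be reported to the user up front instead of being discovered one rejection at a time.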

**Flags:**

- `--tags tag1,tag2`: tags (comma-separated)
- `-r, --recursive`: recursively add directory contents

## Document management

```bash
kb list --format json                  # list all documents
kb list --type pdf --format json       # filter by type
kb list --tags admin --format json     # filter by tags
kb info --format json                  # document details with chunks
kb export -o file.pdf                  # download original file
kb remove                              # remove (prompts for confirmation)
kb remove --yes                        # remove without confirmation
```

## Tag management

```bash
kb tags --format json          # list all tags with counts
kb tag --add important,ops     # add tags to a document
kb tag --remove draft          # remove tags from a document
```

## Bulk operations

Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.

**Filter flags (shared across all bulk commands):**

- `--tags tag1,tag2`: match documents with ALL specified tags
- `--type pdf|note|...`: match by document type
- `--ids 1,5,12`: match specific document IDs
- `--from-id N`: match documents with id >= N
- `--to-id N`: match documents with id <= N
- `--force` / `-f`: override safety threshold (blocks operations affecting >70% of all documents)
- `--yes` / `-y`: skip confirmation prompt

```bash
# Bulk delete
kb bulk-remove --tags "draft,old" --type note --yes    # delete matching docs
kb bulk-remove --from-id 10 --to-id 50 --yes           # delete by ID range
kb bulk-remove --ids "3,7,12" --yes                    # delete specific IDs

# Bulk tag add/remove
kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
kb bulk-tag --type note --add "archived" --yes         # tag all notes

# Bulk replace tags
kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes
```

All bulk commands return a summary: matched count, succeeded count, failed count, and errors. A safety threshold prevents accidentally affecting more than 70% of documents unless `--force` is used.
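To make the safety threshold concrete, here is a small sketch of the comparison it implies. The function `exceeds_safety_threshold` is hypothetical, and how the engine actually performs the check is not specified in this document. The sketch assumes a strict "more than" comparison, as the `>70%` wording suggests, and treats a percentage of 0 as disabling the check, in line with the documented `KB_BULK_SAFETY_PERCENT` setting.

```python
def exceeds_safety_threshold(matched, total, percent=70):
    """Return True if `matched` is more than `percent`% of `total` documents.

    Assumptions: strict comparison; percent == 0 disables the check.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be an integer between 0 and 100")
    if percent == 0 or total == 0:
        # Check disabled, or no documents exist to protect.
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return matched * 100 > percent * total
```

With 100 documents and the default of 70, an operation matching 70 documents would pass under this sketch, while one matching 71 would need `--force`.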

The threshold is configurable on the engine via `KB_BULK_SAFETY_PERCENT` (integer 0-100, default 70; 0 disables).

## Jobs (ingestion queue)

```bash
kb jobs --format json                    # list recent jobs
kb jobs --status failed --format json    # filter by status
kb jobs --format json                    # job details
```

## Examples

```bash
kb examples    # show common usage examples
```

## Engine status and maintenance

```bash
kb status --format json    # engine status, GPU info, DB stats
kb reindex --yes           # re-embed all chunks (skip confirmation)
```

## Global flags

All commands support:

- `--format json|human`: output format (always use `json` for machine parsing)
- `--engine `: engine API URL (default: http://localhost:8000)
- `--api-key `: API key for authentication

## Search output format

```json
{
  "query": "how to install git",
  "results": [
    {
      "chunk_id": 1423,
      "score": 0.031,
      "text": "To install the latest version of git from source...",
      "chunk_index": 3,
      "chunk_metadata": {"page": 12},
      "title": "Git Admin Guide",
      "doc_type": "pdf",
      "source_path": "/home/user/docs/git-admin.pdf",
      "created_at": "2026-03-15T10:30:00",
      "tags": ["git", "admin"]
    }
  ],
  "total_matches": 47,
  "returned": 10
}
```

## How to answer search queries

1. Run `kb search "" --top 10 --format json`
2. Read the returned chunks
3. Synthesise a natural language answer from the top results
4. **ALWAYS cite sources**: "According to [title] (p.X)..." or "From [title], section [header]..."
5. If results have low scores (all below 0.01) or `returned: 0`, tell the user: "I couldn't find anything in your knowledge base about this"
6. If initial results seem off-target, try refining the query and searching again

## Multi-query strategy

For complex questions, search multiple times with different queries:

- Decompose the question into sub-queries
- Run each query separately
- Combine and deduplicate results across queries
- Synthesise a unified answer citing all relevant sources

Example:

```
User: "What's the difference between git rebase and merge?"

Query 1: kb search "git rebase explanation" --top 5 --format json
Query 2: kb search "git merge explanation" --top 5 --format json
Query 3: kb search "git rebase vs merge" --top 5 --format json
```

## Filtering tips

Use filters when the question implies a specific domain:

- Code question → `--type code`
- From a specific topic → `--tags `
- Check available tags first: `kb tags --format json`

## Updating notes

```bash
kb updatenote 42 "revised note content"    # update note by ID
```

Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.

## MCP server (agent integration)

For agent-to-agent integration, kb provides an MCP server alongside the CLI. The MCP server exposes the same operations as native MCP tools over Streamable HTTP transport, which agents can connect to directly without subprocess overhead.

**MCP tools:** `kb_search`, `kb_addnote`, `kb_update_note`, `kb_get`, `kb_delete`, `kb_status`, `kb_jobs`, `kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`, `kb_bulk_delete`, `kb_bulk_tags`, `kb_bulk_set_tags`.

Use tags to separate agent data from user documents (e.g. tag all agent notes with `agent:mybot` and filter by that tag when searching). This convention is communicated via system prompt; no special server-side enforcement is needed.

If the kb engine is already running via Docker Compose, add the MCP server by deploying the `kb-mcp` service from the same compose file. Agents connect to it on port 3000 (default).
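The tag convention described above can also be applied when an agent drives the CLI instead of the MCP server. The sketch below builds argument lists for `kb addnote` and `kb search` that always carry the agent tag. The helper names `addnote_argv` and `search_argv` are hypothetical, `agent:mybot` is the example tag from this section, and only flags documented earlier (`--tags`, `--top`, `--format`) are used.

```python
from typing import List

AGENT_TAG = "agent:mybot"  # example tag from this document; use your agent's own


def addnote_argv(text: str, agent_tag: str = AGENT_TAG) -> List[str]:
    """Argument list for storing a note tagged as agent data."""
    # The note text stays one list element, so it reaches kb as a single argument.
    return ["kb", "addnote", text, "--tags", agent_tag]


def search_argv(query: str, agent_tag: str = AGENT_TAG, top: int = 10) -> List[str]:
    """Argument list for a JSON search restricted to the agent's own notes."""
    return ["kb", "search", query, "--tags", agent_tag,
            "--top", str(top), "--format", "json"]
```

Passing such a list to `subprocess.run(argv, capture_output=True, text=True)` avoids shell quoting entirely, which is one way to satisfy the requirement that the note text be a single argument.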

## Important notes

- Always use `--format json` for machine parsing
- The `score` field is relative, not absolute: compare scores within a result set
- `chunk_metadata.page` is only present for PDF documents
- `chunk_metadata.section_header` is only present for markdown documents with headers
- Results are already ranked by relevance (hybrid FTS + vector search)
- Duplicate files are detected at upload time (HTTP 409); the client handles this gracefully
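
The notes above, together with the answering and multi-query guidance earlier, can be sketched as a few helpers that operate on parsed `kb search` JSON. This is an illustrative sketch, not kb's own client code: the function names are hypothetical, the `"untitled"` fallback is an assumption, and the sample is a trimmed copy of the documented output format.

```python
def has_usable_results(response, min_score=0.01):
    """False when nothing was returned or every score is below min_score."""
    results = response.get("results", [])
    return any(r.get("score", 0.0) >= min_score for r in results)


def cite(result):
    """Build a citation from the title plus page or section header, if present."""
    title = result.get("title", "untitled")  # fallback text is an assumption
    meta = result.get("chunk_metadata") or {}
    if "page" in meta:  # only present for PDF documents
        return f"According to {title} (p.{meta['page']})"
    if "section_header" in meta:  # only present for markdown with headers
        return f"From {title}, section {meta['section_header']}"
    return f"From {title}"


def dedupe(results):
    """Drop repeated chunks (same chunk_id), keeping the first occurrence."""
    seen, unique = set(), []
    for r in results:
        if r["chunk_id"] not in seen:
            seen.add(r["chunk_id"])
            unique.append(r)
    return unique


# Trimmed copy of the documented search output.
sample = {
    "query": "how to install git",
    "results": [
        {
            "chunk_id": 1423,
            "score": 0.031,
            "chunk_metadata": {"page": 12},
            "title": "Git Admin Guide",
            "doc_type": "pdf",
        }
    ],
    "returned": 1,
}

if has_usable_results(sample):
    print(cite(sample["results"][0]))  # According to Git Admin Guide (p.12)
```

Because scores are relative within a result set, `dedupe` keeps the first occurrence of a chunk across merged queries and does not compare scores between queries.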