steve e9a282ddb1 Document KB_BULK_SAFETY_PERCENT in README, DEVELOPER, MCP, and SKILL docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 22:43:42 +01:00

kb-search skill

Search, manage, and add to the user's personal knowledge base, which can contain PDFs, Word documents, HTML, Markdown, code files, and text notes.

When to use

User asks a question that might be answered by their stored documents, notes, or code
User explicitly says "check my notes", "search kb", "look in my knowledge base", "what do my docs say about..."
User references documents or notes they've previously stored
User asks "how do I..." style questions that their knowledge base likely covers
User wants to save a note, add a file, or manage their knowledge base

Adding notes

kb addnote "remember to update DNS records"                # add a note
kb addnote "server room is building 3, floor 2" --tags ops # add a tagged note

Pass the note text as a single quoted argument so the shell does not split it into separate words.

Search (primary use case)

kb search "<query>" --top 10 --format json

Returns JSON with ranked results combining full-text and semantic search.

Flags:

-n, --top N — number of results (default: 10)
--tags tag1,tag2 — filter by tags (AND logic)
--type pdf|markdown|code|note — filter by document type
--format json|human — output format (always use json for parsing)
--fts-only — keyword search only (skip semantic)
--vec-only — semantic search only (skip keyword)
--threshold FLOAT — minimum score cutoff
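
As a rough illustration of how an agent might assemble the search command from the flags listed above, here is a minimal Python sketch. The helper names `build_search_argv` and `run_search` are hypothetical, and the sketch assumes that `kb` is on the PATH, prints its JSON to stdout, and exits non-zero on failure.

```python
import json
import subprocess


def build_search_argv(query, top=10, tags=None, doc_type=None, threshold=None):
    """Build the argv list for `kb search` using only the documented flags."""
    argv = ["kb", "search", query, "--top", str(top), "--format", "json"]
    if tags:
        # Tags are passed comma-separated; the CLI combines them with AND logic.
        argv += ["--tags", ",".join(tags)]
    if doc_type:
        argv += ["--type", doc_type]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    return argv


def run_search(query, **filters):
    """Run the search and return the parsed JSON.

    Assumption: kb writes JSON to stdout and signals errors via exit code.
    """
    proc = subprocess.run(
        build_search_argv(query, **filters),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Passing the query as one list element avoids shell quoting problems, since no shell is involved in the call.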

Adding files

kb addfile report.pdf                           # single file
kb addfile report.pdf --tags admin,reference    # with tags
kb addfile ~/docs/ --recursive                  # directory (recursive)
kb addfile ~/docs/ --recursive --tags reference # directory with tags

Supported file types: .pdf, .docx, .html, .md, .txt, .py, .sh, .go. Unsupported extensions are rejected before upload.

Flags:

--tags tag1,tag2 — tags (comma-separated)
-r, --recursive — recursively add directory contents
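
Because unsupported extensions are rejected, an agent may want to filter a file list before calling kb addfile. The following Python sketch does that with the extension list given above. The function names are hypothetical, and the case-insensitive comparison is an assumption; the CLI's own check may behave differently.

```python
from pathlib import Path

# Extensions copied from the supported file types listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".html", ".md", ".txt", ".py", ".sh", ".go"}


def is_supported(path):
    """Return True if the file's final extension is in the supported set."""
    # Assumption: matching is case-insensitive (e.g. REPORT.PDF is accepted).
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_files(paths):
    """Split paths into (supported, rejected), preserving input order."""
    supported = [p for p in paths if is_supported(p)]
    rejected = [p for p in paths if not is_supported(p)]
    return supported, rejected
```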

Document management

kb list --format json                    # list all documents
kb list --type pdf --format json         # filter by type
kb list --tags admin --format json       # filter by tags
kb info <doc_id> --format json           # document details with chunks
kb export <doc_id> -o file.pdf           # download original file
kb remove <doc_id>                       # remove (prompts for confirmation)
kb remove <doc_id> --yes                 # remove without confirmation

Tag management

kb tags --format json                    # list all tags with counts
kb tag <doc_id> --add important,ops      # add tags to a document
kb tag <doc_id> --remove draft           # remove tags from a document

Bulk operations

Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.

Flags shared across all bulk commands (filters, plus the safety and confirmation overrides):

--tags tag1,tag2 — match documents with ALL specified tags
--type pdf|note|... — match by document type
--ids 1,5,12 — match specific document IDs
--from-id N — match documents with id >= N
--to-id N — match documents with id <= N
--force / -f — override safety threshold (blocks operations affecting >70% of all documents)
--yes / -y — skip confirmation prompt
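
To make the AND semantics of the filters concrete, here is a small Python model of how a document could be tested against a filter set. It is only a sketch of the documented behaviour, not the engine's code; the `Doc` class and the `matches` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    id: int
    doc_type: str
    tags: set = field(default_factory=set)


def matches(doc, tags=None, doc_type=None, ids=None, from_id=None, to_id=None):
    """Return True only if the document satisfies every filter that is set."""
    if tags and not set(tags) <= doc.tags:
        return False  # --tags: the document must carry ALL listed tags
    if doc_type is not None and doc.doc_type != doc_type:
        return False  # --type
    if ids is not None and doc.id not in ids:
        return False  # --ids
    if from_id is not None and doc.id < from_id:
        return False  # --from-id: id >= N
    if to_id is not None and doc.id > to_id:
        return False  # --to-id: id <= N
    return True
```

A filter that is left unset does not restrict the selection, so calling `matches(doc)` with no filters accepts every document.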

# Bulk delete
kb bulk-remove --tags "draft,old" --type note --yes             # delete matching docs
kb bulk-remove --from-id 10 --to-id 50 --yes                   # delete by ID range
kb bulk-remove --ids "3,7,12" --yes                             # delete specific IDs

# Bulk tag add/remove
kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
kb bulk-tag --type note --add "archived" --yes                  # tag all notes

# Bulk replace tags
kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes

All bulk commands return a summary with the matched, succeeded, and failed counts, plus any errors. A safety threshold blocks any bulk operation that would affect more than 70% of all documents unless --force is used. The threshold is configurable on the engine via KB_BULK_SAFETY_PERCENT (integer 0-100, default 70; 0 disables the check).
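
The safety check can be pictured with the following Python sketch. It assumes a strict greater-than comparison, as the ">70%" wording suggests; the function name and parameters are hypothetical, and the engine's actual rounding may differ.

```python
def blocked_by_safety_threshold(matched, total, percent=70, force=False):
    """Return True if a bulk operation would be refused.

    percent mirrors KB_BULK_SAFETY_PERCENT (0-100); 0 disables the check.
    """
    if force or percent == 0 or total == 0:
        return False
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return matched * 100 > percent * total
```

Under these assumptions, an operation matching 70 of 100 documents is allowed, while one matching 71 of 100 is blocked unless --force is passed.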

Jobs (ingestion queue)

kb jobs --format json                    # list recent jobs
kb jobs --status failed --format json    # filter by status
kb jobs <job_id> --format json           # job details

Examples

kb examples                              # show common usage examples

Engine status and maintenance

kb status --format json                  # engine status, GPU info, DB stats
kb reindex --yes                         # re-embed all chunks (skip confirmation)

Global flags

All commands support:

--format json|human — output format (always use json for machine parsing)
--engine <url> — engine API URL (default: http://localhost:8000)
--api-key <key> — API key for authentication

Search output format

{
  "query": "how to install git",
  "results": [
    {
      "chunk_id": 1423,
      "score": 0.031,
      "text": "To install the latest version of git from source...",
      "chunk_index": 3,
      "chunk_metadata": {"page": 12},
      "title": "Git Admin Guide",
      "doc_type": "pdf",
      "source_path": "/home/user/docs/git-admin.pdf",
      "created_at": "2026-03-15T10:30:00",
      "tags": ["git", "admin"]
    }
  ],
  "total_matches": 47,
  "returned": 10
}
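
One way to consume this output is to map each result onto a small typed record. The Python sketch below uses only the field names shown in the sample above plus `section_header`, which this document says may appear in chunk_metadata for markdown documents with headers. The `SearchHit` class and `parse_search_output` function are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchHit:
    chunk_id: int
    score: float
    text: str
    title: str
    doc_type: str
    tags: List[str]
    page: Optional[int] = None            # only present for PDFs
    section_header: Optional[str] = None  # only for markdown with headers


def parse_search_output(raw):
    """Parse the JSON text printed by `kb search --format json`."""
    payload = json.loads(raw)
    hits = []
    for r in payload.get("results", []):
        meta = r.get("chunk_metadata") or {}
        hits.append(SearchHit(
            chunk_id=r["chunk_id"],
            score=r["score"],
            text=r["text"],
            title=r["title"],
            doc_type=r["doc_type"],
            tags=r.get("tags", []),
            page=meta.get("page"),
            section_header=meta.get("section_header"),
        ))
    return hits
```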

How to answer search queries

Run kb search "<query>" --top 10 --format json
Read the returned chunks
Synthesise a natural language answer from the top results
ALWAYS cite sources: "According to [title] (p.X)..." or "From [title], section [header]..."
If every result scores below 0.01, or returned is 0, tell the user: "I couldn't find anything in your knowledge base about this"
If initial results seem off-target, try refining the query and searching again
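
The decision and citation steps above could be sketched in Python as follows. The function names are hypothetical, the 0.01 cutoff is the one named in the steps, and the fallback citation for results without a page or section header is an assumption.

```python
NOT_FOUND = "I couldn't find anything in your knowledge base about this"


def has_useful_results(payload, min_score=0.01):
    """Return False when nothing was returned or every score is below the cutoff."""
    results = payload.get("results", [])
    if payload.get("returned", len(results)) == 0:
        return False
    return any(r["score"] >= min_score for r in results)


def cite(result):
    """Build a citation prefix from the title and any chunk metadata."""
    meta = result.get("chunk_metadata") or {}
    title = result["title"]
    if "page" in meta:
        return f"According to {title} (p.{meta['page']})"
    if "section_header" in meta:
        return f"From {title}, section {meta['section_header']}"
    # Assumption: with no page or header, cite the title alone.
    return f"From {title}"
```

An agent would reply with `NOT_FOUND` when `has_useful_results` is false, and otherwise prefix each synthesised claim with the matching `cite(...)` string.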

Multi-query strategy

For complex questions, search multiple times with different queries:

Decompose the question into sub-queries
Run each query separately
Combine and deduplicate results across queries
Synthesise a unified answer citing all relevant sources

Example:

User: "What's the difference between git rebase and merge?"

Query 1: kb search "git rebase explanation" --top 5 --format json
Query 2: kb search "git merge explanation" --top 5 --format json
Query 3: kb search "git rebase vs merge" --top 5 --format json
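
The combine-and-deduplicate step might look like the Python sketch below. Because scores are relative to each result set, it interleaves results by rank rather than comparing scores across queries. Using `chunk_id` as the deduplication key is an assumption, and `merge_results` is a hypothetical name.

```python
from itertools import zip_longest


def merge_results(result_sets):
    """Interleave several ranked result lists and drop repeated chunks.

    result_sets: a list of "results" arrays, one per query, each already
    ranked by relevance.
    """
    seen = set()
    merged = []
    # Take every query's rank-1 result first, then every rank-2 result, etc.
    for rank_group in zip_longest(*result_sets):
        for result in rank_group:
            if result is None or result["chunk_id"] in seen:
                continue
            seen.add(result["chunk_id"])
            merged.append(result)
    return merged
```

With this ordering, a chunk that ranks highly for any sub-query stays near the top of the merged list.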

Filtering tips

Use filters when the question implies a specific domain:

Code question → --type code
Question about a specific topic → --tags <topic>
Check available tags first: kb tags --format json

Updating notes

kb updatenote 42 "revised note content"           # update note by ID

Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.

MCP server (agent integration)

For agent-to-agent integration, kb provides an MCP server alongside the CLI. The MCP server exposes the same operations as native MCP tools over Streamable HTTP transport, so agents can connect to it directly without subprocess overhead.

MCP tools: kb_search, kb_addnote, kb_update_note, kb_get, kb_delete, kb_status, kb_jobs, kb_upload_start, kb_upload_chunk, kb_upload_finish, kb_bulk_delete, kb_bulk_tags, kb_bulk_set_tags.

Use tags to separate agent data from user documents (e.g. tag all agent notes with agent:mybot and filter by that tag when searching). This convention is communicated via the system prompt, so no special server-side enforcement is needed.

If the kb engine is already running via Docker Compose, add the MCP server by deploying the kb-mcp service from the same compose file. Agents connect to it on port 3000 (default).

Important notes

Always use --format json for machine parsing
The score field is relative, not absolute — compare scores within a result set
chunk_metadata.page is only present for PDF documents
chunk_metadata.section_header is only present for markdown documents with headers
Results are already ranked by relevance (hybrid FTS + vector search)
Duplicate files are detected at upload time (HTTP 409) — the client handles this gracefully
