# kb-search skill

Search, manage, and add to the user's personal knowledge base containing PDFs, Word docs, HTML, markdown, code files, and text notes.

## When to use

- User asks a question that might be answered by their stored documents, notes, or code
- User explicitly says "check my notes", "search kb", "look in my knowledge base", "what do my docs say about..."
- User references documents or notes they've previously stored
- User asks "how do I..." style questions that their knowledge base likely covers
- User wants to save a note, add a file, or manage their knowledge base

## Adding notes

```bash
kb addnote "remember to update DNS records"                # add a note
kb addnote "server room is building 3, floor 2" --tags ops # add a tagged note
```

Pass the note text as a single quoted argument so the shell does not split it into separate words.

## Search (primary use case)

```bash
kb search "<query>" --top 10 --format json
```

Returns JSON with ranked results combining full-text and semantic search.

**Flags:**
- `-n, --top N` — number of results (default: 10)
- `--tags tag1,tag2` — filter by tags (AND logic)
- `--type pdf|markdown|code|note` — filter by document type
- `--format json|human` — output format (always use json for parsing)
- `--fts-only` — keyword search only (skip semantic)
- `--vec-only` — semantic search only (skip keyword)
- `--threshold FLOAT` — minimum score cutoff
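
When an agent calls the CLI from code, assembling the arguments as a list avoids shell-quoting problems with queries that contain spaces or quotes. The sketch below is one way to do that using only the flags listed above; `build_search_argv` is a hypothetical helper name, not part of kb.

```python
from typing import List, Optional, Sequence


def build_search_argv(
    query: str,
    top: int = 10,
    tags: Optional[Sequence[str]] = None,
    doc_type: Optional[str] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Build the argument list for a `kb search` call (hypothetical helper)."""
    # JSON output is always requested so the result can be parsed.
    argv = ["kb", "search", query, "--top", str(top), "--format", "json"]
    if tags:
        # Tags are comma-separated and combined with AND logic by kb.
        argv += ["--tags", ",".join(tags)]
    if doc_type:
        argv += ["--type", doc_type]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    return argv


print(build_search_argv("install git", top=5, tags=["git", "admin"], doc_type="pdf"))
```

The returned list can be passed directly to `subprocess.run(argv, capture_output=True, text=True)`, and the captured stdout parsed with `json.loads`.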

## Adding files

```bash
kb addfile report.pdf                           # single file
kb addfile report.pdf --tags admin,reference    # with tags
kb addfile ~/docs/ --recursive                  # directory (recursive)
kb addfile ~/docs/ --recursive --tags reference # directory with tags
```

Supported file types: `.pdf`, `.docx`, `.html`, `.md`, `.txt`, `.py`, `.sh`, `.go`. Unsupported extensions are rejected before upload.

**Flags:**
- `--tags tag1,tag2` — tags (comma-separated)
- `-r, --recursive` — recursively add directory contents
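
Because unsupported extensions are rejected, it can be convenient to check a path before calling `kb addfile`. A minimal sketch, assuming the extension list above is complete and that matching is case-insensitive (the source does not say whether `.PDF` is accepted); `is_supported` is a hypothetical helper.

```python
from pathlib import Path

# Extensions copied from the "Supported file types" list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".html", ".md", ".txt", ".py", ".sh", ".go"}


def is_supported(path: str) -> bool:
    """Return True if the file's final extension is in the supported set.

    Assumption: comparison is case-insensitive.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for candidate in ["report.pdf", "notes.xlsx", "deploy.sh"]:
    print(candidate, is_supported(candidate))
```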

## Document management

```bash
kb list --format json                    # list all documents
kb list --type pdf --format json         # filter by type
kb list --tags admin --format json       # filter by tags
kb info <doc_id> --format json           # document details with chunks
kb export <doc_id> -o file.pdf           # download original file
kb remove <doc_id>                       # remove (prompts for confirmation)
kb remove <doc_id> --yes                 # remove without confirmation
```

## Tag management

```bash
kb tags --format json                    # list all tags with counts
kb tag <doc_id> --add important,ops      # add tags to a document
kb tag <doc_id> --remove draft           # remove tags from a document
```

## Bulk operations

Operate on multiple documents at once using filter-based selection. Filters combine with AND logic.

**Filter and control flags (shared across all bulk commands):**
- `--tags tag1,tag2` — match documents with ALL specified tags
- `--type pdf|note|...` — match by document type
- `--ids 1,5,12` — match specific document IDs
- `--from-id N` — match documents with id >= N
- `--to-id N` — match documents with id <= N
- `--force` / `-f` — override safety threshold (blocks operations affecting >70% of all documents)
- `--yes` / `-y` — skip confirmation prompt

```bash
# Bulk delete
kb bulk-remove --tags "draft,old" --type note --yes             # delete matching docs
kb bulk-remove --from-id 10 --to-id 50 --yes                   # delete by ID range
kb bulk-remove --ids "3,7,12" --yes                             # delete specific IDs

# Bulk tag add/remove
kb bulk-tag --tags "agent:mybot" --add "reviewed" --remove "pending" --yes
kb bulk-tag --type note --add "archived" --yes                  # tag all notes

# Bulk replace tags
kb bulk-set-tags --tags "old-scheme" --set "new-scheme,migrated" --yes
```

All bulk commands return a summary: matched count, succeeded count, failed count, and errors.
A safety threshold prevents accidentally affecting more than 70% of documents unless `--force` is used.
The threshold is configurable on the engine via `KB_BULK_SAFETY_PERCENT` (integer 0-100, default 70; 0 disables).
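
To make the selection and safety rules concrete, here is an illustrative sketch of how AND-combined filters and the percentage threshold could be evaluated. It is not the engine's implementation: the document dictionary shape (`id`, `doc_type`, `tags`) and both function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Optional


def matches(
    doc: Dict,
    tags: Optional[Iterable[str]] = None,
    doc_type: Optional[str] = None,
    ids: Optional[Iterable[int]] = None,
    from_id: Optional[int] = None,
    to_id: Optional[int] = None,
) -> bool:
    """Return True only if the document satisfies every filter that was given."""
    if tags and not set(tags).issubset(doc["tags"]):
        return False  # --tags: document must carry ALL listed tags
    if doc_type is not None and doc["doc_type"] != doc_type:
        return False  # --type
    if ids is not None and doc["id"] not in set(ids):
        return False  # --ids
    if from_id is not None and doc["id"] < from_id:
        return False  # --from-id: id >= N
    if to_id is not None and doc["id"] > to_id:
        return False  # --to-id: id <= N
    return True


def blocked_by_threshold(matched: int, total: int, percent: int = 70, force: bool = False) -> bool:
    """Return True if the operation would exceed the safety threshold.

    percent=0 disables the check, mirroring KB_BULK_SAFETY_PERCENT=0.
    """
    if force or percent == 0 or total == 0:
        return False
    return matched * 100 > percent * total


docs = [
    {"id": 1, "doc_type": "note", "tags": ["draft", "old"]},
    {"id": 2, "doc_type": "note", "tags": ["draft"]},
    {"id": 3, "doc_type": "pdf", "tags": ["draft", "old"]},
]
selected = [d["id"] for d in docs if matches(d, tags=["draft", "old"], doc_type="note")]
print(selected, blocked_by_threshold(len(selected), len(docs)))
```

In this example only document 1 matches, which is about 33% of the collection, so the operation would proceed without `--force`.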

## Jobs (ingestion queue)

```bash
kb jobs --format json                    # list recent jobs
kb jobs --status failed --format json    # filter by status
kb jobs <job_id> --format json           # job details
```

## Examples

```bash
kb examples                              # show common usage examples
```

## Engine status and maintenance

```bash
kb status --format json                  # engine status, GPU info, DB stats
kb reindex --yes                         # re-embed all chunks (skip confirmation)
```

## Global flags

All commands support:
- `--format json|human` — output format (always use `json` for machine parsing)
- `--engine <url>` — engine API URL (default: http://localhost:8000)
- `--api-key <key>` — API key for authentication

## Search output format

```json
{
  "query": "how to install git",
  "results": [
    {
      "chunk_id": 1423,
      "score": 0.031,
      "text": "To install the latest version of git from source...",
      "chunk_index": 3,
      "chunk_metadata": {"page": 12},
      "title": "Git Admin Guide",
      "doc_type": "pdf",
      "source_path": "/home/user/docs/git-admin.pdf",
      "created_at": "2026-03-15T10:30:00",
      "tags": ["git", "admin"]
    }
  ],
  "total_matches": 47,
  "returned": 10
}
```

## How to answer search queries

1. Run `kb search "<query>" --top 10 --format json`
2. Read the returned chunks
3. Synthesise a natural language answer from the top results
4. **ALWAYS cite sources**: "According to [title] (p.X)..." or "From [title], section [header]..."
5. If every result scores below 0.01, or `returned` is 0, tell the user: "I couldn't find anything in your knowledge base about this"
6. If initial results seem off-target, try refining the query and searching again
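
Steps 4 and 5 can be sketched as two small functions over the parsed search response. The field names come from the search output format shown earlier; the function names and the fallback citation for results with neither a page nor a section header are assumptions.

```python
from typing import Dict

LOW_SCORE_CUTOFF = 0.01  # cutoff taken from step 5 above


def has_usable_results(response: Dict) -> bool:
    """Return False when nothing was returned or every score is below the cutoff."""
    results = response.get("results", [])
    if not results or response.get("returned", len(results)) == 0:
        return False
    return any(r["score"] >= LOW_SCORE_CUTOFF for r in results)


def cite(result: Dict) -> str:
    """Build a citation prefix from whichever metadata the chunk carries."""
    meta = result.get("chunk_metadata") or {}
    title = result["title"]
    if "page" in meta:  # PDF chunks
        return f"According to {title} (p.{meta['page']})"
    if "section_header" in meta:  # markdown chunks with headers
        return f"From {title}, section {meta['section_header']}"
    return f"From {title}"  # assumed fallback when no location metadata exists


response = {
    "query": "how to install git",
    "results": [
        {"chunk_id": 1423, "score": 0.031, "title": "Git Admin Guide",
         "chunk_metadata": {"page": 12}},
    ],
    "total_matches": 47,
    "returned": 1,
}
if has_usable_results(response):
    print(cite(response["results"][0]))
```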

## Multi-query strategy

For complex questions, search multiple times with different queries:

- Decompose the question into sub-queries
- Run each query separately
- Combine and deduplicate results across queries
- Synthesise a unified answer citing all relevant sources

Example:
```
User: "What's the difference between git rebase and merge?"

Query 1: kb search "git rebase explanation" --top 5 --format json
Query 2: kb search "git merge explanation" --top 5 --format json
Query 3: kb search "git rebase vs merge" --top 5 --format json
```
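
The combine-and-deduplicate step might look like the following sketch, which keys on `chunk_id` and keeps the highest-scoring copy of each chunk. `merge_results` is a hypothetical helper. Since scores are relative to their own result set, sorting across queries is only a rough ordering heuristic.

```python
from typing import Dict, Iterable, List


def merge_results(responses: Iterable[Dict]) -> List[Dict]:
    """Merge results from several search responses, deduplicated by chunk_id."""
    best: Dict[int, Dict] = {}
    for response in responses:
        for result in response.get("results", []):
            chunk_id = result["chunk_id"]
            # Keep the copy with the highest score when a chunk appears twice.
            if chunk_id not in best or result["score"] > best[chunk_id]["score"]:
                best[chunk_id] = result
    # Rough ordering only: scores from different queries are not strictly comparable.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


rebase = {"results": [{"chunk_id": 10, "score": 0.030}, {"chunk_id": 11, "score": 0.020}]}
merge = {"results": [{"chunk_id": 11, "score": 0.025}, {"chunk_id": 12, "score": 0.015}]}
merged = merge_results([rebase, merge])
print([r["chunk_id"] for r in merged])
```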

## Filtering tips

Use filters when the question implies a specific domain:

- Code question → `--type code`
- From a specific topic → `--tags <topic>`
- Check available tags first: `kb tags --format json`

## Updating notes

```bash
kb updatenote 42 "revised note content"           # update note by ID
```

Updates the text of an existing note in place, preserving its ID, creation timestamp, and tags. Re-chunks and re-embeds the new text.

## MCP server (agent integration)

For agent integration, kb provides an MCP server alongside the CLI. It exposes kb operations
as native MCP tools over Streamable HTTP transport, so agents can connect directly without
subprocess overhead.

**MCP tools:** `kb_search`, `kb_addnote`, `kb_update_note`, `kb_get`, `kb_delete`, `kb_status`,
`kb_jobs`, `kb_upload_start`, `kb_upload_chunk`, `kb_upload_finish`, `kb_bulk_delete`,
`kb_bulk_tags`, `kb_bulk_set_tags`.

Use tags to separate agent data from user documents (e.g. tag all agent notes with
`agent:mybot` and filter by that tag when searching). This convention is communicated
via the system prompt; no special server-side enforcement is needed.

If the kb engine is already running via Docker Compose, add the MCP server by deploying the
`kb-mcp` service from the same compose file. Agents connect to it on port 3000 (default).

## Important notes

- Always use `--format json` for machine parsing
- The `score` field is relative, not absolute — compare scores within a result set
- `chunk_metadata.page` is only present for PDF documents
- `chunk_metadata.section_header` is only present for markdown documents with headers
- Results are already ranked by relevance (hybrid FTS + vector search)
- Duplicate files are detected at upload time (HTTP 409) — the client handles this gracefully
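
Because `score` is relative, one way to compare hits within a single result set is to express each score as a fraction of the top score. This is an illustrative sketch rather than anything kb provides; `relative_scores` is a hypothetical helper.

```python
from typing import Dict, List


def relative_scores(results: List[Dict]) -> List[float]:
    """Return each result's score as a fraction of the best score in the set."""
    if not results:
        return []
    top = max(r["score"] for r in results)
    if top <= 0:
        # Avoid dividing by zero when no result has a positive score.
        return [0.0 for _ in results]
    return [r["score"] / top for r in results]


hits = [{"score": 0.04}, {"score": 0.02}, {"score": 0.01}]
print(relative_scores(hits))
```

A sharp drop in the relative value between adjacent results can suggest where the useful hits end.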