Clarify hybrid semantic + full-text search in MCP descriptions
Agents were misreading kb_search as keyword-only because the vector/semantic
component was only mentioned in the negative ("fts_only: no vector similarity").
Lead with hybrid semantic + BM25 + RRF in the server instructions, kb_search
docstring, and MCP.md so agents recognise it as a vector search tool.
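The fusion step the commit message refers to fits in a few lines. Reciprocal Rank Fusion scores each chunk as the sum of 1 / (k + rank) over every result list it appears in, so the vector and BM25 scores never need to share a scale. This is a minimal Python sketch, assuming each retriever returns an ordered list of chunk IDs and using k = 60, the usual default in the RRF literature. The chunk IDs are made up, and the constant kb actually uses is not shown in this commit.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs into one list using RRF.

    Each ID scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. A list that lacks an ID contributes nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c7", "c2", "c9"]  # hypothetical semantic (embedding) ranking
bm25_hits = ["c2", "c4", "c7"]  # hypothetical BM25 ranking
print([doc_id for doc_id, _ in reciprocal_rank_fusion([vector_hits, bm25_hits])])
# ['c2', 'c7', 'c4', 'c9']
```

Here `c2` is only second in the vector list but first in the BM25 list, so it edges out `c7`. With short lists like these, chunks found by both retrievers outrank chunks found by only one.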
@@ -1,6 +1,6 @@
 # MCP Server (Agent Integration)
 
-The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI.
+The MCP server exposes kb operations as native MCP tools, so agents can search, add notes, upload files, and manage documents without shelling out to the CLI. `kb_search` is hybrid: dense vector embeddings (semantic similarity) fused with BM25 full-text ranking via Reciprocal Rank Fusion, so agents can ask natural-language questions and find conceptually related content even when the exact words don't match.
 
 ## Start the MCP server
 
@@ -27,7 +27,7 @@ docker run -d --name kb-mcp \
 
 | Tool | Description |
 |---|---|
-| `kb_search` | Hybrid search with optional tag/type filters |
+| `kb_search` | Hybrid semantic (vector) + full-text search with tag/type filters |
 | `kb_addnote` | Add a text note (queued for async ingestion) |
 | `kb_update_note` | Update an existing note in place |
 | `kb_get` | Get document details by ID or source path |
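The reason the keyword half alone misses paraphrases shows up in a minimal Okapi BM25 scorer. This Python sketch is an illustration, not kb's full-text engine: tokenisation is a naive lowercase whitespace split with no stemming, and k1 = 1.5, b = 0.75 are common textbook defaults rather than values taken from kb. The two documents are invented, reusing the docstring's pension example.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25.

    Tokenisation is a naive lowercase whitespace split with no stemming,
    so "pensions" and "pension" count as different terms.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avgdl = sum(len(t) for t in tokenised) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact token match, no contribution
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["pension revaluation rules", "annual leave policy"]
print(bm25_scores("pension rules", docs))  # first document scores > 0, second 0.0
print(bm25_scores("how are pensions revalued", docs))  # [0.0, 0.0]
```

The paraphrased question scores zero against both documents because no token matches exactly. Covering that case is the job of the embedding half of the hybrid search.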
+24 -10
@@ -44,11 +44,16 @@ _transport_security = TransportSecuritySettings(
 mcp = FastMCP(
     "kb",
     instructions=(
-        "Knowledge base MCP server. Provides tools for searching, adding, and "
-        "managing documents and notes. Use tags to organise and filter documents "
-        "(e.g. tag notes with 'agent:mybot' and filter searches by that tag). "
-        "This server requires Bearer token authentication — all requests are "
-        "authenticated via the Authorization header at the HTTP transport layer."
+        "Knowledge base MCP server with hybrid semantic + full-text search. "
+        "kb_search uses dense vector embeddings (semantic similarity) fused with "
+        "BM25 full-text ranking, so it finds conceptually related content even "
+        "when the exact words don't match — agents can ask natural-language "
+        "questions rather than guessing keywords. Also provides tools for adding "
+        "notes, uploading files, and managing documents and tags. Use tags to "
+        "organise and filter documents (e.g. tag notes with 'agent:mybot' and "
+        "filter searches by that tag). This server requires Bearer token "
+        "authentication — all requests are authenticated via the Authorization "
+        "header at the HTTP transport layer."
     ),
     transport_security=_transport_security,
 )
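The tagging convention in the server instructions pairs with `kb_search`'s `tags` parameter, which keeps only documents carrying ALL of the requested tags. This is a minimal Python sketch of that all-of filter over hypothetical chunk dicts. The `tags` field name and the data are illustrative, not kb's schema, and whether kb applies the filter before or after ranking is not shown in this commit.

```python
from collections.abc import Iterable


def filter_by_tags(chunks: Iterable[dict], required: set[str]) -> list[dict]:
    """Keep chunks whose document carries every tag in `required`.

    An empty `required` set keeps everything, i.e. no filter.
    """
    # Subset test: every required tag must be present on the chunk.
    return [c for c in chunks if required <= set(c.get("tags", ()))]


chunks = [  # hypothetical search hits
    {"id": "n1", "tags": ["agent:mybot", "project:x"]},
    {"id": "n2", "tags": ["agent:mybot"]},
    {"id": "n3", "tags": ["agent:otherbot", "project:x"]},
]
print([c["id"] for c in filter_by_tags(chunks, {"agent:mybot"})])  # ['n1', 'n2']
print([c["id"] for c in filter_by_tags(chunks, {"agent:mybot", "project:x"})])  # ['n1']
```

Because the match is all-of, adding a tag to the filter can only narrow the results.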
@@ -62,17 +67,25 @@ async def kb_search(
     doc_type: str | None = None,
     fts_only: bool = False,
 ) -> str:
-    """Search the knowledge base for relevant documents and notes.
+    """Hybrid semantic (vector) + full-text search over the knowledge base.
 
-    Returns ranked chunks matching the query, with text content, relevance scores,
-    and document metadata.
+    Combines dense vector embeddings (semantic similarity — finds conceptually
+    related content even when the wording differs) with BM25 keyword ranking,
+    fused via reciprocal rank fusion. Because the search is semantic, you can
+    ask natural-language questions ("what did we decide about X?") rather than
+    guessing the exact keywords used in the source documents.
+
+    Returns ranked chunks matching the query, with text content, relevance
+    scores, and document metadata.
 
     Args:
-        query: The search query. Can be a natural language question or keywords.
+        query: The search query — a natural language question or keywords.
         top: Maximum number of results to return (default 10).
         tags: Filter results to documents with ALL of these tags.
         doc_type: Filter by document type (e.g. "note", "pdf", "markdown", "code").
-        fts_only: If true, use only full-text search (no vector similarity).
+        fts_only: Disable the vector/semantic component and use only BM25
+            keyword matching. Default false (hybrid mode). Set true only when
+            you need exact-string matching (e.g. an error code, identifier).
 
     Tips for complex queries:
     - Consider expanding into 2-3 variant phrasings and calling this tool multiple
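The "dense vector embeddings" half described in the new docstring ranks chunks by how close their embedding vectors are to the query's embedding. This is a minimal Python sketch, assuming cosine similarity as the measure (this commit does not name one) and using tiny hand-written 3-dimensional vectors in place of real model output. The vectors and chunk IDs are invented for illustration. Passing `fts_only=True` skips this half and leaves only BM25 keyword matching.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_embedding(query_vec: list[float], chunk_vecs: dict[str, list[float]]) -> list[str]:
    """Return chunk IDs ordered from most to least similar to the query."""
    return sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)


# Made-up vectors: the first axis loosely stands for "pensions", the others for unrelated topics.
chunk_vecs = {
    "pension-rules": [0.9, 0.1, 0.0],
    "leave-policy": [0.1, 0.9, 0.2],
}
query_vec = [0.8, 0.0, 0.1]  # stands in for the embedding of "how are pensions revalued"
print(rank_by_embedding(query_vec, chunk_vecs))  # ['pension-rules', 'leave-policy']
```

The comparison never looks at words, only at vector direction. That is the property that lets a paraphrased question reach the right chunk, provided the embedding model places both texts near each other.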
@@ -80,6 +93,7 @@ async def kb_search(
       "pension revaluation rules" and "how are pensions revalued" to cast a wider net.
     - For precision, rerank the returned results using your own judgement based on
       relevance to the original question.
+    - Call kb_status to see which embedding model is in use.
     """
     result = engine.search(
         query=query,
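The query-expansion tip in the docstring leaves merging the variant searches to the agent. One simple approach is to keep each chunk once, at its best score, and then sort. This Python sketch assumes each `kb_search` call's output has already been parsed into (chunk ID, score) pairs; the tool actually returns a string whose format this commit does not show, and the IDs and scores below are hypothetical. It also assumes scores from different calls are on a comparable scale. If they are not, fusing by rank is the safer choice.

```python
def merge_variant_results(result_lists: list[list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Merge hits from several phrasings of the same question.

    Each chunk appears once, with the highest score any phrasing gave it.
    """
    best: dict[str, float] = {}
    for results in result_lists:
        for chunk_id, score in results:
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    # Best score first; ties broken by chunk ID for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical parsed output of two kb_search calls with different phrasings.
from_keywords = [("c1", 0.031), ("c5", 0.029)]
from_question = [("c5", 0.033), ("c8", 0.016)]
print(merge_variant_results([from_keywords, from_question]))
# [('c5', 0.033), ('c1', 0.031), ('c8', 0.016)]
```

The merged list is only a starting point; the docstring still recommends reranking it by judgement against the original question.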