steve 9aab79d49b v2 restructure: Go client, Docker engine, release tooling
- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 21:52:25 +00:00

1. Project scaffolding

  • 1.1 Create v2 project structure: engine/ (Python/FastAPI) and client/ (Go) directories at repo root
  • 1.2 Set up engine/pyproject.toml with dependencies: fastapi, uvicorn, sentence-transformers, sqlite-vec, docling, pyyaml
  • 1.3 Set up client/go.mod with dependencies: cobra, gopkg.in/yaml.v3
  • 1.4 Create engine entry point (engine/main.py) with uvicorn startup, eager model loading, and readiness gating

2. Database layer

  • 2.1 Implement database module (engine/kb/database.py): connection factory with WAL mode, schema initialisation (documents, chunks, chunks_fts, chunks_vec, tags, document_tags, config tables)
  • 2.2 Add jobs table to schema: id, filename, status (queued/processing/done/failed/skipped), doc_type, tags_json, error, document_id, chunk_count, created_at, completed_at, staging_path
  • 2.3 Implement job CRUD functions: create_job, get_job, list_jobs, update_job_status
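
The connection factory and job functions from 2.1 to 2.3 could be sketched as follows. This is a minimal illustration using only the standard-library sqlite3 module; the column types, the `init_jobs_table` helper, and the timestamp format are assumptions, since the task list names the columns and functions but not their details.

```python
import json
import sqlite3
from datetime import datetime, timezone

JOB_STATUSES = ("queued", "processing", "done", "failed", "skipped")
TERMINAL_STATUSES = ("done", "failed", "skipped")


def connect(db_path: str) -> sqlite3.Connection:
    """Open a connection with WAL journaling and name-based row access."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def init_jobs_table(conn: sqlite3.Connection) -> None:
    """Create the jobs table. Column types are assumed, names come from task 2.2."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id           INTEGER PRIMARY KEY,
            filename     TEXT NOT NULL,
            status       TEXT NOT NULL DEFAULT 'queued',
            doc_type     TEXT,
            tags_json    TEXT,
            error        TEXT,
            document_id  INTEGER,
            chunk_count  INTEGER,
            created_at   TEXT NOT NULL,
            completed_at TEXT,
            staging_path TEXT
        )
        """
    )
    conn.commit()


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def create_job(conn, filename, staging_path, doc_type=None, tags=None) -> int:
    """Insert a queued job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (filename, doc_type, tags_json, created_at, staging_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (filename, doc_type, json.dumps(tags or []), _now(), staging_path),
    )
    conn.commit()
    return cur.lastrowid


def get_job(conn, job_id):
    return conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()


def list_jobs(conn, status=None):
    """List jobs in insertion order, optionally filtered by status."""
    if status is None:
        return conn.execute("SELECT * FROM jobs ORDER BY id").fetchall()
    return conn.execute(
        "SELECT * FROM jobs WHERE status = ? ORDER BY id", (status,)
    ).fetchall()


def update_job_status(conn, job_id, status, error=None) -> None:
    """Set a job's status; terminal statuses also stamp completed_at."""
    if status not in JOB_STATUSES:
        raise ValueError(f"unknown job status: {status}")
    completed_at = _now() if status in TERMINAL_STATUSES else None
    conn.execute(
        "UPDATE jobs SET status = ?, error = ?, completed_at = ? WHERE id = ?",
        (status, error, completed_at, job_id),
    )
    conn.commit()
```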

3. Embeddings and search

  • 3.1 Implement embeddings module (engine/kb/embeddings.py): model loading with device resolution (auto/cpu/cuda), embed_texts, get_model_dim; the model is loaded once and cached in-process
  • 3.2 Implement search module (engine/kb/search.py): FTS5 search, vector search via sqlite-vec, RRF merge, filter support (tags, doc_type, fts_only, vec_only, threshold)
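
The RRF merge named in 3.2 combines the FTS5 and vector result lists by rank rather than by raw score. A minimal sketch, assuming each search returns chunk ids ordered best-first; the function name `rrf_merge`, the `top` parameter default, and the constant `k=60` (the value commonly used for reciprocal rank fusion) are illustrative choices, not taken from the task list.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top=10):
    """Merge several best-first id lists with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]


fts_hits = [1, 2, 3]   # chunk ids from the full-text search
vec_hits = [3, 1, 4]   # chunk ids from the vector search
merged = rrf_merge([fts_hits, vec_hits])
print([chunk_id for chunk_id, _ in merged])
```

Chunks found by both searches accumulate two terms and so rise above chunks found by only one, which is the property that makes the merge independent of the two scoring scales.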

4. Ingestion pipelines

  • 4.1 Implement file type detection (engine/kb/ingest/detector.py): extension-based detection for pdf, markdown, code, note
  • 4.2 Implement Docling pipeline (engine/kb/ingest/docling.py): PDF/DOCX conversion with AcceleratorOptions device control, hierarchy and fixed chunking
  • 4.3 Implement Markdown pipeline (engine/kb/ingest/markdown.py): header-based splitting with min/max token bounds
  • 4.4 Implement code pipeline (engine/kb/ingest/code.py): AST-based chunking for Python, regex for Bash/Go, fallback fixed-size
  • 4.5 Implement note pipeline (engine/kb/ingest/note.py): whole-text chunking with auto-title
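
As an illustration of the AST-based chunking in 4.4, the sketch below emits one chunk per top-level function or class and falls back to fixed-size line windows when the source does not parse. The function names, the chunk dictionary shape, and the `max_lines` default are assumptions; this simplified version also drops module-level statements that sit outside any definition.

```python
import ast


def fixed_size_chunks(source: str, max_lines: int = 40) -> list:
    """Fallback: split the text into consecutive windows of max_lines lines."""
    lines = source.splitlines()
    return [
        {"name": None, "text": "\n".join(lines[i:i + max_lines])}
        for i in range(0, len(lines), max_lines)
    ]


def chunk_python_source(source: str, max_lines: int = 40) -> list:
    """Return one chunk per top-level def/class, or fixed-size chunks."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return fixed_size_chunks(source, max_lines)

    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # node.lineno points at the def/class line, so widen the span
            # upward to include any decorators.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            text = "\n".join(lines[start - 1:node.end_lineno])
            chunks.append({"name": node.name, "text": text})
    # Files with no definitions still need to be indexed somehow.
    return chunks or fixed_size_chunks(source, max_lines)
```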

5. Async job queue and background worker

  • 5.1 Implement staging manager (engine/kb/staging.py): write uploaded file/note to staging directory, generate staging path, cleanup after processing
  • 5.2 Implement background worker (engine/kb/worker.py): asyncio background task that polls for queued jobs, processes sequentially (detect type → chunk → embed → insert), updates job status on success/failure/skip (duplicate detection)
  • 5.3 Wire worker into FastAPI lifespan: start worker on app startup, graceful shutdown on app stop
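
The polling loop in 5.2 might look roughly like the sketch below. To stay self-contained it keeps jobs in a plain list of dictionaries instead of the jobs table, and `process`, `DuplicateDocument`, and `poll_interval` are hypothetical names standing in for the real detect, chunk, embed, and insert steps.

```python
import asyncio


class DuplicateDocument(Exception):
    """Raised by the processing step when the document already exists."""


async def worker_loop(jobs, process, stop, poll_interval=0.5):
    """Process queued jobs one at a time until the stop event is set."""
    while not stop.is_set():
        job = next((j for j in jobs if j["status"] == "queued"), None)
        if job is None:
            # Nothing to do: sleep, but wake immediately on shutdown.
            try:
                await asyncio.wait_for(stop.wait(), timeout=poll_interval)
            except asyncio.TimeoutError:
                pass
            continue

        job["status"] = "processing"
        try:
            # Run the blocking pipeline in a thread so the event loop
            # can keep serving requests.
            job["chunk_count"] = await asyncio.to_thread(process, job)
            job["status"] = "done"
        except DuplicateDocument:
            job["status"] = "skipped"
        except Exception as exc:
            # A failed job is recorded and the loop moves on to the next one.
            job["status"] = "failed"
            job["error"] = str(exc)
```

For 5.3, one plausible wiring is to create the task with `asyncio.create_task(worker_loop(...))` before the `yield` of a FastAPI lifespan context manager, then set the stop event and await the task after it.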

6. API routes

  • 6.1 Implement health endpoint: GET /api/v1/health — returns 503 during startup, 200 when ready
  • 6.2 Implement search endpoint: POST /api/v1/search — accepts query, top, tags, doc_type, fts_only, vec_only, threshold in JSON body
  • 6.3 Implement ingestion endpoint: POST /api/v1/jobs — accepts multipart file upload or note text field with optional tags/doc_type/title metadata, writes to staging, creates job, returns 202
  • 6.4 Implement job status endpoints: GET /api/v1/jobs (list with status filter), GET /api/v1/jobs/{id} (details)
  • 6.5 Implement document endpoints: GET /api/v1/documents (list with filters), GET /api/v1/documents/{id} (details), DELETE /api/v1/documents/{id} (remove)
  • 6.6 Implement tag endpoints: GET /api/v1/tags (list), PUT /api/v1/documents/{id}/tags (add/remove)
  • 6.7 Implement status endpoint: GET /api/v1/status — model info, GPU info, DB stats, queue stats
  • 6.8 Implement reindex endpoint: POST /api/v1/reindex — re-embed all chunks with current model
  • 6.9 Implement API key authentication middleware: check KB_API_KEY env, validate Bearer token, skip when unset
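
The decision logic behind the authentication middleware in 6.9 can be isolated in a small pure function, which keeps it easy to test apart from FastAPI. The name `is_authorized` is hypothetical; in the engine, `expected_key` would presumably come from the KB_API_KEY environment variable and `authorization_header` from the incoming request.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_key: Optional[str]) -> bool:
    """Return True if the request may proceed.

    When no key is configured, authentication is skipped entirely.
    Otherwise the header must be "Bearer <key>" with a matching key.
    """
    if not expected_key:
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```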

7. Engine configuration

  • 7.1 Implement config module (engine/kb/config.py): read all settings from environment variables (KB_DATA_DIR, KB_MODEL, KB_DEVICE, KB_INGEST_DEVICE, KB_API_KEY), apply defaults
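
A config module along the lines of 7.1 could be as small as the sketch below. The variable names come from the task; every default shown here is an assumption made for illustration (including the placeholder model name and the choice to let KB_INGEST_DEVICE fall back to KB_DEVICE), since the task list does not state the actual defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read engine settings from the environment, applying defaults."""
    device = env.get("KB_DEVICE", "auto")  # assumed default
    return Settings(
        data_dir=env.get("KB_DATA_DIR", "/data"),        # assumed default
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),   # placeholder default
        device=device,
        # Assumed behaviour: ingestion uses the embedding device unless overridden.
        ingest_device=env.get("KB_INGEST_DEVICE", device),
        # Treat an empty string the same as unset, so auth stays disabled.
        api_key=env.get("KB_API_KEY") or None,
    )
```

Passing the environment in as a mapping lets tests supply a plain dictionary instead of mutating `os.environ`.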

8. Docker images

  • 8.1 Create Dockerfile.nvidia: CUDA runtime base, system deps (libgl1, libglib2.0, poppler), uv install, onnxruntime-gpu overlay, engine entrypoint
  • 8.2 Create Dockerfile.rocm: ROCm/PyTorch base, system deps, uv install, onnxruntime-rocm, engine entrypoint
  • 8.3 Create compose.nvidia.yaml: NVIDIA runtime, GPU reservation, bind mount for /data, environment variables, restart policy, port mapping
  • 8.4 Create compose.rocm.yaml: ROCm device passthrough (/dev/kfd, /dev/dri), bind mount, environment variables, restart policy, port mapping
  • 8.5 Create .dockerignore for engine context

9. Go client — project setup and config

  • 9.1 Initialise Cobra CLI structure: root command with --engine, --format, --api-key persistent flags
  • 9.2 Implement client config loading: read ~/.kb/client.yaml, merge with env vars (KB_ENGINE_URL, KB_API_KEY), merge with CLI flags
  • 9.3 Implement HTTP client helper: base URL handling, Bearer token injection, JSON request/response helpers, error formatting for connection failures and HTTP errors

10. Go client — commands

  • 10.1 Implement kb search <query> command: POST to /api/v1/search, human and JSON output formatting
  • 10.2 Implement kb add <path> command: file discovery (single file, directory with --recursive), multipart upload to /api/v1/jobs, human summary output ("Queued: N files"), JSON output with job IDs
  • 10.3 Implement kb add --note <text> command: submit note via multipart to /api/v1/jobs
  • 10.4 Implement kb jobs command: list jobs (with --status filter), single job detail via kb jobs <id>
  • 10.5 Implement kb list command: GET /api/v1/documents with --type and --tags filters
  • 10.6 Implement kb info <id> command: GET /api/v1/documents/{id}
  • 10.7 Implement kb remove <id> command: confirmation prompt (skip with --yes), DELETE /api/v1/documents/{id}
  • 10.8 Implement kb tags command: GET /api/v1/tags
  • 10.9 Implement kb tag <id> command: --add and --remove flags, PUT /api/v1/documents/{id}/tags
  • 10.10 Implement kb status command: GET /api/v1/status with human formatting

11. Go client — build and distribution

  • 11.1 Create Makefile or build script: cross-compile for linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64
  • 11.2 Add version injection via -ldflags at build time

12. Integration testing

  • 12.1 Test engine startup: health endpoint transitions from 503 → 200 after model load
  • 12.2 Test full ingestion flow: upload PDF via API → job queued → job completes → document appears in list → chunks searchable
  • 12.3 Test note ingestion: submit note via API → job completes → note searchable
  • 12.4 Test search: hybrid search returns ranked results, filters work, fts_only/vec_only modes work
  • 12.5 Test document management: list, info, remove, tag operations via API
  • 12.6 Test job queue: multiple uploads queue correctly, failures don't block queue, duplicates are skipped
  • 12.7 Test API authentication: requests rejected without key when KB_API_KEY set, accepted with valid key, all requests pass when unset
  • 12.8 Test Docker GPU: kb doctor-style verification that GPU is accessible inside container (NVIDIA build)
  • 12.9 Test data portability: copy data directory, start engine on new container, verify all documents and search work
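
For the startup test in 12.1, a small polling helper can record the sequence of health responses until the engine reports ready. This is a sketch using only the standard library; `probe_health` and `wait_until_ready` are hypothetical names, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def probe_health(base_url: str):
    """Return the HTTP status of the health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises for non-2xx responses such as 503; keep the code.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused or similar: the server is not listening yet.
        return None


def wait_until_ready(probe, timeout: float = 120.0, interval: float = 1.0):
    """Call probe() until it returns 200; return every status seen."""
    deadline = time.monotonic() + timeout
    seen = []
    while time.monotonic() < deadline:
        status = probe()
        seen.append(status)
        if status == 200:
            return seen
        time.sleep(interval)
    raise TimeoutError(f"engine not ready after {timeout}s; statuses seen: {seen}")
```

A test could then call `wait_until_ready(lambda: probe_health(base_url))` and assert that the returned list ends in 200. Whether a 503 is actually observed depends on how quickly the model loads, so asserting on it may make the test flaky.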