9aab79d49b
- Remove v1 Python CLI (src/kb_search/, tests/, root pyproject.toml, uv.lock, .venv)
- Add Go client with cross-platform build (client/)
- Add FastAPI engine with NVIDIA and multi-stage ROCm Dockerfiles (engine/)
- Add VERSION files for client and engine, wired into builds
- Add release.sh for automated build, tag, release, and Docker push
- Update README with build/release docs and ROCm migration note
- Clean up .gitignore for v2 project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7.3 KiB
1. Project scaffolding
- 1.1 Create v2 project structure: `engine/` (Python/FastAPI) and `client/` (Go) directories at repo root
- 1.2 Set up `engine/pyproject.toml` with dependencies: fastapi, uvicorn, sentence-transformers, sqlite-vec, docling, pyyaml
- 1.3 Set up `client/go.mod` with dependencies: cobra, gopkg.in/yaml.v3
- 1.4 Create engine entry point (`engine/main.py`) with uvicorn startup, eager model loading, and readiness gating
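
The readiness gating in 1.4 could look roughly like the sketch below. It is a minimal illustration only: `ReadinessGate`, `load_model`, and `health` are hypothetical names, and the real entry point would wire the same idea into FastAPI and uvicorn rather than return plain tuples.

```python
import threading


class ReadinessGate:
    """Hypothetical helper: tracks whether eager model loading has finished."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self.model = None

    def load_model(self, loader) -> None:
        # Run the expensive load once at startup, then open the gate.
        self.model = loader()
        self._ready.set()

    def health(self) -> tuple:
        # 503 while the model is still loading, 200 once it is available.
        if self._ready.is_set():
            return 200, {"status": "ready"}
        return 503, {"status": "starting"}
```

A health route would simply return the status code and body produced by `health()`.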
2. Database layer
- 2.1 Implement database module (`engine/kb/database.py`): connection factory with WAL mode, schema initialisation (documents, chunks, chunks_fts, chunks_vec, tags, document_tags, config tables)
- 2.2 Add `jobs` table to schema: id, filename, status (queued/processing/done/failed/skipped), doc_type, tags_json, error, document_id, chunk_count, created_at, completed_at, staging_path
- 2.3 Implement job CRUD functions: create_job, get_job, list_jobs, update_job_status
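
As a rough sketch of 2.1 to 2.3, the following uses only the standard-library `sqlite3` module. The column names and function names come from the tasks above; the column types, the integer primary key, the `CHECK` constraint, and the ISO-8601 timestamps are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Column names follow task 2.2; types and constraints are assumed.
JOBS_SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    filename     TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'queued'
                 CHECK (status IN ('queued', 'processing', 'done', 'failed', 'skipped')),
    doc_type     TEXT,
    tags_json    TEXT,
    error        TEXT,
    document_id  INTEGER,
    chunk_count  INTEGER,
    created_at   TEXT NOT NULL,
    completed_at TEXT,
    staging_path TEXT
);
"""

FINAL_STATES = ("done", "failed", "skipped")


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def connect(path: str) -> sqlite3.Connection:
    """Connection factory: enable WAL mode and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(JOBS_SCHEMA)
    return conn


def create_job(conn, filename, staging_path, doc_type=None, tags_json=None) -> int:
    cur = conn.execute(
        "INSERT INTO jobs (filename, doc_type, tags_json, created_at, staging_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (filename, doc_type, tags_json, _now(), staging_path),
    )
    conn.commit()
    return cur.lastrowid


def get_job(conn, job_id):
    return conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()


def list_jobs(conn, status=None):
    if status is None:
        return conn.execute("SELECT * FROM jobs ORDER BY id").fetchall()
    return conn.execute(
        "SELECT * FROM jobs WHERE status = ? ORDER BY id", (status,)
    ).fetchall()


def update_job_status(conn, job_id, status, error=None) -> None:
    # Stamp completed_at only when the job reaches a final state.
    completed_at = _now() if status in FINAL_STATES else None
    conn.execute(
        "UPDATE jobs SET status = ?, error = ?, completed_at = ? WHERE id = ?",
        (status, error, completed_at, job_id),
    )
    conn.commit()
```

WAL mode needs a file-backed database; an in-memory connection reports `memory` as its journal mode instead.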
3. Embeddings and search
- 3.1 Implement embeddings module (`engine/kb/embeddings.py`): model loading with device resolution (auto/cpu/cuda), embed_texts, get_model_dim; model loaded once and cached in-process
- 3.2 Implement search module (`engine/kb/search.py`): FTS5 search, vector search via sqlite-vec, RRF merge, filter support (tags, doc_type, fts_only, vec_only, threshold)
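
The RRF (reciprocal rank fusion) merge in 3.2 combines the FTS5 and vector rankings by summing `1 / (k + rank)` per chunk. The sketch below shows the idea in plain Python; the function name `rrf_merge`, the constant `k=60` (a commonly used value), and the tie-break on chunk id are assumptions, not details taken from the project.

```python
def rrf_merge(rankings, k=60, top=10):
    """Fuse several ranked lists of chunk ids with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns a list of (chunk_id, score) pairs, best-first.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]


fts_hits = [1, 2, 3]   # chunk ids from full-text search, best first
vec_hits = [3, 1, 4]   # chunk ids from vector search, best first
merged_ids = [chunk_id for chunk_id, _ in rrf_merge([fts_hits, vec_hits])]
# merged_ids == [1, 3, 2, 4]
```

Chunks that appear in both lists (1 and 3) rise above chunks that appear in only one, which is the main reason to fuse ranks rather than raw scores from two different scales.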
4. Ingestion pipelines
- 4.1 Implement file type detection (`engine/kb/ingest/detector.py`): extension-based detection for pdf, markdown, code, note
- 4.2 Implement Docling pipeline (`engine/kb/ingest/docling.py`): PDF/DOCX conversion with AcceleratorOptions device control, hierarchy and fixed chunking
- 4.3 Implement Markdown pipeline (`engine/kb/ingest/markdown.py`): header-based splitting with min/max token bounds
- 4.4 Implement code pipeline (`engine/kb/ingest/code.py`): AST-based chunking for Python, regex for Bash/Go, fallback fixed-size
- 4.5 Implement note pipeline (`engine/kb/ingest/note.py`): whole-text chunking with auto-title
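
Two of the simpler pieces above, extension-based detection (4.1) and AST-based chunking of Python sources (4.4), might be sketched as follows. The extension map, the `note` fallback for unknown extensions, and the one-chunk-per-top-level-definition rule are assumptions for illustration; the tasks do not specify them.

```python
import ast
from pathlib import Path

# Hypothetical extension map; the real detector may cover more types.
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".py": "code",
    ".sh": "code",
    ".go": "code",
    ".txt": "note",
}


def detect_type(path, default="note"):
    """Map a file path to a document type using only its extension."""
    return EXTENSION_TYPES.get(Path(path).suffix.lower(), default)


def chunk_python_source(source):
    """Return one chunk per top-level function or class in a Python source."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based and end_lineno is inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({"name": node.name, "text": text})
    return chunks
```

A source that fails to parse raises `SyntaxError` from `ast.parse`, which is a natural point to switch to the fixed-size fallback mentioned in 4.4.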
5. Async job queue and background worker
- 5.1 Implement staging manager (`engine/kb/staging.py`): write uploaded file/note to staging directory, generate staging path, cleanup after processing
- 5.2 Implement background worker (`engine/kb/worker.py`): asyncio background task that polls for queued jobs, processes sequentially (detect type → chunk → embed → insert), updates job status on success/failure/skip (duplicate detection)
- 5.3 Wire worker into FastAPI lifespan: start worker on app startup, graceful shutdown on app stop
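
The polling loop in 5.2 and the shutdown behaviour in 5.3 can be illustrated with a small asyncio sketch. Everything here is hypothetical scaffolding: the job store is a plain list of dicts standing in for the `jobs` table, `process` stands in for the detect/chunk/embed/insert pipeline, and the `"duplicate"` return value is an assumed way of signalling a skip.

```python
import asyncio


async def run_worker(store, process, stop, poll_interval=0.01):
    """Poll `store` for queued jobs and process them one at a time.

    store:   list of job dicts with a "status" key (stand-in for the jobs table)
    process: async callable taking a job; returns "duplicate" to signal a skip
    stop:    asyncio.Event set by the application on shutdown
    """
    while not stop.is_set():
        job = next((j for j in store if j["status"] == "queued"), None)
        if job is None:
            # Nothing queued: sleep briefly, then poll again.
            await asyncio.sleep(poll_interval)
            continue
        job["status"] = "processing"
        try:
            result = await process(job)
            job["status"] = "skipped" if result == "duplicate" else "done"
        except Exception as exc:
            # A failed job is recorded and the loop moves on to the next one.
            job["status"] = "failed"
            job["error"] = str(exc)


async def demo():
    async def process(job):
        if job["filename"] == "bad.pdf":
            raise ValueError("parse error")
        return "duplicate" if job["filename"] == "dup.md" else "ok"

    store = [{"filename": n, "status": "queued"} for n in ("a.md", "bad.pdf", "dup.md")]
    stop = asyncio.Event()
    task = asyncio.create_task(run_worker(store, process, stop))
    while any(j["status"] in ("queued", "processing") for j in store):
        await asyncio.sleep(0.01)
    stop.set()      # graceful shutdown: let the loop observe the flag and exit
    await task
    return store
```

In a FastAPI lifespan handler, the `create_task` call would sit before the `yield` and the `stop.set()` plus `await task` after it.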
6. API routes
- 6.1 Implement health endpoint: `GET /api/v1/health`: returns 503 during startup, 200 when ready
- 6.2 Implement search endpoint: `POST /api/v1/search`: accepts query, top, tags, doc_type, fts_only, vec_only, threshold in JSON body
- 6.3 Implement ingestion endpoint: `POST /api/v1/jobs`: accepts multipart file upload or note text field with optional tags/doc_type/title metadata, writes to staging, creates job, returns 202
- 6.4 Implement job status endpoints: `GET /api/v1/jobs` (list with status filter), `GET /api/v1/jobs/{id}` (details)
- 6.5 Implement document endpoints: `GET /api/v1/documents` (list with filters), `GET /api/v1/documents/{id}` (details), `DELETE /api/v1/documents/{id}` (remove)
- 6.6 Implement tag endpoints: `GET /api/v1/tags` (list), `PUT /api/v1/documents/{id}/tags` (add/remove)
- 6.7 Implement status endpoint: `GET /api/v1/status`: model info, GPU info, DB stats, queue stats
- 6.8 Implement reindex endpoint: `POST /api/v1/reindex`: re-embed all chunks with current model
- 6.9 Implement API key authentication middleware: check `KB_API_KEY` env, validate Bearer token, skip when unset
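
The decision logic behind the authentication middleware in 6.9 is small enough to show on its own. This is a sketch of the check only, kept free of FastAPI so it runs standalone; `is_authorized` is a hypothetical helper name, and the constant-time comparison is a suggested precaution rather than something the task list requires.

```python
import hmac


def is_authorized(authorization_header, api_key):
    """Decide whether a request may proceed.

    authorization_header: value of the Authorization header, or None
    api_key:              value of KB_API_KEY, or None/"" when unset
    """
    if not api_key:
        # KB_API_KEY unset: authentication is skipped entirely.
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), api_key.encode())
```

A middleware built on this would return 401 when the function yields `False` and pass the request through otherwise.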
7. Engine configuration
- 7.1 Implement config module (`engine/kb/config.py`): read all settings from environment variables (KB_DATA_DIR, KB_MODEL, KB_DEVICE, KB_INGEST_DEVICE, KB_API_KEY), apply defaults
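
A config module along these lines could be as small as the sketch below. The environment variable names are the ones listed in 7.1; the `Settings` dataclass, the `load_settings` function, and every default value shown are placeholders chosen for illustration, since the task list does not state the actual defaults.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables.

    Passing `env` explicitly keeps the function easy to test;
    it falls back to os.environ when omitted.
    """
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("KB_DATA_DIR", "/data"),            # assumed default
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),       # placeholder model name
        device=env.get("KB_DEVICE", "auto"),                 # assumed default
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),   # assumed default
        api_key=env.get("KB_API_KEY") or None,               # empty string counts as unset
    )
```

Treating an empty `KB_API_KEY` as unset keeps a blank entry in a compose file from silently enabling authentication with an empty key.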
8. Docker images
- 8.1 Create `Dockerfile.nvidia`: CUDA runtime base, system deps (libgl1, libglib2.0, poppler), uv install, onnxruntime-gpu overlay, engine entrypoint
- 8.2 Create `Dockerfile.rocm`: ROCm/PyTorch base, system deps, uv install, onnxruntime-rocm, engine entrypoint
- 8.3 Create `compose.nvidia.yaml`: NVIDIA runtime, GPU reservation, bind mount for /data, environment variables, restart policy, port mapping
- 8.4 Create `compose.rocm.yaml`: ROCm device passthrough (/dev/kfd, /dev/dri), bind mount, environment variables, restart policy, port mapping
- 8.5 Create `.dockerignore` for engine context
9. Go client — project setup and config
- 9.1 Initialise Cobra CLI structure: root command with `--engine`, `--format`, `--api-key` persistent flags
- 9.2 Implement client config loading: read `~/.kb/client.yaml`, merge with env vars (KB_ENGINE_URL, KB_API_KEY), merge with CLI flags
- 9.3 Implement HTTP client helper: base URL handling, Bearer token injection, JSON request/response helpers, error formatting for connection failures and HTTP errors
10. Go client — commands
- 10.1 Implement `kb search <query>` command: POST to /api/v1/search, human and JSON output formatting
- 10.2 Implement `kb add <path>` command: file discovery (single file, directory with --recursive), multipart upload to /api/v1/jobs, human summary output ("Queued: N files"), JSON output with job IDs
- 10.3 Implement `kb add --note <text>` command: submit note via multipart to /api/v1/jobs
- 10.4 Implement `kb jobs` command: list jobs (with --status filter), single job detail via `kb jobs <id>`
- 10.5 Implement `kb list` command: GET /api/v1/documents with --type and --tags filters
- 10.6 Implement `kb info <id>` command: GET /api/v1/documents/{id}
- 10.7 Implement `kb remove <id>` command: confirmation prompt (skip with --yes), DELETE /api/v1/documents/{id}
- 10.8 Implement `kb tags` command: GET /api/v1/tags
- 10.9 Implement `kb tag <id>` command: --add and --remove flags, PUT /api/v1/documents/{id}/tags
- 10.10 Implement `kb status` command: GET /api/v1/status with human formatting
11. Go client — build and distribution
- 11.1 Create Makefile or build script: cross-compile for linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64
- 11.2 Add version injection via `-ldflags` at build time
12. Integration testing
- 12.1 Test engine startup: health endpoint transitions from 503 → 200 after model load
- 12.2 Test full ingestion flow: upload PDF via API → job queued → job completes → document appears in list → chunks searchable
- 12.3 Test note ingestion: submit note via API → job completes → note searchable
- 12.4 Test search: hybrid search returns ranked results, filters work, fts_only/vec_only modes work
- 12.5 Test document management: list, info, remove, tag operations via API
- 12.6 Test job queue: multiple uploads queue correctly, failures don't block queue, duplicates are skipped
- 12.7 Test API authentication: requests rejected without key when KB_API_KEY set, accepted with valid key, all requests pass when unset
- 12.8 Test Docker GPU: `kb doctor`-style verification that GPU is accessible inside container (NVIDIA build)
- 12.9 Test data portability: copy data directory, start engine on new container, verify all documents and search work