kb/openspec/changes/archive/2026-03-25-kb-v2-client-server/tasks.md

## 1. Project scaffolding

- [x] 1.1 Create v2 project structure: `engine/` (Python/FastAPI) and `client/` (Go) directories at repo root
- [x] 1.2 Set up `engine/pyproject.toml` with dependencies: fastapi, uvicorn, sentence-transformers, sqlite-vec, docling, pyyaml
- [x] 1.3 Set up `client/go.mod` with dependencies: cobra, gopkg.in/yaml.v3
- [x] 1.4 Create engine entry point (`engine/main.py`) with uvicorn startup, eager model loading, and readiness gating
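
The readiness gating in 1.4 can be pictured with a small standard-library sketch. It assumes the model is loaded in a background thread while the server already answers health checks. `ReadinessGate`, `load_in_background`, and the response bodies are hypothetical names for illustration, not the engine's actual code.

```python
import threading


class ReadinessGate:
    """Tracks whether the eagerly loaded model is available yet."""

    def __init__(self):
        self._ready = threading.Event()
        self.model = None

    def mark_ready(self):
        self._ready.set()

    def health(self):
        # Mirrors the health contract: 503 during startup, 200 when ready.
        # The body shape is an assumption.
        if self._ready.is_set():
            return 200, {"status": "ready"}
        return 503, {"status": "loading"}


def load_in_background(load_model, gate):
    """Run the (slow) model loader in a thread, then open the gate."""

    def _run():
        gate.model = load_model()
        gate.mark_ready()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

A health route would return the status code from `gate.health()`, and other routes could refuse requests until the gate is open.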

## 2. Database layer

- [x] 2.1 Implement database module (`engine/kb/database.py`): connection factory with WAL mode, schema initialisation (documents, chunks, chunks_fts, chunks_vec, tags, document_tags, config tables)
- [x] 2.2 Add `jobs` table to schema: id, filename, status (queued/processing/done/failed/skipped), doc_type, tags_json, error, document_id, chunk_count, created_at, completed_at, staging_path
- [x] 2.3 Implement job CRUD functions: create_job, get_job, list_jobs, update_job_status
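
As a rough illustration of 2.1 to 2.3, the following sketch uses only the standard-library `sqlite3` module. The column names and status values come from 2.2. The column types, the UUID-based `id`, and the function signatures are assumptions, and the other tables from 2.1 are omitted.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Column names follow task 2.2; the types are assumptions.
JOBS_SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    filename TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'
        CHECK (status IN ('queued', 'processing', 'done', 'failed', 'skipped')),
    doc_type TEXT,
    tags_json TEXT,
    error TEXT,
    document_id INTEGER,
    chunk_count INTEGER,
    created_at TEXT NOT NULL,
    completed_at TEXT,
    staging_path TEXT
)
"""


def connect(path):
    """Connection factory: enable WAL and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(JOBS_SCHEMA)
    return conn


def _now():
    return datetime.now(timezone.utc).isoformat()


def create_job(conn, filename, staging_path, doc_type=None, tags=None):
    job_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO jobs (id, filename, doc_type, tags_json, created_at, staging_path)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, filename, doc_type, json.dumps(tags or []), _now(), staging_path),
    )
    conn.commit()
    return job_id


def get_job(conn, job_id):
    row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return dict(row) if row else None


def list_jobs(conn, status=None):
    if status is None:
        rows = conn.execute("SELECT * FROM jobs ORDER BY created_at").fetchall()
    else:
        rows = conn.execute(
            "SELECT * FROM jobs WHERE status = ? ORDER BY created_at", (status,)
        ).fetchall()
    return [dict(row) for row in rows]


def update_job_status(conn, job_id, status, error=None, document_id=None, chunk_count=None):
    # Terminal states also record a completion timestamp.
    completed_at = _now() if status in ("done", "failed", "skipped") else None
    conn.execute(
        "UPDATE jobs SET status = ?, error = ?, document_id = ?, chunk_count = ?,"
        " completed_at = ? WHERE id = ?",
        (status, error, document_id, chunk_count, completed_at, job_id),
    )
    conn.commit()
```

WAL mode lets readers (API requests) proceed while a single writer (the worker) commits, which suits the one-worker design in section 5.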

## 3. Embeddings and search

- [x] 3.1 Implement embeddings module (`engine/kb/embeddings.py`): model loading with device resolution (auto/cpu/cuda), embed_texts, get_model_dim — model loaded once and cached in-process
- [x] 3.2 Implement search module (`engine/kb/search.py`): FTS5 search, vector search via sqlite-vec, RRF merge, filter support (tags, doc_type, fts_only, vec_only, threshold)
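
The RRF merge in 3.2 combines the FTS5 and vector rankings by summing reciprocal ranks. The sketch below shows the general technique. The function name, the `k=60` constant (the value commonly used for reciprocal rank fusion), and the tie-breaking rule are assumptions, not necessarily what `search.py` does.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top=10):
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Each ranking is a list of IDs ordered best first. An ID's fused score is
    the sum of 1 / (k + rank) over every list it appears in (ranks start at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]
```

For example, `rrf_merge([[1, 2, 3], [3, 1, 4]])` ranks chunk 1 first, because it sits near the top of both lists, followed by 3, 2, and 4.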

## 4. Ingestion pipelines

- [x] 4.1 Implement file type detection (`engine/kb/ingest/detector.py`): extension-based detection for pdf, markdown, code, note
- [x] 4.2 Implement Docling pipeline (`engine/kb/ingest/docling.py`): PDF/DOCX conversion with AcceleratorOptions device control, hierarchical and fixed-size chunking
- [x] 4.3 Implement Markdown pipeline (`engine/kb/ingest/markdown.py`): header-based splitting with min/max token bounds
- [x] 4.4 Implement code pipeline (`engine/kb/ingest/code.py`): AST-based chunking for Python, regex-based chunking for Bash/Go, fixed-size fallback otherwise
- [x] 4.5 Implement note pipeline (`engine/kb/ingest/note.py`): whole-text chunking with auto-title
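
To make the AST-based chunking in 4.4 concrete, here is a minimal sketch using the standard-library `ast` module. It emits one chunk per top-level function or class and falls back to fixed-size slices when parsing fails or nothing is found. The function names, the chunk dictionary shape, and the character-based size limit are assumptions.

```python
import ast


def fixed_size_chunks(text, max_chars=2000):
    """Fallback: cut the text into consecutive slices of at most max_chars."""
    return [
        {"name": None, "kind": "fixed", "text": text[i:i + max_chars]}
        for i in range(0, len(text), max_chars)
    ]


def chunk_python_source(source, max_chars=2000):
    """Return one chunk per top-level function or class definition."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return fixed_size_chunks(source, max_chars)

    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text of the node
            # (decorator lines are not included).
            text = ast.get_source_segment(source, node)
            chunks.append({"name": node.name, "kind": type(node).__name__, "text": text})

    # Files with no definitions (for example, plain scripts) use the fallback.
    return chunks or fixed_size_chunks(source, max_chars)
```

Module-level statements between definitions are dropped in this sketch; a fuller pipeline would probably attach them to a separate chunk.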

## 5. Async job queue and background worker

- [x] 5.1 Implement staging manager (`engine/kb/staging.py`): write uploaded file/note to staging directory, generate staging path, cleanup after processing
- [x] 5.2 Implement background worker (`engine/kb/worker.py`): asyncio background task that polls for queued jobs, processes sequentially (detect type → chunk → embed → insert), updates job status on success/failure/skip (duplicate detection)
- [x] 5.3 Wire worker into FastAPI lifespan: start worker on app startup, graceful shutdown on app stop
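
The polling loop in 5.2 might look roughly like the sketch below. It assumes three injected callables (`fetch_next_job`, `process_job`, `mark_job`), which are hypothetical stand-ins for the database and pipeline functions, and it runs the blocking pipeline in a thread so the event loop stays responsive.

```python
import asyncio


async def run_worker(fetch_next_job, process_job, mark_job, stop_event, poll_interval=1.0):
    """Process queued jobs one at a time until stop_event is set."""
    while not stop_event.is_set():
        job = fetch_next_job()
        if job is None:
            # Queue is empty: sleep, but wake immediately on shutdown.
            try:
                await asyncio.wait_for(stop_event.wait(), timeout=poll_interval)
            except asyncio.TimeoutError:
                pass
            continue

        mark_job(job, "processing")
        try:
            # process_job is assumed to return "done" or "skipped".
            outcome = await asyncio.to_thread(process_job, job)
        except Exception as exc:
            # A failing job is recorded and the loop moves on to the next one.
            mark_job(job, "failed", error=str(exc))
        else:
            mark_job(job, outcome)
```

For 5.3, a FastAPI lifespan handler could create this coroutine as a task before its `yield`, then set `stop_event` and await the task afterwards for a graceful shutdown.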

## 6. API routes

- [x] 6.1 Implement health endpoint: `GET /api/v1/health` — returns 503 during startup, 200 when ready
- [x] 6.2 Implement search endpoint: `POST /api/v1/search` — accepts query, top, tags, doc_type, fts_only, vec_only, threshold in JSON body
- [x] 6.3 Implement ingestion endpoint: `POST /api/v1/jobs` — accepts multipart file upload or note text field with optional tags/doc_type/title metadata, writes to staging, creates job, returns 202
- [x] 6.4 Implement job status endpoints: `GET /api/v1/jobs` (list with status filter), `GET /api/v1/jobs/{id}` (details)
- [x] 6.5 Implement document endpoints: `GET /api/v1/documents` (list with filters), `GET /api/v1/documents/{id}` (details), `DELETE /api/v1/documents/{id}` (remove)
- [x] 6.6 Implement tag endpoints: `GET /api/v1/tags` (list), `PUT /api/v1/documents/{id}/tags` (add/remove)
- [x] 6.7 Implement status endpoint: `GET /api/v1/status` — model info, GPU info, DB stats, queue stats
- [x] 6.8 Implement reindex endpoint: `POST /api/v1/reindex` — re-embed all chunks with current model
- [x] 6.9 Implement API key authentication middleware: check `KB_API_KEY` env, validate Bearer token, skip when unset
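
The authentication rule in 6.9 reduces to a small, framework-independent check. The sketch below is one way to write it. The function name is hypothetical, and the real middleware would read the header from the incoming request and the key from `KB_API_KEY`.

```python
import hmac


def is_authorized(authorization_header, expected_key):
    """Return True if the request may proceed.

    expected_key is the value of KB_API_KEY, or None/empty when unset.
    """
    if not expected_key:
        # Authentication is skipped entirely when no key is configured.
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

A middleware built on this would presumably answer 401 when the function returns `False`.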

## 7. Engine configuration

- [x] 7.1 Implement config module (`engine/kb/config.py`): read all settings from environment variables (KB_DATA_DIR, KB_MODEL, KB_DEVICE, KB_INGEST_DEVICE, KB_API_KEY), apply defaults
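
A configuration loader for 7.1 could be as small as the following sketch. The environment variable names come from the task. The `EngineConfig` class and every default value shown are assumptions chosen for illustration (`/data` matches the bind mount in section 8, and `auto` matches the device options in 3.1).

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineConfig:
    data_dir: str
    model: str
    device: str
    ingest_device: str
    api_key: Optional[str]


def load_config(env=None):
    """Build the config from environment variables, applying defaults."""
    env = os.environ if env is None else env
    return EngineConfig(
        data_dir=env.get("KB_DATA_DIR", "/data"),            # assumed default
        model=env.get("KB_MODEL", "all-MiniLM-L6-v2"),       # assumed default
        device=env.get("KB_DEVICE", "auto"),                 # assumed default
        ingest_device=env.get("KB_INGEST_DEVICE", "auto"),   # assumed default
        api_key=env.get("KB_API_KEY") or None,               # empty means unset
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the process environment.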

## 8. Docker images

- [x] 8.1 Create `Dockerfile.nvidia`: CUDA runtime base, system deps (libgl1, libglib2.0, poppler), uv install, onnxruntime-gpu overlay, engine entrypoint
- [x] 8.2 Create `Dockerfile.rocm`: ROCm/PyTorch base, system deps, uv install, onnxruntime-rocm, engine entrypoint
- [x] 8.3 Create `compose.nvidia.yaml`: NVIDIA runtime, GPU reservation, bind mount for /data, environment variables, restart policy, port mapping
- [x] 8.4 Create `compose.rocm.yaml`: ROCm device passthrough (/dev/kfd, /dev/dri), bind mount, environment variables, restart policy, port mapping
- [x] 8.5 Create `.dockerignore` for engine context

## 9. Go client — project setup and config

- [x] 9.1 Initialise Cobra CLI structure: root command with `--engine`, `--format`, `--api-key` persistent flags
- [x] 9.2 Implement client config loading: read `~/.kb/client.yaml`, merge with env vars (KB_ENGINE_URL, KB_API_KEY), merge with CLI flags
- [x] 9.3 Implement HTTP client helper: base URL handling, Bearer token injection, JSON request/response helpers, error formatting for connection failures and HTTP errors

## 10. Go client — commands

- [x] 10.1 Implement `kb search <query>` command: POST to /api/v1/search, human and JSON output formatting
- [x] 10.2 Implement `kb add <path>` command: file discovery (single file, directory with --recursive), multipart upload to /api/v1/jobs, human summary output ("Queued: N files"), JSON output with job IDs
- [x] 10.3 Implement `kb add --note <text>` command: submit note via multipart to /api/v1/jobs
- [x] 10.4 Implement `kb jobs` command: list jobs (with --status filter), single job detail via `kb jobs <id>`
- [x] 10.5 Implement `kb list` command: GET /api/v1/documents with --type and --tags filters
- [x] 10.6 Implement `kb info <id>` command: GET /api/v1/documents/{id}
- [x] 10.7 Implement `kb remove <id>` command: confirmation prompt (skip with --yes), DELETE /api/v1/documents/{id}
- [x] 10.8 Implement `kb tags` command: GET /api/v1/tags
- [x] 10.9 Implement `kb tag <id>` command: --add and --remove flags, PUT /api/v1/documents/{id}/tags
- [x] 10.10 Implement `kb status` command: GET /api/v1/status with human formatting

## 11. Go client — build and distribution

- [x] 11.1 Create Makefile or build script: cross-compile for linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64
- [x] 11.2 Add version injection via `-ldflags` at build time

## 12. Integration testing

- [x] 12.1 Test engine startup: health endpoint transitions from 503 → 200 after model load
- [x] 12.2 Test full ingestion flow: upload PDF via API → job queued → job completes → document appears in list → chunks searchable
- [x] 12.3 Test note ingestion: submit note via API → job completes → note searchable
- [x] 12.4 Test search: hybrid search returns ranked results, filters work, fts_only/vec_only modes work
- [x] 12.5 Test document management: list, info, remove, tag operations via API
- [x] 12.6 Test job queue: multiple uploads queue correctly, failures don't block queue, duplicates are skipped
- [x] 12.7 Test API authentication: requests are rejected without a key when KB_API_KEY is set, accepted with a valid key, and all requests pass when it is unset
- [x] 12.8 Test Docker GPU: `kb doctor`-style verification that GPU is accessible inside container (NVIDIA build)
- [x] 12.9 Test data portability: copy the data directory, start the engine in a new container, verify that all documents are present and search works