diff --git a/specifications/SPEC.md b/specifications/SPEC.md
index edaa856..40dc0f4 100644
--- a/specifications/SPEC.md
+++ b/specifications/SPEC.md
@@ -104,6 +104,9 @@ accounts
   whitelist_in_enabled INTEGER -- 0 | 1
   whitelist_out_enabled INTEGER -- 0 | 1
   subject_regex TEXT -- nullable; blank/null = no subject filter
+  process_backlog INTEGER -- 0 | 1; baseline policy for newly-seen folders
+  -- 0 (default) = floor at current max UID
+  -- 1 = floor at 0 (process existing mail)
 
 whitelist_in
   account_id INTEGER FK
@@ -113,19 +116,26 @@ whitelist_out
   account_id INTEGER FK
   address TEXT
 
-read_pointers
+folder_state -- one row per folder ever seen
   account_id INTEGER FK
   folder TEXT
   uidvalidity INTEGER
-  last_uid INTEGER
+  floor_uid INTEGER -- nothing at or below this is ever "new"
   PRIMARY KEY (account_id, folder)
 
+acked -- individual processed UIDs above the floor
+  account_id INTEGER FK
+  folder TEXT
+  uidvalidity INTEGER
+  uid INTEGER
+  PRIMARY KEY (account_id, folder, uid)
+
 audit_log
   id INTEGER PK
   ts TEXT -- RFC3339 UTC
   account TEXT
-  action TEXT -- 'list' | 'get' | 'send'
-  target TEXT -- folder or recipient set
+  action TEXT -- 'list' | 'get' | 'send' | 'ack' | 'search'
+  target TEXT -- folder, UID(s), or recipient set
   result TEXT -- 'allowed' | 'blocked'
   reason TEXT -- nullable; populated on block
 
@@ -138,9 +148,22 @@ settings
 Notes:
 - Folders are agent-specified; there is no folder whitelist. Read state is tracked per
   `(account, folder)`.
-- `read_pointers` stores `uidvalidity`; if the server reports a different `UIDVALIDITY`
-  for a folder than the stored value, the pointer is reset (treated as `last_uid = 0`) and
-  the new `uidvalidity` recorded.
+- **"New" is a seen-set, not a watermark.** A message is "new" when it exists in the
+  folder, its `uid > floor_uid`, and its `uid` is not in `acked`. This makes
+  acknowledgement per-message and order-independent — essential when the agent fans
+  processing out to concurrent subagents that finish out of order.
+- **Floor baseline.** On first contact with a folder, `floor_uid` is set from the
+  account's `process_backlog` policy: the current highest UID (default — existing mail is
+  treated as already handled) or `0` (process the existing backlog).
+- **Compaction (internal, invisible to the agent).** When `acked` holds a contiguous run
+  of UIDs immediately above `floor_uid`, that run is collapsed: `floor_uid` advances past
+  it and the rows are deleted. This bounds storage without changing what counts as new. A
+  folder processed strictly in order degenerates to a single floor value with an empty
+  `acked` set (i.e. the watermark case); out-of-order processing leaves short-lived holes
+  above the floor until they fill in.
+- **UIDVALIDITY change.** If the server reports a different `UIDVALIDITY` for a folder than
+  the stored value, UIDs are no longer comparable: `folder_state` and `acked` for that
+  folder are reset and the floor re-baselined per `process_backlog`.
 
 ## 7. Command surface
 
@@ -148,22 +171,40 @@ Notes:
 
 All agent commands emit a single JSON object (Section 8) and nothing else on stdout.
 
-**`emcli list --account --folder [--new] [--limit N]`**
+All read commands are **stateless** — they never mutate floor or ack state. The only
+command that advances read state is `ack`.
+
+**`emcli list --account --folder [--new] [--before ] [--since ] [--limit N]`**
 - Returns message headers only: `uid`, `from`, `to`, `subject`, `date`, `message_id`,
-  `has_attachments`.
-- `--new` returns only messages with `uid` greater than the stored pointer for
-  `(account, folder)`, then advances the pointer to the highest UID returned.
-- Without `--new`, the pointer is not advanced.
-- `--limit` caps the number of messages returned (default applied if omitted; see 7.3).
+  `has_attachments`. Newest-first.
+- `--new` filters to messages that are new per the seen-set rule (`uid > floor_uid` and
+  not in `acked`). It does **not** advance any state.
+- `--before ` / `--since ` page through history by UID cursor (e.g. page older by
+  passing the lowest UID from the previous page as `--before`).
+- `--limit` caps results (default applied if omitted; see 7.3).
 - Whitelist-in and subject-regex filtering are applied before results are returned
-  (Section 9).
+  (Section 9); filtered messages are invisible in every mode.
 
 **`emcli get --account --folder --uid `**
 - Returns full message: headers, decoded plain-text body, and attachments as
   `{name, size, mime, content_b64}`.
+- Does **not** ack the message — fetching to inspect is distinct from consuming.
 - If the message is filtered by whitelist-in or subject-regex, returns an error envelope
   (not-found) — the agent cannot retrieve filtered mail.
 
+**`emcli search --account --folder [--from ] [--subject-contains ] [--text ] [--since ] [--before ] [--limit N]`**
+- Server-side IMAP SEARCH across the whole folder, regardless of floor/ack state. Returns
+  the same headers-only shape as `list`. Useful for finding specific historical mail.
+- Subject to the same inbound filtering as `list`/`get`: filtered messages never appear.
+- Stateless.
+
+**`emcli ack --account --folder --uid …`**
+- Marks one or more UIDs as processed (adds them to the `acked` set; triggers compaction).
+- Idempotent, batchable, and order-independent — safe to call from concurrent subagents.
+- After ack, those UIDs no longer appear under `list --new`.
+- A filtered/invisible UID cannot be acked (returns not-found), preventing the agent from
+  manipulating state for mail it isn't allowed to see.
+
 **`emcli send --account --to … [--cc …] [--bcc …] --subject --body [--attach ]… [--reply-to ]`**
 - Sends a plain-text message via the account's SMTP endpoint.
 - `--reply-to ` fetches the source message's `Message-ID` and `References` and sets
@@ -178,7 +219,9 @@ All agent commands emit a single JSON object (Section 8) and nothing else on std
 - **`emcli init`** — TUI flow: creates the DB (generating schema), adds the first account,
   and runs OAuth consent if the account is OAuth2.
 - **`emcli account add | edit | remove | list`** — interactive add/edit; `list` prints a
-  table (never secrets).
+  table (never secrets). `account add` accepts `--process-backlog` (default off) which sets
+  the account's baseline policy: off ⇒ newly-seen folders floor at their current max UID
+  (existing mail treated as handled); on ⇒ floor at 0 (existing mail is processed).
 - **`emcli whitelist in|out add|remove|list --account `** — manage whitelist entries.
 - **`emcli config set|get`** — global settings (e.g. `audit_retention_days`).
 - **`emcli audit list [--account ] [--limit N]`** — view recent audit entries.
@@ -213,12 +256,12 @@ Every agent command prints exactly one object:
 
 Enforcement lives entirely in the `policy` package and is exercised on every agent action.
 
-### Inbound (read: `list`, `get`)
+### Inbound (read: `list`, `get`, `search`, `ack`)
 - If `whitelist_in_enabled`, the message sender must match a `whitelist_in` entry.
 - If `subject_regex` is set (non-empty), the subject must match the regex.
-- A message that fails either check is **invisible**: excluded from `list` results and
-  not retrievable via `get` (returns not-found). The agent has no way to learn that the
-  message exists.
+- A message that fails either check is **invisible**: excluded from `list` and `search`
+  results, not retrievable via `get`, and not ackable via `ack` (all return not-found). The
+  agent has no way to learn that the message exists or to alter read state for it.
 
 ### Outbound (send)
 - If account `mode` is `RO`, `send` is rejected.
@@ -231,7 +274,8 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
 - Any other entry matches a full address exactly.
 
 ### Audit
-- Every action (`list`, `get`, `send`), allowed or blocked, writes one `audit_log` row.
+- Every action (`list`, `get`, `search`, `ack`, `send`), allowed or blocked, writes one
+  `audit_log` row.
 - Blocked actions record a `reason` (e.g. `ro_mode`, `whitelist_out`, `filtered`).
 - On each run that opens the DB, audit rows older than `audit_retention_days` are purged.
 
@@ -260,7 +304,9 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
   whitelist-out × subject-regex, including domain-match and case-insensitivity, and the
   "any recipient fails ⇒ whole send blocked" rule.
 - **`store`** — encryption round-trip; decryption with the wrong key fails closed; schema
-  migration; pointer reset on `UIDVALIDITY` change.
+  migration; floor baseline per `process_backlog`; seen-set membership (`uid > floor` and
+  not acked); out-of-order ack correctness; compaction collapses contiguous runs into the
+  floor; `folder_state`/`acked` reset on `UIDVALIDITY` change.
 - **`mail`** — integration tests against a containerized IMAP/SMTP server (e.g. GreenMail
   or Dovecot) for list/get/send and threading headers.
 - **`oauth`** — token exchange/refresh against a mocked authorization server; loopback
@@ -274,4 +320,5 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
 - Additional auth mechanisms (e.g. OAuth for non-Google providers) follow the same model.
 - Whitelist semantics are currently per-account only; global defaults with overrides are
   explicitly out of scope for v1.
-```
+- No `unack` / re-process command in v1; the agent acks deliberately and acks are final
+  (short of an admin reset). Add if a re-processing workflow proves necessary.
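
Illustrative note on the read-state rules this diff introduces (floor baseline, seen-set membership, compaction, and `UIDVALIDITY` reset). The sketch below is a minimal in-memory model, not the project's implementation: `FolderState`, `baseline_floor`, `is_new`, `ack`, and `on_uidvalidity` are hypothetical names, the real state lives in the `folder_state` and `acked` tables, and the inbound visibility check that `ack` must perform is omitted. It also assumes that an empty folder baselines to `0` and that "contiguous" means numerically consecutive UIDs; the spec text does not settle either point.

```python
from dataclasses import dataclass, field


@dataclass
class FolderState:
    """Hypothetical stand-in for one folder_state row plus its acked rows."""
    uidvalidity: int
    floor_uid: int
    acked: set[int] = field(default_factory=set)


def baseline_floor(existing_uids, process_backlog):
    """Floor baseline: 0 to process the backlog, else the current highest UID.

    Assumption: an empty folder also baselines to 0.
    """
    if process_backlog or not existing_uids:
        return 0
    return max(existing_uids)


def is_new(state, uid):
    """Seen-set rule: above the floor and not individually acked."""
    return uid > state.floor_uid and uid not in state.acked


def ack(state, uids):
    """Record UIDs as processed, then compact. Idempotent and order-independent."""
    # UIDs at or below the floor already count as handled, so they are ignored.
    state.acked.update(u for u in uids if u > state.floor_uid)
    # Compaction: absorb the contiguous run immediately above the floor.
    while state.floor_uid + 1 in state.acked:
        state.floor_uid += 1
        state.acked.discard(state.floor_uid)


def on_uidvalidity(state, server_uidvalidity, existing_uids, process_backlog):
    """Reset and re-baseline when the server reports a different UIDVALIDITY."""
    if server_uidvalidity != state.uidvalidity:
        state.uidvalidity = server_uidvalidity
        state.acked.clear()
        state.floor_uid = baseline_floor(existing_uids, process_backlog)
```

With a folder holding UIDs 1 to 3 and `process_backlog` off, the floor starts at 3. Acking 5 and 6 first leaves a hole at 4, so the floor stays at 3 and `acked` holds both; acking 4 then fills the hole, the floor advances to 6, and `acked` empties, which is the watermark case the Notes describe.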