Spec: replace read pointer with per-message seen-set model

Reading state is now a per-(account,folder) floor plus an acked set of
UIDs above it, instead of a single monotonic pointer. This makes
acknowledgement per-message and order-independent so concurrent
subagents can process and ack out of order. Internal compaction collapses
contiguous acked runs into the floor to bound storage. Adds a stateless
search command and an ack command; reads no longer mutate state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 21:01:34 +01:00
parent 79b62b24c2
commit e3f8afbc7c
+69 -22
@@ -104,6 +104,9 @@ accounts
whitelist_in_enabled INTEGER -- 0 | 1
whitelist_out_enabled INTEGER -- 0 | 1
subject_regex TEXT -- nullable; blank/null = no subject filter
process_backlog INTEGER -- 0 | 1; baseline policy for newly-seen folders
-- 0 (default) = floor at current max UID
-- 1 = floor at 0 (process existing mail)
whitelist_in
account_id INTEGER FK
@@ -113,19 +116,26 @@ whitelist_out
account_id INTEGER FK
address TEXT
read_pointers
folder_state -- one row per folder ever seen
account_id INTEGER FK
folder TEXT
uidvalidity INTEGER
last_uid INTEGER
floor_uid INTEGER -- nothing at or below this is ever "new"
PRIMARY KEY (account_id, folder)
acked -- individual processed UIDs above the floor
account_id INTEGER FK
folder TEXT
uidvalidity INTEGER
uid INTEGER
PRIMARY KEY (account_id, folder, uid)
audit_log
id INTEGER PK
ts TEXT -- RFC3339 UTC
account TEXT
action TEXT -- 'list' | 'get' | 'send'
target TEXT -- folder or recipient set
action TEXT -- 'list' | 'get' | 'send' | 'ack' | 'search'
target TEXT -- folder, UID(s), or recipient set
result TEXT -- 'allowed' | 'blocked'
reason TEXT -- nullable; populated on block
@@ -138,9 +148,22 @@ settings
Notes:
- Folders are agent-specified; there is no folder whitelist. Read state is tracked per
`(account, folder)`.
- `read_pointers` stores `uidvalidity`; if the server reports a different `UIDVALIDITY`
for a folder than the stored value, the pointer is reset (treated as `last_uid = 0`) and
the new `uidvalidity` recorded.
- **"New" is a seen-set, not a watermark.** A message is "new" when it exists in the
folder, its `uid > floor_uid`, and its `uid` is not in `acked`. This makes
acknowledgement per-message and order-independent — essential when the agent fans
processing out to concurrent subagents that finish out of order.
- **Floor baseline.** On first contact with a folder, `floor_uid` is set from the
account's `process_backlog` policy: the current highest UID (default — existing mail is
treated as already handled) or `0` (process the existing backlog).
- **Compaction (internal, invisible to the agent).** When `acked` holds a contiguous run
of UIDs immediately above `floor_uid`, that run is collapsed: `floor_uid` advances past
it and the rows are deleted. This bounds storage without changing what counts as new. A
folder processed strictly in order degenerates to a single floor value with an empty
`acked` set (i.e. the watermark case); out-of-order processing leaves short-lived holes
above the floor until they fill in.
- **UIDVALIDITY change.** If the server reports a different `UIDVALIDITY` for a folder than
the stored value, UIDs are no longer comparable: `folder_state` and `acked` for that
folder are reset and the floor re-baselined per `process_backlog`.
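The seen-set rule, compaction, and the `UIDVALIDITY` reset described in these notes can be sketched in memory as follows. This is an illustrative model only, not the project's implementation: `FolderState`, `baseline_floor`, and `reconcile` are hypothetical names, and the sketch treats "contiguous" as numerically consecutive UIDs, as the note reads.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class FolderState:
    """In-memory stand-in for one folder_state row plus its acked rows."""
    uidvalidity: int
    floor_uid: int
    acked: Set[int] = field(default_factory=set)

    def is_new(self, uid: int) -> bool:
        # Seen-set rule: above the floor and not individually acked.
        return uid > self.floor_uid and uid not in self.acked

    def ack(self, uids: Iterable[int]) -> None:
        # UIDs at or below the floor are already covered, so they are not stored.
        self.acked.update(u for u in uids if u > self.floor_uid)
        self._compact()

    def _compact(self) -> None:
        # Collapse the acked run sitting immediately above the floor.
        while self.floor_uid + 1 in self.acked:
            self.floor_uid += 1
            self.acked.remove(self.floor_uid)


def baseline_floor(process_backlog: bool, current_max_uid: int) -> int:
    # process_backlog = 1 floors at 0; the default floors at the current max UID.
    return 0 if process_backlog else current_max_uid


def reconcile(state: Optional[FolderState], server_uidvalidity: int,
              process_backlog: bool, current_max_uid: int) -> FolderState:
    # First contact or a changed UIDVALIDITY: drop acks and re-baseline the floor.
    if state is not None and state.uidvalidity == server_uidvalidity:
        return state
    floor = baseline_floor(process_backlog, current_max_uid)
    return FolderState(uidvalidity=server_uidvalidity, floor_uid=floor)
```

With a floor of 10, acking 12 and 13 leaves a hole at 11; acking 11 afterwards moves the floor to 13 and empties the acked set, which is the watermark case the compaction note mentions.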
## 7. Command surface
@@ -148,22 +171,40 @@ Notes:
All agent commands emit a single JSON object (Section 8) and nothing else on stdout.
**`emcli list --account <name> --folder <folder> [--new] [--limit N]`**
All read commands are **stateless** — they never mutate floor or ack state. The only
command that advances read state is `ack`.
**`emcli list --account <name> --folder <folder> [--new] [--before <uid>] [--since <uid>] [--limit N]`**
- Returns message headers only: `uid`, `from`, `to`, `subject`, `date`, `message_id`,
`has_attachments`.
- `--new` returns only messages with `uid` greater than the stored pointer for
`(account, folder)`, then advances the pointer to the highest UID returned.
- Without `--new`, the pointer is not advanced.
- `--limit` caps the number of messages returned (default applied if omitted; see 7.3).
`has_attachments`. Newest-first.
- `--new` filters to messages that are new per the seen-set rule (`uid > floor_uid` and
not in `acked`). It does **not** advance any state.
- `--before <uid>` / `--since <uid>` page through history by UID cursor (e.g. page older by
passing the lowest UID from the previous page as `--before`).
- `--limit` caps results (default applied if omitted; see 7.3).
- Whitelist-in and subject-regex filtering are applied before results are returned
(Section 9).
(Section 9); filtered messages are invisible in every mode.
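As a rough illustration of how the `list` flags could combine, the sketch below selects UIDs for one page as a pure function, so it cannot mutate floor or ack state. `select_uids` is a hypothetical helper; treating `--before` and `--since` as exclusive bounds and using 50 as the default limit are assumptions, since this section does not pin them down.

```python
from typing import Iterable, List, Optional, Set


def select_uids(
    visible_uids: Iterable[int],   # UIDs that already passed inbound filtering
    floor_uid: int,
    acked: Set[int],
    new_only: bool = False,        # --new
    before: Optional[int] = None,  # --before <uid>, assumed exclusive
    since: Optional[int] = None,   # --since <uid>, assumed exclusive
    limit: int = 50,               # placeholder default; the real one is in 7.3
) -> List[int]:
    """Pick the UIDs for one `list` page without touching any state."""
    picked: List[int] = []
    for uid in sorted(visible_uids, reverse=True):  # newest-first
        if len(picked) >= limit:
            break
        if new_only and (uid <= floor_uid or uid in acked):
            continue  # not new under the seen-set rule
        if before is not None and uid >= before:
            continue
        if since is not None and uid <= since:
            continue
        picked.append(uid)
    return picked
```

Paging older then means passing the lowest UID of the previous page as `before`: a first page of `[10, 9, 8]` is followed by `select_uids(..., before=8)`.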
**`emcli get --account <name> --folder <folder> --uid <uid>`**
- Returns full message: headers, decoded plain-text body, and attachments as
`{name, size, mime, content_b64}`.
- Does **not** ack the message — fetching to inspect is distinct from consuming.
- If the message is filtered by whitelist-in or subject-regex, returns an error envelope
(not-found) — the agent cannot retrieve filtered mail.
**`emcli search --account <name> --folder <folder> [--from <addr>] [--subject-contains <s>] [--text <s>] [--since <date>] [--before <date>] [--limit N]`**
- Server-side IMAP SEARCH across the whole folder, regardless of floor/ack state. Returns
the same headers-only shape as `list`. Useful for finding specific historical mail.
- Subject to the same inbound filtering as `list`/`get`: filtered messages never appear.
- Stateless.
**`emcli ack --account <name> --folder <folder> --uid <uid>…`**
- Marks one or more UIDs as processed (adds them to the `acked` set; triggers compaction).
- Idempotent, batchable, and order-independent — safe to call from concurrent subagents.
- After ack, those UIDs no longer appear under `list --new`.
- A filtered/invisible UID cannot be acked (returns not-found), preventing the agent from
manipulating state for mail it isn't allowed to see.
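A minimal sketch of an idempotent, batchable `ack` against the `folder_state` and `acked` tables, assuming SQLite as the backing store (the spec does not name the engine here). The DDL, the `ack` function, and the `is_visible` callback are hypothetical, and rejecting the whole batch when any UID is invisible is also an assumption.

```python
import sqlite3

# Hypothetical DDL mirroring the spec's folder_state and acked tables.
SCHEMA = """
CREATE TABLE folder_state (
    account_id INTEGER, folder TEXT, uidvalidity INTEGER, floor_uid INTEGER,
    PRIMARY KEY (account_id, folder));
CREATE TABLE acked (
    account_id INTEGER, folder TEXT, uidvalidity INTEGER, uid INTEGER,
    PRIMARY KEY (account_id, folder, uid));
"""


def ack(db, account_id, folder, uids, is_visible):
    """Ack a batch of UIDs; returns "ok" or "not_found"."""
    # Assumption: one filtered/invisible UID rejects the whole batch.
    if not all(is_visible(uid) for uid in uids):
        return "not_found"
    key = (account_id, folder)
    with db:  # commits on success, rolls back on an exception
        db.execute("BEGIN IMMEDIATE")  # take the write lock before reading the floor
        row = db.execute(
            "SELECT uidvalidity, floor_uid FROM folder_state"
            " WHERE account_id = ? AND folder = ?", key).fetchone()
        if row is None:
            return "not_found"
        uidvalidity, floor = row
        # INSERT OR IGNORE makes repeated acks of the same UID a no-op.
        db.executemany(
            "INSERT OR IGNORE INTO acked VALUES (?, ?, ?, ?)",
            [(account_id, folder, uidvalidity, u) for u in uids if u > floor])
        # Compaction: walk the contiguous run immediately above the floor.
        while db.execute(
                "SELECT 1 FROM acked WHERE account_id = ? AND folder = ? AND uid = ?",
                key + (floor + 1,)).fetchone():
            floor += 1
        db.execute(
            "DELETE FROM acked WHERE account_id = ? AND folder = ? AND uid <= ?",
            key + (floor,))
        db.execute(
            "UPDATE folder_state SET floor_uid = ? WHERE account_id = ? AND folder = ?",
            (floor,) + key)
    return "ok"
```

Doing the insert and the compaction in one transaction is a design choice of this sketch: concurrent subagents then never observe a floor that has advanced while the rows it absorbed are still present.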
**`emcli send --account <name> --to <addr>… [--cc <addr>…] [--bcc <addr>…] --subject <s> --body <text> [--attach <path>]… [--reply-to <uid>]`**
- Sends a plain-text message via the account's SMTP endpoint.
- `--reply-to <uid>` fetches the source message's `Message-ID` and `References` and sets
@@ -178,7 +219,9 @@ All agent commands emit a single JSON object (Section 8) and nothing else on std
- **`emcli init`** — TUI flow: creates the DB (generating schema), adds the first account,
and runs OAuth consent if the account is OAuth2.
- **`emcli account add | edit | remove | list`** — interactive add/edit; `list` prints a
table (never secrets).
table (never secrets). `account add` accepts `--process-backlog` (default off) which sets
the account's baseline policy: off ⇒ newly-seen folders floor at their current max UID
(existing mail treated as handled); on ⇒ floor at 0 (existing mail is processed).
- **`emcli whitelist in|out add|remove|list --account <name>`** — manage whitelist entries.
- **`emcli config set|get`** — global settings (e.g. `audit_retention_days`).
- **`emcli audit list [--account <name>] [--limit N]`** — view recent audit entries.
@@ -213,12 +256,12 @@ Every agent command prints exactly one object:
Enforcement lives entirely in the `policy` package and is exercised on every agent action.
### Inbound (read: `list`, `get`)
### Inbound (read: `list`, `get`, `search`, `ack`)
- If `whitelist_in_enabled`, the message sender must match a `whitelist_in` entry.
- If `subject_regex` is set (non-empty), the subject must match the regex.
- A message that fails either check is **invisible**: excluded from `list` results and
not retrievable via `get` (returns not-found). The agent has no way to learn that the
message exists.
- A message that fails either check is **invisible**: excluded from `list` and `search`
results, not retrievable via `get`, and not ackable via `ack` (`get` and `ack` return
not-found). The agent has no way to learn that the message exists or to alter read state
for it.
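The two inbound checks can be sketched as a single predicate that every read-side command consults. `InboundPolicy` and `is_visible` are hypothetical names, not the `policy` package's API. The sketch only does case-insensitive full-address matching (the domain-entry rule is not shown in this excerpt), and using an unanchored `re.search` for "must match" is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class InboundPolicy:
    """Hypothetical view of the inbound columns on an account row."""
    whitelist_in_enabled: bool = False
    whitelist_in: Set[str] = field(default_factory=set)
    subject_regex: Optional[str] = None  # blank/null = no subject filter


def is_visible(policy: InboundPolicy, sender: str, subject: str) -> bool:
    """True if the message may be listed, fetched, searched, or acked."""
    if policy.whitelist_in_enabled:
        allowed = {entry.lower() for entry in policy.whitelist_in}
        if sender.lower() not in allowed:
            return False
    if policy.subject_regex:
        # Assumption: an unanchored search counts as a match.
        if re.search(policy.subject_regex, subject) is None:
            return False
    return True
```

Routing `list`, `get`, `search`, and `ack` through the same predicate is what keeps a filtered message invisible in every mode.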
### Outbound (send)
- If account `mode` is `RO`, `send` is rejected.
@@ -231,7 +274,8 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
- Any other entry matches a full address exactly.
### Audit
- Every action (`list`, `get`, `send`), allowed or blocked, writes one `audit_log` row.
- Every action (`list`, `get`, `search`, `ack`, `send`), allowed or blocked, writes one
`audit_log` row.
- Blocked actions record a `reason` (e.g. `ro_mode`, `whitelist_out`, `filtered`).
- On each run that opens the DB, audit rows older than `audit_retention_days` are purged.
@@ -260,7 +304,9 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
whitelist-out × subject-regex, including domain-match and case-insensitivity, and the
"any recipient fails ⇒ whole send blocked" rule.
- **`store`** — encryption round-trip; decryption with the wrong key fails closed; schema
migration; pointer reset on `UIDVALIDITY` change.
migration; floor baseline per `process_backlog`; seen-set membership (`uid > floor` and
not acked); out-of-order ack correctness; compaction collapses contiguous runs into the
floor; `folder_state`/`acked` reset on `UIDVALIDITY` change.
- **`mail`** — integration tests against a containerized IMAP/SMTP server (e.g. GreenMail
or Dovecot) for list/get/send and threading headers.
- **`oauth`** — token exchange/refresh against a mocked authorization server; loopback
@@ -274,4 +320,5 @@ Enforcement lives entirely in the `policy` package and is exercised on every age
- Additional auth mechanisms (e.g. OAuth for non-Google providers) follow the same model.
- Whitelist semantics are currently per-account only; global defaults with overrides are
explicitly out of scope for v1.
```
- No `unack` / re-process command in v1; the agent acks deliberately and acks are final
(short of an admin reset). Add if a re-processing workflow proves necessary.