steve 574370e8d1 Remove AMD ROCm support — CPU and NVIDIA only
BREAKING: Remove Dockerfile.rocm, compose.rocm.yaml, and ROCm image
build/push from the release pipeline. Remove AMD quick-start and ROCm
references from README and DEVELOPER docs. Update docker-deployment
and developer-docs specs to reflect CPU + NVIDIA only.

The ROCm variant added significant complexity (4.2GB torch wheel,
>20GB container) with limited usage. Users on AMD GPUs should stay
on engine v3.2.x or switch to CPU mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:39:37 +01:00


Context

The project currently ships three Docker image variants: CPU, NVIDIA, and AMD ROCm. The ROCm variant requires a 4.2 GB pre-built torch wheel, a multi-stage Dockerfile with ROCm-specific runtime libraries, and additional build/push steps in the release pipeline. It is less tested than the CPU and NVIDIA variants and adds complexity out of proportion to its usage.

Goals / Non-Goals

Goals:

  • Remove all ROCm-specific files (Dockerfile, compose file, torch wheel)
  • Remove ROCm build/push from the release pipeline
  • Update all documentation to reflect CPU + NVIDIA only
  • Update the docker-deployment spec to remove ROCm requirements

Non-Goals:

  • Changing any engine application code (it is already GPU-vendor-agnostic via PyTorch)
  • Modifying the CPU or NVIDIA Dockerfiles (beyond what's already in-flight)
  • Providing a migration path for ROCm users (they can stay on 3.2.x or use CPU mode)

Decisions

1. Delete ROCm files outright rather than deprecating

Remove Dockerfile.rocm, compose.rocm.yaml, and assets/ immediately rather than marking them deprecated. No downstream consumers depend on automated ROCm builds; anyone needing AMD support can pin to the last ROCm-supporting release line (3.2.x).

Alternative considered: keep the files but stop publishing images. Rejected because dead code is confusing and maintainers would still have to account for it.

2. Leave archived openspec changes untouched

Archived changes under openspec/changes/archive/ still contain ROCm references. They are historical records and should not be modified.
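One way to confirm the removal is complete while honoring this decision is a small scan for leftover ROCm mentions that skips the archive. The following is a minimal sketch, not part of the project: the function name `find_rocm_references`, the `.git` skip list, and the convention of reporting line 0 for a filename match are all assumptions made for illustration.

```python
import os
import re
from pathlib import Path

ROCM_PATTERN = re.compile(r"rocm", re.IGNORECASE)
# Archived changes are left untouched (Decision 2), so they are not scanned.
ARCHIVE_DIR = Path("openspec/changes/archive")
# Assumption: version-control metadata is irrelevant to this check.
SKIP_DIRS = {".git"}


def find_rocm_references(root):
    """Return sorted (relative_path, line_number) pairs that mention ROCm.

    A line number of 0 means the file name itself matched.
    """
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if d not in SKIP_DIRS and (rel_dir / d) != ARCHIVE_DIR
        ]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if ROCM_PATTERN.search(name):
                hits.append((rel, 0))
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # Binary or unreadable file: nothing to scan.
            for number, line in enumerate(text.splitlines(), start=1):
                if ROCM_PATTERN.search(line):
                    hits.append((rel, number))
    return sorted(hits)


if __name__ == "__main__":
    for rel_path, line_number in find_rocm_references("."):
        print(f"{rel_path}:{line_number}")
```

Files that legitimately mention ROCm outside the archive, such as the notes for this change, would need an allowlist before a check like this could gate CI.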

3. Update GPU-vendor-agnostic requirement to reflect CPU + NVIDIA scope

The existing spec requirement "Application code is GPU-vendor-agnostic" remains true at the code level (PyTorch abstracts GPU vendors), but the project no longer provides or tests ROCm images. The spec should be simplified to reflect that only NVIDIA and CPU are supported deployment targets.
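To illustrate why the requirement still holds at the code level, here is a minimal sketch of vendor-neutral device selection. It is not the engine's actual code; `select_device` and its `prefer_gpu` parameter are hypothetical names. The sketch relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, so the same branch serves both vendors.

```python
def select_device(prefer_gpu=True):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, fall back to CPU.
        return "cpu"
    # torch.cuda.is_available() is also True on ROCm builds of PyTorch,
    # so no vendor-specific branch is needed here.
    if prefer_gpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(select_device())
```

Because nothing in such code names a vendor, dropping the ROCm images narrows what the project builds and tests without requiring changes to application code.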

Risks / Trade-offs

  • [Breaking change for AMD users] → Users on AMD GPUs must stay on 3.2.x or use CPU mode. Mitigated because the original design risk assessment already flagged ROCm support as "less tested".
  • [Future re-addition harder] → If ROCm support is needed later, the Dockerfile and compose file would need to be recreated. Mitigated by git history preserving the removed files.