Re: [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
From: Shanker Donthineni
Date: Wed Jun 10 2026 - 09:27:28 EST
Hi Will,
On 6/10/2026 6:28 AM, Will Deacon wrote:
External email: Use caution opening links or attachments
[+Jason G]
On Fri, Jun 05, 2026 at 09:45:51AM -0500, Shanker Donthineni wrote:
On systems with NVIDIA Olympus cores, a Device-nGnR* load can beI think you can drop the paragraph above. A store-release isn't enough
observed by a peripheral before an older, non-overlapping Device-nGnR*
store to the same peripheral. This breaks the program-order guarantee
that software expects for Device-nGnR* accesses and can leave a
peripheral in an incorrect state, as a load is observed before an
earlier store takes effect.
The erratum can occur only when all of the following apply:
- A PE executes a Device-nGnR* store followed by a younger
Device-nGnR* load.
- The store is not a store-release.
- The accesses target the same peripheral and do not overlap in bytes.
- There is at most one intervening Device-nGnR* store in program
order, and there are no intervening Device-nGnR* loads.
- There is no DSB, and no DMB that orders loads, between the store and
the load.
- Specific micro-architectural and timing conditions occur.
Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
orders loads) between the store and the load, or make the store a
store-release. A load-acquire on the load side would not help, because
acquire semantics do not prevent a load from being observed ahead of an
older store; only the store side (release or a barrier) closes the
window.
to order against a later load in the architecture either, so we're
clearly in micro-architecture territory and I don't think you need to
describe mechanisms that don't work here.
Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
to stlr* (Store-Release), which removes the "store is not a
store-release" condition for every device write the kernel issues.
Because writel() and writel_relaxed() are both built on __raw_writel()
in asm-generic/io.h, patching the raw variants covers both the
non-relaxed and relaxed APIs without touching the higher layers. Note
that writel()'s own barrier sits before the store, so it does not order
the store against a subsequent readl(); the store-release promotion is
what provides that ordering.
Based on the existing code comments and after reviewing this path again,
__const_memcpy_toio_aligned32() and __const_memcpy_toio_aligned64()
appear to be intended for WC regions. Since the erratum is scoped to
Device-nGnR* accesses, and WC mappings are Normal-NC on arm64, I don’t
think the STLR workaround should apply to these helpers by default.
Applying it there would also break the contiguous STR grouping that
this path relies on for write combining.
-Shanker