Re: [PATCH v1] iommu/riscv: Support 32-bit register accesses

From: Guo Ren

Date: Sun Jun 28 2026 - 04:20:52 EST


Hi Zong Li,

On Tue, Jun 23, 2026 at 5:21 PM Zong Li <zong.li@xxxxxxxxxx> wrote:
>
> On Fri, Jun 19, 2026 at 1:14 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 18, 2026 at 9:36 PM David Laight
> > <david.laight.linux@xxxxxxxxx> wrote:
> > >
> > > On Thu, 18 Jun 2026 17:51:34 +0800
> > > Guo Ren <guoren@xxxxxxxxxx> wrote:
> > >
> > > > Hi Vivian,
> > > >
> > > > As noted in the RISC-V IOMMU Specification, Chapter 6:
> > > > > Whether an 8-byte access to an IOMMU register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMMU, as if two separate 4-byte accesses — first to the high half and second to the low half — were performed.
> > > >
> > > > Therefore, the atomicity of 64-bit MMIO accesses is UNSPECIFIED and
> > > > not clearly defined in the current ratified RISC-V IOMMU
> > > > specification. To handle this correctly, the Linux RISC-V IOMMU driver
> > > > should fall back to 32-bit MMIO accesses when reading 64-bit registers
> > > > (e.g., performance counters). The behavior of 32-bit MMIO accesses is
> > > > more precisely defined in the RISC-V IOMMU specification.
> > > >
> > > > Thus, many hardware vendors implement 32-bit MMIO (rather than 64-bit
> > > > MMIO) based on the current ratified RISC-V IOMMU specification, and
> > > > this driver does not appear to benefit from 64-bit MMIO access either.
> > > > Performance is fundamentally constrained by bus latency; assuming that
> > > > simply reducing the number of accesses will improve performance is an
> > > > oversimplification that ignores the underlying hardware
> > > > characteristics.
> > >
> > > If the bus latency is significant it is almost certainly worth using
> > > memory accesses to avoid re-reading the hi register.
> > >
> > > Something like this might work:
> > >
> > > static volatile u32 hi_prev, lo_prev;
> > >
> > > u32 hi = read_reg_hi();
> > > u32 lo = read_reg_lo();
> > >
> > > if (lo <= lo_prev || hi != hi_prev) {
> > >         u32 hi_tmp = read_reg_hi();
> > >         if (hi_tmp != hi) {
> > >                 hi = hi_tmp;
> > >                 lo = 0;
> > >         }
> > >         lo_prev = ~0u;
> > >         hi_prev = hi;
> > > }
> > > lo_prev = lo;
> > > return (u64)hi << 32 | lo;
> > >
> > > It shouldn't need any locking but the accesses do need to be ordered.
> > Thank you for the suggestion. However, I believe this feedback is more
> > relevant to the RISC-V IOMMU HPM patchset [1], as no counter registers
> > are involved in the current patchset. That said, the idea of improving
> > the hi-lo-hi slow-path mechanism to better handle high-latency
> > hardware scenarios is well taken and worth discussing in the
> > appropriate thread.
> > [1]: https://lore.kernel.org/linux-riscv/20260208063848.3547817-2-zong.li@xxxxxxxxxx/
> >
> > P.S. The hardware I have at hand exhibits very low interconnect
> > latency. And I have never observed the slow path where hi_tmp != hi
> > being triggered — my approach was to remove the retry mechanism
> > directly in 32-bit mmio mode and run stress tests to check whether
> > perf stat produced incorrect results. That said, I may have simply
> > been lucky instead of hw guarantee.
> >
>
>
> Hi everyone,
>
> Thank you for adding me to this discussion. I took some time to read
> the previous messages.
> Regarding the GitHub issue mentioned by Vivian, I noticed someone
> pointed out that the hardware must support 64-bit access.
>
> https://github.com/riscv-non-isa/riscv-iommu/issues/765#issuecomment-4742941894
>
> I would like to confirm if that is correct. If so, should we update
> the spec? And does this mean we do not need to modify the driver?

Currently, no one has proposed updating the spec. Discussions are
still focused on clarifying the meaning of the existing specification.
There are two main points under discussion: the first has been
clarified, while the second remains controversial.

The first point, "Whether an 8-byte access to an IOMMU register is
single-copy atomic is UNSPECIFIED", is uncontroversial. Ved further
explained that such an access may appear internally to the IOMMU as
two separate 4-byte accesses. Therefore, for the HPM counters,
32-bit MMIO accesses must be used to guarantee the correct hi-lo-hi
ordering. Using 64-bit MMIO accesses can result in a lo-hi-hi-lo
sequence, because the specification does not define the order in which
the access may be split and explicitly marks this behavior as
UNSPECIFIED. At present, there is unanimous agreement on this point.

The second point, "Registers that are 64-bit wide may be accessed
using either a 32-bit or a 64-bit access.", is controversial.
RFC 2119 defines MAY as follows:

    MAY   This word, or the adjective "OPTIONAL", mean that an item is
    truly optional. One vendor may choose to include the item because
    a particular marketplace requires it or because the vendor feels
    that it enhances the product while another vendor may omit the
    same item. An implementation which does not include a particular
    option MUST be prepared to interoperate with another
    implementation which does include the option, though perhaps with
    reduced functionality. In the same vein an implementation which
    does include a particular option MUST be prepared to interoperate
    with another implementation which does not include the option
    (except, of course, for the feature the option provides).

Consequently, any claim that this wording requires hardware to support
64-bit MMIO access has generated significant controversy.

The current explicit consensus on the specification is therefore as follows:
1. All hardware must support access to 64-bit registers using 32-bit
MMIO accesses.
2. Only 32-bit MMIO accesses are guaranteed to be single-copy atomic.

From both a single-copy atomicity perspective and a hardware
requirements standpoint, Linux kernel drivers should currently follow
the consensus described above and avoid the controversial 64-bit MMIO
access. Furthermore, I see no performance benefit to using 64-bit MMIO
accesses in this driver, as the only 64-bit registers involved (such
as RISCV_IOMMU_REG_DDTP, RISCV_IOMMU_REG_ICVEC, and queue->qbr) are
not frequently accessed.
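
For illustration only, a 64-bit register access built from two 32-bit
accesses could look roughly like the userspace sketch below. The
helper names are hypothetical and this is not the code in the patch;
the register window is modeled as a plain array, and I assume the low
half is at the lower offset. The hi-then-lo order simply mirrors the
split described in the spec text quoted earlier. In the kernel,
include/linux/io-64-nonatomic-hi-lo.h provides hi_lo_readq() and
hi_lo_writeq() for the same purpose.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for readl()/writel(); 'base' models the
 * register window as an array of 32-bit words.
 */
static uint32_t reg_read32(volatile uint32_t *base, unsigned int off)
{
	return base[off / 4];
}

static void reg_write32(volatile uint32_t *base, unsigned int off,
			uint32_t val)
{
	base[off / 4] = val;
}

/* Write a 64-bit register as two 32-bit accesses: high half, then low. */
static void reg_write64_hi_lo(volatile uint32_t *base, unsigned int off,
			      uint64_t val)
{
	reg_write32(base, off + 4, (uint32_t)(val >> 32));
	reg_write32(base, off, (uint32_t)val);
}

/* Read a 64-bit register as two 32-bit accesses: high half, then low. */
static uint64_t reg_read64_hi_lo(volatile uint32_t *base, unsigned int off)
{
	uint32_t hi = reg_read32(base, off + 4);
	uint32_t lo = reg_read32(base, off);

	return ((uint64_t)hi << 32) | lo;
}
```

This is only adequate for registers whose value does not change
between the two halves; a free-running counter needs the hi-lo-hi
retry instead.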

--
Best Regards
Guo Ren