Re: [PATCH v1] iommu/riscv: Support 32-bit register accesses

From: Zong Li

Date: Sun Jun 28 2026 - 21:16:04 EST


On Sun, Jun 28, 2026 at 4:20 PM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> Hi Zong Li,
>
> On Tue, Jun 23, 2026 at 5:21 PM Zong Li <zong.li@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 19, 2026 at 1:14 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 18, 2026 at 9:36 PM David Laight
> > > <david.laight.linux@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, 18 Jun 2026 17:51:34 +0800
> > > > Guo Ren <guoren@xxxxxxxxxx> wrote:
> > > >
> > > > > Hi Vivian,
> > > > >
> > > > > As noted in the RISC-V IOMMU Specification, Chapter 6:
> > > > > > Whether an 8-byte access to an IOMMU register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMMU, as if two separate 4-byte accesses — first to the high half and second to the low half — were performed.
> > > > >
> > > > > Therefore, the atomicity of 64-bit MMIO accesses is UNSPECIFIED and
> > > > > not clearly defined in the current ratified RISC-V IOMMU
> > > > > specification. To handle this correctly, the Linux RISC-V IOMMU driver
> > > > > should fall back to 32-bit MMIO accesses when reading 64-bit registers
> > > > > (e.g., performance counters). The behavior of 32-bit MMIO accesses is
> > > > > more precisely defined in the RISC-V IOMMU specification.
> > > > >
> > > > > Thus, many hardware vendors implement 32-bit MMIO (rather than 64-bit
> > > > > MMIO) based on the current ratified RISC-V IOMMU specification, and
> > > > > this driver does not appear to benefit from 64-bit MMIO access either.
> > > > > Performance is fundamentally constrained by bus latency; assuming that
> > > > > simply reducing the number of accesses will improve performance is an
> > > > > oversimplification that ignores the underlying hardware
> > > > > characteristics.
> > > >
> > > > If the bus latency is significant it is almost certainly worth using
> > > > memory accesses to avoid re-reading the hi register.
> > > >
> > > > Something like this might work:
> > > >
> > > > static volatile u32 hi_prev, lo_prev;
> > > >
> > > > u32 hi = read_reg_hi();
> > > > u32 lo = read_reg_lo();
> > > >
> > > > /* Re-read hi only if lo may have wrapped or hi has changed. */
> > > > if (lo <= lo_prev || hi != hi_prev) {
> > > >         u32 hi_tmp = read_reg_hi();
> > > >
> > > >         if (hi_tmp != hi) {
> > > >                 /* lo wrapped between the two hi reads. */
> > > >                 hi = hi_tmp;
> > > >                 lo = 0;
> > > >         }
> > > >         lo_prev = ~0u;
> > > >         hi_prev = hi;
> > > > }
> > > > lo_prev = lo;
> > > > return (u64)hi << 32 | lo;
> > > >
> > > > It shouldn't need any locking but the accesses do need to be ordered.
> > > Thank you for the suggestion. However, I believe this feedback is more
> > > relevant to the RISC-V IOMMU HPM patchset [1], as no counter registers
> > > are involved in the current patchset. That said, the idea of improving
> > > the hi-lo-hi slow-path mechanism to better handle high-latency
> > > hardware scenarios is well taken and worth discussing in the
> > > appropriate thread.
> > > [1]: https://lore.kernel.org/linux-riscv/20260208063848.3547817-2-zong.li@xxxxxxxxxx/
> > >
> > > P.S. The hardware I have at hand exhibits very low interconnect
> > > latency, and I have never observed the slow path (hi_tmp != hi)
> > > being triggered. My approach was to remove the retry mechanism
> > > directly in 32-bit MMIO mode and run stress tests to check whether
> > > perf stat produced incorrect results. That said, I may simply have
> > > been lucky; this is not a hardware guarantee.
> > >
> >
> >
> > Hi everyone,
> >
> > Thank you for adding me to this discussion. I took some time to read
> > the previous messages.
> > Regarding the GitHub issue mentioned by Vivian, I noticed someone
> > pointed out that the hardware must support 64-bit access.
> >
> > https://github.com/riscv-non-isa/riscv-iommu/issues/765#issuecomment-4742941894
> >
> > I would like to confirm if that is correct. If so, should we update
> > the spec? And does this mean we do not need to modify the driver?
>
> Currently, no one has proposed updating the SPEC. Discussions are
> still focused on clarifying the meaning of the existing specification.
> There are two main points under discussion: the first has been
> clarified, while the second remains controversial.
>
> The first point — "Whether an 8-byte access to an IOMMU register is
> single-copy atomic is UNSPECIFIED" — is uncontroversial. Ved further
> explained that such an access may appear internally to the IOMMU as
> two separate 4-byte accesses. Therefore, for HPM counter registers,
> 32-bit MMIO access must be used to guarantee the correct hi-lo-hi
> ordering. Using 64-bit MMIO access can result in a lo-hi-hi-lo
> sequence, because the specification does not define the order in which
> the access may be split and explicitly marks this behavior as
> UNSPECIFIED. At present, there is unanimous agreement on this point,
> that is, "Whether an 8-byte access to an IOMMU register is single-copy
> atomic is UNSPECIFIED".
>
> The second point — "Registers that are 64-bit wide may be accessed
> using either a 32-bit or a 64-bit access." — is controversial.
> According to the clear definition in RFC 2119:
>
>   MAY   This word, or the adjective "OPTIONAL", mean that an item is
>   truly optional. One vendor may choose to include the item because a
>   particular marketplace requires it or because the vendor feels that
>   it enhances the product while another vendor may omit the same item.
>   An implementation which does not include a particular option MUST be
>   prepared to interoperate with another implementation which does
>   include the option, though perhaps with reduced functionality. In the
>   same vein an implementation which does include a particular option
>   MUST be prepared to interoperate with another implementation which
>   does not include the option (except, of course, for the feature the
>   option provides).
>
> Consequently, any claim that this wording imposes a requirement for
> hardware to support 64-bit MMIO access has generated significant
> controversy.
>
> The current explicit consensus on the specification is therefore as follows:
> 1. All hardware must support access to 64-bit registers using 32-bit
> MMIO accesses.
> 2. Only 32-bit MMIO accesses are guaranteed to be single-copy atomic.
>
> From both a single-copy atomicity perspective and a hardware
> requirements standpoint, Linux kernel drivers should currently follow
> the consensus described above and avoid the controversial 64-bit MMIO
> access. Furthermore, I see no performance benefit to using 64-bit MMIO
> accesses in this driver, as the only 64-bit registers involved — such
> as RISCV_IOMMU_REG_DDTP, RISCV_IOMMU_REG_ICVEC, and queue->qbr — are
> not frequently accessed.
>

Thanks for the clarification. I will address this topic in the PMU
driver in the next version.

> --
> Best Regards
> Guo Ren