Re: [PATCH v1] iommu/riscv: Support 32-bit register accesses

From: Zong Li

Date: Tue Jun 23 2026 - 05:22:31 EST


On Fri, Jun 19, 2026 at 1:14 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> On Thu, Jun 18, 2026 at 9:36 PM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
> >
> > On Thu, 18 Jun 2026 17:51:34 +0800
> > Guo Ren <guoren@xxxxxxxxxx> wrote:
> >
> > > Hi Vivian,
> > >
> > > As noted in the RISC-V IOMMU Specification, Chapter 6:
> > > > Whether an 8-byte access to an IOMMU register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMMU, as if two separate 4-byte accesses — first to the high half and second to the low half — were performed.
> > >
> > > Therefore, the atomicity of 64-bit MMIO accesses is UNSPECIFIED and
> > > not clearly defined in the current ratified RISC-V IOMMU
> > > specification. To handle this correctly, the Linux RISC-V IOMMU driver
> > > should fall back to 32-bit MMIO accesses when reading 64-bit registers
> > > (e.g., performance counters). The behavior of 32-bit MMIO accesses is
> > > more precisely defined in the RISC-V IOMMU specification.
> > >
> > > Thus, many hardware vendors implement 32-bit MMIO (rather than 64-bit
> > > MMIO) based on the current ratified RISC-V IOMMU specification, and
> > > this driver does not appear to benefit from 64-bit MMIO access either.
> > > Performance is fundamentally constrained by bus latency; assuming that
> > > simply reducing the number of accesses will improve performance is an
> > > oversimplification that ignores the underlying hardware
> > > characteristics.
> >
> > If the bus latency is significant it is almost certainly worth using
> > memory accesses to avoid re-reading the hi register.
> >
> > Something like this might work:
> >
> > static volatile u32 hi_prev, lo_prev;
> >
> > u32 hi = read_reg_hi();
> > u32 lo = read_reg_lo();
> >
> > if (lo <= lo_prev || hi != hi_prev) {
> > 	u32 hi_tmp = read_reg_hi();
> > 	if (hi_tmp != hi) {
> > 		hi = hi_tmp;
> > 		lo = 0;
> > 	}
> > 	lo_prev = ~0u;
> > 	hi_prev = hi;
> > }
> > lo_prev = lo;
> > return (u64)hi << 32 | lo;
> >
> > It shouldn't need any locking but the accesses do need to be ordered.
> Thank you for the suggestion. However, I believe this feedback is more
> relevant to the RISC-V IOMMU HPM patchset [1], as no counter registers
> are involved in the current patchset. That said, the idea of improving
> the hi-lo-hi slow-path mechanism to better handle high-latency
> hardware scenarios is well taken and worth discussing in the
> appropriate thread.
> [1]: https://lore.kernel.org/linux-riscv/20260208063848.3547817-2-zong.li@xxxxxxxxxx/
>
> P.S. The hardware I have at hand exhibits very low interconnect
> latency, and I have never observed the slow path (hi_tmp != hi)
> being triggered. My approach was to remove the retry mechanism
> directly in 32-bit MMIO mode and run stress tests to check whether
> perf stat produced incorrect results. That said, I may simply have
> been lucky; this is not a hardware guarantee.
>


Hi everyone,

Thank you for adding me to this discussion. I took some time to read
the previous messages.
Regarding the GitHub issue mentioned by Vivian, I noticed someone
pointed out that the hardware must support 64-bit access.

https://github.com/riscv-non-isa/riscv-iommu/issues/765#issuecomment-4742941894

I would like to confirm whether that is correct. If so, should we update
the spec? And does this mean we do not need to modify the driver?
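
For reference, the hi-lo-hi retry read discussed above could be sketched
roughly as below. This is only a minimal user-space illustration, not the
driver code: fake_counter, read_reg_hi(), read_reg_lo() and
read64_hilohi() are hypothetical names, and the fake register advances by
one on every access so that a carry into the high half can be reproduced.
A real driver would use ordered 32-bit MMIO reads of the two halves
instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running 64-bit hardware counter. */
static uint64_t fake_counter;

/* Sample the high half, then let the counter advance by one tick. */
static uint32_t read_reg_hi(void)
{
	uint32_t hi = (uint32_t)(fake_counter >> 32);

	fake_counter++;
	return hi;
}

/* Sample the low half, then let the counter advance by one tick. */
static uint32_t read_reg_lo(void)
{
	uint32_t lo = (uint32_t)fake_counter;

	fake_counter++;
	return lo;
}

/*
 * Read hi, then lo, then hi again. If the two hi samples differ, a carry
 * happened in between, so keep the newer hi and re-read lo until hi is
 * stable across the lo read.
 */
static uint64_t read64_hilohi(void)
{
	uint32_t hi, lo, hi_tmp;

	hi = read_reg_hi();
	for (;;) {
		lo = read_reg_lo();
		hi_tmp = read_reg_hi();
		if (hi_tmp == hi)
			break;
		hi = hi_tmp;
	}
	return ((uint64_t)hi << 32) | lo;
}
```

With fake_counter preset to 0x1ffffffff, read64_hilohi() takes the retry
path once and returns 0x200000002, whereas a plain hi-then-lo read of the
same fake register would return the torn value 0x100000000.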

Thanks

> --
> Best Regards
> Guo Ren