Re: [PATCH v1] iommu/riscv: Support 32-bit register accesses
From: Zhanpeng Zhang
Date: Fri Jun 26 2026 - 05:18:32 EST
Hi Vivian, Guo Ren,
Thanks for the discussion. I just caught up with the thread and the
GitHub issue [1].
I think we probably do not need to spend too much time discussing the exact
meaning of RSCV0004 around 32-bit vs 64-bit MMIO access. At least from my
side, I do not see much practical benefit in splitting the standard HID for
this distinction.
The original problem I want to solve is simple: Linux should have a compatible
and maintainable way to support IOMMUs that need 32-bit MMIO accesses for
64-bit registers, instead of taking an access fault that leads to a panic.
Ideally, one generic kernel image should support both access sequences without
a build-time option.
Would it be reasonable to keep RSCV0004 / riscv,iommu as the common device
identifier, and describe the required register access width as a runtime
firmware property instead? For ACPI, this might be an _DSD property on the
IOMMU device; for devicetree, it could be a matching DT property. The driver
could then check it during enumeration and select the appropriate register
accessors.
This avoids redefining the ACPI HID or compatible string, keeps one generic
kernel image, and makes the 32-bit MMIO fallback explicit in firmware. My goal
is just to implement the spec wording that "Registers that are 64-bit wide may
be accessed using either a 32-bit or a 64-bit access." in a lightweight way
that is acceptable for the Linux driver.
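To make the idea concrete, here is a rough user-space sketch (not driver code)
of picking the register accessors once at enumeration time. Everything in it
is hypothetical: the fake register file, the struct and function names, and
the fw_requires_32bit flag, which stands in for whatever _DSD or DT property
we would agree on. The split accessors touch the high half first, mirroring
the ordering the spec describes for an 8-byte access; whether that order is
right for every register is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated register file (32-bit words); a real driver would use MMIO. */
static uint32_t fake_regs[16];
/* Number of simulated bus transactions, for illustration only. */
static unsigned int access_count;

static uint32_t bus_read32(unsigned int off)
{
    assert(off % 4 == 0 && off / 4 < 16);
    access_count++;
    return fake_regs[off / 4];
}

static void bus_write32(unsigned int off, uint32_t val)
{
    assert(off % 4 == 0 && off / 4 < 16);
    access_count++;
    fake_regs[off / 4] = val;
}

/* One simulated 64-bit transaction covering both halves. */
static uint64_t bus_read64(unsigned int off)
{
    assert(off % 8 == 0 && off / 4 + 1 < 16);
    access_count++;
    return ((uint64_t)fake_regs[off / 4 + 1] << 32) | fake_regs[off / 4];
}

static void bus_write64(unsigned int off, uint64_t val)
{
    assert(off % 8 == 0 && off / 4 + 1 < 16);
    access_count++;
    fake_regs[off / 4] = (uint32_t)val;
    fake_regs[off / 4 + 1] = (uint32_t)(val >> 32);
}

/* 64-bit register accessed as two 32-bit transactions, high half first. */
static uint64_t split_read64(unsigned int off)
{
    uint32_t hi = bus_read32(off + 4);
    uint32_t lo = bus_read32(off);

    return ((uint64_t)hi << 32) | lo;
}

static void split_write64(unsigned int off, uint64_t val)
{
    bus_write32(off + 4, (uint32_t)(val >> 32));
    bus_write32(off, (uint32_t)val);
}

/* Hypothetical accessor table, selected once during enumeration. */
struct iommu_reg_ops {
    uint64_t (*readq)(unsigned int off);
    void (*writeq)(unsigned int off, uint64_t val);
};

static const struct iommu_reg_ops ops_native64 = { bus_read64, bus_write64 };
static const struct iommu_reg_ops ops_split32 = { split_read64, split_write64 };

/* fw_requires_32bit stands in for the (hypothetical) firmware property. */
static const struct iommu_reg_ops *select_reg_ops(bool fw_requires_32bit)
{
    return fw_requires_32bit ? &ops_split32 : &ops_native64;
}
```

The rest of the driver would only call ops->readq() / ops->writeq(), so the
access width stays a probe-time decision and there is no build-time option.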
[1] https://github.com/riscv-non-isa/riscv-iommu/issues/765#issuecomment-4742941894
Best regards,
Zhanpeng
> From: "Guo Ren"<guoren@xxxxxxxxxx>
> Date: Fri, Jun 19, 2026, 00:41
> Subject: [External] Re: [PATCH v1] iommu/riscv: Support 32-bit register accesses
> To: "David Laight"<david.laight.linux@xxxxxxxxx>
> Cc: "Vivian Wang"<wangruikang@xxxxxxxxxxx>, <zhangzhanpeng.jasper@xxxxxxxxxxxxx>, <alex@xxxxxxxx>, <aou@xxxxxxxxxxxxxxxxx>, <cuiyunhui@xxxxxxxxxxxxx>, <iommu@xxxxxxxxxxxxxxx>, <joro@xxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>, <linux-riscv@xxxxxxxxxxxxxxxxxxx>, <luxu.kernel@xxxxxxxxxxxxx>, <palmer@xxxxxxxxxxx>, <pjw@xxxxxxxxxx>, <robin.murphy@xxxxxxx>, <tjeznach@xxxxxxxxxxxx>, <will@xxxxxxxxxx>, <yuanzhu@xxxxxxxxxxxxx>
> On Thu, Jun 18, 2026 at 9:36 PM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
> >
> > On Thu, 18 Jun 2026 17:51:34 +0800
> > Guo Ren <guoren@xxxxxxxxxx> wrote:
> >
> > > Hi Vivian,
> > >
> > > As noted in the RISC-V IOMMU Specification, Chapter 6:
> > > > Whether an 8-byte access to an IOMMU register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMMU, as if two separate 4-byte accesses — first to the high half and second to the low half — were performed.
> > >
> > > Therefore, the atomicity of 64-bit MMIO accesses is UNSPECIFIED and
> > > not clearly defined in the current ratified RISC-V IOMMU
> > > specification. To handle this correctly, the Linux RISC-V IOMMU driver
> > > should fall back to 32-bit MMIO accesses when reading 64-bit registers
> > > (e.g., performance counters). The behavior of 32-bit MMIO accesses is
> > > more precisely defined in the RISC-V IOMMU specification.
> > >
> > > Thus, many hardware vendors implement 32-bit MMIO (rather than 64-bit
> > > MMIO) based on the current ratified RISC-V IOMMU specification, and
> > > this driver does not appear to benefit from 64-bit MMIO access either.
> > > Performance is fundamentally constrained by bus latency; assuming that
> > > simply reducing the number of accesses will improve performance is an
> > > oversimplification that ignores the underlying hardware
> > > characteristics.
> >
> > If the bus latency is significant it is almost certainly worth using
> > memory accesses to avoid re-reading the hi register.
> >
> > Something like this might work:
> >
> > static volatile u32 hi_prev, lo_prev;
> >
> > u32 hi = read_reg_hi();
> > u32 lo = read_reg_lo();
> >
> > if (lo <= lo_prev || hi != hi_prev) {
> >     u32 hi_tmp = read_reg_hi();
> >     if (hi_tmp != hi) {
> >         /* hi changed during the sequence: report (hi_tmp, 0) */
> >         hi = hi_tmp;
> >         lo = 0;
> >     }
> >     lo_prev = ~0u;
> >     hi_prev = hi;
> > }
> > lo_prev = lo;
> > return (u64)hi << 32 | lo;
> >
> > It shouldn't need any locking but the accesses do need to be ordered.
> Thank you for the suggestion. However, I believe this feedback is more
> relevant to the RISC-V IOMMU HPM patchset [1], as no counter registers
> are involved in the current patchset. That said, the idea of improving
> the hi-lo-hi slow-path mechanism to better handle high-latency
> hardware scenarios is well taken and worth discussing in the
> appropriate thread.
> [1]: https://lore.kernel.org/linux-riscv/20260208063848.3547817-2-zong.li@xxxxxxxxxx/
>
> P.S. The hardware I have at hand exhibits very low interconnect
> latency, and I have never observed the slow path (hi_tmp != hi)
> being triggered. My approach was to remove the retry mechanism
> directly in 32-bit MMIO mode and run stress tests to check whether
> perf stat produced incorrect results. That said, I may simply have
> been lucky; this is not a hardware guarantee.
>
> --
> Best Regards
> Guo Ren
>