Re: [PATCH v3] io: add io_pgtable abstraction

From: Boris Brezillon

Date: Fri Dec 12 2025 - 03:44:37 EST

Hi Jason,

On Fri, 28 Nov 2025 14:02:55 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, Nov 12, 2025 at 10:15:00AM +0000, Alice Ryhl wrote:
> > + /// Map a physically contiguous range of pages of the same size.
> > + ///
> > + /// # Safety
> > + ///
> > + /// * This page table must not contain any mapping that overlaps with the mapping created by
> > + /// this call.
> > + /// * If this page table is live, then the caller must ensure that it's okay to access the
> > + /// physical address being mapped for the duration in which it is mapped.
>
> All the iommu page table code runs with a caller provided locking
> regime.
>
> The caller must effectively hold a range lock over the IOVA it is
> mapping/unmapping/iova_to_phys'ing and it is illegal to race those
> three functions with overlapping IOVA.
>
> For example the caller cannot map to IOVA A-B while unmapping C-B or
> iova_to_physing(A).
>
> This locking detail should be a safety statement.

Sure, adding this to the list sounds good.

>
> > +// These bindings are currently designed for use by GPU drivers, which use this page table together
> > +// with GPUVM. When using GPUVM, a single mapping operation may be translated into many operations
>
> Now that we have the generic pt stuff I wonder if GPUVM should be
> providing its own version of the page table implementation that
> matches its semantics better.

Not too sure what you mean here. Are you saying that we should fork
io-pgtable-arm.c (or rather a subset of it), and have it all
implemented in panthor? I actually asked Rob if that was a good way to
go when rolling out the initial panthor version, and he advised me
against it. Now, I see good reasons to do that, like the fact we
would be able to add features like batched repeat mapping updates
(mapping the same page over a wide virtual range without having to
duplicate the intermediate page table levels that are exactly the
same), or the ability to extend the mapping arguments with
shareability/coherency info (that we can do by adding IOMMU_xx flags
too). But there's also downsides to it, like the fact we wouldn't
benefit from bugfixes materializing in io-pgtable-arm.c, if any.

It's a discussion I'm fine having, but it looks like even the IOMMU
maintainers are not on the same page here ;-p.

>
> > +// on the page table, and in that case you generally want to flush the TLB only once per GPUVM
> > +// operation. Thus, do not use these callbacks as they would flush more often than needed.
>
> This doesn't sound quite right to me, the flushing requirements are a
> consequence of what flushing HW is available in the xTLB that is
> caching this. The flushes generated by the common arm code match the
> ARM64 architecture semantics. If the GPU has different semantics then
> it can do something else, but one must be careful not to match a GPU
> that is expecting ARM semantics with "do not use these callbacks".
>
> IOW it doesn't seem right that common code would be making decisions
> like this, the nature and requirements of the flushing are entirely up
> to the driver binding to HW.

We're not saying this will work for everyone, but rather, this is a
default implementation that does nothing, and if you need to do
something, override it with your own. I guess if that's really
problematic, we can force the user to provide one and keep the NOP
implementation on Tyr's side.

Regards,

Boris