Re: [PATCH kernel 4/9] dma/swiotlb: Stop forcing SWIOTLB for TDISP devices

From: Jason Gunthorpe

Date: Mon Mar 02 2026 - 19:19:20 EST

On Mon, Mar 02, 2026 at 03:53:13PM -0800, dan.j.williams@xxxxxxxxx wrote:
> > > The specification allows it, but Linux DMA mapping core is not yet ready
> > > for it. So the expectation to start is that the device loses access to
> > > its original shared IOMMU mappings when converted to private operation.
> >
> > Yes, the underlying translation changes, but no, it doesn't lose DMA
> > access to any shared pages, it just goes through the T=1 IOMMU now.
>
> Yes, what I meant to say is that Linux may need to be prepared for
> implementations that do not copy over the shared mappings. At least for
> early staging / minimum viable implementation for first merge.
>
> > The T=1 IOMMU will still have them mapped on all three platforms
> > AFAIK.
>
> Oh, I thought SEV-TIO had trouble with this, if this is indeed the case,
> great, ignore my first comment.

Alexey?

I think it is really important that shared mappings continue to be
reachable by TDISP devices.

> I have a v2 of a TEE I/O set going out shortly and sounds like it will
> need a rethink for this attribute proposal for v3. I think it still helps to
> have combo sets at this stage so the whole lifecycle is visible in one
> set, but it is nearly at the point of being too big a set to consider in
> one sitting.

My problem is that I can't find, in one place, an actually correct
picture of how the IOVA translation works on all the arches and how
the phys_addr_t works.

So it is hard to make sense of all these proposals. What I would love
to see is one series that deals with this:

[PATCH v2 11/19] x86, dma: Allow accepted devices to map private memory

For *all* the arches, along with a description for each of:
* how their phys_addr_t is constructed
* how their S2 IOMMU mapping works
* how a vIOMMU S1 would change any of the above.

Then maybe we can see if we are actually doing it properly or not.

> > ARM has a "solution" right now. The location of the high bit is
> > controlled by the VMM and the VMM cannot create a CC VM where the IPA
> > space exceeds the dma_mask of any assigned device.
> >
> > Thus the VMM must limit the total available DRAM to fit within the HW
> > restrictions.
> >
> > Hopefully TDX can do the same.
>
> TDX does not have the same problem, but the ARM "solution" seems
> reasonable for now.

I'm surprised because Xu said:

This is same as Intel TDX, the GPA shared bit are used by IOMMU to
target shared/private. You can imagine for T=1, there are 2 IOPTs, or
1 IOPT with all private at lower address & all shared at higher address.

https://lore.kernel.org/all/aaF6HD2gfe%2Fudl%2Fx@yilunxu-OptiPlex-7050/

So how come that does not have exactly the same problem as ARM?
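
To make the concern concrete, here is a rough sketch of the arithmetic,
not taken from any of the patches. It assumes the simplest possible
model: shared vs private is selected by a single high GPA/IPA bit, and
a device can only emit addresses that fit in its dma_mask. All of the
sketch_* names are hypothetical and none of this is kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not kernel API. */

/* Mask of the addresses a device with 'bits' address bits can emit. */
static inline uint64_t sketch_dma_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Assumed model: the shared alias of a page is the same address with
 * one high bit set. 'shared_bit' must be below 64.
 */
static inline uint64_t sketch_shared_alias(uint64_t gpa, unsigned int shared_bit)
{
	return gpa | (UINT64_C(1) << shared_bit);
}

/* True if 'addr' has no bits set above the device's dma_mask. */
static inline bool sketch_dev_can_reach(uint64_t addr, unsigned int dev_bits)
{
	return (addr & ~sketch_dma_mask(dev_bits)) == 0;
}

/*
 * The ARM-style rule as described above: the VMM chooses where the
 * high bit sits, so it only accepts a layout where that bit is still
 * inside the dma_mask of the assigned device.
 */
static inline bool sketch_vmm_layout_ok(unsigned int shared_bit, unsigned int dev_bits)
{
	return shared_bit < dev_bits;
}
```

Under these assumptions a device with a 40-bit dma_mask cannot reach a
shared alias formed with bit 47, but can reach one formed with bit 39,
which is why limiting the address space in the VMM works as a stopgap.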

Jason