Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

From: Jason Gunthorpe
Date: Wed Jan 15 2025 - 10:11:28 EST


On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:

> > Those rules are not something we came up with because of some limitation
> > of the DMA-API, but rather from experience working with different device
> > drivers and especially their developers.

I would say it stems from the use of the scatterlist. You do not have
enough information exchanged between exporter and importer to
implement something sane and correct. At that point, being restrictive
is a reasonable path.

Because of the scatterlist, developers don't have APIs that correctly
solve the problems they want to solve, so of course things get into a mess.
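
To make that concrete, here is a minimal sketch of what an importer
actually learns from the map path today (using the unlocked attachment
helpers; error handling and the device/driver plumbing around it are
simplified):

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Sketch only: walk the sg_table an exporter hands back and note how
 * little it tells us about the memory behind the addresses. */
static int importer_inspect(struct dma_buf *dmabuf, struct device *dev)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	for_each_sgtable_dma_sg(sgt, sg, i) {
		/* Everything the importer gets per segment: */
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* Nothing here says whether this is system RAM or MMIO,
		 * whether it is coherent, or what CC attributes apply. */
		dev_dbg(dev, "segment %d: %pad + %u\n", i, &addr, len);
	}

	dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(dmabuf, attach);
	return 0;
}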

> > Applying and enforcing those restrictions is an absolute must-have
> > for extending DMA-buf.

You said to come to the maintainers with the problems; here are the
problems. Your answer is "don't use dmabuf".

That doesn't make the problems go away :(

> > > I really don't want to make a dmabuf2 - everyone would have to
> > > implement it, including all the GPU drivers if they want to work with
> > > RDMA. I don't think this makes any sense compared to incrementally
> > > evolving dmabuf with more optional capabilities.
> >
> > The point is that a dmabuf2 would most likely be rejected as well or
> > otherwise run into the same issues we have seen before.

You'd need to be much more concrete and technical in your objections
to cause a rejection. "We tried something else before and it didn't
work" won't cut it.

There is a very simple problem statement here: we need an FD handle for
various kinds of memory, with a lifetime model that fits a couple of
different use cases. The exporter and importer need to understand what
type of memory it is and what rules apply to working with it. The
required importers are more general than just simple PCI DMA.

I feel like this is already exactly DMABUF's mission.

Besides, you have been saying to go do this in TEE or whatever; how is
that any different from dmabuf2?

> > > > > > > That sounds more something for the TEE driver instead of anything DMA-buf
> > > > > > > should be dealing with.
> > > > > > Has nothing to do with TEE.
> > > > > Why?
> > > The Linux TEE framework is not used as part of confidential compute.
> > >
> > > CC already has guest memfd for holding its private CPU memory.
> >
> > Where is that coming from and how is it used?

What do you mean? guest memfd is the result of years of negotiation in
the mm and x86 arch subsystems :( It is used like a normal memfd, and
we now have APIs in KVM and iommufd to directly intake and map from a
memfd. I expect guest memfd will soon grow some more generic
dmabuf-like lifetime callbacks to avoid pinning - it already has some
KVM-specific APIs IIRC.

But it is 100% exclusively focused on CPU memory and nothing else.
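
For reference, this is roughly what consuming it looks like from
userspace with the current KVM uAPI (a sketch only - it assumes a VM
type that supports private memory, and error handling is omitted):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: create a guest_memfd and wire it into a KVM memslot as the
 * private backing for [gpa, gpa + size). */
static int add_private_slot(int vm_fd, __u64 gpa, __u64 size)
{
	struct kvm_create_guest_memfd gmem = { .size = size };
	struct kvm_userspace_memory_region2 region;
	int gmem_fd;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return -1;

	memset(&region, 0, sizeof(region));
	region.slot = 0;
	region.flags = KVM_MEM_GUEST_MEMFD;
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.guest_memfd = gmem_fd;
	region.guest_memfd_offset = 0;

	/* KVM maps the private side straight from the memfd; there is no
	 * userspace CPU mapping of this memory. */
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}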

> > > This is about confidential MMIO memory.
> >
> > Who is the exporter and who is the importer of the DMA-buf in this use
> > case?

In this case Xu is exporting MMIO from VFIO and importing it into KVM
and iommufd.
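
For contrast, the non-confidential path today looks roughly like the
sketch below: mmap() the BAR from the VFIO device fd and hand the VMA
to KVM as a memslot. The whole flow depends on a userspace CPU mapping
existing, which is exactly what cannot exist for private MMIO - hence
the fd-based handoff in this series. (Sketch only: BAR 0 assumed, error
handling and the iommufd side omitted.)

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>
#include <linux/vfio.h>

/* Sketch: legacy flow for getting assigned-device MMIO into the guest. */
static int map_bar0_into_guest(int device_fd, int vm_fd, __u64 gpa)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	struct kvm_userspace_memory_region slot = { .slot = 1 };
	void *va;

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info))
		return -1;

	/* Requires a shared, CPU-visible mapping of the BAR... */
	va = mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		  device_fd, info.offset);
	if (va == MAP_FAILED)
		return -1;

	/* ...which KVM then consumes via userspace_addr. */
	slot.guest_phys_addr = gpa;
	slot.memory_size = info.size;
	slot.userspace_addr = (unsigned long)va;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &slot);
}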

> > This is also not just about the KVM side; the VM side also has issues
> > with DMABUF and CC - only co-operating devices can interact with the
> > VM side's "encrypted" memory, and there needs to be a negotiation as
> > part of all buffer setup about what the mutual capability is. :\
> > swiotlb hides some of this sometimes, but confidential P2P is
> > currently unsolved.
>
> Yes and it is documented by now how that is supposed to happen with
> DMA-buf.

I doubt that. It is complex and not fully solved in the core code
today. Many scenarios do not work correctly, and devices that can
exercise the hard paths don't even exist yet. This is a future
problem :(

Jason