Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device

From: Koichiro Den

Date: Wed Mar 04 2026 - 22:23:41 EST


On Wed, Mar 04, 2026 at 09:53:42AM -0700, Dave Jiang wrote:
>
>
> On 3/4/26 8:56 AM, Koichiro Den wrote:
> > On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 3/2/26 9:56 PM, Koichiro Den wrote:
> >>> On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> >>>>
> >>>>
> >>>> On 3/2/26 7:45 AM, Koichiro Den wrote:
> >>>>> Some NTB implementations are backed by a "virtual" PCI device, while the
> >>>>> actual DMA mapping context (IOMMU domain) belongs to a different device.
> >>>>>
> >>>>> One example is vNTB, where the NTB device is represented as a virtual
> >>>>> PCI endpoint function, but DMA operations must be performed against the
> >>>>> EPC parent device, which owns the IOMMU context.
> >>>>>
> >>>>> Today, ntb_transport implicitly relies on the NTB device's parent device
> >>>>> as the DMA mapping device. This works for most PCIe NTB hardware, but
> >>>>> breaks implementations where the NTB PCI function is not the correct
> >>>>> device to use for DMA API operations.
> >>>>
> >>>> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> >>>> physically contiguous allocation of rx ring buffers"). As you can see it
> >>>> tries to get around the issue as a temp measure. The main issue is the
> >>>> memory window buffer is allocated before the dmaengine devices are allocated.
> >>>> So the buffer is mapped against the NTB device rather than the DMA device.
> >>>> So I think we may need to come up with a better scheme to clean up this
> >>>> issue as some of the current NTBs can utilize this change as well.
> >>>
> >>> Thanks for the feedback.
> >>>
> >>> I think there are two issues which are related but separable:
> >>>
> >>> 1) Ensuring the correct DMA-mapping device is used for the MW translation
> >>>    (i.e. inbound accesses from the peer).
> >>> 2) RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
> >>>    device ("double mapping").
> >>>
> >>> (1) is what this series is addressing. I think this series does not worsen (2).
> >>> I agree that (2) should be improved eventually.
> >>>
> >>> (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> >>> can be the same as chan->device->dev, in which case the double mapping could be
> >>> optimized away. However, I understand that you are talking about a more
> >>> fundamental improvement.)
> >>>
> >>>>
> >>>> The per queue DMA device presents an initialization hierarchy challenge with the
> >>>> memory window context. I'm open to suggestions.
> >>>
> >>> In my view, what is written in 061a785a114f looks like the most viable long-term
> >>> direction:
> >>>
> >>> A potential future solution may be having the DMA mapping API providing a
> >>> way to alias an existing IOVA mapping to a new device perhaps.
> >>>
> >>> I do not immediately see a more practical alternative. E.g., deferring MW
> >>> inbound mapping until ntb_transport_create_queue() would require a substantial
> >>> rework, since dma_chan is determined per-QP at that stage and the mapping would
> >>> become dynamic per subrange. I doubt it would be worth doing or acceptable.
> >>> Pre-allocating dma_chans only for this purpose also seems excessive.
> >>>
> >>> So I agree that (2) needs a clean-up eventually. However, in my opinion the
> >>> problem this series tries to solve is independent, and the approach here does
> >>> not interfere with that direction.
> >>
> >> Fair assessment. For the series:
> >> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> >
> > Thanks for the review.
> >
> > Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
> > a small patch to PCI EP for vNTB (the real user of the interface), something
> > like the following:
> >
> >
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index be6c03f4516e..8aeacbae8b77 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
> >  	return 0;
> >  }
> >  
> > +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> > +{
> > +	struct epf_ntb *ntb = ntb_ndev(ndev);
> > +
> > +	if (!ntb || !ntb->epf)
> > +		return NULL;
> > +	return ntb->epf->epc->dev.parent;
> > +}
> > +
> >  static const struct ntb_dev_ops vntb_epf_ops = {
> >  	.mw_count = vntb_epf_mw_count,
> >  	.spad_count = vntb_epf_spad_count,
> > @@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
> >  	.db_clear_mask = vntb_epf_db_clear_mask,
> >  	.db_clear = vntb_epf_db_clear,
> >  	.link_disable = vntb_epf_link_disable,
> > +	.get_dma_dev = vntb_epf_get_dma_dev,
> >  };
> >  
> >  static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >
> >
>
> Probably should include it with this series if it's small. Having the user with new code is usually preferred.

I thought that, since the vNTB patch wouldn't work until the NTB changes are in,
asking both the NTB and PCI EP maintainers to coordinate the apply order might
be a bit awkward.

That said, if preferable, I can include the vNTB change in this series and
explicitly ask the PCI EP maintainers not to pick up (new) Patch 3 until the NTB
maintainers have acked and applied Patches 1-2.

I'd also appreciate any thoughts from Jon or others on this (i.e. keeping
this series NTB tree-only vs. including the vNTB change as well), as well
as any feedback on this v1 series itself.

P.S. I sent a corrected code snippet a few minutes after my original post. The
original snippet above was wrong, as it would violate the kernel-doc in Patch 1:

"Drivers that implement .get_dma_dev() must return a non-NULL pointer."

Best regards,
Koichiro
