Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
From: Koichiro Den
Date: Wed Mar 04 2026 - 11:13:47 EST
On Thu, Mar 05, 2026 at 12:56:12AM +0900, Koichiro Den wrote:
> On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
> >
> >
> > On 3/2/26 9:56 PM, Koichiro Den wrote:
> > > On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> > >>
> > >>
> > >> On 3/2/26 7:45 AM, Koichiro Den wrote:
> > >>> Some NTB implementations are backed by a "virtual" PCI device, while the
> > >>> actual DMA mapping context (IOMMU domain) belongs to a different device.
> > >>>
> > >>> One example is vNTB, where the NTB device is represented as a virtual
> > >>> PCI endpoint function, but DMA operations must be performed against the
> > >>> EPC parent device, which owns the IOMMU context.
> > >>>
> > >>> Today, ntb_transport implicitly relies on the NTB device's parent device
> > >>> as the DMA mapping device. This works for most PCIe NTB hardware, but
> > >>> breaks implementations where the NTB PCI function is not the correct
> > >>> device to use for DMA API operations.
> > >>
> > >> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> > >> physically contiguous allocation of rx ring buffers"). As you can see it
> > >> tries to get around the issue as a temp measure. The main issue is the
> > >> memory window buffer is allocated before the dmaengine devices are allocated.
> > >> So the buffer is mapped against the NTB device rather than the DMA device.
> > >> So I think we may need to come up with a better scheme to clean up this
> > >> issue as some of the current NTBs can utilize this change as well.
> > >
> > > Thanks for the feedback.
> > >
> > > I think there are two issues which are related but separable:
> > >
> > > - 1). Ensuring the correct DMA-mapping device is used for the MW translation
> > > (i.e. inbound accesses from the peer).
> > > - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
> > > device ("double mapping").
> > >
> > > (1) is what this series is addressing. I think this series does not worsen (2).
> > > I agree that (2) should be improved eventually.
> > >
> > > (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> > > can be the same as chan->device->dev, in which case the double mapping could be
> > > optimized away. However, I understand that you are talking about a more
> > > fundamental improvement.)
> > >
> > >>
> > >> The per queue DMA device presents an initialization hierarchy challenge with the
> > >> memory window context. I'm open to suggestions.
> > >
> > > In my view, what is written in 061a785a114f looks like the most viable long-term
> > > direction:
> > >
> > > A potential future solution may be having the DMA mapping API providing a
> > > way to alias an existing IOVA mapping to a new device perhaps.
> > >
> > > I do not immediately see a more practical alternative. E.g., deferring MW
> > > inbound mapping until ntb_transport_create_queue() would require a substantial
> > > rework, since dma_chan is determined per-QP at that stage and the mapping would
> > > become dynamic per subrange. I doubt it would be worth doing or acceptable.
> > > Pre-allocating dma_chans only for this purpose also seems excessive.
> > >
> > > So I agree that (2) needs a clean-up eventually. However, in my opinion the
> > > problem this series tries to solve is independent, and the approach here does
> > > not interfere with that direction.
> >
> > Fair assessment. For the series:
> > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
>
> Thanks for the review.
>
> Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
> a small patch to PCI EP for vNTB (the real user of the interface), something
> like the following:
>
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index be6c03f4516e..8aeacbae8b77 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
>  	return 0;
>  }
>
> +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> +{
> +	struct epf_ntb *ntb = ntb_ndev(ndev);
> +
> +	if (!ntb || !ntb->epf)
> +		return NULL;
> +	return ntb->epf->epc->dev.parent;
> +}
> +
>  static const struct ntb_dev_ops vntb_epf_ops = {
>  	.mw_count = vntb_epf_mw_count,
>  	.spad_count = vntb_epf_spad_count,
> @@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>  	.db_clear_mask = vntb_epf_db_clear_mask,
>  	.db_clear = vntb_epf_db_clear,
>  	.link_disable = vntb_epf_link_disable,
> +	.get_dma_dev = vntb_epf_get_dma_dev,
>  };
>
> static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
No, sorry, my mistake. That was incorrect. It should look like the following:

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 20a400e83439..e5433404f573 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1436,6 +1436,14 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
 	return 0;
 }
 
+static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
+{
+	struct epf_ntb *ntb = ntb_ndev(ndev);
+	struct pci_epc *epc = ntb->epf->epc;
+
+	return epc->dev.parent;
+}
+
 static const struct ntb_dev_ops vntb_epf_ops = {
 	.mw_count = vntb_epf_mw_count,
 	.spad_count = vntb_epf_spad_count,
@@ -1457,6 +1465,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.db_clear_mask = vntb_epf_db_clear_mask,
 	.db_clear = vntb_epf_db_clear,
 	.link_disable = vntb_epf_link_disable,
+	.get_dma_dev = vntb_epf_get_dma_dev,
 };
 
 static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)

Sorry for the noise.

Best regards,
Koichiro
>
>
> Best regards,
> Koichiro
>
> >
> > >
> > > Best regards,
> > > Koichiro
> > >
> > >>
> > >> DJ
> > >>
> > >>>
> > >>> This small series introduces an optional .get_dma_dev() callback in
> > >>> struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
> > >>> callback is not implemented, the helper falls back to the existing
> > >>> default behavior. Drivers that implement .get_dma_dev() must return a
> > >>> non-NULL struct device.
> > >>>
> > >>> - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
> > >>> ntb_get_dma_dev().
> > >>>
> > >>> - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
> > >>> ntb_get_dma_dev().
> > >>>
> > >>> No functional changes are intended by this series itself.
> > >>>
> > >>> A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
> > >>> (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
> > >>> separately to the PCI Endpoint subsystem tree. That will enable
> > >>> ntb_transport to work correctly in IOMMU-backed EPC setups.
> > >>>
> > >>> Best regards,
> > >>> Koichiro
> > >>>
> > >>>
> > >>> Koichiro Den (2):
> > >>> NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
> > >>> NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
> > >>>
> > >>> drivers/ntb/ntb_transport.c | 14 +++++++-------
> > >>> include/linux/ntb.h | 23 +++++++++++++++++++++++
> > >>> 2 files changed, 30 insertions(+), 7 deletions(-)
> > >>>
> > >>
> > >
> >
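
For reference, the fallback behaviour described in the cover letter (use
.get_dma_dev() when the driver provides it, otherwise keep the existing
default) could be modelled roughly as below. This is only a user-space sketch
with simplified stand-in types, not the helper from patch 1/2. The struct
layouts, the name ntb_get_dma_dev_sketch(), the demo ops tables, and the
assumption that the default is the NTB device's parent device are all
illustrative.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct device {
	const char *name;
	struct device *parent;
};

struct ntb_dev;

struct ntb_dev_ops {
	/* Optional callback: device to use for DMA API operations. */
	struct device *(*get_dma_dev)(struct ntb_dev *ntb);
};

struct ntb_dev {
	struct device dev;
	const struct ntb_dev_ops *ops;
};

/*
 * Sketch of the helper. If the driver implements .get_dma_dev(), use its
 * answer; otherwise fall back to the NTB device's parent (assumed here to
 * be the pre-existing default, per the cover letter).
 */
static struct device *ntb_get_dma_dev_sketch(struct ntb_dev *ntb)
{
	if (ntb->ops && ntb->ops->get_dma_dev)
		return ntb->ops->get_dma_dev(ntb);
	return ntb->dev.parent;
}

/* Hypothetical vNTB-like case: DMA context is owned by another device. */
static struct device epc_parent = { "epc-parent", NULL };

static struct device *demo_get_dma_dev(struct ntb_dev *ntb)
{
	(void)ntb;
	return &epc_parent;
}

/* One ops table with the callback, one without (classic PCIe NTB case). */
static const struct ntb_dev_ops demo_ops = { .get_dma_dev = demo_get_dma_dev };
static const struct ntb_dev_ops plain_ops = { .get_dma_dev = NULL };
```

With plain_ops the sketch returns the parent of the NTB device, so existing
drivers see no change. With demo_ops it returns the device chosen by the
driver, which is the vNTB case where the EPC parent owns the IOMMU context.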