Re: [PATCH v5 5/5] PCI: endpoint: pci-epf-vntb: manage ntb_dev lifetime and fix vpci bus teardown

From: Koichiro Den

Date: Tue Mar 03 2026 - 08:43:33 EST


On Tue, Mar 03, 2026 at 10:52:05AM +0530, Manivannan Sadhasivam wrote:
> On Thu, Feb 26, 2026 at 05:41:42PM +0900, Koichiro Den wrote:
> > Currently ntb_dev is embedded in epf_ntb, while configfs allows starting
> > or stopping controller and linking or unlinking functions as you want.
> > In fact, re-linking and re-starting is not possible with the embedded
> > design and leads to oopses.
> >
> > Allocate ntb_dev with devm and add a .remove callback to the pci driver
> > that calls ntb_unregister_device(). This allows a fresh device to be
> > created on the next .bind call.
> >
> > With these changes, the controller can now be stopped, a function
> > unlinked, configfs settings updated, and the controller re-linked and
> > restarted without rebooting the endpoint, as long as the underlying
> > pci_epc_ops .stop() operation is non-destructive, and .start() can
> > restore normal operations.
> >
> > Reviewed-by: Frank Li <Frank.Li@xxxxxxx>
> > Signed-off-by: Koichiro Den <den@xxxxxxxxxxxxx>
> > ---
> > Changes since v4:
> > - Adjusted context due to commit:
> > dc693d606644 ("PCI: endpoint: pci-epf-vntb: Add MSI doorbell support")
> >
> > drivers/pci/endpoint/functions/pci-epf-vntb.c | 53 ++++++++++++++-----
> > 1 file changed, 40 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index f353e9a57194..41a2f42e8a39 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -120,7 +120,7 @@ struct epf_ntb_ctrl {
> > } __packed;
> >
> > struct epf_ntb {
> > - struct ntb_dev ntb;
> > + struct ntb_dev *ntb;
> > struct pci_epf *epf;
> > struct config_group group;
> >
> > @@ -147,10 +147,16 @@ struct epf_ntb {
> > void __iomem *vpci_mw_addr[MAX_MW];
> >
> > struct delayed_work cmd_handler;
> > +
> > + struct pci_bus *vpci_bus;
> > };
> >
> > #define to_epf_ntb(epf_group) container_of((epf_group), struct epf_ntb, group)
> > -#define ntb_ndev(__ntb) container_of(__ntb, struct epf_ntb, ntb)
> > +
> > +static inline struct epf_ntb *ntb_ndev(struct ntb_dev *ntb)
>
> Nit: No need of 'inline' keyword in a .c file.

Thanks for catching this. I'll drop the 'inline' in v6.


While revisiting Patch 5/5 carefully, I also noticed a lifetime issue around
cmd_handler (and doorbell IRQ handlers).

1. Race window between pci_unregister_driver() and work/IRQ teardown

With this patch, the following window exists:

-> epf_ntb_unbind()
-> pci_unregister_driver()
...
-> pci_vntb_remove()
-> ntb_unregister_device()
-> ndev->ntb = NULL ----------------------------- (A)
-> epf_ntb_epc_cleanup()
-> disable_delayed_work_sync(&ntb->cmd_handler) ---- (B)
-> epf_ntb_db_bar_clear() (free_irq, etc.) --------- (C)

If cmd_handler runs between (A) and (B) (or if epf_ntb_doorbell_handler()
runs between (A) and (C)), they may call ntb_{db,link}_event(ndev->ntb, ...)
with ndev->ntb == NULL, leading to a NULL deref. Also, since the doorbell
IRQs are requested during bind, the IRQ handler can run even before
pci_vntb_probe() ever sets up ndev->ntb.

In current mainline, cmd_handler is initially scheduled in
epf_ntb_epc_init(), and the doorbell IRQs are also requested during bind,
i.e. both are set up before pci_register_driver() and ndev->ntb becomes
valid.

I think the cleanest solution is to tie cmd_handler and the "enabled" state
of the doorbell IRQs to the lifetime of struct ntb_dev:

- Move the initial scheduling (queue_work(..., &ndev->cmd_handler.work)) to
pci_vntb_probe(), after ntb_register_device() succeeds, and stop it in
pci_vntb_remove() using disable_delayed_work_sync() before
ntb_unregister_device().

- Keep request_irq()/free_irq() in the existing bind/unbind paths, but gate
delivery in vPCI probe/remove: keep the doorbell IRQs disabled until
pci_vntb_probe() succeeds (enable_irq() after ndev->ntb becomes valid), and
disable_irq() them in pci_vntb_remove() before clearing ndev->ntb /
ntb_unregister_device().

That way, the unsafe windows I wrote above disappear, and the lifetime of
struct ntb_dev and its associated resources are managed symmetrically across
bind/unbind and probe/remove paths.

2. Stale ndev->ntb on probe failure

pci_vntb_probe() assigns ndev->ntb before ntb_register_device(). If probe
fails after devm_kzalloc(), ndev->ntb may still hold a stale pointer. I'll
fix this by avoiding publishing ndev->ntb until ntb_register_device()
succeeds (or clear it on the error path).

I'll include these fixes into v6.

Best regards,
Koichiro

>
> - Mani
>
> > +{
> > + return (struct epf_ntb *)ntb->pdev->sysdata;
> > +}
> >
> > static struct pci_epf_header epf_ntb_header = {
> > .vendorid = PCI_ANY_ID,
> > @@ -176,7 +182,7 @@ static int epf_ntb_link_up(struct epf_ntb *ntb, bool link_up)
> > else
> > ntb->reg->link_status &= ~LINK_STATUS_UP;
> >
> > - ntb_link_event(&ntb->ntb);
> > + ntb_link_event(ntb->ntb);
> > return 0;
> > }
> >
> > @@ -264,7 +270,7 @@ static void epf_ntb_cmd_handler(struct work_struct *work)
> > for (i = 1; i < ntb->db_count && !ntb->msi_doorbell; i++) {
> > if (ntb->epf_db[i]) {
> > atomic64_or(1 << (i - 1), &ntb->db);
> > - ntb_db_event(&ntb->ntb, i);
> > + ntb_db_event(ntb->ntb, i);
> > ntb->epf_db[i] = 0;
> > }
> > }
> > @@ -333,7 +339,7 @@ static irqreturn_t epf_ntb_doorbell_handler(int irq, void *data)
> > for (i = 1; i < ntb->db_count; i++)
> > if (irq == ntb->epf->db_msg[i].virq) {
> > atomic64_or(1 << (i - 1), &ntb->db);
> > - ntb_db_event(&ntb->ntb, i);
> > + ntb_db_event(ntb->ntb, i);
> > }
> >
> > return IRQ_HANDLED;
> > @@ -1237,6 +1243,7 @@ static int vpci_scan_bus(void *sysdata)
> > pr_err("create pci bus failed\n");
> > return -EINVAL;
> > }
> > + ndev->vpci_bus = vpci_bus;
> >
> > pci_bus_add_devices(vpci_bus);
> >
> > @@ -1281,7 +1288,7 @@ static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
> > int ret;
> > struct device *dev;
> >
> > - dev = &ntb->ntb.dev;
> > + dev = &ntb->ntb->dev;
> > barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
> > epf_bar = &ntb->epf->bar[barno];
> > epf_bar->phys_addr = addr;
> > @@ -1381,7 +1388,7 @@ static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
> > ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> > PCI_IRQ_MSI, interrupt_num + 1);
> > if (ret)
> > - dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
> > + dev_err(&ntb->ntb->dev, "Failed to raise IRQ\n");
> >
> > return ret;
> > }
> > @@ -1468,9 +1475,12 @@ static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > struct epf_ntb *ndev = (struct epf_ntb *)pdev->sysdata;
> > struct device *dev = &pdev->dev;
> >
> > - ndev->ntb.pdev = pdev;
> > - ndev->ntb.topo = NTB_TOPO_NONE;
> > - ndev->ntb.ops = &vntb_epf_ops;
> > + ndev->ntb = devm_kzalloc(dev, sizeof(*ndev->ntb), GFP_KERNEL);
> > + if (!ndev->ntb)
> > + return -ENOMEM;
> > + ndev->ntb->pdev = pdev;
> > + ndev->ntb->topo = NTB_TOPO_NONE;
> > + ndev->ntb->ops = &vntb_epf_ops;
> >
> > ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > if (ret) {
> > @@ -1478,7 +1488,7 @@ static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > return ret;
> > }
> >
> > - ret = ntb_register_device(&ndev->ntb);
> > + ret = ntb_register_device(ndev->ntb);
> > if (ret) {
> > dev_err(dev, "Failed to register NTB device\n");
> > return ret;
> > @@ -1488,6 +1498,17 @@ static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > return 0;
> > }
> >
> > +static void pci_vntb_remove(struct pci_dev *pdev)
> > +{
> > + struct epf_ntb *ndev = (struct epf_ntb *)pdev->sysdata;
> > +
> > + if (!ndev || !ndev->ntb)
> > + return;
> > +
> > + ntb_unregister_device(ndev->ntb);
> > + ndev->ntb = NULL;
> > +}
> > +
> > static struct pci_device_id pci_vntb_table[] = {
> > {
> > PCI_DEVICE(0xffff, 0xffff),
> > @@ -1499,6 +1520,7 @@ static struct pci_driver vntb_pci_driver = {
> > .name = "pci-vntb",
> > .id_table = pci_vntb_table,
> > .probe = pci_vntb_probe,
> > + .remove = pci_vntb_remove,
> > };
> >
> > /* ============ PCIe EPF Driver Bind ====================*/
> > @@ -1581,10 +1603,15 @@ static void epf_ntb_unbind(struct pci_epf *epf)
> > {
> > struct epf_ntb *ntb = epf_get_drvdata(epf);
> >
> > + pci_unregister_driver(&vntb_pci_driver);
> > +
> > + pci_lock_rescan_remove();
> > + pci_stop_root_bus(ntb->vpci_bus);
> > + pci_remove_root_bus(ntb->vpci_bus);
> > + pci_unlock_rescan_remove();
> > +
> > epf_ntb_epc_cleanup(ntb);
> > epf_ntb_config_spad_bar_free(ntb);
> > -
> > - pci_unregister_driver(&vntb_pci_driver);
> > }
> >
> > // EPF driver probe
> > --
> > 2.51.0
> >
>
> --
> மணிவண்ணன் சதாசிவம்