RE: [EXTERNAL] Re: [PATCH net v3] net/mana: Fix auxiliary device double-delete race
From: Shiraz Saleem
Date: Mon May 18 2026 - 12:39:34 EST
> Subject: [EXTERNAL] Re: [PATCH net v3] net/mana: Fix auxiliary device double-delete race
>
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> ---
> net/mana: Fix auxiliary device double-delete race
>
> This commit modifies remove_adev() to use xchg() to atomically claim the
> auxiliary device pointer. This change is intended to prevent a double-free race
> between the PCI eject path and the service reset workqueue.
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index a654b3699c4c5..dd4f4215ab434 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3465,14 +3465,19 @@ static void adev_release(struct device *dev)
> >
> > static void remove_adev(struct gdma_dev *gd)
> > {
> > - struct auxiliary_device *adev = gd->adev;
> > - int id = adev->id;
> > + struct auxiliary_device *adev;
> > + int id;
> > +
> > + adev = xchg(&gd->adev, NULL);
> > + if (!adev)
> > + return;
>
> Does this xchg() operation act as an ad-hoc synchronization gate rather than
> properly synchronizing the concurrent teardown paths?
>
> While using xchg() ensures the adev pointer itself is only freed once, the rest of
> the teardown sequence appears to remain unprotected. If the PCI remove path
> and the service reset workqueue run concurrently, one thread might successfully
> claim and process the adev, causing the other thread to return early from
> remove_adev().
>
> However, the thread returning early will then continue its execution into the rest
> of mana_remove() or mana_rdma_remove(). This could allow the first thread to
> proceed and free critical structures:
>
> mana_remove() {
> ...
> remove_adev(gd);
> ...
> free_netdev(ndev);
> ...
> kfree(ac);
> }
>
> while the second thread concurrently iterates and accesses those same
> structures.
>
> Could this concurrent execution lead to a use-after-free on ndev or ac?
> Should the teardown sequence be synchronized using a proper lock, such as the
> device lock, or by synchronously canceling the service work in the remove path,
> instead of relying on an atomic operation?
>
> [ ... ]
> --
> pw-bot: cr
Agree with this feedback.
We need proper serialization between the teardown paths in the service worker and the PCI eject worker, because other shared driver state is at risk as well: gd->driver_data (ac), ac->eqs, and hwc->caller_ctx.
Dipayaan is working on properly synchronizing these paths, which should also resolve this double device-delete bug. Konstantin, let's drop this patch in favor of that work.
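
To illustrate the kind of serialization meant here, below is a rough userspace model, not driver code and not necessarily the shape the eventual fix will take. Two pthreads stand in for the PCI eject path and the service reset worker, and every name in it (fake_dev, fake_ctx, teardown_lock, torn_down, fake_teardown) is hypothetical. The point is only that the whole teardown sequence runs under one lock, so the loser of the race returns before touching any shared state, rather than only the adev pointer being claimed atomically.

```c
/*
 * Userspace model only. pthreads stand in for the PCI eject path and the
 * service reset worker. All names are hypothetical, not mana driver code.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_ctx {			/* stands in for shared state such as ac */
	int eqs;
};

struct fake_dev {			/* stands in for struct gdma_dev */
	pthread_mutex_t teardown_lock;	/* serializes the entire teardown */
	bool torn_down;			/* set once, under teardown_lock */
	void *adev;
	struct fake_ctx *ctx;
	int adev_deletes;		/* counters used to check the result */
	int ctx_frees;
};

static void fake_dev_init(struct fake_dev *gd)
{
	pthread_mutex_init(&gd->teardown_lock, NULL);
	gd->torn_down = false;
	gd->adev = malloc(1);
	gd->ctx = calloc(1, sizeof(*gd->ctx));
	gd->adev_deletes = 0;
	gd->ctx_frees = 0;
	assert(gd->adev != NULL && gd->ctx != NULL);
}

/* Both paths call this. Only the first caller performs the teardown. */
static void fake_teardown(struct fake_dev *gd)
{
	pthread_mutex_lock(&gd->teardown_lock);
	if (gd->torn_down) {
		/* The other path already finished; touch nothing. */
		pthread_mutex_unlock(&gd->teardown_lock);
		return;
	}

	/* Step 1: delete the auxiliary device exactly once. */
	free(gd->adev);
	gd->adev = NULL;
	gd->adev_deletes++;

	/* Step 2: release the remaining shared state under the same lock. */
	gd->ctx->eqs = 0;
	free(gd->ctx);
	gd->ctx = NULL;
	gd->ctx_frees++;

	gd->torn_down = true;
	pthread_mutex_unlock(&gd->teardown_lock);
}

static void *teardown_thread(void *arg)
{
	fake_teardown(arg);
	return NULL;
}

/* Run the two competing teardown paths and wait for both to finish. */
static void run_concurrent_teardown(struct fake_dev *gd)
{
	pthread_t eject, reset;

	if (pthread_create(&eject, NULL, teardown_thread, gd) != 0)
		abort();
	if (pthread_create(&reset, NULL, teardown_thread, gd) != 0)
		abort();
	pthread_join(eject, NULL);
	pthread_join(reset, NULL);
}
```

With an xchg()-only gate, the second caller would skip step 1 but still fall through to step 2 while the first caller may be freeing the same state. In the driver, the equivalent effect could come from a lock held across the teardown or from synchronously canceling the service work in the remove path, as the review suggests.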
Shiraz