Re: [PATCH net v3] net/mana: Fix auxiliary device double-delete race
From: Jakub Kicinski
Date: Tue May 05 2026 - 21:28:45 EST
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mana: Fix auxiliary device double-delete race
This commit modifies remove_adev() to use xchg() to atomically claim the
auxiliary device pointer. This change is intended to prevent a double-free
race between the PCI eject path and the service reset workqueue.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index a654b3699c4c5..dd4f4215ab434 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3465,14 +3465,19 @@ static void adev_release(struct device *dev)
>
> static void remove_adev(struct gdma_dev *gd)
> {
> - struct auxiliary_device *adev = gd->adev;
> - int id = adev->id;
> + struct auxiliary_device *adev;
> + int id;
> +
> + adev = xchg(&gd->adev, NULL);
> + if (!adev)
> + return;
Does this xchg() operation act as an ad-hoc synchronization gate rather
than properly synchronizing the concurrent teardown paths?
While using xchg() ensures the adev pointer itself is only freed once, the
rest of the teardown sequence appears to remain unprotected. If the PCI
remove path and the service reset workqueue run concurrently, one thread
might successfully claim and process the adev, causing the other thread to
return early from remove_adev().
However, the thread that returns early still continues into the rest of
mana_remove() or mana_rdma_remove(). Nothing then stops one thread from
running ahead and freeing critical structures:
    mana_remove() {
        ...
        remove_adev(gd);
        ...
        free_netdev(ndev);
        ...
        kfree(ac);
    }
while the other thread is still iterating over and accessing those same
structures.
Could this concurrent execution lead to a use-after-free on ndev or ac?
Should the teardown sequence be synchronized using a proper lock, such as
the device lock, or by synchronously canceling the service work in the
remove path, instead of relying on an atomic operation?
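To illustrate the distinction, here is a minimal userspace sketch, not
driver code. Every name in it (fake_ctx, claim_adev, fake_teardown) is
hypothetical, and a pthread mutex stands in for whichever kernel
primitive fits (the device lock, or cancel_work_sync() on the service
work in the remove path). claim_adev() mirrors the xchg() gate and
protects only the pointer. fake_teardown() serializes the whole
sequence, so a second caller cannot touch state the first one freed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver context; not the real layout. */
struct fake_ctx {
	pthread_mutex_t teardown_lock;	/* serializes the whole teardown */
	bool torn_down;			/* protected by teardown_lock */
	_Atomic(int *) adev;		/* stand-in for gd->adev */
	int *ports;			/* stand-in for ndev/ac state */
	int teardown_runs;		/* callers that did the actual work */
};

static void fake_ctx_init(struct fake_ctx *c)
{
	int *adev = malloc(sizeof(*adev));

	pthread_mutex_init(&c->teardown_lock, NULL);
	c->torn_down = false;
	atomic_init(&c->adev, adev);
	c->ports = calloc(4, sizeof(*c->ports));
	c->teardown_runs = 0;
}

/*
 * Atomic claim, analogous to the xchg() in the patch: exactly one
 * caller sees the non-NULL pointer and frees it. Nothing else in the
 * context is protected by this.
 */
static bool claim_adev(struct fake_ctx *c)
{
	int *adev = atomic_exchange(&c->adev, (int *)NULL);

	if (!adev)
		return false;
	free(adev);
	return true;
}

/*
 * Whole-sequence serialization: the lock covers every step, so a
 * late caller observes torn_down and never touches freed state.
 */
static bool fake_teardown(struct fake_ctx *c)
{
	bool did_work = false;

	pthread_mutex_lock(&c->teardown_lock);
	if (!c->torn_down) {
		claim_adev(c);		/* harmless if already claimed */
		free(c->ports);
		c->ports = NULL;
		c->torn_down = true;
		c->teardown_runs++;
		did_work = true;
	}
	pthread_mutex_unlock(&c->teardown_lock);
	return did_work;
}
```

With only claim_adev(), the losing caller would return early and then
carry on into code that uses c->ports. With fake_teardown(), the losing
caller returns without touching anything.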
[ ... ]
--
pw-bot: cr