RE: [EXTERNAL] Re: [PATCH net v2] net/mana: Fix auxiliary device double-delete race

From: Shiraz Saleem

Date: Wed Mar 25 2026 - 11:18:26 EST


> Subject: [EXTERNAL] Re: [PATCH net v2] net/mana: Fix auxiliary device double-delete race
>
> On Tue, 17 Mar 2026 07:39:43 -0700 Konstantin Taranov wrote:
> > Make remove_adev() safe to call concurrently from the service reset
> > and PCI eject paths by using xchg() to atomically claim the adev
> > pointer. This prevents double auxiliary_device_delete/uninit when
> > hv_eject_device_work races with the service reset workqueue.
>
> Really seems like you should add proper locking to these paths instead. Are the
> accesses to is_suspended, rdma_teardown etc really safe as is?

is_suspended is only accessed from mana_rdma_service_handle, which runs on the ordered service_wq, so its accesses are serialized by definition.

rdma_teardown is a one-way stop flag set in mana_rdma_remove() via WRITE_ONCE, with flush_workqueue providing ordering against the READ_ONCE in the service handler. Concurrent writes are idempotent (both writers set it to true).

The field that actually races is gd->adev. Two remove_adev() callers on different workqueues can race (mana_serv_func on the events workqueue vs. hv_eject_device_work on PCI hot-remove), and this patch fixes that via xchg(). If a mutex would make the intent clearer, we can switch to one.
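
For illustration only, here is a rough userspace sketch of the claim-once pattern, with C11 atomic_exchange() standing in for the kernel's xchg(). The fake_* types, the teardown counter and race_two_removers() are hypothetical names made up for this sketch, not driver code; the counter increment stands in for the auxiliary_device_delete/uninit pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct auxiliary_device and struct gdma_dev. */
struct fake_adev { int id; };
struct fake_gd { _Atomic(struct fake_adev *) adev; };

/* Counts how many callers actually performed the teardown. */
static atomic_int teardown_count;

/* Same shape as the patched remove_adev(): claim the pointer, then tear down. */
static void fake_remove_adev(struct fake_gd *gd)
{
	struct fake_adev *adev;

	/* Userspace analogue of xchg(&gd->adev, NULL). */
	adev = atomic_exchange(&gd->adev, (struct fake_adev *)NULL);
	if (!adev)
		return; /* another caller already claimed the device */

	/* Stands in for the delete/uninit calls; reached by one caller only. */
	atomic_fetch_add(&teardown_count, 1);
}

static void *remover(void *arg)
{
	fake_remove_adev(arg);
	return NULL;
}

/* Runs two concurrent removers and returns how many did the teardown. */
static int race_two_removers(void)
{
	static struct fake_adev dev = { .id = 1 };
	struct fake_gd gd;
	pthread_t a, b;

	atomic_init(&gd.adev, &dev);
	atomic_store(&teardown_count, 0);

	pthread_create(&a, NULL, remover, &gd);
	pthread_create(&b, NULL, remover, &gd);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* Whoever won the exchange left NULL behind. */
	assert(atomic_load(&gd.adev) == NULL);
	return atomic_load(&teardown_count);
}
```

Whichever thread wins the exchange sees the old pointer and does the teardown; the other sees NULL and returns early, so race_two_removers() should report a single teardown regardless of scheduling. A mutex-based variant would give the same guarantee and would also serialize the teardown itself.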

>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 9017e806e..9ae5f01d8 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3410,14 +3410,18 @@ static void adev_release(struct device *dev)
> >
> > static void remove_adev(struct gdma_dev *gd) {
> > - struct auxiliary_device *adev = gd->adev;
> > - int id = adev->id;
> > + struct auxiliary_device *adev = xchg(&gd->adev, NULL);
>
> nit: avoid calling functions with side effects as variable init

Sure. Can fix.
>
> > + int id;
> > +
> > + if (!adev)
> > + return;