Re: [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port

From: Dipayaan Roy

Date: Thu May 28 2026 - 17:27:08 EST


On Thu, May 28, 2026 at 11:30:39AM +0200, Paolo Abeni wrote:
> On 5/25/26 10:08 AM, Dipayaan Roy wrote:
> > When mana_per_port_queue_reset_work_handler() runs after a previous
> > detach succeeded but attach failed, the port is left in a detached
> > state with apc->tx_qp and apc->rxqs already freed. Calling
> > mana_detach() again unconditionally leads to NULL pointer dereferences
> > during queue teardown.
> >
> > Add an early exit in mana_detach() when the port is already in
> > detached state (!netif_device_present) for non-close callers, making
> > it safe to call idempotently. This allows the queue reset handler and
> > other recovery paths to simply retry mana_attach() without redundant
> > teardown.
> >
> > Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
> > Reviewed-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > Signed-off-by: Dipayaan Roy <dipayanroy@xxxxxxxxxxxxxxxxxxx>
> > ---
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 0582803907a8..1e1ad2795c3c 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
> >
> > ASSERT_RTNL();
> >
> > + /* If already detached (indicates detach succeeded but attach failed
> > + * previously). Now skip mana detach and just retry mana_attach.
> > + */
> > + if (!from_close && !netif_device_present(ndev))
> > + return 0;
> > +
> > apc->port_st_save = apc->port_is_up;
> > apc->port_is_up = false;
>
> sashiko(gemini) notes the above can lead to different race:
>
> ---
> Can this early return cause state machine corruption by bypassing the
> updates
> to apc->port_st_save?
> Consider this sequence:
> 1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
> apc->port_is_up = false), but mana_attach() fails.
> 2. The admin brings the interface down (ip link set dev eth0 down), skipping
> mana_close() since apc->port_is_up is false.
> 3. The admin changes the MTU, triggering mana_change_mtu() which calls
> mana_detach() followed by mana_attach().
> 4. mana_detach() hits this new early return, preserving
> apc->port_st_save == true.
> When mana_attach() runs, it sees apc->port_st_save == true and allocates
> queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
> though the interface is administratively down.
> If the admin then brings the interface up, mana_open() will unconditionally
> call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
> return -EBUSY because apc->vport_use_count is already 1.
> This leaves mana_open() failing and the interface down. Since the interface
> is already down, trying to bring it down again is a no-op, meaning
> mana_close() is never called to clean up the orphaned queues.
> Does this sequence permanently brick the port until the driver is reloaded?
> ---
>
> I think you need to be more restrictive in the early return check.
>
> /P
>
Hi Paolo,

Thank you for the comments,
I think the scenario pointed out by sashiko does not seems valid,
as it mentioned in step 2 and 3 admin changing MTU after bringing
interface down. This is becasue netif_set_mtu_ext() in dev.c checks
netif_device_present and returns -ENODEV before calling
ndo_change_mtu. So mana_change_mtu() is never reachable when the
device is in the !present state.

https://elixir.bootlin.com/linux/v7.0/source/net/core/dev.c#L9906

Please let me know if the above check is good enough?

Regards
Dipayaan Roy