Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

From: Saeed Mahameed
Date: Thu Sep 24 2020 - 01:49:45 EST


On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller <davem@xxxxxxxxxxxxx>
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
>
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
>
> Wait, is the sequence:
>
> ->ndo_stop()
> mark device not present and put into D3hot
> triggers linkwatch event
> ...
> ->ndo_get_stats64()
>
> ???
>

I assume it is, since normally device drivers do carrier_off() on
ndo_stop()

1) One problematic sequence would be
(for drivers doing D3hot on ndo_stop())

__dev_close_many()
->ndo_stop()
netif_device_detach() //Mark !present;
... D3hot
carrier_off()->linkwatch_event()
... // !present && IFF_UP

2) Another problematic scenario which i see is repeated in many
drivers:

shutdown/suspend()
rtnl_lock()
netif_device_detach()//Mark !present;
stop()->carrier_off()->linkwatch_event()
// at this point device is still IFF_UP and !present
// due to the early detach above..
rtnl_unlock();

For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), i think we need to fix the drivers to detach only after stop :(

> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.