Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

From: David Miller
Date: Wed Sep 23 2020 - 20:21:29 EST


From: Saeed Mahameed <saeed@xxxxxxxxxx>
Date: Wed, 23 Sep 2020 15:42:17 -0700

> Maybe we need to clear IFF_UP before calling ops->ndo_stop(dev),
> instead of after on __dev_close_many(). Assuming no driver is checking
> IFF_UP state on its own ndo_stop(), other than this, the order
> shouldn't really matter, since clearing the flag and calling ndo_stop()
> should be considered as one atomic operation.

This is my biggest concern, that some ndo_stop, or some helper called
by ndo_stop, checks IFF_UP or similar.

There is also something else. We have both synchronous and async code
that checks state like IFF_UP and 'present' and makes a decision based
upon that.

If an async code path tests 'present', gets true, and then the RTNL
holding synchronous code path puts the device into D3hot immediately
afterwards, the async code path will still continue and access the
chips registers and fault.

I'm saying all of this because the only way this bug makes sense
is if the ->ndo_stop() sequence that marks the device !present and
then clears IFF_UP runs with the RTNL mutex held, and the code path
that tests this state in the linkwatch bits in question do not.