Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"
From: Saeed Mahameed
Date: Wed Sep 23 2020 - 18:42:21 EST
On Wed, 2020-09-23 at 22:44 +0200, Heiner Kallweit wrote:
> On 23.09.2020 22:15, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@xxxxxxxxx>
> > Date: Wed, 23 Sep 2020 21:58:59 +0200
> >
> > > On 23.09.2020 20:35, Saeed Mahameed wrote:
> > > > Why would a driver detach the device on ndo_stop() ?
> > > > seems like this is the bug you need to be chasing ..
> > > > which driver is doing this ?
> > > >
> > > Some drivers set the device to PCI D3hot at the end of ndo_stop()
> > > to save power (using e.g. Runtime PM). Marking the device as
> > > detached makes clear to the net core that the device isn't
> > > accessible any longer.
> >
> > That being the case, the problem is that IFF_UP+!present is not a
> > valid netdev state.
> >
> If this combination is invalid, then should netif_device_detach()
> clear IFF_UP? At first glance this should be sufficient to avoid
> the issue I was dealing with.
>
That feels like a workaround, and it would conflict with the assumption
that netif_device_detach() should only be called when !IFF_UP.

Maybe we need to clear IFF_UP before calling ops->ndo_stop(dev) in
__dev_close_many(), instead of after. Assuming no driver checks the
IFF_UP state in its own ndo_stop(), the order shouldn't really matter,
since clearing the flag and calling ndo_stop() should be considered
one atomic operation.
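
To illustrate the ordering argument, here is a minimal userspace sketch.
It is not kernel code: every model_* name is made up for illustration,
and it assumes a driver whose ndo_stop() detaches the device, as in the
D3hot case described above. It only records whether the IFF_UP+!present
combination is ever observable during close.

```c
#include <assert.h>
#include <stdbool.h>

#define IFF_UP 0x1u	/* same bit value as in <linux/if.h> */

/* Hypothetical stand-in for the few net_device fields that matter here. */
struct model_netdev {
	unsigned int flags;
	bool present;			/* models netif_device_present() */
	bool saw_up_while_detached;	/* set once IFF_UP+!present is seen */
};

/* Record the state David called invalid: IFF_UP set, device not present. */
static void model_check(struct model_netdev *dev)
{
	if ((dev->flags & IFF_UP) && !dev->present)
		dev->saw_up_while_detached = true;
}

/* Assumed driver behaviour: power down and detach at the end of ndo_stop(). */
static void model_ndo_stop(struct model_netdev *dev)
{
	dev->present = false;	/* stands in for netif_device_detach() */
	model_check(dev);
}

/* Order discussed as the current one: ndo_stop() first, then clear IFF_UP. */
static void model_close_stop_first(struct model_netdev *dev)
{
	model_ndo_stop(dev);
	dev->flags &= ~IFF_UP;
	model_check(dev);
}

/* Proposed order: clear IFF_UP first, then call ndo_stop(). */
static void model_close_flag_first(struct model_netdev *dev)
{
	dev->flags &= ~IFF_UP;
	model_ndo_stop(dev);
	model_check(dev);
}

/* Run one close variant on a device that starts up and present. */
static bool model_run(void (*close_fn)(struct model_netdev *))
{
	struct model_netdev dev = {
		.flags = IFF_UP,
		.present = true,
		.saw_up_while_detached = false,
	};

	close_fn(&dev);
	return dev.saw_up_while_detached;
}
```

In this model only model_close_stop_first() ever passes through
IFF_UP+!present. It says nothing about locking or about drivers that
read IFF_UP inside their own ndo_stop(), which would need auditing
separately.
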
> > Is it simply the issue that, upon resume, IFF_UP is marked true
> > before the device is brought out from D3hot state and thus marked
> > as present again?
> >
> I can't really comment on that. The issue I was dealing with at the
> time I submitted this change was about an async linkwatch event
> (caused by powering down the PHY in ndo_stop) trying to access the
> device when it was powered down already.