Re: [PATCH] net: called rtnl_unlock() before runpm resumes devices

From: AceLan Kao
Date: Thu Apr 22 2021 - 02:30:54 EST


Yes, I should add

Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
and also
Fixes: 9513d2a5dc7f ("igc: Add legacy power management support")

Jakub Kicinski <kuba@xxxxxxxxxx> wrote on Wed, Apr 21, 2021 at 3:27 AM:
>
> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote:
> > On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@xxxxxxxxxxxxx> wrote:
> > >
> > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@xxxxxxxxxxxxx>
> > >
> > > rtnl_lock() has already been called in rtnetlink_rcv_msg(), and then
> > > __dev_open() calls pm_runtime_resume() to resume devices, and some
> > > devices' resume functions (igb_resume, igc_resume) call rtnl_lock()
> > > again. That leads to a recursive lock.
> > >
> > > The devices' resume functions should decide whether they need to
> > > call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling
> > > pm_runtime_resume() and then call rtnl_lock() after it in __dev_open().
> > >
> > >
> >
> > Hi Acelan
> >
> > > When was the bug added?
> > Please add a Fixes: tag
>
> For immediate cause probably:
>
> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
>
> > By doing so, you give more chances for reviewers to understand why the
> > fix is not risky,
> > and help stable teams work.
>
> IMO the driver lacks internal locking. Taking rtnl from resume is just
> one example, git history shows many more places that lacked locking and
> got papered over with rtnl here.
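
For reference, the recursion described in the patch description can be
sketched in userspace. This is only an analogy, not kernel code: a pthread
error-checking mutex stands in for the RTNL, and fake_rtnl, fake_resume()
and fake_open() are hypothetical names made up for illustration. The
error-checking mutex returns EDEADLK on a recursive lock, whereas the
kernel mutex behind rtnl_lock() would simply hang.

```c
/* Userspace sketch only: a pthread mutex stands in for the kernel's RTNL. */
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t fake_rtnl;	/* hypothetical stand-in for the RTNL */

/* Use an error-checking mutex so a recursive lock reports EDEADLK. */
static int fake_rtnl_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&fake_rtnl, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Stand-in for a driver resume callback that takes the lock itself. */
static int fake_resume(void)
{
	int err = pthread_mutex_lock(&fake_rtnl);

	if (err)
		return err;	/* EDEADLK if this thread already holds it */
	pthread_mutex_unlock(&fake_rtnl);
	return 0;
}

/* Stand-in for the open path, entered with the lock taken by the caller. */
static int fake_open(int drop_lock_around_resume)
{
	int err;

	pthread_mutex_lock(&fake_rtnl);		/* taken by the netlink path */
	if (drop_lock_around_resume)
		pthread_mutex_unlock(&fake_rtnl);
	err = fake_resume();
	if (drop_lock_around_resume)
		pthread_mutex_lock(&fake_rtnl);
	pthread_mutex_unlock(&fake_rtnl);
	return err;
}
```

After fake_rtnl_init(), fake_open(0) models the current behaviour and
returns EDEADLK, while fake_open(1) models the unlock/relock ordering
from the patch and returns 0. The sketch shows only the lock ordering; it
does not address the concern about missing driver-internal locking, and
it ignores whatever else may run while the lock is dropped.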