Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices

From: AceLan Kao
Date: Sun Aug 22 2021 - 23:35:47 EST


Leon Romanovsky <leon@xxxxxxxxxx> 於 2021年8月10日 週二 下午3:08寫道:
>
> On Tue, Aug 10, 2021 at 09:57:57AM +0800, AceLan Kao wrote:
> > Leon Romanovsky <leon@xxxxxxxxxx> 於 2021年8月9日 週一 下午1:51寫道:
> > >
> > > On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> > > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@xxxxxxxxxxxxx>
> > > >
> > > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
> > > > __dev_open() it calls pm_runtime_resume() to resume devices, and in
> > > > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> > > > again. That leads to a recursive lock.
> > > >
> > > > It should leave the devices' resume function to decide if they need to
> > > > call rtnl_lock()/rtnl_unlock(),
> > >
> > > Why? It doesn't sound right that drivers internally decide if to take or
> > > release some external to them lock without seeing full picture.
> > From what I observed, this is the only calling path that acquired
> > rtnl_lock() before calling drivers' resume function.
> > So, it encounters recursive lock while driver is going to cal rtnl_lock() again.
>
> I clearly see the problem, but don't agree with a solution.
>
> >
> > >
> > > Most of the time, device driver authors do it wrong. I afraid that igs
> > > is one of such drivers that did it wrong.
> > The issues could be if we remove rtnl_lock in device drivers, then in
> > other calling path, it won't be protected by the rtnl lock,
> > and maybe we shouldn't call pm_runtime_resume() here(within rtnl
> > lock), for device drivers don't know if they are protected by the rtnl
> > lock while their resume() got called.
>
> This is exactly the problem, every driver guesses if rtnl_lock is needed
> or not in specific path. It is wrong by design. You should ensure that
> all paths that are triggered through rtnl should hold rtnl_lock.
Hi Jesse, Tony,

As you are the Intel Ethernet drivers' maintainers, do you have any
idea about this?
We can reproduce this issue on the machine with PCI Ethernet card
using igb or igc driver.

>
> You dropped rtnl_lock() without any protection, it is 100% bug.
>
> Thanks
>
> >
> > >
> > > Thanks
> > >
> > > > so call rtnl_unlock() before calling pm_runtime_resume() and then call
> > > > rtnl_lock() after it in __dev_open().
> > > >
> > > > [ 967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
> > > > [ 967.723588] Not tainted 5.12.0-rc3+ #1
> > > > [ 967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [ 967.723594] task:ip state:D stack: 0 pid: 6024 ppid: 5957 flags:0x00004000
> > > > [ 967.723603] Call Trace:
> > > > [ 967.723610] __schedule+0x2de/0x890
> > > > [ 967.723623] schedule+0x4f/0xc0
> > > > [ 967.723629] schedule_preempt_disabled+0xe/0x10
> > > > [ 967.723636] __mutex_lock.isra.0+0x190/0x510
> > > > [ 967.723644] __mutex_lock_slowpath+0x13/0x20
> > > > [ 967.723651] mutex_lock+0x32/0x40
> > > > [ 967.723657] rtnl_lock+0x15/0x20
> > > > [ 967.723665] igb_resume+0xee/0x1d0 [igb]
> > > > [ 967.723687] ? pci_pm_default_resume+0x30/0x30
> > > > [ 967.723696] igb_runtime_resume+0xe/0x10 [igb]
> > > > [ 967.723713] pci_pm_runtime_resume+0x74/0x90
> > > > [ 967.723718] __rpm_callback+0x53/0x1c0
> > > > [ 967.723725] rpm_callback+0x57/0x80
> > > > [ 967.723730] ? pci_pm_default_resume+0x30/0x30
> > > > [ 967.723735] rpm_resume+0x547/0x760
> > > > [ 967.723740] __pm_runtime_resume+0x52/0x80
> > > > [ 967.723745] __dev_open+0x56/0x160
> > > > [ 967.723753] ? _raw_spin_unlock_bh+0x1e/0x20
> > > > [ 967.723758] __dev_change_flags+0x188/0x1e0
> > > > [ 967.723766] dev_change_flags+0x26/0x60
> > > > [ 967.723773] do_setlink+0x723/0x10b0
> > > > [ 967.723782] ? __nla_validate_parse+0x5b/0xb80
> > > > [ 967.723792] __rtnl_newlink+0x594/0xa00
> > > > [ 967.723800] ? nla_put_ifalias+0x38/0xa0
> > > > [ 967.723807] ? __nla_reserve+0x41/0x50
> > > > [ 967.723813] ? __nla_reserve+0x41/0x50
> > > > [ 967.723818] ? __kmalloc_node_track_caller+0x49b/0x4d0
> > > > [ 967.723824] ? pskb_expand_head+0x75/0x310
> > > > [ 967.723830] ? nla_reserve+0x28/0x30
> > > > [ 967.723835] ? skb_free_head+0x25/0x30
> > > > [ 967.723843] ? security_sock_rcv_skb+0x2f/0x50
> > > > [ 967.723850] ? netlink_deliver_tap+0x3d/0x210
> > > > [ 967.723859] ? sk_filter_trim_cap+0xc1/0x230
> > > > [ 967.723863] ? skb_queue_tail+0x43/0x50
> > > > [ 967.723870] ? sock_def_readable+0x4b/0x80
> > > > [ 967.723876] ? __netlink_sendskb+0x42/0x50
> > > > [ 967.723888] ? security_capable+0x3d/0x60
> > > > [ 967.723894] ? __cond_resched+0x19/0x30
> > > > [ 967.723900] ? kmem_cache_alloc_trace+0x390/0x440
> > > > [ 967.723906] rtnl_newlink+0x49/0x70
> > > > [ 967.723913] rtnetlink_rcv_msg+0x13c/0x370
> > > > [ 967.723920] ? _copy_to_iter+0xa0/0x460
> > > > [ 967.723927] ? rtnl_calcit.isra.0+0x130/0x130
> > > > [ 967.723934] netlink_rcv_skb+0x55/0x100
> > > > [ 967.723939] rtnetlink_rcv+0x15/0x20
> > > > [ 967.723944] netlink_unicast+0x1a8/0x250
> > > > [ 967.723949] netlink_sendmsg+0x233/0x460
> > > > [ 967.723954] sock_sendmsg+0x65/0x70
> > > > [ 967.723958] ____sys_sendmsg+0x218/0x290
> > > > [ 967.723961] ? copy_msghdr_from_user+0x5c/0x90
> > > > [ 967.723966] ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
> > > > [ 967.723974] ___sys_sendmsg+0x81/0xc0
> > > > [ 967.723980] ? __mod_memcg_lruvec_state+0x22/0xe0
> > > > [ 967.723987] ? kmem_cache_free+0x244/0x420
> > > > [ 967.723991] ? dentry_free+0x37/0x70
> > > > [ 967.723996] ? mntput_no_expire+0x4c/0x260
> > > > [ 967.724001] ? __cond_resched+0x19/0x30
> > > > [ 967.724007] ? security_file_free+0x54/0x60
> > > > [ 967.724013] ? call_rcu+0xa4/0x250
> > > > [ 967.724021] __sys_sendmsg+0x62/0xb0
> > > > [ 967.724026] ? exit_to_user_mode_prepare+0x3d/0x1a0
> > > > [ 967.724032] __x64_sys_sendmsg+0x1f/0x30
> > > > [ 967.724037] do_syscall_64+0x38/0x90
> > > > [ 967.724044] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > >
> > > > Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> > > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@xxxxxxxxxxxxx>
> > > > ---
> > > > net/core/dev.c | 5 ++++-
> > > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 8f1a47ad6781..dd43a29419fd 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
> > > >
> > > > if (!netif_device_present(dev)) {
> > > > /* may be detached because parent is runtime-suspended */
> > > > - if (dev->dev.parent)
> > > > + if (dev->dev.parent) {
> > > > + rtnl_unlock();
> > > > pm_runtime_resume(dev->dev.parent);
> > > > + rtnl_lock();
> > > > + }
> > > > if (!netif_device_present(dev))
> > > > return -ENODEV;
> > > > }
> > > > --
> > > > 2.25.1
> > > >