Re: [syzbot] [net?] BUG: sleeping function called from invalid context in synchronize_net

From: Jay Vosburgh
Date: Thu Jul 11 2024 - 17:25:28 EST


syzbot <syzbot+9b277e2c2076e2661f61@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: 523b23f0bee3 Add linux-next specific files for 20240710
>git tree: linux-next
>console output: https://syzkaller.appspot.com/x/log.txt?x=10d88fb9980000
>kernel config: https://syzkaller.appspot.com/x/.config?x=98dd8c4bab5cdce
>dashboard link: https://syzkaller.appspot.com/bug?extid=9b277e2c2076e2661f61
>compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
>Unfortunately, I don't have any reproducer for this issue yet.

Looking at the code, I'm pretty sure this can happen when both of
the following are true:

a) the bond is configured with use_carrier = 0, which causes
bonding to query ethtool for the slave's carrier state.

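b) bond_mii_monitor runs while dev->link_watch_list is
non-empty, here with a pending link down transition. In that case,
linkwatch_sync_dev (called by ethtool_op_get_link) will in turn
call linkwatch_do_dev -> dev_deactivate -> dev_deactivate_many ->
synchronize_net -> might_sleep. All of this runs inside the RCU
read-side critical section that bond_mii_monitor holds around
bond_miimon_inspect (hence "RCU nest depth: 1" in the splat), so
sleeping there is invalid.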

The use_carrier option was added 20-ish years ago for device
drivers that didn't call netif_carrier_on / _off at the time.

I was about to say that I'd expect all drivers today to call
netif_carrier_on / _off, but 1386c36b3038 ("bonding: allow carrier
and link status to determine link state") suggests that something
needed it as recently as 2018.

-J

>Downloadable assets:
>disk image: https://storage.googleapis.com/syzbot-assets/345bcd25ed2f/disk-523b23f0.raw.xz
>vmlinux: https://storage.googleapis.com/syzbot-assets/a4508962d345/vmlinux-523b23f0.xz
>kernel image: https://storage.googleapis.com/syzbot-assets/4ba5eb555639/bzImage-523b23f0.xz
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+9b277e2c2076e2661f61@xxxxxxxxxxxxxxxxxxxxxxxxx
>
>BUG: sleeping function called from invalid context at net/core/dev.c:11239
>in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/u8:1
>preempt_count: 0, expected: 0
>RCU nest depth: 1, expected: 0
>INFO: lockdep is turned off.
>CPU: 1 UID: 0 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc7-next-20240710-syzkaller #0
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
>Workqueue: bond0 bond_mii_monitor
>Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> __might_resched+0x5d4/0x780 kernel/sched/core.c:8526
> synchronize_net+0x1b/0x50 net/core/dev.c:11239
> dev_deactivate_many+0x4a7/0xb10 net/sched/sch_generic.c:1371
> dev_deactivate+0x184/0x280 net/sched/sch_generic.c:1397
> linkwatch_do_dev+0x10a/0x170 net/core/link_watch.c:175
> ethtool_op_get_link+0x15/0x60 net/ethtool/ioctl.c:62
> bond_check_dev_link+0x1f1/0x3f0 drivers/net/bonding/bond_main.c:757
> bond_miimon_inspect drivers/net/bonding/bond_main.c:2604 [inline]
> bond_mii_monitor+0x49a/0x3170 drivers/net/bonding/bond_main.c:2826
> process_one_work kernel/workqueue.c:3228 [inline]
> process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3309
> worker_thread+0x86d/0xd40 kernel/workqueue.c:3387
> kthread+0x2f0/0x390 kernel/kthread.c:389
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> </TASK>
>------------[ cut here ]------------
>Voluntary context switch within RCU read-side critical section!
>WARNING: CPU: 1 PID: 12 at kernel/rcu/tree_plugin.h:330 rcu_note_context_switch+0xcf4/0xff0 kernel/rcu/tree_plugin.h:330
>Modules linked in:
>CPU: 1 UID: 0 PID: 12 Comm: kworker/u8:1 Tainted: G W 6.10.0-rc7-next-20240710-syzkaller #0
>Tainted: [W]=WARN
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
>Workqueue: bond0 bond_mii_monitor
>RIP: 0010:rcu_note_context_switch+0xcf4/0xff0 kernel/rcu/tree_plugin.h:330
>Code: 00 ba 02 00 00 00 e8 bb 02 fe ff 4c 8b b4 24 80 00 00 00 eb 91 c6 05 a4 4f 1f 0e 01 90 48 c7 c7 e0 2d cc 8b e8 ad c5 da ff 90 <0f> 0b 90 90 e9 3b f4 ff ff 90 0f 0b 90 45 84 ed 0f 84 00 f4 ff ff
>RSP: 0018:ffffc90000116f00 EFLAGS: 00010046
>RAX: b02efd3a29e78a00 RBX: ffff8880172cde44 RCX: ffff8880172cda00
>RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>RBP: ffffc90000117050 R08: ffffffff815583f2 R09: fffffbfff1c39f8c
>R10: dffffc0000000000 R11: fffffbfff1c39f8c R12: ffff8880172cda00
>R13: 0000000000000000 R14: 1ffff92000022df8 R15: dffffc0000000000
>FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 000000110c444d58 CR3: 000000006ca0e000 CR4: 00000000003506f0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>Call Trace:
> <TASK>
> __schedule+0x348/0x4a60 kernel/sched/core.c:6491
> __schedule_loop kernel/sched/core.c:6680 [inline]
> schedule+0x14b/0x320 kernel/sched/core.c:6695
> schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6752
> __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> __mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
> exp_funnel_lock kernel/rcu/tree_exp.h:329 [inline]
> synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:967
> synchronize_rcu+0x11b/0x360 kernel/rcu/tree.c:4020
> dev_deactivate_many+0x4a7/0xb10 net/sched/sch_generic.c:1371
> dev_deactivate+0x184/0x280 net/sched/sch_generic.c:1397
> linkwatch_do_dev+0x10a/0x170 net/core/link_watch.c:175
> ethtool_op_get_link+0x15/0x60 net/ethtool/ioctl.c:62
> bond_check_dev_link+0x1f1/0x3f0 drivers/net/bonding/bond_main.c:757
> bond_miimon_inspect drivers/net/bonding/bond_main.c:2604 [inline]
> bond_mii_monitor+0x49a/0x3170 drivers/net/bonding/bond_main.c:2826
> process_one_work kernel/workqueue.c:3228 [inline]
> process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3309
> worker_thread+0x86d/0xd40 kernel/workqueue.c:3387
> kthread+0x2f0/0x390 kernel/kthread.c:389
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> </TASK>
>
>
>---
>This report is generated by a bot. It may contain errors.
>See https://goo.gl/tpsmEJ for more information about syzbot.
>syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
>syzbot will keep track of this issue. See:
>https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
>If the report is already addressed, let syzbot know by replying with:
>#syz fix: exact-commit-title
>
>If you want to overwrite report's subsystems, reply with:
>#syz set subsystems: new-subsystem
>(See the list of subsystem names on the web dashboard)
>
>If the report is a duplicate of another one, reply with:
>#syz dup: exact-subject-of-another-report
>
>If you want to undo deduplication, reply with:
>#syz undup
>

---
-Jay Vosburgh, jv@xxxxxxxxxxxxx