Re: [syzbot] [mm?] [arch?] BUG: sleeping function called from invalid context in __tlb_batch_free_encoded_pages

From: Will Deacon

Date: Wed May 27 2026 - 08:09:25 EST


I finally got a chance to look at this one...

On Thu, Apr 30, 2026 at 12:21:35AM -0700, syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit: dca922e019dd Merge tag 'xsa48x-7.1-tag' of git://git.kerne..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11cd6b6c580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=59da38148f3a3d24
> dashboard link: https://syzkaller.appspot.com/bug?extid=a169a27b0538ba43e5d3
> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dca922e0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/7b447b1b93a9/vmlinux-dca922e0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/af7830f5dabf/bzImage-dca922e0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a169a27b0538ba43e5d3@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> BUG: sleeping function called from invalid context at mm/mmu_gather.c:142
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5677, name: rm
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> 2 locks held by rm/5677:
> #0: ffff888022c20338 (&mm->mmap_lock){++++}-{4:4}, at: mmap_write_lock include/linux/mmap_lock.h:536 [inline]
> #0: ffff888022c20338 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x22c/0xa10 mm/mmap.c:1308
> #1: ffffffff8e7e54e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.0+0x7/0x30 include/linux/rcupdate.h:300
> CPU: 1 UID: 0 PID: 5677 Comm: rm Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
> __might_resched.cold+0x1ec/0x232 kernel/sched/core.c:9162
> __tlb_batch_free_encoded_pages+0x11e/0x280 mm/mmu_gather.c:142
> tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]

... so we're deep in the mmu_gather code tearing down an mm, at the
point where we free the unmapped pages, but we're somehow still inside
an RCU read-side critical section.

The last report on the syzbot dashboard is from 30th April, which
coincides with 99ebc509eef5 ("mm: memcontrol: fix rcu unbalance in
get_non_dying_memcg_end()") landing upstream. Unfortunately, there's no
reproducer available to test that concretely, but it looks like we can
end up in there via the page freeing path. So hopefully this is fixed.
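
For illustration only, here is a userspace toy of this class of bug. It
is not the memcg code and makes no claim about the exact imbalance that
99ebc509eef5 fixed; every toy_* name is made up for the example. A
begin/end helper pair leaks one read-side nesting level on an early
return, so the depth is still 1 when the caller later reaches a point
that may sleep, which is the situation the "RCU nest depth: 1,
expected: 0" line in the report describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state; stands in for the RCU read-side nesting depth. */
struct toy_task {
	int rcu_nest_depth;
};

static void toy_rcu_read_lock(struct toy_task *t)
{
	t->rcu_nest_depth++;
}

static void toy_rcu_read_unlock(struct toy_task *t)
{
	assert(t->rcu_nest_depth > 0);	/* catch an unlock without a lock */
	t->rcu_nest_depth--;
}

/* Roughly the condition the "sleeping function" splat complains about. */
static bool toy_may_sleep(const struct toy_task *t)
{
	return t->rcu_nest_depth == 0;
}

/* Hypothetical helper pair: begin always takes the read lock... */
static void toy_section_begin(struct toy_task *t)
{
	toy_rcu_read_lock(t);
}

/* ...but the buggy end forgets to drop it on the early-return path. */
static void toy_section_end_buggy(struct toy_task *t, bool early_exit)
{
	if (early_exit)
		return;			/* leaks one nesting level */
	toy_rcu_read_unlock(t);
}

/* Balanced variant: every path drops the lock taken by begin. */
static void toy_section_end_fixed(struct toy_task *t, bool early_exit)
{
	(void)early_exit;
	toy_rcu_read_unlock(t);
}

/*
 * Models "free a page, then hit a reschedule point in the batch loop".
 * Returns true if sleeping at that point would be legal.
 */
static bool toy_free_then_resched(struct toy_task *t,
				  void (*end)(struct toy_task *, bool),
				  bool early_exit)
{
	toy_section_begin(t);
	end(t, early_exit);
	return toy_may_sleep(t);
}
```

Starting from a zeroed struct toy_task, calling
toy_free_then_resched(&t, toy_section_end_buggy, true) returns false and
leaves rcu_nest_depth at 1, whereas the same call with
toy_section_end_fixed returns true with the depth back at 0.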

As an aside, it's a bit of a pity that the rcu_read_lock() callsite is
identified only as the useless rcu_lock_acquire.constprop.0() function
in this backtrace.

Will

> tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
> tlb_flush_mmu mm/mmu_gather.c:424 [inline]
> tlb_finish_mmu+0x1b0/0x810 mm/mmu_gather.c:549
> exit_mmap+0x454/0xa10 mm/mmap.c:1313
> __mmput+0x12a/0x410 kernel/fork.c:1178
> mmput+0x67/0x80 kernel/fork.c:1201
> exit_mm kernel/exit.c:581 [inline]
> do_exit+0x833/0x2a60 kernel/exit.c:963
> do_group_exit+0xd5/0x2a0 kernel/exit.c:1117
> __do_sys_exit_group kernel/exit.c:1128 [inline]
> __se_sys_exit_group kernel/exit.c:1126 [inline]
> __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1126
> x64_sys_call+0x102c/0x1530 arch/x86/include/generated/asm/syscalls_64.h:232
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f9fc102d6c5
> Code: Unable to access opcode bytes at 0x7f9fc102d69b.
> RSP: 002b:00007ffd51104bf8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 00007f9fc112efe8 RCX: 00007f9fc102d6c5
> RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
> RBP: 0000000000000001 R08: 00007ffd51104b88 R09: 0000000000000000
> R10: 00007ffd51104a20 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007f9fc112d680 R15: 00007f9fc112f000
> </TASK>
>
> ====================================
> WARNING: rm/5677 still has locks held!
> syzkaller #0 Tainted: G W
> ------------------------------------
> 1 lock held by rm/5677:
> #0: ffffffff8e7e54e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.0+0x7/0x30 include/linux/rcupdate.h:300
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 5677 Comm: rm Tainted: G W syzkaller #0 PREEMPT(full)
> Tainted: [W]=WARN
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
> print_held_locks_bug kernel/locking/lockdep.c:6752 [inline]
> debug_check_no_locks_held+0x90/0xa0 kernel/locking/lockdep.c:6760
> do_exit+0x13ea/0x2a60 kernel/exit.c:997
> do_group_exit+0xd5/0x2a0 kernel/exit.c:1117
> __do_sys_exit_group kernel/exit.c:1128 [inline]
> __se_sys_exit_group kernel/exit.c:1126 [inline]
> __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1126
> x64_sys_call+0x102c/0x1530 arch/x86/include/generated/asm/syscalls_64.h:232
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f9fc102d6c5
> Code: Unable to access opcode bytes at 0x7f9fc102d69b.
> RSP: 002b:00007ffd51104bf8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 00007f9fc112efe8 RCX: 00007f9fc102d6c5
> RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
> RBP: 0000000000000001 R08: 00007ffd51104b88 R09: 0000000000000000
> R10: 00007ffd51104a20 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007f9fc112d680 R15: 00007f9fc112f000
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup