Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h

From: Yang Shi
Date: Fri May 31 2024 - 12:51:36 EST


On Fri, May 31, 2024 at 1:24 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed "kernel_BUG_at_include/linux/page_ref.h" on:
>
> commit: efa7df3e3bb5da8e6abbe37727417f32a37fba47 ("mm: align larger anonymous mappings on THP boundaries")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc]
> [test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]
>
> in testcase: trinity
> version: trinity-x86_64-6a17c218-1_20240527
> with following parameters:
>
> runtime: 300s
> group: group-00
> nr_groups: 5
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we noticed the issue does not always happen. 34 times out of 50 runs as below.
> the parent is clean.
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :50          68%          34:50     dmesg.Kernel_panic-not_syncing:Fatal_exception
>            :50          68%          34:50     dmesg.RIP:try_get_folio
>            :50          68%          34:50     dmesg.invalid_opcode:#[##]
>            :50          68%          34:50     dmesg.kernel_BUG_at_include/linux/page_ref.h
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202405311534.86cd4043-lkp@xxxxxxxxx
>
>
> [ 275.267158][ T4335] ------------[ cut here ]------------
> [ 275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
> [ 275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
> [ 275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
> [ 275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.271159][ T4335] Code: c3 cc cc cc cc 44 89 e6 48 89 df e8 e4 54 11 00 eb ae 90 0f 0b 90 31 db eb d5 9c 58 0f 1f 40 00 f6 c4 02 0f 84 46 ff ff ff 90 <0f> 0b 48 c7 c6 a0 54 d2 87 48 89 df e8 a9 e9 ff ff 90 0f 0b be 04

If I read this BUG correctly, it is:

VM_BUG_ON(!in_atomic() && !irqs_disabled());

try_grab_folio() actually assumes it is called in an atomic context
(IRQs disabled or preemption disabled) on this call path. That is
guaranteed either by disabling IRQs in GUP-fast or by calling it inside
an RCU read-side critical section in the page cache lookup path.
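
To make the condition concrete, here is a small userspace model of the
check. It is only a sketch, not kernel code: struct ctx_model and
ref_try_add_would_bug() are hypothetical names, and an RCU read-side
section is simply treated as "atomic", as in the description above.

```c
#include <stdbool.h>

/* Hypothetical model of the calling context; not a kernel structure. */
struct ctx_model {
	bool in_atomic;      /* preemption disabled, e.g. inside an RCU section */
	bool irqs_disabled;  /* local IRQs off, e.g. in GUP-fast */
};

/* Mirrors the condition in VM_BUG_ON(!in_atomic() && !irqs_disabled()). */
static bool ref_try_add_would_bug(struct ctx_model c)
{
	return !c.in_atomic && !c.irqs_disabled;
}
```

In this model the GUP-fast case {false, true} and the page cache lookup
case {true, false} both pass, while a plain sleepable context
{false, false}, which is what the process_vm_readv path runs in, trips
the check.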

And try_grab_folio() is used when the folio is a large folio. The
bisected commit makes the fuzzer more likely than before to get a
PMD-aligned address and hence a large folio, and
process_vm_readv/writev doesn't handle the large folio case at all. A
properly aligned address, for example one allocated by
posix_memalign(), should be able to trigger this BUG even without the
bisected commit.
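
An untested sketch of what such a reproducer could look like is below.
The assumptions are mine, not from the report: a 2 MiB PMD size
(x86-64 with 4 KiB pages), a best-effort madvise(MADV_HUGEPAGE) to
encourage a large folio, and reading from the caller's own PID. The
helper names are made up for illustration. Whether it actually hits the
BUG depends on THP being enabled and on the debug options used in the
report.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumption: 2 MiB PMD size, as on x86-64 with 4 KiB base pages. */
#define PMD_SIZE_ASSUMED (2UL << 20)

/* Allocate a PMD-aligned buffer, ask for THP, and fault it in. */
static void *alloc_pmd_aligned(size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, PMD_SIZE_ASSUMED, len) != 0)
		return NULL;
	/* Best effort only: THP may be disabled, so ignore the result. */
	(void)madvise(buf, len, MADV_HUGEPAGE);
	/* Touch every byte so the range is populated before the read. */
	memset(buf, 0x5a, len);
	return buf;
}

/* Read len bytes at src in process pid into dst via process_vm_readv(). */
static ssize_t read_via_process_vm(pid_t pid, void *dst, void *src, size_t len)
{
	struct iovec local = { .iov_base = dst, .iov_len = len };
	struct iovec remote = { .iov_base = src, .iov_len = len };

	return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

A caller would allocate one PMD_SIZE_ASSUMED buffer with
alloc_pmd_aligned(), then pass it as src to
read_via_process_vm(getpid(), dst, src, PMD_SIZE_ASSUMED). The read has
to span more than one page of the folio to reach the try_grab_folio()
call in __get_user_pages() shown in the trace.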

We can't call pin_user_pages_remote() in an RCU critical section since
it may sleep, and I don't think we have a remote variant of GUP-fast,
if I remember correctly. It doesn't make sense to disallow large folios
for process_vm_readv/writev either.

Maybe add a new GUP flag, or just use FOLL_LONGTERM, to let GUP call
try_grab_folio() in an RCU critical section? I've added more GUP folks
to this thread.


> All code
> ========
> 0: c3 ret
> 1: cc int3
> 2: cc int3
> 3: cc int3
> 4: cc int3
> 5: 44 89 e6 mov %r12d,%esi
> 8: 48 89 df mov %rbx,%rdi
> b: e8 e4 54 11 00 call 0x1154f4
> 10: eb ae jmp 0xffffffffffffffc0
> 12: 90 nop
> 13: 0f 0b ud2
> 15: 90 nop
> 16: 31 db xor %ebx,%ebx
> 18: eb d5 jmp 0xffffffffffffffef
> 1a: 9c pushf
> 1b: 58 pop %rax
> 1c: 0f 1f 40 00 nopl 0x0(%rax)
> 20: f6 c4 02 test $0x2,%ah
> 23: 0f 84 46 ff ff ff je 0xffffffffffffff6f
> 29: 90 nop
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 48 c7 c6 a0 54 d2 87 mov $0xffffffff87d254a0,%rsi
> 33: 48 89 df mov %rbx,%rdi
> 36: e8 a9 e9 ff ff call 0xffffffffffffe9e4
> 3b: 90 nop
> 3c: 0f 0b ud2
> 3e: be .byte 0xbe
> 3f: 04 .byte 0x4
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 48 c7 c6 a0 54 d2 87 mov $0xffffffff87d254a0,%rsi
> 9: 48 89 df mov %rbx,%rdi
> c: e8 a9 e9 ff ff call 0xffffffffffffe9ba
> 11: 90 nop
> 12: 0f 0b ud2
> 14: be .byte 0xbe
> 15: 04 .byte 0x4
> [ 275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202
> [ 275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000
> [ 275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034
> [ 275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006
> [ 275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136
> [ 275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008
> [ 275.276790][ T4335] FS: 00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000
> [ 275.277570][ T4335] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0
> [ 275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 275.280201][ T4335] Call Trace:
> [ 275.280499][ T4335] <TASK>
> [ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
> [ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153)
> [ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174)
> [ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212)
> [ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264)
> [ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
> [ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> [ 275.285278][ T4335] try_grab_folio (mm/gup.c:148)
> [ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1))
> [ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188)
> [ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825)
> [ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1))
> [ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209)
> [ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204)
> [ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
> [ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106)
> [ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350)
> [ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350)
> [ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
> [ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573)
> [ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360
> [ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10
> [ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573)
> [ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0
> [ 275.293384][ T4335] ? hlock_class (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228)
> [ 275.293780][ T4335] ? __pfx_process_vm_rw_core+0x10/0x10
> [ 275.294350][ T4335] process_vm_rw (mm/process_vm_access.c:284)
> [ 275.294748][ T4335] ? __pfx_process_vm_rw (mm/process_vm_access.c:259)
> [ 275.295197][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
> [ 275.295634][ T4335] __x64_sys_process_vm_readv (mm/process_vm_access.c:291)
> [ 275.296139][ T4335] ? syscall_enter_from_user_mode (kernel/entry/common.c:94 kernel/entry/common.c:112)
> [ 275.296642][ T4335] do_syscall_64 (arch/x86/entry/common.c:51 (discriminator 1) arch/x86/entry/common.c:82 (discriminator 1))
> [ 275.297032][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
> [ 275.297470][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
> [ 275.297988][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
> [ 275.298389][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
> [ 275.298906][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
> [ 275.299304][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
> [ 275.299703][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
> [ 275.300115][ T4335] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
> [ 275.300622][ T4335] RIP: 0033:0x7fa2f9c65719
> [ 275.301011][ T4335] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
> All code
> ========
> 0: 08 89 e8 5b 5d c3 or %cl,-0x3ca2a418(%rcx)
> 6: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> d: 00 00 00
> 10: 90 nop
> 11: 48 89 f8 mov %rdi,%rax
> 14: 48 89 f7 mov %rsi,%rdi
> 17: 48 89 d6 mov %rdx,%rsi
> 1a: 48 89 ca mov %rcx,%rdx
> 1d: 4d 89 c2 mov %r8,%r10
> 20: 4d 89 c8 mov %r9,%r8
> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
> 28: 0f 05 syscall
> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
> 30: 73 01 jae 0x33
> 32: c3 ret
> 33: 48 8b 0d b7 06 0d 00 mov 0xd06b7(%rip),%rcx # 0xd06f1
> 3a: f7 d8 neg %eax
> 3c: 64 89 01 mov %eax,%fs:(%rcx)
> 3f: 48 rex.W
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
> 6: 73 01 jae 0x9
> 8: c3 ret
> 9: 48 8b 0d b7 06 0d 00 mov 0xd06b7(%rip),%rcx # 0xd06c7
> 10: f7 d8 neg %eax
> 12: 64 89 01 mov %eax,%fs:(%rcx)
> 15: 48 rex.W
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240531/202405311534.86cd4043-lkp@xxxxxxxxx
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>