Re: BUG: unable to handle page fault for address: 0000000000030368

From: Marco Elver
Date: Tue Apr 09 2024 - 15:22:32 EST


On Thu, 28 Mar 2024 at 17:17, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> Dear Marco, dear Linux folks,
>
>
> On 26.03.24 at 13:44, Paul Menzel wrote:
> > [Cc: +X86 maintainers]
>
> > Thank you for your quick reply. (Note that your mailer wrapped the
> > pasted lines.)
> >
> > On 26.03.24 at 11:07, Marco Elver wrote:
> >> On Tue, 26 Mar 2024 at 10:23, Paul Menzel wrote:
> >
> >>> I tried KCSAN for the first time (configuration attached), and the kernel
> >>> fails to boot on the Dell XPS 13 9360 and on QEMU q35. I couldn’t get logs
> >>> from the Dell XPS 13 9360, so here are the QEMU ones:
> >>
> >> If there's a bad access somewhere which is instrumented by KCSAN, it
> >> will unfortunately still crash inside KCSAN.
> >>
> >> What happens if you compile with CONFIG_KCSAN_EARLY_ENABLE=n? That
> >> leaves KCSAN disabled at boot (but otherwise the kernel image is the
> >> same) and requires turning it on manually with "echo on >
> >> /sys/kernel/debug/kcsan" after boot.
> >>
> >> If it still crashes, then there's definitely a bug elsewhere. If it
> >> doesn't crash, and only crashes with KCSAN enabled, my guess is that
> >> KCSAN's delays of individual threads are perturbing execution to
> >> trigger previously undetected bugs.
> >
> > Such a kernel booted with a warning on the Dell XPS 13 9360 (and with
> > *no* warning on QEMU q35) [1], but enabling KCSAN hangs the laptop right
> > away. I couldn’t get any logs from the laptop.
>
> In the QEMU q35 virtual machine, `echo on | sudo tee
> /sys/kernel/debug/kcsan` also locks up the system. Please find the logs
> attached.
>
> [ 78.241245] BUG: unable to handle page fault for address: 0000000000019a18
> [ 78.242815] #PF: supervisor read access in kernel mode
> [ 78.244001] #PF: error_code(0x0000) - not-present page
> [ 78.245186] PGD 0 P4D 0
> [ 78.245828] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 78.246878] CPU: 4 PID: 783 Comm: sudo Not tainted 6.9.0-rc1+ #83
> [ 78.248289] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> [ 78.250763] RIP: 0010:kcsan_setup_watchpoint+0x2b3/0x400
> [ 78.252108] Code: ea 00 f0 48 ff 05 25 b4 8f 02 eb e0 65 48 8b 05 7b 53 23 4f 48 8d 98 c0 02 03 00 e9 9f fd ff ff 48 83 fd 08 0f 85 fd 00 00 00 <4d> 8b 04 24 e9 bf fe ff ff 49 85 d1 75 54 ba 01 00 00 00 4a 84
> [ 78.256284] RSP: 0018:ffffbae1c0f5bc48 EFLAGS: 00010046
> [ 78.257548] RAX: 0000000000000000 RBX: ffff9b95c4ba93b0 RCX: 0000000000000019
> [ 78.259158] RDX: 0000000000000001 RSI: ffffffffb0f82d36 RDI: 0000000000000000
> [ 78.260781] RBP: 0000000000000008 R08: 00000000aaaaaaab R09: 0000000000000000
> [ 78.262417] R10: 0000000000000086 R11: 0010000000019a18 R12: 0000000000019a18
> [ 78.264040] R13: 000000000000001a R14: 0000000000000000 R15: 0000000000000000
> [ 78.265658] FS: 00007f65e3a91f00(0000) GS:ffff9b9d1f000000(0000) knlGS:0000000000000000
> [ 78.267480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 78.268804] CR2: 0000000000019a18 CR3: 0000000102e26000 CR4: 00000000003506f0
> [ 78.270424] Call Trace:
> [ 78.271036] <TASK>
> [ 78.271572] ? __die+0x23/0x70
> [ 78.272344] ? page_fault_oops+0x173/0x4f0
> [ 78.273400] ? exc_page_fault+0x81/0x190
> [ 78.274373] ? asm_exc_page_fault+0x26/0x30
> [ 78.275395] ? refill_obj_stock+0x36/0x2e0
> [ 78.276410] ? kcsan_setup_watchpoint+0x2b3/0x400
> [ 78.277556] refill_obj_stock+0x36/0x2e0
> [ 78.278540] obj_cgroup_uncharge+0x13/0x20
> [ 78.279596] __memcg_slab_free_hook+0xac/0x140
> [ 78.280661] ? free_pipe_info+0x135/0x150
> [ 78.281631] kfree+0x2de/0x310
> [ 78.282419] free_pipe_info+0x135/0x150
> [ 78.283395] pipe_release+0x188/0x1a0
> [ 78.284303] __fput+0x127/0x4e0
> [ 78.285114] __fput_sync+0x35/0x40
> [ 78.285958] __x64_sys_close+0x54/0xa0
> [ 78.286914] do_syscall_64+0x88/0x1a0
> [ 78.287810] ? fpregs_assert_state_consistent+0x7e/0x90
> [ 78.289185] ? srso_return_thunk+0x5/0x5f
> [ 78.290203] ? arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
> [ 78.291568] ? srso_return_thunk+0x5/0x5f
> [ 78.292518] ? syscall_exit_to_user_mode+0x40/0xe0
> [ 78.293651] ? srso_return_thunk+0x5/0x5f
> [ 78.294606] ? do_syscall_64+0x94/0x1a0
> [ 78.295516] ? arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
> [ 78.296876] ? srso_return_thunk+0x5/0x5f
>
> Can you reproduce this?

This seems to be a compiler issue with a feature new in 6.9-rc1 (named
address spaces for x86 per-CPU accesses). It is fixed in 6.9-rc2 by
commit b6540de9b5c8 ("x86/percpu: Disable named address spaces for KCSAN").
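
If anyone wants to repeat the CONFIG_KCSAN_EARLY_ENABLE=n experiment from
earlier in the thread, the relevant part of the .config is just this. It
is a sketch that assumes the rest of the attached KCSAN configuration
stays unchanged; the second line is the standard .config spelling of
"=n".

```
CONFIG_KCSAN=y
# CONFIG_KCSAN_EARLY_ENABLE is not set
```

With such a kernel, KCSAN stays off after boot until it is switched on
with `echo on | sudo tee /sys/kernel/debug/kcsan` (debugfs has to be
mounted). Reading the same file shows whether it is currently enabled.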