Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

From: Will Deacon
Date: Tue Jun 04 2024 - 09:35:00 EST


Hi Thomas,

On Tue, Jun 04, 2024 at 02:29:57PM +0200, Thomas Gleixner wrote:
> On Mon, Jun 03 2024 at 03:22, syzbot wrote:
> Cc+ ARM64 folks
>
> Content untrimmed for reference.

Thanks! I'll trim it now...

> > __clear_young_dirty_ptes arch/arm64/include/asm/pgtable.h:1311 [inline]
> > contpte_clear_young_dirty_ptes+0x68/0x128 arch/arm64/mm/contpte.c:389
> > walk_pmd_range mm/pagewalk.c:143 [inline]
> > walk_pud_range mm/pagewalk.c:221 [inline]
> > walk_p4d_range mm/pagewalk.c:256 [inline]
> > walk_pgd_range+0x4b0/0x8a4 mm/pagewalk.c:293
> > __walk_page_range+0x178/0x180 mm/pagewalk.c:395
> > walk_page_range+0x144/0x224 mm/pagewalk.c:521
> > madvise_free_single_vma+0x134/0x2bc mm/madvise.c:815
> > madvise_dontneed_free mm/madvise.c:929 [inline]
> > madvise_vma_behavior+0x1d0/0x790 mm/madvise.c:1046
> > madvise_walk_vmas+0xbc/0x12c mm/madvise.c:1268
> > do_madvise+0x160/0x418 mm/madvise.c:1464
> > __do_sys_madvise mm/madvise.c:1481 [inline]
> > __se_sys_madvise mm/madvise.c:1479 [inline]
> > __arm64_sys_madvise+0x24/0x34 mm/madvise.c:1479
> > __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
> > invoke_syscall+0x48/0x118 arch/arm64/kernel/syscall.c:48
> > el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:133
> > do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:152
> > el0_svc+0x34/0xf8 arch/arm64/kernel/entry-common.c:712
> > el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> > el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> > Code: 54000200 f9400401 b4000141 aa0103e0 (f9400821)
> > ---[ end trace 0000000000000000 ]---
> > ----------------
> > Code disassembly (best guess):
> > 0: 54000200 b.eq 0x40 // b.none
> > 4: f9400401 ldr x1, [x0, #8]
> > 8: b4000141 cbz x1, 0x30
> > c: aa0103e0 mov x0, x1
> > * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
>
> So this is the following code in rb_next():
>
> > 4: f9400401 ldr x1, [x0, #8] // Offset 8 in @node
> > 8: b4000141 cbz x1, 0x30
> if (node->rb_right) {
>
> > c: aa0103e0 mov x0, x1 // Saves node::rb_right
> node = node->rb_right;
>
> > * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
> while (node->rb_left)
>
> > x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
>
> which obviously crashes. Now the question is how does the original node
> end up with node::rb_right == 0x80?
>
> I doubt that this is an hrtimer or rbtree problem. It smells like random
> data corruption caused by whatever. It might not even be an ARM64
> specific issue though the C repro does not trigger on x86...
>
> Handing it over to Catalin and Will.
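
For reference, here is a minimal userspace sketch of the rb_next() logic
that the disassembly above maps onto. It is an illustration only, not the
kernel source: the sketch_* names are hypothetical, uintptr_t stands in
for the kernel's unsigned long, and the offsets 8 and 16 assume a 64-bit
build. With x1 == 0x80, the trapping load at [x1, #16] is the
node->rb_left read in the inner loop, i.e. an access to address 0x90.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rb_node, same field order. */
struct sketch_rb_node {
	uintptr_t parent_color;            /* offset 0: parent pointer | colour bits */
	struct sketch_rb_node *rb_right;   /* offset 8 on 64-bit  -> ldr x1, [x0, #8]  */
	struct sketch_rb_node *rb_left;    /* offset 16 on 64-bit -> ldr x1, [x1, #16] */
};

/* The field layout is what ties the C code to the quoted disassembly. */
static_assert(offsetof(struct sketch_rb_node, rb_right) == sizeof(uintptr_t),
	      "rb_right follows parent_color");
static_assert(offsetof(struct sketch_rb_node, rb_left) ==
	      sizeof(uintptr_t) + sizeof(void *),
	      "rb_left follows rb_right");

/* Strip the low colour bits to recover the parent pointer. */
static struct sketch_rb_node *sketch_parent(const struct sketch_rb_node *n)
{
	return (struct sketch_rb_node *)(n->parent_color & ~(uintptr_t)3);
}

/* In-order successor, following the shape of rb_next(). */
static struct sketch_rb_node *sketch_rb_next(const struct sketch_rb_node *node)
{
	struct sketch_rb_node *parent;

	/* A node that points at itself is treated as not linked into a tree. */
	if (node->parent_color == (uintptr_t)node)
		return NULL;

	/* Right child present: go right once, then left as far as possible. */
	if (node->rb_right) {
		node = node->rb_right;
		while (node->rb_left)      /* faults here if rb_right was garbage */
			node = node->rb_left;
		return (struct sketch_rb_node *)node;
	}

	/* No right child: climb until we arrive from a left child. */
	while ((parent = sketch_parent(node)) && node == parent->rb_right)
		node = parent;

	return parent;
}
```

Nothing in this path validates rb_right before dereferencing it, so a
corrupted pointer in the node blows up in the walker even though the
corruption itself happened elsewhere.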

I suspect this is a duplicate of:

https://lore.kernel.org/lkml/20240604110119.GA20284@willie-the-truck/

and there's a fix queued in the -mm tree.

Will