Re: [RFC PATCH RESEND] timerqueue: Complete rb_node initialization within timerqueue_init

From: Thomas Gleixner
Date: Sun Apr 06 2025 - 07:46:54 EST


On Sat, Apr 05 2025 at 16:05, I. Hsin Cheng wrote:
> The children of "node" within "struct timerqueue_node" may be uninit
> status after the initialization. Initialize them as NULL under
> timerqueue_init to prevent the problem.

Which problem?

It's completely sufficient to use RB_INIT_NODE() on initialization.

As you provided neither a link nor an explanation, I had to waste some
time to search through the syzbot site and look at the actual issue:

BUG: KMSAN: uninit-value in rb_next+0x200/0x210 lib/rbtree.c:505
rb_next+0x200/0x210 lib/rbtree.c:505
rb_erase_cached include/linux/rbtree.h:124 [inline]
timerqueue_del+0xee/0x1a0 lib/timerqueue.c:57
__remove_hrtimer kernel/time/hrtimer.c:1123 [inline]
__run_hrtimer kernel/time/hrtimer.c:1771 [inline]
__hrtimer_run_queues+0x3b7/0xe40 kernel/time/hrtimer.c:1855
hrtimer_interrupt+0x41b/0xb10 kernel/time/hrtimer.c:1917
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1038 [inline]
__sysvec_apic_timer_interrupt+0xa7/0x420 arch/x86/kernel/apic/apic.c:1055
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0x7e/0x90 arch/x86/kernel/apic/apic.c:1049

So this code removes a queued timer from the RB tree and that KMSAN
warning happens in rb_next(), which is invoked from rb_erase_cached().

The issue happens in lib/rbtree.c:505

505: while (node->rb_left)
506: node = node->rb_left;

which is walking the tree down left. So that means it hits a pointer
which points to uninitialized memory.
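
For reference, that loop is the leftmost-descent step of an in-order
successor lookup. A simplified userspace sketch of the logic, not the
kernel source: struct node and next_node() are made-up names, and the
parent pointer is assumed to be a plain field without the color bits:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rb_node, without the color bits. */
struct node {
	struct node *parent;
	struct node *left;
	struct node *right;
};

/* In-order successor of @n, or NULL if @n is the last node. */
static struct node *next_node(struct node *n)
{
	struct node *parent;

	if (n->right) {
		/* Successor is the leftmost node of the right subtree. */
		n = n->right;
		while (n->left)		/* the step KMSAN complains about */
			n = n->left;
		return n;
	}

	/* Otherwise climb until we arrive from a left child. */
	while ((parent = n->parent) && n == parent->right)
		n = parent;

	return parent;
}
```

The descent only touches nodes which are reachable from the tree, so
reading an uninitialized left pointer means that one of the reachable
nodes contains garbage.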

All timers are queued with rb_add_cached(), which calls rb_link_node()
and that does:

node->rb_left = node->rb_right = NULL;

Which means there can't be a timer enqueued in the RB tree which has
rb_left/right uninitialized.
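
A minimal userspace sketch of that linking step, assuming a simplified
node layout (struct node and link_node() are illustrative names, not
the kernel's definitions):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rb_node, without the color bits. */
struct node {
	struct node *parent;
	struct node *left;
	struct node *right;
};

/*
 * Hook @n into the tree at *@link below @parent. Whatever the child
 * pointers of @n contained before is overwritten here.
 */
static void link_node(struct node *n, struct node *parent,
		      struct node **link)
{
	n->parent = parent;
	n->left = n->right = NULL;

	*link = n;
}
```

Even a node whose memory was never cleared ends up with both child
pointers set to NULL once it is linked, independent of what the init
function did or did not do.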

So how does this end up at uninitialized memory? There are two
obvious explanations:

1) A stray pointer corrupts the RB tree

2) A queued timer has been freed

So what would this "initialization" help? Nothing at all.
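
A small userspace model of explanation #2 shows why. This is only a
sketch under assumptions: struct node, init_node(), link_node() and the
0xAA fill pattern are invented for illustration, and the memset() merely
stands in for the memory being freed and reused while the node is still
queued:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rb_node, without the color bits. */
struct node {
	struct node *parent;
	struct node *left;
	struct node *right;
};

/* Models an init function which also clears the child pointers. */
static void init_node(struct node *n)
{
	n->parent = n;			/* marks the node as not queued */
	n->left = n->right = NULL;
}

/* Hook @n into the tree at *@link below @parent. */
static void link_node(struct node *n, struct node *parent,
		      struct node **link)
{
	n->parent = parent;
	n->left = n->right = NULL;
	*link = n;
}

/* Returns 1 if the tree still reaches a node with a garbage child. */
static int tree_sees_garbage(void)
{
	struct node *root = NULL;
	struct node victim;

	init_node(&victim);		/* the "extra" initialization */
	link_node(&victim, NULL, &root);

	/* Stand-in for free + reuse while the node is still queued. */
	memset(&victim, 0xAA, sizeof(victim));

	return root == &victim && root->left != NULL;
}
```

The node was fully initialized and correctly linked, yet the tree ends
up pointing at garbage, because the damage happens after both steps.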

We are not adding some random pointless initialization to paper
over a problem which is absolutely not understood.

Thanks,

tglx