Re: [PATCH-tip v3] debugobjects: Don't call fill_pool() in early boot non-task context

From: Sebastian Andrzej Siewior

Date: Wed Jun 03 2026 - 03:57:15 EST


On 2026-05-20 16:15:09 [-0400], Waiman Long wrote:
> When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
> processor, the following lockdep warning was reported during early boot.
>
> ================================
> WARNING: inconsistent lock state
> 7.1.0-rc4-test+ #1 Not tainted
> --------------------------------
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
> :
> Call trace:
> :
> rt_spin_lock+0xa0/0x400
> get_from_partial_node+0x74/0xa0
> ___slab_alloc+0x94/0x4f8
> kmem_cache_alloc_noprof+0x2d4/0x598
> kmem_alloc_batch+0x54/0x170
> fill_pool+0x12c/0x438
> debug_objects_fill_pool+0x58/0x60
> debug_object_activate+0xfc/0x3d0
> add_timer_on+0x250/0x3a0
> add_interrupt_randomness+0x2d4/0x340
> handle_percpu_devid_irq+0x2e0/0x4e0
> handle_irq_desc+0xc0/0x120
> generic_handle_domain_irq+0x20/0x40
> __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
> gic_handle_irq+0x7c/0xe0
> call_on_irq_stack+0x30/0x48
> do_interrupt_handler+0x134/0x158
> el1_interrupt+0x48/0xb0
> :

What about:

During early boot, interrupts are enabled before the scheduler is
enabled. In this window (before SYSTEM_SCHEDULING is set) interrupts
can fire and attempt to fill the pool from within the hardirq. This can
lead to a deadlock if the interrupt occurred while in the memory
allocator.

Reorder the exception rule and forbid this scenario by excluding
allocations from hardirq.


> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---

> /*
> * On RT enabled kernels the pool refill must happen in preemptible
> - * context and not enqueued on an rt_mutex -- for !RT kernels we rely
> - * on the fact that spinlock_t and raw_spinlock_t are basically the
> - * same type and this lock-type inversion works just fine.
> + * context and not enqueued on an rt_mutex or in task context during
> + * early boot before scheduling starts.
> + *
> + * For !RT kernels we rely on the fact that spinlock_t and
> + * raw_spinlock_t are basically the same type and this lock-type
> + * inversion works just fine.
> */
> - if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT) ||
> + (system_state < SYSTEM_SCHEDULING && in_task()) ||
> (preemptible() && !debug_objects_is_pi_blocked_on())) {
> /*
> * Annotate away the spinlock_t inside raw_spinlock_t warning

I updated the comment to explain in more detail why each of these
checks is done.
I re-ordered the whole thing to start with the pi-blocked-on part since
this is always valid. It shouldn't happen during early boot, but I think
it is easier to read that way. Then we restrict it to the preemptible
case, which can be overruled by the SYSTEM_SCHEDULING exception as long
as we are not in hardirq. It looks easier to parse and hopefully brings
an end to this.
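
To make the difference easier to see, here is a rough userspace model of
the old and the reordered condition. It is only a sketch: struct
refill_ctx, its field names and the may_refill_*() helpers are made up
for illustration, and plain booleans stand in for IS_ENABLED(),
system_state, preemptible(), debug_objects_is_pi_blocked_on() and
in_hardirq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs the condition looks at. */
struct refill_ctx {
	bool preempt_rt;	/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool early_boot;	/* system_state < SYSTEM_SCHEDULING */
	bool preemptible;	/* preemptible() */
	bool pi_blocked_on;	/* debug_objects_is_pi_blocked_on() */
	bool in_hardirq;	/* in_hardirq() */
};

/* Condition before the patch: early boot allows a refill from any context. */
bool may_refill_old(const struct refill_ctx *c)
{
	return !c->preempt_rt || c->early_boot ||
	       (c->preemptible && !c->pi_blocked_on);
}

/* Reordered condition: pi-blocked-on first, then preemptible or early boot. */
bool may_refill_new(const struct refill_ctx *c)
{
	return !c->preempt_rt ||
	       (!c->pi_blocked_on &&
		(c->preemptible || (c->early_boot && !c->in_hardirq)));
}

/* The case from the lockdep report: RT kernel, early boot, hardirq. */
void check_reported_case(void)
{
	struct refill_ctx c = {
		.preempt_rt = true, .early_boot = true,
		.preemptible = false, .pi_blocked_on = false,
		.in_hardirq = true,
	};

	assert(may_refill_old(&c));	/* old rule refills from hardirq */
	assert(!may_refill_new(&c));	/* reordered rule refuses */

	c.in_hardirq = false;		/* same window, outside hardirq */
	assert(may_refill_new(&c));
}
```

Calling check_reported_case() from a small main() exercises the reported
scenario. In this model the two conditions differ in two cases on RT:
early boot from hardirq, and early boot while pi-blocked-on is set, which
should not happen in practice.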

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index b18a682fe3da2..2adfe2a79a086 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -736,12 +736,17 @@ static void debug_objects_fill_pool(void)

/*
* On RT enabled kernels the pool refill must happen in preemptible
- * context and not enqueued on an rt_mutex -- for !RT kernels we rely
- * on the fact that spinlock_t and raw_spinlock_t are basically the
- * same type and this lock-type inversion works just fine.
+ * context and not while blocking on a lock which can trigger recursion
+ * during PI. During system boot (before scheduling) preemption is
+ * disabled and the pool gets exhausted. Without scheduling a deadlock
+ * is not possible if allocations from interrupt context are excluded.
+ * For !RT kernels we rely on the fact that spinlock_t and
+ * raw_spinlock_t are basically the same type and this lock-type
+ * inversion works just fine.
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
- (preemptible() && !debug_objects_is_pi_blocked_on())) {
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) ||
+ (!debug_objects_is_pi_blocked_on() &&
+ (preemptible() || (system_state < SYSTEM_SCHEDULING && !in_hardirq())))) {
/*
* Annotate away the spinlock_t inside raw_spinlock_t warning
* by temporarily raising the wait-type to LD_WAIT_CONFIG, matching

Sebastian