Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes outside lock critical section

From: Waiman Long
Date: Sat Jan 25 2025 - 21:21:32 EST



On 1/22/25 2:17 PM, Waiman Long wrote:
A circular lock dependency splat has been seen involving down_trylock().

[ 4011.795602] ======================================================
[ 4011.795603] WARNING: possible circular locking dependency detected
[ 4011.795607] 6.12.0-41.el10.s390x+debug
[ 4011.795612] ------------------------------------------------------
[ 4011.795613] dd/32479 is trying to acquire lock:
[ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
[ 4011.795636]
[ 4011.795636] but task is already holding lock:
[ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0

the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:

The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to defer the wakeup until after the console_sem lock is
released, instead of calling try_to_wake_up() under it. This also makes
the semaphore's raw_spinlock a terminal lock, with no further locks taken
underneath it.
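
(A minimal sketch of the wake_q pattern as it would apply to up(); the
two-argument __up() that queues the waiter on the wake_q is part of the
proposed change, and the body is simplified from
kernel/locking/semaphore.c:)

#include <linux/semaphore.h>
#include <linux/sched/wake_q.h>

void __sched up(struct semaphore *sem)
{
        unsigned long flags;
        DEFINE_WAKE_Q(wake_q);          /* on-stack queue of tasks to wake */

        raw_spin_lock_irqsave(&sem->lock, flags);
        if (likely(list_empty(&sem->wait_list)))
                sem->count++;
        else
                __up(sem, &wake_q);     /* wake_q_add() instead of wake_up_process() */
        raw_spin_unlock_irqrestore(&sem->lock, flags);

        /* The actual try_to_wake_up() now happens with sem->lock released. */
        if (!wake_q_empty(&wake_q))
                wake_up_q(&wake_q);
}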

The hrtimer_bases.lock -> zone->lock dependency is actually another
instance of the raw_spinlock-to-spinlock nesting problem (see the sketch
after the trace below).

[ 4011.795646] -> #4 (&zone->lock){-.-.}-{2:2}:
[ 4011.795650] __lock_acquire+0xe86/0x1cc0
[ 4011.795655] lock_acquire.part.0+0x258/0x630
[ 4011.795657] lock_acquire+0xb8/0xe0
[ 4011.795659] _raw_spin_lock_irqsave+0xb4/0x120
[ 4011.795663] rmqueue_bulk+0xac/0x8f0
[ 4011.795665] __rmqueue_pcplist+0x580/0x830
[ 4011.795667] rmqueue_pcplist+0xfc/0x470
[ 4011.795669] rmqueue.isra.0+0xdec/0x11b0
[ 4011.795671] get_page_from_freelist+0x2ee/0xeb0
[ 4011.795673] __alloc_pages_noprof+0x2c2/0x520
[ 4011.795676] alloc_pages_mpol_noprof+0x1fc/0x4d0
[ 4011.795681] alloc_pages_noprof+0x8c/0xe0
[ 4011.795684] allocate_slab+0x320/0x460
[ 4011.795686] ___slab_alloc+0xa58/0x12b0
[ 4011.795688] __slab_alloc.isra.0+0x42/0x60
[ 4011.795690] kmem_cache_alloc_noprof+0x304/0x350
[ 4011.795692] fill_pool+0xf6/0x450
[ 4011.795697] debug_object_activate+0xfe/0x360
[ 4011.795700] enqueue_hrtimer+0x34/0x190
[ 4011.795703] __run_hrtimer+0x3c8/0x4c0
[ 4011.795705] __hrtimer_run_queues+0x1b2/0x260
[ 4011.795707] hrtimer_interrupt+0x316/0x760
[ 4011.795709] do_IRQ+0x9a/0xe0
[ 4011.795712] do_irq_async+0xf6/0x160

We will need to make some changes in debugobjects and/or hrtimer code
to address this problem and break the circular dependency.
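
(For context, a minimal illustration of the raw_spinlock-to-spinlock
nesting problem, using hypothetical locks: on PREEMPT_RT a spinlock_t is
a sleeping rt_mutex-based lock, so acquiring one inside a raw_spinlock_t
critical section can sleep in atomic context:)

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(outer);      /* always a true spinning lock */
static DEFINE_SPINLOCK(inner);          /* becomes an rt_mutex on PREEMPT_RT */

static void bad_nesting(void)
{
        raw_spin_lock(&outer);          /* enters non-preemptible context */
        spin_lock(&inner);              /* may sleep on PREEMPT_RT -> invalid */
        spin_unlock(&inner);
        raw_spin_unlock(&outer);
}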

I checked the v6.12 debugobjects code again and saw:

static void debug_objects_fill_pool(void)
{
        /*
         * On RT enabled kernels the pool refill must happen in preemptible
         * context -- for !RT kernels we rely on the fact that spinlock_t and
         * raw_spinlock_t are basically the same type and this lock-type
         * inversion works just fine.
         */
        if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
                /*
                 * Annotate away the spinlock_t inside raw_spinlock_t warning
                 * by temporarily raising the wait-type to WAIT_SLEEP, matching
                 * the preemptible() condition above.
                 */
                static DEFINE_WAIT_OVERRIDE_MAP(fill_pool_map, LD_WAIT_SLEEP);
                lock_map_acquire_try(&fill_pool_map);
                fill_pool();
                lock_map_release(&fill_pool_map);
        }
}

It looks like it explicitly allows raw_spinlock to spinlock nesting on
non-PREEMPT_RT kernels and disables the lockdep nested-locking check for
it. So the hrtimer_bases.lock -> zone->lock dependency is legit and
expected. I will need to update my patch description again.

Cheers,
Longman