[Patch] select_idle_sibling vs. DELAYED_DEQUEUE

From: Tingjia Cao

Date: Sat Nov 22 2025 - 23:04:51 EST


Recently, we encountered an issue where a sync wakeup issued by a kthread did not choose the current CPU even though the waker was the only runnable task. It is caused by a conflict between the delayed dequeue feature and the select_idle_sibling function.

With the DELAYED_DEQUEUE mechanism enabled, a task that goes to sleep may not be removed from the runqueue immediately. As a result, nr_running may overcount the number of runnable tasks. Inside select_idle_sibling, there is a special case for sync wakeup:

if (is_per_cpu_kthread(current) &&
    in_task() &&
    prev == smp_processor_id() &&
    this_rq()->nr_running <= 1 &&
    asym_fits_cpu(...)) {
    return prev;
}

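The check "this_rq()->nr_running <= 1" should use the number of tasks that are actually runnable on the rq when deciding whether to place the woken task on the current CPU. Because delayed-dequeue tasks are still counted in nr_running, the check can fail even though the waker is the only runnable task.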

To fix this (patch attached), we can use the true number of runnable tasks by subtracting the delayed-dequeue count:

        this_rq()->nr_running - cfs_h_nr_delayed(this_rq()) <= 1
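
To illustrate the effect, here is a small user-space model of the two checks. It is only a sketch, not kernel code: struct toy_rq, its nr_delayed field, and both helper names are made up for illustration, and nr_delayed stands in for whatever cfs_h_nr_delayed() reports. It assumes nr_delayed never exceeds nr_running, so the unsigned subtraction cannot wrap.

```c
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. The struct and its fields are
 * hypothetical stand-ins for the rq counters discussed above.
 */
struct toy_rq {
	unsigned int nr_running;  /* tasks still queued, including delayed ones */
	unsigned int nr_delayed;  /* sleeping tasks whose dequeue was delayed */
};

/* Original check: delayed-dequeue tasks are counted as if runnable. */
static bool sync_fast_path_old(const struct toy_rq *rq)
{
	return rq->nr_running <= 1;
}

/*
 * Proposed check: subtract the delayed tasks before comparing.
 * Assumes nr_delayed <= nr_running, so the subtraction cannot wrap.
 */
static bool sync_fast_path_new(const struct toy_rq *rq)
{
	return rq->nr_running - rq->nr_delayed <= 1;
}
```

With nr_running = 2 and nr_delayed = 1 (the waker plus one sleeping task that has not been dequeued yet), the old check rejects the current CPU while the new check accepts it. With no delayed tasks, both checks behave the same.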


Best,
Tingjia

Attachment: fix-select_idle_sibling-vs-DELAYED_DEQUEUE.patch
Description: Binary data