Re: [PATCH 0/11 v3] sched/fair: Fix statistics with delayed dequeue

From: Luis Machado
Date: Wed Dec 04 2024 - 14:10:30 EST


On 12/2/24 17:45, Vincent Guittot wrote:
> Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its
> lag has elapsed. As a result, it stays also visible in the statistics that
> are used to balance the system and in particular the field h_nr_running.
>
> This serie fixes those metrics by creating a new h_nr_runnable that tracks
> only tasks that want to run. It renames h_nr_running into h_nr_runnable.
>
> h_nr_runnable is used in several places to make decision on load balance:
> - PELT runnable_avg
> - deciding if a group is overloaded or has spare capacity
> - numa stats
> - reduced capacity management
> - load balance between groups
>
> While fixing h_nr_running, some fields have been renamed to follow the
> same pattern. We now have:
> - cfs.h_nr_runnable : running tasks in the hierarchy
> - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or
> delayed dequeue
> - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy
>
> cfs.nr_running has been rename cfs.nr_queued because it includes the
> delayed dequeued entities
>
> The unused cfs.idle_nr_running has been removed
>
> Load balance compares the number of running tasks when selecting the
> busiest group or runqueue and tries to migrate a runnable task and not a
> sleeping delayed dequeue one. delayed dequeue tasks are considered only
> when migrating load as they continue to impact it.
>
> It should be noticed that this serie doesn't fix the problem of delayed
> dequeued tasks that can't migrate at wakeup.
>
> Some additional cleanups have been added:
> - move variable declaration at the beginning of pick_next_entity()
> and dequeue_entity()
> - sched_can_stop_tick() should use cfs.h_nr_queued instead of
> cfs.nr_queued (previously cfs.nr_running) to know how many tasks
> are running in the whole hierarchy instead of how many entities at
> root level
>
> Changes since v2:
> - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek)
> - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the
> beginning of the series so it can be easily backported (asked by Prateek)
> - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One
> for the creation of h_nr_runnable and one for its use (asked by Peter)
> - Fix more variable declarations (reported Prateek)
>
>
> Changes since v1:
> - reorder the patches
> - rename fields into:
> - h_nr_queued for all tasks queued both runnable and delayed dequeue
> - h_nr_runnable for all runnable tasks
> - h_nr_idle for all tasks with sched_idle policy
> - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and
> dequeue_entities
>
> Peter Zijlstra (1):
> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>
> Vincent Guittot (10):
> sched/fair: Fix sched_can_stop_tick() for fair tasks
> sched/fair: Rename h_nr_running into h_nr_queued
> sched/fair: Add new cfs_rq.h_nr_runnable
> sched/fair: Use the new cfs_rq.h_nr_runnable
> sched/fair: Removed unsued cfs_rq.h_nr_delayed
> sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
> sched/fair: Remove unused cfs_rq.idle_nr_running
> sched/fair: Rename cfs_rq.nr_running into nr_queued
> sched/fair: Do not try to migrate delayed dequeue task
> sched/fair: Fix variable declaration position
>
> kernel/sched/core.c | 4 +-
> kernel/sched/debug.c | 14 ++-
> kernel/sched/fair.c | 240 ++++++++++++++++++++++++-------------------
> kernel/sched/pelt.c | 4 +-
> kernel/sched/sched.h | 12 +--
> 5 files changed, 153 insertions(+), 121 deletions(-)
>

Sorry, it took me a bit to work the backports for 6.8/Android. I gave this series a try
on the Pixel 6 with a couple workloads (Speedometer and a UI-based one) and I didn't
spot any significant performance/frame metric differences or power metric differences compared
to a 6.8 kernel with this series applied.

Also, checking the statistics for the cores didn't show any numbers that are obviously wrong.

Though the kernel version is a bit behind mainline, I hope that's useful.

Tested-By: Luis Machado <luis.machado@xxxxxxx>