Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout

From: Viresh Kumar
Date: Fri Nov 08 2019 - 06:32:06 EST


On 30-10-19, 16:47, Mel Gorman wrote:
> On Thu, Oct 24, 2019 at 12:15:27PM +0530, Viresh Kumar wrote:
> > There are instances where we keep searching for an idle CPU despite
> > having a sched-idle cpu already (in find_idlest_group_cpu(),
> > select_idle_smt() and select_idle_cpu()) and then there are places where
> > we don't necessarily do that and return a sched-idle cpu as soon as we
> > find one (in select_idle_sibling()). This looks a bit inconsistent and
> > it may be worth having the same policy everywhere.
> >
>
> This needs supporting data.

I ran some more interesting tests with rt-app. It was difficult to
generate meaningful numbers with normal use cases, as most of the time
the prev/target/etc. CPUs were found to be completely idle and the task
was placed there in all cases, so the sched-idle changes made no
difference.

To prove the point I was making (that we can reduce task latency with
SCHED_IDLE), I created 3 different tests on my hikey board (octa-core,
2 clusters, 0-3 and 4-7). The cpufreq governor was set to performance
to avoid any side effects from CPU frequency.

Test 1: 1-cfs-task:

A single SCHED_NORMAL task is pinned to CPU5 and runs for 2333 us out
of every 7777 us (which gives the cluster time to enter a deep idle
state).

Test 2: 1-cfs-1-idle-task:

A single SCHED_NORMAL task is pinned to CPU5 and a single SCHED_IDLE
task is pinned to CPU6 (to make sure cluster 1 doesn't enter a deep
idle state).

Test 3: 1-cfs-8-idle-task:

A single SCHED_NORMAL task is pinned to CPU5 and eight SCHED_IDLE
tasks are created which run forever (not pinned anywhere, so they run
on all CPUs). I checked with kernelshark that as soon as the NORMAL
task sleeps, a SCHED_IDLE task starts running on CPU5.
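
For anyone who wants to approximate the SCHED_IDLE side of these tests
without rt-app, here is a minimal Python sketch. It is not the
configuration used for the numbers in this mail. It assumes Linux and
Python 3, and the helper names (pin_current_task,
make_current_task_sched_idle, spin_forever) are made up for
illustration.

```python
import os


def pin_current_task(cpu):
    """Restrict the calling process to a single CPU."""
    os.sched_setaffinity(0, {cpu})


def make_current_task_sched_idle():
    """Switch the calling process to SCHED_IDLE (priority must be 0)."""
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))


def spin_forever():
    """Busy loop standing in for a SCHED_IDLE task that never sleeps."""
    while True:
        pass


# Hypothetical usage for one unpinned hog, as in Test 3:
#   make_current_task_sched_idle()
#   spin_forever()
# For the pinned hog in Test 2, call pin_current_task(6) first.
```

Starting eight such processes would roughly mimic Test 3; the
SCHED_NORMAL task and its latency measurement would still have to come
from rt-app or a similar tool.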

And here are the results on mean latency (in us), using the "st" tool.

$ st 1-cfs-task/rt-app-cfs_thread-0.log
N     min   max   sum      mean      stddev
642   90    592   197180   307.134   109.906

$ st 1-cfs-1-idle-task/rt-app-cfs_thread-0.log
N     min   max   sum      mean      stddev
642   67    311   113850   177.336   41.4251

$ st 1-cfs-8-idle-task/rt-app-cfs_thread-0.log
N     min   max   sum      mean      stddev
643   29    173   41364    64.3297   13.2344
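
As a sanity check on the columns above, here is a small Python sketch
that computes the same kind of summary from a list of latency samples
and verifies that each reported mean equals sum / N. The summarize()
helper is hypothetical, and using the sample standard deviation is an
assumption about what "st" reports; the log parsing itself is left out
because the rt-app log layout is not shown here.

```python
import statistics


def summarize(samples):
    """Return N/min/max/sum/mean/stddev for a list of latencies (us)."""
    return {
        "N": len(samples),
        "min": min(samples),
        "max": max(samples),
        "sum": sum(samples),
        "mean": statistics.mean(samples),
        # Assumption: sample (n - 1) standard deviation.
        "stddev": statistics.stdev(samples),
    }


# Rows copied from the three result tables: (N, sum, mean).
REPORTED = {
    "1-cfs-task": (642, 197180, 307.134),
    "1-cfs-1-idle-task": (642, 113850, 177.336),
    "1-cfs-8-idle-task": (643, 41364, 64.3297),
}


def mean_is_consistent(n, total, mean, tol=1e-3):
    """Check that the reported mean matches sum / N within rounding."""
    return abs(total / n - mean) < tol
```

All three reported rows pass mean_is_consistent(), so the mean column
agrees with the N and sum columns.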


The mean latency when:
- we need to wake up from a deep idle state is 307 us
- we need to wake up from a shallow idle state is 177 us
- we need to preempt a SCHED_IDLE task is 64 us
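
To put the three means in relation to each other, the sketch below
computes how much lower each one is than the deep idle case. It only
reuses the mean values reported above; the dictionary keys and the
reduction_vs() helper are illustrative names.

```python
# Mean wakeup latencies in us, taken from the "st" output above.
MEAN_LATENCY_US = {
    "deep idle": 307.134,
    "shallow idle": 177.336,
    "preempt SCHED_IDLE": 64.3297,
}


def reduction_vs(baseline, other):
    """Fraction by which `other` is lower than `baseline`."""
    return 1.0 - other / baseline


base = MEAN_LATENCY_US["deep idle"]
for name, mean in MEAN_LATENCY_US.items():
    pct = reduction_vs(base, mean)
    print(f"{name:20s} {mean:8.3f} us  {pct:6.1%} lower than deep idle")
```

With these numbers, waking from a shallow idle state is roughly 42%
faster than waking from a deep idle state, and preempting a SCHED_IDLE
task is roughly 79% faster.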

So the theory looks correct: we should probably prefer SCHED_IDLE CPUs,
both for power and performance :)

> find_idlest_group_cpu is generally from
> a fork() context where it's not particularly performance critical.
> select_idle_sibling and the helpers it uses are wakeup context where it
> is often much more critical to wake quickly than find the best CPU.

I agree. We must find the best CPU here. But won't a SCHED_IDLE CPU be
the best? After all, that is the one in the shallowest idle state and
so better for power :)
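
To make the argument concrete, here is a toy Python model of the
preference. It is NOT the kernel's select_idle_sibling() logic, just a
sketch that ranks candidate CPUs by the mean wakeup latencies measured
above. The state names and pick_cpu() are made up, and the numbers come
from one board with a pinned task, so treat them as illustrative only.

```python
# Toy model only; states and costs are assumptions based on the
# rounded mean latencies (in us) measured in the three tests.
EXPECTED_WAKEUP_US = {
    "deep_idle": 307,
    "shallow_idle": 177,
    "sched_idle": 64,  # CPU running only SCHED_IDLE tasks
}


def pick_cpu(cpu_states):
    """Return the candidate CPU with the lowest expected wakeup latency.

    cpu_states maps a CPU number to a state string. CPUs in any state
    not listed in EXPECTED_WAKEUP_US (e.g. "busy") are not candidates.
    Returns None when there is no candidate.
    """
    candidates = {
        cpu: EXPECTED_WAKEUP_US[state]
        for cpu, state in cpu_states.items()
        if state in EXPECTED_WAKEUP_US
    }
    if not candidates:
        return None
    # Break ties by CPU number so the result is deterministic.
    return min(candidates, key=lambda cpu: (candidates[cpu], cpu))
```

For example, pick_cpu({4: "deep_idle", 5: "busy", 6: "sched_idle",
7: "shallow_idle"}) picks CPU 6 in this model.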

--
viresh