Re: [PATCH v3 0/5] Improve newidle lb cost tracking and early abort

From: Tim Chen
Date: Tue Oct 26 2021 - 13:25:12 EST


On Tue, 2021-10-19 at 14:35 +0200, Vincent Guittot wrote:
> This patchset updates newidle lb cost tracking and early abort:
>
> The time spent running update_blocked_averages is now accounted in
> the 1st sched_domain level. This time can be significant and move the
> cost of newidle lb above the avg_idle time.
>
> The decay of max_newidle_lb_cost is modified to start only when the
> field has not been updated for a while. A recent update will not be
> decayed immediately but only after a while.
>
> The condition of an avg_idle lower than sysctl_sched_migration_cost
> has been removed, as the 500us value is quite large and prevents
> opportunities to pull tasks onto the newly idle CPU for at least the
> 1st domain levels.
>
> Monitoring sd->max_newidle_lb_cost on cpu0 of an Arm64 system
> THX2 (2 nodes * 28 cores * 4 cpus) during the benchmarks gives the
> following results:
>         min    avg    max
> SMT:    1us   33us  273us - this one includes the update of blocked load
> MC:     7us   49us  398us
> NUMA:  10us   45us  158us
>
>
> Some results for hackbench -l $LOOPS -g $group :
> group  tip/sched/core   + this patchset
> 1      15.189(+/- 2%)   14.987(+/- 2%)   +1%
> 4       4.336(+/- 3%)    4.322(+/- 5%)   +0%
> 16      3.654(+/- 1%)    2.922(+/- 3%)  +20%
> 32      3.209(+/- 1%)    2.919(+/- 3%)   +9%
> 64      2.965(+/- 1%)    2.826(+/- 1%)   +4%
> 128     2.954(+/- 1%)    2.993(+/- 8%)   -1%
> 256     2.951(+/- 1%)    2.894(+/- 1%)   +2%
>
> tbench and reaim have not shown any difference
>

Vincent,

Our benchmark team tested the patches with our OLTP benchmark on a
2 socket Cascade Lake with 28 cores/socket. It is a smaller
configuration than the 2 socket Ice Lake with 40 cores/socket that we
have tested previously, so the overhead of update_blocked_averages is
smaller (~4%).

Here's a summary of the results:

                                        Relative Performance
                                        (higher better)
5.15 rc4 vanilla (cgroup disabled)      100%
5.15 rc4 vanilla (cgroup enabled)        96%
patch v2                                 96%
patch v3                                 96%

We didn't see much change in performance from the patch set.

Looking at the profile of update_blocked_averages a bit more,
the majority of the calls to update_blocked_averages
happen in run_rebalance_domains. And we are not
including the cost of update_blocked_averages for
run_rebalance_domains in the current patch set. I think
the patch set should account for that too.


     0.60%     0.00%     3  [kernel.vmlinux]  [k] run_rebalance_domains  -  -
            |
             --0.59%--run_rebalance_domains
                       |
                        --0.57%--update_blocked_averages
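
To make the suggestion concrete, here is a rough userspace sketch of
the accounting I have in mind. It is not the actual patch: struct
sd_sketch, sketch_update_cost(), sketch_rebalance(), the 1 second
decay interval and the 253/256 decay factor are all assumptions made
up for illustration. The idea is only that the time spent refreshing
blocked load on the periodic rebalance path gets charged to the lowest
domain in the same way as on the newidle path, with the decay starting
only once the field has gone unupdated for a while.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed decay interval (1 s); the value used by the patches may differ. */
#define SKETCH_DECAY_INTERVAL_NS 1000000000ULL

/* Hypothetical stand-in for the cost-tracking fields of a sched_domain. */
struct sd_sketch {
	uint64_t max_newidle_lb_cost;	/* largest cost seen recently, in ns */
	uint64_t last_update_ns;	/* time of the last raise or decay */
};

/*
 * Fold one measured cost into the domain.
 * A larger cost replaces the stored maximum and restarts the decay timer.
 * Otherwise the maximum is decayed, but only once it has gone without an
 * update for longer than the interval. Returns true when a decay happened.
 */
static bool sketch_update_cost(struct sd_sketch *sd, uint64_t now_ns,
			       uint64_t cost_ns)
{
	if (cost_ns > sd->max_newidle_lb_cost) {
		sd->max_newidle_lb_cost = cost_ns;
		sd->last_update_ns = now_ns;
		return false;
	}

	if (now_ns - sd->last_update_ns > SKETCH_DECAY_INTERVAL_NS) {
		/* Assumed decay factor of 253/256 (roughly 1% per step). */
		sd->max_newidle_lb_cost = sd->max_newidle_lb_cost * 253 / 256;
		sd->last_update_ns = now_ns;
		return true;
	}

	return false;
}

/*
 * Periodic rebalance path: t0_ns and t1_ns bracket the blocked load
 * update, and the elapsed time is charged to the lowest domain.
 */
static void sketch_rebalance(struct sd_sketch *lowest, uint64_t t0_ns,
			     uint64_t t1_ns)
{
	sketch_update_cost(lowest, t1_ns, t1_ns - t0_ns);
}
```

In the kernel the two timestamps would presumably be taken around the
update_blocked_averages call in run_rebalance_domains, but where exactly
to take them is something to settle in the real patch.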

Thanks.

Tim