Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
From: Vincent Guittot
Date: Wed Nov 04 2020 - 05:06:22 EST
On Wed, 4 Nov 2020 at 10:42, Mel Gorman <mgorman@xxxxxxx> wrote:
>
> On Mon, Nov 02, 2020 at 09:44:18AM -0500, Phil Auld wrote:
> > > > I have not investigated why because I do not have the bandwidth
> > > > to do a detailed study (I was off for a few days and my backlog is
> > > > severe). However, I recommend that before v5.10 this be reverted and retried.
> > > > If I'm cc'd on v2, I'll run the same tests through the grid and see what
> > > > falls out.
> > >
> > > I'm going to have a look at the regressions and see if patches that
> > > have been queued for v5.10, or even more recent patches, can help or if
> > > the patch should be adjusted
> > >
> >
> > Fwiw, we have pulled this in, along with some of the 5.10-rc1 fixes in this
> > area and in the load balancing code.
> >
> > We found some load balancing improvements and some minor overall perf
> > gains in a few places, but generally did not see any difference from before
> > the commit mentioned here.
> >
> > I'm wondering, Mel, if you have compared against 5.10-rc1?
> >
>
> The results indicate that reverting on 5.9 would have been the right
> decision. It's less clear for 5.10-rc2 so I'm only showing the 5.10-rc2
> comparison. Bear in mind that this is one machine only so I'll be
> rerunning against all the affected machines according to the bisections
> run against 5.9.
>
> aim9
> 5.10.0-rc2 5.10.0-rc2
> vanilla 5.10-rc2-revert
> Hmean page_test 510863.13 ( 0.00%) 517673.91 * 1.33%*
> Hmean brk_test 1807400.76 ( 0.00%) 1814815.29 * 0.41%*
> Hmean exec_test 821.41 ( 0.00%) 841.05 * 2.39%*
> Hmean fork_test 4399.71 ( 0.00%) 5124.86 * 16.48%*
>
> Reverting shows a 16.48% gain for fork_test and minor gains for others.
> To be fair, I don't generally consider the fork_test to be particularly
> important because fork microbenchmarks that do no real work are rarely
> representative of anything useful. It tends to go up and down a lot and
> it's rare that a regression in fork_test correlates to anything else.
The patch makes a difference only when most of the CPUs are already
idle, because it will then select the one with the lowest utilization,
which roughly translates to the least recently used one.
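
For reference, the tie-break described in the subject line can be
sketched as a simplified, stand-alone snippet (the struct and function
names here, group_stats and pick_idlest, are illustrative and not the
actual fair.c symbols):

#include <stdbool.h>

/*
 * Simplified sketch of the idlest-group tie-break: among candidate
 * groups with spare capacity, prefer the one with the most idle CPUs,
 * and when the idle_cpus counts are equal, fall back to the lowest
 * group_util.
 */
struct group_stats {
	unsigned int idle_cpus;     /* number of idle CPUs in the group */
	unsigned long group_util;   /* summed utilization of the group  */
};

/* Return true if @sgs should replace @idlest as the idlest group. */
static bool pick_idlest(const struct group_stats *idlest,
			const struct group_stats *sgs)
{
	if (sgs->idle_cpus > idlest->idle_cpus)
		return true;

	/* Tie on idle CPUs: select the group with the lowest group_util. */
	if (sgs->idle_cpus == idlest->idle_cpus &&
	    sgs->group_util < idlest->group_util)
		return true;

	return false;
}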
>
> Hackbench failed to run because I typo'd the configuration. Kernel build
> benchmark and git test suite both were inconclusive for 5.10-rc2
> (neutral results) although they showed a 10-20% gain for kernbench and a 24%
> gain for the git test suite by reverting in 5.9.
>
> The gitsource test was interesting for a few reasons. First, the big
> difference between 5.9 and 5.10 is that the workload is mostly concentrated
> on one NUMA node. mpstat shows that 5.10-rc2 uses all of the CPUs on one
> node lightly. Reverting the patch shows that far fewer CPUs are used at
> a higher utilisation -- not particularly high utilisation because of the
> nature of the workload but noticeable, i.e. gitsource with the revert
> packs the workload onto fewer CPUs. The same holds for fork_test --
> reverting packs the workload onto fewer CPUs with higher utilisation on
> each of them. Generally this plays well with cpufreq without schedutil:
> using fewer CPUs means the CPU is likely to reach higher frequencies.
Which cpufreq governor are you using?
>
> While it's possible that some other factor masked the impact of the patch,
> the fact it's neutral for two workloads in 5.10-rc2 is suspicious as it
> indicates that if the patch was implemented against 5.10-rc2, it would
> likely not have been merged. I've queued the tests on the remaining
> machines to see if something more conclusive falls out.
I don't think that the goal of the patch is stressed by those benchmarks.
I typically try to optimize the sequence:
1-fork a lot of threads that immediately wait
2-wake up all threads simultaneously to run in parallel
3-wait for all threads to finish
Without the patch, all newly forked threads were packed on a few CPUs
which were already idle when the next fork happened. Then the threads
were spread across CPUs in the LLC at wakeup, but they had to wait for
a load balance to fill the other sched domains.
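
For what it's worth, a minimal pthread sketch of that fork/wake/wait
sequence (a hypothetical illustration, not one of the benchmarks
discussed in this thread) could look like:

/*
 * Minimal illustration of the fork/wake/wait sequence:
 *  1. create many threads that immediately block on a barrier
 *  2. the main thread enters the barrier last, releasing them all at once
 *  3. join all threads
 * Build with: gcc -pthread example.c
 */
#include <pthread.h>
#include <stdlib.h>

#define NR_THREADS 64

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
	/* Step 1: block immediately after being forked. */
	pthread_barrier_wait(&barrier);

	/* Step 2: the parallel work would run here. */
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];
	int i;

	/* NR_THREADS workers plus the main thread. */
	pthread_barrier_init(&barrier, NULL, NR_THREADS + 1);

	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);

	/* Main thread arrives last: wakes all workers simultaneously. */
	pthread_barrier_wait(&barrier);

	/* Step 3: wait for all threads to finish. */
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);

	pthread_barrier_destroy(&barrier);
	return 0;
}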
>
> --
> Mel Gorman
> SUSE Labs