Re: Potential scheduler regression

From: Peter Zijlstra
Date: Mon Jul 10 2017 - 05:25:49 EST

On Fri, Jul 07, 2017 at 04:55:27PM -0400, Ben Guthro wrote:

> Apologies on the delay - it took a bit to get the machines, to run the test.
> I am happy to report that the kernel at 1ad3aaf3fcd2, seems to regain
> performance loss from 1b568f0aab, in our test environment.


> Since 4.9 is an LTS kernel - is this appropriate to suggest to be
> included in the linux-stable list?

Hurm... so I typically suck at (also) keeping track of -stable things.

But given LTS, there might be a few more commits that might make sense
to include.

This series corrects NUMA topology creation:

8c0334697dc3 ("sched/topology: Refactor function build_overlap_sched_groups()")
c743f0a5c50f ("sched/fair, cpumask: Export for_each_cpu_wrap()")
0372dd2736e0 ("sched/topology: Fix building of overlapping sched-groups")
91eaed0d6131 ("sched/topology: Simplify build_overlap_sched_groups()")
b0151c25548c ("sched/debug: Print the scheduler topology group mask")
a420b0630362 ("sched/topology: Verify the first group matches the child domain")
f32d782e31bf ("sched/topology: Optimize build_group_mask()")
c20e1ea4b61c ("sched/topology: Move comment about asymmetric node setups")
af85596c74de ("sched/topology: Remove FORCE_SD_OVERLAP")
73bb059f9b8a ("sched/topology: Fix overlapping sched_group_mask")
8d5dc5126bb2 ("sched/topology: Small cleanup")
005f874dd284 ("sched/topology: Add sched_group_capacity debugging")
1676330ecfa8 ("sched/topology: Fix overlapping sched_group_capacity")

(there's a few more commits at the end of that series that add comments
and rename a bunch of stuff, but those don't really fix anything).

This one cures a BUG_ON triggered through sysrq:

896bbb252258 ("sched/core: Allow __sched_setscheduler() in interrupts when PI is not used")

Performance issues:

502ce005ab95 ("sched/fair: Use task_groups instead of leaf_cfs_rq_list to walk all cfs_rqs")
a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")

c249f255aab8 ("sched/rt: Minimize rq->lock contention in do_sched_rt_period_timer()")

8655d5497735 ("sched/numa: Use down_read_trylock() for the mmap_sem")

And then the patch you want for this:

1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")

I have no real idea how many of those qualify for 4.9, but I know most
of those patches ended up in the various enterprise distros in some form
or other.

In any case, some of that will need some massaging to apply and it
obviously needs testing of sorts. So I'm not sure what all makes sense
to do.