Re: Potential scheduler regression

From: Ben Guthro
Date: Mon Jul 10 2017 - 11:43:09 EST


On Mon, Jul 10, 2017 at 11:26 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jul 10, 2017 at 11:25:32AM +0200, Peter Zijlstra wrote:
>> On Fri, Jul 07, 2017 at 04:55:27PM -0400, Ben Guthro wrote:
>>
>> > Apologies for the delay - it took a while to get the machines to run the test.
>> >
>> > I am happy to report that the kernel at 1ad3aaf3fcd2 seems to regain
>> > the performance lost to 1b568f0aab, in our test environment.
>>
>> Excellent.
>>
>> > Since 4.9 is an LTS kernel - is this appropriate to suggest to be
>> > included in the linux-stable list?
>>
>> Hurm... so I typically suck at (also) keeping track of -stable things.
>>
>> But given LTS, there might be a few more commits that might make sense
>> to include.
>>
>> This series corrects NUMA topology creation:
>>
>> 8c0334697dc3 ("sched/topology: Refactor function build_overlap_sched_groups()")
>> c743f0a5c50f ("sched/fair, cpumask: Export for_each_cpu_wrap()")
>> 0372dd2736e0 ("sched/topology: Fix building of overlapping sched-groups")
>> 91eaed0d6131 ("sched/topology: Simplify build_overlap_sched_groups()")
>> b0151c25548c ("sched/debug: Print the scheduler topology group mask")
>> a420b0630362 ("sched/topology: Verify the first group matches the child domain")
>> f32d782e31bf ("sched/topology: Optimize build_group_mask()")
>> c20e1ea4b61c ("sched/topology: Move comment about asymmetric node setups")
>> af85596c74de ("sched/topology: Remove FORCE_SD_OVERLAP")
>> 73bb059f9b8a ("sched/topology: Fix overlapping sched_group_mask")
>> 8d5dc5126bb2 ("sched/topology: Small cleanup")
>> 005f874dd284 ("sched/topology: Add sched_group_capacity debugging")
>> 1676330ecfa8 ("sched/topology: Fix overlapping sched_group_capacity")
>>
>> (there's a few more commits at the end of that series that add comments
>> and renames a bunch of stuff which doesn't really fix anything).
>>
>> Cures a BUG_ON through sysrq:
>>
>> 896bbb252258 ("sched/core: Allow __sched_setscheduler() in interrupts when PI is not used")
>>
>>
>> Performance issues:
>>
>>
>> 502ce005ab95 ("sched/fair: Use task_groups instead of leaf_cfs_rq_list to walk all cfs_rqs")
>> a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
>>
>> c249f255aab8 ("sched/rt: Minimize rq->lock contention in do_sched_rt_period_timer()")
>>
>> 8655d5497735 ("sched/numa: Use down_read_trylock() for the mmap_sem")
>>
>>
>>
>> And then the patch you want for this:
>>
>> 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")
>>
>>
>>
>> I have no real idea how many of those qualify for 4.9, but know most
>> of those patches ended up in the various enterprise distros in some form
>> or other.
>
> If people have experience with these in the "enterprise" distros, or
> any other tree, and want to provide me with backported, and tested,
> patches, I'll be glad to consider them for stable kernels.
>
> thanks,
>
> greg k-h

I tried a simple cherry-pick of the suggested patches, but they touch
files that don't exist in the 4.9 series.
That makes this a more complicated port, and without the original
author's context there's a non-zero chance that I'd botch it. As such,
I'll yield to Peter's expertise here.
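
For anyone who wants to repeat that check before attempting a port, here
is a rough sketch of one way to list the files a commit touches that are
absent from a target tree. It is not what I actually ran; the function
names are made up for illustration, and it assumes a local clone that
contains both the commit and the target ref.

```python
import subprocess


def touched_paths(commit, repo="."):
    """Return the paths a commit modifies, via `git show --name-only`."""
    out = subprocess.run(
        ["git", "-C", repo, "show", "--name-only", "--format=", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def path_exists(ref, path, repo="."):
    """True if `path` exists in the tree at `ref` (uses `git cat-file -e`)."""
    res = subprocess.run(
        ["git", "-C", repo, "cat-file", "-e", f"{ref}:{path}"],
        capture_output=True,
    )
    return res.returncode == 0


def missing_paths(paths, exists):
    """Pure helper: the paths for which the `exists` predicate is false."""
    return [p for p in paths if not exists(p)]


def missing_in_tree(commit, ref, repo="."):
    """Paths touched by `commit` that are not present in the tree at `ref`."""
    return missing_paths(
        touched_paths(commit, repo),
        lambda p: path_exists(ref, p, repo),
    )
```

Calling missing_in_tree() with a commit id and a stable tag would return
the paths that need manual attention; an empty list only means the files
exist, not that the patch applies cleanly.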

In my release of 4.9, I'm planning on doing the simpler revert of
1b568f0aab, which introduced the performance degradation, rather than
pulling in lots of code from newer kernels.

Thanks
Ben G