Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks

From: Jirka Hladky
Date: Thu Sep 06 2018 - 04:16:32 EST


Hi Mel,

we have results with 2d4056fafa196e1ab4e7161bae4df76f9602d56d reverted.

* Compared to 4.18, there is still a performance regression,
especially with NAS (sp_C_x subtest) and SPECjvm2008. On 4 NUMA
systems, the regression is around 10-15%.
* Compared to 4.19rc1, there is a clear gain of around 20% across all benchmarks.
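For reference, the percentages above are relative changes against a baseline kernel. Below is a minimal Python sketch of that arithmetic, assuming throughput-style scores where higher is better unless told otherwise. The function name relative_change and the sample scores are hypothetical and illustrative only. They are not our benchmark tooling or measured data.

```python
def relative_change(baseline: float, candidate: float, higher_is_better: bool = True) -> float:
    """Percentage change of `candidate` versus `baseline`.

    Positive means `candidate` performs better, negative means it regressed.
    Pass higher_is_better=False for runtime-style metrics (lower is better);
    the sign is then flipped so a longer runtime shows up as a regression.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change = (candidate - baseline) / baseline * 100.0
    return change if higher_is_better else -change


# Hypothetical scores for illustration only; these are NOT measured results.
scores = {"4.18": 100.0, "4.19rc1": 80.0, "4.19rc1+revert": 90.0}
print(f"revert vs 4.18:    {relative_change(scores['4.18'], scores['4.19rc1+revert']):+.1f}%")
print(f"revert vs 4.19rc1: {relative_change(scores['4.19rc1'], scores['4.19rc1+revert']):+.1f}%")
```

Note that the baseline matters: the same two results give a different magnitude depending on which kernel is used as the reference, which is why "10-15% behind 4.18" and "20% ahead of 4.19rc1" need not add up exactly.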

While reverting 2d4056fafa196e1ab4e7161bae4df76f9602d56d has helped a
lot, there is another issue as well. Could you please recommend a
commit prior to 2d4056fafa196e1ab4e7161bae4df76f9602d56d to try?

Regarding the current results, how do we proceed? Could you please
contact Srikar and ask for his advice, or should we contact him
directly?

Thanks a lot!
Jirka

On Tue, Sep 4, 2018 at 12:07 PM, Jirka Hladky <jhladky@xxxxxxxxxx> wrote:
> Hi Mel,
>
> thanks for sharing the background information! We will check whether
> 2d4056fafa196e1ab4e7161bae4df76f9602d56d is causing the current
> regression in 4.19-rc1 and let you know the outcome.
>
> Jirka
>
> On Tue, Sep 4, 2018 at 11:00 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Mon, Sep 03, 2018 at 05:07:15PM +0200, Jirka Hladky wrote:
>>> Resending in plain text mode.
>>>
>>> > My own testing completed and the results are within expectations and I
>>> > saw no red flags. Unfortunately, I consider it unlikely they'll be merged
>>> > for 4.18. Srikar Dronamraju's series is likely to need another update
>>> > and I would need to rebase my patches on top of that. Given the scope
>>> > and complexity, I find it unlikely they would be accepted for an -rc,
>>> > particularly this late in the rc cycle. Whether we hit the 4.19 merge
>>> > window or not will depend on when Srikar's series gets updated.
>>>
>>>
>>> Hi Mel,
>>>
>>> we collaborated back in July on the scheduler patch that improves
>>> performance by allowing faster memory migration. You came up with
>>> the "sched-numa-fast-crossnode-v1r12" series here:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git
>>>
>>> which showed good performance results in both your testing and ours.
>>>
>>
>> I remember.
>>
>>> Do you have an update on the latest status? Is there any plan to
>>> merge this series into the 4.19 kernel? We have just tested
>>> 4.19.0-0.rc1.1, and based on the results it seems that the patch is
>>> not included (I also don't see it listed in git shortlog
>>> v4.18..v4.19-rc1 ./kernel/sched).
>>>
>>
>> Srikar's series that mine depended upon was only partially merged due to
>> a review bottleneck. He posted a v2 but it was during the merge window
>> and likely will need a v3 to avoid falling through the cracks. When it
>> is merged, I'll rebase my series on top and post it. While I didn't
>> check against 4.19-rc1, I did find that rebasing on top of the partial
>> series in 4.18 did not have as big an improvement.
>>
>>> With 4.19rc1 we see a performance drop of:
>>> * up to 40% (NAS bench) relative to 4.18 + sched-numa-fast-crossnode-v1r12
>>> * up to 20% (NAS, Stream, SPECjbb2005, SPECjvm2008) relative to 4.18 vanilla
>>>
>>> It's quite unclear what the next steps are: should we wait for
>>> "sched-numa-fast-crossnode-v1r12" to be merged, or should we start
>>> looking at what has caused the drop in performance going from 4.18
>>> to 4.19rc1?
>>>
>>
>> Both are valid options. If you take the latter option, I suggest looking
>> at whether 2d4056fafa196e1ab4e7161bae4df76f9602d56d is the source of the
>> issue, as at least one auto-bisection found that it may be problematic.
>> Whether it is an issue or not depends heavily on the number of threads
>> relative to the socket size.
>>
>> --
>> Mel Gorman
>> SUSE Labs