Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"

From: Chris Mason
Date: Mon Oct 26 2020 - 08:45:46 EST


On 26 Oct 2020, at 4:39, Vincent Guittot wrote:

> Hi Chris

> On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@xxxxxx> wrote:

>> Hi everyone,

>> We’re validating a new kernel in the fleet, and compared with v5.2,

> Which version are you using?
> Several improvements have been added since v5.5 and the rework of load_balance().

We’re validating v5.6, but all of the numbers referenced in this patch are against v5.9. I usually try to backport my way to victory on this kind of thing, but mainline seems to behave exactly the same as 0b0695f2b34a wrt this benchmark.


>> performance is ~2-3% lower for some of our workloads. After some
>> digging, Johannes found that our involuntary context switch rate was ~2x
>> higher, and we were leaving a CPU idle a higher percentage of the time,
>> even though the workload was trying to saturate the system.
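The two symptoms described above (involuntary context switches and idle CPU time) can be sampled from procfs. The following is a minimal sketch, not the method used in this thread: the helper names `parse_ctxt_switches`, `idle_fraction`, and `sample` are hypothetical, and it assumes a Linux `/proc` with the usual `status` and `stat` layouts.

```python
"""Sketch: sample involuntary context switches and system-wide idle share.

Helper names are hypothetical; only the /proc file formats are real.
"""
import time


def parse_ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def _cpu_fields(stat_text):
    """Return user..steal jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # Fields 1..8: user nice system idle iowait irq softirq steal.
            # guest/guest_nice are skipped because they are already
            # accounted in user/nice.
            return [int(x) for x in parts[1:9]]
    raise ValueError("no aggregate cpu line found")


def idle_fraction(stat_before, stat_after):
    """Fraction of CPU time spent idle between two /proc/stat snapshots."""
    before = _cpu_fields(stat_before)
    after = _cpu_fields(stat_after)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return deltas[3] / total if total else 0.0


def sample(pid, interval=1.0):
    """Read both metrics for one process over `interval` seconds (Linux only)."""
    def read(path):
        with open(path) as f:
            return f.read()

    stat0 = read("/proc/stat")
    ctx0 = parse_ctxt_switches(read(f"/proc/{pid}/status"))
    time.sleep(interval)
    stat1 = read("/proc/stat")
    ctx1 = parse_ctxt_switches(read(f"/proc/{pid}/status"))
    key = "nonvoluntary_ctxt_switches"
    return (ctx1[key] - ctx0[key]) / interval, idle_fraction(stat0, stat1)
```

Calling `sample(pid)` on each kernel under the same load gives an involuntary switch rate and an idle fraction to compare. Note that `/proc/<pid>/status` reports the counters for a single task, so a multi-threaded workload would need the per-thread files under `/proc/<pid>/task/` summed.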

>> We were able to reproduce the problem with schbench, and Johannes
>> bisected down to:

>> commit 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912
>> Author: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>> Date: Fri Oct 18 15:26:31 2019 +0200

>> sched/fair: Rework load_balance()

>> Our working theory is the load balancing changes are leaving processes
>> behind busy CPUs instead of moving them onto idle ones. I made a few
>> schbench modifications to make this easier to demonstrate:

>> https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git/

>> My VM has 40 cpus (20 cores, 2 threads per core), and my schbench
>> command line is:

> What is the topology? Are they all part of the same LLC?

We’ve seen the regression on both single-socket and dual-socket bare-metal Intel systems. On the VM I reproduced with, I saw similar latencies with and without siblings configured into the topology.
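For anyone wanting to answer the LLC question on their own machine, sysfs exposes which CPUs share each cache. The sketch below is illustrative only: `parse_cpu_list` and `llc_groups` are hypothetical helper names, and it assumes that cache `index3` is the last-level cache, which is common on x86 but not guaranteed (check the `level` file next to it).

```python
"""Sketch: group CPUs by shared last-level cache using sysfs.

Assumes cache index3 is the LLC; helper names are hypothetical.
"""
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-19,40-59' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        # A bare number like '3' is a range of one CPU.
        cpus.update(range(int(lo), int(hi if sep else lo) + 1))
    return cpus


def llc_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct sets of CPUs that share an index3 cache."""
    groups = set()
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(frozenset(parse_cpu_list(path.read_text())))
    return sorted(sorted(group) for group in groups)
```

If `llc_groups()` returns a single group, all CPUs sit under one LLC; more than one group means load balancing has to cross LLC boundaries. A VM may expose no `index3` entry at all, in which case the result is an empty list and the topology has to be checked another way.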

-chris