Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v3

From: Vincent Guittot
Date: Mon Feb 17 2020 - 08:49:28 EST

On Mon, 17 Feb 2020 at 11:44, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> Changelog since V2:
> o Rebase on top of Vincent's series again
> o Fix a missed rcu_read_unlock
> o Reduce overhead of tracepoint
> Changelog since V1:
> o Rebase on top of Vincent's series and rework
> Note: The baseline for this series is tip/sched/core as of February
> 12th rebased on top of v5.6-rc1. The series includes patches from
> Vincent as I needed to add a fix and build on top of it. Vincent's
> series on its own introduces performance regressions for *some*
> but not *all* machines so it's easily missed. This series overall
> is close to performance-neutral with some gains depending on the
> machine. However, the end result does less work on NUMA balancing
> and the fact that both the NUMA balancer and load balancer use
> similar logic makes it much easier to understand.
> The NUMA balancer makes placement decisions on tasks that partially
> take the load balancer into account and vice versa but there are
> inconsistencies. This can result in placement decisions that override
> each other leading to unnecessary migrations -- both task placement
> and page placement. This series reconciles many of the decisions --
> partially Vincent's work with some fixes and optimisations on top to
> merge our two series.
> The first patch is unrelated. It's picked up by tip but was not present in
> the tree at the time of the fork. I'm including it here because I tested
> with it.
> The second and third patches are tracing only and were needed to get
> sensible data out of ftrace with respect to task placement for NUMA
> balancing. The NUMA balancer is *far* easier to analyse with the
> patches and informed how the series should be developed.
> Patches 4-5 are Vincent's and use very similar code patterns and logic
> between NUMA and load balancer. Patch 6 is a fix to Vincent's work that
> is necessary to avoid serious imbalances being introduced by the NUMA

Yes, the test added in load_too_imbalanced() by patch 5 doesn't seem to
be a good choice.
I haven't removed it, as your patch 6 already does that, but it might be
worth removing it directly in patch 5 if a new version is needed.

> balancer. Patches 7-8 are also Vincent's and while I have not reviewed
> them closely myself, others have.
> The rest of the series is a mix of optimisations and improvements, one
> of which stops the NUMA balancer fighting with itself.
> Note that this is not necessarily a universal performance win although
> performance results are generally ok (small gains/losses depending on
> the machine and workload). However, task migrations, page migrations,
> variability and overall overhead are generally reduced.
> The main reference workload I used was specjbb running one JVM per node
> which typically would be expected to split evenly. It's an interesting
> workload because the number of "warehouses" is not linearly related
> to the number of running tasks due to the creation of GC threads
> and other interfering activity. The mmtests configuration used is
> jvm-specjbb2005-multi with two runs -- one with ftrace enabling relevant
> scheduler tracepoints.
> An example of the headline performance of the series is below and the
> tested kernels are
> baseline-v3r1   Patches 1-3 for the tracing
> loadavg-v3      Patches 1-5 (Add half of Vincent's work)
> lbidle-v3       Patches 1-6 Vincent's work with a fix on top
> classify-v3     Patches 1-8 Rest of Vincent's work
> stopsearch-v3   All patches