[PATCH 0/2] numa,sched: improve performance for multi-threaded workloads

From: riel
Date: Mon Jul 31 2017 - 15:35:07 EST

The NUMA balancing code spends far too much CPU time on PTE scanning
and NUMA hinting faults when running multi-threaded workloads.

This patch set slows down NUMA PTE scanning in two cases: when there
are lots of shared faults, and when dealing with large NUMA groups
in which a large fraction of the faults are shared.
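
To illustrate the general idea only, here is a minimal sketch of a scan
period that grows with the fraction of shared faults. It is not the code
from these patches: the function name, the two period constants, and the
linear scaling are all assumptions made for the example.

```c
/*
 * Illustrative sketch only, NOT the kernel implementation.
 * All names and constants below are hypothetical.
 */
#define SKETCH_SCAN_PERIOD_MIN_MS   1000UL  /* assumed fastest scan period */
#define SKETCH_SCAN_PERIOD_MAX_MS  60000UL  /* assumed slowest scan period */

/*
 * Pick a scan period (in ms) from the private/shared fault counts.
 * The larger the share of shared faults, the longer the period,
 * i.e. the slower the PTE scanning.
 */
static unsigned long sketch_scan_period_ms(unsigned long long private_faults,
                                           unsigned long long shared_faults)
{
    unsigned long long total = private_faults + shared_faults;
    unsigned long long shared_pct;

    /* No faults recorded yet: nothing suggests slowing down. */
    if (total == 0)
        return SKETCH_SCAN_PERIOD_MIN_MS;

    /* Shared faults as an integer percentage, 0..100 (rounded down). */
    shared_pct = shared_faults * 100 / total;

    /* Linear interpolation between the minimum and maximum period. */
    return SKETCH_SCAN_PERIOD_MIN_MS +
           (SKETCH_SCAN_PERIOD_MAX_MS - SKETCH_SCAN_PERIOD_MIN_MS) *
           shared_pct / 100;
}
```

With these assumed constants, a task with only private faults keeps the
shortest period, and a task with only shared faults gets the longest.
The sketch ignores overflow for extremely large fault counts.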

Some results from Jirka's half-week performance run, on
a 4 node system:
- improvements in the range of 10-30% for NAS benchmarks
  (mostly ft and lu subtests)
- SPECjbb2005 single instance mode - improvements in the range of 5-10%
- SPECjvm2008 - performance very similar to before, some small
  improvements for the scimark* subtests