[PATCH 0/1] Reduce scheduler migrations due to wake_affine

From: Mel Gorman
Date: Tue Dec 19 2017 - 04:00:18 EST


wake_affine has the impossible task of figuring out when it's best for a
waker to pull a wakee towards the waker's CPU on the expectation that data
locality will offset the cost of the migration. It's hurt by the fact that
most wakeups cannot run on the current CPU, to avoid stacking multiple tasks
on one CPU by accident, so it depends heavily on topology and on which nearby
CPU is idle. This series special-cases some wake_affine decisions.

Patch 1 was already posted and is simply being reposted as other parts of
the series were dropped. It avoids wake_affine pulling a task to a
different node if the wakeup source is an interrupt. This is on the
basis that we have little knowledge of whether the CPU servicing
the interrupt is relevant to the data locality of the task being
woken. The data from the interrupt itself may be a tiny proportion
of the task's working set.

The wake-on-prev patch got dropped as the initial version had a serious bug.
Once corrected, it does reduce migrations and reduces overhead in some cases
by avoiding wake_affine_weight. However, it's not a universal win across a
range of machines and workloads, and when it is a win, it's marginal while
potentially confusing the role of select_idle_sibling in selecting
prev_cpu if it turns out to be idle. The kworker stacking patch got dropped
as it could not guarantee the relationship was synchronous and may introduce
other regressions. I'll revisit migration mitigation after Christmas,
assuming I get the chance. However, patch 1 still makes sense and is known
to address a number of issues, so it should be relatively safe.

kernel/sched/fair.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

--
2.15.0