[PATCH 0/2] Introduce SIS_CACHE to choose previous CPU during task wakeup

From: Chen Yu
Date: Tue Sep 26 2023 - 01:11:07 EST


RFC -> v1:
- drop RFC
- Only record the short sleeping time for each task, to better honor the
burst sleeping tasks. (Mathieu Desnoyers)
- Keep the forward movement monotonic for runqueue's cache-hot timeout value.
(Mathieu Desnoyers, Aaron Lu)
- Introduce a new helper function cache_hot_cpu() that considers
rq->cache_hot_timeout. (Aaron Lu)
- Add analysis of why inhibiting task migration could bring better throughput
for some benchmarks. (Gautham R. Shenoy)
- Choose the first cache-hot CPU, if all idle CPUs are cache-hot in
select_idle_cpu(). To avoid possible task stacking on the waker's CPU.
(K Prateek Nayak)

Thanks for your comments and review!

----------------------------------------------------------------------

When task p is woken up, the scheduler leverages select_idle_sibling()
to find an idle CPU for it. p's previous CPU is usually a preference
because it can improve cache locality. However in many cases, the
previous CPU has already been taken by other wakees, thus p has to
find another idle CPU.

Inhibit the task migration while keeping the work conservation of
scheduler could benefit many workloads. Inspired by Mathieu's
proposal to limit the task migration ratio[1], this patch considers
the task average sleep duration. If the task is a short sleeping one,
then tag its previous CPU as cache hot for a short while. During this
reservation period, other wakees are not allowed to pick this idle CPU
until a timeout. Later if the task is woken up again, it can find its
previous CPU still idle, and choose it in select_idle_sibling().

This test is based on tip/sched/core, on top of
Commit afc1996859a2
("sched/fair: Ratelimit update to tg->load_avg")

patch afc1996859a2 has significantly reduced the cost of task migration,
the SIS_CACHE further reduces that cost. SIS_CACHE shows noticeable
throughput improvement of netperf/tbench around 100% load.

[patch 1/2] records the task's average short sleeping time in
its per sched_entity structure.
[patch 2/2] introduces the SIS_CACHE to skip the cache-hot
idle CPU during wakeup.

Link: https://lore.kernel.org/lkml/20230905171105.1005672-2-mathieu.desnoyers@xxxxxxxxxxxx/ #1

Chen Yu (2):
sched/fair: Record the short sleeping time of a task
sched/fair: skip the cache hot CPU in select_idle_cpu()

include/linux/sched.h | 3 ++
kernel/sched/fair.c | 86 +++++++++++++++++++++++++++++++++++++++--
kernel/sched/features.h | 1 +
kernel/sched/sched.h | 1 +
4 files changed, 87 insertions(+), 4 deletions(-)

--
2.25.1