[PATCH 0/2] sched: improve task_mm_cid_work impact on isolated systems

From: Gabriele Monaco
Date: Mon Dec 02 2024 - 09:15:16 EST


This patchset introduces two small changes to make task_mm_cid_work
lighter and less problematic for RT tasks.

We observed moderate latency spikes in a system with isolated cores but
multiple tasks running on those cores (e.g. one stressor and one
measuring thread).

If the nohz tick occurs during the measuring thread's execution (i.e.
the RT task), the task work calling task_mm_cid_work alone can take
around 30-35us, which is above the requirements for isolated cores.

The first patch reduces the runtime of the task by lowering the number
of cores that are checked during CID cleanup. Instead of iterating over
all possible cores, we only check the ones defined by the CID mask.
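
To illustrate the idea only (this is a userspace sketch, not the code in
the patch), the difference between the two scans can be modelled as
below. struct cpu_mask, mask_set(), scan_all_possible() and
scan_masked() are hypothetical names made up for this example, MAX_CPUS
is assumed to be 128 to match the test box, and __builtin_ctzll is a
GCC/Clang builtin:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS   128             /* assumption: size of the test box */
#define MASK_WORDS (MAX_CPUS / 64)

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU. */
struct cpu_mask {
	uint64_t bits[MASK_WORDS];
};

void mask_set(struct cpu_mask *m, unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	m->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* Full scan: one visit per possible CPU, whether or not it is in use. */
unsigned int scan_all_possible(unsigned int nr_possible)
{
	unsigned int visited = 0;

	for (unsigned int cpu = 0; cpu < nr_possible; cpu++)
		visited++;	/* the per-CPU cleanup check would go here */
	return visited;
}

/*
 * Masked scan: visit only the CPUs whose bit is set. The visited CPU
 * numbers are stored in out[], which must hold MAX_CPUS entries.
 */
unsigned int scan_masked(const struct cpu_mask *m, unsigned int *out)
{
	unsigned int visited = 0;

	for (unsigned int w = 0; w < MASK_WORDS; w++) {
		uint64_t word = m->bits[w];

		while (word) {
			/* index of the lowest set bit in this word */
			out[visited++] = w * 64 +
				(unsigned int)__builtin_ctzll(word);
			word &= word - 1;	/* clear that bit */
		}
	}
	return visited;
}
```

In this model the cost of the masked scan grows with the number of set
bits rather than with the number of possible cores.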

The second patch moves the work to a preemptible context (RCU callback),
making it harmless towards RT tasks.

We ran the benchmark on a 128-core aarch64 box with 4 housekeeping cores
and 124 (1-31,33-63,65-95,97-127) isolated cores.

Each isolated core is running an instance of stress-ng:
# (foreach N in 1-31,33-63,65-95,97-127)
# taskset -c N stress-ng --cpu 1 --cpu-load 80
Each isolated core also runs an rtla timerlat measuring thread, except
for the first isolated core, which runs the main timerlat thread:
# cpus=2-31,33-63,65-95,97-127
# rtla timerlat top -q -P f:95 -c $cpus -H 1

Our 30min test run without this patchset reaches a maximum latency on
one core (say cpu 113) of 48us.

With this patchset applied, we get a latency below 20us on all cores.

Gabriele Monaco (2):
  sched: Optimise task_mm_cid_work duration
  sched: Move task_mm_cid_work to RCU callback

 include/linux/sched.h |  1 -
 kernel/sched/core.c   | 21 ++++++++-------------
 2 files changed, 8 insertions(+), 14 deletions(-)


base-commit: e70140ba0d2b1a30467d4af6bcfe761327b9ec95
--
2.47.0