Re: [RESEND PATCH v13 0/9] timers: Exclude isolated cpus from timer migration
From: Waiman Long
Date: Thu Oct 30 2025 - 12:37:18 EST
On 10/30/25 12:09 PM, Gabriele Monaco wrote:
> On Thu, 2025-10-30 at 11:37 -0400, Waiman Long wrote:
>> On 10/30/25 10:12 AM, Frederic Weisbecker wrote:
>>> Hi Waiman,
>>>
>>> On Wed, Oct 29, 2025 at 10:56:06PM -0400, Waiman Long wrote:
>>>> On 10/20/25 7:27 AM, Gabriele Monaco wrote:
>>>>> The timer migration mechanism allows active CPUs to pull timers from
>>>>> idle ones to improve the overall idle time. This is however undesired
>>>>> when CPU-intensive workloads run on isolated cores, as the algorithm
>>>>> would move the timers from housekeeping to isolated cores, negatively
>>>>> affecting the isolation.
>>>>>
>>>>> Exclude isolated cores from the timer migration algorithm by extending
>>>>> the concept of unavailable cores, currently used for offline ones, to
>>>>> isolated ones:
>>>>> * A core is unavailable if isolated or offline;
>>>>> * A core is available if non-isolated and online.
>>>>>
>>>>> A core is considered unavailable as isolated if it belongs to:
>>>>> * the isolcpus (domain) list
>>>>> * an isolated cpuset
>>>>> Except if it is:
>>>>> * in the nohz_full list (already idle for the hierarchy)
>>>>> * the nohz timekeeper core (must be available to handle global timers)
>>>>>
>>>>> CPUs are added to the hierarchy during late boot, excluding isolated
>>>>> ones; the hierarchy is also adapted when the cpuset isolation changes.
>>>>>
>>>>> Due to how the timer migration algorithm works, any CPU that is part of
>>>>> the hierarchy can have its global timers pulled by remote CPUs and has
>>>>> to pull remote timers itself; skipping only the pulling of remote
>>>>> timers would break the logic.
>>>>> For this reason, prevent isolated CPUs from pulling remote global
>>>>> timers, but also the other way around: any global timer started on an
>>>>> isolated CPU will run there. This does not break the concept of
>>>>> isolation (global timers don't come from outside the CPU) and, if
>>>>> considered inappropriate, can usually be mitigated with other isolation
>>>>> techniques (e.g. IRQ pinning).
>>>>>
>>>>> This effect was noticed on a 128-core machine running oslat on the
>>>>> isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
>>>>> and the CPU with lowest count in a timer migration hierarchy (here 1
>>>>> and 65) appears as always active and continuously pulls global timers
>>>>> from the housekeeping CPUs. This ends up moving driver work (e.g.
>>>>> delayed work) to isolated CPUs and causes latency spikes:
>>>>>
>>>>> before the change:
>>>>> # oslat -c 1-31,33-63,65-95,97-127 -D 62s
>>>>> ...
>>>>> Maximum: 1203 10 3 4 ... 5 (us)
>>>>>
>>>>> after the change:
>>>>> # oslat -c 1-31,33-63,65-95,97-127 -D 62s
>>>>> ...
>>>>> Maximum: 10 4 3 4 3 ... 5 (us)
>>>>>
>>>>> The same behaviour was observed on a machine with as few as 20 cores /
>>>>> 40 threads with isolcpus set to 1-9,11-39, using rtla-osnoise-top.
>>>>>
>>>>> The first 5 patches are preparatory work to change the concept of
>>>>> online/offline to available/unavailable, keep track of those in a
>>>>> separate cpumask, clean up the setting/clearing functions and change a
>>>>> function name in cpuset code.
>>>>>
>>>>> Patches 6 and 7 adapt isolation and cpuset to prevent domain isolated
>>>>> and nohz_full CPUs from covering all CPUs, which would leave no
>>>>> housekeeping one. That situation would cause problems with the changes
>>>>> introduced in this series because no CPU would remain to handle
>>>>> global timers.
>>>>>
>>>>> Patch 9 extends the unavailable status to domain isolated CPUs, which
>>>>> is the main contribution of the series.
>>>>>
>>>>> This series is equivalent to v13 but rebased on v6.18-rc2.
>>>> Thomas,
>>>>
>>>> This patch series has undergone multiple rounds of review. Do you think
>>>> it is good enough to be merged into tip?
>>>>
>>>> It does contain some cpuset code, but most of the changes are in the
>>>> timer code, so I think it is better to go through the tip tree. It does
>>>> have some minor conflicts with the current for-6.19 branch of the cgroup
>>>> tree, but they can be easily resolved during the merge.
>>>>
>>>> What do you think?
>>> Just wait a little, I realize I made a buggy suggestion to Gabriele and
>>> a detail needs to be fixed.
>>>
>>> My bad...
>> OK, I thought you were OK with the timer changes. I guess Gabriele will
>> have to send out a new version to address your finding.
> Sure, I'm going to have a look at this next week and send a V14.

I am going to pull out your 2 cpuset patches and send them to the cgroup mailing list separately, so you don't need to include them in your next version.
Cheers,
Longman
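
The availability rule described in the quoted cover letter can be condensed into a small predicate. The userspace C sketch below illustrates the rule under stated assumptions and is not the kernel's implementation. It models each CPU set as a 64-bit mask, so it handles at most 64 CPUs. The names `struct cpu_masks`, `cpu_in()`, `cpu_is_tmigr_isolated()` and `cpu_is_tmigr_available()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model (not kernel code): bit N of each mask
 * represents CPU N, so at most 64 CPUs are supported.
 */
struct cpu_masks {
	uint64_t online;         /* CPUs currently online */
	uint64_t domain_isol;    /* isolcpus= (domain) list */
	uint64_t cpuset_isol;    /* CPUs in an isolated cpuset */
	uint64_t nohz_full;      /* nohz_full= list */
	unsigned int timekeeper; /* CPU acting as nohz timekeeper */
};

/* Test whether CPU 'cpu' is set in 'mask'. */
static bool cpu_in(uint64_t mask, unsigned int cpu)
{
	assert(cpu < 64);
	return (mask >> cpu) & 1u;
}

/* "Unavailable as isolated", following the rules in the cover letter. */
static bool cpu_is_tmigr_isolated(const struct cpu_masks *m, unsigned int cpu)
{
	/*
	 * Exceptions first: nohz_full CPUs are already idle for the
	 * hierarchy, and the timekeeper must stay available to handle
	 * global timers.
	 */
	if (cpu_in(m->nohz_full, cpu) || cpu == m->timekeeper)
		return false;

	/* Otherwise isolated if in the isolcpus list or an isolated cpuset. */
	return cpu_in(m->domain_isol, cpu) || cpu_in(m->cpuset_isol, cpu);
}

/* Available means online and not isolated. Anything else is unavailable. */
static bool cpu_is_tmigr_available(const struct cpu_masks *m, unsigned int cpu)
{
	return cpu_in(m->online, cpu) && !cpu_is_tmigr_isolated(m, cpu);
}
```

Consider a configuration with CPUs 0-6 online, CPUs 1-2 in isolcpus, CPUs 3-4 in an isolated cpuset, CPU 2 in nohz_full and CPU 4 as the timekeeper. In that case CPUs 1 and 3 report unavailable, and so does the offline CPU 7. CPU 2 (nohz_full) and CPU 4 (timekeeper) stay available. The sketch checks the exceptions before the isolation masks because, in the cover letter, the exceptions override membership in either isolation set.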