Re: [PATCH] timers: Exclude isolated cpus from timer migation

From: Waiman Long
Date: Thu Apr 10 2025 - 10:22:19 EST



On 4/10/25 6:38 AM, Gabriele Monaco wrote:
On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
On Thu, Apr 10 2025 at 08:54, Gabriele Monaco wrote:
+/*  cpumask excluded from migration */
+static cpumask_var_t tmigr_unavailable_cpumask;
Why is this a negated mask instead of being the obvious and intuitive
available mask?
Well, the way I started writing the patch I found it easier to do the
double andnot in tmigr_isolated_exclude_cpumask to see what changed.
I see the way it evolved is just messier..
I'll apply your solution which seems much neater!

  if (firstexp != KTIME_MAX) {
- migrator = cpumask_any_but(cpu_online_mask, cpu);
+ migrator = cpumask_nth_andnot(0, cpu_possible_mask,
+       tmigr_unavailable_cpumask);
That's exactly what this negated mask causes: incomprehensible code.

cpumask_clear_cpu(cpu, available_mask);
        ...
migrator = cpumask_first(available_mask);

is too simple and obvious, right?

+ /* Fall back to any online in case all are isolated. */
How can that happen? There is always at least _ONE_ housekeeping,
non-isolated, CPU online, no?

In my understanding it shouldn't, but I'm not sure there's anything
preventing the user from isolating everything via cpuset.
Anyway that's something no one in their mind should do, so I guess I'd
just opt for the cpumask_first (or actually cpumask_any, like before
the change).

No, cpuset is not allowed to take all the CPUs from the cgroup root (housekeeping CPUs). So it can't happen.


+ if (migrator >= nr_cpu_ids)
+ migrator = cpumask_any_but(cpu_online_mask, cpu);
  work_on_cpu(migrator, tmigr_trigger_active, NULL);
  }
  return 0;
 }
-static int tmigr_cpu_online(unsigned int cpu)
+static int tmigr_cpu_available(unsigned int cpu)
 {
- struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+ struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
  /* Check whether CPU data was successfully initialized */
  if (WARN_ON_ONCE(!tmc->tmgroup))
  return -EINVAL;
+ /* Silently ignore online requests if isolated */
This comment makes no sense.

     /* Isolated CPUs are not participating in timer migration */

makes it entirely clear what this is about, no?

That brings me to the general design decision here. Your changelog
explains at great length WHAT the change is doing, but completely
fails
to explain the consequences and the rationale why this is the right
thing to do.

By excluding the isolated CPUs from migration completely, any
'global'
timer, which is armed on such a CPU, has to be expired on that
isolated
CPU. That's fundamentaly different from e.g. RCU isolation.

It might be the right thing to do and harmless, but without a proper
explanation it's a silent side effect of your changes, which leaves
people scratching their heads.
Mmh, essentially the idea is that global timer should not migrate from
housekeeping to isolated cores. I assumed the opposite never occurs (as
global timers /should/ not even start on isolated cores on a properly
isolated system), but you're right, that's not quite true.

Thinking about it now, since global timers /can/ start on isolated
cores, that makes them quite different from offline ones and probably
considering them the same is just not the right thing to do..

I'm going to have a deeper thought about this whole approach, perhaps
something simpler just preventing migration in that one direction would
suffice.

+ if (cpu_is_isolated(cpu))
+ return 0;
  raw_spin_lock_irq(&tmc->lock);
- trace_tmigr_cpu_online(tmc);
+ trace_tmigr_cpu_available(tmc);
  tmc->idle = timer_base_is_idle();
  if (!tmc->idle)
  __tmigr_cpu_activate(tmc);
- tmc->online = true;
+ tmc->available = true;
+ tmc->idle = true;
How so?

+ cpumask_clear_cpu(cpu, tmigr_unavailable_cpumask);
  raw_spin_unlock_irq(&tmc->lock);
  return 0;
 }
+int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
cpumask_var_t is wrong here. 'const struct cpumask *' is what you
want.
You mean in the function argument? That's exactly how it is handled in
workqueue_unbound_exclude_cpumask. I get cpumask_var_t is not
necessarily a pointer, perhaps it's worth changing it there too..
Or am I missing your point?

cpumask_var_t can be interchangeable with "struct cpumask *" as long as the passed-in cpumask is not being modified.

Cheers,
Longman