Re: [PATCH v3 15/20] sched/core: Introduce a simple steal monitor

From: Shrikanth Hegde

Date: Tue Jun 09 2026 - 01:55:01 EST




On 6/8/26 9:55 PM, Ilya Leoshkevich wrote:


On 5/14/26 17:21, Shrikanth Hegde wrote:
Start with a simple steal monitor.

It is meant to look at steal time and make the decision to
reduce/increase the preferred CPUs.

It has
- work function to execute the steal time calculations and decision
   making periodically.
- temporary cpumask, which will be used in the work function. This helps
   to avoid cpumask allocation in periodic work function.
- low and high thresholds for steal time.
- sampling period to control the frequency of steal time calculations.
- cache the previous decision to avoid oscillations

Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
---
  include/linux/sched.h | 13 +++++++++++++
  kernel/sched/core.c   | 24 ++++++++++++++++++++++++
  kernel/sched/sched.h  |  3 +++
  3 files changed, 40 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index dcfb57c90850..ee5f19a96118 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h

[...]

@@ -11351,4 +11353,26 @@ void sched_push_current_non_preferred_cpu(struct rq *rq)
                  push_task, this_cpu_ptr(&npc_push_task_work));
      local_irq_restore(flags);
  }
+
+struct steal_monitor_t steal_mon;
+
+void sched_init_steal_monitor(void)
+{
+    INIT_WORK(&steal_mon.work, sched_steal_detection_work);
+    zalloc_cpumask_var(&steal_mon.tmp_mask, GFP_KERNEL);

Error handling is missing.

[...]

I had seen an LLM complain about the same thing. I think there is very little chance of
failure since this happens during boot. But yes.
I can add the relevant checks for tmp_mask and a WARN_ON here. It's not like anything should
crash if one can't enable this steal monitor thing.
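
To illustrate the kind of check meant here, below is a minimal userspace sketch, not the actual patch. It only models the control flow: if the tmp_mask allocation fails, warn and leave the monitor disabled instead of crashing. The names mock_zalloc_cpumask_var(), mock_alloc_fail, sched_init_steal_monitor_model() and the "enabled" field are hypothetical stand-ins. In the kernel the call would be zalloc_cpumask_var(), which returns true on success, and the warning would come from WARN_ON().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's cpumask_var_t. */
typedef unsigned long *cpumask_var_t;

struct steal_monitor_t {
	cpumask_var_t tmp_mask;
	bool enabled;	/* hypothetical flag, not part of the posted patch */
};

static struct steal_monitor_t steal_mon;

/* Test hook: when true, the mock allocator reports failure. */
static bool mock_alloc_fail;

/*
 * Mock of zalloc_cpumask_var(): like the kernel helper, it returns
 * true on success and false if the allocation failed.
 */
static bool mock_zalloc_cpumask_var(cpumask_var_t *mask)
{
	if (mock_alloc_fail) {
		*mask = NULL;
		return false;
	}
	*mask = calloc(1, sizeof(unsigned long));
	return *mask != NULL;
}

/*
 * Model of the init path with error handling added.
 * Returns 0 on success, -1 if the monitor is left disabled.
 */
static int sched_init_steal_monitor_model(void)
{
	steal_mon.enabled = false;

	if (!mock_zalloc_cpumask_var(&steal_mon.tmp_mask)) {
		/* In the kernel this would be a WARN_ON(); nothing crashes. */
		fprintf(stderr, "steal monitor: tmp_mask allocation failed, monitor disabled\n");
		return -1;
	}

	steal_mon.enabled = true;
	return 0;
}
```

With this shape, the periodic work function would only need to check the (assumed) enabled flag before touching tmp_mask.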


PS:
We don't have similar error handling in other paths for the zalloc_cpumask_var case either.
Many are at init time, so again failure is very unlikely. One can add BUG_ON() for those,
but that's a different story.