[PATCH v2 17/17] sched/debug: Add debug knobs for steal monitor
From: Shrikanth Hegde
Date: Tue Apr 07 2026 - 15:28:25 EST
Add three debug knobs:
steal_mon_period - sampling period in milliseconds.
steal_mon_low - lower threshold value (specify percentage * 100)
steal_mon_high - higher threshold value (specify percentage * 100)
Refer to Documentation/scheduler/sched-debug.rst for detailed info.
Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
---
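For reviewers, the threshold behaviour described in the documentation hunk
below can be sketched roughly as follows. This is only an illustration, not
code from this series: steal_decide() and the enum names are hypothetical,
and treating a value exactly equal to a threshold as "no action" is an
assumption, since the text only says "below" and "higher than".

```c
/* Hypothetical names for illustration; none of these exist in the series. */
enum steal_action {
	STEAL_SHRINK = -1,	/* reduce preferred CPUs by 1 core */
	STEAL_HOLD   = 0,	/* leave preferred CPUs unchanged */
	STEAL_GROW   = 1,	/* increase preferred CPUs by 1 core */
};

/*
 * All three arguments are in units of percentage * 100,
 * e.g. 250 means 2.5% steal.
 */
static enum steal_action steal_decide(unsigned int steal,
				      unsigned int low,
				      unsigned int high)
{
	if (steal < low)
		return STEAL_GROW;	/* below the low threshold */
	if (steal > high)
		return STEAL_SHRINK;	/* above the high threshold */
	return STEAL_HOLD;		/* between thresholds: no action */
}
```

With the defaults (low = 200, high = 500), a sampled steal of 1% grows the
preferred set, 3% leaves it alone, and 7% shrinks it.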
Documentation/scheduler/sched-debug.rst | 27 +++++++++++++++++++++++++
kernel/sched/debug.c | 6 ++++++
kernel/sched/sched.h | 2 ++
3 files changed, 35 insertions(+)
diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
index b5a92a39eccd..288cd2c63224 100644
--- a/Documentation/scheduler/sched-debug.rst
+++ b/Documentation/scheduler/sched-debug.rst
@@ -52,3 +52,30 @@ rate for each task.
``scan_size_mb`` is how many megabytes worth of pages are scanned for
a given scan.
+
+==================================
+Tunables for generic steal monitor
+==================================
+
+The generic steal time monitor can be enabled by selecting STEAL_MONITOR in
+sched features. It is disabled by default.
+
+steal_mon_period - sampling period in milliseconds.
+How often steal values are sampled. This controls how quickly the scheduler
+reacts to changes in steal time.
+Default value is 1000 milliseconds.
+
+steal_mon_low - lower threshold value in percentage * 100
+Steal values below this threshold are treated as nil/no steal.
+When the scheduler sees steal time below this value, it tries to increase
+the preferred CPUs by 1 core. A value of zero causes too much oscillation.
+Default value is 200, i.e. 2% steal is considered the low threshold.
+
+steal_mon_high - higher threshold value in percentage * 100
+Steal values above this threshold are treated as high steal.
+When the scheduler sees steal time higher than this value, it reduces
+the preferred CPUs by 1 core.
+Default value is 500, i.e. 5% steal is considered the high threshold.
+
+Note: When the steal value is between the low and high thresholds, the
+scheduler takes no action. This avoids excessive oscillation.
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 482c86a0ff80..9a6c1ada2cec 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -612,6 +612,12 @@ static __init int sched_init_debug(void)
debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
+#ifdef CONFIG_PARAVIRT
+ debugfs_create_u32("steal_mon_low", 0644, debugfs_sched, &steal_mon.low_threshold);
+ debugfs_create_u32("steal_mon_high", 0644, debugfs_sched, &steal_mon.high_threshold);
+ debugfs_create_u32("steal_mon_period", 0644, debugfs_sched, &steal_mon.sampling_period_ms);
+#endif
+
sched_domains_mutex_lock();
update_sched_domain_debugfs();
sched_domains_mutex_unlock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 337357e48a83..850d944b22f4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4149,6 +4149,8 @@ struct steal_monitor_t {
unsigned int sampling_period_ms;
};
+extern struct steal_monitor_t steal_mon;
+
static inline bool task_can_run_on_preferred_cpu(struct task_struct *p)
{
return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
--
2.47.3