Re: [PATCH v4 14/20] sched/core: Introduce a simple steal monitor
From: Shrikanth Hegde
Date: Thu Jun 18 2026 - 02:03:18 EST
On 6/18/26 11:02 AM, K Prateek Nayak wrote:
Hello Shrikanth, Yury,
On 6/18/2026 10:14 AM, Shrikanth Hegde wrote:
On 6/18/26 10:00 AM, Yury Norov wrote:
On Wed, Jun 17, 2026 at 11:11:33PM +0530, Shrikanth Hegde wrote:
Start with a simple steal monitor.
It is meant to look at steal time and decide whether to
reduce or increase the number of preferred CPUs.
It has:
- work function to execute the steal time calculations and decision
making periodically.
- low and high thresholds for steal time.
- sampling period to control the frequency of steal time calculations.
- cache the previous decision to avoid oscillations
This monitor is just one implementation out of many possible ones,
right? I don't think it should live in the core scheduler files; it
should be a module.
I agree that such a tight integration with the sched bits might not
be required.
You mean something similar to drivers/cpuidle/? A new drivers/steal_monitor/?
Since steal time is a virtualization concept, somewhere under drivers/virt/
probably makes more sense, unless we need some scheduler-internal API to
implement it, which shouldn't be the case.
All the driver has to do is track steal time (which should be available
via kcpustat_cpu_fetch()) periodically (using a workqueue?) and call
set_cpu_preferred() (which needs to be made available for other use
cases anyway), so it should be possible.
Yes. Seems doable.
Do you think it would still make sense to keep the debugfs knobs in sched?
Since you mentioned that you get an interrupt in an LPAR before a vCPU is
scheduled out due to contention, perhaps this also allows for a way to
add governors and other heuristics along those lines.
No. When a vCPU gets scheduled out, there is no such interrupt. The guest
vCPU doesn't know.