Re: [PATCH v4 14/20] sched/core: Introduce a simple steal monitor

From: Yury Norov

Date: Thu Jun 18 2026 - 03:16:43 EST


On Thu, Jun 18, 2026 at 12:15:16PM +0530, Shrikanth Hegde wrote:
>
>
>
> Hi Yury.
>
> On 6/18/26 12:09 PM, Yury Norov wrote:
> > On Thu, Jun 18, 2026 at 11:31:17AM +0530, Shrikanth Hegde wrote:
> > >
> > >
> > > On 6/18/26 11:02 AM, K Prateek Nayak wrote:
> > > > Hello Shrikanth, Yury,
> > > >
> > > > On 6/18/2026 10:14 AM, Shrikanth Hegde wrote:
> > > > > On 6/18/26 10:00 AM, Yury Norov wrote:
> > > > > > On Wed, Jun 17, 2026 at 11:11:33PM +0530, Shrikanth Hegde wrote:
> > > > > > > Start with a simple steal monitor.
> > > > > > >
> > > > > > > It is meant to look at steal time and make the decision to
> > > > > > > reduce/increase the preferred CPUs.
> > > > > > >
> > > > > > > It has
> > > > > > > - work function to execute the steal time calculations and decision
> > > > > > >    making periodically.
> > > > > > > - low and high thresholds for steal time.
> > > > > > > - sampling period to control the frequency of steal time calculations.
> > > > > > > - cache the previous decision to avoid oscillations
> > > > > >
> > > > > > This monitor is the one implementation out of quite many possible,
> > > > > > right? I don't think it should live in the core scheduler files, it
> > > > > > should be a module.
> > > >
> > > > I agree that this tight of an integration with the sched bits might
> > > > not be required.
> > > >
> > > > >
> > > > > You mean similar to drivers/cpuidle/? a new one drivers/steal_monitor/ ?
> > > >
> > > > Since steal time is a virtualization concept, somewhere in drivers/virt/
> > > > probably makes more sense unless we need some scheduler internal API to
> > > > implement it which shouldn't be the case.
> > > >
> > > > All the driver has to do is track steal-time (which should be available
> > > > via kcpustat_cpu_fetch()) periodically (using a workqueue?) and should
> > > > do set_cpu_preferred() (which needs to be made available for other use
> > > > cases anyways) so it should be possible.
> > >
> > > Yes. Seems like doable.
> > >
> > > Do you think it would make sense to keep the debugfs in sched still?
> >
> > The enable/disable part will be replaced with insmod/rmmod. The
> > statistics part - IDK. It is nice to have all stats at the same
> > place. On the other hand, without the driver loaded it would
> > always read zeroes. It anyways is just a single line in sched/core.c,
> > not a big deal.
> >
>
> I was asking about these debugfs knobs.
>
> steal_monitor/high_threshold:500
> steal_monitor/low_threshold:200
> steal_monitor/sampling_period:1000

Those would be the driver defaults, exposed as module parameters. If
you want to override one at load time:

insmod steal_monitor.ko high_threshold=400
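
To make the role of the two thresholds concrete, here is a rough
user-space sketch of how they could gate the decision. It is not the
code from the patch: the names (steal_cfg, steal_action, steal_decide)
are hypothetical, the unit of the steal value is left unspecified, and
the mapping "above high_threshold means reduce the preferred CPUs,
below low_threshold means increase them" is my assumption from the
changelog. The band between the two thresholds is where nothing
changes.

```c
/* Hypothetical names for illustration; none of this is from the patch. */
enum steal_action {
	STEAL_KEEP,	/* steal is within [low, high]: leave preferred CPUs alone */
	STEAL_REDUCE,	/* steal above high_threshold: assume we shrink preferred CPUs */
	STEAL_INCREASE,	/* steal below low_threshold: assume we grow preferred CPUs */
};

struct steal_cfg {
	unsigned int low_threshold;	/* same (unspecified) unit as the sample */
	unsigned int high_threshold;
};

/*
 * Map one steal-time sample to an action. The dead band between the two
 * thresholds is what keeps a sample hovering around a single cut-off from
 * flipping the decision on every sampling period.
 */
static enum steal_action steal_decide(const struct steal_cfg *cfg,
				      unsigned int steal)
{
	if (steal > cfg->high_threshold)
		return STEAL_REDUCE;
	if (steal < cfg->low_threshold)
		return STEAL_INCREASE;
	return STEAL_KEEP;
}
```

With the defaults quoted above (low 200, high 500), a sample of 600
would ask for a reduction, 100 for an increase, and anything from 200
to 500 inclusive would leave things as they are.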