Re: [PATCH 0/2] sched_ext: Add a core event and update scx schedulers

From: Andrea Righi
Date: Fri Feb 07 2025 - 16:46:03 EST


On Fri, Feb 07, 2025 at 11:38:31AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 07, 2025 at 07:24:08AM +0100, Andrea Righi wrote:
> > On Fri, Feb 07, 2025 at 12:13:36PM +0900, Changwoo Min wrote:
> > > This patchset introduces a new event, SCX_EV_ENQ_SLICE_DFL, and updates
> > > two scx schedulers -- scx_qmap and scx_central -- to print out the new
> > > event.
> > >
> > > > SCX_EV_ENQ_SLICE_DFL counts how many times a task's time slice is set
> > > to the default value (SCX_SLICE_DFL) by the sched_ext core in the enqueue
> > > and pick_next paths.
> > >
> > > Scheduling a task with SCX_SLICE_DFL unintentionally would be a source
> > > of latency spikes because SCX_SLICE_DFL is relatively long (20 msec).
> > > > Thus, a soaring SCX_EV_ENQ_SLICE_DFL value would be a sign of BPF
> > > > scheduler bugs that cause latency spikes.
> >
> > Not directly related to this patch set, but as a general thought: would it
> > be useful to introduce ops->slice_ms (in sched_ext_ops) to override
> > SCX_SLICE_DFL?
> >
> > With that, schedulers that care about latency could set a smaller default
> > time slice to prevent potential spikes caused by the implicit use of
> > SCX_SLICE_DFL.
> >
> > Opinions?
>
> I'm not sure. BPF schedulers should be able to avoid getting the default
> slice. Hopefully, with the added visibility, this should be easier now. I'm
> not sure how much overriding the default value in ops helps in terms of
> control. It's a very half-way measure. Instead, how about we add a tracepoint
> to scx_add_event() so that folks who want backtraces of specific events can
> get them easily? That would make it easier to debug where these counts
> are coming from. Let's just make it easier to avoid these events.

Yeah, that's a valid point. The implicit SCX_SLICE_DFL should be seen as a
countermeasure for unhandled situations. Instead of tuning the
countermeasure itself, we should try to prevent the situations that trigger
it, if they prove to be problematic. And I like the idea of having a way to
backtrace specific events.
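
To make the idea concrete, here is a rough user-space sketch of an event
counter with a hook that reports where each count came from. It is only a
model, not kernel code: every model_* / MODEL_* name is hypothetical, the
hook merely stands in for a tracepoint, and the "slice left at zero" trigger
is an assumption made for illustration. The 20 msec default is the value
mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: none of these names are the sched_ext API. */
#define MODEL_SLICE_DFL_NS (20ULL * 1000 * 1000)	/* 20 msec default slice */

enum model_event { MODEL_EV_ENQ_SLICE_DFL, MODEL_EV_NR };

struct model_task {
	uint64_t slice_ns;	/* 0 means the scheduler never set a slice (assumption) */
};

/* Hook type standing in for a tracepoint attached to the event counter. */
typedef void (*model_event_hook)(enum model_event ev, const char *site);

uint64_t model_counts[MODEL_EV_NR];
model_event_hook model_hook;		/* NULL means nobody is tracing */
const char *model_last_site;		/* filled in by model_record_site() */

/* Example hook: remember which call site produced the last event. */
void model_record_site(enum model_event ev, const char *site)
{
	(void)ev;
	model_last_site = site;
}

/* Count the event and, if a hook is attached, report the call site. */
void model_add_event(enum model_event ev, const char *site)
{
	model_counts[ev]++;
	if (model_hook)
		model_hook(ev, site);
}

/* Fallback path: apply the default slice only when none was set. */
void model_enqueue(struct model_task *t, const char *site)
{
	assert(t != NULL);
	if (t->slice_ns == 0) {
		t->slice_ns = MODEL_SLICE_DFL_NS;
		model_add_event(MODEL_EV_ENQ_SLICE_DFL, site);
	}
}
```

With model_hook set to model_record_site, a task enqueued with slice_ns == 0
bumps the counter and records its call site, while a task that already has
an explicit slice passes through untouched. That is the kind of visibility
that would let us find and fix the unhandled paths instead of tuning the
fallback value.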

-Andrea