Re: [RFC PATCH 00/11] rv: Add scheduler specification monitors

From: Gabriele Monaco
Date: Fri Feb 07 2025 - 06:36:39 EST




On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote:
> Hi Gabriele,
>
> On 06/02/25 09:09, Gabriele Monaco wrote:
> > This patchset starts including adapted scheduler specifications
> > from Daniel's task model [1].
>
> Thanks a lot for working on this. Apart from being cool stuff per-se,
> it means a lot personally to see Daniel's work continuing to be
> developed.
>
> > As the model is fairly complicated, it is split in several
> > generators and specifications. The tool used to create the model
> > can output a unified model, but that would be hardly readable (9k
> > states).
> >
> > RV allows monitors to run and react concurrently. Running the
> > cumulative model is equivalent to running single components using
> > the same reactors, with the advantage that it's easier to point out
> > which specification failed in case of error.
> >
> > We allow this by introducing nested monitors, in short, the sysfs
> > monitor folder will contain a monitor named sched, which is nothing
> > but an empty container for other monitors. Controlling the sched
> > monitor (enable, disable, set reactors) controls all nested
> > monitors.
> >
> > The task model proposed by Daniel includes 12 generators and 33
> > specifications. The generators are good for documentation but are
> > usually implied in some specifications.
> > Not all monitors work out of the box, mainly because of those
> > reasons:
> > * need to distinguish if preempt disable leads to schedule
> > * need to distinguish if irq disable comes from an actual irq
> > * assumptions not always true on SMP
> >
> > The original task model was designed for PREEMPT_RT and this
> > patchset is only tested on an upstream kernel with full preemption
> > enabled.
>
> I played with your additions a bit and I was able to enable/disable
> monitors, switch reactors, etc., w/o noticing any issue.
>

Thanks for trying it out!

> I wonder if you also had ways to test that the monitors actually
> react properly in case of erroneous conditions (so that we can see a
> reactor actually react :).
>

Well, in my understanding, reactors should fire if there is a problem
either in the kernel or in the model logic.
While trying things out, I had more than a few models failing, and I
excluded them from this patchset because they are not stable.

Ideally you shouldn't be seeing errors using those monitors, unless you
(un)intentionally break something in the kernel.

That said, the task switch while scheduling (tss) monitor requires a
context switch whenever we reach the scheduler.
Daniel modified the sched_switch tracepoint to fire also when
prev == next (that is, when no switch actually happens). I assume the
tss specification is partly why that was necessary.
During my tests I didn't apply that change, yet I never saw the monitor
fail.

If you manage to call __schedule while the next picked task is the same
as the currently running one, you should see an error and a reactor
firing.
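
To make the failure condition concrete, here is a minimal userspace
sketch in C of a deterministic automaton that rejects a pass through
the scheduler with no switch. The state names, event names and
tss_first_violation() are hypothetical, made up for illustration: this
is neither the automaton shipped in the patchset nor the RV kernel API,
only my reading of the description above.

```c
#include <stddef.h>

/* Hypothetical names: NOT the states/events used by the real monitor. */
enum tss_state { TSS_THREAD, TSS_SCHEDULING, TSS_SWITCHED, TSS_STATE_MAX };
enum tss_event { EV_SCHEDULE_ENTRY, EV_SCHED_SWITCH, EV_SCHEDULE_EXIT,
		 TSS_EVENT_MAX };

#define TSS_INVALID (-1)

/*
 * Transition table, indexed by [current state][event].
 * Assumption: once the scheduler is entered, exactly one sched_switch
 * must be observed before leaving it. Anything else is a violation.
 */
static const int tss_next[TSS_STATE_MAX][TSS_EVENT_MAX] = {
	[TSS_THREAD] = {
		[EV_SCHEDULE_ENTRY] = TSS_SCHEDULING,
		[EV_SCHED_SWITCH]   = TSS_INVALID,
		[EV_SCHEDULE_EXIT]  = TSS_INVALID,
	},
	[TSS_SCHEDULING] = {
		[EV_SCHEDULE_ENTRY] = TSS_INVALID,
		[EV_SCHED_SWITCH]   = TSS_SWITCHED,
		[EV_SCHEDULE_EXIT]  = TSS_INVALID, /* left without a switch */
	},
	[TSS_SWITCHED] = {
		[EV_SCHEDULE_ENTRY] = TSS_INVALID,
		[EV_SCHED_SWITCH]   = TSS_INVALID,
		[EV_SCHEDULE_EXIT]  = TSS_THREAD,
	},
};

/*
 * Replay a trace from the initial state. Returns the index of the first
 * event the automaton rejects (where a reactor would fire), or -1 if
 * the whole trace is accepted.
 */
static int tss_first_violation(const enum tss_event *trace, size_t len)
{
	int state = TSS_THREAD;

	for (size_t i = 0; i < len; i++) {
		int next = tss_next[state][trace[i]];

		if (next == TSS_INVALID)
			return (int)i;
		state = next;
	}
	return -1;
}
```

With this table, the trace entry/switch/exit is accepted, while
entry/exit is rejected at the exit event, which corresponds to the
prev == next case if the tracepoint does not fire for it.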

Since I couldn't reproduce the above case, I ignored it for the current
RFC. However, if it can happen in practice, we should perhaps add
another event describing this fake switch to prevent the monitor from
failing.

Thanks,
Gabriele