Re: [PATCH 0/5] sched: Lazy preemption muck

From: Steven Rostedt
Date: Wed Oct 09 2024 - 16:48:06 EST


On Wed, 09 Oct 2024 22:13:51 +0200
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> > If we think about it as what would happen with the current PREEMPT_NONE,
> > wouldn't need_resched() return true as soon as NEED_RESCHED is set? That
> > means, with PREEMPT_AUTO, it should return true if LAZY_NEED_RESCHED is
> > set, right? That would make it act the same in both cases.
>
> I don't think so. Quite some of these need_resched() things have been
> sprinkled around to address the issues with PREEMPT_NONE.

My point is that the need_resched() that are sprinkled around are checking
if a schedule has been requested or not. From that point alone,
LAZY_NEED_RESCHED is something that requested a schedule.

>
> We need to look at those places and figure out whether they need it when
> LAZY is enabled. There might be a few which want to look at both flags,
> but my expectation is that those are less than 5% of the overall usage.
>
> Peter's choice is the right one. That forces us to look at all of them
> and figure out whether they need to be extended to include the lazy bit
> or not. Those which do not need it can be eliminated when LAZY is in
> effect because that will preempt on the next possible preemption point
> once the non-lazy bit is set in the tick.
>
> Remember, the goal is to eliminate all except LAZY (RT is a different
> scope) and make the kernel behave correctly to the expectation of LAZY
> (somewhere between NONE and VOLUNTARY) and non-LAZY (equivalent to
> FULL).
>
> The reason why this works is that preempt_enable() actually has a
> meaning while it does not with NONE.

Looking around, I see the pattern of checking need_resched() from within a
loop where a spinlock is held. Then after the break of the loop and release
of the spinlock, cond_resched() is checked, and the loop is entered again.

Thus, I guess this is the reason you are saying that it should just check
NEED_RESCHED and not the LAZY variant. Because if we remove that
cond_resched() then it would just re-enter the loop again with the LAZY
being set.

Hmm, but then again...

Perhaps these cond_resched() is proper? That is, the need_resched() /
cond_resched() is not something that is being done for PREEMPT_NONE, but
for preempt/voluntary kernels too. Maybe these cond_resched() should stay?
If we spin in the loop for one more tick, that is actually changing the
behavior of PREEMPT_NONE and PREEMPT_VOLUNTARY, as the need_resched()/cond_resched()
helps with latency. If we just wait for the next tick, these loops (and
there's a lot of them) will all now run for one tick longer than if
PREEMPT_NONE or PREEMPT_VOLUNTARY were set today.


-- Steve