Re: [PATCH v5 net 1/3] rcu: add a helper to report consolidated flavor QS

From: Sebastian Andrzej Siewior
Date: Fri Apr 05 2024 - 09:50:06 EST


On 2024-03-22 21:02:02 [-0500], Yan Zhai wrote:
> On Fri, Mar 22, 2024 at 4:31 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Fri, Mar 22, 2024 at 12:24:13PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2024-03-19 13:44:34 [-0700], Yan Zhai wrote:
> > > > + * The macro is not needed when CONFIG_PREEMPT_RT is defined. RT kernels would
> > > > + * have more chance to invoke schedule() calls and provide necessary quiescent
> > > > + * states. As a contrast, calling cond_resched() only won't achieve the same
> > > > + * effect because cond_resched() does not provide RCU-Tasks quiescent states.
> > > > + */
> > >
> > > Paul, so CONFIG_PREEMPTION is affected but CONFIG_PREEMPT_RT is not.
> > > Why does RT have more scheduling points?
> >
> > In RT, isn't BH-disabled code preemptible? But yes, this would not help
> > RCU Tasks.
Yes, it is but why does it matter? This is used in the NAPI thread which
fully preemptible and does cond_resched(). This should be enough.

> By "more chance to invoke schedule()", my thought was that
> cond_resched becomes no op on RT or PREEMPT kernel. So it will not
> call __schedule(SM_PEREEMPT), which clears the NEED_RESCHED flag. On a
It will nop cond_resched(), correct. However once something sends
NEED_RESCHED then the receiver of this flag will __schedule(SM_PEREEMPT)
as soon as possible. That is either because the scheduler sends an IPI
and the CPU will do it in the irq-exit path _or_ the thread does
preempt_enable() (which includes local_bh_enable()) and the counter hits
zero at which point the same context switch happens.

Therefore I don't see a difference between CONFIG_PREEMPT and
CONFIG_PREEMPT_RT.

> normal irq exit like timer, when NEED_RESCHED is on,
> schedule()/__schedule(0) can be called time by time then.

This I can not parse. __schedule(0) means the task gives up on its own
and goes to sleep. This does not happen for the NAPI-thread loop,
kworker loop or any other loop that consumes one work item after the
other and relies on cond_resched() in between.

> __schedule(0) is good for RCU tasks, __schedule(SM_PREEMPT) is not.
Okay and that is why? This means you expect that every thread gives up
on its own which may take some time depending on the workload. This
should not matter.

If I see this right, the only difference is rcu_tasks_classic_qs() and I
didn't figure out yet what it does.

> But I think this code comment does not take into account frequent
> preempt_schedule and irqentry_exit_cond_resched on a PREEMPT kernel.
> When returning to these busy kthreads, irqentry_exit_cond_resched is
> in fact called now, not schedule(). So likely __schedule(PREEMPT) is
> still called frequently, or even more frequently. So the code comment
> looks incorrect on the RT argument part. We probably should remove the
> "IS_ENABLED" condition really. Paul and Sebastian, does this sound
> reasonable to you?

Can you walk me through it? Why is it so important for a task to give up
voluntary? There is something wrong here with how RCU tasks works.
We want to get rid of the sprinkled cond_resched(). This looks like a
another variant of it that might be required in places with no
explanation except it takes too long.

> Yan

Sebastian