Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

From: Joel Fernandes
Date: Tue Aug 27 2019 - 09:08:58 EST


On Tue, Aug 27, 2019 at 11:23:33AM +0200, Sebastian Andrzej Siewior wrote:
[snip]
> > However, if this was instead an rcu_read_lock() critical section within
> > a PREEMPT=y kernel, then if a schedule() occured within stop_one_task(),
> > RCU would consider that critical section to be preempted. This means
> > that any RCU grace period that is blocked by this RCU read-side critical
> > section would remain blocked until stop_one_cpu() resumed, returned,
> > and so on until the matching rcu_read_unlock() was reached. In other
> > words, RCU would consider that RCU read-side critical section to span
> > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
>
> Isn't that my example from above and what we do in RT? My understanding
> is that this is the reason why we need BOOST on RT otherwise the RCU
> critical section could remain blocked for some time.

Not just for boost, it is needed to block the grace period itself on
PREEMPT=y. On PREEMPT=y, if rcu_note_context_switch() happens in middle of a
rcu_read_lock() reader section, then the task is added to a blocked list
(rcu_preempt_ctxt_queue). Then just after that, the CPU reports a QS state
(rcu_qs()) as you can see in the PREEMPT=y implementation of
rcu_note_context_switch(). Even though the CPU has reported a QS, the grace
period will not end because the preempted (or block as could be in -rt) task
is still blocking the grace period. This is fundamental to the function of
Preemptible-RCU where there is the concept of tasks blocking a grace period,
not just CPUs.

I think what Paul is trying to explain AIUI (Paul please let me know if I
missed something):

(1) Anyone calling rcu_note_context_switch() and expecting it to respect
RCU-readers that are readers as a result of interrupt disabled regions, have
incorrect expectations. So calling rcu_note_context_switch() has to be done
carefully.

(2) Disabling interrupts is "generally" implied as an RCU-sched flavor
reader. However, invoking rcu_note_context_switch() from a disabled interrupt
region is *required* for rcu_note_context_switch() to work correctly.

(3) On PREEMPT=y kernels, invoking rcu_note_context_switch() from an
interrupt disabled region does not mean that that the task will be added to a
blocked list (unless it is also in an RCU-preempt reader) so
rcu_note_context_switch() may immediately report a quiescent state and
nothing blockings the grace period.
So callers of rcu_note_context_switch() must be aware of this behavior.

(4) On PREEMPT=n, unlike PREEMPT=y, there is no blocked list handling and so
nothing will block the grace period once rcu_note_context_switch() is called.
So any path calling rcu_note_context_switch() on a PREEMPT=n kernel, in the
middle of something that is expected to be an RCU reader would be really bad
from an RCU view point.

Probably, we should add this all to documentation somewhere.

thanks!

- Joel


> > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > would split even an rcu_read_lock() critical section. Which is why I
> > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > complaints in that case.
>
> sleeping_lock_inc() +dec() is only RT specific. It is part of RT's
> spin_lock() implementation and used by RCU (rcu_note_context_switch())
> to not complain if invoked within a critical section.
>
> > Does that help, or am I missing the point?
> >
> > Thanx, Paul
> Sebastian