Re: [PATCH tip/core/rcu 2/2] rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()

From: Peter Zijlstra
Date: Tue Apr 02 2019 - 03:10:27 EST


On Mon, Apr 01, 2019 at 10:22:57AM -0700, Paul E. McKenney wrote:
> > > The initial solution to this problem was to use set_tsk_need_resched() and
> > > set_preempt_need_resched() to force a future context switch, which allows
> > > rcu_preempt_note_context_switch() to report the deferred quiescent state
> > > to RCU's core processing. Unfortunately for expedited grace periods,
> > > there can be a significant delay between the call for a context switch
> > > and the actual context switch.
> >
> > This is all PREEMPT=y kernels, right? Where is the latency coming from?
> > Because PREEMPT=y _should_ react quite quickly.
>
> Yes, PREEMPT=y. It happens like this:
>
> 1. rcu_read_lock() with everything enabled.
>
> 2. Preemption then resumption.
>
> 3. local_irq_disable().
>
> 4. rcu_read_unlock().
>
> 5. local_irq_enable().
>
> From what I know, the scheduler doesn't see anything until the next
> interrupt, local_bh_enable(), return to userspace, etc. Because this
> is PREEMPT=y, preempt_enable() and cond_resched() do nothing. So
> it could be some time (milliseconds, depending on HZ, NO_HZ_FULL, and
> so on) until the scheduler responds. With NO_HZ_FULL, last I knew,
> the delay can be extremely long.
>
> Or am I missing something that gets the scheduler on the job faster?

Oh urgh, yah. So normally we only twiddle with the need_resched state:

- while under preempt_disable(), such that preempt_enable() will reschedule
- from interrupt context, such that interrupt return will reschedule

But the usage here 'violates' those rules, so there is an unspecified
latency between setting the state and it being observed, though I would
think no longer than 1 tick.
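To make the two rules and the 'violating' case concrete, here is a small userspace C model. It is an illustration only, not kernel code: every name in it (struct cpu_model, preemption_point(), the m_* helpers, the scenario_* functions) is hypothetical. It assumes, as described earlier in the thread, that local_irq_enable() is not itself a preemption point, and it collapses need_resched into one boolean instead of the real TIF flag plus preempt-count folding.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU; none of these names are kernel APIs. */
struct cpu_model {
    int preempt_count;  /* > 0 while preemption is disabled */
    bool irqs_enabled;
    bool need_resched;  /* single flag standing in for the real state */
    int reschedules;    /* times schedule() would have been called */
};

/* A preemption point acts on need_resched only if preemption is allowed,
 * i.e. preempt_count is zero and interrupts are enabled. */
static void preemption_point(struct cpu_model *c)
{
    if (c->need_resched && c->preempt_count == 0 && c->irqs_enabled) {
        c->need_resched = false;
        c->reschedules++;
    }
}

/* Rule 1: preempt_enable() is a preemption point when the count hits 0. */
static void m_preempt_disable(struct cpu_model *c) { c->preempt_count++; }
static void m_preempt_enable(struct cpu_model *c)
{
    if (--c->preempt_count == 0)
        preemption_point(c);
}

/* Rule 2: an interrupt handler sets need_resched, and the interrupt-return
 * path is a preemption point. Masked interrupts are not modeled. */
static void m_interrupt_sets_resched(struct cpu_model *c)
{
    if (!c->irqs_enabled)
        return;
    c->need_resched = true; /* handler runs in interrupt context */
    preemption_point(c);    /* return from interrupt */
}

/* Toggling the interrupt mask is not a preemption point in this model. */
static void m_local_irq_disable(struct cpu_model *c) { c->irqs_enabled = false; }
static void m_local_irq_enable(struct cpu_model *c) { c->irqs_enabled = true; }

/* The problematic sequence: need_resched is set with irqs off, outside any
 * preempt_disable() section and outside interrupt context. */
static int scenario_irqs_off_unlock(void)
{
    struct cpu_model c = { .irqs_enabled = true };
    m_local_irq_disable(&c);
    c.need_resched = true; /* stand-in for set_tsk_need_resched() etc. */
    m_local_irq_enable(&c);
    return c.reschedules;  /* request still pending */
}

static int scenario_rule1(void)
{
    struct cpu_model c = { .irqs_enabled = true };
    m_preempt_disable(&c);
    c.need_resched = true;
    m_preempt_enable(&c);
    return c.reschedules;
}

static int scenario_rule2(void)
{
    struct cpu_model c = { .irqs_enabled = true };
    m_interrupt_sets_resched(&c);
    return c.reschedules;
}
```

In this model scenario_rule1() and scenario_rule2() each return 1, while scenario_irqs_off_unlock() returns 0 and leaves need_resched set. That leftover flag is the window Paul describes: the request waits for whatever preemption point happens to come next.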

I don't think we can go NOHZ with need_resched set, because the moment
we hit the idle loop with that set, we _will_ reschedule.
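The idle-loop argument can be sketched the same way. The model below is loosely patterned on the shape of the idle loop (the tick can only be stopped inside a `while (!need_resched())` loop), but struct idle_model, model_do_idle() and the halts_until_wakeup parameter are hypothetical and do not reproduce the real kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of the idle loop; names are illustrative only. */
struct idle_model {
    bool need_resched;
    bool tick_stopped;
    bool ever_stopped_tick; /* did we ever go NOHZ during this idle period? */
    int halts;              /* low-power waits entered */
    int reschedules;
};

/* The tick may be stopped only inside the loop, and the loop body never runs
 * if need_resched is already set on entry. 'halts_until_wakeup' models some
 * later interrupt that eventually sets need_resched. */
static void model_do_idle(struct idle_model *m, int halts_until_wakeup)
{
    while (!m->need_resched) {
        m->tick_stopped = true; /* NOHZ: stop the tick before halting */
        m->ever_stopped_tick = true;
        m->halts++;             /* stand-in for the low-power wait */
        if (m->halts >= halts_until_wakeup)
            m->need_resched = true;
    }
    m->tick_stopped = false;    /* tick restarted on the way out */
    m->need_resched = false;    /* schedule() consumes the request */
    m->reschedules++;
}
```

Entering model_do_idle() with need_resched already true reschedules immediately with zero halts and without ever stopping the tick, which is why a set need_resched cannot by itself leave a CPU sitting in NOHZ idle.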

So in that respect the irq_work suggestion I made would fix things
properly.
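A minimal sketch of why an irq_work helps, extending the same kind of userspace model. It assumes the irq_work is raised as a self-interrupt that stays pending while interrupts are masked and fires as soon as they are re-enabled (true where the architecture provides a self-IPI; otherwise irq_work runs from the next tick). All names here, including request_deferred_resched(), are hypothetical and are not the actual RCU or irq_work code.

```c
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
struct cpu_model {
    int preempt_count;
    bool irqs_enabled;
    bool need_resched;
    bool self_irq_pending; /* stand-in for a raised irq_work self-interrupt */
    int reschedules;
};

static void preemption_point(struct cpu_model *c)
{
    if (c->need_resched && c->preempt_count == 0 && c->irqs_enabled) {
        c->need_resched = false;
        c->reschedules++;
    }
}

/* A pending self-interrupt is delivered once interrupts are enabled; its
 * return path is a preemption point. */
static void deliver_pending_irq(struct cpu_model *c)
{
    if (c->self_irq_pending && c->irqs_enabled) {
        c->self_irq_pending = false;
        /* the irq_work handler would run here, in interrupt context */
        preemption_point(c); /* return from interrupt */
    }
}

static void m_local_irq_disable(struct cpu_model *c) { c->irqs_enabled = false; }

/* Unmasking lets the pending interrupt fire immediately. */
static void m_local_irq_enable(struct cpu_model *c)
{
    c->irqs_enabled = true;
    deliver_pending_irq(c);
}

/* Hypothetical deferred request: set need_resched and raise the self-IRQ. */
static void request_deferred_resched(struct cpu_model *c)
{
    c->need_resched = true;
    c->self_irq_pending = true;
}

/* Same irqs-off sequence as before, but with the self-interrupt raised. */
static void scenario_irq_work_fix(int *before_enable, int *after_enable)
{
    struct cpu_model c = { .irqs_enabled = true };
    m_local_irq_disable(&c);
    request_deferred_resched(&c); /* e.g. from rcu_read_unlock() */
    *before_enable = c.reschedules;
    m_local_irq_enable(&c);       /* IRQ fires, returns, reschedules */
    *after_enable = c.reschedules;
}
```

Here nothing happens while interrupts are masked, but the reschedule occurs at local_irq_enable() itself rather than at some later, unbounded point, because the request now arrives through the interrupt-return path (rule 2).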