Re: [RFC 2/2] rcu: Remove ->dynticks_nmi_nesting from struct rcu_dynticks

From: Joel Fernandes
Date: Mon Jun 25 2018 - 15:15:57 EST


Hi Paul,

Thanks a lot for your comments; my replies are inline:

On Mon, Jun 25, 2018 at 10:19:20AM -0700, Paul E. McKenney wrote:
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> When I traced rdtp->dynticks_nesting, I could only find its
> value to be either a 0 or a 1. However looking back at old kernel
> sources, it appears that these can be nested because of so-called
> "half-interrupts". I believe these are basically interrupts
> that cause a transition to usermode due to usermode upcalls
> (usermode helper subsystem). So a nesting situation could be
> something like: 1. Transition from idle to process context which
> makes dynticks_nesting == 1. Next, an interrupt comes in which
> makes a usermode upcall. This usermode call now makes a system
> call causing entry back into process context, which increments
> the dynticks_nesting counter to 2. Such a crazy situation is
> perhaps possible.
>
> The half-interrupts can instead cause ->dynticks_nmi_nesting to either
> fail to return to zero or to go negative, depending on which half of

Actually in the above paragraph I was referring to a "half interrupt" messing
up dynticks_nesting, not dynticks_nmi_nesting. I know that the latter can be
messed up too but I wasn't referring to dynticks_nmi_nesting in this part of
the article.

I was thinking more in terms of the comment in:
https://elixir.bootlin.com/linux/v3.19.8/source/kernel/rcu/rcu.h#L34
/*
 * Process-level increment to ->dynticks_nesting field. This allows for
 * architectures that use half-interrupts and half-exceptions from
 * process context.
 ... */

In my hypothetical example above that you quoted from my notes, I was trying
to reason about how taking a half-interrupt in process context can cause
dynticks_nesting to increase to 2. Thinking some more though, I am not sure
how the above hypothetical example I mentioned can cause this ;) since the
transition to usermode from the half-interrupt should have corrected the
dynticks_nesting counter due to the callchain: rcu_user_enter->rcu_eqs_enter ?

> the interrupt was present. I don't immediately recall the reason for
> allowing nested process-level entry/exit. Might be another place to
> put a WARN_ON_ONCE(), as eliminating this capability would save another
> conditional branch.

Sure, sounds good to me.

>
> Any time the rdtp->dynticks counter's second-lowest bit
> is not set, we are in an EQS, and if it's set, then we are not
> (second lowest because lowest is reserved for something else as
> of v4.18-rc1). This function is not useful to check if we're
> in an EQS from a timer tick though, because it's possible the
> timer tick interrupt entry caused an EQS exit which updated
> the counter. IOW, the "dynticks" counter is not capable of
> checking if we had already exited the EQS before. To check if
> we were in an EQS or not from the timer tick, we instead must
> use dynticks_nesting counter. More on that later. The above
> function is probably just useful to make sure that interrupt
> entry/exit is properly updating the dynticks counter, and also
> to make sure from non-interrupt context that RCU is in an EQS
> (see rcu_gp_fqs function).
>
> You lost me on this one. There is rcu_is_cpu_rrupt_from_idle(), but
> I am not sure what you are trying to achieve here, so I am not sure
> whether this function does what you want.

Sorry about that. Let me try to explain in detail about why I wrote the above
paragraph when talking about rdtp->dynticks.

I was trying to determine how the RCU code determines if the CPU is idle. It
appears from the code that there are 2 ways it does so:

1. By calling rcu_is_cpu_rrupt_from_idle(), which checks the
dynticks_nesting counter. If the counter is 0, then the CPU was idle at the
time of the check. This is how rcu_check_callbacks() knows that the CPU was idle.

2. By checking for evenness of the dynticks counter. If it's even, we were idle
(or perhaps in usermode, but I think that extra inference doesn't hurt). This
is done in rcu_dynticks_curr_cpu_in_eqs().

So basically, there are 2 different counters that seem to serve the same
purpose as far as determining if we're in an idle EQS state goes. Right?

Then I was trying to see why we can't just use method 2 in
rcu_check_callbacks() to determine if the "timer interrupt was taken while the
CPU was idle". rcu_check_callbacks() could simply call
rcu_dynticks_curr_cpu_in_eqs(). I was trying to convince myself why that
wouldn't work.

I concluded that that wouldn't work because the timer interrupt that led to
the rcu_check_callbacks() call would have tainted the dynticks counter,
since it would have called rcu_nmi_enter() during interrupt entry. So
there's no way to know if the CPU was really idle at the time of the
interrupt if we were to rely on rcu_dynticks_curr_cpu_in_eqs() for that. Hence
we would need to rely on method 1 for the "did I take an interrupt while I
was idle" check in rcu_check_callbacks(), which uses the dynticks_nesting
counter for this determination. Does that make sense?
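
To make that reasoning concrete, here is a toy user-space model in C. It is
only a sketch and not kernel code: the toy_* names are made up for
illustration, the dynticks counter is simplified to a plain even/odd value
(ignoring the reserved low bit mentioned earlier), and method 1 is reduced to
the single "dynticks_nesting == 0" test described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state. Field names follow the thread; the layout is assumed. */
struct toy_rdtp {
	long dynticks;             /* even: in EQS, odd: not in EQS (simplified) */
	long dynticks_nesting;     /* process-level nesting, 0 while idle */
	long dynticks_nmi_nesting; /* interrupt/NMI nesting */
};

/* Start out running in process context, i.e. not in an EQS. */
static struct toy_rdtp toy_init(void)
{
	struct toy_rdtp r = { .dynticks = 1, .dynticks_nesting = 1,
			      .dynticks_nmi_nesting = 0 };
	return r;
}

/* Method 2: look only at the dynticks counter. */
static bool toy_curr_cpu_in_eqs(const struct toy_rdtp *r)
{
	return (r->dynticks & 1) == 0;
}

/* Method 1: look at the process-level nesting count. */
static bool toy_rrupt_from_idle(const struct toy_rdtp *r)
{
	return r->dynticks_nesting == 0;
}

/* Idle entry: outermost process-level exit, so the CPU enters an EQS. */
static void toy_idle_enter(struct toy_rdtp *r)
{
	r->dynticks_nesting = 0;
	r->dynticks++;
}

/* Idle exit: the CPU leaves the EQS and is back at process level. */
static void toy_idle_exit(struct toy_rdtp *r)
{
	r->dynticks++;
	r->dynticks_nesting = 1;
}

/* Interrupt entry: if the CPU was in an EQS, entry itself ends the EQS. */
static void toy_irq_enter(struct toy_rdtp *r)
{
	if (toy_curr_cpu_in_eqs(r))
		r->dynticks++;
	r->dynticks_nmi_nesting++;
}

/* Interrupt exit: the outermost exit from an idle CPU re-enters the EQS. */
static void toy_irq_exit(struct toy_rdtp *r)
{
	assert(r->dynticks_nmi_nesting > 0);
	if (--r->dynticks_nmi_nesting == 0 && r->dynticks_nesting == 0)
		r->dynticks++;
}
```

In this model, calling toy_idle_enter() and then toy_irq_enter() leaves
toy_curr_cpu_in_eqs() returning false inside the handler even though the
interrupt arrived while idle, whereas toy_rrupt_from_idle() still returns
true, because interrupt entry never touches dynticks_nesting.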

>
> When dynticks_nesting is decremented to 0 (the outermost
> process-context nesting level exit causes an eqs-entry), the
> dynticks_nmi_nesting is reset to
>
> I think you want "0." at the end of this sentence. Or maybe my browser
> is messing things up.

Yes, the "0." was on the next line, but I moved it back to the previous line so
it's easier to read. Thanks for letting me know.

Thanks!

- Joel