Re: [PATCH] rcu: Fix dynticks_nmi_nesting underflow check in rcu_is_cpu_rrupt_from_idle
From: Paul E. McKenney
Date: Wed Dec 23 2020 - 10:13:17 EST
On Wed, Dec 23, 2020 at 02:09:37PM +0530, Neeraj Upadhyay wrote:
> For the smp_call_function() optimization, where callbacks can run from
> idle context, in commit 806f04e9fd2c ("rcu: Allow for smp_call_function()
> running callbacks from idle"), an additional check is added in
> rcu_is_cpu_rrupt_from_idle(), for dynticks_nmi_nesting value being 0,
> for these smp_call_function() callbacks running from idle loop.
> However, this commit missed updating a preexisting underflow check
> of dynticks_nmi_nesting, which checks for a non zero positive value.
> Fix this warning and while at it, read the counter only once.
>
> Signed-off-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>
> ---
>
> Hi,
>
> I was not able to get this warning, with scftorture.
>
> RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> "RCU dynticks_nmi_nesting counter underflow/zero!");
>
> Not sure if idle loop smp_call_function() optimization is already present
> in mainline?
Now that you mention it, I don't see it. But x86 has a number of idle
functions, and I might be missing it. The "idle" kernel boot parameter
allows some ability to select them, and of course more profound (and maybe
more dangerous) changes can be made by changing select_idle_routine().
> Another thing, which I am not sure of is, maybe lockdep gets disabled
> in the idle loop contexts, where rcu_is_cpu_rrupt_from_idle() is called?
> Was this the original intention, to keep the lockdep based
> RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0
> check separate from idle task context nesting value
> WARN_ON_ONCE(!nesting && !is_idle_task(current)) check?
An easy way to test lockdep is to create a pair of locks, acquire them
in one order then release them both, and finally acquire them in the
other order and then release them both. If lockdep is configured and
enabled, it will complain.
The only reason I used RCU_LOCKDEP_WARN() was that people were complaining
to me about idle-entry overhead back at that time. So without lockdep,
there is zero overhead. Maybe people have become more tolerant of idle
delays, or perhaps they are not so worried about an extra check of a
cache-hot quantity.
And another thing to try is to make the RCU_LOCKDEP_WARN() be WARN_ONCE()
or similar. Can't hurt for testing purposes!
Of course, once Peter Zijlstra comes back from vacation, we should ask him.
And thank you for trying this out!
I am tempted to pull this in as is, given the current logical
inconsistency in the checks. Thoughts?
Thanx, Paul
> Thanks
> Neeraj
>
> kernel/rcu/tree.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bc8b489..c3037cf 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -457,11 +457,10 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> /* Check for counter underflows */
> RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> "RCU dynticks_nesting counter underflow!");
> - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> - "RCU dynticks_nmi_nesting counter underflow/zero!");
> + nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> + RCU_LOCKDEP_WARN(nesting < 0, "RCU dynticks_nmi_nesting counter underflow!");
>
> /* Are we at first interrupt nesting level? */
> - nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> if (nesting > 1)
> return false;
>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>