Re: rcu_preempt detected stalls.

From: Paul E. McKenney
Date: Thu Oct 23 2014 - 16:10:04 EST


On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> >>> Today in "rcu stall while fuzzing" news:
> >>>
> >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>> Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>> Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>> (detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> >>
> >> I complained about RCU stalls a couple of days ago (in a different context)
> >> on -next. I guess whatever is causing them made it into Linus's tree?
> >>
> >> https://lkml.org/lkml/2014/10/11/64
> >
> > And on that one, I must confess that I don't see where the RCU read-side
> > critical section might be.
> >
> > Hmmm... Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > Can you reproduce this with CONFIG_PROVE_RCU=y?
>
> Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> set and nothing else is showing up before/after that.

Indeed it was directed to you. ;-)

Does the following crude diagnostic patch turn up anything?

Thanx, Paul

------------------------------------------------------------------------

softirq: Check for RCU read-side misnesting in softirq handlers

This commit adds checks for RCU read-side misnesting in softirq handlers.
Please note that this works only for CONFIG_TREE_PREEMPT_RCU=y because
the other RCU flavors have no way of knowing how deeply nested they are.

Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 501baa9ac1be..c6b63a4c576d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -257,11 +257,13 @@ restart:
 	while ((softirq_bit = ffs(pending))) {
 		unsigned int vec_nr;
 		int prev_count;
+		int rcu_depth;

 		h += softirq_bit - 1;

 		vec_nr = h - softirq_vec;
 		prev_count = preempt_count();
+		rcu_depth = rcu_preempt_depth();

 		kstat_incr_softirqs_this_cpu(vec_nr);

@@ -274,6 +276,11 @@ restart:
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
+		if (IS_ENABLED(CONFIG_PROVE_RCU) &&
+		    rcu_depth != rcu_preempt_depth())
+			pr_err("huh, entered softirq %u %s %p with RCU nesting %08x, exited with %08x?\n",
+			       vec_nr, softirq_to_name[vec_nr], h->action,
+			       rcu_depth, rcu_preempt_depth());
 		h++;
 		pending >>= softirq_bit;
 	}
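
For anyone who wants to see the idea outside the kernel, here is a rough
user-space sketch of the same before/after comparison. It is only a model:
model_rcu_depth, model_rcu_read_lock(), model_rcu_read_unlock(),
model_handler_t, and run_handlers() are hypothetical names invented for
illustration, not kernel APIs, and a plain integer stands in for whatever
rcu_preempt_depth() reports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-task RCU read-side nesting counter. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Hypothetical handler type, loosely playing the role of h->action. */
typedef void (*model_handler_t)(void);

/* Well-behaved handler: every lock is paired with an unlock. */
static void balanced_handler(void)
{
	model_rcu_read_lock();
	model_rcu_read_unlock();
}

/* Buggy handler: returns without the matching unlock. */
static void leaky_handler(void)
{
	model_rcu_read_lock();
}

/*
 * Run each handler, sampling the nesting depth before and after.
 * Returns the number of handlers that left the depth changed.
 */
static int run_handlers(const model_handler_t *handlers, int n)
{
	int misnested = 0;

	assert(handlers != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		int depth_before = model_rcu_depth;

		handlers[i]();
		if (depth_before != model_rcu_depth) {
			fprintf(stderr,
				"handler %d entered with nesting %d, exited with %d\n",
				i, depth_before, model_rcu_depth);
			misnested++;
			/*
			 * Assumption of this model only: restore the depth so
			 * that later handlers are judged independently.  The
			 * diagnostic patch merely reports the mismatch.
			 */
			model_rcu_depth = depth_before;
		}
	}
	return misnested;
}
```

Passing an array holding balanced_handler, leaky_handler, and
balanced_handler to run_handlers() flags only the middle entry, which is
the kind of pinpointing the pr_err() is meant to provide for a softirq
vector.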
