Re: [RFC PATCH 00/11] printk: safe printing in NMI context

From: Paul E. McKenney
Date: Wed Jun 18 2014 - 12:21:27 EST


On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
> On Jun 18, 2014 4:36 AM, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> wrote:
> >
> > I could easily add an option to RCU to allow people to tell it not to
> > use NMIs to dump the stack.
>
> I don't think it should be an "option".
>
> We should stop using nmi as if it was something "normal". It isn't. Code
> running in nmi context should be special, and should be very very aware
> that it is special. That goes way beyond "don't use printk". We seem to
> have gone way way too far in using nmi context.
>
> So we should get *rid* of code in nmi context rather than complain
> about printk being buggy.

OK, unconditional non-use of NMIs is even easier. ;-)

Something like the following.

Thanx, Paul

------------------------------------------------------------------------

rcu: Don't use NMIs to dump other CPUs' stacks

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks, because the target CPUs then call
printk() from NMI context. This commit therefore replaces RCU's uses of
trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so that the CPU
detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c590e1201c74..777624e1329b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs. This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       (long)rsp->gpnum, (long)rsp->completed, totqlen);
 	if (ndetected == 0)
 		pr_err("INFO: Stall ended before state dump start\n");
-	else if (!trigger_all_cpu_backtrace())
+	else
 		rcu_dump_cpu_stacks(rsp);
 
 	/* Complain about tasks blocking the grace period. */
@@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
 		jiffies - rsp->gp_start,
 		(long)rsp->gpnum, (long)rsp->completed, totqlen);
-	if (!trigger_all_cpu_backtrace())
-		dump_stack();
+	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))
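
For anyone following the thread without the tree open, here is a rough
user-space model of what "the CPU detecting the stall does the stack
dumping" amounts to. It is only a sketch, not the kernel code: it
assumes that stalled CPUs are tracked as a per-node bitmask, it leaves
out all locking, and every model_* name is made up for illustration.

```c
#include <stdio.h>

#define MODEL_NR_CPUS 8	/* hypothetical CPU count for this model */

/*
 * Hypothetical stand-in for a leaf node: a CPU range plus a mask of
 * CPUs that are still holding up the grace period.
 */
struct model_node {
	int grplo;		/* lowest CPU number covered by this node */
	int grphi;		/* highest CPU number covered by this node */
	unsigned long qsmask;	/* bit i set: CPU grplo + i is stalled */
};

/* Records which CPUs were dumped so that a caller can inspect the result. */
static int model_dumped[MODEL_NR_CPUS];

/* Stand-in for a per-CPU stack dump; the model only logs and records. */
static void model_dump_cpu_task(int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return;
	model_dumped[cpu] = 1;
	printf("dumping stack of task running on CPU %d\n", cpu);
}

/*
 * Walk all nodes and dump every CPU whose bit is still set.  All of the
 * work happens on the calling CPU; nothing is sent to the stalled CPUs.
 * Returns the number of stalled CPUs found.
 */
static int model_dump_cpu_stacks(const struct model_node *nodes, int nnodes)
{
	int i, cpu, count = 0;

	for (i = 0; i < nnodes; i++) {
		const struct model_node *rnp = &nodes[i];

		if (rnp->qsmask == 0)
			continue;	/* no stalled CPUs under this node */
		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++) {
			if (rnp->qsmask & (1UL << cpu)) {
				model_dump_cpu_task(rnp->grplo + cpu);
				count++;
			}
		}
	}
	return count;
}
```

Calling model_dump_cpu_stacks() on two nodes covering CPUs 0-3 and 4-7,
with only bits 0 and 2 set in the first node's mask, dumps CPUs 0 and 2
and nothing else. The tradeoff is the one from the commit log: the
dumping CPU sees the remote task's stack from the outside, which is
less accurate, but no code has to run in NMI context on the stalled
CPUs.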
