Re: [PATCH v2 11/11] lockdep,trace: Expose tracepoints

From: peterz
Date: Wed Sep 02 2020 - 04:57:11 EST


On Tue, Sep 01, 2020 at 08:51:46PM -0700, Guenter Roeck wrote:

> [ 27.056457] include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!

> [ 27.057006] Hardware name: Generic OMAP3-GP (Flattened Device Tree)
> [ 27.057098] [<c03125b4>] (unwind_backtrace) from [<c030c32c>] (show_stack+0x10/0x14)
> [ 27.057189] [<c030c32c>] (show_stack) from [<c08e7a4c>] (dump_stack+0xd8/0xf8)
> [ 27.057312] [<c08e7a4c>] (dump_stack) from [<c03b04bc>] (lock_acquire+0x4d8/0x4dc)
> [ 27.057464] [<c03b04bc>] (lock_acquire) from [<c12457e8>] (_raw_spin_lock_irqsave+0x58/0x74)
> [ 27.057617] [<c12457e8>] (_raw_spin_lock_irqsave) from [<c0338198>] (pwrdm_lock+0x10/0x18)
> [ 27.057739] [<c0338198>] (pwrdm_lock) from [<c033a214>] (clkdm_deny_idle+0x10/0x24)
> [ 27.057891] [<c033a214>] (clkdm_deny_idle) from [<c0332e10>] (omap3_enter_idle_bm+0xd4/0x1b8)
> [ 27.058044] [<c0332e10>] (omap3_enter_idle_bm) from [<c0f740d4>] (cpuidle_enter_state+0x16c/0x620)

ARM cpuidle is a trainwreck :/

So it looks like we have:

- clkdm_
- pwrdm_
- cpu_pm_
- pm_runtime_

In that approximate order, and then there's the coupled idle muck.
Sometimes the cpuidle core calls cpu_pm_*(), but mostly it's sprinkled
around in the drivers.
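
For illustration, the shape of the problem is roughly this (hypothetical
driver code; the foo_* names are made up, not the actual OMAP call chain):

/* Hypothetical sketch of the failure mode; foo_* names are made up. */
static DEFINE_RAW_SPINLOCK(foo_lock);

static int foo_enter_idle(struct cpuidle_device *dev,
			  struct cpuidle_driver *drv, int index)
{
	unsigned long flags;

	/*
	 * The idle loop already did rcu_idle_enter() before calling
	 * into the driver, so RCU is not watching this CPU anymore.
	 */
	raw_spin_lock_irqsave(&foo_lock, flags);
	/*
	 * lock_acquire() now fires the exposed lock tracepoint, whose
	 * rcu_dereference_check() produces the splat quoted above.
	 */
	raw_spin_unlock_irqrestore(&foo_lock, flags);

	return index;
}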

How about we unconditionally kill tracing when RCU is idle? Yes, this is
a hack, and we really should turn it into a WARN in due time.
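
Concretely, the eventual WARN would just be a variation on the guard in
the patch below (illustrative only):

	if (WARN_ON_ONCE(preempt_count() & RCUIDLE_MASK))
		return;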

The thing is, we're shutting down clock/power domains and god knows
what, the CPU is in a crap state, we'll not take interrupts, but tracing
must happen! Hell no.

Let's make the rule that if you want something traced, you get to pull
it out of the cpuidle driver and stick it in the generic code with a
CPUIDLE_FLAG, before rcu_idle_enter().
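
Roughly like so; purely illustrative, CPUIDLE_FLAG_TRACED and the helper
are made-up names:

/* Illustrative sketch; CPUIDLE_FLAG_TRACED does not exist (yet). */
static void generic_enter_state(struct cpuidle_driver *drv,
				struct cpuidle_device *dev, int index)
{
	struct cpuidle_state *state = &drv->states[index];

	if (state->flags & CPUIDLE_FLAG_TRACED) {
		/*
		 * Traced work (clkdm_/pwrdm_/cpu_pm_ calls) hoisted out
		 * of the driver; RCU is still watching at this point.
		 */
		cpu_pm_enter();
	}

	rcu_idle_enter();
	state->enter(dev, drv, index);	/* no tracing in here */
	rcu_idle_exit();

	if (state->flags & CPUIDLE_FLAG_TRACED)
		cpu_pm_exit();
}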

Totally untested patch below..
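
The new bit simply takes the free space above NMI_MASK:

	RCUIDLE_SHIFT = PREEMPT_BITS + SOFTIRQ_BITS + HARDIRQ_BITS + NMI_BITS
		      = 8 + 8 + 4 + 4 = 24
	RCUIDLE_MASK  = RCUIDLE_OFFSET = 1UL << 24 = 0x01000000

so (preempt_count() & RCUIDLE_MASK) is nonzero exactly between
rcu_dynticks_eqs_enter() and rcu_dynticks_eqs_exit().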

---

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 7d9c1c0e149c..878bac893e41 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -27,17 +27,20 @@
  * SOFTIRQ_MASK:	0x0000ff00
  * HARDIRQ_MASK:	0x000f0000
  * NMI_MASK:		0x00f00000
+ * RCUIDLE_MASK:	0x01000000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
 #define NMI_BITS	4
+#define RCUIDLE_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
 #define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
 #define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)
+#define RCUIDLE_SHIFT	(NMI_SHIFT + NMI_BITS)
 
 #define __IRQ_MASK(x)	((1UL << (x))-1)
 
@@ -45,11 +48,13 @@
 #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
 #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
 #define NMI_MASK	(__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
+#define RCUIDLE_MASK	(__IRQ_MASK(RCUIDLE_BITS) << RCUIDLE_SHIFT)
 
 #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
 #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
 #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
 #define NMI_OFFSET	(1UL << NMI_SHIFT)
+#define RCUIDLE_OFFSET	(1UL << RCUIDLE_SHIFT)
 
 #define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
 
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 598fec9f9dbf..997472c662c4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -164,7 +164,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		void *__data;						\
 		int __maybe_unused __idx = 0;				\
 									\
-		if (!(cond))						\
+		if (!(cond) || (preempt_count() & RCUIDLE_MASK))	\
 			return;						\
 									\
 		/* srcu can't be used from NMI */			\
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..ad9fb4f12c63 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -269,6 +269,8 @@ static noinstr void rcu_dynticks_eqs_enter(void)
 	/* Better not have special action (TLB flush) pending! */
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     (seq & RCU_DYNTICK_CTRL_MASK));
+
+	__preempt_count_add(RCUIDLE_OFFSET);
 }
 
 /*
@@ -281,6 +283,8 @@
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	int seq;
 
+	__preempt_count_sub(RCUIDLE_OFFSET);
+
 	/*
 	 * CPUs seeing atomic_add_return() must see prior idle sojourns,
 	 * and we also must force ordering with the next RCU read-side