[RFC] tracepoint: Run tracepoints even after CPU is offline
From: Joel Fernandes (Google)
Date: Sun Aug 05 2018 - 23:44:58 EST
Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
causes a problem for lockdep using tracepoint code. Once a CPU is
offline, tracepoints donot get called, however this causes a big problem
for lockdep probes that need to run so that IRQ annotations are marked
correctly.
An issue is possible where while the CPU is going offline, an interrupt
can come in and then a lockdep assert causes an annotation warning:
[ 106.551354] IRQs not enabled as expected
[ 106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
tick_nohz_idle_enter+0x99/0xb0
[ 106.552964] Modules linked in:
[ 106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W
We need tracepoints to run as late as possible. This commit tries to fix
the issue by removing the cpu_online check in tracepoint code that was
introduced in the mentioned commit, however now we run the risk of
running dereferencing probes that aren't RCU protected, which gives an
RCU warning like so on boot up:
[ 0.030159] x86: Booting SMP configuration:
[ 0.030169] .... node #0, CPUs: #1
[ 0.001000]
[ 0.001000] =============================
[ 0.001000] WARNING: suspicious RCU usage
[ 0.001000] 4.18.0-rc6+ #42 Not tainted
[ 0.001000] -----------------------------
[ 0.001000] ./include/trace/events/timer.h:38 suspicious
rcu_dereference_check() usage!
[ 0.001000]
[ 0.001000] other info that might help us debug this:
[ 0.001000]
[ 0.001000]
[ 0.001000] RCU used illegally from offline CPU!
[ 0.001000] rcu_scheduler_active = 1, debug_locks = 1
[ 0.001000] no locks held by swapper/1/0.
[ 0.001000]
Any ideas on how we can fix this? Basically we need RCU to work here
even after !cpu_online. I thought of just using SRCU for all tracepoints
however that may mean we can't use tracepoints from NMI..
Tries-to-Fix: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
unify their usage")
Reported-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
---
include/linux/tracepoint.h | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d9a084c72541..020885714a0f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -365,19 +365,17 @@ extern void syscall_unregfunc(void);
* "void *__data, proto" as the callback prototype.
*/
#define DECLARE_TRACE_NOARGS(name) \
- __DECLARE_TRACE(name, void, , \
- cpu_online(raw_smp_processor_id()), \
+ __DECLARE_TRACE(name, void, , 1, \
void *__data, __data)
#define DECLARE_TRACE(name, proto, args) \
- __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
- cpu_online(raw_smp_processor_id()), \
+ __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1, \
PARAMS(void *__data, proto), \
PARAMS(__data, args))
#define DECLARE_TRACE_CONDITION(name, proto, args, cond) \
__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
- cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
+ PARAMS(cond), \
PARAMS(void *__data, proto), \
PARAMS(__data, args))
--
2.18.0.597.ga71716f1ad-goog