Re: [PATCH v12 3/3] tracing: Centralize preemptirq tracepoints and unify their usage
From: Joel Fernandes
Date: Tue Aug 07 2018 - 23:44:29 EST
Hi Steve,
On Tue, Aug 7, 2018 at 7:28 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Tue, 7 Aug 2018 19:13:32 -0700
> Joel Fernandes <joelaf@xxxxxxxxxx> wrote:
>> >
>> >> From 6986af946ceb04fc9ddc6d5b45fc559b6807e465 Mon Sep 17 00:00:00 2001
>> >> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
>> >> Date: Sun, 5 Aug 2018 20:17:41 -0700
>> >> Subject: [PATCH] tracepoint: Run tracepoints even after CPU is offline
>> >>
>> >> Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
>> >> causes a problem for lockdep using tracepoint code. Once a CPU is
>> >> offline, tracepoints donot get called, however this causes a big problem
>> >> for lockdep probes that need to run so that IRQ annotations are marked
>> >> correctly.
>> >>
>> >> A race is possible where while the CPU is going offline, an interrupt
>> >> can come in and then a lockdep assert causes an annotation warning:
>> >>
>> >> [ 106.551354] IRQs not enabled as expected
>> >> [ 106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
>> >> tick_nohz_idle_enter+0x99/0xb0
>> >> [ 106.552964] Modules linked in:
>> >> [ 106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W
>> >>
>> >> We need tracepoints to run as late as possible. This commit fixes the
>> >> issue by removing the cpu_online check in tracepoint code that was
>> >> introduced in the mentioned commit, however we now switch to using SRCU
>> >> for all tracepoints and special handle calling tracepoints from NMI so
>> >> that we don't run into issues that result from using sched-RCU when the
>> >> CPUs are marked to be offline.
>> >>
>> >> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
>> >> unify their usage")
>> >> Reported-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
>> >> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>> >
>> >
>> > The above change log doesn't look like it matches the NMI patch.
>> >
>> > Can you resend with just the NMI changes? I already handled the cpu
>> > offline ones.
>>
>> Ok, sent with "cpu offline" changes dropped, here it is:
>> https://lore.kernel.org/patchwork/patch/972657/
>>
>> If you could add your Reported-by to it, that would be great as well.
>>
>> >
>> > But I may still have concerns with this patch.
>>
>> Ok let me know what you think.
>>
>
> Not sure you saw this part of my reply:
I missed the part on the srcu handles, sorry.
>
>> @@ -171,8 +174,7 @@ extern void syscall_unregfunc(void);
>> } while ((++it_func_ptr)->func); \
>> } \
>> \
>> - if (rcuidle) \
>> - srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
>> + srcu_read_unlock_notrace(ss, idx); \
>
> Hmm, why do we have the two different srcu handles?
Because if the memory operations happening on the normal SRCU handle
(during srcu_read_lock) is interrupted by NMI, then the other handle
(devoted to NMI) could be used instead and not bother the interrupted
handle. Does that makes sense?
When I talked to Paul few months ago about SRCU from NMI context, he
mentioned the per-cpu memory operations during srcu_read_lock can be
NMI interrupted, that's why we added that warning.
> Thinking about this, if this can be called by the "thunk" code, then
> there might be an issue. I think the "thunk" code can be called before
> in_nmi() is set, so we don't know if we are in an NMI or not. I need to
> look at that code to make sure. If in_nmi() still works, then we should
> use the _nmi srcu handle (if that's required).
>
> But I'm not sure using SRCU for all tracepoints is needed at this
> moment. I'll have to look deeper into this tomorrow. But it's getting
> close to the merge window, and this needs to be settled quick. Another
> "partial revert" may be needed until this gets settled.
I did read this part, yes I'm not sure either. You mentioned you would
confirm that in the morning, I look forward to it. I hope the in_nmi()
function is reliable to use from here.