Re: Instrumentation and RCU

From: Masami Hiramatsu
Date: Tue Mar 10 2020 - 20:18:38 EST


Hi Mathieu,

On Tue, 10 Mar 2020 12:21:31 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> ----- On Mar 10, 2020, at 11:46 AM, rostedt rostedt@xxxxxxxxxxx wrote:
>
> > On Tue, 10 Mar 2020 11:31:51 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >
> >> I think there are two distinct problems we are trying to solve here,
> >> and it would be good to spell them out to see which pieces of technical
> >> solution apply to which.
> >>
> >> Problem #1) Tracer invoked from partially initialized kernel context
> >>
> >> - Moving the early/late entry/exit points into sections invisible from
> >> instrumentation seems to make tons of sense for this.
> >>
> >> Problem #2) Tracer recursion
> >>
> >> - I'm much less convinced that hiding entry points from instrumentation
> >> works for this. As an example, with the instr_begin/end() approach you
> >> propose above, as soon as you have a tracer recursing into itself because
> >> something below do_stuff() has been instrumented, having hidden the entry
> >> point did not help at all.
> >>
> >> So I would be tempted to use the "hide entry/exit points" with explicit
> >> instr begin/end annotation to solve Problem #1, but I'm still thinking there
> >> is value in the per recursion context "in_tracing" flag to prevent tracer
> >> recursion.
> >
> > The only recursion issue that I've seen discussed is breakpoints. And
> > that's outside of the tracer infrastructure. Basically, if someone added a
> > breakpoint for a kprobe on something that gets called in the int3 code
> > before kprobes is called we have (let's say rcu_nmi_enter()):
> >
> >
> > rcu_nmi_enter();
> > <int3>
> > do_int3() {
> > rcu_nmi_enter();
> > <int3>
> > do_int3();
> > [..]
> >
> > Where would a "in_tracer" flag help here? Perhaps a "in_breakpoint" could?
>
> An approach where the "in_tracer" flag is tested and set by the instrumentation
> (function tracer, kprobes, tracepoints) would work here. Let's say the beginning
> of the int3 ISR is part of the code which is invisible to instrumentation, and
> before we issue rcu_nmi_enter(), we handle the in_tracer flag:
>
> rcu_nmi_enter();
> <int3>
> (recursion_ctx->in_tracer == false)
> set recursion_ctx->in_tracer = true
> do_int3() {
> rcu_nmi_enter();
> <int3>
> if (recursion_ctx->in_tracer == true)
> iret
>
> We can change "in_tracer" for "in_breakpoint", "in_tracepoint" and
> "in_function_trace" if we ever want to allow different types of instrumentation
> to nest. I'm not sure whether this is useful or not though.

Kprobes already has its own "in_kprobe" flag, and the recursion path is
not so simple. Since the int3 replaces the original instruction, we have to
execute the original instruction by single-stepping it and then fixing up
the result.

This means it involves do_debug() too. Thus, we cannot iret directly
from do_int3 as shown above, and if recursion happens, we have no way to
recover the original execution path (so we call BUG()).
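
To make this concrete, here is a rough user-space sketch of such a
re-entrancy guard. It is not the actual kprobes code: all names
(probe_ctx, probe_hit, and so on) are hypothetical, and the split between
a nested hit while the handler runs and a nested hit while single-stepping
is an assumption of this sketch. It only illustrates why a flag alone does
not let us return early once the original instruction has been displaced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a kprobes-style re-entrancy guard. */
enum probe_status {
    PROBE_IDLE,        /* no probe is being handled in this context */
    PROBE_HIT_ACTIVE,  /* the probe's handler is running */
    PROBE_HIT_SS,      /* single-stepping the displaced original instruction */
};

enum hit_result {
    HIT_HANDLED,       /* first hit: handler may run, single-step follows */
    HIT_NESTED_SKIP,   /* nested hit in handler: skip handler, count a miss */
    HIT_UNRECOVERABLE, /* nested hit during single-step: nothing to return to */
};

/* Stand-in for per-CPU state; a real kernel would keep one per CPU. */
struct probe_ctx {
    enum probe_status status;
    unsigned long nmissed;   /* nested hits whose handler was skipped */
};

/* Called on every breakpoint hit; decides what the int3 path may do. */
enum hit_result probe_hit(struct probe_ctx *ctx)
{
    switch (ctx->status) {
    case PROBE_IDLE:
        ctx->status = PROBE_HIT_ACTIVE;
        return HIT_HANDLED;
    case PROBE_HIT_ACTIVE:
        /* The flag tells us to skip the handler, but the original
         * instruction was still replaced, so a plain "iret" is not enough. */
        ctx->nmissed++;
        return HIT_NESTED_SKIP;
    case PROBE_HIT_SS:
    default:
        /* Already in the middle of single-step and fixup. */
        return HIT_UNRECOVERABLE;
    }
}

/* Handler finished; the original instruction is about to be single-stepped. */
void probe_begin_single_step(struct probe_ctx *ctx)
{
    ctx->status = PROBE_HIT_SS;
}

/* The debug-trap side: single-step and fixup are complete. */
void probe_single_step_done(struct probe_ctx *ctx)
{
    ctx->status = PROBE_IDLE;
}
```

In this model a caller would invoke probe_hit() on each breakpoint, then
probe_begin_single_step() and probe_single_step_done() around the
single-step; a HIT_UNRECOVERABLE result corresponds to the case where we
end up calling BUG().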

In my previous email, I showed a patch that adds something like
"bust_kprobes()" for the oops path. That is not safe, but there is no other
way to escape from this recursion hell. (Maybe we could try calling it
instead of calling BUG() so that the kernel can continue to run, but I'm
not sure we can safely make the page table read-only again.)

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>