Re: [PATCH 0/5] ftrace: to kill a daemon

From: Steven Rostedt
Date: Mon Aug 11 2008 - 15:28:59 EST




On Mon, 11 Aug 2008, Mathieu Desnoyers wrote:
>
> Hi Steven,
>
> I'm actually a bit worried about this scheduler-centric approach. The
> problem is that any trap that could be generated in the middle of the
> 3/2 nops (think of performance counters if the APIC is set as a trap
> gate) which would stack an interruptible trap handler on top of those
> instructions would lead to having a return IP pointing in the wrong
> spot, but because the scheduler would interrupt the trap handler and not
> the trapped code, it would not detect this.
>
> I am not sure whether we do actually have the possibility to have these
> interruptible trap handlers considering the way the kernel sets up the
> gates, but I think the "let's figure out where the IP stopped" approach
> is a bit complex and fragile.
>
> Trying to find a fast atomic 5-byte nop, involving the
> microarchitecture guys, seems like a better approach to me.
>

I agree that the fast atomic 5-byte nop is the optimal approach, but until
we have that, we need to do this.

I don't think this approach is too complex. I wrote the code in about an
hour, and the patch isn't that big. Apart from my one bug, which was a
CS101-type bug (pointer arithmetic), nothing about it struck me as
complex.

The thing is, I want this to be enabled in a production kernel. If there
is a 1 in a million chance of hitting this theoretical bug, with the trap
happening in the middle of the double nop, it only affects those who
enable the tracer. No one else. That code only runs when tracing is being
enabled.
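
To make the hazard concrete, here is a rough userspace sketch of the kind
of "where did the IP stop" check being discussed. It is an illustration
only, not the code from the patch series. The names SITE_SIZE,
ip_lands_inside_site() and any_ip_unsafe() are made up for this example,
and it assumes each site is a 3-byte nop followed by a 2-byte nop that
will be overwritten by one 5-byte call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each patch site is 5 bytes (3-byte nop + 2-byte nop). */
#define SITE_SIZE 5

/*
 * Hypothetical helper: true if a saved return IP points strictly inside
 * the 5-byte site. With a 3+2 nop pair the only reachable interior
 * address is site + 3, but a range check is the conservative test.
 * An IP equal to site (nothing executed yet) or site + SITE_SIZE
 * (both nops done) is safe.
 */
static bool ip_lands_inside_site(uintptr_t ip, uintptr_t site)
{
    return ip > site && ip < site + SITE_SIZE;
}

/*
 * Hypothetical helper: scan a set of saved IPs against a set of sites
 * and report whether any of them would resume in the middle of a
 * rewritten instruction.
 */
static bool any_ip_unsafe(const uintptr_t *ips, size_t n_ips,
                          const uintptr_t *sites, size_t n_sites)
{
    for (size_t i = 0; i < n_ips; i++)
        for (size_t j = 0; j < n_sites; j++)
            if (ip_lands_inside_site(ips[i], sites[j]))
                return true;
    return false;
}
```

The concern raised above is about which IPs get fed into a check like
this: if only the IPs saved at schedule time are examined, a return IP
saved by a trap frame underneath would never be looked at.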

If we need to put a non-optimal nop in to replace the call to mcount,
this affects everyone: anyone who has CONFIG_FTRACE on, even those who
never plan on running a trace.

If it comes down to slowing everyone down versus hitting a 1 in a
million theoretical bug, I'll take my chances on the bug. Again, that code
only runs when tracing is being enabled, that is, when it converts
all 20 thousand nops into calls to a tracer function. If the crash
happens, it will only happen after tracing has started. Then we can easily
pin the bug on the tracer.

IOW, I do not want to slow down Linus' box.

-- Steve
