Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs

From: Masami Hiramatsu
Date: Fri Dec 06 2013 - 01:13:04 EST


(2013/12/05 23:49), Frank Ch. Eigler wrote:
>
> Hi, Masami -
>
> masami.hiramatsu.pt wrote:
>
>> [...]
>> For the safeness of kprobes, I have an idea; introduce a whitelist
>> for dynamic events. AFAICS, the biggest unstable issue of kprobes
>> comes from putting *many* probes on the functions called from tracers.
>
> Why do you think so?

Oh, because I actually hit this problem when enabling kprobe-events
on every *ftrace-related* function (ring buffer, trace filter, etc.).
It doesn't crash the kernel, but it slows the machine down very much,
and in the end I had to reboot it forcibly. But when I enable just a
few probes on those functions, the system has no problem.

In this case, almost all probe hits are miss-hits because of recursion,
but each miss-hit still involves int3/debug exceptions, and that
increases the time ftrace needs to handle a single event, as below.

1. hit a kprobe outside of ftrace
2. kprobe calls event handler
3. the event handler calls ftrace-related functions to reserve
   a buffer, check the filter, commit the buffer, etc.
   3-1. each ftrace/ring buffer function hits a kprobe
   3-2. the kprobe detects recursion, just single-steps, and returns
4. do single stepping
5. return from kprobe

Note that the whole problem happens inside the event handler.
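
To make the cost in the steps above concrete, here is a rough user-space
model in Python (not kernel code). It assumes every probe hit costs one
int3 exception plus one single-step debug exception, and it ignores
optimizations such as boosting or jump optimization. The class name and
the three function names are hypothetical placeholders for "reserve
buffer, check filter, commit buffer".

```python
# Hypothetical stand-ins for the ftrace-related calls made by the handler.
HANDLER_CALLS = ["reserve_buffer", "check_filter", "commit_buffer"]


class ProbeSim:
    """Toy model of kprobe hits with recursion (miss-hit) detection."""

    def __init__(self, probed_tracer_funcs):
        self.probed = set(probed_tracer_funcs)  # tracer functions with a probe
        self.in_handler = False                 # reentrancy flag
        self.exceptions = 0                     # int3 + debug exceptions taken
        self.missed = 0                         # miss-hits caused by recursion

    def hit(self):
        self.exceptions += 1                    # int3 (steps 1 and 3-1)
        if self.in_handler:
            self.missed += 1                    # step 3-2: recursion detected
        else:
            self.in_handler = True              # step 2: run the event handler
            for fn in HANDLER_CALLS:            # step 3: calls into the tracer
                if fn in self.probed:
                    self.hit()                  # nested probe hit
            self.in_handler = False
        self.exceptions += 1                    # single-step (steps 3-2 and 4)


quiet = ProbeSim(probed_tracer_funcs=[])
quiet.hit()

noisy = ProbeSim(probed_tracer_funcs=HANDLER_CALLS)
noisy.hit()

print(quiet.exceptions, quiet.missed)
print(noisy.exceptions, noisy.missed)
```

In this model one event costs 2 exceptions when no tracer function is
probed, and 2 + 2*N when N probed tracer functions are called from the
handler, so the overhead grows with the number of probes placed inside
the tracer itself.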

> We have had problems with single kprobes in the
> "wrong" spot. The main reason I showed spraying them widely is to get
> wide coverage with minimal information/effort, not to suggest that the
> number of concurrent probes per se is a problem. (We have had
> systemtap scripts probing some areas of the kernel with thousands of
> active kprobes, e.g. for statement-by-statement variable-watching
> jobs, and these have worked fine.)

Ah, sorry for the confusion. Agreed. I just tried to explain that kprobes
can cause a performance problem under *very specific* usage.
So the whitelist is just for keeping people away from it.

>> It doesn't crash the kernel but slows down so much, because every
>> probes hit many other nested miss-hit probes.
>
> (kprobes does have code to detect & handle reentrancy.)

Right. :)

>> This gives us a big performance impact. [...]
>
> Sure, but I'd expect to see pure slowdowns show their impact with
> time-related problems like watchdogs firing or timeouts.

I doubt it can cause those, because the processing time of each probe
is still small enough to slip past the watchdog.

>> [...] Then, I'd like to propose this new whitelist feature in
>> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
>> disabling the whitelist. That knob will be
>> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
>> kernel tainted so that we can check it from bug reports.
>
> How would one assemble a reliable whitelist, if we haven't fully
> characterized the problems that make the blacklist necessary?

As I said, we can use the function graph tracer's list as the whitelist,
since it doesn't include any functions invoked from ftrace's event
handler. (Note that I'm not talking about SystemTap or other users here.)

The whitelist is just for keeping people who only want to trace their
own subsystems (anything except ftrace) away from this quantitative
issue. For example, such people may try to probe every function
(e.g. perf probe --add '* $vars'; actually this is why I haven't
released wildcard support in perf probe yet).
Of course, I could implement the whitelist feature in perf probe only,
which would allow me to support wildcards in perf probe. :)
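
As a rough illustration of the whitelist idea, here is a small
user-space sketch in Python. It is not the proposed kernel or perf
implementation. It assumes the whitelist arrives as text lines in the
style of ftrace's available_filter_functions file (one symbol per line,
optionally followed by a "[module]" tag); whether that file is exactly
the list meant above is an assumption. The helper names load_whitelist
and filter_probes, and the enforce flag standing in for the proposed
knob, are hypothetical.

```python
def load_whitelist(lines):
    """Collect symbol names, one per line, ignoring a trailing "[module]"."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:                  # skip blank lines
            names.add(fields[0])
    return names


def filter_probes(requested, whitelist, enforce=True):
    """Split requested probe symbols into (allowed, rejected) lists."""
    if not enforce:                 # whitelist disabled: allow everything
        return list(requested), []
    allowed = [sym for sym in requested if sym in whitelist]
    rejected = [sym for sym in requested if sym not in whitelist]
    return allowed, rejected


# Sample in-memory list; a real run would read the lines from a file.
sample = ["vfs_read", "do_sys_open", "e1000_xmit_frame [e1000]", ""]
whitelist = load_whitelist(sample)

allowed, rejected = filter_probes(
    ["vfs_read", "ring_buffer_lock_reserve"], whitelist)
print("allowed:", allowed)
print("rejected:", rejected)
```

A wildcard expansion could be run through the same filter before any
probe is registered, so that only whitelisted symbols are ever probed.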

As a long-term solution, I think we can introduce some kind of
performance gatekeeper, as systemtap does: count the miss-hit rate
per second and, if it goes over a threshold, disable the next probe
that miss-hits (or the probe with the most miss-hits), as the OOM
killer does.
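
A minimal sketch of such a gatekeeper, as a user-space Python model
rather than kernel code: it counts miss-hits per probe within a time
window and, once the total exceeds a threshold, disables the probe with
the most miss-hits. The class name, the parameters, and the choice of
"most miss-hits" as the victim policy are all assumptions made for
illustration.

```python
class MissHitGatekeeper:
    """Disable the worst-offending probe when miss-hits exceed a threshold."""

    def __init__(self, threshold, window=1.0):
        self.threshold = threshold  # miss-hits tolerated per window
        self.window = window        # window length in seconds
        self.window_start = None
        self.counts = {}            # probe name -> miss-hits in this window
        self.disabled = set()

    def record_miss(self, probe, now):
        """Record one miss-hit; return the probe disabled by it, if any."""
        if probe in self.disabled:
            return None             # a disabled probe no longer fires
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now  # start a fresh counting window
            self.counts = {}
        self.counts[probe] = self.counts.get(probe, 0) + 1
        if sum(self.counts.values()) > self.threshold:
            victim = max(self.counts, key=self.counts.get)
            self.disabled.add(victim)
            del self.counts[victim]
            return victim
        return None


gk = MissHitGatekeeper(threshold=3)
events = [("a", 0.0), ("a", 0.1), ("b", 0.2), ("a", 0.3)]
results = [gk.record_miss(name, t) for name, t in events]
print(results, sorted(gk.disabled))
```

Picking the probe with the most miss-hits mirrors the way the OOM killer
picks a victim by score; disabling the probe that miss-hits next would
be simpler but may penalize an innocent probe.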

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx

