Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)

From: Masami Hiramatsu
Date: Thu Jun 07 2018 - 21:10:34 EST


On Wed, 6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxx> wrote:

> Why RFC again:
>
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
> 1. One of the major reason was the deadlock between uprobe_lock and
> mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
> because mm->mmap is not in control of trace_uprobe_mmap() and it has
> to take uprobe_lock to loop over trace_uprobe list. More details can
> be found at[2]. With this new approach, there are no deadlocks found
> so far.
> 2. Many of the core uprobe function and data-structures needs to be
> exported to make earlier implementation simple. With this new approach,
> reference counter logic is been implemented in core uprobe and thus
> no need to export anything.

I agree with you. Moreover, since uprobe_register/unregister() are
exported to modules, this enablement would better be implemented
inside uprobe so that all uprobe users benefit from this.

>
> Description:
>
> Userspace Statically Defined Tracepoints[3] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
>
> if (reference_counter > 0) {
> Execute marker instructions;
> }
>
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
>
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the trace_uprobe infrastructure. Ex,[4]
>
> # cat tick.c
> ...
> for (i = 0; i < 100; i++) {
> DTRACE_PROBE1(tick, loop1, i);
> if (TICK_LOOP2_ENABLED()) {
> DTRACE_PROBE1(tick, loop2, i);
> }
> printf("hi: %d\n", i);
> sleep(1);
> }
> ...
>
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
>
> # perf buildid-cache --add /tmp/tick
> # perf probe sdt_tick:loop1
> # perf probe sdt_tick:loop2
>
> # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
> hi: 0
> hi: 1
> hi: 2
> ^C
> Performance counter stats for '/tmp/tick':
> 3 sdt_tick:loop1
> 0 sdt_tick:loop2
> 2.747086086 seconds time elapsed
>
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
>
> # ./perf buildid-cache --add /tmp/tick
> # ./perf probe sdt_tick:loop2
> # ./perf stat -e sdt_tick:loop2 /tmp/tick
> hi: 0
> hi: 1
> hi: 2
> ^C
> Performance counter stats for '/tmp/tick':
> 3 sdt_tick:loop2
> 2.561851452 seconds time elapsed
>
>
> Note:
> - 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not haeally
> used for any synchronization. So we are referring it as 'reference
> counter' in kernel / perf code.

+1. I like this "reference counter" term :)

> - This patches still has one issue. If there are multiple instances of
> same application running and user wants to trace any particular
> instance, trace_uprobe is updating reference counter in all instances.
> This is not a problem on user side because instruction is not replaced
> with trap/int3 and thus user will only see samples from his interested
> process. But still this is more of a correctness issue. I'm working on
> a fix for this.

Hmm, it sounds like not a correctness issue, but there maybe a performace
tradeoff. Tracing one particulear instance, other instances also will get
a performance loss (Only if the parameter preparation block is heavy,
because the heaviest part of probing - trap/int3 and recording data - isn't
executed.)

BTW, why this happens? I thought the refcounter part is just a data which
is not shared among processes...

Thank you,


>
> [1] https://lkml.org/lkml/2018/4/17/23
> [2] https://lkml.org/lkml/2018/5/25/111
> [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
>
> Ravi Bangoria (7):
> Uprobes: Simplify uprobe_register() body
> Uprobes: Support SDT markers having reference count (semaphore)
> Uprobes/sdt: Fix multiple update of same reference counter
> trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
> Uprobes/sdt: Prevent multiple reference counter for same uprobe
> Uprobes/sdt: Document about reference counter
> perf probe: Support SDT markers having reference counter (semaphore)
>
> Documentation/trace/uprobetracer.rst | 16 +-
> include/linux/uprobes.h | 5 +
> kernel/events/uprobes.c | 502 +++++++++++++++++++++++++++++++----
> kernel/trace/trace.c | 2 +-
> kernel/trace/trace_uprobe.c | 74 +++++-
> tools/perf/util/probe-event.c | 39 ++-
> tools/perf/util/probe-event.h | 1 +
> tools/perf/util/probe-file.c | 34 ++-
> tools/perf/util/probe-file.h | 1 +
> tools/perf/util/symbol-elf.c | 46 +++-
> tools/perf/util/symbol.h | 7 +
> 11 files changed, 643 insertions(+), 84 deletions(-)
>
> --
> 1.8.3.1
>


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>