Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
From: Masami Hiramatsu
Date: Wed Mar 01 2017 - 01:41:41 EST
On Tue, 28 Feb 2017 16:26:19 +0530
Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxxxxxxx> wrote:
> Thanks Masami for the review.
> Please find my replies to your comments.
>
> On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> > On Fri, 24 Feb 2017 13:13:25 +0530
> > Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> After invoking 'perf record', behind the scenes, it checks whether
> >> the event specified is an SDT event by using the flag '%' or string
> >> 'sdt_'. After that, it first checks whether event already exists
> >> with that *name* in uprobe_events file. If found, it records that
> >> particular event. Otherwise, it does a lookup of the probe cache to
> >> find out the SDT event. If its not present, it throws an error.
> >> If found, it again tries to find existing events from uprobe_events
> >> file, but this time it uses *address* and *filename* for comparison.
> >> Finally it writes new events into the uprobe_events file and starts
> >> recording. It also maintains a list of the event names that were
> >> written to uprobe_events file for this session. After finishing the
> >> record session, it removes the events from the uprobe_events file
> >> using the maintained name list.
> > OK, the behavior looks good. However, the implementation seems
> > too complex, could you make it simpler?
> > I have some comments on the code, see below.
>
> Yes, it's complex code. It's all because of multiple SDT events with same name
> exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- which
> creates complex situations that has to be taken care. See my replies below for
> such examples.
OK, but in that case, could you split those support for review?
E.g. the first one just combine perf-probe and perf-record and
second one support multiple SDT events.
>
> >> +/*
> >> + * Remove ith tev from pev->tevs list and shift remaining
> >> + * tevs(i+1 to pev->ntevs) one step.
> >> + */
> >> +static void shift_pev(struct perf_probe_event *pev, int i)
> >> +{
> >> + int j;
> >> +
> >> + free(pev->tevs[i].event);
> >> + free(pev->tevs[i].group);
> >> + free(pev->tevs[i].args);
> >> + free(pev->tevs[i].point.realname);
> >> + free(pev->tevs[i].point.symbol);
> >> + free(pev->tevs[i].point.module);
> >> +
> >> + /*
> >> + * If ith element is last element, no need to shift,
> >> + * just decrement pev->ntevs.
> >> + */
> >> + if (i == pev->ntevs - 1)
> >> + goto ret;
> >> +
> >> + for (j = i; j < pev->ntevs - 1; j++) {
> >> + pev->tevs[j].event = pev->tevs[j + 1].event;
> >> + pev->tevs[j].group = pev->tevs[j + 1].group;
> >> + pev->tevs[j].nargs = pev->tevs[j + 1].nargs;
> >> + pev->tevs[j].uprobes = pev->tevs[j + 1].uprobes;
> >> + pev->tevs[j].args = pev->tevs[j + 1].args;
> >> + pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
> >> + pev->tevs[j].point.symbol = pev->tevs[j + 1].point.symbol;
> >> + pev->tevs[j].point.module = pev->tevs[j + 1].point.module;
> >> + pev->tevs[j].point.offset = pev->tevs[j + 1].point.offset;
> >> + pev->tevs[j].point.address = pev->tevs[j + 1].point.address;
> >> + pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
> >> + }
> >> +
> >> +ret:
> >> + pev->ntevs--;
> >> +}
> > This code smells no good...
> > Why don't you make a list of newly added probes ?
> >
>
> Yes. This is little ugly. Consider a following scenario.
>
> $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
> Provider: libpthread
> Name: mutex_entry
> Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
> --
> Provider: libpthread
> Name: mutex_entry
> Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...
>
> $ perf probe sdt_libpthread:mutex_entry
>
> $ cat /sys/kernel/debug/tracing/uprobe_events
> p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
> p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb
>
> $ perf probe -d sdt_libpthread:mutex_entry
>
> $ perf record -a -e sdt_libpthread:mutex_entry
> /* This should record:
> sdt_libpthread/mutex_entry <== New entry (delete at the end of session)
> sdt_libpthread/mutex_entry_1 <== Reuse (do not delete at the end of session)
> */
Hmm, OK. but this also not becomes the reason why you need 2 different
lists. After calling apply_perf_probe_events(), pev->tevs have been
updated, which have applied names. So you can pull it from the trace_event.
>
> First of all, perf looks into uprobe_events for entry with sdt_libpthread:mutex_entry
> *name*. As it's not present, perf does a lookup in probe-cache and gets two entries.
> Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events entries
> with entries found from probe-cache with *filename* and *address* as comparison
> parameter (because this combo makes unique identifier for any SDT event in a
> particular system - please correct me if that's wrong). And perf finds one entry
> sdt_libpthread/mutex_entry_1.
>
> Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will add
> new entry for that event even if it exists. So this code removes that particular
> element of an array and shift all remaining elements one step.
>
> Please suggest if you have a better approach.
As I described above, you can get actual name of newly added event after calling
apply_perf_probe_events(). (if not, please feel free to update it to do so.)
>
> >> +
> >> +/* Compare address and filename */
> >> +static bool is_sdt_match(struct probe_trace_event *tev1,
> >> + struct probe_trace_event *tev2)
> >> +{
> >> + return (tev1->point.address == tev2->point.address &&
> >> + !(strcmp(tev1->point.module, tev2->point.module)));
> >> +}
> >> +
> ...
> >> + probe_conf.max_probes = MAX_PROBES;
> >> + probe_conf.force_add = 1;
> >> +
> >> + /*
> >> + * Find the sdt event from the cache. We deliberately check failure
> >> + * of this function after checking entries in uprobe_events. Because,
> >> + * we may find matching entry from uprobe_events and in that case we
> >> + * should continue recording that event.
> >> + */
> >> + pev->ntevs = find_sdt_events_from_cache(pev, &pev->tevs);
> >> +
> >> + /* Prepare list of existing sdt events from uprobe_events */
> >> + INIT_LIST_HEAD(&esl.list);
> >> + ret = probe_file__get_exst_sdt_list(&esl);
> >> + if (ret < 0)
> >> + goto free_str;
> >> +
> >> + /* If there is entry with the same name in uprobe_events, record it. */
> >> + found = probe_file__add_exst_sdt_event(&esl, sdt_events, pev);
> >> + if (found) {
> >> + ret = (found > 0) ? 0 : found;
> >> + goto free_str;
> >> + }
> > Curious, this should be done before searching cached SDT, so that
> > we can just skip listing up existing probes.
>
> Yes that's intentional. We warn user if event with exact name found from
> uprobe_events and there are multiple instances of that event found from
> probe-cache. For example,
>
> $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
> Provider: libpthread
> Name: mutex_entry
> Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
> --
> Provider: libpthread
> Name: mutex_entry
> Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...
>
> $ cat /sys/kernel/debug/tracing/uprobe_events
> p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
>
> $ perf record -a -e sdt_libpthread/mutex_entry
> Warning: Found 2 events from probe-cache with name 'sdt_libpthread:mutex_entry'.
> Since probe point already exists with this name, recording only 1 event.
> Hint: Please use 'perf probe -d sdt_libpthread:mutex_entry*' to allow record on all events.
OK, but could you split this out to another patch?
>
> >> +
> >> + /* Reaching here means no matching entry found in uprobe_events. */
> >> + filter = filter_exst_sdt_events_from_pev(pev, &esl);
> >> + if (!filter && pev->ntevs == 0) {
> >> + pr_err("%s:%s not found in the cache\n", pev->group,
> >> + pev->event);
> >> + ret = -EINVAL;
> >> + goto free_str;
> >> + } else if (pev->ntevs < 0) {
> >> + err->str = strdup("Cache lookup failed.\n");
> >> + ret = pev->ntevs;
> >> + goto free_str;
> >> + }
> > Again, why is the esl needed here? We can remove events by using it's name.
> >
> > What I expected was,
> >
> > 0) add "generated_sdts" in record structure.
> > 1) search existing events by using its name from tracing/events/.
> > (Don't need to touch uprobe_events)
>
> I didn't get your point to look at tracing/events/. It's easy to look into
> uprobe_events, because perf has a builtin functions that easily transform
> entries of uprobe_events to 'tevs'. And uprobe_events has all data like
> eventname, filename, address which are crucial for this implementation.
>
> > 2) if it does not exist, add new probe points and store its name in "generated_sdts".
>
> This step is little difficult. As I mentioned earlier, major issue is, event
> exists in uprobe_events but with different name. In such case I've to
> reuse existing entries and also have to add new entries for events
> which are absent.
OK, I see. But could you break this patch into a series? At first, we just
need a name-based event setting, and checking event definition should be done
in following patches in the series. One patch should solve one thing, that
help us to review it smoother.
Thank you,
>
> > 3) record events
> > 4) remove the events by using "generated_sdts" which only has the event name.
> >
>
> Let me know if I've misunderstood any of your point.
>
> Thanks,
> -Ravi
>
--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>