Re: [PATCH v4] [RFC] trace: Add kprobe on tracepoint

From: Masami Hiramatsu
Date: Wed Aug 11 2021 - 21:27:40 EST


Hi Steve,

On Wed, 11 Aug 2021 11:22:49 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Thu, 12 Aug 2021 00:03:43 +0900
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> > On Wed, 11 Aug 2021 17:14:33 +0300
> > "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@xxxxxxxxx> wrote:
> >
> > > From: Tzvetomir Stoyanov (VMware) <tz.stoyanov@xxxxxxxxx>
> > >
> > > A new dynamic event is introduced: event probe. The event is attached
> > > to an existing tracepoint and uses its fields as arguments. The user
> > > can specify custom format string of the new event, select what tracepoint
> > > arguments will be printed and how to print them.
> > > An event probe is created by writing configuration string in
> > > 'dynamic_events' ftrace file:
> > > e:GNAME/ENAME SYSTEM.EVENT [FETCHARGS] - Set an event probe
> >
> > Hmm, this inconsistency looks not good to me.
>
> Hi Masami,
>
> Thanks for reviewing this.
>
> >
> > "GNAME/ENAME" "SYSTEM.EVENT"
> > - GNAME is "group name" but SYSTEM is "system name" but both must point
> > same idea.
> > - Delimiter character is different.
> >
> > Aren't these confusing users?
>
> I agree. It was basically copying code from the histogram logic that
> uses '.', and that's how it got mixed up. We should change it to be
> more consistent.
>
> >
> > One idea is adding a patch for kprobe and uprobe events to accept new
> > delimiter '.'. This will give a consistency with hist triggers too.
>
> I think the above is good. We can make it work with both '/' and '.'.
>
> Would that work for you?

Yes, it works for me.

> > Also, you can add a note about that the system and group is same
> > meaning in events.
> >
> > > -:GNAME/ENAME - Delete an event probe
> > >
> > > Where:
> > > GNAME - Group name, if omitted 'eprobes' is used.
> >
> > If this is not mandatory, you should write it as
> >
> > e:[GNAME/]ENAME SYSTEM.EVENT [FETCHARGS]
>
> Good point.
>
> >
> >
> > > ENAME - Name of the new event in GNAME, mandatory.
> > > SYSTEM - Name of the system, where the tracepoint is defined, mandatory.
> > > EVENT - Name of the tracepoint event in SYSTEM, mandatory.
> > > FETCHARGS - Arguments:
> > > <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print
> > > it as given TYPE with given name. Supported
> > > types are:
> > > (u8/u16/u32/u64/s8/s16/s32/s64), basic type
> > > (x8/x16/x32/x64), hexadecimal types
> > > "string", "ustring" and bitfield.
> > >
> > > Example, attach an event probe on openat system call and print name of the
> > > file that will be opened:
> > > echo "e:esys/eopen syscalls.sys_enter_openat file=\$filename:string" >> dynamic_events
> > > A new dynamic event is created in events/esys/eopen/ directory. It
> > > can be deleted with:
> > > echo "-:esys/eopen" >> dynamic_events
> > >
> > > Filters, triggers and histograms can be attached to the new event, it can
> > > be matched in synthetic events. There is one limitation - no synthetic
> > > events can be attached to an event probe.
> >
> > I'm not sure what the "no synthetic events can be attached to an event probe"
> > means.
> > Can you show an example command what this means?
>
> Basically, we can not do:
>
> echo 'my_open pid_t pid; unsigned long file' > synthetic_events
>
> echo 'e:myopen_ret syscalls.sys_exit_open ret=$ret' > dynamic_events
>
> echo 'hist:keys=common_pid:file=filename' > events/syscalls/sys_enter_open/trigger
> echo 'hist:keys=common_pid:file=$file:onmatch(eprobes.myopen).trace(my_open,common_pid,$file)' > events/eprobes/myopen_ret

(Maybe 'onmatch(syscalls.sys_enter_open)' ?)

> The above attaches a trace call to a synthetic event from the
> myopen_ret event. The reason we do not allow it, is because we want
> eprobes to attach to synthetic events (because they can convert the
> types and do more to them), but we do not allow eprobes to attach to
> eprobes.

Let me confirm this, so eprobes can be attached to synthetic event?
IMHO, I rather like to prevent attaching eprobe_event on the other
dynamic events. It makes hard to check when removing the base dynamic
events...

For the above example, we can rewrite it as below to trace filename
without attaching eprobe_events on the synthetic event.

echo 'my_open pid_t pid; char file[]' > synthetic_events

echo 'e:myopen syscalls.sys_enter_open file=+0($filename):ustring' > dynamic_events
echo 'e:myopen_ret syscalls.sys_exit_open ret=$ret' > dynamic_events

echo 'hist:keys=common_pid:fname=file' > events/eprobes/myopen/trigger
echo 'hist:keys=common_pid:fname=$fname:onmatch(eprobes.myopen).trace(my_open,common_pid,$fname)' > events/eprobes/myopen_ret

The point is that type converting or dereference pointer should be
done at the base event level, instead of synthetic event level.

> But if we allow synthetic events to attach to an eprobe, then
> we can have:
>
> eprobe -> synthetic_event -> eprobe -> synthetic_event -> eprobe ...
>
> Which can be dangerous, and hard to find loops.

Yes, so the discussion point is which link (->) we should reject.

- eprobe -> synthetic_event

or

- synthetic_event -> eprobe

I like to prohibit latter one. It is my feeling, but I think it is
natural that the eprobe is only for the static events, and I also think
dereferencing a pointer-type field in raw-event is more reliable than
dereferencing a digit value passed to the synthetic event.

> Currently, we don't see any real use case to allow synthetic events to
> attach to an eprobe, but if there is, we can change the code to allow
> it, but we need to keep track of how many are attached, and limit them,
> and we need to find a way to check for loops.

Yes, anyway we need a way to find loops on histogram/eprobe at last.

>
> >
> > >
> > > Co-developed-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> > > Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@xxxxxxxxx>
> > > ---
> > > Subject of the patch is still related to kprobe, though the design
> > > no more depends on kprobe. I did not change it for consistency with
> > > the first patch version.
> >
> > OK, thanks for moving onto the dynevent. Let me check the code in
> > another mail.
> > Anyway, I think this is good starting point.
> >
>
> Thanks for your time at looking at this. We greatly appreciate it!

You're welcome!

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>