Re: [PATCH 1/6] perf: Add new type PERF_TYPE_PROBE

From: Peter Zijlstra
Date: Thu Nov 23 2017 - 05:22:26 EST


On Wed, Nov 15, 2017 at 09:23:33AM -0800, Song Liu wrote:
> A new perf type PERF_TYPE_PROBE is added to allow creating [k,u]probe
> with perf_event_open. These [k,u]probe are associated with the file
> decriptor created by perf_event_open, thus are easy to clean when
> the file descriptor is destroyed.
>
> Struct probe_desc and two flags, is_uprobe and is_return, are added
> to describe the probe being created with perf_event_open.

> ---
> include/uapi/linux/perf_event.h | 35 +++++++++++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 362493a..cc42d59 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -33,6 +33,7 @@ enum perf_type_id {
> PERF_TYPE_HW_CACHE = 3,
> PERF_TYPE_RAW = 4,
> PERF_TYPE_BREAKPOINT = 5,
> + PERF_TYPE_PROBE = 6,

Not required.. these fixed types are mostly legacy at this point.

>
> PERF_TYPE_MAX, /* non-ABI */
> };
> @@ -299,6 +300,29 @@ enum perf_event_read_format {
> #define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */
> #define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */
>
> +#define MAX_PROBE_FUNC_NAME_LEN 64
> +/*
> + * Describe a kprobe or uprobe for PERF_TYPE_PROBE.
> + * perf_event_attr.probe_desc will point to this structure. is_uprobe
> + * and is_return are used to differentiate different types of probe
> + * (k/u, probe/retprobe).
> + *
> + * The two unions should be used as follows:
> + * For uprobe: use path and offset;
> + * For kprobe: if func is empty, use addr
> + * if func is not emtpy, use func and offset
> + */
> +struct probe_desc {
> + union {
> + __aligned_u64 func;
> + __aligned_u64 path;
> + };
> + union {
> + __aligned_u64 addr;
> + __u64 offset;
> + };
> +};
> +
> /*
> * Hardware event_id to monitor via a performance monitoring event:
> *
> @@ -320,7 +344,10 @@ struct perf_event_attr {
> /*
> * Type specific configuration information.
> */
> - __u64 config;
> + union {
> + __u64 config;
> + __u64 probe_desc; /* ptr to struct probe_desc */
> + };
>
> union {
> __u64 sample_period;
> @@ -370,7 +397,11 @@ struct perf_event_attr {
> context_switch : 1, /* context switch data */
> write_backward : 1, /* Write ring buffer from end to beginning */
> namespaces : 1, /* include namespaces data */
> - __reserved_1 : 35;
> +
> + /* For PERF_TYPE_PROBE */
> + is_uprobe : 1, /* 0: kprobe, 1: uprobe */
> + is_return : 1, /* 0: probe, 1: retprobe */
> + __reserved_1 : 33;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */


So I hate this... there's so much that doesn't make sense.. not in the
least that __align_u64 fixation. Who cares about that?

Why does probe_desc exist? You could have overloaded config{1,2}
further, just like the breakpoint crap did.

And the extra flags seem misplaced too, why are they not config?

You could have simply registered two PMU types:

perf_pmu_register(&perf_uprobe, "uprobe", -1);
perf_pmu_register(&perf_kprobe, "kprobe", -1);

Where perf_{u,k}probe differ only in the init function.

Then for uprobe you use config1 as pointer to a path string and config2
as offset. And for kprobe you use config1 as function string pointer and
config2 as either address or offset.

This leaves you config in both cases to stuff further modifiers, like
for example the is_return thing for kprobes.