Re: [PATCH v4 8/9] perf header: Add field 'embed'
From: Howard Chu
Date: Thu Aug 08 2024 - 09:58:14 EST
Hello,
On Thu, Aug 8, 2024 at 7:52 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Wed, Aug 07, 2024 at 11:38:42PM +0800, Howard Chu wrote:
> > We have to save the embedded data's sample type for it to be consumed
> > correctly by perf script or perf report.
> >
> > This will approach most definitely break some perf.data convertor.
> >
> > Signed-off-by: Howard Chu <howardchu95@xxxxxxxxx>
> > ---
> > tools/perf/util/header.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> > index 55e9553861d0..d60e77d5c25c 100644
> > --- a/tools/perf/util/header.c
> > +++ b/tools/perf/util/header.c
> > @@ -80,6 +80,7 @@ const char perf_version_string[] = PERF_VERSION;
> >
> > struct perf_file_attr {
> > struct perf_event_attr attr;
> > + __u64 embed;
>
> Can we just set bpf_output.attr correctly and get rid of this?
"I think this is ugly as well. I believe I can sneak sample_type_embed
into struct perf_event_attr without side effects, but some hacks are
needed. Firstly, the __u64 inherit field can be used to store __u64
sample_type_embed. If BPF output can be collected, we can guarantee
that inherit is 0 (remember when parsing the off-CPU time event, we
used bpf-output/no-inherit=1,name=%s/), so we are certain about its
value. And since it's also a u64, sample_type_embed will fit.
Additionally, it isn't used much in the userspace perf tool (nothing
related to parsing and visualization), so maybe after
perf_event_open(), we can overwrite it."
But that will lead to another question of mine, do you think we should
make this 'sample_type_embed' thing only for off-cpu, or there's a
possibility for scaling? The reason why I ask this question is because
there are things that's not API-ish in this patch, such as overwriting
the 'inherit' for sample_type_embed, it's clearly a hack. I don't have
enough foresight on this matter. Could you please provide some
insight? This could impact the direction of my programming. For
example, if it's only used by off-cpu, the implementation could likely
become much simpler.
Thanks,
Howard
>
> Thanks,
> Namhyung
>
>
> > struct perf_file_section ids;
> > };
> >
> > @@ -3713,6 +3714,7 @@ static int perf_session__do_write_header(struct perf_session *session,
> > }
> > f_attr = (struct perf_file_attr){
> > .attr = evsel->core.attr,
> > + .embed = evsel->sample_type_embed,
> > .ids = {
> > .offset = evsel->id_offset,
> > .size = evsel->core.ids * sizeof(u64),
> > @@ -4147,6 +4149,14 @@ static int read_attr(int fd, struct perf_header *ph,
> >
> > ret = readn(fd, ptr, left);
> > }
> > +
> > + ret = readn(fd, &f_attr->embed, sizeof(f_attr->embed));
> > + if (ret <= 0) {
> > + pr_debug("failed to read %d bytes of embedded sample type\n",
> > + (int)sizeof(f_attr->embed));
> > + return -1;
> > + }
> > +
> > /* read perf_file_section, ids are read in caller */
> > ret = readn(fd, &f_attr->ids, sizeof(f_attr->ids));
> >
> > @@ -4272,6 +4282,8 @@ int perf_session__read_header(struct perf_session *session, int repipe_fd)
> > tmp = lseek(fd, 0, SEEK_CUR);
> > evsel = evsel__new(&f_attr.attr);
> >
> > + evsel->sample_type_embed = f_attr.embed;
> > +
> > if (evsel == NULL)
> > goto out_delete_evlist;
> >
> > --
> > 2.45.2
> >