Re: [PATCH] perf timechart: dynamically determine event data offset

From: Arnaldo Carvalho de Melo
Date: Wed Nov 27 2013 - 08:44:45 EST


Em Wed, Nov 27, 2013 at 05:49:02PM +0900, Namhyung Kim escreveu:
> On Tue, 26 Nov 2013 18:54:37 +0400, Stanislav Fomichev wrote:
> > Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> > removed padding bytes, perf timechart got out of sync with the kernel's
> > trace_entry structure.

> > We can't just align perf's trace_entry definition with the kernel because we
> > want timechart to continue working with old perf.data. Instead, we now
> > calculate event data offset dynamically using offset of first non-common
> > event field in the perf.data.

> [SNIP]
> > @@ -304,13 +322,11 @@ struct trace_entry {
> > unsigned char flags;
> > unsigned char preempt_count;
> > int pid;
> > - int lock_depth;
> > };

> I had no chance to look into the timechart code in detail, but this is
> not good. The format of each trace event (so the struct trace_entry)
> was described in the format file under the event directory on debugfs.
> For cpu_frequency event, I get the following:

> $ cat /sys/kernel/debug/tracing/events/power/cpu_frequency/format
> name: cpu_frequency
> ID: 315
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;

> field:u32 state; offset:8; size:4; signed:0;
> field:u32 cpu_id; offset:12; size:4; signed:0;

> print fmt: "state=%lu cpu_id=%lu", (unsigned long)REC->state, (unsigned long)REC->cpu_id

> So it's not same as above struct trace_entry even with your change.

It is... Albeit working just with patches is good and dandy and I even
was discussing with Jiri how patches should be clean and small, you're
not seeing the whole picture, see what is at the line just before
"unsigned char flags":

struct trace_entry {
unsigned short type;
unsigned char flags;
unsigned char preempt_count;
int pid;
int lock_depth;
};

Looking with pahole to see the offsets on a 64-bit arch:

[acme@ssdandy linux]$ pahole -C trace_entry /tmp/build/perf/builtin-timechart.o
struct trace_entry {
short unsigned int type; /* 0 2 */
unsigned char flags; /* 2 1 */
unsigned char preempt_count; /* 3 1 */
int pid; /* 4 4 */
int lock_depth; /* 8 4 */

/* size: 12, cachelines: 1, members: 5 */
/* last cacheline: 12 bytes */
};
[acme@ssdandy linux]$
[acme@ssdandy linux]$ uname -a
Linux ssdandy 3.12.0-rc6+ #2 SMP Mon Oct 21 11:19:07 BRT 2013 x86_64 x86_64 x86_64 GNU/Linux

And now on a 32-bit arch:

acme@linux-goap:~> pahole -C trace_entry /tmp/build/perf/builtin-timechart.o
struct trace_entry {
short unsigned int type; /* 0 2 */
unsigned char flags; /* 2 1 */
unsigned char preempt_count; /* 3 1 */
int pid; /* 4 4 */
int lock_depth; /* 8 4 */

/* size: 12, cachelines: 1, members: 5 */
/* last cacheline: 12 bytes */
};
acme@linux-goap:~> uname -a
Linux linux-goap 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) i686 i686 i386 GNU/Linux
acme@linux-goap:~>

Same signature, 32-bit, 64-bit userland, so whoever wrote timechart, Arjan, I
think, made no mistakes at using the kernel exported interface, choosing the
most efficient way to extract the data, casting to a struct.

Yeah, using the interface that searches for a field name to get the offset and
then add it to a pointer to then cast to the type will allow maximum flexibility,
but is not really efficient.

Doing that for something that is not performance critical (probably) as
timechart probably is not a problem, but so is not a problem using the patch
that does the cast after finding the offset of the first non-common field.

Having said that, I'll take the latest patch, using perf_evsel__intval et all,
if Namhyung sees no further problems with it.

BTW: in 'perf trace' I recently did some work to cache the offsets on a per evsel
private area, so that the string lookup is not done anymore in the raw_syscalls
tracepoint processing.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/