Re: [RFC PATCH] perf script python: Allow reporting [un]throttle

From: Arnaldo Carvalho de Melo
Date: Wed Sep 01 2021 - 09:32:06 EST


Em Tue, Aug 31, 2021 at 02:47:47PM -0700, Stephen Brennan escreveu:
> On 8/31/21 11:46 AM, Jiri Olsa wrote:
> > On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
> > > perf_events may sometimes throttle an event due to creating too many
> > > samples during a given timer tick. As of now, the perf tool will not
> > > report on throttling, which means this is a silent error. Implement a
> > > callback for the throttle and unthrottle events within the Python
> > > scripting engine, which can allow scripts to detect and report when
> > > events may have been lost due to throttling.

> > > A python script could simply define throttle() and unthrottle()
> > > functions to begin receiving them, e.g.:

> > > ```
> > > from __future__ import print_function

> > > def process_event(param_dict):
> > > print("event cpu={} time={}".format(
> > > param_dict["sample"]["cpu"], param_dict["sample"]["time"]))

> > > def throttle(*args):
> > > print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))

> > > def unthrottle(*args):
> > > print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
> > > ```

> > throttle event has also 'id' and 'stream_id' I guess you don't
> > need it, but maybe we should add it to be complete

> I tried adding these from struct perf_sample.id and struct
> perf_sample.sample_id respectively. I then tested these on a perf.data which
> contains a throttle event. The values which my python script reported were
> 0xFFFFFFFFFFFFFFFF -- is this expected? If so, I'll send my revised patch.

So, perf_sample.X is only defined if PERF_SAMPLE_X is in
perf_event_attr.sample_type, and for the perf.data file below it is:

sample_type = IP|TID|TIME

So only perf_sample.ip, perf_sample.tid and perf_sample.time are set,
not perf_sample.id nor perf_sample.stream_id.

What Jiri suggests is to use the _fixed_ payload for
PERF_SAMPLE_THROTTLE/UNTHROTTLE, which is, according to
include/uapi/linux/perf_event.h:

* struct {
* struct perf_event_header header;
* u64 time; <<<<
* u64 id; <<<<
* u64 stream_id; <<<<
* struct sample_id sample_id;
* };
*/
PERF_RECORD_THROTTLE = 5,
PERF_RECORD_UNTHROTTLE = 6,


The 'time', 'id' and 'stream_id' should always be there, while what is
in 'sample_id' is documented in the same file at the beginning of the
'enum perf_event_type' definition:

* If perf_event_attr.sample_id_all is set then all event types will
* have the sample_type selected fields related to where/when
* (identity) an event took place (TID, TIME, ID, STREAM_ID, CPU,
* IDENTIFIER) described in PERF_RECORD_SAMPLE below, it will be stashed
* just after the perf_event_header and the fields already present for
* the existing fields, i.e. at the end of the payload. That way a newer
* perf.data file will be supported by older perf tools, with these new
* optional fields being ignored.
*
* struct sample_id {
* { u32 pid, tid; } && PERF_SAMPLE_TID
* { u64 time; } && PERF_SAMPLE_TIME
* { u64 id; } && PERF_SAMPLE_ID
* { u64 stream_id;} && PERF_SAMPLE_STREAM_ID
* { u32 cpu, res; } && PERF_SAMPLE_CPU
* { u64 id; } && PERF_SAMPLE_IDENTIFIER
* } && perf_event_attr::sample_id_all
*
* Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID. The
* advantage of PERF_SAMPLE_IDENTIFIER is that its position is fixed
* relative to header.size.

There are artifacts here from the evolution of the perf_event subsystem,
i.e. throttle.id is equal to throttle.sample-id.id when PERF_SAMPLE_ID
is set in perf_event_attr.sample_type, ditto for 'time' and 'stream_id'.
entries.

But since 'time', 'id' and 'stream_id' are _always_ set in the
PERF_RECORD_THROTTLE event, we can as well make them available via perf
script python.

Take a look at ./kernel/events/core.c, function perf_log_throttle(), the
only one in the kernel that emits the PERF_RECORD_THROTTLE meta event
and see that it always set the 'time', 'id' and 'stream_id' fields:

static void perf_log_throttle(struct perf_event *event, int enable)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
int ret;

struct {
struct perf_event_header header;
u64 time;
u64 id;
u64 stream_id;
} throttle_event = {
.header = {
.type = PERF_RECORD_THROTTLE,
.misc = 0,
.size = sizeof(throttle_event),
},
.time = perf_event_clock(event),
.id = primary_event_id(event),
.stream_id = event->id,
};

if (enable)
throttle_event.header.type = PERF_RECORD_UNTHROTTLE;

perf_event_header__init_id(&throttle_event.header, &sample, event);

ret = perf_output_begin(&handle, &sample, event,
throttle_event.header.size);
if (ret)
return;

perf_output_put(&handle, throttle_event);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}

So instead of doing:

+ tuple_set_u64(t, 0, sample->time);
+ tuple_set_s32(t, 1, sample->cpu);
+ tuple_set_s32(t, 2, sample->pid);
+ tuple_set_s32(t, 3, sample->tid);

That may not be set at all, you should do:

+ tuple_set_u64(t, 0, event->throttle.time);
+ tuple_set_u64(t, 1, event->throttle.id);
+ tuple_set_u64(t, 2, event->throttle.stream_id);

These can be left as is with 0xFFFFFFFFFFFFFFFF meaning its not set:

+ tuple_set_s32(t, 3, sample->cpu);
+ tuple_set_s32(t, 4, sample->pid);
+ tuple_set_s32(t, 5, sample->tid);

to confirm the user can look at perf_event_attr.sample_type |
PERF_SAMPLE_X.

- Arnaldo

> Thanks,
> Stephen
>
> `perf report --header-only` from my testing perf.data:
>
> # ========
> # captured on : Wed Aug 11 11:55:44 2021
> # header version : 1
> # data offset : 264
> # data size : 18240
> # feat offset : 18504
> # hostname : stepbren-ol7-2
> # os release : 5.4.17-2102.203.6.el7uek.x86_64
> # perf version : 5.4.17-2102.203.6.el7uek.x86_64
> # arch : x86_64
> # nrcpus online : 4
> # nrcpus avail : 4
> # cpudesc : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
> # cpuid : GenuineIntel,6,85,4
> # total memory : 30522624 kB
> # cmdline : /usr/libexec/perf.5.4.17-2102.203.6.el7uek.x86_64 record -c
> 100000 -e cycles -- sh -c echo 2000 > /proc/irqoff_sleep_millis
> # event : name = cycles, , id = { 5, 6, 7, 8 }, size = 112, { sample_period,
> sample_freq } = 100000, sample_type = IP|TID|TIME, read_format = ID,
> disabled = 1, inherit = 1, mmap = 1, comm = 1, enable_on_exec = 1, task = 1,
> sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1,
> bpf_event = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: software = 1, uprobe = 7, cpu = 4, breakpoint = 5,
> tracepoint = 2, kprobe = 6, msr = 8
> # CACHE info available, use -I to display
> # time of first sample : 228.321751
> # time of last sample : 230.362698
> # sample duration : 2040.947 ms
> # MEM_TOPOLOGY info available, use -I to display
> # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> HYBRID_CPU_PMU_CAPS
> # ========