Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs
From: Adrian Hunter
Date: Wed Jul 12 2023 - 11:09:03 EST
On 12/07/23 17:44, Yang Jihong wrote:
> Hello,
>
> On 2023/7/11 21:13, Adrian Hunter wrote:
>> On 4/07/23 10:42, Yang Jihong wrote:
>>> User space tasks can migrate between CPUs, we need to track side-band
>>> events for all CPUs.
>>>
>>> The specific scenarios are as follows:
>>>
>>> CPU0 CPU1
>>> perf record -C 0 start
>>> taskA starts to be created and executed
>>> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>> events only deliver to CPU1
>>> ......
>>> |
>>> migrate to CPU0
>>> |
>>> Running on CPU0 <----------/
>>> ...
>>>
>>> perf record -C 0 stop
>>>
>>> Now perf samples the PC of taskA. However, perf does not record the
>>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>
>>> The sys_perf_event_open invoked is as follows:
>>>
>>> # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>> <SNIP>
>>> Opening: cpu-clock
>>> ------------------------------------------------------------
>>> perf_event_attr:
>>> type 1
>>> size 136
>>> { sample_period, sample_freq } 4000
>>> sample_type IP|TID|TIME|ID|CPU|PERIOD
>>> read_format ID|LOST
>>> disabled 1
>>> inherit 1
>>> freq 1
>>> sample_id_all 1
>>> exclude_guest 1
>>> ------------------------------------------------------------
>>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
>>> Opening: dummy:HG
>>> ------------------------------------------------------------
>>> perf_event_attr:
>>> type 1
>>> size 136
>>> config 0x9
>>> { sample_period, sample_freq } 4000
>>> sample_type IP|TID|TIME|ID|CPU|PERIOD
>>> read_format ID|LOST
>>> inherit 1
>>> mmap 1
>>> comm 1
>>> freq 1
>>> task 1
>>> sample_id_all 1
>>> mmap2 1
>>> comm_exec 1
>>> ksymbol 1
>>> bpf_event 1
>>> ------------------------------------------------------------
>>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
>>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
>>> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
>>> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
>>> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
>>> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
>>> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
>>> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
>>> <SNIP>
>>>
>>> Signed-off-by: Yang Jihong <yangjihong1@xxxxxxxxxx>
>>> ---
>>> tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>>> 1 file changed, 31 insertions(+)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 8872cd037f2c..69e0d8c75aab 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>>> return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>> }
>>> +static int record__config_tracking_events(struct record *rec)
>>> +{
>>> + struct evsel *evsel;
>>> + struct evlist *evlist = rec->evlist;
>>> + struct record_opts *opts = &rec->opts;
>>> +
>>> + /*
>>> + * User space tasks can migrate between CPUs, so when tracing
>>> + * selected CPUs, sideband for all CPUs is still needed.
>>> + */
>>> + if (opts->target.cpu_list) {
>>
>> I am not sure if anyone minds doing this by default, but perhaps
>> we should say something about it on the perf record man page.
>>
> Okay, will add comments to the man page.
>
>>> + evsel = evlist__findnew_tracking_event(evlist);
>>> + if (!evsel)
>>> + return -ENOMEM;
>>> +
>>> + if (!evsel->core.system_wide) {
>>> + evsel->core.system_wide = true;
>>> + evsel__set_sample_bit(evsel, TIME);
>>> + perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>>> + }
>>
>> Perhaps better to export via internel/evsel.h
>>
>> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
>> {
>> if (!evsel->system_wide) {
>> evsel->system_wide = true;
>> if (evlist->needs_map_propagation)
>> __perf_evlist__propagate_maps(evlist, evsel);
>> }
>> }
>>
>> As suggested in response to patch 2, perhaps deal with system_wide
>> inside evlist__findnew_tracking_event()
>>
> Okay, I'll modify it as above, so maybe we need to export perf_evlist__propagate_maps().
>
> As mentioned in the patch 1, __perf_evlist__propagate_maps is low-level and avoid to export it.
> Or can we export perf_evsel__go_system_wide() via through internel/evlist.h?
Yes
> In this way, we do not need to export perf_evlist__propagate_maps().
> If so, would it be more appropriate to call perf_evlist__go_system_wide()?
Sure
>
> Thanks,
> Yang