Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

From: Adrian Hunter
Date: Wed Jul 12 2023 - 11:09:03 EST


On 12/07/23 17:44, Yang Jihong wrote:
> Hello,
>
> On 2023/7/11 21:13, Adrian Hunter wrote:
>> On 4/07/23 10:42, Yang Jihong wrote:
>>> User space tasks can migrate between CPUs, so we need to track side-band
>>> events for all CPUs.
>>>
>>> The specific scenarios are as follows:
>>>
>>>           CPU0                                 CPU1
>>>    perf record -C 0 start
>>>                                taskA starts to be created and executed
>>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>                                     events only deliver to CPU1
>>>                                ......
>>>                                  |
>>>                            migrate to CPU0
>>>                                  |
>>>    Running on CPU0    <----------/
>>>    ...
>>>
>>>    perf record -C 0 stop
>>>
>>> Now perf samples the PC of taskA. However, perf does not record the
>>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>
>>> The sys_perf_event_open invoked is as follows:
>>>
>>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>    <SNIP>
>>>    Opening: cpu-clock
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      disabled                         1
>>>      inherit                          1
>>>      freq                             1
>>>      sample_id_all                    1
>>>      exclude_guest                    1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>    Opening: dummy:HG
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      config                           0x9
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      inherit                          1
>>>      mmap                             1
>>>      comm                             1
>>>      freq                             1
>>>      task                             1
>>>      sample_id_all                    1
>>>      mmap2                            1
>>>      comm_exec                        1
>>>      ksymbol                          1
>>>      bpf_event                        1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>    <SNIP>
>>>
>>> Signed-off-by: Yang Jihong <yangjihong1@xxxxxxxxxx>
>>> ---
>>>   tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>>>   1 file changed, 31 insertions(+)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 8872cd037f2c..69e0d8c75aab 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>>>       return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>   }
>>> +static int record__config_tracking_events(struct record *rec)
>>> +{
>>> +    struct evsel *evsel;
>>> +    struct evlist *evlist = rec->evlist;
>>> +    struct record_opts *opts = &rec->opts;
>>> +
>>> +    /*
>>> +     * User space tasks can migrate between CPUs, so when tracing
>>> +     * selected CPUs, sideband for all CPUs is still needed.
>>> +     */
>>> +    if (opts->target.cpu_list) {
>>
>> I am not sure if anyone minds doing this by default, but perhaps
>> we should say something about it on the perf record man page.
>>
> Okay, will add comments to the man page.
>
>>> +        evsel = evlist__findnew_tracking_event(evlist);
>>> +        if (!evsel)
>>> +            return -ENOMEM;
>>> +
>>> +        if (!evsel->core.system_wide) {
>>> +            evsel->core.system_wide = true;
>>> +            evsel__set_sample_bit(evsel, TIME);
>>> +            perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>>> +        }
>>
>> Perhaps better to export via internal/evsel.h
>>
>> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
>> {
>>     if (!evsel->system_wide) {
>>         evsel->system_wide = true;
>>         if (evlist->needs_map_propagation)
>>             __perf_evlist__propagate_maps(evlist, evsel);
>>     }
>> }
>>
>> As suggested in response to patch 2, perhaps deal with system_wide
>> inside evlist__findnew_tracking_event()
>>
> Okay, I'll modify it as above, so maybe we need to export perf_evlist__propagate_maps().
>
> As mentioned in patch 1, __perf_evlist__propagate_maps is low-level and we should avoid exporting it.
> Or can we export perf_evsel__go_system_wide() via internal/evlist.h?

Yes

> In this way, we do not need to export perf_evlist__propagate_maps().
> If so, would it be more appropriate to call it perf_evlist__go_system_wide()?

Sure
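
Something along these lines, as a rough sketch only. The struct definitions are cut-down stand-ins so the snippet compiles on its own, nr_propagations exists purely for illustration, and propagate_maps_stub() is a hypothetical placeholder for __perf_evlist__propagate_maps(). The real types live in libperf's internal headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the libperf types (assumption: only the fields used here). */
struct perf_evsel {
	bool system_wide;
	int  nr_propagations;	/* illustration only: counts stub calls */
};

struct perf_evlist {
	bool needs_map_propagation;
};

/* Hypothetical placeholder for __perf_evlist__propagate_maps(). */
static void propagate_maps_stub(struct perf_evlist *evlist, struct perf_evsel *evsel)
{
	(void)evlist;
	evsel->nr_propagations++;
}

/* Switch an evsel to system-wide once; re-propagate maps only if the evlist needs it. */
void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
{
	assert(evlist != NULL && evsel != NULL);

	if (!evsel->system_wide) {
		evsel->system_wide = true;
		if (evlist->needs_map_propagation)
			propagate_maps_stub(evlist, evsel);
	}
}
```

Keeping the system_wide check inside the helper makes repeated calls harmless, so evlist__findnew_tracking_event() could call it unconditionally.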

>
> Thanks,
> Yang
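
For the man page text, it may help to spell out what the sideband-only event looks like at the syscall level. The sketch below fills a perf_event_attr to mirror part of the "dummy:HG" dump quoted above. init_sideband_attr() is a hypothetical name, not a perf function, and the sampling fields, ksymbol and bpf_event are left out for brevity.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fill in an attr for a sideband-only dummy event. Only a subset of
 * the fields shown in the quoted debug output is set here.
 */
void init_sideband_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;	/* "type 1" in the dump */
	attr->config = PERF_COUNT_SW_DUMMY;	/* "config 0x9" in the dump */
	attr->inherit = 1;
	attr->mmap = 1;				/* PERF_RECORD_MMAP */
	attr->comm = 1;				/* PERF_RECORD_COMM */
	attr->task = 1;				/* fork/exit records */
	attr->sample_id_all = 1;
	attr->mmap2 = 1;
	attr->comm_exec = 1;
}
```

The attr would then be opened once per CPU with pid -1, as the sys_perf_event_open lines for cpu 0 through 7 in the quoted output show.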