Re: [PATCH 3/4] cxl: CXL Performance Monitoring Unit driver

From: Liang, Kan
Date: Tue Mar 07 2023 - 11:23:54 EST




On 2023-03-07 4:19 a.m., Jonathan Cameron wrote:
>>> A hybrid approach of exposing the VID / GID / Mask sets to userspace that
>>> then deals with pretty naming etc would work though.
>>>
>>> For summed events we also don't have enough information without similar
>>> exports of VID / GID / Mask.
>>>
>>> We can't use the fact that events are exposed which 'might' be summable in
>>> hardware because we may have only subsets supported by the hardware.
>>>
>>> E.g.
>>>
>>> VID / GID / Mask
>>>
>>> 19e5 / 1 / 0x1
>>> 19e5 / 1 / 0xe
>>>
>>> etc
>>>
>>> There are going to be a lot of CXL device implementations so I think we need
>>> to find a sensible interface to pass this information to perf tool.
>>>
>>> Currently I'm thinking a new sysfs ABI that has
>>>
>>> event_groupX_vid, event_groupX_gid, event_groupX_msk, event_groupX_fixed
>>> with one set of entries for every event group.
>>>
>>> Using that, perf tool can apply various rules to figure out sensible subsets
>>> to advertise.
>>>
>> Usually, we only hardcode the generic/architectural or some widely used
>> events in the kernel. The perf tool should be the place for the vendor
>> specific/device specific events.
>>
>> We may not want to expose the counter constraints information to the
>> user space. It's better to keep the information in the kernel. We can
>> use it to check whether the user input is valid in the event_init.
>>
>> For a specific device, the constraint information should be fixed. We
>> may create a dedicated event list file in perf tool for each device.
>>
>> For example, with your example, you can define the events below in the
>> event list file. (This just demonstrates the idea. It should be in JSON
>> format in the file.)
>> Event Name / VID / GID / Mask
>> A / 19e5 / 1 / 0x1
>> B / 19e5 / 1 / 0x2
>> C / 19e5 / 1 / 0x4
>> D / 19e5 / 1 / 0x8
>> BC / 19e5 / 1 / 0x6
>> BD / 19e5 / 1 / 0xa
>> CD / 19e5 / 1 / 0xc
>> BCD / 19e5 / 1 / 0xe
>>
>> perf stat -e BD will give you the summed event B + D.
> To me, that seems to be a non-starter. We will have 100s to 1000s of
> device variants, each of which may have several instances of CPMU with
> different events. The available events on some devices will be
> firmware version dependent.
>
> The explosion in json files will rapidly become unmanageable.

OK. I agree.

> Given PMU capabilities are fully discoverable we should use that.
>
> We'd then need json to describe what groups make sense, but not which
> ones the particular hardware supports. Thus we'd need one set of
> data for a particular VID/GID pair, instead of one for every device
> with its own mix of multiple masks for each VID/GID pair.
>
> In a similar fashion to metrics, not all combinations are likely to be
> useful things to measure so we can describe just the ones that seem
> reasonable.
>

It sounds reasonable. There will be a JSON file which includes all the
reasonable combinations of VID/GID/mask. The perf tool can then decide
which ones are available at runtime by comparing them against what the
CPMU reports, e.g., perf list will only list the events/metrics that the
hardware actually supports.
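
A rough sketch of that runtime check, in Python only to show the idea.
This is not perf tool code, and every name in it (EventCap, EventDesc,
available_events) is made up for illustration. It assumes an event from
the JSON file is usable only if a single discovered VID/GID/Mask entry
covers all of its mask bits; the real rule may turn out to be different.

```python
from typing import NamedTuple


class EventCap(NamedTuple):
    """One VID / GID / Mask set discovered from the hardware (hypothetical)."""
    vid: int
    gid: int
    mask: int


class EventDesc(NamedTuple):
    """One named entry from the shared JSON event list (hypothetical)."""
    name: str
    vid: int
    gid: int
    mask: int


def available_events(catalog, caps):
    """Return the names of catalog events that the discovered caps can count.

    Assumption: an event is available only if one capability entry with the
    same VID/GID has every bit of the event's mask set.
    """
    names = []
    for ev in catalog:
        for cap in caps:
            same_group = (ev.vid, ev.gid) == (cap.vid, cap.gid)
            # No bit of the event mask may fall outside the supported mask.
            if same_group and ev.mask and (ev.mask & ~cap.mask) == 0:
                names.append(ev.name)
                break
    return names


# The two sets from the example earlier in the thread.
CAPS = [EventCap(0x19E5, 1, 0x1), EventCap(0x19E5, 1, 0xE)]

# A few catalog entries; "AB" is an extra entry added only for this sketch.
CATALOG = [
    EventDesc("A", 0x19E5, 1, 0x1),
    EventDesc("BD", 0x19E5, 1, 0xA),
    EventDesc("BCD", 0x19E5, 1, 0xE),
    EventDesc("AB", 0x19E5, 1, 0x3),
]

if __name__ == "__main__":
    print(available_events(CATALOG, CAPS))
```

With those inputs, A, BD and BCD would be listed, while AB (0x3) would
be dropped because neither 0x1 nor 0xe covers both of its bits.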

Thanks,
Kan