Re: [PATCH 3/3] perf/x86/intel: Support auto counter reload

From: Liang, Kan
Date: Tue Oct 01 2024 - 10:44:32 EST




On 2024-10-01 7:16 a.m., Andi Kleen wrote:
>
> I hope the perf tools will support a nicer syntax, the mask is quite
> obscure.

Yes, it's a little bit hard to use, but it's workable with the current
perf tool. That's why I posted the kernel patch separately.
Thomas will work on improving the tool side, which will provide a new,
more convenient option.
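
For the record, the mask is already programmable with today's perf tool
via raw config2, e.g. (illustrative events, just to show the shape of
the syntax):

  perf stat -e '{cpu/event=0xd0,umask=0x1,config2=0x2/,cpu/event=0xc2,umask=0x2,config2=0x1/}' -- <workload>

Each config2 bit refers to the position of another event within the
same group, which is exactly the obscure part the tool work should hide.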

>
> On Mon, Sep 30, 2024 at 08:41:22AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>> }
>>
>> +static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
>> +{
>> +        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +        int msr_b, msr_c;
>> +
>> +        if (!mask && cpuc->acr_cfg_b[idx] == mask)
>> +                return;
>
>
> if (!mask && !cpuc->acr_cfg_b[idx])
>

Sure

>> +
>> +        if (idx < INTEL_PMC_IDX_FIXED) {
>> +                msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
>> +                msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
>> +        } else {
>> +                msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
>> +                msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
>> +                idx -= INTEL_PMC_IDX_FIXED;
>> +        }
>
> Does this handle metrics correctly?
>

Do you mean perf metrics? Perf errors out if a metrics event is
detected.
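
On the kernel side, the rejection would sit in the reloadable-event
check; a minimal sketch (the placement is illustrative, not necessarily
the final code):

        /* Topdown metrics events have no reloadable counter; reject them */
        if (is_metric_event(event))
                return -EINVAL;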

> I assume you ran the fuzzer over this.
>
>> +        if (cpuc->acr_cfg_b[idx] != mask) {
>> +                wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
>> +                cpuc->acr_cfg_b[idx] = mask;
>> +        }
>> +        /* Only need to update the reload value when there is a valid config value. */
>> +        if (mask && cpuc->acr_cfg_c[idx] != reload) {
>> +                wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
>> +                cpuc->acr_cfg_c[idx] = reload;
>
> Can reload be larger than the counter width? What happens then?

I will add a check in the hw_config() to make sure that the period
fits within the counter width for the auto-reload case.
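
Roughly along these lines (a sketch, assuming the check sits next to
the other ACR validation in the hw_config() path):

        /* An ACR reload value must fit into the counter */
        if (is_acr_event_group(event) &&
            event->attr.sample_period > x86_pmu.cntval_mask)
                return -EINVAL;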

>
>> return c2;
>> }
>>
>> @@ -3948,6 +4004,78 @@ static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
>> return test_bit(idx, (unsigned long *)&intel_cap->capabilities);
>> }
>>
>> +static bool intel_pmu_is_acr_group(struct perf_event *event)
>> +{
>> +        if (!hybrid(event->pmu, acr_cntr_mask64))
>> +                return false;
>
> Shouldn't this error when the group leader
> has the flag set?

The kernel only sets the flag on the group leader when both config2 and
acr_cntr_mask64 have values. The case where !acr_cntr_mask64 but a
group leader has the flag set cannot happen.
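
In other words, the flag is only attached once both conditions hold,
roughly (a sketch of the ordering; PERF_X86_EVENT_ACR follows the
flag-naming convention of the series):

        /* Both the enumeration and the user mask must be present */
        if (hybrid(event->pmu, acr_cntr_mask64) && event->attr.config2)
                event->group_leader->hw.flags |= PERF_X86_EVENT_ACR;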

>
>> +        /* The group leader has the ACR flag set */
>> +        if (is_acr_event_group(event))
>> +                return true;
>> +
>> +        /* The acr_mask is set */
>> +        if (event->attr.config2)
>> +                return true;
> [...]
>> +         * the group. Reconfigure the dyn_mask of each X86 event
>> +         * every time a new event is added.
>> +         *
>> +         * Check whether the reloadable counters are enough and
>> +         * initialize the dyn_mask.
>> +         */
>> +        if (intel_pmu_acr_check_reloadable_event(event))
>> +                return -EINVAL;
>> +
>> +        /* Reconfigure the dyn_mask for each event */
>> +        intel_pmu_set_acr_dyn_mask(leader, event_idx++, event);
>> +        for_each_sibling_event(sibling, leader)
>> +                intel_pmu_set_acr_dyn_mask(sibling, event_idx++, event);
>> +        intel_pmu_set_acr_dyn_mask(event, event_idx, event);
>> +
>
> Shouldn't there be an error somewhere when a mask bit is set that
> exceeds the group? (maybe I missed it)

At that point we have no idea how big the whole group is, because we
don't know whether the current event is the last one in the group.
Even if the mask (config2) exceeds the group, the invalid part is
ignored, so it should be harmless.

>
> I assume it could #GP on the MSR write, or maybe even overflow into
> some other field.

The mask (config2) set by the user is never written to the MSRs
directly. It is only used to look up the reloadable mask and the
cause-reload mask, which come from the enumeration. That guarantees
that only the supported MSRs/counters will be accessed.

The mask (config2) is also used in intel_pmu_update_acr_mask(), which
runs after the scheduler. The n - n0 limit guarantees that only the
bits within the group are converted. The invalid part of the mask
(config2) is ignored.

+        /* Convert the group index into the counter index */
+        for_each_set_bit(off, (unsigned long *)&event->attr.config2, n - n0)
+                set_bit(assign[n0 + off], (unsigned long *)&event->hw.config1);
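
To make the conversion concrete (hypothetical numbers): with n0 = 0,
n = 3, assign[] = {5, 1, 3} and config2 = 0x6, the group-relative bits
1 and 2 map to counters 1 and 3, so hw.config1 becomes 0xa. A config2
bit at position >= n - n0 is never visited, which is why an oversized
mask stays harmless.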



Thanks,
Kan