Re: [PATCH v1 2/3] perf/dwc_pcie: Support narrowed time-based counter for long time monitoring
From: Yicong Yang
Date: Tue Jun 16 2026 - 09:32:25 EST
On 6/16/26 7:34 PM, Jonathan Cameron wrote:
> On Mon, 15 Jun 2026 14:34:58 +0800
> "Yicong Yang" <yang.yicong@xxxxxxxxxxxxx> wrote:
>
>> From: Yufan Dou <douyufan@xxxxxxxxxxxxx>
>>
>> The DWC PCIe Time-Based Analysis Data Register (the counter for time-based
>> events) is architected as 64-bit, but some hardware implementations do not
>> implement the full width. On these implementations the counter stops after
>> reaching its implemented width. This will limit the usage for short time
>> monitoring only. The counter will only cover ~15s for monitoring RX TLP
>> payloads on our platform.
>>
>> Add an optional hrtimer that fires every 2 seconds. It'll take the role
>> as the counter overflow interrupt to read-update-reset the counter and
>> event counts to break the limits of the narrow counters. It'll only
>> apply on timer-based counter. The 2 seconds update period is the half
>> of the maximum counting period (4s) of the time-based counter under
>> period counting mode of the hardware.
>>
>> Because fully-implemented 64-bit counters do not need this workaround,
>> expose it via the module parameter timer_enable (default N) so that
>> users can opt in only on affected platforms.
> Hi,
>
> That bit seems less than ideal. Given there is matching by the vsec
> in the previous patch, can we make this automatically opt in on appropriate
> platforms?
sure, we can make this opt in on our platform first by matching the
root port's vendor ID. it's implemented as a module parameter since
no discovery mechanism for this, then user can enable this if they found
they can only do short time monitoring (symptom attached in the commit)
>
> That would require the vendor to not use the same vsec for devices
> that don't have this quirk so kind of depends what is already out there.
>
> If this is a 'feature' rather than a 'bug' then longer term it would ideally
> be discoverable from a capability style register.
I may found that it's hard to regard this as a feature or bug, but
the implementation choice of the vendor...
>
> A few other minor things inline,
>
will fix in the next version.
Thanks.
> Thanks,
>
> Jonathan
>
>
>>
>> Before this patch, when counting fio for 10m the counts is incorrect:
>> root@localhost:/tmp# echo 0 > /sys/module/dwc_pcie_pmu/parameters/timer_enable
>> root@localhost:/tmp# perf stat -e dwc_rootport_20000/rx_pcie_tlp_data_payload/ -- fio --runtime=10m fio_job.config
>> [...]
>> Run status group 0 (all jobs):
>> READ: bw=5594MiB/s (5865MB/s), 5594MiB/s-5594MiB/s (5865MB/s-5865MB/s), io=3278GiB (3519GB), run=600010-600010msec
>> [...]
>> Performance counter stats for 'system wide':
>> 137,438,953,456 dwc_rootport_20000/rx_pcie_tlp_data_payload/
>>
>> After this patch the counts is as expected:
>> root@localhost:/tmp# echo 1 > /sys/module/dwc_pcie_pmu/parameters/timer_enable
>> root@localhost:/tmp# perf stat -e dwc_rootport_20000/rx_pcie_tlp_data_payload/ -- fio --runtime=10m fio_job.config
>> [...]
>> Run status group 0 (all jobs):
>> READ: bw=5632MiB/s (5905MB/s), 5632MiB/s-5632MiB/s (5905MB/s-5905MB/s), io=3300GiB (3543GB), run=600013-600013msec
>> [...]
>> Performance counter stats for 'system wide':
>> 3,543,850,268,576 dwc_rootport_20000/rx_pcie_tlp_data_payload/
>>
>> Signed-off-by: Yufan Dou <douyufan@xxxxxxxxxxxxx>
>> Signed-off-by: Yicong Yang <yang.yicong@xxxxxxxxxxxxx>
>
>> +static enum hrtimer_restart dwc_pcie_pmu_hrtimer_callback(struct hrtimer *hrtimer)
>> +{
>> + struct dwc_pcie_pmu *pcie_pmu = container_of(hrtimer, struct dwc_pcie_pmu, hrtimer);
>> + struct perf_event *event = pcie_pmu->time_based_event;
>> + struct hw_perf_event *hwc;
>> +
>> + if (!event)
>> + return HRTIMER_NORESTART;
>> +
>> + hwc = &event->hw;
>> + if (hwc->state & PERF_HES_STOPPED)
>> + return HRTIMER_NORESTART;
>> +
>> + dwc_pcie_pmu_event_update(event);
>> + dwc_pcie_pmu_reset_time_based_counter(event);
>> + hrtimer_forward_now(hrtimer, ns_to_ktime(DWC_PCIE_PMU_TIMER_PERIOD_NS));
>
> Small readability thing but I would put a blank line here.
>
>> + return HRTIMER_RESTART;
>> +}
>> +
>> static int dwc_pcie_pmu_event_init(struct perf_event *event)
>> {
>> struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> @@ -480,8 +526,13 @@ static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
>>
>> if (type == DWC_PCIE_LANE_EVENT)
>> dwc_pcie_pmu_lane_event_enable(pcie_pmu, event, true);
>> - else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> + else if (type == DWC_PCIE_TIME_BASE_EVENT) {
> Same as below - if () {
> } else if {
> }
>> dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
>> + if (timer_enable)
>> + hrtimer_start(&pcie_pmu->hrtimer,
>> + ns_to_ktime(DWC_PCIE_PMU_TIMER_PERIOD_NS),
>> + HRTIMER_MODE_REL_PINNED_HARD);
>> + }
>> }
>>
>> static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
>> @@ -497,8 +548,10 @@ static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
>>
>> if (type == DWC_PCIE_LANE_EVENT)
>> dwc_pcie_pmu_lane_event_enable(pcie_pmu, event, false);
>> - else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> + else if (type == DWC_PCIE_TIME_BASE_EVENT) {
>
> Style wise, all legs should have {} if any need it, so add them to the if above as well.
>
>> dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
>> + hrtimer_cancel(&pcie_pmu->hrtimer);
>> + }
>>
>> hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
>> }
>> @@ -726,6 +779,8 @@ static int dwc_pcie_pmu_probe(struct platform_device *plat_dev)
>> pcie_pmu->ras_des_offset = vsec;
>> pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
>> pcie_pmu->on_cpu = -1;
>> + hrtimer_setup(&pcie_pmu->hrtimer, dwc_pcie_pmu_hrtimer_callback,
>> + CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED_HARD);
>> pcie_pmu->pmu = (struct pmu){
>> .name = name,
>> .parent = &plat_dev->dev,