Re: [PATCH v1 2/3] perf/dwc_pcie: Support narrowed time-based counter for long time monitoring
From: Jonathan Cameron
Date: Tue Jun 16 2026 - 07:36:57 EST
On Mon, 15 Jun 2026 14:34:58 +0800
"Yicong Yang" <yang.yicong@xxxxxxxxxxxxx> wrote:
> From: Yufan Dou <douyufan@xxxxxxxxxxxxx>
>
> The DWC PCIe Time-Based Analysis Data Register (the counter for time-based
> events) is architected as 64-bit, but some hardware implementations do not
> implement the full width. On these implementations the counter stops after
> reaching its implemented width. This will limit the usage for short time
> monitoring only. The counter will only cover ~15s for monitoring RX TLP
> payloads on our platform.
>
> Add an optional hrtimer that fires every 2 seconds. It'll take the role
> as the counter overflow interrupt to read-update-reset the counter and
> event counts to break the limits of the narrow counters. It'll only
> apply on timer-based counter. The 2 seconds update period is the half
> of the maximum counting period (4s) of the time-based counter under
> period counting mode of the hardware.
>
> Because fully-implemented 64-bit counters do not need this workaround,
> expose it via the module parameter timer_enable (default N) so that
> users can opt in only on affected platforms.
Hi,
That bit seems less than ideal. Given there is matching by the vsec
in the previous patch, can we make this automatically opt in on appropriate
platforms?
That would require the vendor to not use the same vsec for devices
that don't have this quirk so kind of depends what is already out there.
If this is a 'feature' rather than a 'bug' then longer term it would ideally
be discoverable from a capability style register.
A few other minor things inline,
Thanks,
Jonathan
>
> Before this patch, when counting fio for 10m the counts is incorrect:
> root@localhost:/tmp# echo 0 > /sys/module/dwc_pcie_pmu/parameters/timer_enable
> root@localhost:/tmp# perf stat -e dwc_rootport_20000/rx_pcie_tlp_data_payload/ -- fio --runtime=10m fio_job.config
> [...]
> Run status group 0 (all jobs):
> READ: bw=5594MiB/s (5865MB/s), 5594MiB/s-5594MiB/s (5865MB/s-5865MB/s), io=3278GiB (3519GB), run=600010-600010msec
> [...]
> Performance counter stats for 'system wide':
> 137,438,953,456 dwc_rootport_20000/rx_pcie_tlp_data_payload/
>
> After this patch the counts is as expected:
> root@localhost:/tmp# echo 1 > /sys/module/dwc_pcie_pmu/parameters/timer_enable
> root@localhost:/tmp# perf stat -e dwc_rootport_20000/rx_pcie_tlp_data_payload/ -- fio --runtime=10m fio_job.config
> [...]
> Run status group 0 (all jobs):
> READ: bw=5632MiB/s (5905MB/s), 5632MiB/s-5632MiB/s (5905MB/s-5905MB/s), io=3300GiB (3543GB), run=600013-600013msec
> [...]
> Performance counter stats for 'system wide':
> 3,543,850,268,576 dwc_rootport_20000/rx_pcie_tlp_data_payload/
>
> Signed-off-by: Yufan Dou <douyufan@xxxxxxxxxxxxx>
> Signed-off-by: Yicong Yang <yang.yicong@xxxxxxxxxxxxx>
> +static enum hrtimer_restart dwc_pcie_pmu_hrtimer_callback(struct hrtimer *hrtimer)
> +{
> + struct dwc_pcie_pmu *pcie_pmu = container_of(hrtimer, struct dwc_pcie_pmu, hrtimer);
> + struct perf_event *event = pcie_pmu->time_based_event;
> + struct hw_perf_event *hwc;
> +
> + if (!event)
> + return HRTIMER_NORESTART;
> +
> + hwc = &event->hw;
> + if (hwc->state & PERF_HES_STOPPED)
> + return HRTIMER_NORESTART;
> +
> + dwc_pcie_pmu_event_update(event);
> + dwc_pcie_pmu_reset_time_based_counter(event);
> + hrtimer_forward_now(hrtimer, ns_to_ktime(DWC_PCIE_PMU_TIMER_PERIOD_NS));
Small readability thing but I would put a blank line here.
> + return HRTIMER_RESTART;
> +}
> +
> static int dwc_pcie_pmu_event_init(struct perf_event *event)
> {
> struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> @@ -480,8 +526,13 @@ static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
>
> if (type == DWC_PCIE_LANE_EVENT)
> dwc_pcie_pmu_lane_event_enable(pcie_pmu, event, true);
> - else if (type == DWC_PCIE_TIME_BASE_EVENT)
> + else if (type == DWC_PCIE_TIME_BASE_EVENT) {
Same as below - if () {
} else if {
}
> dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
> + if (timer_enable)
> + hrtimer_start(&pcie_pmu->hrtimer,
> + ns_to_ktime(DWC_PCIE_PMU_TIMER_PERIOD_NS),
> + HRTIMER_MODE_REL_PINNED_HARD);
> + }
> }
>
> static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
> @@ -497,8 +548,10 @@ static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
>
> if (type == DWC_PCIE_LANE_EVENT)
> dwc_pcie_pmu_lane_event_enable(pcie_pmu, event, false);
> - else if (type == DWC_PCIE_TIME_BASE_EVENT)
> + else if (type == DWC_PCIE_TIME_BASE_EVENT) {
Style wise, all legs should have {} if any need it, so add them to the if above as well.
> dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
> + hrtimer_cancel(&pcie_pmu->hrtimer);
> + }
>
> hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> }
> @@ -726,6 +779,8 @@ static int dwc_pcie_pmu_probe(struct platform_device *plat_dev)
> pcie_pmu->ras_des_offset = vsec;
> pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
> pcie_pmu->on_cpu = -1;
> + hrtimer_setup(&pcie_pmu->hrtimer, dwc_pcie_pmu_hrtimer_callback,
> + CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED_HARD);
> pcie_pmu->pmu = (struct pmu){
> .name = name,
> .parent = &plat_dev->dev,