Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

From: David Coe
Date: Fri May 14 2021 - 06:48:26 EST

Hi all!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:
On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf results, especially when event multiplexing
is needed (i.e. more events than available counters), since
the current logic keeps track of the previously read counter value
and subsequently re-programs the counter to continue counting the event.
With power-gating enabled, we cannot guarantee successful re-programming
of the counter.

Work around this issue by:

1. Modifying the ordering of setting/reading counters and enabling/
disabling csources to only access the counter when the csource
is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.

This has been tested on systems with and without power-gating.
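
For anyone skimming the thread, the two steps above can be pictured with a small toy model. This is only a sketch in Python, not the kernel driver code: the GatedCounter and Event classes, the start()/stop() helpers and the csource values 1 and 2 are all made up for illustration. The only behaviour taken from the description is that a counter with csource == 0 ignores writes and reads back as zero.

```python
class GatedCounter:
    """Toy model of one counter; csource == 0 means it is power-gated."""

    def __init__(self):
        self.csource = 0
        self._value = 0

    def write(self, value):
        # While gated, writes are silently dropped.
        if self.csource != 0:
            self._value = value

    def read(self):
        # While gated, reads return zero.
        return self._value if self.csource != 0 else 0

    def tick(self, n):
        # The hardware only counts while a source is selected.
        if self.csource != 0:
            self._value += n


class Event:
    """Hypothetical software-side event with a running total."""

    def __init__(self, csource):
        self.csource = csource
        self.count = 0


def start(ctr, ev):
    ctr.csource = ev.csource  # select the source first, so the counter is accessible
    ctr.write(0)              # then always start counting from zero


def stop(ctr, ev):
    ev.count += ctr.read()    # read while csource is still non-zero, accumulate
    ctr.csource = 0           # only then clear the source (counter may be gated)


# Two events multiplexed onto a single counter.
ctr = GatedCounter()
a, b = Event(csource=1), Event(csource=2)
for ev, hits in [(a, 5), (b, 7), (a, 3)]:
    start(ctr, ev)
    ctr.tick(hits)
    stop(ctr, ev)

print(a.count, b.count)  # 8 7
```

Because the running total lives in software, nothing ever has to be written back into a counter that might be gated.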

I've just noticed kernel-5.13-rc1 includes your full iommu enchilada. A quick test with Ubuntu's mainline ppa debs (and a home-spun perf) gives, on a Ryzen 2400G, what seem very satisfactory results. Bravo!

 Performance counter stats for 'system wide':

         0  amd_iommu_0/cmd_processed/          (33.32%)
         0  amd_iommu_0/cmd_processed_inv/      (33.34%)
         0  amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.38%)
       615  amd_iommu_0/int_dte_hit/            (33.44%)
         5  amd_iommu_0/int_dte_mis/            (33.44%)
     1,347  amd_iommu_0/mem_dte_hit/            (33.46%)
    19,127  amd_iommu_0/mem_dte_mis/            (33.44%)
        71  amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.43%)
       754  amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.41%)
     1,777  amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.36%)
    20,163  amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.32%)
         0  amd_iommu_0/mem_pass_excl/          (33.25%)
         0  amd_iommu_0/mem_pass_pretrans/      (33.28%)
    27,283  amd_iommu_0/mem_pass_untrans/       (33.27%)
         0  amd_iommu_0/mem_target_abort/       (33.29%)
       645  amd_iommu_0/mem_trans_total/        (33.32%)
         0  amd_iommu_0/page_tbl_read_gst/      (33.28%)
       183  amd_iommu_0/page_tbl_read_nst/      (33.30%)
        45  amd_iommu_0/page_tbl_read_tot/      (33.30%)
         0  amd_iommu_0/smi_blk/                (33.32%)
         0  amd_iommu_0/smi_recv/               (33.28%)
         0  amd_iommu_0/tlb_inv/                (33.27%)
         0  amd_iommu_0/vapic_int_guest/        (33.28%)
       613  amd_iommu_0/vapic_int_non_guest/    (33.26%)

       9.998673791 seconds time elapsed
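
A note on reading the percentages: with more events than counters, perf multiplexes them, and the figure in parentheses is the share of the measurement window during which each event was actually on a counter. The printed counts are, as far as I understand it, estimates scaled up by time_enabled / time_running. A rough sketch of that arithmetic follows; the function names and the numbers are made up for illustration and are not taken from the run above.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate a full-window count from a multiplexed raw count."""
    if time_running == 0:
        return 0  # the event was never scheduled, so there is nothing to scale
    return round(raw * time_enabled / time_running)


def running_pct(time_enabled, time_running):
    """Share of the window the event spent on a counter, as a percentage."""
    return 100.0 * time_running / time_enabled


# Hypothetical event that was scheduled for 10 out of 30 time units
# and counted 100 occurrences while it was running.
print(scale_count(100, 30, 10))           # 300
print(f"{running_pct(30, 10):.2f}%")      # 33.33%
```

So at roughly 33%, each count shown is about three times what the hardware saw directly, which is why the accumulate-on-stop logic needs to be right for multiplexed runs like this one.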

Running Windows 10 etc. under QEMU/KVM produces nothing untoward. Again, congratulations and many thanks.