Re: [PATCH] perf: RISC-V: Check standard event availability

From: Atish Patra
Date: Mon Apr 15 2024 - 20:01:20 EST


On 4/10/24 18:40, Samuel Holland wrote:
Hi Atish,

On 2024-03-18 2:44 PM, Atish Patra wrote:
On Wed, Jan 3, 2024 at 8:54 AM Samuel Holland <samuel.holland@xxxxxxxxxx> wrote:

The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events appear in the `perf list`
output, even if they are not actually implemented. Add logic to check
which events are supported by the hardware (i.e. can be mapped to some
counter), so only usable events are reported to userspace.


Thanks for the patch.
This adds tons of SBI calls at every boot for a use case which is at
best confusing for a subset of users who actually wants to run perf.

I should have been clearer in the patch description. This is not just a cosmetic
change; because perf sees these as valid events, it tries to use them in
commands like `perf stat`. When the error from SBI_EXT_PMU_COUNTER_CFG_MATCH
causes the ->add() callback to fail, this prevents any other events from being
scheduled on that same CPU (search can_add_hw in kernel/events/core.c). That is
why the dTLB/iTLB miss counts are missing in the "before" example below.


Thanks for explaining the problem. I can reproduce it in qemu as well if enough number of invalid events given on the command line and the workload is short enough.

This probing can be done at runtime by invoking the
pmu_sbi_check_event from pmu_sbi_event_map.
We can update the event map after that so that it doesn't need to
invoke pmu_sbi_check_event next time.

I tried to implement this suggestion, but it did not work. The SBI interface
does not distinguish between "none of the counters can monitor the specified
event [because the event is unsupported]" and "none of the counters can monitor
the specified event [because the counters are busy]". It is not sufficient for
the kernel to verify that at least one counter is available before performing
the check, because certain events may only be usable on a subset of counters
(per riscv,event-to-mhpmcounters), and the kernel does not know that mapping.


Yeah. My suggestion was to fix the perf list issue which is different than the issue reported now.

As a result, checking for event support is only reliable when none of the
counters are in use. So the check can be asynchronous/deferred to later in the
boot process, but I believe it still needs to be performed for all events before
userspace starts using the counters.


We should defer it a work queue for sure. We can also look at improving SBI PMU extension to support bulk matching behavior as well.

However, I think a better solution would be to just rely on the json file mappings instead of making SBI calls. We are going to have the event encoding and mappings in the json in the future.

I had added it only for platforms with counter delegation[1] but I think this can be generalized for platforms with SBI PMU as well.

I had some hacks to specify the legacy event encodings but Ian rogers improved with a generic support by preferring sysfs/json event encodings over fixed ones. I am yet to rebase and try Ian's series on top of the counter delegation though. Thoughts ?

[1] https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@xxxxxxxxxxxx/
[2] https://lore.kernel.org/bpf/20240415063626.453987-2-irogers@xxxxxxxxxx/T/


Regards,
Samuel

Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
---
Before this patch:
$ perf list hw

List of pre-defined events (to be used in -e or -M):

branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]

$ perf stat -ddd true

Performance counter stats for 'true':

4.36 msec task-clock # 0.744 CPUs utilized
1 context-switches # 229.325 /sec
0 cpu-migrations # 0.000 /sec
38 page-faults # 8.714 K/sec
4,375,694 cycles # 1.003 GHz (60.64%)
728,945 instructions # 0.17 insn per cycle
79,199 branches # 18.162 M/sec
17,709 branch-misses # 22.36% of all branches
181,734 L1-dcache-loads # 41.676 M/sec
5,547 L1-dcache-load-misses # 3.05% of all L1-dcache accesses
<not counted> LLC-loads (0.00%)
<not counted> LLC-load-misses (0.00%)
<not counted> L1-icache-loads (0.00%)
<not counted> L1-icache-load-misses (0.00%)
<not counted> dTLB-loads (0.00%)
<not counted> dTLB-load-misses (0.00%)
<not counted> iTLB-loads (0.00%)
<not counted> iTLB-load-misses (0.00%)
<not counted> L1-dcache-prefetches (0.00%)
<not counted> L1-dcache-prefetch-misses (0.00%)

0.005860375 seconds time elapsed

0.000000000 seconds user
0.010383000 seconds sys

After this patch:
$ perf list hw

List of pre-defined events (to be used in -e or -M):

branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]

$ perf stat -ddd true

Performance counter stats for 'true':

5.16 msec task-clock # 0.848 CPUs utilized
1 context-switches # 193.817 /sec
0 cpu-migrations # 0.000 /sec
37 page-faults # 7.171 K/sec
5,183,625 cycles # 1.005 GHz
961,696 instructions # 0.19 insn per cycle
85,853 branches # 16.640 M/sec
20,462 branch-misses # 23.83% of all branches
243,545 L1-dcache-loads # 47.203 M/sec
5,974 L1-dcache-load-misses # 2.45% of all L1-dcache accesses
<not supported> LLC-loads
<not supported> LLC-load-misses
<not supported> L1-icache-loads
<not supported> L1-icache-load-misses
<not supported> dTLB-loads
19,619 dTLB-load-misses
<not supported> iTLB-loads
6,831 iTLB-load-misses
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses

0.006085625 seconds time elapsed

0.000000000 seconds user
0.013022000 seconds sys


drivers/perf/riscv_pmu_sbi.c | 37 ++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 16acd4dcdb96..b58a70ee8317 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -86,7 +86,7 @@ struct sbi_pmu_event_data {
};
};

-static const struct sbi_pmu_event_data pmu_hw_event_map[] = {
+static struct sbi_pmu_event_data pmu_hw_event_map[] = {
[PERF_COUNT_HW_CPU_CYCLES] = {.hw_gen_event = {
SBI_PMU_HW_CPU_CYCLES,
SBI_PMU_EVENT_TYPE_HW, 0}},
@@ -120,7 +120,7 @@ static const struct sbi_pmu_event_data pmu_hw_event_map[] = {
};

#define C(x) PERF_COUNT_HW_CACHE_##x
-static const struct sbi_pmu_event_data pmu_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
+static struct sbi_pmu_event_data pmu_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
[C(L1D)] = {
@@ -265,6 +265,36 @@ static const struct sbi_pmu_event_data pmu_cache_event_map[PERF_COUNT_HW_CACHE_M
},
};

+static void pmu_sbi_check_event(struct sbi_pmu_event_data *edata)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH,
+ 0, cmask, 0, edata->event_idx, 0, 0);
+ if (!ret.error) {
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP,
+ ret.value, 0x1, SBI_PMU_STOP_FLAG_RESET, 0, 0, 0);
+ } else if (ret.error == SBI_ERR_NOT_SUPPORTED) {
+ /* This event cannot be monitored by any counter */
+ edata->event_idx = -EINVAL;
+ }
+}
+
+static void pmu_sbi_update_events(void)
+{
+ /* Ensure events are not already mapped to a counter */
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP,
+ 0, cmask, SBI_PMU_STOP_FLAG_RESET, 0, 0, 0);
+
+ for (int i = 0; i < ARRAY_SIZE(pmu_hw_event_map); i++)
+ pmu_sbi_check_event(&pmu_hw_event_map[i]);
+
+ for (int i = 0; i < ARRAY_SIZE(pmu_cache_event_map); i++)
+ for (int j = 0; j < ARRAY_SIZE(pmu_cache_event_map[i]); j++)
+ for (int k = 0; k < ARRAY_SIZE(pmu_cache_event_map[i][j]); k++)
+ pmu_sbi_check_event(&pmu_cache_event_map[i][j][k]);
+}
+
static int pmu_sbi_ctr_get_width(int idx)
{
return pmu_ctr_list[idx].width;
@@ -1046,6 +1076,9 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
if (pmu_sbi_get_ctrinfo(num_counters, &cmask))
goto out_free;

+ /* Check which standard events are available */
+ pmu_sbi_update_events();
+
ret = pmu_sbi_setup_irqs(pmu, pdev);
if (ret < 0) {
pr_info("Perf sampling/filtering is not supported as sscof extension is not available\n");
--
2.42.0


_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv