Re: [PATCH v6 2/2] drivers/perf: hisi: Add driver for HiSilicon PCIe PMU

From: liuqi (BA)
Date: Fri Jun 18 2021 - 05:32:53 EST




On 2021/6/18 1:57, Will Deacon wrote:
On Thu, Jun 17, 2021 at 07:00:26PM +0800, liuqi (BA) wrote:


On 2021/6/16 21:42, Will Deacon wrote:
Hi,

On Wed, Jun 16, 2021 at 09:54:23AM +0800, liuqi (BA) wrote:
On 2021/6/15 17:35, Will Deacon wrote:
On Tue, Jun 15, 2021 at 04:57:09PM +0800, liuqi (BA) wrote:
On 2021/6/12 0:23, Will Deacon wrote:
On Mon, May 31, 2021 at 09:32:31PM +0800, Qi Liu wrote:
+ /* Process data to set unit of latency as "us". */
+ if (is_latency_event(idx))
+ return div64_u64(data * us_per_cycle, data_ext);
+
+ if (is_bus_util_event(idx))
+ return div64_u64(data * us_per_cycle, data_ext);
+
+ if (is_buf_util_event(idx))
+ return div64_u64(data, data_ext * us_per_cycle);

Why do we need to do all this division in the kernel? Can't we just expose
the underlying values and let userspace figure out what it wants to do with
the numbers?

Our PMU hardware support 8 sets of counters to count bandwidth, latency and
utilization events.

For example, when users set latency event, common counter will count delay
cycles, and extern counter count number of PCIe packets automaticly. And we
do not have a event number for counting number of PCIe packets.

So this division cannot move to userspace tool.

Why can't you expose the packet counter as an extra event to userspace?

Maybe I didn’t express it clearly.

As there is no hardware event number for PCIe packets counting, extern
counter count packets *automaticly* when latency events is selected by
users.

This means users cannot set "config=0xXX" to start packets counting event.
So we can only get the value of counter and extern counter in driver and do
the division, then pass the result to userspace.

I still think it would be ideal if we could expose both values to userspace
rather than combine them somehow. Hmm. Anyway...

I struggled to figure out exactly what's being counted from the
documentation patch (please update that). Please can you explain exactly
what appears in the HISI_PCIE_CNT and HISI_PCIE_EXT_CNT registers for the
different modes of operation? Without that, the ratios you've chosen to
report seem rather arbitrary.


PCIe PMU events can be devided into 2 types: one type is counted by
HISI_PCIE_CNT, the other type is counted by HISI_PCIE_EXT_CNT and
HISI_PCIE_CNT, including bandwidth events, latency events, buffer
utilization and bus utilization.

if user sets "event=0x10, subevent=0x02", this means "latency of RX memory
read" is selected. HISI_PCIE_CNT counts total delay cycles and
HISI_PCIE_EXT_CNT counts PCIe packets number at the same time. So PMU driver
could obtain average latency by caculating: HISI_PCIE_CNT /
HISI_PCIE_EXT_CNT.

if users sets "event=0x04, subevent=0x01", this means bandwidth of RX memory
read is selected. HISI_PCIE_CNT counts total packet data volume and
HISI_PCIE_EXT_CNT counts cycles, so PMU driver could obtain average
bandwidth by caculating: HISI_PCIE_CNT / HISI_PCIE_EXT_CNT.

The same logic is used when calculating bus utilization and buffer
utilization. Seems I should add this part in Document patch,I 'll do this in
next version, thanks.

I also couldn't figure out how the latency event works. For example, I was
assuming it would be a filter (a bit like the length), so you could say
things like "I'm only interested in packets with a latency higher than x"
but it doesn't look like it works that way.

Thanks,

latency is not a filter, PCIe PMU has a group of lactency events, their
event number are within the latency_events_list, and the above explains how
latency events work.

PMU drivers have TLP length filter for bandwidth events, users could set
like "I only interested in bandwidth of packets with TLP length bigger than
x".

Thanks for the explanations, I think I get it a bit better now. But I still
think we should be exposing both of the values to userspace instead of
reporting the ratio from which the individual counters are then
unrecoverable.

It will complicate the driver slightly, but can we instead expose the
events independently and then allowing scheduling some of them in groups?

That way we just treat HISI_PCIE_CNT and HISI_PCIE_EXT_CNT as separate
counters, but with a scheduling constraint that events in a register pair
must be in the same group.

Will

Hi Will,

I got what you mean, treating HISI_PCIE_CNT and HISI_PCIE_EXT_CNT as separate counters is a great idea, but here is a hardware limitation.

The behavior of HISI_PCIE_EXT_CNT is controlled by hardware logical, so HISI_PCIE_EXT_CNT only works when latency/bandwidth/... events number are set in HISI_PCIE_EVENT_CTRL. So driver cannot separate this two counters, they must work together because of hardware limitation.

We try to expose both values of counters at the same time, but there seems only one "event->count" for driver to expose value. Is there any method to do this?

Thanks,
Qi

.