Re: [PATCH 1/2] powerpc/vpa_pmu: Add interface to expose vpa counters via perf

From: Christophe Leroy
Date: Fri Sep 13 2024 - 02:30:54 EST




On 28/08/2024 at 12:21, Kajol Jain wrote:
The pseries Shared Processor Logical Partition (SPLPAR) machines
can retrieve a log of dispatch and preempt events from the
hypervisor using data from the Dispatch Trace Log (DTL) buffer.
With this information, the user can determine when and why each dispatch
and preempt occurred. Add an interface to expose the Virtual Processor
Area (VPA) DTL counters via perf.

The following events are available and exposed in sysfs:

vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
vpa_dtl/dtl_preempt/ - Trace time slice preempts
vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)

The added interface defines the supported event list, the config fields
for the event attributes, and their corresponding bit values, which are
exported via sysfs. Users can use the standard perf tool to access the
perf events exposed via the vpa-dtl pmu.

The VPA DTL PMU counters do not interrupt on overflow or generate any
PMIs. Therefore, the kernel needs to poll the counters; hrtimer code is
added to do that. The timer interval can be provided by the user via
the sample_period field, in nanoseconds.

Result on a Power10 SPLPAR system with 656 CPU threads.
In the perf record command below, with the vpa_dtl pmu, the -c option is
used to provide a sample_period of 1000000000 ns, i.e. 1 second. The
workload also runs for 1 second, hence we get 656 samples:

[command] perf record -a -R -e vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.828 MB perf.data (656 samples) ]
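
As a rough cross-check of the sample count above, the arithmetic can be sketched in C. This is only an estimate: it assumes one polling timer per CPU thread and exactly one sample per timer expiry, and the helper name expected_samples is made up for illustration, not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: estimate how many samples a
 * system-wide recording yields if every CPU thread has its own polling
 * timer and each expiry emits exactly one sample.
 */
static uint64_t expected_samples(uint64_t ncpus,
                                 uint64_t sample_period_ns,
                                 uint64_t runtime_ns)
{
    if (sample_period_ns == 0)
        return 0; /* no valid period, avoid dividing by zero */

    /* Whole timer expiries per CPU, multiplied by the number of CPUs. */
    return ncpus * (runtime_ns / sample_period_ns);
}
```

With 656 threads, a 1000000000 ns period and a 1 second workload, this gives 656, which matches the perf record output quoted above.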

There is one hrtimer added per vpa-dtl pmu thread. Code is added to handle
adding the dtl buffer data to the raw sample. Since DTL does not provide
an IP address for a sample and only has traces of the reason for a
dispatch/preempt, we save the DTL buffer data directly to the perf.data
file as a raw sample. On each hrtimer restart call, the interface dumps
all the new dtl entries added to the dtl buffer as a raw sample.
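
The "dump only the new entries on each poll" idea can be illustrated with a small userspace sketch. It is not the patch's code: the ring layout, the entry type, RING_ENTRIES and every function name here are assumptions chosen for the example, and the real DTL entry format is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8 /* hypothetical buffer size for the example */

struct ring {
    uint32_t entries[RING_ENTRIES];
    uint64_t write_idx; /* total number of entries ever written */
};

struct reader {
    uint64_t last_idx; /* total number of entries already consumed */
};

/* Producer side: overwrite the oldest slot once the ring is full. */
static void ring_push(struct ring *r, uint32_t v)
{
    r->entries[r->write_idx % RING_ENTRIES] = v;
    r->write_idx++;
}

/*
 * Poll side: copy everything written since the previous poll into out.
 * If the producer lapped the reader, the overwritten entries are lost
 * and reading resumes at the oldest entry still in the ring.
 */
static size_t poll_new_entries(const struct ring *r, struct reader *rd,
                               uint32_t *out, size_t out_cap)
{
    uint64_t cur = r->write_idx;
    uint64_t start = rd->last_idx;
    size_t n = 0;

    assert(out != NULL);

    if (cur - start > RING_ENTRIES)
        start = cur - RING_ENTRIES;

    while (start < cur && n < out_cap) {
        out[n++] = r->entries[start % RING_ENTRIES];
        start++;
    }

    rd->last_idx = start;
    return n;
}
```

Each call returns only what arrived since the last call, so a poll with no new entries returns 0, which is the behaviour a periodic timer callback would want before deciding whether to emit a sample.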

To ensure there are no other conflicting dtl users (for example, debugfs
dtl or /proc/powerpc/vcpudispatch_stats), the interface uses the
"down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
is defined in the dtl.h file. Also add a global reference count variable
called "dtl_global_refc" to ensure dtl data can be captured per-cpu, and a
global lock called "dtl_global_lock" to avoid race conditions.

Signed-off-by: Kajol Jain <kjain@xxxxxxxxxxxxx>
---
Notes:

- Made code changes on top of recent fix sent by Michael Ellerman.
Link to the patch: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20240819122401.513203-1-mpe@xxxxxxxxxxxxxx/

arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/vpa-pmu.c | 469 ++++++++++++++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1 +
3 files changed, 471 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/vpa-pmu.c


Seems like it doesn't build on PPC64:

arch/powerpc/perf/vpa-pmu.c#L212
passing argument 1 of 'up_write' from incompatible pointer type [-Wincompatible-pointer-types]

arch/powerpc/perf/vpa-pmu.c#L261
passing argument 1 of 'down_write_trylock' from incompatible pointer type [-Wincompatible-pointer-types]

arch/powerpc/perf/vpa-pmu.c#L402
passing argument 1 of 'up_write' from incompatible pointer type [-Wincompatible-pointer-types]