[PATCH V9 0/4] Add the page size in the perf record (kernel)

From: kan . liang
Date: Thu Oct 01 2020 - 09:59:37 EST


From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

Changes since V8
- Drop active_mm, which can cause a kernel panic

Changes since V7
- Use active_mm to replace mm and init_mm
- Update the commit message of patch 1

Changes since V6
- Return the MMU page size of a given virtual address, not the kernel
software page size
- Add PERF_SAMPLE_DATA_PAGE_SIZE support for Power
- Allow large PEBS for PERF_SAMPLE_CODE_PAGE_SIZE
- Only include the kernel patches. The perf tool patches will be posted
later separately once the kernel patches are accepted.

Changes since V5
- Introduce a new universal page walker for the page size in the perf
subsystem.
- Rebased on Peter's tree.

Currently, perf can report both virtual addresses and physical addresses,
but not the page size. Without the page size of the page being accessed,
users cannot decide whether to promote/demote large pages to optimize
memory usage.

An earlier version of this patch set was submitted a year ago.
https://lkml.kernel.org/r/1549648509-12704-1-git-send-email-kan.liang@xxxxxxxxxxxxxxx
It introduced a __weak function, perf_get_page_size(), which aimed to
retrieve the page size for a given virtual address in the generic code,
and implemented an x86-specific version of perf_get_page_size().
However, the proposal was rejected because it was a pure x86
implementation.
https://lkml.kernel.org/r/20190208200731.GN32511@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

At that time, it was not easy to support perf_get_page_size() universally,
because some key functions, e.g., p?d_large, were not supported by some
architectures.

Since then, the generic p?d_leaf() functions have been added to the kernel.
https://lkml.kernel.org/r/20191218162402.45610-2-steven.price@xxxxxxx
Starting from V6, a new universal perf_get_page_size() function is
implemented based on the generic p?d_leaf() functions.

On some platforms, e.g., X86, the page walker is invoked in an NMI
handler, so it must be NMI-safe and have low overhead. In addition,
the page walker should work for both user and kernel virtual addresses.
The existing generic page walker, e.g., walk_page_range_novma(), is
fairly complex and is not guaranteed to be NMI-safe, and follow_page()
only handles user virtual addresses. So a simpler page walk function is
implemented here.
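
To illustrate the idea only (this is not the code from the patches), here is
a minimal user-space sketch of a walk that derives the page size from the
level at which a leaf entry is found. Everything in it is an assumption made
for illustration: the toy_* names, the three-level layout, and the 4K/2M/1G
shifts are hypothetical, and the "leaf" flag merely stands in for what the
p?d_leaf() helpers report. The real kernel walk also has to cover the upper
levels (pgd/p4d) and the NMI constraints described above, which the sketch
does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layout: 512 entries per table, 4K/2M/1G mappings. */
#define TOY_ENTRIES    512
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SHIFT  21
#define TOY_PUD_SHIFT  30

struct toy_table;

/* One entry: either a leaf mapping or a pointer to the next-level table. */
struct toy_entry {
	int present;            /* entry is valid */
	int leaf;               /* stand-in for what p?d_leaf() would report */
	struct toy_table *next; /* next-level table when not a leaf */
};

struct toy_table {
	struct toy_entry e[TOY_ENTRIES];
};

static unsigned int toy_index(uint64_t addr, unsigned int shift)
{
	return (unsigned int)((addr >> shift) & (TOY_ENTRIES - 1));
}

/*
 * Return the size of the mapping that covers addr, or 0 if addr is not
 * mapped. The walk only reads entries: it takes no locks, does not
 * allocate, and does not sleep.
 */
uint64_t toy_get_page_size(const struct toy_table *pud_table, uint64_t addr)
{
	const struct toy_entry *pud, *pmd, *pte;

	if (!pud_table)
		return 0;

	pud = &pud_table->e[toy_index(addr, TOY_PUD_SHIFT)];
	if (!pud->present)
		return 0;
	if (pud->leaf)
		return 1ULL << TOY_PUD_SHIFT;
	if (!pud->next)
		return 0;

	pmd = &pud->next->e[toy_index(addr, TOY_PMD_SHIFT)];
	if (!pmd->present)
		return 0;
	if (pmd->leaf)
		return 1ULL << TOY_PMD_SHIFT;
	if (!pmd->next)
		return 0;

	pte = &pmd->next->e[toy_index(addr, TOY_PAGE_SHIFT)];
	if (!pte->present)
		return 0;

	return 1ULL << TOY_PAGE_SHIFT;
}
```

Returning 0 for an unmapped address keeps the caller's error handling
trivial, which matters when the caller cannot afford to do much work.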

Kan Liang (3):
  perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE
  powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZE

Stephane Eranian (1):
  perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE

 arch/powerpc/perf/core-book3s.c |   6 +-
 arch/x86/events/intel/ds.c      |  11 ++-
 arch/x86/events/perf_event.h    |   2 +-
 include/linux/perf_event.h      |   2 +
 include/uapi/linux/perf_event.h |   6 +-
 kernel/events/core.c            | 114 +++++++++++++++++++++++++++++++-
 6 files changed, 133 insertions(+), 8 deletions(-)

--
2.17.1