Re: [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE

From: Peter Zijlstra
Date: Fri Oct 09 2020 - 05:10:11 EST


On Thu, Oct 01, 2020 at 06:57:46AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
> +/*
> + * Return the MMU page size of a given virtual address
> + */
> +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
> +{
> + pgd_t *pgd;
> + p4d_t *p4d;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + pgd = pgd_offset(mm, addr);
> + if (pgd_none(*pgd))
> + return 0;
> +
> + p4d = p4d_offset(pgd, addr);
> + if (!p4d_present(*p4d))
> + return 0;
> +
> + if (p4d_leaf(*p4d))
> + return 1ULL << P4D_SHIFT;
> +
> + pud = pud_offset(p4d, addr);
> + if (!pud_present(*pud))
> + return 0;
> +
> + if (pud_leaf(*pud))
> + return 1ULL << PUD_SHIFT;
> +
> + pmd = pmd_offset(pud, addr);
> + if (!pmd_present(*pmd))
> + return 0;
> +
> + if (pmd_leaf(*pmd))
> + return 1ULL << PMD_SHIFT;
> +
> + pte = pte_offset_map(pmd, addr);
> + if (!pte_present(*pte)) {
> + pte_unmap(pte);
> + return 0;
> + }
> +
> + pte_unmap(pte);
> + return PAGE_SIZE;
> +}

So this mostly works, but gets a number of hugetlb and arch specific
things wrong.

With the first 3 patches, this is only exposed to x86 and Power.
Michael, does the above work for you?

Looking at:

arch/powerpc/include/asm/book3s/64/hugetlb.h:check_and_get_huge_psize()

You seem to limit yourself to page-table sizes, however if I then look
at the same function in:

arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
arch/powerpc/include/asm/nohash/hugetlb-book3e.h

it doesn't seem to constrain itself so.

Patch 4 makes it all far worse by exposing it to pretty much everybody.

Now, I think we can fix at least the user mappings with the below delta,
but if archs are using non-page-table MMU sizes we'll need arch helpers.

ARM64 is in that last boat.

Will, can you live with the below, if not, what would you like to do,
make the entire function __weak so that you can override it, or hook
into it somewhere?

DaveM, I saw Sparc64 also has 'funny' hugetlb sizes. Are those only for
hugetlb, or are you also using them for the kernel map?

(we might not need the #ifdef gunk, but I've not yet dug out my cross
compilers this morning)

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7009,6 +7009,7 @@ static u64 perf_virt_to_phys(u64 virt)
*/
static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
{
+ struct page *page;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
@@ -7030,15 +7031,27 @@ static u64 __perf_get_page_size(struct m
if (!pud_present(*pud))
return 0;

- if (pud_leaf(*pud))
+ if (pud_leaf(*pud)) {
+#ifdef pud_page
+ page = pud_page(*pud);
+ if (PageHuge(page))
+ return page_size(compound_head(page));
+#endif
return 1ULL << PUD_SHIFT;
+ }

pmd = pmd_offset(pud, addr);
if (!pmd_present(*pmd))
return 0;

- if (pmd_leaf(*pmd))
+ if (pmd_leaf(*pmd)) {
+#ifdef pmd_page
+ page = pmd_page(*pmd);
+ if (PageHuge(page))
+ return page_size(compound_head(page));
+#endif
return 1ULL << PMD_SHIFT;
+ }

pte = pte_offset_map(pmd, addr);
if (!pte_present(*pte)) {
@@ -7046,6 +7059,10 @@ static u64 __perf_get_page_size(struct m
return 0;
}

+ page = pte_page(*pte);
+ if (PageHuge(page))
+ return page_size(compound_head(page));
+
pte_unmap(pte);
return PAGE_SIZE;
}