Re: [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE

From: Will Deacon
Date: Fri Oct 09 2020 - 05:38:00 EST


On Fri, Oct 09, 2020 at 11:09:27AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 01, 2020 at 06:57:46AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
> > +/*
> > + * Return the MMU page size of a given virtual address
> > + */
> > +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
> > +{
> > +	pgd_t *pgd;
> > +	p4d_t *p4d;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	pgd = pgd_offset(mm, addr);
> > +	if (pgd_none(*pgd))
> > +		return 0;
> > +
> > +	p4d = p4d_offset(pgd, addr);
> > +	if (!p4d_present(*p4d))
> > +		return 0;
> > +
> > +	if (p4d_leaf(*p4d))
> > +		return 1ULL << P4D_SHIFT;
> > +
> > +	pud = pud_offset(p4d, addr);
> > +	if (!pud_present(*pud))
> > +		return 0;
> > +
> > +	if (pud_leaf(*pud))
> > +		return 1ULL << PUD_SHIFT;
> > +
> > +	pmd = pmd_offset(pud, addr);
> > +	if (!pmd_present(*pmd))
> > +		return 0;
> > +
> > +	if (pmd_leaf(*pmd))
> > +		return 1ULL << PMD_SHIFT;
> > +
> > +	pte = pte_offset_map(pmd, addr);
> > +	if (!pte_present(*pte)) {
> > +		pte_unmap(pte);
> > +		return 0;
> > +	}
> > +
> > +	pte_unmap(pte);
> > +	return PAGE_SIZE;
> > +}
>
> So this mostly works, but gets a number of hugetlb and arch specific
> things wrong.
>
> With the first 3 patches, this is only exposed to x86 and Power.
> Michael, does the above work for you?
>
> Looking at:
>
> arch/powerpc/include/asm/book3s/64/hugetlb.h:check_and_get_huge_psize()
>
> You seem to limit yourself to page-table sizes, however if I then look
> at the same function in:
>
> arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
> arch/powerpc/include/asm/nohash/hugetlb-book3e.h
>
> it doesn't seem to constrain itself so.
>
> Patch 4 makes it all far worse by exposing it to pretty much everybody.
>
> Now, I think we can fix at least the user mappings with the below delta,
> but if archs are using non-page-table MMU sizes we'll need arch helpers.
>
> ARM64 is in that last boat.
>
> Will, can you live with the below, if not, what would you like to do,
> make the entire function __weak so that you can override it, or hook
> into it somewhere?

Hmm, so I don't think we currently have any PMUs that set 'data->addr'
on arm64, in which case maybe none of this currently matters for us.

However, I must admit that I couldn't figure out exactly what gets exposed
to userspace when the backend drivers don't look at the sample_type or
do anything with the addr field.
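
For reference, the set of values the quoted walk can ever report is small,
which is the root of the "non-page-table MMU sizes" concern above. The
following is only a userspace sketch, not kernel code: the enum, the
MODEL_* constants and model_page_size() are hypothetical names, and the
shifts are assumed to be the x86-64 ones with 4 KiB base pages. Other
architectures and configurations use different values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 shifts with 4 KiB base pages. */
enum {
	MODEL_PAGE_SHIFT = 12,
	MODEL_PMD_SHIFT  = 21,
	MODEL_PUD_SHIFT  = 30,
	MODEL_P4D_SHIFT  = 39,
};

/* Hypothetical: the level at which the walk finds a leaf entry. */
enum leaf_level { LEAF_NONE, LEAF_PTE, LEAF_PMD, LEAF_PUD, LEAF_P4D };

/*
 * Mirrors the return values of the quoted function: one size per
 * page-table level, or 0 when nothing is mapped at the address.
 */
static uint64_t model_page_size(enum leaf_level level)
{
	switch (level) {
	case LEAF_P4D:
		return 1ULL << MODEL_P4D_SHIFT;
	case LEAF_PUD:
		return 1ULL << MODEL_PUD_SHIFT;
	case LEAF_PMD:
		return 1ULL << MODEL_PMD_SHIFT;
	case LEAF_PTE:
		return 1ULL << MODEL_PAGE_SHIFT;
	default:
		return 0;
	}
}
```

Under those assumptions the only possible results are 0, 4 KiB, 2 MiB,
1 GiB and 512 GiB. A mapping whose effective MMU size sits between two
levels (for instance one built from several contiguous entries) has no
way to be reported by a walk of this shape, hence the need for an arch
helper or override in that case.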

Will