Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE

From: Liang, Kan
Date: Mon Aug 10 2020 - 18:38:39 EST




On 8/10/2020 5:47 PM, Dave Hansen wrote:
On 8/10/20 2:24 PM, Kan Liang wrote:
+static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
+{
+	struct page *page;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_offset(mm, addr);
+	if (pgd_none(*pgd))
+		return 0;
+
+	p4d = p4d_offset(pgd, addr);
+	if (!p4d_present(*p4d))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (p4d_leaf(*p4d)) {
+		page = p4d_page(*p4d);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return P4D_SIZE;
+	}
+#endif
+
+	pud = pud_offset(p4d, addr);
+	if (!pud_present(*pud))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (pud_leaf(*pud)) {
+		page = pud_page(*pud);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return PUD_SIZE;
+	}
+#endif
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (pmd_leaf(*pmd)) {
+		page = pmd_page(*pmd);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return PMD_SIZE;
+	}
+#endif
+
+	pte = pte_offset_map(pmd, addr);
+	if (!pte_present(*pte)) {
+		pte_unmap(pte);
+		return 0;
+	}
+
+	pte_unmap(pte);
+	return PAGE_SIZE;
+}

It's probably best if we very carefully define up front what is getting
reported here. For instance, I believe we already have some fun cases
with huge tmpfs where a compound page is mapped with 4k PTEs. Kirill
also found a few drivers doing this as well. I think there were also
some weird cases for ARM hugetlbfs where there were multiple hardware
page table entries mapping a single hugetlbfs page. These would be
cases where compound_head() size would be greater than the size of the
leaf paging structure entry.

This is also why we have KernelPageSize and MMUPageSize in /proc/$pid/smaps.

So, is this returning the kernel software page size or the MMU size?


This tries to return the kernel software page size. I will add a comment
to the function to say so. For the above cases, I think they can be
detected by PageCompound(page), so the current code should already cover
them. Is my understanding correct?
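
To make the two notions of size concrete, here is a minimal user-space
sketch, not kernel code. It assumes x86-64 sizes (4K/2M/1G), and every
sketch_* name is hypothetical. sketch_mmu_page_size() reports the size
implied by the leaf paging structure entry, while
sketch_compound_page_size() mirrors what page_size(compound_head(page))
computes, that is PAGE_SIZE shifted left by the compound order.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 sizes; other architectures use different values. */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << 12)	/* 4K */
#define SKETCH_PMD_SIZE  (UINT64_C(1) << 21)	/* 2M */
#define SKETCH_PUD_SIZE  (UINT64_C(1) << 30)	/* 1G */

/* Level at which the page table walk found a leaf entry. */
enum sketch_leaf { SKETCH_LEAF_PTE, SKETCH_LEAF_PMD, SKETCH_LEAF_PUD };

/* Hypothetical description of one mapping, for illustration only. */
struct sketch_mapping {
	enum sketch_leaf leaf;	/* level of the leaf paging entry */
	unsigned int order;	/* compound order of the page, 0 if not compound */
};

/* Size covered by the leaf paging structure entry (the "MMU" view). */
uint64_t sketch_mmu_page_size(const struct sketch_mapping *m)
{
	switch (m->leaf) {
	case SKETCH_LEAF_PUD:
		return SKETCH_PUD_SIZE;
	case SKETCH_LEAF_PMD:
		return SKETCH_PMD_SIZE;
	default:
		return SKETCH_PAGE_SIZE;
	}
}

/* Size of the backing compound page (the "software" view). */
uint64_t sketch_compound_page_size(const struct sketch_mapping *m)
{
	assert(m->order < 52);	/* keep the shift within 64 bits */
	return SKETCH_PAGE_SIZE << m->order;
}
```

For a 2M compound page mapped with 4k PTEs (leaf = SKETCH_LEAF_PTE,
order = 9), the two functions return 4096 and 2097152. That is the case
described above where the compound_head() size is greater than the size
of the leaf paging structure entry.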

Thanks,
Kan