Re: [PATCH V3 01/13] perf/core, x86: Add PERF_SAMPLE_DATA_PAGE_SIZE

From: Peter Zijlstra
Date: Thu Jan 31 2019 - 07:59:18 EST


On Thu, Jan 31, 2019 at 01:37:25PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 30, 2019 at 06:23:42AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 374a197..03bf45d 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -2578,3 +2578,45 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
> > cap->events_mask_len = x86_pmu.events_mask_len;
> > }
> > EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
> > +
> > +/*
> > + * map x86 page levels to perf page sizes
> > + */
> > +static const enum perf_page_size perf_page_size_map[PG_LEVEL_NUM] = {
> > + [PG_LEVEL_NONE] = PERF_PAGE_SIZE_NONE,
> > + [PG_LEVEL_4K] = PERF_PAGE_SIZE_4K,
> > + [PG_LEVEL_2M] = PERF_PAGE_SIZE_2M,
> > + [PG_LEVEL_1G] = PERF_PAGE_SIZE_1G,
> > + [PG_LEVEL_512G] = PERF_PAGE_SIZE_512G,
> > +};
> > +
> > +u64 perf_get_page_size(u64 virt)
> > +{
> > + unsigned long flags;
> > + unsigned int level;
> > + pte_t *pte;
> > +
> > + if (!virt)
> > + return 0;
> > +
> > + /*
> > + * Interrupts are disabled, so it prevents any tear down
> > + * of the page tables.
> > + * See the comment near struct mmu_table_batch.
> > + */
> > + local_irq_save(flags);
> > + if (virt >= TASK_SIZE)
> > + pte = lookup_address(virt, &level);
> > + else {
> > + if (current->mm)
> > + pte = lookup_address_in_pgd(pgd_offset(current->mm, virt),
> > + virt, &level);
>
> Aside from all the missing {}, I'm fairly sure this is broken since this
> happens from NMI context. This can interrupt switch_mm() and things like
> use_temporary_mm().

Ah, I'm confused again. This is a software page-table walk and is not
affected by the current CR3 state, which is much safer.
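To make the point about a software walk concrete, a minimal userspace simulation (hypothetical types, not the kernel's pgd/pud/pmd/pte API) of a 4-level table with 512 entries per level. The walk starts from a root the caller passes in, much as lookup_address_in_pgd() is handed pgd_offset(current->mm, virt), so nothing in it reads whatever root the hardware currently has loaded. The struct layout, map_page() and page_size() are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simulation: 4 levels of 512 entries, 4K base pages. */
#define ENTRIES 512

struct entry {
	struct entry *next;	/* next-level table; NULL for leaves */
	int present;
	int leaf;		/* maps a page at this level */
};

enum level { LVL_NONE, LVL_4K, LVL_2M, LVL_1G };

static const uint64_t level_size[] = {
	[LVL_NONE] = 0,
	[LVL_4K] = 1ULL << 12,
	[LVL_2M] = 1ULL << 21,
	[LVL_1G] = 1ULL << 30,
};

/* Address bits that index each step, top (PGD-like) to bottom (PTE-like). */
static const int shifts[] = { 39, 30, 21, 12 };
/* Page size a leaf at each step maps; the top step never maps a page here. */
static const enum level leaf_level[] = { LVL_NONE, LVL_1G, LVL_2M, LVL_4K };

static struct entry *new_table(void)
{
	struct entry *t = calloc(ENTRIES, sizeof(*t));

	if (!t) {
		perror("calloc");
		exit(EXIT_FAILURE);
	}
	return t;
}

static unsigned int idx(uint64_t virt, int shift)
{
	return (virt >> shift) & (ENTRIES - 1);
}

/* Install a leaf at step leaf_step (1 = 1G, 2 = 2M, 3 = 4K). */
static void map_page(struct entry *root, uint64_t virt, int leaf_step)
{
	struct entry *table = root;

	assert(leaf_step >= 1 && leaf_step <= 3);
	for (int i = 0; i < leaf_step; i++) {
		struct entry *e = &table[idx(virt, shifts[i])];

		if (!e->present) {
			e->present = 1;
			e->next = new_table();
		}
		/* A larger page already covering virt is rejected here. */
		assert(e->next != NULL);
		table = e->next;
	}
	table[idx(virt, shifts[leaf_step])].present = 1;
	table[idx(virt, shifts[leaf_step])].leaf = 1;
}

/*
 * Software walk from a caller-supplied root. Nothing here consults a
 * "currently loaded" root, so the answer depends only on the tables of
 * the address space passed in.
 */
static uint64_t page_size(const struct entry *root, uint64_t virt)
{
	const struct entry *table = root;

	for (int i = 0; i < 4; i++) {
		const struct entry *e = &table[idx(virt, shifts[i])];

		if (!e->present)
			return level_size[LVL_NONE];
		if (i == 3 || (i > 0 && e->leaf))
			return level_size[leaf_level[i]];
		if (!e->next)
			return level_size[LVL_NONE];
		table = e->next;
	}
	return level_size[LVL_NONE];
}
```

If the same address is mapped as a 4K page under one root and as a 2M page under another, page_size() reports 4096 for the first and 2097152 for the second; which root is "live" never enters into it.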

The rest of the comment still applies, of course.