Re: [PATCH 1/2] perf: Add munmap callback

From: Stephane Eranian
Date: Wed Oct 24 2018 - 15:31:07 EST


Hi,

On Wed, Oct 24, 2018 at 8:12 AM <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>
> To calculate the physical address, perf needs to walk the page tables.
> The related mapping may have already been removed from the page tables
> in some cases (e.g. large PEBS). The virtual address recorded in the
> first PEBS records may already be unmapped before the PEBS buffer is
> drained.
>
> Add a munmap callback to notify the PMU of any unmapping. The callback
> is only invoked for PMUs that implement it.
>
The need for this new record type extends beyond physical address conversions
and PEBS. A long while ago, someone reported issues with symbolization related
to perf lacking munmap tracking. It had to do with vma merging. I think the
sequence of mmaps was as follows in the problematic case:
1. addr1 = mmap(8192)
2. munmap(addr1 + 4096, 4096)
3. addr2 = mmap(addr1 + 4096, 4096)

If successful, that yields addr2 = addr1 + 4096 (the same result can
also occur without forcing the address).
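A minimal userspace sketch of the three steps above, assuming anonymous
private mappings and MAP_FIXED to force the address. The function name
replay_merge_sequence() is hypothetical. It queries the page size with
sysconf() instead of hard-coding 4096: on a system with larger pages,
addr1 + 4096 would not be page-aligned and munmap() would fail with
EINVAL. The function only replays the sequence. Whether the kernel
actually merged the two vmas can be checked by looking for a single
two-page entry in /proc/self/maps.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: replay mmap(2 pages) / munmap(2nd page) /
 * mmap(2nd page, MAP_FIXED). On success returns 0, and the caller owns
 * [*out_addr1, *out_addr1 + 2 * page). Returns -1 on failure.
 */
static int replay_merge_sequence(char **out_addr1, char **out_addr2)
{
	long page = sysconf(_SC_PAGESIZE);
	char *addr1, *addr2;

	if (page <= 0)
		return -1;

	/* 1. addr1 = mmap(two pages) */
	addr1 = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr1 == MAP_FAILED)
		return -1;

	/* 2. munmap(addr1 + page, page): the first vma shrinks to one page */
	if (munmap(addr1 + page, (size_t)page) != 0) {
		munmap(addr1, 2 * (size_t)page);
		return -1;
	}

	/*
	 * 3. addr2 = mmap(addr1 + page, page), forced into the hole.
	 * MAP_FIXED is only safe here because nothing else (e.g. another
	 * thread) can map into the hole in between in this sketch.
	 */
	addr2 = mmap(addr1 + page, (size_t)page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (addr2 == MAP_FAILED) {
		munmap(addr1, (size_t)page);
		return -1;
	}

	*out_addr1 = addr1;
	*out_addr2 = addr2;
	return 0;
}
```

A caller would check that addr2 == addr1 + page and release both pages
with a single munmap(addr1, 2 * page).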

In that case, if I recall correctly, the vma for the 1st mapping (now
4k) and that of the 2nd mapping (4k) get merged into a single 8k vma,
and this is what perf_events will record for PERF_RECORD_MMAP. On the
perf tool side, it is assumed that if two timestamped mappings overlap,
then the latter overrides the former. In this case, perf would lose the
mapping of the first 4kb and assume all symbols come from the 2nd
mapping. Hopefully I got the scenario right. If so, then you'd need
PERF_RECORD_UNMAP to disambiguate, assuming the perf tool is modified
accordingly.
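To illustrate the ambiguity, here is a deliberately simplified model of
the tool-side lookup. It is not the perf tool's actual data structures.
struct ev, EV_MUNMAP, and resolve() are all hypothetical, and EV_MUNMAP
stands in for the proposed PERF_RECORD_UNMAP. With the "latter
overrides former" rule, the first page resolves to the 2nd mapping.
An unmap-aware rule (one possible tool modification, assumed here)
lets a new record claim only addresses that were freed or never
mapped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event stream, already sorted by timestamp. */
enum ev_type { EV_MMAP, EV_MUNMAP };

struct ev {
	enum ev_type type;
	uint64_t start, end;	/* covers [start, end) */
	int id;			/* mapping id; ignored for EV_MUNMAP */
};

#define NO_MAPPING (-1)

/*
 * Replay the events and return the mapping id that owns addr.
 * unmap_aware == 0: a newer EV_MMAP always overrides an older one.
 * unmap_aware != 0: a newer EV_MMAP only claims addr if no live mapping
 *                   covers it; EV_MUNMAP makes addr free again.
 */
static int resolve(const struct ev *evs, size_t n, uint64_t addr,
		   int unmap_aware)
{
	int owner = NO_MAPPING;

	for (size_t i = 0; i < n; i++) {
		const struct ev *e = &evs[i];

		if (addr < e->start || addr >= e->end)
			continue;	/* event does not cover addr */
		if (e->type == EV_MUNMAP)
			owner = NO_MAPPING;
		else if (!unmap_aware || owner == NO_MAPPING)
			owner = e->id;
	}
	return owner;
}
```

For the scenario above, the stream would be: mapping 1 over 8k, an
unmap of its second 4k, then the merged 8k record for mapping 2. Only
the unmap-aware rule attributes the first 4k back to mapping 1.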


>
> Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> ---
> include/linux/perf_event.h | 3 +++
> kernel/events/core.c | 25 +++++++++++++++++++++++++
> mm/mmap.c | 1 +
> 3 files changed, 29 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f0ca79..7f0a9258ce1f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -400,6 +400,7 @@ struct pmu {
> */
> void (*sched_task) (struct perf_event_context *ctx,
> bool sched_in);
> + void (*munmap) (void);
> /*
> * PMU specific data size
> */
> @@ -1113,6 +1114,7 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
> }
>
> extern void perf_event_mmap(struct vm_area_struct *vma);
> +extern void perf_event_munmap(void);
> extern struct perf_guest_info_callbacks *perf_guest_cbs;
> extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
> extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
> @@ -1333,6 +1335,7 @@ static inline int perf_unregister_guest_info_callbacks
> (struct perf_guest_info_callbacks *callbacks) { return 0; }
>
> static inline void perf_event_mmap(struct vm_area_struct *vma) { }
> +static inline void perf_event_munmap(void) { }
> static inline void perf_event_exec(void) { }
> static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
> static inline void perf_event_namespaces(struct task_struct *tsk) { }
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5a97f34bc14c..00338d6fbed7 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3184,6 +3184,31 @@ static void perf_pmu_sched_task(struct task_struct *prev,
> }
> }
>
> +void perf_event_munmap(void)
> +{
> + struct perf_cpu_context *cpuctx;
> + unsigned long flags;
> + struct pmu *pmu;
> +
> + local_irq_save(flags);
> + list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
> + pmu = cpuctx->ctx.pmu;
> +
> + if (!pmu->munmap)
> + continue;
> +
> + perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> + perf_pmu_disable(pmu);
> +
> + pmu->munmap();
> +
> + perf_pmu_enable(pmu);
> +
> + perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
> + }
> + local_irq_restore(flags);
> +}
> +
> static void perf_event_switch(struct task_struct *task,
> struct task_struct *next_prev, bool sched_in);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 5f2b2b184c60..61978ad8c480 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2777,6 +2777,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> /*
> * Remove the vma's, and unmap the actual pages
> */
> + perf_event_munmap();
> detach_vmas_to_be_unmapped(mm, vma, prev, end);
> unmap_region(mm, vma, prev, start, end);
>
> --
> 2.17.1
>
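The patch calls perf_event_munmap() before detach_vmas_to_be_unmapped()
and unmap_region(). Presumably this lets a PMU with buffered samples
(large PEBS) drain them while the page-table walk still succeeds. The
following is a hypothetical userspace simulation of that ordering, not
kernel code. struct sim, virt_to_phys(), drain(), and sim_munmap() are
all invented for illustration, and the "page table" is a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12
#define SIM_NPAGES 4
#define SIM_BUF_MAX 8
#define NO_PHYS UINT64_MAX

struct sim {
	bool mapped[SIM_NPAGES];	/* simulated page-table presence */
	uint64_t pfn[SIM_NPAGES];	/* physical frame per virtual page */
	uint64_t buf[SIM_BUF_MAX];	/* buffered sampled virtual addresses */
	size_t nbuf;
	uint64_t phys_out[SIM_BUF_MAX];	/* translated results */
	size_t nout;
};

/* Walk the simulated page table; fails once the page is unmapped. */
static uint64_t virt_to_phys(const struct sim *s, uint64_t va)
{
	uint64_t vpn = va >> SIM_PAGE_SHIFT;
	uint64_t off = va & (((uint64_t)1 << SIM_PAGE_SHIFT) - 1);

	if (vpn >= SIM_NPAGES || !s->mapped[vpn])
		return NO_PHYS;
	return (s->pfn[vpn] << SIM_PAGE_SHIFT) | off;
}

/* Translate every buffered sample, then empty the buffer. */
static void drain(struct sim *s)
{
	for (size_t i = 0; i < s->nbuf && s->nout < SIM_BUF_MAX; i++)
		s->phys_out[s->nout++] = virt_to_phys(s, s->buf[i]);
	s->nbuf = 0;
}

/*
 * Unmap one virtual page. with_callback mimics calling the PMU's
 * munmap callback before the page tables are torn down.
 */
static void sim_munmap(struct sim *s, uint64_t vpn, bool with_callback)
{
	if (with_callback)
		drain(s);
	s->mapped[vpn] = false;
}
```

Without the callback, a sample buffered before the unmap and drained
afterwards translates to NO_PHYS. With it, the physical address is
still recovered.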