Re: [PATCH] perf/x86/intel: Drop kernel samples even though :u is specified

From: Peter Zijlstra
Date: Fri May 19 2017 - 05:29:24 EST


On Fri, May 19, 2017 at 06:19:12PM +0800, Jin Yao wrote:
> When doing sampling without PEBS
>
> perf record -e cycles:u ...
>
> On workloads that do a lot of kernel entry/exits we see kernel
> samples, even though :u is specified. This is due to skid existing.
>
> This is a security issue because it can leak kernel addresses even
> though kernel sampling support is disabled.
>
> The patch drops the kernel samples if exclude_kernel is specified.
>
> For example, test on Haswell desktop.
>
> perf record -e cycles:u <mgen>
> perf report --stdio
>
> Before patch applied:
>
> 99.77% mgen mgen [.] buf_read
> 0.20% mgen mgen [.] rand_buf_init
> 0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt


> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 580b60f..e6745e1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1463,6 +1463,12 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> if (!x86_perf_event_set_period(event))
> continue;
>
> + /*
> + * For security, drop the skid kernel samples.
> + */
> + if (skid_kernel_samples(event, regs))
> + continue;
> +
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> }
> @@ -1679,6 +1685,24 @@ ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
> pmu_attr->event_str_noht);
> }
>
> +bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)
> +{
> + u64 ip;
> +
> + /*
> + * Without PEBS, we may get kernel samples even though
> + * exclude_kernel is specified due to skid in sampling.
> + */
> + if ((event->attr.exclude_kernel) &&
> + (event->attr.sample_type & PERF_SAMPLE_IP)) {
> + ip = perf_instruction_pointer(regs);
> + if (kernel_ip(ip))
> + return true;
> + }
> +
> + return false;
> +}
> +
> EVENT_ATTR(cpu-cycles, CPU_CYCLES );
> EVENT_ATTR(instructions, INSTRUCTIONS );
> EVENT_ATTR(cache-references, CACHE_REFERENCES );


I would much rather see this in generic code, somewhere around
__perf_event_overflow() I suppose. That would retain proper accounting
for the interrupt rate etc..

Also it would work for all architectures. Because I'm thinking more than
just x86 will suffer from skid.

If you're really worried, I suppose you can put it behind a PERF_PMU_CAP
flag or something.