Re: [PATCH] perf/core: generate overflow signal when samples are dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region)

From: Peter Zijlstra
Date: Tue Jul 04 2017 - 05:03:31 EST


On Wed, Jun 28, 2017 at 03:55:07PM -0700, Kyle Huey wrote:

> > Having thought about this some more, I think Vince does make a good
> > point that throwing away samples is liable to break stuff, e.g. that
> > which only relies on (non-sensitive) samples.
> >
> > It still seems wrong to make up data, though.

It is something we do in other places as well though. For example the
printk() %pK thing fakes NULL pointers when kptr_restrict is set.
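
As a rough illustration of that kind of faking, here is a minimal
user-space sketch. It is not the kernel's vsprintf code: format_kptr()
and its 'restricted' flag are hypothetical names, and the real %pK
handling has more cases (capability checks and so on).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a kernel function: format a pointer the way
 * a restricted %pK would, i.e. print a NULL pointer instead of the real
 * address when the reader must not see it.
 */
static int format_kptr(char *buf, size_t len, const void *ptr, bool restricted)
{
	/* Restricted readers get a faked NULL pointer, not an error. */
	uintptr_t val = restricted ? 0 : (uintptr_t)ptr;

	return snprintf(buf, len, "%016llx", (unsigned long long)val);
}
```

The point of the sketch is that the consumer still gets a well-formed
field; only the value is made up.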

Faking data gets a wee bit tricky in how much data we need to clear,
though; it's not only the IP, pretty much everything we get from the
interrupt context, like the branch stack and registers, is also suspect.
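
To make the scope concrete, a small sketch under loud assumptions:
struct toy_sample is a made-up, heavily simplified record, not the perf
sample ABI, and toy_scrub_sample() is a hypothetical helper. It only
shows that scrubbing has to cover every field taken from the interrupt
context, not just the IP.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_REGS		4
#define TOY_MAX_BRANCHES	4

/* Hypothetical sample record for illustration only. */
struct toy_sample {
	uint32_t pid, tid;		/* taken from the task, left alone */
	uint64_t ip;			/* from the interrupt regs */
	uint64_t regs[TOY_NR_REGS];	/* from the interrupt regs */
	uint64_t nr_branches;
	struct { uint64_t from, to; } branches[TOY_MAX_BRANCHES];
};

/*
 * Clear everything that came from the interrupt context. Zeroing only
 * ->ip would still expose kernel state through the registers and the
 * branch stack.
 */
static void toy_scrub_sample(struct toy_sample *s)
{
	s->ip = 0;
	memset(s->regs, 0, sizeof(s->regs));
	s->nr_branches = 0;
	memset(s->branches, 0, sizeof(s->branches));
}
```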

> > Maybe for exclude_kernel && !exclude_user events we can always generate
> > samples from the user regs, rather than the exception regs. That's going
> > to be closer to what the user wants, regardless. I'll take a look
> > tomorrow.
>
> I'm not very familiar with the kernel internals, but the reason I
> didn't suggest this originally is it seems like it will be difficult
> to determine what the "correct" userspace registers are. For example,
> what happens if a performance counter is fixed to a given tid, the
> interrupt fires during a context switch from that task to another that
> is not being monitored, and the kernel is far enough along in the
> context switch that the current task struct has been switched out?
> Reporting the new task's registers seems as bad as reporting the
> kernel's registers. But maybe this is easier than I imagine for
> whatever reason.

If the counter is fixed to a task then it's scheduled along with the
task. We'll schedule out the event before doing the actual task switch
and switch in the new event after.
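
The ordering can be modelled with a toy user-space sketch. None of this
is the actual scheduler or perf code; struct toy_cpu and the toy_*()
helpers are hypothetical, and the three steps are kept separate only so
the state in between can be inspected.

```c
#include <stddef.h>

struct toy_task { int tid; };

/* Hypothetical per-cpu state for the toy model. */
struct toy_cpu {
	const struct toy_task *current;		/* task running on this cpu */
	const struct toy_task *event_owner;	/* task whose event counts, or NULL */
};

/* Step 1: stop the outgoing task's event while it is still current. */
static void toy_sched_out(struct toy_cpu *cpu)
{
	cpu->event_owner = NULL;
}

/* Step 2: the actual task switch; no per-task event is live here. */
static void toy_switch_to(struct toy_cpu *cpu, const struct toy_task *next)
{
	cpu->current = next;
}

/* Step 3: start the incoming task's event once it is current. */
static void toy_sched_in(struct toy_cpu *cpu)
{
	cpu->event_owner = cpu->current;
}

/* A live per-task event must always belong to the current task. */
static int toy_event_consistent(const struct toy_cpu *cpu)
{
	return !cpu->event_owner || cpu->event_owner == cpu->current;
}
```

In this model a per-task event can never overflow while some other task
is current, because it is simply not counting during step 2.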

That said, with a per-cpu event the TID sample value is indeed subject
to skid like you describe.