Re: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples

From: peterz
Date: Thu Aug 06 2020 - 05:20:11 EST


On Thu, Aug 06, 2020 at 10:26:29AM +0800, Jin, Yao wrote:

> > +static struct pt_regs *sanitize_sample_regs(struct perf_event *event, struct pt_regs *regs)
> > +{
> > +	struct pt_regs *sample_regs = regs;
> > +
> > +	/* user only */
> > +	if (!event->attr.exclude_kernel || !event->attr.exclude_hv ||
> > +	    !event->attr.exclude_host || !event->attr.exclude_guest)
> > +		return sample_regs;
> > +
>
> Is this condition correct?
>
> Say we are counting a user event on the host, with exclude_kernel = 1 and
> exclude_host = 0. It will take the "return sample_regs" path.

I'm not sure, I'm terminally confused on virt stuff.

Suppose we have nested virt:

L0-hv
|
G0/L1-hv
|
G1

And we're running in G0, then:

- 'exclude_hv' would exclude L0 events
- 'exclude_host' would ... exclude L1-hv events?
- 'exclude_guest' would ... exclude G1 events?

Then the next question is, if G0 is a host, does the L1-hv run in
G0 userspace or G0 kernel space?

I was assuming G0 userspace would not include anything L1 (kvm is a
kernel module after all), but what do I know.
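
To make the case Jin Yao raises concrete, here is a minimal userspace
sketch of the "user only" test from the quoted hunk. struct
mock_sample_attr and is_user_only() are hypothetical names used only for
illustration; they model the four exclude_* bits and are not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four exclude_* bits of perf_event_attr. */
struct mock_sample_attr {
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/*
 * Inverse of the early-return condition in the quoted
 * sanitize_sample_regs(): the event is treated as user-only
 * only when all four exclude bits are set.
 */
static bool is_user_only(const struct mock_sample_attr *attr)
{
	return attr->exclude_kernel && attr->exclude_hv &&
	       attr->exclude_host && attr->exclude_guest;
}
```

With exclude_kernel = 1, exclude_hv = 1, exclude_guest = 1 and
exclude_host = 0, is_user_only() returns false, so the quoted code would
hand back the original regs unchanged, which is the path in question.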

> > @@ -11609,7 +11636,8 @@ SYSCALL_DEFINE5(perf_event_open,
> >  	if (err)
> >  		return err;
> > -	if (!attr.exclude_kernel) {
> > +	if (!attr.exclude_kernel || !attr.exclude_callchain_kernel ||
> > +	    !attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest) {
> >  		err = perf_allow_kernel(&attr);
> >  		if (err)
> >  			return err;
> >
>
> I can understand the conditions "!attr.exclude_kernel || !attr.exclude_callchain_kernel".
>
> But I'm not very sure about the "!attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest".

Well, I'm very sure G0 userspace should never see L0 or G1 state, so
exclude_hv and exclude_guest had better be true.

> On host, exclude_hv = 1, exclude_guest = 1 and exclude_host = 0, right?

Same as above, is G0 host state G0 userspace?

> So even with exclude_kernel = 1, if exclude_host = 0 we will still take
> the perf_allow_kernel path. Please correct me if my understanding is wrong.

Yes, because once those permission checks are in place, getting past
them means you have permission to see kernel bits.
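
The effect of the widened condition in the perf_event_open() hunk can be
sketched in userspace as below. Everything prefixed mock_ is hypothetical:
mock_perf_allow_kernel() reduces the real perf_allow_kernel() to a single
"privileged" flag and does not model what the kernel actually checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant perf_event_attr bits. */
struct mock_open_attr {
	unsigned int exclude_kernel           : 1;
	unsigned int exclude_callchain_kernel : 1;
	unsigned int exclude_hv               : 1;
	unsigned int exclude_host             : 1;
	unsigned int exclude_guest            : 1;
};

/* Assumption: the real permission check is collapsed into one flag. */
static int mock_perf_allow_kernel(bool privileged)
{
	return privileged ? 0 : -EACCES;
}

/*
 * Mirrors the condition from the quoted hunk: any cleared exclude
 * bit routes the request through the kernel permission check.
 */
static int mock_open_check(const struct mock_open_attr *attr, bool privileged)
{
	if (!attr->exclude_kernel || !attr->exclude_callchain_kernel ||
	    !attr->exclude_hv || !attr->exclude_host || !attr->exclude_guest)
		return mock_perf_allow_kernel(privileged);

	return 0;
}
```

In this sketch an unprivileged caller with exclude_kernel = 1 but
exclude_host = 0 gets -EACCES, while the same caller with all five bits
set gets 0 without the permission check ever being consulted.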