Re: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples

From: Jin, Yao
Date: Fri Aug 07 2020 - 01:32:50 EST

Hi Peter,

On 8/6/2020 5:24 PM, peterz@xxxxxxxxxxxxx wrote:
On Thu, Aug 06, 2020 at 11:18:27AM +0200, peterz@xxxxxxxxxxxxx wrote:
On Thu, Aug 06, 2020 at 10:26:29AM +0800, Jin, Yao wrote:

+static struct pt_regs *sanitize_sample_regs(struct perf_event *event, struct pt_regs *regs)
+{
+	struct pt_regs *sample_regs = regs;
+	/* user only */
+	if (!event->attr.exclude_kernel || !event->attr.exclude_hv ||
+	    !event->attr.exclude_host || !event->attr.exclude_guest)
+		return sample_regs;

Is this condition correct?

Say we are counting a user-only event on the host, with exclude_kernel = 1
and exclude_host = 0. It will take the "return sample_regs" path.
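
To make the case concrete, here is a minimal standalone sketch of the
condition, not the kernel code itself. The struct excl_bits type and both
helper names are hypothetical stand-ins for the exclude_* bits of
perf_event_attr; exclude_hv = 1 and exclude_guest = 1 are assumed for the
host case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four exclude_* bits of perf_event_attr. */
struct excl_bits {
	bool exclude_kernel;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/*
 * Mirrors the quoted condition: true means the early
 * "return sample_regs" path is taken and regs are left untouched.
 */
static bool returns_regs_unmodified(const struct excl_bits *a)
{
	return !a->exclude_kernel || !a->exclude_hv ||
	       !a->exclude_host || !a->exclude_guest;
}

/* The case in question: user-only counting on the host. */
static bool host_user_only_case(void)
{
	struct excl_bits a = {
		.exclude_kernel = true,
		.exclude_hv     = true,	/* assumed for the host case */
		.exclude_host   = false,
		.exclude_guest  = true,	/* assumed for the host case */
	};

	return returns_regs_unmodified(&a);
}
```

With exclude_host = 0 the condition is true regardless of exclude_kernel,
so only an event with all four bits set would reach the code after the
early return.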

I'm not sure, I'm terminally confused on virt stuff.


Suppose we have nested virt:


And we're running in G0, then:

- 'exclude_hv' would exclude L0 events
- 'exclude_host' would ... exclude L1-hv events?
- 'exclude_guest' would ... exclude G1 events?


Then the next question is, if G0 is a host, does the L1-hv run in
G0 userspace or G0 kernel space?

I was assuming G0 userspace would not include anything L1 (kvm is a
kernel module after all), but what do I know.

@@ -11609,7 +11636,8 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (err)
 		return err;
-	if (!attr.exclude_kernel) {
+	if (!attr.exclude_kernel || !attr.exclude_callchain_kernel ||
+	    !attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest) {
 		err = perf_allow_kernel(&attr);
 		if (err)
 			return err;

I can understand the conditions "!attr.exclude_kernel || !attr.exclude_callchain_kernel".

But I'm not very sure about the "!attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest".

Well, I'm very sure G0 userspace should never see L0 or G1 state, so
exclude_hv and exclude_guest had better be true.

On host, exclude_hv = 1, exclude_guest = 1 and exclude_host = 0, right?

Same as above, is G0 host state G0 userspace?

So even with exclude_kernel = 1, if exclude_host = 0 we will still take the
perf_allow_kernel path. Please correct me if my understanding is wrong.
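
As a rough illustration, the old and new conditions from the hunk can be
compared in a standalone sketch. struct open_bits and the two helper names
are hypothetical; only the boolean expressions are taken from the quoted
diff, and the real perf_allow_kernel() call is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant perf_event_attr bits. */
struct open_bits {
	bool exclude_kernel;
	bool exclude_callchain_kernel;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/* Condition before the patch: only exclude_kernel is consulted. */
static bool old_needs_kernel_check(const struct open_bits *a)
{
	return !a->exclude_kernel;
}

/* Condition after the patch, copied from the quoted hunk. */
static bool new_needs_kernel_check(const struct open_bits *a)
{
	return !a->exclude_kernel || !a->exclude_callchain_kernel ||
	       !a->exclude_hv || !a->exclude_host || !a->exclude_guest;
}

/* Host user-only event: every bit set except exclude_host. */
static struct open_bits host_user_only(void)
{
	struct open_bits a = {
		.exclude_kernel           = true,
		.exclude_callchain_kernel = true,
		.exclude_hv               = true,
		.exclude_host             = false,
		.exclude_guest            = true,
	};

	return a;
}
```

For the attr built by host_user_only(), the old condition is false and the
new one is true, which is the behaviour change I am asking about.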

Yes, because with those permission checks in place it means you have
permission to see kernel bits.

So if I understand 'exclude_host' wrong -- a distinct possibility -- can
we then pretty please have the above [A-B] corrected and put in a
comment near perf_event_attr and the exclude_* comments changed to refer
to that?

In my previous mail, I explained my understanding of 'exclude_host', but I'm not sure it's correct. It needs more review comments.

Jin Yao