Re: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples

From: Jin, Yao
Date: Fri Aug 07 2020 - 01:32:50 EST


Hi Peter,

On 8/6/2020 5:24 PM, peterz@xxxxxxxxxxxxx wrote:
On Thu, Aug 06, 2020 at 11:18:27AM +0200, peterz@xxxxxxxxxxxxx wrote:
On Thu, Aug 06, 2020 at 10:26:29AM +0800, Jin, Yao wrote:

+static struct pt_regs *sanitize_sample_regs(struct perf_event *event, struct pt_regs *regs)
+{
+	struct pt_regs *sample_regs = regs;
+
+	/* user only */
+	if (!event->attr.exclude_kernel || !event->attr.exclude_hv ||
+	    !event->attr.exclude_host || !event->attr.exclude_guest)
+		return sample_regs;
+

Is this condition correct?

Say we are counting a user event on the host, with exclude_kernel = 1 and
exclude_host = 0. It will take the "return sample_regs" path.

I'm not sure, I'm terminally confused on virt stuff.
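
For illustration only, the quoted "user only" check can be modelled as a small standalone C sketch. The struct excl_flags type and both function names are hypothetical stand-ins, not kernel code; only the boolean condition is taken from the patch hunk above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four perf_event_attr bits tested in the hunk. */
struct excl_flags {
	bool exclude_kernel;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/*
 * Mirrors the quoted condition: the event is treated as "user only"
 * only when all four exclude bits are set. If any bit is clear, the
 * quoted patch returns the original regs unchanged.
 */
bool is_user_only(const struct excl_flags *f)
{
	if (!f->exclude_kernel || !f->exclude_hv ||
	    !f->exclude_host || !f->exclude_guest)
		return false;
	return true;
}

/* The case raised above: a user event on the host, exclude_host = 0. */
bool host_user_event_is_user_only(void)
{
	struct excl_flags f = {
		.exclude_kernel = true,
		.exclude_hv = true,
		.exclude_host = false,
		.exclude_guest = true,
	};
	return is_user_only(&f);
}
```

Under this model, host_user_event_is_user_only() returns false, which corresponds to the "return sample_regs" path questioned above.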

[A]

Suppose we have nested virt:

L0-hv
|
G0/L1-hv
|
G1

And we're running in G0, then:

- 'exclude_hv' would exclude L0 events
- 'exclude_host' would ... exclude L1-hv events?
- 'exclude_guest' would ... exclude G1 events?

[B]

Then the next question is, if G0 is a host, does the L1-hv run in
G0 userspace or G0 kernel space?

I was assuming G0 userspace would not include anything L1 (kvm is a
kernel module after all), but what do I know.

@@ -11609,7 +11636,8 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (err)
 		return err;
-	if (!attr.exclude_kernel) {
+	if (!attr.exclude_kernel || !attr.exclude_callchain_kernel ||
+	    !attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest) {
 		err = perf_allow_kernel(&attr);
 		if (err)
 			return err;


I can understand the conditions "!attr.exclude_kernel || !attr.exclude_callchain_kernel".

But I'm not very sure about the "!attr.exclude_hv || !attr.exclude_host || !attr.exclude_guest".

Well, I'm very sure G0 userspace should never see L0 or G1 state, so
exclude_hv and exclude_guest had better be true.

On host, exclude_hv = 1, exclude_guest = 1 and exclude_host = 0, right?

Same as above, is G0 host state G0 userspace?

So even with exclude_kernel = 1, if exclude_host = 0 we will still take the
perf_allow_kernel path. Please correct me if my understanding is wrong.

Yes, because with those permission checks in place it means you have
permission to see kernel bits.
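
As a rough sketch of the widened permission check in the perf_event_open hunk above: struct open_flags and needs_perf_allow_kernel() are hypothetical names used only for this illustration, and the real code calls perf_allow_kernel(&attr) rather than returning a bool.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the perf_event_attr bits tested in the hunk. */
struct open_flags {
	bool exclude_kernel;
	bool exclude_callchain_kernel;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/*
 * Returns true when the quoted condition would route the request
 * through the kernel permission check, i.e. when any of the five
 * exclude bits is clear.
 */
bool needs_perf_allow_kernel(const struct open_flags *a)
{
	return !a->exclude_kernel || !a->exclude_callchain_kernel ||
	       !a->exclude_hv || !a->exclude_host || !a->exclude_guest;
}
```

With exclude_kernel = 1 and exclude_host = 0 this returns true, matching the reading above; it returns false only when all five bits are set.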

So if I understand 'exclude_host' wrong -- a distinct possibility -- can
we then pretty please have the above [A-B] corrected and put in a
comment near perf_event_attr and the exclude_* comments changed to refer
to that?


In my previous mail, I explained my understanding of 'exclude_host', but I'm not sure it's correct. It needs more review comments.

Thanks
Jin Yao