Re: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples

From: Jin, Yao
Date: Tue Aug 04 2020 - 22:15:31 EST


Hi Peter,

On 8/4/2020 7:49 PM, peterz@xxxxxxxxxxxxx wrote:
On Fri, Jul 31, 2020 at 10:56:17AM +0800, Jin Yao wrote:
@@ -6973,7 +6973,8 @@ static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
struct perf_callchain_entry *
perf_callchain(struct perf_event *event, struct pt_regs *regs)
{
- bool kernel = !event->attr.exclude_callchain_kernel;
+ bool kernel = !event->attr.exclude_callchain_kernel &&
+ !event->attr.exclude_kernel;

This seems weird; how can we get there. Also it seems to me that if you
have !exclude_callchain_kernel you already have permission for kernel
bits, so who cares.


In perf tool, exclude_callchain_kernel is set to 1 when perf-record only collects the user callchains and exclude_kernel is set to 1 when events are configured to run in user space.

So if an event is configured to run in user space, that should make sense we don't need it's kernel callchains.

But it seems to me there is no code logic in perf tool which can make sure !exclude_callchain_kernel -> !exclude_kernel.

Jiri, Arnaldo, is my understanding correct?

bool user = !event->attr.exclude_callchain_user;
/* Disallow cross-task user callchains. */
bool crosstask = event->ctx->task && event->ctx->task != current;
@@ -6988,12 +6989,39 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
return callchain ?: &__empty_callchain;
}
+static struct pt_regs *leak_check(struct perf_event_header *header,
+ struct perf_event *event,
+ struct pt_regs *regs)
+{
+ struct pt_regs *regs_fake = NULL;
+
+ if (event->attr.exclude_kernel && !user_mode(regs)) {
+ if (!(current->flags & PF_KTHREAD)) {
+ regs_fake = task_pt_regs(current);
+ if (!user_mode(regs_fake)) {
+ regs_fake = NULL;
+ instruction_pointer_set(regs, -1L);
+ }
+ } else
+ instruction_pointer_set(regs, -1L);

That violates coding style, also:


Thanks, I should use:

} else {
instruction_pointer_set(regs, -1L);
}

if (!(current->flags & PF_KTHREAD)) {
regs_fake = task_pt_regs(current);
if (!user_mode(regs_fake)) /* is this not a BUG? */

We don't need !user_mode(regs_fake) check here, it's unnecessary check.

regs_fake = NULL;
}

if (!regs_fake)
instruction_pointer_set(regs, -1L);

Seems simpler to me.


So the new code looks like:

if (event->attr.exclude_kernel && !user_mode(regs)) {
if (!(current->flags & PF_KTHREAD)) {
regs_fake = task_pt_regs(current);
if (!regs_fake)
instruction_pointer_set(regs, -1L);
} else {
instruction_pointer_set(regs, -1L);
}

+ if ((header->misc & PERF_RECORD_MISC_CPUMODE_MASK) ==
+ PERF_RECORD_MISC_KERNEL) {
+ header->misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
+ header->misc |= PERF_RECORD_MISC_USER;
+ }

Why the conditional? At this point it had better be unconditionally
user, no?

headers->misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
headers->misc |= PERF_RECORD_MISC_USER;


#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0)
#define PERF_RECORD_MISC_KERNEL (1 << 0)
#define PERF_RECORD_MISC_USER (2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR (3 << 0)
#define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0)
#define PERF_RECORD_MISC_GUEST_USER (5 << 0)

If we unconditionally set user, it will reset for hypervisor, guest kernel and guest_user.

Thanks
Jin Yao

+ }
+
+ return regs_fake;
+}