Re: [PATCH] perf: Fix oops when kthread execs user process

From: Peter Zijlstra
Date: Wed May 29 2019 - 12:47:46 EST


On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote:

> > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > >
> > > > > > if (user_mode(regs)) {
> > > > >
> > > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > > can be junk in some cases. In which case, should we be checking for
> > > > > kthreads first?

> Sorry, I'm not trying to catch you out! Just trying to understand what the
> semantics are supposed to be.
>
> I do find the concept of user_mode(regs) bizarre for the idle task. By the
> above, we definitely have a bug on arm64 (user_mode(regs) tends to be
> true for the idle task), and I couldn't figure out how you avoided it on
> x86. I guess it happens to work because the stack is zero-initialised or
> something?

So lets take the whole thing:

static void perf_sample_regs_user(struct perf_regs *regs_user,
struct pt_regs *regs,
struct pt_regs *regs_user_copy)
{
if (user_mode(regs)) {
regs_user->abi = perf_reg_abi(current);
regs_user->regs = regs;
} else if (!(current->flags & PF_KTHREAD)) {
perf_get_regs_user(regs_user, regs, regs_user_copy);
} else {
regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
regs_user->regs = NULL;
}
}

This is called from the perf-generate-a-sample path, which is typically
an exception (IRQ/NMI/whatever) or a software/tracepoint thing.

In the exception case, the @regs argument are the exception register, as
provided by your entry.S to your exception handlers. In the
software/tracepoint thing, it is the result of
perf_arch_fetch_caller_regs().

So @regs is always 'sane' and user_mode(regs) tells us if the exception
came from userspace (and software/tracepoints always fail this, they
'obviously' don't come from userspace). If we're idle, we're not from
userspace, so this branch doesn't matter.

Next, we test if there is a userspace part _at_all_, this is the newly
minted: '!(current->flags & PF_KTHREAD)', if that passes, we use
architecture magic -- task_pt_regs() -- to get the user-regs. This can
be crap. But since the idle task will always fail our test (as would
the old one, idle->mm is always NULL), we'll never get here for idle.

Then failing the above two, as we must for idle, we'll default to
ABI_NONE/NULL.


Does that help?