Re: [PATCH] perf core: Use KSTK_ESP() instead of pt_regs->sp while output user regs

From: Andy Lutomirski
Date: Tue Dec 30 2014 - 18:30:21 EST


On Dec 30, 2014 11:03 AM, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Dec 25, 2014 at 07:48:28AM -0800, Andy Lutomirski wrote:
> > On a quick look, there are plenty of other bugs in there besides just
> > the stack pointer issue. The ABI check that uses TIF_IA32 in the perf
> > core is completely wrong. TIF_IA32 may happen to match the actual
> > userspace bitness, but, if so, that's more or less just luck.
> > And there's a user_mode test that should be user_mode_vm.
> >
> > Also, it's not just sp that's wrong. There are various places that
> > you can interrupt in which many of the registers have confusing
> > locations. You could try using the cfi unwind data, but that's
> > unlikely to work for regs like cs and ss, and, during context switch,
> > this has very little chance of working.
> >
> > What's the point of this feature? Honestly, my suggestion would be to
> > delete it instead of trying to fix it. It's also not clear to me that
> > there aren't serious security problems here -- it's entirely possible
> > for sensitive *kernel* values to end up in task_pt_regs at certain
> > times, and if you run during context switch and there's no code to
> > suppress this dump during context switch, then you could be showing
> > regs that belong to the wrong task.
>
> Of course the people who actually wrote the code are not on CC :/
>
> There's two users of this iirc;
>
> 1) the dwarf stack unwinder thingy, which basically dumps the userspace
> regs and the top of userspace stack on 'event'.
>

Given how the x86_64* entry code works, using task_pt_regs from
anywhere except explicitly supported contexts (including exceptions
that originated in userspace and a small handful of system calls) is
asking for trouble. NMI context is especially bad.

How important is this feature, and which registers matter? It might
be possible to use a dwarf unwinder on the kernel call stack to get
most of the regs from most contexts, and it might also be possible to
make small changes to the entry code to make it possible to get some
of the registers reliably, but it's not currently possible to safely
use task_pt_regs *at all* from NMI context unless you've at least
blacklisted a handful of origin RIP values that give dangerously bogus
results. (Using do_nmi's regs parameter if user_mode_vm(regs) is a
different story.)
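
To illustrate the distinction, here is a rough userspace model of that
check, not kernel code. The struct and every model_* name are hypothetical
stand-ins invented for this sketch. It assumes the usual x86 convention that
the low two bits of CS hold the privilege level and that EFLAGS bit 17 (VM)
marks vm86 mode on 32-bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs; only the
 * two fields this sketch needs. */
struct model_regs {
	uint64_t cs;
	uint64_t flags;
};

#define MODEL_RPL_MASK 0x3u        /* low two bits of CS: privilege level */
#define MODEL_USER_RPL 0x3u        /* ring 3 */
#define MODEL_VM_FLAG  (1u << 17)  /* EFLAGS.VM, set in vm86 mode */

/* CS-only test. Misses vm86 mode, where CS holds a real-mode segment
 * value whose low bits say nothing about privilege. */
static bool model_user_mode(const struct model_regs *regs)
{
	return (regs->cs & MODEL_RPL_MASK) == MODEL_USER_RPL;
}

/* CS test plus the vm86 case, in the spirit of user_mode_vm(). */
static bool model_user_mode_vm(const struct model_regs *regs)
{
	return model_user_mode(regs) || (regs->flags & MODEL_VM_FLAG) != 0;
}

/* Return the interrupt-time regs only if they describe userspace;
 * otherwise return NULL instead of falling back to another source. */
static const struct model_regs *
model_user_regs_from_nmi(const struct model_regs *nmi_regs)
{
	return model_user_mode_vm(nmi_regs) ? nmi_regs : NULL;
}
```

The point of returning NULL in the kernel-mode case is that the caller has
to decide explicitly what to do, rather than silently reading task_pt_regs
from a context where it may be bogus.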

* I'm not nearly as familiar with the 32-bit entry code, so I don't
know whether we have the same issues there.

> 2) the recent sample_regs_intr, which dumps the register set at
> 'event', be it kernel or userspace.
>

What's wrong with the PMI's pt_regs for that? If we interrupted the
kernel, they'll be kernel regs (with all their attendant security
issues) and, if we interrupted userspace, then they'll be the full,
correct userspace registers.

--Andy

>
> The first is somewhat usable when lacking framepointers while still
> desiring some unwind information, the second is useful to things like
> call argument profiling and the like.