Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

From: Josh Poimboeuf
Date: Thu Jun 23 2016 - 11:55:44 EST


On Wed, Jun 22, 2016 at 05:09:11PM -0700, Andy Lutomirski wrote:
> On Wed, Jun 22, 2016 at 9:30 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > Andy,
> >
> > So I got a chance to look at this some more. I'm thinking that to make
> > this feature more consistently useful, we shouldn't only annotate
> > pt_regs frames for calls to handlers; other calls should be annotated as
> > well: preempt_schedule_irq, CALL_enter_from_user_mode,
> > prepare_exit_to_usermode, SWAPGS, TRACE_IRQS_OFF, DISABLE_INTERRUPTS,
> > etc. That way, the unwinder will always be able to find pt_regs from an
> > interrupt/exception, even if starting from one of these other calls.
> >
> > But then, things get ugly. You have to either setup and tear down the
> > frame for every possible call, or do a higher-level setup/teardown
> > across multiple calls, which invalidates several assumptions in the
> > entry code about the location of pt_regs on the stack.
> >
>
> Here's yet another harebrained idea. Maybe it works better than my
> previous harebrained ideas :)
>
> Your patch is already creating a somewhat nonstandard stack frame:
>
> + movq %rbp, 0*8(%rsp)
> + movq $entry_frame_ret, 1*8(%rsp)
> + movq %rsp, %rbp
>
> It's kind of a normal stack frame, but rbp points at something odd,
> and to unwind it fully correctly, the unwinder needs to know about it.
>
> What if we made it even more special, along the lines of:
>
> leaq offset_to_ptregs(%rsp), %rbp
> xorq $-1, %rbp
>
> IOW, don't write anything to the stack at all, and just put a special
> value into RBP that says "the next frame is pt_regs at such-and-such
> address". Do this once on entry and make sure to restore RBP (from
> pt_regs) on exit. Now the unwinder can notice that RBP has the high
> bits clear *and* that the negation of it points to the stack, and it
> can figure out what's going on.
>
> What do you think? Am I nuts or could this work?
>
> It had better not have much risk of breaking things worse than they
> currently are, given that current kernel allow user code to stick any
> value it likes into the very last element of the RBP chain.

I think it's a good idea, and it could work... BUT it would break
external unwinders like gdb for the in-kernel entry case.

For interrupts and exceptions in kernel mode, rbp *is* valid. Sure, it
doesn't tell you the interrupted function, but it does tell you its
caller. A generic frame pointer unwinder skips the interrupted
function, but at least it keeps going. If we encoded rbp on entry, that
would break.

--
Josh