Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

From: Josh Poimboeuf
Date: Mon May 02 2016 - 09:52:55 EST

On Fri, Apr 29, 2016 at 05:08:50PM -0700, Andy Lutomirski wrote:
> On Apr 29, 2016 3:41 PM, "Josh Poimboeuf" <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 29, 2016 at 02:37:41PM -0700, Andy Lutomirski wrote:
> > > On Fri, Apr 29, 2016 at 2:25 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > > >> I suppose we could try to rejigger the code so that rbp points to
> > > >> pt_regs or similar.
> > > >
> > > > I think we should avoid doing something like that because it would break
> > > > gdb and all the other unwinders who don't know about it.
> > >
> > > How so?
> > >
> > > Currently, rbp in the entry code is meaningless. I'm suggesting that,
> > > when we do, for example, 'call \do_sym' in idtentry, we point rbp to
> > > the pt_regs. Currently it points to something stale (which the
> > > dump_stack code might be relying on. Hmm.) But it's probably also
> > > safe to assume that if you unwind to the 'call \do_sym', then pt_regs
> > > is the next thing on the stack, so just doing the section thing would
> > > work.
> >
> > Yes, rbp is meaningless on the entry from user space. But if an
> > in-kernel interrupt occurs (e.g. page fault, preemption) and you have
> > nested entry, rbp keeps its old value, right? So the unwinder can walk
> > past the nested entry frame and keep going until it gets to the original
> > entry.
>
> Yes.
>
> It would be nice if we could do better, though, and actually notice
> the pt_regs and identify the entry. For example, I'd love to see
> "page fault, RIP=xyz" printed in the middle of a stack dump on a
> crash.
>
> Also, I think that just following rbp links will lose the
> actual function that took the page fault (or whatever function
> pt_regs->ip actually points to).

Hm. I think we could fix all that in a more standard way. Whenever a
new pt_regs frame gets saved on entry, we could also create a new stack
frame which points to a fake kernel_entry() function. That would tell
the unwinder there's a pt_regs frame without otherwise breaking frame
pointers across the frame.

Then I guess we wouldn't need my other solution of putting the idt
entries in a special section.

How does that sound?

> Have you looked at my vdso unwinding test at all? If we could do
> something similar for the kernel, IMO it would make testing much more
> pleasant.

I found it, but I'm not sure what it would mean to do something similar
for the kernel. Do you mean doing something like an NMI sampling-based
approach where we periodically do a random stack sanity check?

(If so, I do have something like that planned.)