Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

From: Andy Lutomirski
Date: Mon May 23 2016 - 23:52:37 EST


On May 23, 2016 7:28 PM, "Josh Poimboeuf" <jpoimboe@xxxxxxxxxx> wrote:
> > Maybe I'm coming around to liking this idea.
>
> Ok, good :-)
>
> > In an ideal world (DWARF support, high-quality unwinder, nice pretty
> > printer, etc), unwinding across a kernel exception would look like:
> >
> > - some_func
> > - some_other_func
> > - do_page_fault
> > - page_fault
> >
> > After page_fault, the next unwind step takes us to the faulting RIP
> > (faulting_func) and reports that all GPRs are known. It should
> > probably learn this fact from DWARF if DWARF is available, instead of
> > directly decoding pt_regs, due to a few funny cases in which pt_regs
> > may be incomplete. A nice pretty printer could now print all the
> > regs.
> >
> > - faulting_func
> > - etc.
> >
> > For this to work, we don't actually need the unwinder to explicitly
> > know where pt_regs is.
>
> That's true (but only for DWARF).
>
> > Food for thought, though: if user code does SYSENTER with TF set,
> > you'll end up with partial pt_regs. There's nothing the kernel can do
> > about it. DWARF will handle it without any fanfare, but I don't know
> > if it's going to cause trouble for you pre-DWARF.
>
> In this case it should see the stack pointer is past the pt_regs offset,
> so it would just report it as an empty stack.

OK

>
> > I'm also not sure it makes sense to apply this before the unwinder
> > that can consume it is ready. Maybe, if it would be consistent with
> > your plans, it would make sense to rewrite the unwinder first, then
> > apply this and teach live patching to use the new unwinder, and *then*
> > add DWARF support?
>
> For the purposes of livepatch, the reliable unwinder needs to detect
> whether an interrupt/exception pt_regs frame exists on a sleeping task
> (or current). This patch would allow us to do that.
>
> So my preferred order of doing things would be:
>
> 1) Brian Gerst's switch_to() cleanup and any related unwinder fixes
> 2) this patch for annotating pt_regs stack frames
> 3) reliable unwinder, similar to what I already posted, except it relies
> on this patch instead of PF_PREEMPT_IRQ, and knows how to deal with
> the new inactive task frame format of #1
> 4) livepatch consistency model which uses the reliable unwinder
> 5) rewrite unwinder, and port all users to the new interface
> 6) add DWARF unwinder
>
> 1-4 are pretty much already written, whereas 5 and 6 will take
> considerably more work.

Fair enough.

>
> > > + /*
> > > + * Create a stack frame for the saved pt_regs. This allows frame
> > > + * pointer based unwinders to find pt_regs on the stack.
> > > + */
> > > + .macro CREATE_PT_REGS_FRAME regs=%rsp
> > > +#ifdef CONFIG_FRAME_POINTER
> > > + pushq \regs
> > > + pushq $pt_regs+1
> >
> > Why the +1?
>
> Some unwinders like gdb are smart enough to report the function which
> contains the instruction *before* the return address. Without the +1,
> they would show the wrong function.

Lovely. Want to add a comment?

>
> > > + pushq %rbp
> > > + movq %rsp, %rbp
> > > +#endif
> > > + .endm
> >
> > I keep wanting this to be only two pushes and to fudge rbp to make it
> > work, but I don't see a good way. But let's call it
> > CREATE_NESTED_ENTRY_FRAME or something, and let's rename pt_regs to
> > nested_frame or similar.
>
> Or, if we aren't going to annotate syscall pt_regs, we could give it a
> more specific name. CREATE_INTERRUPT_FRAME and interrupt_frame()?

CREATE_INTERRUPT_FRAME is confusing because it applies to idtentry,
too. CREATE_PT_REGS_FRAME is probably fine.

> > > +
> > > +/* fake function which allows stack unwinders to detect pt_regs frames */
> > > +#ifdef CONFIG_FRAME_POINTER
> > > +ENTRY(pt_regs)
> > > + nop
> > > + nop
> > > +ENDPROC(pt_regs)
> > > +#endif /* CONFIG_FRAME_POINTER */
> >
> > Why is this two bytes long? Is there some reason it has to be more
> > than one byte?
>
> Similar to above, this is related to the need to support various
> unwinders. Whether the unwinder displays the ret_addr or the
> instruction preceding it, either way the instruction needs to be inside
> the function for the function to be reported.

OK.

--Andy