Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

From: Andy Lutomirski
Date: Fri May 20 2016 - 13:00:09 EST


On Fri, May 20, 2016 at 9:41 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Fri, May 20, 2016 at 08:41:00AM -0700, Andy Lutomirski wrote:
>> On Fri, May 20, 2016 at 7:05 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> > On Thu, May 19, 2016 at 04:39:51PM -0700, Andy Lutomirski wrote:
>> >> On Thu, May 19, 2016 at 4:15 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> >> > Note this example is with today's unwinder. It could be made smarter to
>> >> > get the RIP from the pt_regs so the '?' could be removed from
>> >> > copy_page_to_iter().
>> >> >
>> >> > Thoughts?
>> >>
>> >> I think we should do that. The silly sample patch I sent you (or at
>> >> least that I think I sent you) did that, and it worked nicely.
>> >
>> > Yeah, we can certainly do something similar to make the unwinder
>> > smarter. It should be very simple with this approach: if it finds the
>> > pt_regs() function on the stack, the (struct pt_regs *) pointer will be
>> > right after it.
>>
>> That seems barely easier than checking if it finds a function in
>> .entry that's marked on the stack,
>
> Don't forget you'd also have to look up the function's pt_regs offset in
> the table.
>
>> and the latter has no runtime cost.
>
> Well, I had been assuming the three extra pushes and one extra pop for
> interrupts would be negligible, but I'm definitely no expert there. Any
> idea how I can measure it?

I think it would be negligible, at least for interrupts, since
interrupts are already extremely expensive. But I don't love adding
assembly code that makes them even slower. The real thing I dislike
about this approach is that it's not a normal stack frame, so you need
code in the unwinder to unwind through it correctly, which makes me
think that you're not saving much complexity by adding the pushes.
But maybe I'm wrong.

>
>> > I'm not sure about the idea of a table. I get the feeling it would add
>> > more complexity to both the entry code and the unwinder. How would you
>> > specify the pt_regs location when it's on a different stack? (See the
>> > interrupt macro: non-nested interrupts will place pt_regs on the task
>> > stack before switching to the irq stack.)
>>
>> Hmm. I need to think about the interrupt stack case a bit. Although
>> the actual top of the interrupt stack has a nearly fixed format, and I
>> have old patches to clean it up and make it actually be fixed. I'll
>> try to dust those off and resend them soon.
>
> How would that solve the problem? Would pt_regs be moved or copied to
> the irq stack?

Hmm.

Maybe the right way would be to unwind off the IRQ stack in two steps.
Step one would be to figure out that you're on the IRQ stack and pop
your way off it. Step two would be to find pt_regs.

But that could be rather nasty to implement. Maybe what we actually
want to do is this:

First, apply this thing:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1

that gives us a common format for the IRQ stack.

Second, use my idea of making a table of offsets to pt_regs, so we'd
have, roughly:

ENTER_IRQ_STACK old_rsp=%r11
call __do_softirq
ANNOTATE_IRQSTACK_PTREGS_CALL offset=0
LEAVE_IRQ_STACK

the idea here is that offset=0 means that there is no offset beyond
that implied by ENTER_IRQ_STACK. What actually gets written to the
table is offset 8, because ENTER_IRQ_STACK itself does one push.

So far, this has no runtime overhead at all.

Then, in the unwinder, the logic becomes:

If the return address is annotated in the ptregs entry table, look up
the offset. If the offset is in bounds, then use it. If the offset
is out of bounds and we're on the IRQ stack, then unwind the
ENTER_IRQ_STACK as well.

Does that seem reasonable? I can try to implement it and see what it
looks like.

--Andy

>
>> > If you're worried about performance, I can remove the syscall
>> > annotations. They're optional anyway, since the pt_regs is always at
>> > the same place on the stack for syscalls.
>> >
>> > I think three extra pushes wouldn't be a performance issue for
>> > interrupts/exceptions. And they'll go away when we finally bury
>> > CONFIG_FRAME_POINTER.
>>
>> I bet we'll always need to support CONFIG_FRAME_POINTER for some
>> embedded systems.
>
> Yeah, probably.
>
>> I'll play with this a bit.
>
> Thanks, looking forward to seeing what you come up with.
>
> --
> Josh



--
Andy Lutomirski
AMA Capital Management, LLC