Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking
From: Andy Lutomirski
Date: Wed Jun 22 2016 - 14:26:45 EST
On Wed, Jun 22, 2016 at 11:22 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Wed, Jun 22, 2016 at 10:59:23AM -0700, Andy Lutomirski wrote:
>> > So I got a chance to look at this some more. I'm thinking that to make
>> > this feature more consistently useful, we shouldn't only annotate
>> > pt_regs frames for calls to handlers; other calls should be annotated as
>> > well: preempt_schedule_irq, CALL_enter_from_user_mode,
>> > prepare_exit_to_usermode, SWAPGS, TRACE_IRQS_OFF, DISABLE_INTERRUPTS,
>> > etc. That way, the unwinder will always be able to find pt_regs from an
>> > interrupt/exception, even if starting from one of these other calls.
>> >
>> > But then, things get ugly. You have to either setup and tear down the
>> > frame for every possible call, or do a higher-level setup/teardown
>> > across multiple calls, which invalidates several assumptions in the
>> > entry code about the location of pt_regs on the stack.
>> >
>> > Also problematic is that several of the macros (like TRACE_IRQS_IRETQ)
>> > make assumptions about the location of pt_regs. And they're used by
>> > both syscall and interrupt code. So if we didn't create a frame pointer
>> > header for syscalls, we'd basically need two versions of the macros: one
>> > for irqs/exceptions and one for syscalls.
>> >
>> > So I think the cleanest way to handle this is to always allocate two
>> > extra registers on the stack in ALLOC_PT_GPREGS_ON_STACK. Then all
>> > entry code can assume that pt_regs is at a constant location, and all
>> > the above problems go away. Another benefit is that we'd only need two
>> > saves instead of three -- the pointer to pt_regs is no longer needed
>> > since pt_regs is always immediately after the frame header.
>> >
>> > I worked up a patch to implement this -- see below. It writes the frame
>> > pointer in all entry paths, including syscalls. This helps keep the
>> > code simple.
>> >
>> > The downside is a small performance penalty: with getppid()-in-a-loop on
>> > my laptop, the average syscall went from 52ns to 53ns, which is about a
>> > 2% slowdown. But I doubt it would be measurable in a real-world
>> > workload.
>> >
>> > It looks like about half the slowdown is due to the extra stack
>> > allocation (which presumably adds a little d-cache pressure on the stack
>> > memory) and the other half is due to the stack writes.
>> >
>> > I could remove the writes from the syscall path but it would only save
>> > about half a ns, and it would make the code less robust. Plus it's nice
>> > to have the consistency of having *all* pt_regs frames annotated.
>>
>> This is a bit messy, and I'm not really sure that the entry code
>> should be have to operate under constraints like this. Also,
>> convincing myself this works for NMI sounds unpleasant.
>>
>> Maybe we should go back to my idea of just listing the call sites in a table.
>
> So are you suggesting something like:
>
> .macro ENTRY_CALL func pt_regs_offset=0
> call \func
> 1: .pushsection .entry_calls, "a"
> .long 1b - .
> .long \pt_regs_offset
> .popsection
> .endm
>
> and then change every call in the entry code to ENTRY_CALL?
Yes, exactly, modulo whether the section name is good. hpa is
probably the authority on that.
--Andy