Re: [PATCH v2] [LBR] Dump LBRs on Exception

From: Andy Lutomirski
Date: Tue Dec 02 2014 - 14:34:10 EST


On Tue, Dec 2, 2014 at 11:09 AM, Berthier, Emmanuel
<emmanuel.berthier@xxxxxxxxx> wrote:
>> From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
>> Sent: Friday, November 28, 2014 4:15 PM
>> To: Berthier, Emmanuel
>> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
>> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>>
>> On Fri, Nov 28, 2014 at 12:44 AM, Berthier, Emmanuel
>> <emmanuel.berthier@xxxxxxxxx> wrote:
>> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> > index df088bb..f39cded 100644
>> > --- a/arch/x86/kernel/entry_64.S
>> > +++ b/arch/x86/kernel/entry_64.S
>> > @@ -1035,6 +1035,46 @@ apicinterrupt IRQ_WORK_VECTOR \
>> >  irq_work_interrupt smp_irq_work_interrupt
>> >  #endif
>> >
>> > +.macro STOP_LBR
>> > +#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
>> > + testl $3,CS+8(%rsp) /* Kernel Space? */
>> > + jz 1f
>> > + testl $1, lbr_dump_on_exception
>>
>> Is there a guarantee that, if lbr_dump_on_exception is true, then LBR is on?
>> What happens if you schedule between stopping and resuming LBR?
>
> Good point. The current assumption is to rely on the numerous exceptions to "re-arm" the LBR recording.
> Even if we bypass user-space page faults, we can still rely on kernel vmalloc page faults to re-arm the recording.

I don't really understand this. Presumably page_fault should leave
the LBR setting exactly the way it found it, because otherwise it'll
need all kinds of fancy coordination with perf. And vmalloc faults
are very rare.
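
To be concrete, here's an untested sketch (in C with the kernel's
rdmsrl()/wrmsrl() helpers, ignoring the nesting problem for the moment)
of what "leave it exactly the way you found it" would look like; the
saved value would really have to live on the exception stack or be
handled carefully for nested faults:

#include <linux/percpu.h>
#include <asm/msr.h>

/* Untested sketch: save DEBUGCTL on entry, restore it verbatim on exit. */
static DEFINE_PER_CPU(unsigned long, saved_debugctl);

static void lbr_freeze(void)
{
	unsigned long debugctl;

	rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
	__this_cpu_write(saved_debugctl, debugctl);
	if (debugctl & DEBUGCTLMSR_LBR)
		wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl & ~DEBUGCTLMSR_LBR);
}

static void lbr_unfreeze(void)
{
	/* Put DEBUGCTL back exactly as the fault found it. */
	wrmsrl(MSR_IA32_DEBUGCTLMSR, __this_cpu_read(saved_debugctl));
}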

You should also make sure that the perf code is okay with a PMI
nesting *inside* a fault that has disabled LBR. Also, page faults are
rather performance-sensitive, so the performance hit from this is
unwelcome.

And keep in mind that we can context switch both inside an exception
handler and on the way out, so that all needs to work, too.

TBH, I'm wondering whether this is actually a good idea. It might be
more valuable and less scary to try to make this work for BUG instead.
To get the most impact, it might be worth allocating a new exception
vector for BUG and using 'int 0xwhatever', and the prologue to that
could read out all the MSRs without any branches.
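
Roughly this shape is what I have in mind for the BUG path (untested
sketch; the LBR depth and MSR numbers are illustrative, Nehalem-style
values and would really come from the model-specific tables perf already
maintains, and the real prologue would want to freeze LBR first, or be
fully unrolled, so its own branches don't overwrite the records):

#include <linux/types.h>
#include <asm/msr.h>

/*
 * Illustrative, Nehalem-style values only; real code would use the
 * model-specific definitions that perf already has.
 */
#define LBR_NR			16
#define LBR_TOS_MSR		0x000001c9
#define LBR_FROM_MSR_BASE	0x00000680
#define LBR_TO_MSR_BASE		0x000006c0

struct lbr_snapshot {
	u64 tos;
	u64 from[LBR_NR];
	u64 to[LBR_NR];
};

static void lbr_snapshot(struct lbr_snapshot *s)
{
	int i;

	/* Grab TOS and every FROM/TO pair before doing anything else. */
	rdmsrl(LBR_TOS_MSR, s->tos);
	for (i = 0; i < LBR_NR; i++) {
		rdmsrl(LBR_FROM_MSR_BASE + i, s->from[i]);
		rdmsrl(LBR_TO_MSR_BASE + i, s->to[i]);
	}
}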

--Andy

>
>> > + jz 1f
>> > + push %rax
>> > + push %rcx
>> > + push %rdx
>> > + movl $MSR_IA32_DEBUGCTLMSR, %ecx
>> > + rdmsr
>> > + and $~1, %eax /* Disable LBR recording */
>> > + wrmsr
>>
>> wrmsr is rather slow. Have you checked whether this is faster than just
>> saving the LBR trace on exception entry?
>
> The figures I have show that for common MSRs, rdmsr and wrmsr have roughly the same cost, around 100 cycles (it depends greatly on the arch).
> The cost of stop/start is: 2 rdmsr + 2 wrmsr = 4 MSR accesses.
> The cost of reading the LBR is: 1 rdmsr for the TOS + 2 rdmsr per record, with 8 to 32 records (arch-specific) = between 17 and 65 MSR accesses.
> I've measured on an Atom core (8 records): reading the LBR takes around 3x longer than stop/start.
> As the LBR size is arch-dependent, it's not easy to implement the record reading in asm without any branch, and it would create a maintenance dependency.
> I prefer to let perf_event_lbr deal with all that stuff.
>
> Thx,
>
> Emmanuel.
>



--
Andy Lutomirski
AMA Capital Management, LLC