RE: [PATCH v2] [LBR] Dump LBRs on Exception

From: Berthier, Emmanuel
Date: Thu Dec 04 2014 - 11:02:22 EST


> From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
> Sent: Wednesday, December 3, 2014 8:30 PM
> To: Berthier, Emmanuel
> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
> > The final patch will bypass the new code in case of UserSpace page fault, so
> performance impact will be very low.
> > LBRs copy takes much more time than LBR stop/start.
> >
> > The simple is the better:
> >
> > .macro STOP_LBR
> > #ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
> > testl $3,CS(%rsp) /* Kernel Space? */
> > jnz 1f
> > testl $3, PER_CPU_VAR(lbr_dump_state) /* Disabled? */
> > jnz 1f
>
> But that just wasted two of your LBR slots.

No: false test does not generate Branch record, ex:

Last Branch Records:
to: [<ffffffff828122a0>] page_fault+0x0/0x90
from: [<ffffffff823c0e06>] sysrq_handle_crash+0x16/0x20
to: [<ffffffff823c0df0>] sysrq_handle_crash+0x0/0x20
from: [<ffffffff823c156c>] __handle_sysrq+0x9c/0x170
to: [<ffffffff823c1562>] __handle_sysrq+0x92/0x170

> > push %rax
> > push %rcx
> > push %rdx
> > movl $MSR_IA32_DEBUGCTLMSR, %ecx
> > rdmsr
> > and $~1, %eax /* Disable LBR recording */
> > wrmsr
> > pop %rdx
> > pop %rcx
> > pop %rax
>
> And the general problem with this approach (even ignoring the performance
> hit, and kernel faults on user addresses really do happen in real workloads) is
> that you're not saving and restoring MSR_IA32_DEBUGCTL.
> It may be that
> the rest of your patch does whatever magic is needed to make this work, but
> from just this code it's not at all obvious that this is correct.

The algorithm is quite simple:
When I enter in Exception handler, I stop LBR recording, and dump its content later if needed.
When I leave Exception Handler, I restart LBR recording.
So, after the first exception, LBR in On.
In case of nested Exceptions and crash, you're right, LBR will probably not be relevant.
But your proposal does not solve this issue: If we save registers during 1rst exception, and then overwrite them during 2nd level,
we will lose relevant info if crash is due to the 1rst exception.

> Hence my suggestion for rdmsr -- if you're willing to enable this and take the
> performance hit, you can simplify it a lot and save some branch slots by
> unconditionally doing the rdmsrs if you've enabled the LBR tracing IDT entry.
> The simplification from using rdmsr isn't that the save code is simplified -- it's
> that there's no state change on exception entry, so you don't need to worry
> about restoring state correctly on the way out or during a context switch.
> And you can enable/disable the whole thing just by writing to the IDT, so
> there's no performance hit at all in the disabled case.

Concerning performances: if it's really matter, the better is to disable the CONFIG switch.
But if we enable it, it's for using it I guess, and in that case, bypassing UserSpace page faults is better.
You're proposal of "unconditionally doing the rdmsrs" is not good in that case.
The only small gain is when CONFIG is enable and feature is disabled by cmdline:
- with my proposal, we get 1 test and 1 jmp more (if I switch Kernel test with LBR state test): for an exception treatment, does it really matter?

We can mix our proposals: keep my STOP/START code, and replace the dynamic disabling test by IDT change.
I hope the code will stay readable.
Do we really want to save 2 instructions?

Thanks,

Emmanuel.