Re: [PATCH v2] [LBR] Dump LBRs on Exception
From: Andy Lutomirski
Date: Thu Dec 04 2014 - 13:10:23 EST
On Thu, Dec 4, 2014 at 8:01 AM, Berthier, Emmanuel
>> From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
>> Sent: Wednesday, December 3, 2014 8:30 PM
>> To: Berthier, Emmanuel
>> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
>> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>> > The final patch will bypass the new code in case of UserSpace page fault, so
>> performance impact will be very low.
>> > LBRs copy takes much more time than LBR stop/start.
>> > The simple is the better:
>> > .macro STOP_LBR
>> > #ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
>> > testl $3,CS(%rsp) /* Kernel Space? */
>> > jnz 1f
>> > testl $3, PER_CPU_VAR(lbr_dump_state) /* Disabled? */
>> > jnz 1f
>> But that just wasted two of your LBR slots.
> No: false test does not generate Branch record, ex:
> Last Branch Records:
> to: [<ffffffff828122a0>] page_fault+0x0/0x90
> from: [<ffffffff823c0e06>] sysrq_handle_crash+0x16/0x20
> to: [<ffffffff823c0df0>] sysrq_handle_crash+0x0/0x20
> from: [<ffffffff823c156c>] __handle_sysrq+0x9c/0x170
> to: [<ffffffff823c1562>] __handle_sysrq+0x92/0x170
>> > push %rax
>> > push %rcx
>> > push %rdx
>> > movl $MSR_IA32_DEBUGCTLMSR, %ecx
>> > rdmsr
>> > and $~1, %eax /* Disable LBR recording */
>> > wrmsr
>> > pop %rdx
>> > pop %rcx
>> > pop %rax
>> And the general problem with this approach (even ignoring the performance
>> hit, and kernel faults on user addresses really do happen in real workloads) is
>> that you're not saving and restoring MSR_IA32_DEBUGCTL.
>> It may be that
>> the rest of your patch does whatever magic is needed to make this work, but
>> from just this code it's not at all obvious that this is correct.
> The algorithm is quite simple:
> When I enter in Exception handler, I stop LBR recording, and dump its content later if needed.
> When I leave Exception Handler, I restart LBR recording.
> So, after the first exception, LBR in On.
> In case of nested Exceptions and crash, you're right, LBR will probably not be relevant.
> But your proposal does not solve this issue: If we save registers during 1rst exception, and then overwrite them during 2nd level,
> we will lose relevant info if crash is due to the 1rst exception.
>> Hence my suggestion for rdmsr -- if you're willing to enable this and take the
>> performance hit, you can simplify it a lot and save some branch slots by
>> unconditionally doing the rdmsrs if you've enabled the LBR tracing IDT entry.
>> The simplification from using rdmsr isn't that the save code is simplified -- it's
>> that there's no state change on exception entry, so you don't need to worry
>> about restoring state correctly on the way out or during a context switch.
>> And you can enable/disable the whole thing just by writing to the IDT, so
>> there's no performance hit at all in the disabled case.
> Concerning performances: if it's really matter, the better is to disable the CONFIG switch.
> But if we enable it, it's for using it I guess, and in that case, bypassing UserSpace page faults is better.
> You're proposal of "unconditionally doing the rdmsrs" is not good in that case.
> The only small gain is when CONFIG is enable and feature is disabled by cmdline:
> - with my proposal, we get 1 test and 1 jmp more (if I switch Kernel test with LBR state test): for an exception treatment, does it really matter?
> We can mix our proposals: keep my STOP/START code, and replace the dynamic disabling test by IDT change.
> I hope the code will stay readable.
> Do we really want to save 2 instructions?
I don't really care about the number of instructions. But there are
still all the nasty cases:
- Context switch during exception processing (both in the C handler
and in the retint code).
- PMI during exception processing.
- Exception while perf is poking at LBR msrs.
Where are you planning on saving the start/stop previous state?
AMA Capital Management, LLC
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/