Re: [PATCH v2] [LBR] Dump LBRs on Exception

From: Andy Lutomirski
Date: Thu Nov 27 2014 - 16:56:33 EST


On Thu, Nov 27, 2014 at 1:22 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, 27 Nov 2014, Emmanuel Berthier wrote:
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> index 45fa730..0a69365 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> @@ -4,7 +4,7 @@
>> #include <asm/perf_event.h>
>> #include <asm/msr.h>
>> #include <asm/insn.h>
>> -
>
> This newline is intentional to seperate asm includes from the local
> one.
>
>> static void __intel_pmu_lbr_enable(void)
>> {
>> u64 debugctl;
>> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>
>> + lbr_set_used_by_perf(true);
>
> This cannot work.
>
> CPU0 CPU1
>
> __intel_pmu_lbr_enable()
> lbr_set_used_by_perf(true);
> __intel_pmu_lbr_disable()
> lbr_set_used_by_perf(false);
>
> This is a per cpu property.
>
> And there is more to that. Let's look at a single CPU.
>
> lbr for oops is enabled
>
> context switch()
> __intel_pmu_lbr_enable() -> LBR used by perf, oops dumper disabled
>
> context switch()
> __intel_pmu_lbr_disable() -> LBR not longer used by perf, oops
> dumper enabled
>
> So after that context switch we crash in the kernel and LBR is empty
> because we did disable it at the context switch.
>
> So you need per cpu state, which handles the LBR dumper state:
>
> #define LBR_OOPS_DISABLED 0x01
> #define LBR_PERF_USAGE 0x02
>
> DEFINE_PER_CPU(unsigned long, lbr_dump_state) = LBR_OOPS_DISABLED;
>
> lbr_perf_enable()
> this_cpu_add(lbr_dump_state, LBR_PERF_USAGE);
>
> lbr_perf_disable()
> if (!this_cpu_sub_return(lbr_dump_state, LBR_PERF_USAGE))
> enable_lbr_oops();
>
> Now of course you need to handle this in the exception path per cpu as
> well.
>
>> /*
>> * Exception entry points.
>> */
>> @@ -1063,6 +1103,8 @@ ENTRY(\sym)
>> subq $ORIG_RAX-R15, %rsp
>> CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
>>
>> + STOP_LBR
>
> We really cannot do this unconditionally for every exception. This
> wants to be conditional, i.e.
>
> .if \stop_lbr
> cond_stop_lbr
> .endif
>
> So we can select which exceptions actually get that treatment.
> do_page_fault is probably the only one which is interesting
> here.
>
> Now looking at your macro maze, I really wonder whether we can do it a
> little bit less convoluted. We need to push/pop registers. error_entry
> saves the registers already and has a (admitedly convoluted)
> kernel/user space check. But we might be able to do something sane
> there. Cc'ing Andy as he is the master of that universe.
>

Can one of you give me some context as to what this code is intended
to do? I haven't followed the thread.

In particular, knowing why this needs to be in asm instead of in C
would be nice, because asm in entry_64.S has an amazing ability to
have little bugs hiding for years.

There's also the caveat that, especialy for the IST exceptions, you're
running in a weird context in which lots of things that are usually
safe are verboten. Page faults can be tricky too, though.

--Andy

> Thanks,
>
> tglx
>
>
>



--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/