RE: [PATCH] [LBR] Dump LBRs on Oops
From: Berthier, Emmanuel
Date: Wed Nov 26 2014 - 09:17:49 EST
> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> Sent: Wednesday, November 26, 2014 2:08 PM
> To: Berthier, Emmanuel
> Cc: mingo@xxxxxxxxxx; hpa@xxxxxxxxx; x86@xxxxxxxxxx; Jarzmik, Robert;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH] [LBR] Dump LBRs on Oops
>
> On Wed, 26 Nov 2014, Berthier, Emmanuel wrote:
> > > So, if perf uses LBR we do not print it? What a weird design decision.
> > > If the machine crashes, we want that information no matter whether
> > > perf is active or not. What kind of twisted logic is that?
> >
> > Ok, let me explain.
> > LBR usages are exclusive. Perf uses LBR to calculate some CPU
> > statistics. I use LBR to track code execution before Exceptions.
> > So, as soon as we enable perf, I disable LBR dump and vice versa.
>
> That wants to be documented in the code.
Ok, will be in patch 2
> > > > + } else if (x86_pmu.lbr_nr == 0) {
> > > > + pr_cont(" (x86_model unknown, check
> > > intel_pmu_init())\n");
> > >
> > > Huch? Why we get here if the pmu does not support it at all? Why
> > > should we bother to print it? If it's not printed it's not available. It's that
> simple.
> >
> > That's a warning to point out that current core is not supported. New
> > cores have to be declared in
> >
> > intel_pmu_init() after:
> >
> > switch (boot_cpu_data.x86_model) {
> > . . .
> > case 28: /* Atom */
> > case 38: /* Lincroft */
> > case 39: /* Penwell */
> > case 53: /* Cloverview */
> > case 54: /* Cedarview */
> >
> > I work on new cores and their names will not be revealed before a while.
> > So, next time this feature will be used on a new core, it's important
> > to understand why is not supported and where to make the simple
> > update.
>
> We add printks not for people who work on the support of unreleased
> hardware. They should better know what they are doing. If they can't figure
> that out they should not touch the kernel in the first place.
LoL
I'm part of those people, I've touched the kernel and I've figured out what was wrong.
And I would like to be helped next year for the next Core: I'm an old man and I need
to leave a white stone trail ;-)
Could we agree on that one?
> > > > void show_regs(struct pt_regs *regs) {
> > > > int i;
> > > > @@ -314,10 +352,15 @@ void show_regs(struct pt_regs *regs)
> > > > unsigned char c;
> > > > u8 *ip;
> > > >
> > > > + /*
> > > > + * Called before show_stack_log_lvl() as it could trig
> > > > + * page_fault and reenable LBR
> > >
> > > Huch? The kernel stack dump is going to page fault? If that happens
> > > then you are in deep shit anyway. I doubt that anything useful gets
> > > out of the machine at this point, LBR or not.
> > >
> > > Aside of that if we want to debug with the LBR then we better freeze
> > > that whole thing across a dump and be done with it.
> >
> > I met that case but did no dig deeply into it...
>
> Hmm, a corrupted stack might trigger this together with some of the other
> debug options enabled. So we really might to put it in front.
Didn't catch you. Could you elaborate on that?
> > > > diff --git a/arch/x86/kernel/entry_64.S
> > > > b/arch/x86/kernel/entry_64.S index df088bb..120e989 100644
> > > > --- a/arch/x86/kernel/entry_64.S
> > > > +++ b/arch/x86/kernel/entry_64.S
> > > > @@ -1035,6 +1035,42 @@ apicinterrupt IRQ_WORK_VECTOR \
> > > > irq_work_interrupt smp_irq_work_interrupt #endif
> > > >
> > > > +.macro STOP_LBR
> > > > +#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
> > > > + testl $1, lbr_dump_on_exception
> > > > + jz 1f
> > > > + push %rax
> > > > + push %rcx
> > > > + push %rdx
> > > > + movl $MSR_IA32_DEBUGCTLMSR, %ecx
> > > > + rdmsr
> > > > + and $~1, %eax /* Disable LBR recording */
> > > > + wrmsr
> > > > + pop %rdx
> > > > + pop %rcx
> > > > + pop %rax
> > > > +1:
> > > > +#endif
> > > > +.endm
> > > > +
> > > > +.macro START_LBR
> > > > +#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
> > > > + testl $1, lbr_dump_on_exception
> > > > + jz 1f
> > > > + push %rax
> > > > + push %rcx
> > > > + push %rdx
> > > > + movl $MSR_IA32_DEBUGCTLMSR, %ecx
> > > > + rdmsr
> > > > + or $1, %eax /* Enable LBR recording */
>
> > So, by default, when Perf is not used, LBR will be enabled at the
> > first exception (usually a simple page fault) with default filtering
> > options, i.e no filtering. As soon as we start perf, the
> > lbr_dump_on_exception global is unset and LBR start/stop are bypassed.
> >
> > LBR filtering is reset during perf stop.
>
> So while I can see how this could be useful there are a few things which need
> more thought:
>
> 1) We want to enable/disable this at boot time.
>
> In the disabled case we might also stub out the test/jz and replace
> it by an unconditional jump, but that needs more thought.
I can add a cmdline option to disable it at boot time.
Do you propose to use code instruction patching (same as ftrace) also?
Is-it really worth to bypass test/jz as page fault handling is much more than few instructions?
> 2) Right now you stop the trace on every exception no matter whether
> it comes from user or kernel space.
>
> Stopping the trace when we handle a user space fault does not make
> any sense and inflicts just pointless overhead.
>
> Aside of that if the fault handler then crashes we do not have the
> LBR information because we froze it when entering from user space
> in the first place.
Agree, but the LBR buffer contains only 8 records: we have to stop it as soon as possible.
If we add some test/jump/call before stopping it, relevant info will be flushed out.
Thanks.
> Thanks,
>
> tglx
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/