RE: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

From: Liu, Chuansheng
Date: Wed Oct 16 2013 - 20:29:14 EST




> -----Original Message-----
> From: Ingo Molnar [mailto:mingo.kernel.org@xxxxxxxxx] On Behalf Of Ingo
> Molnar
> Sent: Wednesday, October 16, 2013 8:51 PM
> To: Steven Rostedt
> Cc: LKML; Thomas Gleixner; H. Peter Anvin; Frederic Weisbecker; Andrew
> Morton; paulmck@xxxxxxxxxxxxxxxxxx; Peter Zijlstra; x86@xxxxxxxxxx; Wang,
> Xiaoming; Li, Zhuangzhi; Liu, Chuansheng
> Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault
>
>
> * Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > On Wed, 16 Oct 2013 08:11:18 +0200
> > Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > >
> > > * Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > > are taken in that code path, and the software now handles nested NMIs
> > > > when the fault re-enables NMIs on iretq.
> > > >
> > > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > > the variable to make it happen "once").
> > > >
> > > > Reported-by: "Liu, Chuansheng" <chuansheng.liu@xxxxxxxxx>
> > > > Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > >
> > > Would be nice to see the warning quoted that triggered this.
> >
> > Sure, want me to add this to the change log?
>
> Yeah, that would be helpful - but only the stack trace portion I suspect,
> to make it clear what caused the fault.
>
> The one posted in the thread shows:
>
> [ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
> [ 17.153926] [<c2823066>] error_code+0x5a/0x60
> [ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> [ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
> [ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
> [ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
> [ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> [ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 17.208845] [<c205bdf5>] ? up+0x25/0x40
> [ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
> [ 17.218998] [<c2818236>] ? printk+0x38/0x3a
> [ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
> [ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
> [ 17.233931] [<c2818236>] ? printk+0x38/0x3a
> [ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> [ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
> [ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0
>
> So kallsyms_lookup() faulted, while the NMI watchdog triggered a
> show_regs()? How is that possible?
It is not the NMI watchdog that triggers show_regs() here. When we call
arch_trigger_all_cpu_backtrace(), the NMI handler
arch_trigger_all_cpu_backtrace_handler() calls show_regs().
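
As a side note on the infinite-loop concern in the quoted changelog (the
WARN() running before the "once" variable is updated), here is a rough
user-space sketch of that ordering problem. It is not kernel code: every
name in it (fake_warn, fault_warn_then_set, fault_set_then_warn,
count_warnings, MAX_DEPTH) is made up for illustration, and the depth cap
exists only so the demo terminates where the real case would keep recursing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8	/* hypothetical cap so the demo terminates */

static int warn_calls;	/* how many times the warning body ran */
static int depth;	/* current re-entry depth */

/*
 * Hypothetical stand-in for the warning path. To model "the warning
 * itself takes another fault", it re-enters the fault path it came from.
 */
static void fake_warn(void (*refault)(void))
{
	assert(depth <= MAX_DEPTH);
	warn_calls++;
	if (depth < MAX_DEPTH) {
		depth++;
		refault();
		depth--;
	}
}

/* Ordering described in the changelog: warn first, then set the flag. */
static bool warned_a;
static void fault_warn_then_set(void)
{
	if (!warned_a) {
		fake_warn(fault_warn_then_set);	/* flag still false on re-entry */
		warned_a = true;
	}
}

/* Alternative ordering: set the flag first, then warn. */
static bool warned_b;
static void fault_set_then_warn(void)
{
	if (!warned_b) {
		warned_b = true;
		fake_warn(fault_set_then_warn);	/* re-entry sees the flag set */
	}
}

/* Run one simulated fault and report how often the warning fired. */
static int count_warnings(void (*fault)(void))
{
	warn_calls = 0;
	depth = 0;
	fault();
	return warn_calls;
}
```

With the warn-then-set ordering, the warning fires on every re-entry until
the artificial cap stops it. With the set-then-warn ordering, it fires
exactly once.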

>
> Thanks,
>
> Ingo