Re: [PATCH 4/5] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU

From: Robert Richter
Date: Wed Oct 20 2010 - 06:03:38 EST


On 19.10.10 20:23:12, Huang Ying wrote:
> On Wed, 2010-10-20 at 02:37 +0800, Don Zickus wrote:
> > On Tue, Oct 19, 2010 at 06:25:07PM +0200, Robert Richter wrote:
> > > On 19.10.10 17:07:01, Robert Richter wrote:
> > > > On 15.10.10 22:22:17, Don Zickus wrote:
> > > > > From: Huang Ying <ying.huang@xxxxxxxxx>
> > > > >
> > > > > In the original NMI handler, the NMI reason io port (0x61) is only
> > > > > processed on the BSP. This makes it impossible to hot-remove the BSP.
> > > > > To solve the issue, a raw spinlock is used so that the port can be
> > > > > processed on any CPU.
> > > > >
> > > > > Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
> > > > > Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
> > > > > ---
> > > > > arch/x86/kernel/traps.c | 45 +++++++++++++++++++++++++--------------------
> > > > > 1 files changed, 25 insertions(+), 20 deletions(-)
> > >
> > > > > @@ -400,28 +405,28 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> > > > > return;
> > > > >
> > > > > /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> > > > > - cpu = smp_processor_id();
> > > > > - /* Only the BSP gets external NMIs from the system. */
> > > > > - if (!cpu) {
> > > > > - reason = get_nmi_reason();
> > > > > - if (reason & NMI_REASON_MASK) {
> > > > > - if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
> > > > > - == NOTIFY_STOP)
> > > > > - return;
> > > > > - if (reason & NMI_REASON_SERR)
> > > > > - pci_serr_error(reason, regs);
> > > > > - else if (reason & NMI_REASON_IOCHK)
> > > > > - io_check_error(reason, regs);
> > > > > + raw_spin_lock(&nmi_reason_lock);
> > > >
> > > > What about using raw_spin_trylock() instead? We don't have to wait
> > > > here since we are already processing it by another cpu.
> > >
> > > This would avoid a global lock and also deadlocking in case of a
> > > potential #gp in the nmi handler.
> >
> > I would feel more comfortable with it too. I can't find a reason where
> > trylock would do harm.
>
> One possible issue can be as follows:
>
> - PCI SERR NMI raised on CPU 0
> - IOCHK NMI raised on CPU 1
>
> If we use try lock, we may get unknown NMI on one CPU. Do you guys think
> so?

This could be a valid point. On the other hand, the former
implementation, which let only cpu #0 handle i/o interrupts, didn't
trigger unknown nmis, so trylock wouldn't change much compared to
that. To be sure, we might return NOTIFY_STOP in the unknown path if
we don't get the lock.
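
To make the idea concrete, here is a minimal user-space sketch. It is
not kernel code and not the patch itself. It assumes a C11 atomic_flag
as a stand-in for raw_spin_trylock(); handle_external_nmi(),
reason_trylock(), reason_unlock() and the enum values are hypothetical
names used only for illustration. The reason bits are modelled on
port 0x61 (bit 7 for SERR, bit 6 for IOCHK).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Reason bits modelled on port 0x61 (assumption for this sketch). */
#define NMI_REASON_SERR   0x80
#define NMI_REASON_IOCHK  0x40
#define NMI_REASON_MASK   (NMI_REASON_SERR | NMI_REASON_IOCHK)

/* Hypothetical outcomes; NMI_SKIPPED_BUSY plays the role of NOTIFY_STOP. */
enum nmi_result { NMI_HANDLED, NMI_SKIPPED_BUSY, NMI_UNKNOWN };

/* User-space stand-in for the raw spinlock guarding the reason port. */
atomic_flag nmi_reason_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
bool reason_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&nmi_reason_lock,
						   memory_order_acquire);
}

void reason_unlock(void)
{
	atomic_flag_clear_explicit(&nmi_reason_lock, memory_order_release);
}

/*
 * 'reason' is the value that would have been read from the port.
 * If another CPU already holds the lock, do not wait and do not
 * report an unknown NMI; just stop processing on this CPU.
 */
enum nmi_result handle_external_nmi(unsigned char reason)
{
	enum nmi_result res;

	if (!reason_trylock())
		return NMI_SKIPPED_BUSY;

	/* SERR or IOCHK would be dispatched here in a real handler. */
	res = (reason & NMI_REASON_MASK) ? NMI_HANDLED : NMI_UNKNOWN;

	reason_unlock();
	return res;
}
```

A cpu that loses the race returns NMI_SKIPPED_BUSY instead of falling
through to the unknown nmi path. Whether that is acceptable for the
SERR-on-one-cpu, IOCHK-on-another case above is exactly the open
question.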

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
