Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
From: Jerry Hoemann
Date: Wed Feb 14 2018 - 13:13:51 EST
Ingo,
I have a patch set under review that brings hpwdt into compliance
with the watchdog core.
One of the changes removes the callback into firmware in hpwdt_pretimeout
and its associated spinlock.
https://lkml.org/lkml/2018/2/12/30
I will add you to the CC list of the next version of the set.
Jerry
On Wed, Feb 14, 2018 at 10:44:05AM +0100, Borislav Petkov wrote:
> + Jerry
>
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> >
> > * Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> >
> > > Dave Hansen and I had some discussions on how to handle the nested NMI and
> > > firmware calls. We thought of using a per cpu counter to record the nesting
> > > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.
> > > Will this change be sufficient?
> >
> > Yeah, so I think the first question should be: does the firmware call from NMI
> > context make sense to begin with?
> >
> > Because in this particular case it does not appear to be so: the reason for the
> > BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
> > an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
> > panic message or with a slightly more informative panic message.
> >
> > That's not a real value-add to users - so we can avoid all these complications by
> > applying the patch below:
> >
> > drivers/watchdog/hpwdt.c | 30 ++++--------------------------
> > 1 file changed, 4 insertions(+), 26 deletions(-)
> >
> > As a bonus the spinlock use can be removed as well.
> >
> > Thanks,
> >
> > Ingo
> >
> > ====================>
> > From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@xxxxxxxxxx>
> > Date: Wed, 14 Feb 2018 10:24:41 +0100
> > Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
> > NMI context
> >
> > Taking a spinlock and calling into the firmware are problematic things to
> > do from NMI callbacks.
> >
> > It also seems completely pointless in this particular case:
> >
> > - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
> > callback, but there's almost no use for it: we use it only to determine
> > whether to panic with an 'unknown NMI' or a slightly more descriptive
> > message.
> >
> > - cmn_regs and rom_lock is not used anywhere else, other than early detection
> > of the firmware capability.
> >
> > So remove rom_lock, make the cmn_regs call from the detection routine only,
> > and thus make the NMI callback a lot more robust.
> >
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> > ---
> > drivers/watchdog/hpwdt.c | 30 ++++--------------------------
> > 1 file changed, 4 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index f1f00dfc0e68..2544fe482fe3 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
> > static unsigned int allow_kdump = 1;
> > static unsigned int is_icru;
> > static unsigned int is_uefi;
> > -static DEFINE_SPINLOCK(rom_lock);
> > static void *cru_rom_addr;
> > -static struct cmn_registers cmn_regs;
> >
> > extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
> > unsigned long *pRomEntry);
> > @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
> > unsigned long physical_bios_base = 0;
> > unsigned long physical_bios_offset = 0;
> > int retval = -ENODEV;
> > + struct cmn_registers cmn_regs = { };
> >
> > bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
> >
> > @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
> > */
> > static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> > {
> > - unsigned long rom_pl;
> > - static int die_nmi_called;
> > -
> > if (!hpwdt_nmi_decoding)
> > return NMI_DONE;
> >
> > if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
> > return NMI_DONE;
> >
> > - spin_lock_irqsave(&rom_lock, rom_pl);
> > - if (!die_nmi_called && !is_icru && !is_uefi)
> > - asminline_call(&cmn_regs, cru_rom_addr);
> > - die_nmi_called = 1;
> > - spin_unlock_irqrestore(&rom_lock, rom_pl);
> > -
> > if (allow_kdump)
> > hpwdt_stop();
> >
> > - if (!is_icru && !is_uefi) {
> > - if (cmn_regs.u1.ral == 0) {
> > - nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
> > - return NMI_HANDLED;
> > - }
> > - }
> > - nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > - "for the NMI is logged in any one of the following "
> > + nmi_panic(regs,
> > + "An NMI occurred. Depending on your system the reason "
> > + "for the NMI might be logged in any one of the following "
> > "resources:\n"
> > "1. Integrated Management Log (IML)\n"
> > "2. OA Syslog\n"
> > @@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
> > HPWDT_ARCH);
> > return retval;
> > }
> > -
> > - /*
> > - * We know this is the only CRU call we need to make so lets keep as
> > - * few instructions as possible once the NMI comes in.
> > - */
> > - cmn_regs.u1.rah = 0x0D;
> > - cmn_regs.u1.ral = 0x02;
> > }
> >
> > /*
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------