RE: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler

From: Ghannam, Yazen
Date: Thu Aug 16 2018 - 15:00:52 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Ghannam, Yazen
> Sent: Thursday, August 9, 2018 1:18 PM
> To: Borislav Petkov <bp@xxxxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: RE: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR
> interrupt handler
>
> > -----Original Message-----
> > From: Borislav Petkov <bp@xxxxxxxxx>
> > Sent: Thursday, August 9, 2018 11:16 AM
> > To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> > Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> > Subject: Re: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR
> > interrupt handler
> >
> > On Thu, Aug 09, 2018 at 09:08:33AM -0500, Yazen Ghannam wrote:
> > > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> > >
> > > If threshold_init_device() fails then per_cpu(threshold_banks) will be
> > > deallocated. The thresholding interrupt handler will still be active, so
> >
> > So fix the code so that *that* doesn't happen instead of adding checks
> > to the interrupt handler.
> >
> > I.e.,
> >
> > if (err) {
> > mce_threshold_vector = default_threshold_interrupt;
> > return err;
> > }
> >
>
> Okay. I'll make that change.
>

I don't think this is enough. We have a gap between when the interrupt
handler is set up during boot in __mcheck_cpu_init_vendor() and when all
the data structures are created during threshold_init_device().

So I think we should keep the NULL pointer checks for now to keep this fix
small. I can make a new patch following your suggestion above.

We can change the code so that we create the data structures during the
earlier init process, but I think this will be a much bigger change. This could
fall under the idea of decoupling the handling code from sysfs.

Thanks,
Yazen