RE: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0

From: Ghannam, Yazen
Date: Thu Jun 29 2017 - 13:58:23 EST


> -----Original Message-----
> From: themoken@xxxxxxxxx [mailto:themoken@xxxxxxxxx] On Behalf Of
> Jack Miller
> Sent: Thursday, June 29, 2017 12:23 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: Jack Miller <jack@xxxxxxxxxxx>; Borislav Petkov <bp@xxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
> thread 0
>
> On Wed, Jun 28, 2017 at 1:58 PM, Ghannam, Yazen
> <Yazen.Ghannam@xxxxxxx> wrote:
> >> With my patch applied, I see entries like l3_cache under hardware
> >> thread 0's directory (it's shifted to CPU 1, so machinecheck1).
> >> Without my patch, only machinecheck0 has anything interesting in it
> >> (insn_fetch, l2_cache etc.) because the init failed on CPU 1.
> >>
> >
> > What happens with SMT off?
>
> I haven't been able to test with SMT off (since it's apparent that 'nosmt'
> doesn't really do anything and I don't locally have a firmware option to turn it
> off).
>
> First things first though, like Boris I'd like to know if there's a better way to
> detect this master thread, other than by APIC ID. Right now I'm working on a
> v2 that will remove the CPU check, let each one perform the rdmsr and only
> update empty bank info. I believe this call is being serialized elsewhere (need
> to check), but if I could keep this patch to a one-liner by detecting the right
> thread, I'd like to.
>

There is a master thread per Die, so a multi-Die system will have multiple master
threads. In that case, we would still need to decide which master thread populates
the array.
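
For reference, the v2 idea quoted above (drop the CPU check, let every CPU do
the read, and only update empty bank info) can be sketched as a small userspace
model. This is only an illustration, not kernel code and not a proposed patch:
bank_types[], fake_read_bank_type() and init_banks_on_cpu() are hypothetical
names, the bank/CPU counts are made up, and the "only CPU 1 sees the L3 bank"
behavior is an assumption chosen to mirror the report in this thread.

```c
#define NR_BANKS 4	/* hypothetical bank count, for illustration only */

enum bank_type {
	BANK_EMPTY = 0,
	BANK_INSN_FETCH,
	BANK_L2_CACHE,
	BANK_L3_CACHE,
};

/* Shared table standing in for the cached per-bank info (hypothetical). */
static enum bank_type bank_types[NR_BANKS];

/*
 * Stand-in for the per-CPU rdmsr. ASSUMPTION: only CPU 1 can identify
 * the L3 bank, mirroring the "shifted to CPU 1" case reported above.
 */
static enum bank_type fake_read_bank_type(unsigned int cpu, unsigned int bank)
{
	switch (bank) {
	case 0:
		return BANK_INSN_FETCH;
	case 1:
		return BANK_L2_CACHE;
	case 2:
		return cpu == 1 ? BANK_L3_CACHE : BANK_EMPTY;
	default:
		return BANK_EMPTY;
	}
}

/*
 * Run on every CPU, with no check for a specific thread. Entries that an
 * earlier CPU already filled are left alone; only empty ones are updated.
 * Returns the number of entries this CPU filled in.
 */
static unsigned int init_banks_on_cpu(unsigned int cpu)
{
	unsigned int bank, filled = 0;

	for (bank = 0; bank < NR_BANKS; bank++) {
		enum bank_type type;

		if (bank_types[bank] != BANK_EMPTY)
			continue;	/* already populated by another CPU */

		type = fake_read_bank_type(cpu, bank);
		if (type == BANK_EMPTY)
			continue;	/* this CPU cannot identify the bank */

		bank_types[bank] = type;
		filled++;
	}

	return filled;
}
```

In this model the result does not depend on which CPU runs first, but it does
assume the calls are serialized; the sketch has no locking of its own.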

I have a solution that doesn't rely on using a specific thread. I'll send it up shortly.

Thanks,
Yazen