Re: x86/mce/therm_throt incorrect THERM_STATUS_CLEAR_CORE_MASK?

From: Arnd Bergmann
Date: Thu Jun 02 2022 - 12:18:31 EST


On Thu, Jun 2, 2022 at 5:52 PM srinivas pandruvada
<srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, 2022-06-02 at 11:19 +0200, Arnd Bergmann wrote:
> > I have a Xeon W-2265 (family 6, model 85, stepping 7) that started
> > constantly spewing messages from the therm_throt driver after one
> > core overheated:
> >
> I think this is a Cascade Lake system. Have you tried the latest
> microcode?

Thanks for your quick reply. I have now installed the latest microcode,
0x5003302 (manually, because the distro package still shipped version
0x5003102).
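If you want to confirm which revision the kernel actually loaded, here is a minimal sketch. It assumes the x86 /proc/cpuinfo layout, where each CPU reports a line such as "microcode : 0x5003302". The function name is hypothetical.

```python
import re

# Assumes the x86 /proc/cpuinfo layout, e.g. "microcode\t: 0x5003302".
_MICROCODE_RE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the distinct microcode revisions listed in cpuinfo text."""
    return {int(m.group(1), 16) for m in _MICROCODE_RE.finditer(cpuinfo_text)}
```

Pass it the contents of /proc/cpuinfo. A single-element result means every CPU reports the same revision, and after the update that revision should be 0x5003302 rather than 0x5003102.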

After that, I tried writing the value 0x2a80 from userspace, and
it no longer caused a trap, so I assume the microcode update fixed it.
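Here is a minimal sketch of trying such a write from userspace through the Linux msr driver. In /dev/cpu/N/msr, the file offset selects the MSR number. The sketch assumes the target is IA32_THERM_STATUS (MSR 0x19c). The thread subject suggests that register, but this message does not name it. The helper names are hypothetical. For 0x2a80, the set bits are 7, 9, 11 and 13.

```python
import errno
import os
import struct

# Assumption: the register under discussion is IA32_THERM_STATUS (MSR 0x19c);
# this message does not name it explicitly.
IA32_THERM_STATUS = 0x19C


def set_bits(value: int) -> list:
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def write_msr(cpu: int, msr: int, value: int) -> bool:
    """Write a 64-bit value to an MSR via /dev/cpu/<cpu>/msr.

    Needs root and the msr module. The msr driver reports a faulting
    WRMSR as EIO, which is mapped to False here; other errors propagate.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        # x86 MSRs are 64-bit little-endian; the file offset is the MSR number.
        os.pwrite(fd, struct.pack("<Q", value), msr)
        return True
    except OSError as e:
        if e.errno == errno.EIO:
            return False
        raise
    finally:
        os.close(fd)
```

As root, for example, write_msr(0, IA32_THERM_STATUS, 0x2a80) returns False if the CPU rejects the value with a trap.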

It's hard to be sure, as the system has only run into the broken
state twice in its lifetime, and now it's fine. I'll reply here if the
problem ever comes back with the new microcode.

Thanks a lot!

Arnd