Re: [PATCH] Handle Ice Lake MONITOR erratum

From: Jim Mattson

Date: Wed May 27 2026 - 23:08:32 EST


On Mon, Apr 21, 2025 at 12:22:05PM -0700, Dave Hansen wrote:
> Andrew Cooper reported some boot issues on Ice Lake servers when
> running Xen that he tracked down to MWAIT not waking up. Do the safe
> thing and consider them buggy since there's a published erratum.
> Note: I've seen no reports of this occurring on Linux.
>
> Add Ice Lake servers to the list of shaky MONITOR implementations with
> no workaround available. Also, before the if() gets too unwieldy, move
> it over to a x86_cpu_id array. Additionally, add a comment to the
> X86_BUG_MONITOR consumption site to make it clear how and why affected
> CPUs get IPIs to wake them up.
>
> There is no equivalent erratum for the "Xeon D" Ice Lakes so
> INTEL_ICELAKE_D is not affected.
>
> The erratum is called ICX143 in the "3rd Gen Intel Xeon Scalable
> Processors, Codename Ice Lake Specification Update". It is Intel
> document 637780, currently available here:
>
> https://cdrdv2.intel.com/v1/dl/getContent/637780

The erratum says, "Due to this erratum, the processor may hang."

We are seeing some Ice Lake Xeon E5 machines panic due to hard lockups, and
then the kdump kernel dies with "Fatal machine check from unknown source."
Is this behavior consistent with this erratum?

This only seems to happen on Cloud machines, but we always intercept
MONITOR and MWAIT on Ice Lake hosts, so guests never execute them
natively. I'm not sure why virtualization would be a factor.

Thanks,

--jim