RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs
From: Luck, Tony
Date: Wed Jan 06 2021 - 19:27:07 EST
> Please see below for an updated patch.
Yes. That worked:
[ 78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946151] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946153] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
I guess more than one CPU hit the timeout, so your new message was printed twice
before the panic code took over?
Once again, the whole of socket 1 is reported missing, rather than just the pair of threads on one of the cores there.
But that's a useful improvement (eliminating the other three sockets on this system).
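
To make the size of the holdout set concrete, here is a minimal sketch that expands the CPU list from the log above and counts the entries. The helper name parse_cpulist is hypothetical and not part of the patch; it only assumes the usual comma-separated range notation seen in the message.

```python
def parse_cpulist(text):
    """Expand a range list such as "0-3,8" into a sorted list of CPU numbers.

    Hypothetical helper for illustration; assumes comma-separated
    single numbers or inclusive "low-high" ranges.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A bare number like "8" has no upper bound, so reuse the lower one.
        cpus.update(range(int(low), int(high or low) + 1))
    return sorted(cpus)


holdouts = parse_cpulist("24-47,120-143")
print(len(holdouts))  # 48 CPUs reported, far more than one pair of threads
```

The two ranges contain 24 CPUs each, so the message names 48 CPUs in total instead of the two threads that would ideally be identified.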
Tested-by: Tony Luck <tony.luck@xxxxxxxxx>
-Tony