Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.

From: Luck, Tony
Date: Mon Dec 07 2015 - 18:26:37 EST


On Mon, Dec 07, 2015 at 11:34:27PM +0100, Borislav Petkov wrote:
> BIOS is doing funny cores enumeration:
>
> node #0, CPUs 0-7
> node #1, CPUs 8-15
> node #2, CPUs 16-23
> node #3, CPUs 24-31
>
> and then starts from node 0 again:
>
> .... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
> .... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
> .... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
> .... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63

That's normal. BIOS writers are encouraged to list the
hyperthread 0 CPU of every core first, and then add the hyperthread
1 CPUs later in the table. That way an OS that boots fewer than
all the CPUs will get the maximum number of real cores into
play.
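
As a rough illustration of that ordering, here is a small sketch that maps
a logical CPU number back to its node, core and hyperthread. It assumes the
layout from the boot log quoted above (4 nodes, 8 cores per node, 2 threads
per core) and that CPU n and CPU n+32 are siblings on the same core, which
the log itself does not prove. The name locate_cpu is made up for this
example and is not a kernel interface.

```python
# Assumed topology, taken from the quoted boot log (not read from hardware).
NODES = 4
CORES_PER_NODE = 8
THREADS_PER_CORE = 2


def locate_cpu(cpu):
    """Return (node, core_in_node, thread) for a logical CPU number.

    Assumes the BIOS lists hyperthread 0 of every core first,
    then hyperthread 1 of every core, in the same core order.
    """
    cores_total = NODES * CORES_PER_NODE
    if not 0 <= cpu < cores_total * THREADS_PER_CORE:
        raise ValueError("CPU number out of range for the assumed topology")
    # The first block of cores_total CPUs is thread 0, the next block is thread 1.
    thread, core = divmod(cpu, cores_total)
    node, core_in_node = divmod(core, CORES_PER_NODE)
    return node, core_in_node, thread


def distinct_cores(n_cpus):
    """Count the physical cores covered when only the first n_cpus are booted."""
    return len({locate_cpu(cpu)[:2] for cpu in range(n_cpus)})
```

Under these assumptions, booting only the first 32 CPUs covers 32 distinct
cores, whereas an enumeration that listed both siblings of a core back to
back would cover only 16.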

> 0x00000010 Memory Uncorrectable non-fatal
>
> it generates an MCE only on the node 0 cores. For that log see the end
> of this mail. The gist of it is that the CPUs on which #MC gets raised
> are the cores on node 0, i.e., 0-7 and 32-39.

I think all the threads on all the sockets must have shown up
in the machine check handler, but only the ones on socket 0
printed anything. They can all see the error in bank 5, which is
shared across the socket, whereas CPUs 8-15 etc. see no errors
in any bank and so stay silent.
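
A toy model of that behaviour, purely as a sketch: every CPU enters the
handler, but a CPU only logs if one of the banks it can see holds an error.
This is not the kernel's MCE code. The function name, the treatment of
node and socket as the same thing, and the 4x8x2 topology are assumptions
made for this example.

```python
def cpus_that_log(error_socket, error_bank,
                  sockets=4, cores_per_socket=8, threads_per_core=2):
    """Return the logical CPUs that would print something in this toy model.

    Assumes hyperthread 0 CPUs are numbered first, then hyperthread 1 CPUs,
    and that error_bank is shared by all threads on error_socket only.
    """
    cores_total = sockets * cores_per_socket
    reporting = []
    # Every CPU shows up in the handler (the rendezvous) ...
    for cpu in range(cores_total * threads_per_core):
        socket = (cpu % cores_total) // cores_per_socket
        # ... but only CPUs on the affected socket find the error in a bank.
        banks_with_error = {error_bank} if socket == error_socket else set()
        if banks_with_error:
            reporting.append(cpu)
    return reporting
```

With an error in bank 5 on socket 0, this model reports CPUs 0-7 and 32-39,
which matches the set of CPUs in the log Boris describes.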

-Tony