Re: [PATCH] x86/mce: Distribute the clear operation of mces_seen to Per-CPU rather than only monarch CPU

From: Chen Yucong
Date: Mon May 19 2014 - 03:56:43 EST


On Mon, 2014-05-19 at 09:26 +0200, Borislav Petkov wrote:
> On Mon, May 19, 2014 at 09:55:40AM +0800, Chen Yucong wrote:
> > But all other CPUs also have to wait for the monarch CPU to exit from
> > mce_end. What's the difference between the monarch CPU and each CPU
> > clearing its own mces_seen? In practice, there is none. If only the
> > monarch CPU clears mces_seen, the per-CPU variable loses its
> > advantage.
>
> I'll let you stare at mce_reign() a little bit longer... Also, pay
> attention to its callsite, that might help.
>
We can find the following code segment in mce_end:
-----
...
	if (order == 1) {
		/* CHECKME: Can this race with a parallel hotplug? */
		int cpus = num_online_cpus();

		/*
		 * Monarch: Wait for everyone to go through their scanning
		 * loops.
		 */
		while (atomic_read(&mce_executing) <= cpus) {
			if (mce_timed_out(&timeout))
				goto reset;
			ndelay(SPINUNIT);
		}

		mce_reign();
		barrier();
		ret = 0;
...
-----
What happens in the code above if the monarch CPU times out while
waiting?
mce_timed_out() returns nonzero and the monarch jumps straight to the
reset label, so mce_reign() is never called. The clearing of mces_seen
is therefore skipped, and the stale mces_seen values are still in place
when the next MCE arrives.
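
To make the scenario concrete, here is a small user-space model of it. This
is only a sketch, not kernel code: NCPUS, the integer mces_seen array and
every model_* helper are hypothetical stand-ins, and the only behaviour
assumed is the one described above (mce_reign does the clearing, and the
timeout path skips mce_reign).

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4			/* hypothetical CPU count for this model */

/* Stand-in for the per-CPU mces_seen slots; nonzero means "not cleared". */
static int mces_seen[NCPUS];

/* Every CPU records something during its scanning loop. */
static void model_record_all(int value)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		mces_seen[cpu] = value;
}

/* Simplified model of mce_reign(): the monarch clears every CPU's slot. */
static void model_mce_reign(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		mces_seen[cpu] = 0;
}

/* Model of the monarch branch in mce_end(): a timeout skips the reign step. */
static int model_mce_end_monarch(bool timed_out)
{
	if (timed_out)
		goto reset;

	model_mce_reign();
	return 0;

reset:
	return -1;
}

/* Model of the proposed change: each CPU clears only its own slot. */
static void model_clear_own(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	mces_seen[cpu] = 0;
}

/* Run the per-CPU clear once on every CPU. */
static void model_clear_all_per_cpu(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		model_clear_own(cpu);
}

/* Number of slots that still hold a value from a previous event. */
static int model_count_stale(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (mces_seen[cpu] != 0)
			stale++;
	return stale;
}
```

In this model, calling model_record_all() followed by
model_mce_end_monarch(true) leaves every slot stale, while running
model_clear_own() on each CPU clears them whether or not the monarch
timed out.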

thx!
cyc

