Re: [PATCH 5/6] x86-mce: check if no_way_out applies before deciding not to clear MCE banks.

From: Havard Skinnemoen
Date: Wed Jul 09 2014 - 19:00:13 EST


On Wed, Jul 9, 2014 at 2:00 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> +	if (!(no_way_out && cfg->tolerant < 3))
> 		mce_clear_state(toclear);
>
> Style - I think this is easier to grok:
>
> 	if (!no_way_out || cfg->tolerant >= 3)
> 		mce_clear_state(toclear);
>
> but not too strongly if others like the !(a && b) form.

I tend to agree with you. It came up during our internal review, and
others argued the other way. But since I'm in charge now, I'll change
it back ;-)
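
The two spellings are the same condition by De Morgan's law. A minimal
standalone sketch that checks this over a handful of inputs follows. The
helper names and the plain bool/int parameters are made up for illustration;
they stand in for no_way_out and cfg->tolerant and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Form from the patch: !(a && b). Hypothetical helper, not a kernel function. */
static bool clear_negated_and(bool no_way_out, int tolerant)
{
	return !(no_way_out && tolerant < 3);
}

/* Form suggested in review: !a || !b, with !(tolerant < 3) written as >= 3. */
static bool clear_or_form(bool no_way_out, int tolerant)
{
	return !no_way_out || tolerant >= 3;
}

/* Compare both forms for each no_way_out value and tolerant levels 0..3. */
static void check_forms_agree(void)
{
	for (int nwo = 0; nwo <= 1; nwo++)
		for (int tolerant = 0; tolerant <= 3; tolerant++)
			assert(clear_negated_and(nwo, tolerant) ==
			       clear_or_form(nwo, tolerant));
}
```

Calling check_forms_agree() trips no assertion, so the choice between the
two forms is purely about readability.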

> I'm never sure how to treat the crazy levels of "tolerant" though. Do
> we really want to clear the banks? In one sense we do ... we are still
> running and might see more UC errors. Since newer UC errors don't
> overwrite older ones, clearing the banks allows us to see how many
> errors are piling up and being ignored.
>
> But running with tolerant==3 is likely to end in tears ... should we erase
> the evidence on what bad things happened?

It probably doesn't make a huge difference since you're not supposed
to run with tolerant=3, but I kind of understood the logic to be that
if we're going to keep running, we need to clear the banks, and if
we're going to crash, we need to leave them intact so whatever runs
next gets a chance to look at them. So with tolerant==3, we are going
to continue running, and I think for debugging purposes, it's useful
to see how many additional errors are happening.
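
The reasoning above can be sketched as a small model. This is illustrative
only, not the kernel code: the enum and function names are hypothetical, and
it assumes the machine goes down only when no_way_out is set and tolerant is
below 3.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum bank_policy {
	LEAVE_BANKS_INTACT,	/* about to crash: keep evidence for whatever runs next */
	CLEAR_BANKS,		/* still running: clear so later errors can be seen */
};

static enum bank_policy choose_bank_policy(bool no_way_out, int tolerant)
{
	/* Assumption: we only crash for a fatal error with tolerant < 3. */
	bool will_crash = no_way_out && tolerant < 3;

	return will_crash ? LEAVE_BANKS_INTACT : CLEAR_BANKS;
}
```

Under that assumption, no_way_out with tolerant == 3 yields CLEAR_BANKS,
which is the case being argued for here.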

Havard