Re: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll
From: Chen Yucong
Date: Mon Oct 27 2014 - 22:22:15 EST
On Mon, 2014-10-27 at 23:10 +0000, Luck, Tony wrote:
> + m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
> + severity = mce_severity(m, mca_cfg.tolerant, NULL);
>
> This seems a big hack to make mce_severity() work when called from
> CMCI context (when MCG_STATUS register is not set). It would also
> be confusing as the subsequent logged entries would show MCIP and RIPV
> bits set in the mcg_status.
>
In fact, I noticed this issue from the start. But the Intel SDM
documents that MCIP/RIPV/EIPV are specific to machine check
exceptions, and I don't know whether error log/decode handlers check
these flag bits in CMCI context.
> If someone can think of a less hacky way to do this, that would be good. Otherwise
> the code needs a comment, and should reset m->mcg_status to avoid making logs
> that have incorrect data.
>
Yes! The above code snippet should be commented. Another approach,
which restores m->mcgstatus after the call, is shown below.
+ u8 mcgs = m->mcgstatus & 0xff;
+
+ m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
+ severity = mce_severity(m, mca_cfg.tolerant, NULL);
+ m->mcgstatus = (m->mcgstatus & ~0xff) | mcgs;
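The save/set/restore idea can be tried outside the kernel with a minimal
sketch. Everything named here is an assumption for illustration:
struct mce_sketch stands in for struct mce, severity_stub() stands in for
mce_severity(), and graded_in_poll_context() is a hypothetical helper. Only
the three bit positions are taken from the IA32_MCG_STATUS layout. Unlike
the snippet above, the sketch saves the whole 64-bit value rather than the
low byte, which is a variation and not what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)
#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical stand-in for struct mce; only one field is modelled. */
struct mce_sketch {
	uint64_t mcgstatus;
};

/*
 * Hypothetical stand-in for mce_severity(): returns 1 only when both
 * MCIP and RIPV are set, 0 otherwise.
 */
static int severity_stub(const struct mce_sketch *m)
{
	uint64_t need = MCG_STATUS_MCIP | MCG_STATUS_RIPV;

	return (m->mcgstatus & need) == need;
}

/*
 * Set MCIP and RIPV only for the duration of the grading call, then put
 * the original value back so that later logging sees unmodified data.
 */
static int graded_in_poll_context(struct mce_sketch *m)
{
	uint64_t saved = m->mcgstatus;
	int severity;

	m->mcgstatus |= MCG_STATUS_MCIP | MCG_STATUS_RIPV;
	severity = severity_stub(m);
	m->mcgstatus = saved;

	return severity;
}

/* Usage: a record polled with a zero MCG_STATUS is graded, then restored. */
static void example(void)
{
	struct mce_sketch m = { .mcgstatus = 0 };

	assert(graded_in_poll_context(&m) == 1);
	assert(m.mcgstatus == 0);
}
```

Saving the full value avoids any question about which bits the temporary
OR touched, at the cost of a wider temporary.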
thx!
cyc