(2010/05/27 12:21), Huang Ying wrote:
> I have heard about that on some machine, some hardware error output pin
> of chipset may be linked with some input pin of CPU which can cause MCE.
> That is, MCE is used to report some chipset errors too. I think that is
> why notify_die is called in do_machine_check. Simply removing notify_die
> is not good for these machines.

Hmm, it sounds like "notify_die here is a hook for a proprietary chipset
driver". Does anyone actually have such a machine and driver?

The problems are (1) many callbacks will behave incorrectly because they
are not aware that a DIE_NMI event can be posted from the machine check
handler, and (2) on machines without such special hardware it is just a
waste of time in a critical context where quick page poisoning might be
required.

One quick alternative is to define "DIE_MCE" and use it instead, but
if a special hook like this is really required, I suppose we should
invent a dedicated interface for external plug-ins, such as a chipset's
LLHEH (low-level hardware error handler), to allow additional
platform-specific error handling in critical context.
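
To illustrate the "DIE_MCE" idea, here is a rough user-space sketch, not
kernel code. Everything prefixed with toy_/TOY_ is a hypothetical
stand-in that only mimics the shape of a notifier chain; it is not the
kernel's notify_die API, and TOY_DIE_MCE is the event proposed above,
not something assumed to exist already. The point is only that a
separate event code lets existing NMI callbacks ignore machine checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes; TOY_DIE_MCE models the proposed DIE_MCE. */
enum toy_die_val { TOY_DIE_NMI = 1, TOY_DIE_MCE };

/* Hypothetical return codes: keep walking the chain, or stop. */
enum toy_notify_ret { TOY_NOTIFY_DONE = 0, TOY_NOTIFY_STOP = 1 };

/* Simplified notifier entry: a callback plus a link to the next entry. */
struct toy_notifier {
	int (*call)(struct toy_notifier *nb, enum toy_die_val val, void *data);
	struct toy_notifier *next;
};

struct toy_notifier *toy_die_chain;	/* head of the chain */
int nmi_events_seen;			/* counted by the NMI callback */
int mce_events_seen;			/* counted by the chipset callback */

/* Push a notifier onto the head of the chain. */
void toy_register_die_notifier(struct toy_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = toy_die_chain;
	toy_die_chain = nb;
}

/* Walk the chain until some callback claims the event. */
int toy_notify_die(enum toy_die_val val, void *data)
{
	struct toy_notifier *nb;

	for (nb = toy_die_chain; nb != NULL; nb = nb->next)
		if (nb->call(nb, val, data) == TOY_NOTIFY_STOP)
			return TOY_NOTIFY_STOP;
	return TOY_NOTIFY_DONE;
}

/* An existing NMI user: with a distinct MCE code it never sees MCEs. */
int nmi_callback(struct toy_notifier *nb, enum toy_die_val val, void *data)
{
	(void)nb;
	(void)data;
	if (val != TOY_DIE_NMI)
		return TOY_NOTIFY_DONE;
	nmi_events_seen++;
	return TOY_NOTIFY_STOP;
}

/* A chipset driver that opts in to machine check events only. */
int chipset_mce_callback(struct toy_notifier *nb, enum toy_die_val val,
			 void *data)
{
	(void)nb;
	(void)data;
	if (val != TOY_DIE_MCE)
		return TOY_NOTIFY_DONE;
	mce_events_seen++;
	return TOY_NOTIFY_STOP;
}
```

With both callbacks registered, posting TOY_DIE_MCE reaches only the
chipset callback, so the NMI callback cannot misinterpret it. Note that
this does not address problem (2): the chain is still walked in the
machine check path even when no chipset driver is registered.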