Re: [Patch-next] Remove notify_die in do_machine_check function

From: Andi Kleen
Date: Thu May 27 2010 - 02:57:59 EST


Hidetoshi Seto wrote:
(2010/05/27 12:21), Huang Ying wrote:
I have heard that on some machines, a hardware-error output pin of the
chipset may be wired to a CPU input pin that can raise an MCE. That is,
MCE is used to report some chipset errors too. I think that is why
notify_die is called in do_machine_check. Simply removing notify_die
is not good for these machines.

Hmm, it sounds like "notify_die here is a hook for a proprietary chipset
driver". Does anyone actually have such a machine and driver?

No, the die hook was there for compatibility with the old KDB patchkit,
which hooked into MCE too.

The problems are (1) many callbacks will behave wrongly, since they are
not aware that a DIE_NMI event can be posted from a machine check, and
(2) if the machine is not such special hardware, it is just a waste of
time in a critical context where quick page-poisoning might be required.

Yes, the best action is probably to just remove it right now.

One quick alternative is to define "DIE_MCE" and use it instead. But
if a special hook like this is really required, I suppose we should
invent a dedicated interface for external plug-ins, such as a chipset's
LLHEH (low-level hardware error handler), to allow additional
platform-specific error handling in a critical context.

I don't think we need or want that.

-Andi

