RE: [PATCH 5/5] mce: recover from "action required" errors reported in data path in usermode

From: Luck, Tony
Date: Thu Sep 08 2011 - 01:16:26 EST


> __memory_failure() handling calls some routines, such
> as is_free_buddy_page(), which needs to acquire the spin
> lock, zone->lock. How can we guarantee that other CPUs
> haven't acquired the lock when receiving #mc broadcast
> and entering #mc handlers ?

By the time I call __memory_failure(), the other CPUs have
been released from the MCE handler, so they are back executing
normal code.

But Chen Gong's earlier comments made me look again at the entry_64.S
code, and I realized that I had missed the code in the return
path from do_machine_check() that switches from the MCE stack to
the regular kernel stack before processing TIF_MCE_NOTIFY.

I may go back and re-visit a path that I looked at to change
do_machine_check from "void" return to "unsigned long" and have
it return the address for the "AR" case and "0" otherwise.
Then we could switch out of machine check stack to non-mce
context to call __memory_failure(). When I looked at this
before, the entry_64.S path looked plausible. The 32-bit
path looked painful (too many macros in entry_32.S).
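
The control flow described above can be sketched in user-space C.
This is only an illustration of the idea, not the patch: every
mock_* name and the severity enum are hypothetical, the stack
switch is reduced to a comment, and the real __memory_failure()
takes different arguments than the mock does.

```c
#include <stdio.h>

/* Hypothetical stand-ins: none of these are real kernel symbols. */
enum mock_severity { MOCK_SEV_NONE, MOCK_SEV_AR };

/* Records the last address the mock recovery routine was given. */
static unsigned long mock_last_handled;

/*
 * Mock of do_machine_check() changed from "void" to "unsigned long":
 * return the faulting address for the "AR" case and 0 otherwise.
 */
static unsigned long mock_do_machine_check(enum mock_severity sev,
                                           unsigned long addr)
{
    return (sev == MOCK_SEV_AR) ? addr : 0;
}

/* Mock of __memory_failure(); assumed to run in non-MCE context. */
static void mock_memory_failure(unsigned long addr)
{
    mock_last_handled = addr;
}

/*
 * Mock of the entry path. Returns 1 if recovery was attempted,
 * 0 if the handler reported nothing to recover.
 */
static int mock_mce_entry(enum mock_severity sev, unsigned long addr)
{
    unsigned long ar_addr = mock_do_machine_check(sev, addr);

    /*
     * In the real entry code, this is where the switch from the
     * machine check stack to the regular kernel stack would happen.
     */
    if (ar_addr == 0)
        return 0;

    mock_memory_failure(ar_addr);
    return 1;
}
```

The point of returning the address is that the handler itself
never calls the recovery routine; the caller decides, after it has
left the machine check stack, whether there is anything to recover.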


-Tony