RE: [PATCH 5/6] x86, mce: handle "action required" errors

From: Luck, Tony
Date: Wed Dec 14 2011 - 14:05:43 EST


> > + if (!mce_find_info(&paddr))
> > + mce_panic("Lost address", NULL, NULL);
>
> Wouldn't it be good to return struct mce_info *mi here in addition to
> &paddr...

Great idea (actually "instead of" works better than "in addition to").

> so that you don't need to iterate again over the mce_info array but do:
>
> mce_clear_info(mi);

Just coded it - looks much better. Will send new version soon with
this change, and Ingo's suggestions incorporated.
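The refactor can be sketched in userspace C. This is a minimal sketch, not the patch itself: the table size `MCE_INFO_MAX`, the struct layout, and the integer `owner` field (standing in for the task that took the error) are assumptions. Returning the entry instead of filling in a `paddr` out-parameter lets the caller clear it without scanning the array a second time.

```c
#include <stddef.h>
#include <stdint.h>

#define MCE_INFO_MAX 16 /* hypothetical table size */

/* Hypothetical per-error record. A kernel version would key on the task
 * and use an atomic type for "inuse"; an int owner id stands in here. */
struct mce_info {
	int inuse;
	int owner;      /* stands in for the task that hit the error */
	uint64_t paddr; /* physical address of the poisoned memory */
};

static struct mce_info mce_info[MCE_INFO_MAX];

/* Return the pending record for @owner, or NULL if there is none. */
static struct mce_info *mce_find_info(int owner)
{
	struct mce_info *mi;

	for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++)
		if (mi->inuse && mi->owner == owner)
			return mi;
	return NULL;
}

/* Release a record previously returned by mce_find_info(). No rescan is
 * needed because the caller already holds the pointer. A plain store is
 * enough in this single-threaded sketch. */
static void mce_clear_info(struct mce_info *mi)
{
	mi->inuse = 0;
}
```

The caller then reads `mi->paddr`, handles the error, and calls `mce_clear_info(mi)`, keeping the `mce_panic()` path for a NULL return.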

> This assumes, of course, that you have only one AR MCE per task, per
> return to userspace. I guess this is fine for now.

While we might have multiple memory references in flight at once, we'd
have to be really unlucky to hit multiple 2-bit errors at the same
time (unless there was some system-level failure in the memory controller,
in which case we are not likely to be able to recover).

In current processor implementations, the recoverable errors are all
reported in just one machine check bank - so we can't actually process
more than one at a time.
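The one-error-at-a-time assumption keeps the save side simple: the handler only has to claim a single free slot per error. A minimal userspace sketch follows, again with a hypothetical table size and struct layout, using C11 `atomic_compare_exchange_strong` to model an atomic claim of a slot so that two CPUs cannot take the same entry.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MCE_INFO_MAX 16 /* hypothetical table size */

/* Hypothetical per-error record; zero-initialized static storage leaves
 * every "inuse" flag at 0 (free). */
struct mce_info {
	atomic_int inuse;
	int owner;      /* stands in for the task that hit the error */
	uint64_t paddr; /* physical address of the poisoned memory */
};

static struct mce_info mce_info[MCE_INFO_MAX];

/* Claim the first free slot for @owner and record @paddr.
 * Returns false when every slot is busy; what to do then (for example,
 * panic) is left to the caller in this sketch. */
static bool mce_save_info(int owner, uint64_t paddr)
{
	struct mce_info *mi;

	for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++) {
		int expected = 0;

		/* Atomically flip inuse 0 -> 1; fails if the slot is taken. */
		if (atomic_compare_exchange_strong(&mi->inuse, &expected, 1)) {
			mi->owner = owner;
			mi->paddr = paddr;
			return true;
		}
	}
	return false;
}
```

With recoverable errors reported through a single bank, the table should rarely hold more than a few entries at once, so a full table would suggest something has already gone badly wrong.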

-Tony