Re: [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE context

From: Borislav Petkov
Date: Wed May 20 2015 - 05:28:08 EST


On Wed, May 20, 2015 at 03:35:38PM -0400, Chen, Gong wrote:
> Printing in MCE context is currently a no-no because printk is not
> NMI-safe. If any of the notifiers on the MCE chain call *printk*, we
> may deadlock. To avoid that, defer the printk to process context.
>
> Background info at: https://lkml.org/lkml/2014/6/27/26
>
> Reported-by: Xie XiuQi <xiexiuqi@xxxxxxxxxx>
> Signed-off-by: Chen, Gong <gong.chen@xxxxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/1406797523-28710-6-git-send-email-gong.chen@xxxxxxxxxxxxxxx
> [ Boris: rewrite a bit. ]
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> ---
> arch/x86/include/asm/mce.h | 1 +
> arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
> arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++++--
> arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 -
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 1 +
> arch/x86/kernel/cpu/mcheck/threshold.c | 1 +
> 6 files changed, 10 insertions(+), 4 deletions(-)

....

> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 1af51b1586d7..2733f275237d 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -427,6 +427,7 @@ static inline void __smp_thermal_interrupt(void)
> {
> inc_irq_stat(irq_thermal_count);
> smp_thermal_vector();
> + mce_queue_irq_work();

Hmm, on second glance, this looks wrong. I think we should make that
call in intel_thermal_interrupt() instead.

> asmlinkage __visible void smp_thermal_interrupt(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
> index 7245980186ee..d695faa234eb 100644
> --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> @@ -22,6 +22,7 @@ static inline void __smp_threshold_interrupt(void)
> {
> inc_irq_stat(irq_threshold_count);
> mce_threshold_vector();
> + mce_queue_irq_work();

Same here.

The mce_queue_irq_work() call should be issued in both the AMD and Intel
threshold handlers, but not in the generic one, which is unlikely to
queue any MCE...

Right?
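
A rough userspace sketch of that split, not kernel code: every model_*
name below is hypothetical, the event values are made up, and the real
handlers do far more than this. It assumes the generic entry point only
dispatches through a function pointer, so the request for deferred work
sits in the vendor handler that actually logs something.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
#define MODEL_PENDING_MAX 8

struct model_event {
	int bank;
	unsigned long long status;
};

static struct model_event model_pending[MODEL_PENDING_MAX];
static size_t model_pending_count;
static bool model_work_queued;

/* Stand-in for mce_queue_irq_work(): only flags that work is needed. */
static void model_queue_irq_work(void)
{
	model_work_queued = true;
}

/* Interrupt-context logging: store the event, never print here. */
static void model_log_event(int bank, unsigned long long status)
{
	if (model_pending_count < MODEL_PENDING_MAX) {
		model_pending[model_pending_count].bank = bank;
		model_pending[model_pending_count].status = status;
		model_pending_count++;
	}
}

/* Generic default handler: logs nothing, so it defers nothing. */
static void model_default_threshold_interrupt(void)
{
}

/* Vendor handler: logs an event, then asks for deferred processing. */
static void model_vendor_threshold_interrupt(void)
{
	model_log_event(4, 0x1ULL);	/* made-up bank and status */
	model_queue_irq_work();
}

static void (*model_threshold_vector)(void) = model_default_threshold_interrupt;

/* Generic entry point: dispatches only, no queueing of its own. */
static void model_smp_threshold_interrupt(void)
{
	model_threshold_vector();
}

/* Deferred work, run where printing is safe; returns events drained. */
static size_t model_run_deferred_work(void)
{
	size_t drained = 0;

	if (!model_work_queued)
		return 0;
	model_work_queued = false;

	for (size_t i = 0; i < model_pending_count; i++) {
		printf("bank %d status 0x%llx\n",
		       model_pending[i].bank, model_pending[i].status);
		drained++;
	}
	model_pending_count = 0;
	return drained;
}
```

With the default handler installed, model_smp_threshold_interrupt()
leaves nothing for model_run_deferred_work() to do. Once the vendor
handler is installed, the same entry point produces one queued event
that is printed later.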


--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.