Re: [PATCH v2 1/2] apic: Fix error interrupt report at all APs

From: Suresh Siddha
Date: Fri Jan 07 2011 - 13:41:41 EST


On Fri, 2011-01-07 at 08:36 -0800, Song, Youquan wrote:
> Recently, a customer reported that once the machine boots, many error
> interrupts are reported, and their count exactly matches the number of APs.
>
> The root cause is that the local APIC generates an error interrupt when it
> detects an illegal vector (one in the range 0-15) in a received interrupt
> message, or in an interrupt generated from the local vector table or by a
> self IPI. See SDM Vol. 3A, chapter 10.
>
> The AP LAPIC thermal sensor register is reset to 0x10000. If the BIOS has
> taken over the thermal throttling interrupt, the AP must be restored with the
> thermal sensor register value read from the BSP; otherwise system problems
> result. If the BIOS has not taken over the thermal interrupt, the restored
> value is the CPU reset value of 0x10000, which means the interrupt vector is
> zero. After 0x10000 is written to the thermal sensor LVT, the processor
> receives an error interrupt if the APIC error interrupt is also enabled.
>
> This patch checks whether the BIOS has taken over the thermal interrupt by
> testing whether the interrupt delivery mode is something other than fixed
> (a BIOS handler uses SMI mode) before restoring the AP's thermal LVT. This
> silences the error interrupt noise when booting on machines whose BIOS does
> not handle the thermal interrupt.
>
>
> Signed-off-by: Youquan Song <youquan.song@xxxxxxxxx>
> ---
> arch/x86/include/asm/apicdef.h | 1 +
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 12 +++++++-----
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> index a859ca4..f587be0 100644
> --- a/arch/x86/include/asm/apicdef.h
> +++ b/arch/x86/include/asm/apicdef.h
> @@ -78,6 +78,7 @@
> #define APIC_DEST_LOGICAL 0x00800
> #define APIC_DEST_PHYSICAL 0x00000
> #define APIC_DM_FIXED 0x00000
> +#define APIC_DM_FIXED_MASK 0x00700

It should be called APIC_DM_MASK. Other than that this patch looks good.
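
To illustrate the naming point, here is a rough user-space sketch (not the
kernel code) of the check with the mask renamed to APIC_DM_MASK. The delivery
mode values are the ones from apicdef.h quoted in this patch; the helper name
lvtthmr_bios_owned() is made up for illustration and does not exist in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode encodings, as quoted from apicdef.h in this patch. */
#define APIC_DM_FIXED   0x00000
#define APIC_DM_LOWEST  0x00100
#define APIC_DM_SMI     0x00200
#define APIC_DM_REMRD   0x00300
/* Suggested name: the mask covers the whole delivery mode field. */
#define APIC_DM_MASK    0x00700

/*
 * Hypothetical helper (an assumption, not a kernel function): returns true
 * when the thermal LVT value saved from the BSP uses a delivery mode other
 * than fixed. This mirrors the condition the patch tests before restoring
 * the value on an AP.
 */
static bool lvtthmr_bios_owned(uint32_t lvtthmr_init)
{
	return (lvtthmr_init & APIC_DM_MASK) != APIC_DM_FIXED;
}
```

With the reset value 0x10000 the delivery mode field is zero (fixed), so the
write is skipped; with SMI mode programmed by the BIOS the value is restored.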

Acked-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>

> #define APIC_DM_LOWEST 0x00100
> #define APIC_DM_SMI 0x00200
> #define APIC_DM_REMRD 0x00300
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 4b68326..ecd6992 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -405,18 +405,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
> */
> rdmsr(MSR_IA32_MISC_ENABLE, l, h);
>
> + h = lvtthmr_init;
> /*
> * The initial value of thermal LVT entries on all APs always reads
> * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
> * sequence to them and LVT registers are reset to 0s except for
> * the mask bits which are set to 1s when APs receive INIT IPI.
> - * Always restore the value that BIOS has programmed on AP based on
> - * BSP's info we saved since BIOS is always setting the same value
> - * for all threads/cores
> + * If BIOS takes over the thermal interrupt and sets its interrupt
> + * delivery mode to SMI rather than fixed, restore the value that BIOS
> + * has programmed on AP based on BSP's info we saved, since BIOS is
> + * always setting the same value for all threads/cores.
> */
> - apic_write(APIC_LVTTHMR, lvtthmr_init);
> + if (h & APIC_DM_FIXED_MASK)
> + apic_write(APIC_LVTTHMR, lvtthmr_init);
>
> - h = lvtthmr_init;
>
> if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
> printk(KERN_DEBUG
