Re: [PATCH] fix x2apic defect that Linux kernel doesn't mask 8259A interrupt during the time window between changing VT-d table base address and initializing these VT-d entries

From: Suresh Siddha
Date: Thu Sep 20 2012 - 18:22:53 EST


On Wed, 2012-09-12 at 07:02 +0000, Zhang, Lin-Bao (ESSN-MCXS-Linux
Kernel R&D) wrote:
> Hi all,
> This defect can be observed when the x2apic setting in BIOS is set to
> "auto" and the BIOS has virtual wire mode enabled on a power up. This
> defect was found on a 2.6.32 based kernel.

I assume you are able to reproduce the issue with the latest kernel
as well?

What virtual wire mode is it?

Virtual wire mode-A (where the PIC output is connected to LINT0 of the
Local APIC) doesn't go through interrupt-remapping and virtual wire
mode-B (where the PIC output is routed through the IO-APIC RTE) will be
completely disabled as all the BIOS setup IO-APIC RTE's are masked by
the Linux kernel from the time we enable interrupt-remapping to the time
IO-APIC RTE's are properly re-configured by the Linux kernel again.
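
A rough user-space model of that ordering, for the mode-B case only. This
is not kernel code: every name below is made up for illustration, and
treating an unmasked legacy-format entry as a "fault" once remapping is on
is a simplifying assumption of the model, not a statement about any
particular platform.

```c
/* Toy user-space model; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_RTE 24

enum model_delivery { MODEL_DROPPED, MODEL_DELIVERED, MODEL_FAULT };

struct model_rte {
	bool masked;      /* entry is masked, nothing gets through */
	bool remappable;  /* entry was rewritten in remappable format */
};

struct model_ioapic {
	struct model_rte rte[MODEL_NR_RTE];
	bool ir_enabled;  /* interrupt-remapping switched on */
};

/* What happens if the given pin fires right now? */
enum model_delivery model_raise(const struct model_ioapic *io, size_t pin)
{
	if (pin >= MODEL_NR_RTE || io->rte[pin].masked)
		return MODEL_DROPPED;
	/* Assumption: a stale BIOS-format entry is rejected once IR is on. */
	if (io->ir_enabled && !io->rte[pin].remappable)
		return MODEL_FAULT;
	return MODEL_DELIVERED;
}

/* Step 1: mask every BIOS-programmed entry before touching IR. */
void model_mask_all(struct model_ioapic *io)
{
	for (size_t i = 0; i < MODEL_NR_RTE; i++)
		io->rte[i].masked = true;
}

/* Step 2: switch on interrupt-remapping. */
void model_enable_ir(struct model_ioapic *io)
{
	io->ir_enabled = true;
}

/* Step 3: rewrite one entry in remappable format and unmask it. */
void model_reprogram(struct model_ioapic *io, size_t pin)
{
	assert(pin < MODEL_NR_RTE);
	io->rte[pin].remappable = true;
	io->rte[pin].masked = false;
}
```

In this model, a pin that fires between model_mask_all() and
model_reprogram() is simply dropped; a fault only shows up if
model_enable_ir() runs while an old entry is still unmasked.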

So I am at a loss to understand what is causing this.

>
> The kernel code (smpboot.c, apic.c) does not mask 8259A interrupts
> before changing and initializing the new VT-d table when x2apic
> virtual wire mode is enabled on power up. The Linux Kernel expects
> virtual wire mode to be disabled when booting and enables it when
> interrupts are masked.
>
> The BIOS code builds a simple VT-d table on power up. While the Linux
> Kernel boots, it first builds an empty VT-d table and uses it. After
> some time, the Linux Kernel initializes the IO-APIC redirect
> table, and then initializes the VT-d entries. In the window between
> initializing the redirect table and the VT-d entries, the 8259A
> interrupts are not masked. If an interrupt occurs in this window, the
> Linux Kernel will not find a valid entry for this interrupt. The
> kernel treats this as a fatal error and panics. If the error never
> gets cleared, the Linux kernel continuously prints this error:
> "NMI: IOCK error (debug interrupt?) for reason"

Not sure why we get an NMI instead of a vt-d fault? Perhaps the vt-d
fault is also getting reported via NMI on this platform?

Does your tested kernel have this fix?
commit 254e42006c893f45bca48f313536fcba12206418
Author: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
Date: Mon Dec 6 12:26:30 2010 -0800

x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic

Will you be able to provide the failing kernel log so that I can better
understand the issue?

thanks,
suresh

> The fix for this defect is to mask 8259A interrupts
> before changing the VT-d table and initializing the VT-d entries, then unmask
> interrupts after completing the redirect table entries.
>
>
> Signed-off-by: Zhang, Lin-Bao <linbao.zhang@xxxxxx>
> Tested-by: Nigel Croxon <nigel.croxon@xxxxxx>
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 24deb30..299172c 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1556,7 +1556,6 @@ void __init enable_IR_x2apic(void)
> }
>
> local_irq_save(flags);
> - legacy_pic->mask_all();
> mask_ioapic_entries();
>
> if (x2apic_preenabled && nox2apic)
> @@ -1603,7 +1602,6 @@ void __init enable_IR_x2apic(void)
> skip_x2apic:
> if (ret < 0) /* IR enabling failed */
> restore_ioapic_entries();
> - legacy_pic->restore_mask();
> local_irq_restore(flags);
> }
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7c5a8c3..95fee01 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1000,7 +1000,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
> }
> set_cpu_sibling_map(0);
> -
> + mask_8259A();
>
> if (smp_sanity_check(max_cpus) < 0) {
> pr_info("SMP disabled\n");
> @@ -1037,6 +1037,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> apic->setup_portio_remap();
>
> smpboot_setup_io_apic();
> + unmask_8259A();
> +
> /*
> * Set up local APIC timer on boot CPU.
> */
>
>
>
> -- Bob(Zhang LinBao)
> "If not us, who ? if not now, when ?"
> ESSN-MCBS linux kernel engineer
>
>

