Re: Erratum 383 fix for 32 bit x86 kernels
From: Greg KH
Date: Fri Sep 24 2010 - 12:03:20 EST
On Fri, Sep 24, 2010 at 01:52:40PM +0200, Joerg Roedel wrote:
> Hi Greg,
>
> here is a patch which I want to ask you to include into the current
> -stable kernels. This patch fixes the occurence of AMD Erratum 383 on
> 32 bit x86 kernels when using CPU hotplug. This patch is a combined
> patch created by cherry-picking (and fixing a small compile error)
> upstream commits:
>
> fd89a137924e0710078c3ae855e7cec1c43cb845
> b7d460897739e02f186425b7276e3fdb1595cea7
>
> The second commit fixes a problem in the first one. The patch I attach
> here is against 2.6.32.22 but should also apply against 2.6.35 (At least
> cherry-picking from 2.6.36-rc gave no conflicts).
> Please consider to include this patch into the -stable kernel series.
>
> Regards,
>
> Joerg
>
> >From d12a669119908f1c24a3b5037445c124c312eea5 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <joerg.roedel@xxxxxxx>
> Date: Thu, 22 Jul 2010 11:32:00 +0200
> Subject: [PATCH] x86-32: Fix crashes with CPU hotplug on AMD machines
>
> This patch fixes machine crashes which occured when heavily
> testing cpu hotplug code on a 32 bit kernel. The kernel
> crashed because these tests triggered AMD erratum 383 which
> resulted in a machine check exception.
> The erratum triggered in the test cases because of the
> following steps:
>
> 1. On 32 bit the swapper_pg_dir page table is used
> as the initial page table for booting a secondary
> cpu.
>
> 2. To make this work swapper_pg_dir needs a direct
> mapping of physical memory in it (the low
> mappings)
>
> 3. Other cpus may use swapper_pg_dir while the low
> mappings are present (when leave_mm is called).
>
> 4. This can result in TLB entries for addresses
> below __PAGE_OFFSET to be established due to
> speculative TLB loads. These TLB entries are
> marked global and possibly large.
>
> 5. When the CPU which has this entry loaded switches
> to another page table this global TLB entry is not
> flushed.
>
> 6. The process accesses an address covered by this
> TLB entry but there is a permission mismatch
> (present TLB entry covers a large global page not
> accessible for userspace).
>
> 7. Due to the permission mismatch the hardware
> walks the new page table and establishes a new
> 4kb TLB entry. Due to AMD erratum 383 there might
> be a small window of time now where both TLB
> entries are present.
>
> 8. After the page walk the hardware does a new TLB
> lookup which may result in two matches. This
> results in a machine check exception which
> signals a TLB multimatch error. The machine
> crashes.
>
> There are two ways to fix this issue:
>
> 1. Always do a global TLB flush when a new cr3 is
> loaded and the old page table was swapper_pg_dir.
> I consider this a hack hard to understand.
>
> 2. Do not use swapper_pg_dir to boot secondary cpus.
>
> This patch implements solution 2. It introduces a
> trampoline_pg_dir which has the same layout as
> swapper_pg_dir with low_mappings. This page table is used as
> the initial page table of the booting cpu. Later in the
> bringup process it switches to swapper_pg_dir and does a
> global tlb flush. This fixes the crashes in our test cases.
>
> Signed-off-by: Joerg Roedel <joerg.roedel@xxxxxxx>
> ---
> arch/x86/include/asm/pgtable_32.h | 1 +
> arch/x86/include/asm/trampoline.h | 3 +++
> arch/x86/kernel/head_32.S | 8 +++++++-
> arch/x86/kernel/setup.c | 2 ++
> arch/x86/kernel/smpboot.c | 21 ++++++---------------
> arch/x86/kernel/trampoline.c | 18 ++++++++++++++++++
> 6 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
> index 2984a25..f686f49 100644
> --- a/arch/x86/include/asm/pgtable_32.h
> +++ b/arch/x86/include/asm/pgtable_32.h
> @@ -26,6 +26,7 @@ struct mm_struct;
> struct vm_area_struct;
>
> extern pgd_t swapper_pg_dir[1024];
> +extern pgd_t trampoline_pg_dir[1024];
>
> static inline void pgtable_cache_init(void) { }
> static inline void check_pgt_cache(void) { }
> diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
> index cb507bb..8f78fdf 100644
> --- a/arch/x86/include/asm/trampoline.h
> +++ b/arch/x86/include/asm/trampoline.h
> @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
>
> extern unsigned long init_rsp;
> extern unsigned long initial_code;
> +extern unsigned long initial_page_table;
> extern unsigned long initial_gs;
>
> #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
>
> extern unsigned long setup_trampoline(void);
> +extern void __init setup_trampoline_page_table(void);
> extern void __init reserve_trampoline_memory(void);
> #else
> static inline void reserve_trampoline_memory(void) {};
> +extern void __init setup_trampoline_page_table(void) {};
> #endif /* CONFIG_X86_TRAMPOLINE */
I don't think that last setup_trampoline_page_table() line is correct
here.
Shouldn't it be:
static inline void setup_trampoline_page_table(void) {};
instead?
Otherwise I get the following error building the .32 code with this
patch:
CC arch/x86/kernel/setup.o
arch/x86/kernel/setup.c: In function âsetup_archâ:
arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function âsetup_trampoline_page_tableâ
Is this really how the code looks upstream?
Hm, even with changing the function prototype, I still get an error
building on the .32-stable tree on x86-64, so I'm dropping this patch
from there.
Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
by hand, no big deal.
So, why not I just take the original git commits that are in Linus's
tree? That should work, right? If so, do I just need to use those two
above-mentioned commits? Or something else? I prefer taking the
original commits as it makes spelunking over time much easier.
thanks,
greg k-h
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 37c3d4b..75e3981 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
> /*
> * Enable paging
> */
> - movl $pa(swapper_pg_dir),%eax
> + movl pa(initial_page_table), %eax
> movl %eax,%cr3 /* set the page table pointer.. */
> movl %cr0,%eax
> orl $X86_CR0_PG,%eax
> @@ -608,6 +608,8 @@ ignore_int:
> .align 4
> ENTRY(initial_code)
> .long i386_start_kernel
> +ENTRY(initial_page_table)
> + .long pa(swapper_pg_dir)
>
> /*
> * BSS section
> @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
> #endif
> swapper_pg_fixmap:
> .fill 1024,4,0
> +#ifdef CONFIG_X86_TRAMPOLINE
> +ENTRY(trampoline_pg_dir)
> + .fill 1024,4,0
> +#endif
> ENTRY(empty_zero_page)
> .fill 4096,1,0
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index b4ae4ac..6600cfd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
> paging_init();
> x86_init.paging.pagetable_setup_done(swapper_pg_dir);
>
> + setup_trampoline_page_table();
> +
> tboot_probe();
>
> #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index c4f33b2..7b05eb1 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -73,7 +73,6 @@
>
> #ifdef CONFIG_X86_32
> u8 apicid_2_node[MAX_APICID];
> -static int low_mappings;
> #endif
>
> /* State of each CPU */
> @@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused)
> }
>
> #ifdef CONFIG_X86_32
> - while (low_mappings)
> - cpu_relax();
> + /* switch away from the trampoline page-table */
> + load_cr3(swapper_pg_dir);
> __flush_tlb_all();
> #endif
>
> @@ -765,6 +764,10 @@ do_rest:
> initial_code = (unsigned long)start_secondary;
> stack_start.sp = (void *) c_idle.idle->thread.sp;
>
> +#ifdef CONFIG_X86_32
> + initial_page_table = __pa(&trampoline_pg_dir);
> +#endif
> +
> /* start_ip had better be page-aligned! */
> start_ip = setup_trampoline();
>
> @@ -894,20 +897,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
>
> per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
>
> -#ifdef CONFIG_X86_32
> - /* init low mem mapping */
> - clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> - min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> - flush_tlb_all();
> - low_mappings = 1;
> -
> err = do_boot_cpu(apicid, cpu);
>
> - zap_low_mappings(false);
> - low_mappings = 0;
> -#else
> - err = do_boot_cpu(apicid, cpu);
> -#endif
> if (err) {
> pr_debug("do_boot_cpu failed %d\n", err);
> return -EIO;
> diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
> index c652ef6..a874495 100644
> --- a/arch/x86/kernel/trampoline.c
> +++ b/arch/x86/kernel/trampoline.c
> @@ -1,6 +1,7 @@
> #include <linux/io.h>
>
> #include <asm/trampoline.h>
> +#include <asm/pgtable.h>
> #include <asm/e820.h>
>
> #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
> @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
> memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
> return virt_to_phys(trampoline_base);
> }
> +
> +void __init setup_trampoline_page_table(void)
> +{
> +#ifdef CONFIG_X86_32
> + /* Copy kernel address range */
> + clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
> + swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> + min_t(unsigned long, KERNEL_PGD_PTRS,
> + KERNEL_PGD_BOUNDARY));
> +
> + /* Initialize low mappings */
> + clone_pgd_range(trampoline_pg_dir,
> + swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> + min_t(unsigned long, KERNEL_PGD_PTRS,
> + KERNEL_PGD_BOUNDARY));
> +#endif
> +}
> --
> 1.7.0.4
>
> --
> AMD Operating System Research Center
>
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
> General Managers: Alberto Bozzo, Andrew Bowd
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/