Re: early microcode on amd is broken when no initramfs provided
From: Torsten Kaiser
Date: Sun Jul 21 2013 - 00:01:43 EST
On Sun, Jul 21, 2013 at 12:59 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>> > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> >> config is attached
>> >
>> > Ok, I can reproduce the hang with your config but even with:
>> >
>> > $ grep MICROCODE .config
>> > # CONFIG_MICROCODE is not set
>> > # CONFIG_MICROCODE_INTEL_EARLY is not set
>> > # CONFIG_MICROCODE_AMD_EARLY is not set
>> >
>> > which means, it cannot be microcode-related.
>> >
>> > And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
>> > the boot would probably continue. And if so, this is that 60 sec delay
>> > where the kernel tries to find firmware.
>> >
>> > Hmm...
>>
>> I have the same problem: Booting 3.11-rc1 hangs after the line:
>> ACPI: Executed 3 blocks of module-level executable AML code
>>
>> I bisected it down to the early microcode changes:
>> 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
>> implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
>> fixup) completely fail to boot (No output beyond "Booting kernel") ,
>> from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
>> find_ucode_in_initrd() __init") I'm seeing this hang.
>>
>> Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
>> now successfully boots 3.11-rc1.
>
> Ok, I need to be able to reproduce that first - I wasn't that successful
> with Johannes' setup.
>
> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry please.
>
> Also, is it just plain v3.11-rc1 or with patches on top?
>
> Also, /proc/cpuinfo please.
.config and cpuinfo attached.
Microcode does not seem to be loaded at all: for MICROCODE_EARLY I did
not attach the needed file / cpio, and the normal update mechanism does
not seem to have a newer microcode than what the BIOS is providing.
I'm using a custom initrd, but that can't be used for MICROCODE_EARLY
because it's compressed and does not contain an AuthenticAMD.bin. It
also does not contain microcode_amd.bin, because I'm supplying that via
CONFIG_EXTRA_FIRMWARE.
Grub entry:
title 3.11.0-rc1-crypt
root (hd0,0)
kernel (hd0,0)/boot/kernel-3.11.0-rc1 fastboot crypt_root=/dev/md6 video=1280x1024 radeon.dpm=1
initrd (hd0,0)/boot/ramfs-2011.gz
savedefault
I was using plain 3.11-rc1 except the changes I made to debug this.
What I think you need: a system that is fatally affected by AMD
Erratum 400 and a 64-bit kernel.
From my debugging I found the following sequence of events occurs on my system:
The BSP will call load_ucode_ap().
That will call collect_cpu_info_amd_early(), which will fill the
cpuinfo_x86.x86 and cpuinfo_x86.microcode fields of the per-cpu
cpu_info structure, which has not been set up yet. Because this
code is only used with MICROCODE_EARLY, disabling this option
makes my system boot. OTOH, this function is called regardless of
whether AuthenticAMD.bin is available or not; that's why I'm hitting
it even without the special cpio.
Then the BSP will call init_amd() to apply the errata fixes. That uses
cpu_has_amd_erratum(), but that function does not use the cpuinfo_x86
that was supplied to init_amd() (and which is used for the following
set_cpu_bug() if the erratum was found!). Instead it guesses for
itself whether it should use the per-cpu data or boot_cpu_data, and it
bases that guess on the not yet initialized per-cpu data. That normally
works fine, because that data will all be zeroed out, but
collect_cpu_info_amd_early() has filled ->x86, so
cpu_has_amd_erratum() will use the partly filled per-cpu data instead
of the correct boot_cpu_data. And because collect_cpu_info_amd_early()
did not fill ->x86_vendor, that field is still 0 == X86_VENDOR_INTEL
and cpu_has_amd_erratum() will wrongly report that no erratum is present.
So the C1E workaround is not applied, and as soon as ACPI enables C1E
the boot hangs.
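To make the sequence above easier to follow, here is a small user-space
model of it. This is only a sketch and not kernel code: struct
cpuinfo_model, the MODEL_VENDOR_* constants, the family 0x10 value and
all function names are made up for illustration, and the family/model
range and OSVW checks of the real cpu_has_amd_erratum() are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vendor constants.
 * The only property that matters here is that Intel is 0. */
enum { MODEL_VENDOR_INTEL = 0, MODEL_VENDOR_AMD = 2 };

/* Hypothetical, heavily reduced model of struct cpuinfo_x86. */
struct cpuinfo_model {
	unsigned char x86;        /* CPU family, 0 means "not filled in yet" */
	unsigned char x86_vendor; /* 0 == Intel when left zeroed */
	unsigned int  microcode;  /* patch level */
};

/* Models boot_cpu_data: fully identified AMD CPU (family value is arbitrary). */
static struct cpuinfo_model model_boot_cpu_data = {
	.x86 = 0x10, .x86_vendor = MODEL_VENDOR_AMD,
};

/* Models the per-cpu cpu_info of the BSP: still all zeroes this early. */
static struct cpuinfo_model model_percpu_info;

/* Models collect_cpu_info_amd_early(): fills family and microcode,
 * but leaves x86_vendor untouched. */
static void model_collect_early(struct cpuinfo_model *c)
{
	c->x86 = 0x10;
	c->microcode = 1; /* arbitrary placeholder value */
}

/* Models the old cpu_has_amd_erratum(): guesses which structure to use. */
static bool model_has_erratum_guessing(void)
{
	struct cpuinfo_model *cpu = &model_percpu_info;

	/* Fallback only triggers while the per-cpu family is still 0. */
	if (cpu->x86 == 0)
		cpu = &model_boot_cpu_data;

	if (cpu->x86_vendor != MODEL_VENDOR_AMD)
		return false;

	return true; /* range/OSVW matching omitted in this model */
}

/* Models the patched variant: the caller passes the structure to check. */
static bool model_has_erratum_explicit(const struct cpuinfo_model *cpu)
{
	return cpu->x86_vendor == MODEL_VENDOR_AMD;
}
```

In this model, model_has_erratum_guessing() returns true as long as the
per-cpu data is zeroed, but returns false once model_collect_early() has
been called on model_percpu_info, while
model_has_erratum_explicit(&model_boot_cpu_data) stays true either way.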
Something like the following (whitespace mangled by Gmail; if it looks
OK to you, I will send it as a clean patch) fixes
cpu_has_amd_erratum() for me, but I did not look at how the early
microcode loading should work if AuthenticAMD.bin is available, so I
cannot offer a fix for the premature accesses to per-cpu cpu_info.
--- 3.11-rc1/arch/x86/kernel/cpu/amd.c.orig 2013-07-21 05:42:42.130346496 +0200
+++ 3.11-rc1/arch/x86/kernel/cpu/amd.c 2013-07-21 05:45:09.420345843 +0200
@@ -512,7 +512,7 @@
static const int amd_erratum_383[];
static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
@@ -729,11 +729,11 @@
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
- if (cpu_has_amd_erratum(amd_erratum_383))
+ if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
- if (cpu_has_amd_erratum(amd_erratum_400))
+ if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -879,22 +879,14 @@
static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
-static bool cpu_has_amd_erratum(const int *erratum)
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
{
- struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
u32 range;
u32 ms;
- /*
- * If called early enough that current_cpu_data hasn't been initialized
- * yet, fall back to boot_cpu_data.
- */
- if (cpu->x86 == 0)
- cpu = &boot_cpu_data;
-
- if (cpu->x86_vendor != X86_VENDOR_AMD)
- return false;
+ /* Should never be called on Non-AMD-CPUs */
+ BUG_ON(cpu->x86_vendor != X86_VENDOR_AMD);
if (osvw_id >= 0 && osvw_id < 65536 &&
cpu_has(cpu, X86_FEATURE_OSVW)) {