Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

From: Dmitry Malkin
Date: Thu Jul 26 2018 - 04:11:21 EST

On 07/25/2018 11:21 PM, Kirill A. Shutemov wrote:
On Wed, Jul 25, 2018 at 05:26:02PM +0000, Dmitry Malkin wrote:
there may be some other reasons which may cause undefined behavior (reboot
for example):

in arch/x86/boot/compressed/pgtable_64.c in function paging_prepare():

1. structure "paging_config" allocated on stack without setting default
value for flag "l5_required":
struct paging_config paging_config = {};
l5_required is set only if CONFIG_X86_5LEVEL is defined
Hm? C99 initializer zeros the structure.
Here I only see std=gnu89.
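
To make the point about the initializer concrete, here is a minimal sketch. It assumes a two-field layout for "struct paging_config" that mirrors the fields named in this thread; the helper names are hypothetical. As far as I know, GCC accepts the empty-brace initializer as an extension in gnu89 mode and zero-initializes every member, so l5_required starts at 0 even when nothing assigns it later.

```c
#include <assert.h>

/* Assumed layout, mirroring the two fields named in this thread. */
struct paging_config {
	unsigned long trampoline_start;
	unsigned long l5_required;
};

/* Hypothetical helper: returns a config built only from "= {}". */
static struct paging_config make_default_config(void)
{
	struct paging_config cfg = {};	/* GNU extension before C23 */

	return cfg;
}

/* Both members are expected to read back as zero. */
static void check_defaults(void)
{
	struct paging_config cfg = make_default_config();

	assert(cfg.trampoline_start == 0);
	assert(cfg.l5_required == 0);
}
```

Strict ISO C before C23 would need "= {0}" instead; the behaviour sketched here relies on the GNU dialect the kernel is built with.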

2. reading from memory which may be reserved in case of EFI systems:
   ebda_start = *(unsigned short *)0x40e << 4;
   bios_start = *(unsigned short *)0x413 << 10;
Also, on an EFI system without CSM both reads will return zero, which will
place trampoline_start at 0x9d000. That may also be reserved memory. In
fact I have such a system, and it causes an instant reboot (when the code
starts copying to "trampoline_start").
Could you show dmesg from such system?
Sure, here it is (please note that not both pages are reserved, only the second one: 0x9e000-0x9ffff):

[    0.000000] Linux version 4.17.9-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)) #1 SMP Sun Jul 22 11:57:51 EDT 2018
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.17.9-1.el7.elrepo.x86_64 root=UUID=51cc5f87-2bb2-45b5-a0ee-691970f9cf06 ro crashkernel=auto rhgb quiet
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]: 256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]: 64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]: 64
[    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000057fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000058000-0x0000000000058fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000059000-0x000000000009dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000e0fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000c4a14fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000c4a15000-0x00000000c4a15fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000c4a16000-0x00000000c4a3ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c4a40000-0x00000000c91acfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000c91ad000-0x00000000c9749fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c974a000-0x00000000c9776fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000c9777000-0x00000000cba86fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cba87000-0x00000000cbefdfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cbefe000-0x00000000cbefefff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cbf00000-0x00000000cbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000022f7fffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: update [mem 0xc42c9018-0xc4321057] usable ==> usable
[    0.000000] e820: update [mem 0xc42c9018-0xc4321057] usable ==> usable
[    0.000000] e820: update [mem 0xc42b9018-0xc42c8c57] usable ==> usable
[    0.000000] e820: update [mem 0xc42b9018-0xc42c8c57] usable ==> usable
[    0.000000] e820: update [mem 0xc42a8018-0xc42b8257] usable ==> usable
[    0.000000] e820: update [mem 0xc42a8018-0xc42b8257] usable ==> usable
[    0.000000] extended physical RAM map:
[    0.000000] reserve setup_data: [mem 0x0000000000000000-0x0000000000057fff] usable
[    0.000000] reserve setup_data: [mem 0x0000000000058000-0x0000000000058fff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000000059000-0x000000000009dfff] usable
[    0.000000] reserve setup_data: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000000e0000-0x00000000000e0fff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000000100000-0x00000000c42a8017] usable
[    0.000000] reserve setup_data: [mem 0x00000000c42a8018-0x00000000c42b8257] usable
[    0.000000] reserve setup_data: [mem 0x00000000c42b8258-0x00000000c42b9017] usable
[    0.000000] reserve setup_data: [mem 0x00000000c42b9018-0x00000000c42c8c57] usable
[    0.000000] reserve setup_data: [mem 0x00000000c42c8c58-0x00000000c42c9017] usable
[    0.000000] reserve setup_data: [mem 0x00000000c42c9018-0x00000000c4321057] usable
[    0.000000] reserve setup_data: [mem 0x00000000c4321058-0x00000000c4a14fff] usable
[    0.000000] reserve setup_data: [mem 0x00000000c4a15000-0x00000000c4a15fff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000c4a16000-0x00000000c4a3ffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000c4a40000-0x00000000c91acfff] usable
[    0.000000] reserve setup_data: [mem 0x00000000c91ad000-0x00000000c9749fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000c974a000-0x00000000c9776fff] ACPI data
[    0.000000] reserve setup_data: [mem 0x00000000c9777000-0x00000000cba86fff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000cba87000-0x00000000cbefdfff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000cbefe000-0x00000000cbefefff] usable
[    0.000000] reserve setup_data: [mem 0x00000000cbf00000-0x00000000cbffffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000100000000-0x000000022f7fffff] usable
[    0.000000] efi: EFI v2.40 by American Megatrends
[    0.000000] efi:  ESRT=0xcbd9de18  ACPI=0xc974f000  ACPI 2.0=0xc974f000  SMBIOS=0xcbd99000  SMBIOS 3.0=0xcbd98000
[    0.000000] SMBIOS 3.0.0 present.
[    0.000000] DMI: SIEMENS AG RackPC_547G_HG-B.2.0/D3445-S1, BIOS V5.0.0.11 R1.11.0 for D3445-S1x 02/24/2016
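
For reference, the arithmetic behind the 0x9d000 figure can be sketched as below. This is a rough model rather than the kernel code itself. The BIOS_START_MIN, BIOS_START_MAX and TRAMPOLINE_32BIT_SIZE values are assumptions recalled from the 4.17 sources, and the function and parameter names are hypothetical. With both BDA words reading as zero, bios_start falls back to the maximum, and a two-page trampoline lands at 0x9d000-0x9efff, so its second page overlaps the reserved range 0x9e000-0x9ffff shown in the e820 map above.

```c
#include <assert.h>

#define PAGE_SIZE		0x1000UL
#define BIOS_START_MIN		0x20000UL	/* assumed value */
#define BIOS_START_MAX		0x9f000UL	/* assumed value */
#define TRAMPOLINE_32BIT_SIZE	(2 * PAGE_SIZE)	/* assumed: two pages */

/* bda_40e and bda_413 stand in for the words read from 0x40e and 0x413. */
static unsigned long trampoline_start(unsigned short bda_40e,
				      unsigned short bda_413)
{
	unsigned long ebda_start = (unsigned long)bda_40e << 4;
	unsigned long bios_start = (unsigned long)bda_413 << 10;

	/* A zero or implausibly high value falls back to the maximum. */
	if (!bios_start || bios_start > BIOS_START_MAX)
		bios_start = BIOS_START_MAX;
	/* A plausible EBDA below bios_start lowers the limit further. */
	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	/* Two pages just below bios_start, rounded down to a page. */
	return (bios_start - TRAMPOLINE_32BIT_SIZE) & ~(PAGE_SIZE - 1);
}

/* Half-open overlap test: [a, a_end) against [b, b_end). */
static int overlaps(unsigned long a, unsigned long a_end,
		    unsigned long b, unsigned long b_end)
{
	return a < b_end && b < a_end;
}

/* The reported case: both words are zero, reserved range is 0x9e000-0x9ffff. */
static void check_reported_case(void)
{
	unsigned long start = trampoline_start(0, 0);

	assert(start == 0x9d000UL);
	assert(!overlaps(start, start + PAGE_SIZE, 0x9e000UL, 0xa0000UL));
	assert(overlaps(start + PAGE_SIZE, start + 2 * PAGE_SIZE,
			0x9e000UL, 0xa0000UL));
}
```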

3. paging_prepare(void) returns "struct paging_config". Copy by value. Is it
really specified by ABI or GCC itself that the second field (which is flag
"l5_required") will go to RDX register?

3.2.3 Parameter Passing


Returning of Values
The returning of values is done according to the following algorithm:


3. If the class is INTEGER, the next available register of the sequence
%rax, %rdx is used.

Got it, thank you.
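
A small sketch of why the quoted rule applies here, assuming the same two-field layout for "struct paging_config" as discussed in point 1 (the helper names are hypothetical). A struct made of two unsigned long members occupies exactly two eightbytes on x86-64, both of class INTEGER, so under the System V ABI text quoted above the first member would come back in %rax and the second in %rdx. The code only checks the size and the by-value round trip; it does not inspect registers.

```c
#include <assert.h>

/* Assumed layout, mirroring the two fields named in this thread. */
struct paging_config {
	unsigned long trampoline_start;
	unsigned long l5_required;
};

/* Hypothetical helper: returns the struct by value, as paging_prepare() does. */
static struct paging_config make_config(unsigned long start, unsigned long l5)
{
	struct paging_config cfg = { start, l5 };

	return cfg;
}

/* The struct is exactly two machine words, with no padding. */
static void check_two_words(void)
{
	struct paging_config cfg = make_config(0x9d000UL, 1);

	assert(sizeof(cfg) == 2 * sizeof(unsigned long));
	assert(cfg.trampoline_start == 0x9d000UL);
	assert(cfg.l5_required == 1);
}
```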
