Re: instant reboot caused by 194a9749c73d650c0
From: Eric Dumazet
Date: Mon Apr 16 2018 - 11:37:08 EST
On 04/16/2018 02:15 AM, Kirill A. Shutemov wrote:
> On Mon, Apr 16, 2018 at 08:07:09AM +0200, Ingo Molnar wrote:
>>
>> * Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>
>>> Hi Kirill
>>>
>>> For some reason, my hosts instantly crash at boot time, with absolutely no log on console.
>>>
>>> Bisection pointed to :
>>>
>>> $ git bisect bad
>>> 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit
>>> commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8
>>> Author: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>>> Date: Mon Mar 12 13:02:46 2018 +0300
>>>
>>> x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
>>
>> Could you please send your .config? These early boot problems are sometimes build
>> and Kconfig environment sensitive.
>>
>> A high level description of your hardware and the distro you are using would also
>> be useful.
>
> And how do you start the kernel? EFI? Legacy boot? kexec?
>
>>
>> Kirill, I'm curious about this change:
>>
>> - /* Calculate address we are running at */
>> - call 1f
>> -1: popl %edi
>> - subl $1b, %edi
>> + /* Calculate address of paging_enabled() once we are executing in the
>> trampoline */
>> + leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
>>
>> Here we change the calculation from a "discover where we are executing" method to
>> a calculation method (which is fundamentally more fragile) - why?
>
> I guess, I tried to save one register -- %rdi is used for return from
> trampoline.
>
> But you're right there's no reason to do this and it may be more fragile.
>
> Eric, could you check if the patch makes any difference?
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index fca012baba19..395c122ef70b 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -562,7 +562,10 @@ ENTRY(trampoline_32bit_src)
> movl %eax, %cr4
>
> /* Calculate address of paging_enabled() once we are executing in the trampoline */
> - leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
> + call 1f
> +1: popl %eax
> + subl $1b, %eax
> + leal paging_enabled(%eax), %eax
>
> /* Prepare the stack for far return to Long Mode */
> pushl $__KERNEL_CS
> diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
> index 91f75638f6e6..6ff7e81b5628 100644
> --- a/arch/x86/boot/compressed/pgtable.h
> +++ b/arch/x86/boot/compressed/pgtable.h
> @@ -6,7 +6,7 @@
> #define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
>
> #define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
> -#define TRAMPOLINE_32BIT_CODE_SIZE 0x60
> +#define TRAMPOLINE_32BIT_CODE_SIZE 0x70
>
> #define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
>
>
Hi Kirill
This patch did not help.
In the mean time, Greg told me that using gcc-4.9 instead of our old gcc-4.7 based toolchain was working better.
Thanks !