Re: Purgatory compile flag changes apparently causing Kexec relocation overflows

From: Nick Desaulniers
Date: Wed Aug 28 2019 - 17:51:47 EST


On Wed, Aug 28, 2019 at 12:42 PM Steve Wahl <steve.wahl@xxxxxxx> wrote:
>
> Please CC me on responses to this.
>
> I normally would do more diligence on this, but the timing is such
> that I think it's better to get this out sooner.
>
> With the tip of the tree from https://github.com/torvalds/linux.git (a
> few days old, most recent commit fetched is
> bb7ba8069de933d69cb45dd0a5806b61033796a3), I'm seeing "kexec: Overflow
> in relocation type 11 value 0x11fffd000" when I try to load a crash
> kernel with kdump. This seems to be caused by commit
> 059f801a937d164e03b33c1848bb3dca67c0b04, which changed the compiler
> flags used to compile purgatory.ro, apparently creating 32 bit
> relocations for things that aren't necessarily reachable with a 32 bit
> reference. My guess is this only occurs when the crash kernel is
> located outside 32-bit addressable physical space.
>
> I have so far verified that the problem occurs with that commit, and
> does not occur with the previous commit. For this commit, Thomas
> Gleixner mentioned a few of the changed flags should have been looked
> at twice. I have not gone so far as to figure out which flags cause
> the problem.
>
> The hardware in use is an HPE Superdome Flex with 48 * 32GiB DIMMs
> (total 1536 GiB).
>
> One example of the exact error messages seen:
>
> 2019-08-28T13:42:39.308110-05:00 uv4test14 kernel: [ 45.137743] kexec: Overflow in relocation type 11 value 0x17f7affd000
> 2019-08-28T13:42:39.308123-05:00 uv4test14 kernel: [ 45.137749] kexec-bzImage64: Loading purgatory failed

Thanks for the report, and sorry for the breakage. Can you please send
me more information on how to reproduce the issue precisely? I'm
happy to look into fixing it.

Let me go dig up the different listed flags. Steve, it may be fastest
for you to test re-adding them in your setup to see which one is
important.
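
For reference while testing: on x86-64, relocation type 11 is
R_X86_64_32S (a 32-bit field that is sign-extended to 64 bits), per the
psABI numbering. The sketch below is a userspace illustration of the
kind of range check that would reject the values in your log. It is not
the kernel's code; the function names are hypothetical, and I'm assuming
the loader's check has roughly this shape.

```c
#include <stdint.h>

/* Relocation type numbers as defined by the x86-64 psABI. */
#define R_X86_64_32   10  /* 32-bit field, zero-extended to 64 bits */
#define R_X86_64_32S  11  /* 32-bit field, sign-extended to 64 bits */

/*
 * Hypothetical helper: returns 1 if `value` can be stored in a
 * sign-extended 32-bit relocation field without loss, 0 on overflow.
 * The representable range is the low 2 GiB plus the top 2 GiB of the
 * 64-bit address space.
 */
static int fits_r_x86_64_32s(uint64_t value)
{
	return value <= 0x7fffffffULL || value >= 0xffffffff80000000ULL;
}

/*
 * Hypothetical helper: the same test for the zero-extended variant,
 * which only reaches the low 4 GiB.
 */
static int fits_r_x86_64_32(uint64_t value)
{
	return value <= 0xffffffffULL;
}
```

With these helpers, both reported values (0x11fffd000 and
0x17f7affd000) fail the R_X86_64_32S test because they lie above the
low 2 GiB, which would be consistent with the crash kernel being placed
high in physical memory.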

Tglx, if you want to revert the above patches, I'm OK with that. It's
important that we eventually fix the issue my patches were meant to
address, but precisely *when* it's solved isn't critical; worst case,
our kernels can carry out-of-tree patches until the issue is completely
resolved.
--
Thanks,
~Nick Desaulniers