Re: [PATCH 0/3] Add restrictions for kexec/kdump jumping between 5-level and 4-level kernel

From: Kirill A. Shutemov
Date: Sun Sep 02 2018 - 16:45:19 EST


On Thu, Aug 30, 2018 at 10:57:51PM +0800, Baoquan He wrote:
> On 08/30/18 at 05:27pm, Kirill A. Shutemov wrote:
> > On Thu, Aug 30, 2018 at 02:12:02PM +0000, Baoquan He wrote:
> > > On 08/30/18 at 04:58pm, Kirill A. Shutemov wrote:
> > > > On Wed, Aug 29, 2018 at 10:16:21PM +0800, Baoquan He wrote:
> > > > > This was suggested by Kirill several months ago, I worked out several
> > > > > patches to fix, then interrupted by other issues. So sort them out
> > > > > now and post for reviewing.
> > > >
> > > > Thanks for doing this.
> > > >
> > > > > The current upstream kernel supports 5-level paging mode and supports
> > > > > dynamically choosing paging mode during bootup according to kernel
> > > > > image, hardware and kernel parameter setting. This flexibility brings
> > > > > several issues for kexec/kdump:
> > > > > 1)
> > > > > Switching between paging modes requires changes in the target kernel.
> > > > > It means you cannot kexec() 4-level paging kernel from 5-level paging
> > > > > kernel if 4-level paging kernel doesn't include changes.
> > > > >
> > > > > 2)
> > > > > Switching from 5-level paging to 4-level paging kernel would fail, if
> > > > > kexec() put kernel image above 64TiB of memory.
> > > >
> > > > I'm not entirely sure that 64TiB is the limit here. Technically, 4-level
> > > > paging allows to address 256TiB in 1-to-1 mapping. We just don't have
> > > > machines with that wide physical address space (which don't support
> > > > 5-level paging too).
> > >
> > > Hmm, afaik, the MAX_PHYSMEM_BITS limits the maximum address space
> > > which physical RAM can be mapped to. We have 256TB for the whole address
> > > space for 4-level paging, that includes user space and kernel space,
> > > it might not allow 256TB entirely for the direct mapping.
> > > And the direct mapping is only for physical RAM mapping, and
> > > kexec/kdump only cares about the physical RAM space and load them
> > > inside.
> > >
> > > # define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
> > >
> > > Not sure if my understanding is right, please correct me if I am wrong.
> >
> > IIRC, we only care about the place kexec puts the kernel before it gets
> > decompressed. After the decompression kernel will be put into the right
> > spot.
> >
> > Decompression is done in early boot where we use 1-to-1 mapping (not a
> > usual kernel virtual memory layout). All 256TiB should be reachable.
>
> My understanding is that although it's 1:1 identity mapping, it still
> has to be inside available physical RAM region. I don't remember what
> the old code did, now in __startup_64(),

I'm talking about the code that runs before __startup_64(), in
arch/x86/boot/compressed. Physical memory starts at virtual address 0 there,
without PAGE_OFFSET.
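
For reference, a minimal sketch of the arithmetic behind the two figures
in this thread. It assumes 48 address bits for 4-level paging (the basis
of the 256TiB figure) and the 46-bit MAX_PHYSMEM_BITS value quoted above
for the non-5-level case. The macro and helper names are made up for
illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, taken from the numbers discussed in this thread. */
#define VA_BITS_4LEVEL          48  /* address bits translated by 4-level paging */
#define MAX_PHYSMEM_BITS_4LEVEL 46  /* MAX_PHYSMEM_BITS when 5-level is off */
#define TIB_SHIFT               40  /* 1 TiB == 2^40 bytes */

static_assert(MAX_PHYSMEM_BITS_4LEVEL < VA_BITS_4LEVEL,
	      "the physmem limit is narrower than the paging limit");

/* Size, in TiB, of a region addressable with 'bits' address bits. */
static inline uint64_t span_tib(unsigned int bits)
{
	return (1ULL << bits) >> TIB_SHIFT;
}

/*
 * Hypothetical helper with the same shape as the quoted check:
 * true if 'addr' has no bits set at or above position 'bits'.
 */
static inline bool fits_in_bits(uint64_t addr, unsigned int bits)
{
	return (addr >> bits) == 0;
}
```

With these definitions, span_tib(46) is 64 and span_tib(48) is 256. An
address at 100TiB satisfies fits_in_bits() for 48 bits but not for 46,
which is the gap between the two limits being discussed.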

> you can see that there's a check like below, and at this time, it's
> still identity mapping.
>
> 	/* Is the address too large? */
> 	if (physaddr >> MAX_PHYSMEM_BITS)
> 		for (;;);
>
> Thanks
> Baoquan

--
Kirill A. Shutemov