Re: [PATCH 1/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end

From: Baoquan He
Date: Thu Sep 08 2022 - 09:34:33 EST


On 09/06/22 at 03:05pm, Ard Biesheuvel wrote:
> On Mon, 5 Sept 2022 at 14:08, Baoquan He <bhe@xxxxxxxxxx> wrote:
> >
> > On 09/05/22 at 01:28pm, Mike Rapoport wrote:
> > > On Thu, Sep 01, 2022 at 08:25:54PM +0800, Baoquan He wrote:
> > > > On 09/01/22 at 10:24am, Mike Rapoport wrote:
> > > >
> > > > max_zone_phys() only handles the cases where CONFIG_ZONE_DMA/DMA32 is
> > > > enabled; the disabled CONFIG_ZONE_DMA/DMA32 case is not covered. I can
> > > > change it like:
> > > >
> > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > {
> > > >         phys_addr_t low_mem_mask = U32_MAX;
> > > >         phys_addr_t phys_start = memblock_start_of_DRAM();
> > > >
> > > >         if ((!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) ||
> > > >              (phys_start > U32_MAX))
> > > >                 low_mem_mask = PHYS_ADDR_MAX;
> > > >
> > > >         return low_mem_mask + 1;
> > > > }
> > > >
> > > > or add the disabled CONFIG_ZONE_DMA/DMA32 case into crash_addr_low_max()
> > > > as you suggested. Which one do you like better?
> > > >
> > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > {
> > > >         if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> > > >                 return PHYS_ADDR_MAX + 1;
> > > >
> > > >         return max_zone_phys(32);
> > > > }
> > >
> > > I like the second variant better.
> >
> > Sure, will change to use the 2nd one. Thanks.
> >
>
> While I appreciate the effort that has gone into solving this problem,
> I don't think there is any consensus that an elaborate fix is required
> to ensure that the crash kernel can be unmapped from the linear map at
> all cost. In fact, I personally think we shouldn't bother, and IIRC,
> Will made a remark along the same lines back when the Huawei engineers
> were still driving this effort.
>
> So perhaps we could align on that before doing yet another version of this?

Yes, certainly. Aligning first will save everybody's effort if there are
different opinions. Thanks for looking into this and for the suggestion.

About Will's remark, I checked the earlier discussion threads and guess you
are referring to the words in link [1]. I have copied them at the bottom
for easier reference. Please correct me if I am wrong.

As I understand it, Will said so because the patch is too complex, and
there's a risk that the page tables the kernel itself is using could share
the same block/section mapping as the crashkernel region. Given these
two cons, I agree with Will that we would rather drop the protection
of the crashkernel region, which is done by mapping or unmapping the
region, even though the protection enhances kdump's robustness.

Crashkernel reservation needs to know the low memory end so that the DMA
buffers can be addressed by the dumping target, e.g. a storage disk. On
current arm64, we have these facts:
1) Currently, except for Raspberry Pi 4, all arm64 systems can support
32bit DMA addressing. So, except for RPi4, the low memory end can be
decided after memblock init is done, namely at the end of
arm64_memblock_init(). We don't need to defer the crashkernel
reservation until zone_sizes_init() is done. Those cases can be checked
in the patch code.
2) For RPi4, if its storage disk only supports 30bit DMA addressing, then
we can use crashkernel=xM@yM to specify a reservation location under 1G
to work around this.
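
To make fact 1) concrete, below is a minimal userspace sketch of how the
low memory end could be decided once the DRAM range is known. It is not
the patch code: struct lowmem_cfg and model_low_mem_end() are
hypothetical names, the fields only stand in for CONFIG_ZONE_DMA,
CONFIG_ZONE_DMA32, memblock_start_of_DRAM() and memblock_end_of_DRAM(),
and the "no restriction" case returns the end of DRAM instead of
PHYS_ADDR_MAX + 1 to keep the arithmetic free of overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
struct lowmem_cfg {
	bool zone_dma;        /* stands in for CONFIG_ZONE_DMA */
	bool zone_dma32;      /* stands in for CONFIG_ZONE_DMA32 */
	uint64_t dram_start;  /* stands in for memblock_start_of_DRAM() */
	uint64_t dram_end;    /* stands in for memblock_end_of_DRAM() */
};

/* Exclusive upper bound of the region usable for the low crashkernel. */
uint64_t model_low_mem_end(const struct lowmem_cfg *c)
{
	const uint64_t four_gib = UINT64_C(1) << 32;

	assert(c->dram_start < c->dram_end);

	/* Neither DMA zone is configured: no low memory restriction. */
	if (!c->zone_dma && !c->zone_dma32)
		return c->dram_end;

	/* All of DRAM sits above 4G: a 32bit limit cannot be met anyway. */
	if (c->dram_start >= four_gib)
		return c->dram_end;

	/* Otherwise cap at 4G, or at the end of DRAM if that is lower. */
	return c->dram_end < four_gib ? c->dram_end : four_gib;
}
```

Everything this model needs is available right after memblock init,
which is the point of fact 1).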

***
Based on the above facts, with my patch applied:
pros:
1) The performance issue is resolved;
2) As you can see, the code with this patch applied will be much
simpler, more straightforward and clearer;
3) The protection can be kept;
4) Crashkernel reservation can succeed more easily on small memory
systems, e.g. virt guest systems. The earlier the reservation is done,
the more likely it is to get the whole chunk of memory.
cons:
1) Only RPi4 is inconvenienced for crashkernel reservation. It
needs to use crashkernel=xM@yM to work around this.
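
For readers not familiar with the crashkernel=xM@yM form, here is a
rough userspace sketch of what such a value expresses and of the "under
1G" check the RPi4 workaround relies on. This is only an illustration
under my own assumptions, not the kernel's parser: model_memparse(),
model_parse_crashkernel() and model_fits_below() are hypothetical
helpers, and only the K/M/G suffixes are handled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<number>[KMG]" and report where it ended. */
uint64_t model_memparse(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);
	unsigned int shift = 0;

	switch (*e) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	if (shift) {
		v <<= shift;
		e++;            /* consume the suffix */
	}
	*end = e;
	return v;
}

/* Parse "size@base". Returns 0 on success, -1 on malformed input. */
int model_parse_crashkernel(const char *arg, uint64_t *size, uint64_t *base)
{
	const char *p;

	*size = model_memparse(arg, &p);
	if (p == arg || *size == 0 || *p != '@')
		return -1;

	arg = p + 1;
	*base = model_memparse(arg, &p);
	if (p == arg || *p != '\0')
		return -1;

	return 0;
}

/* True if [base, base + size) lies entirely below limit. */
bool model_fits_below(uint64_t base, uint64_t size, uint64_t limit)
{
	return size <= limit && base <= limit - size;
}
```

With this model, a value like 256M@512M stays under 1G while 256M@896M
does not, so only the former would suit a device limited to 30bit DMA
addressing.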

***
Drop the protection, which is done by mapping or unmapping the
crashkernel region, as you and Will suggested:
pros:
1) The performance issue is resolved;
2) RPi4 will have the same convenience when setting crashkernel;

cons:
1) No protection is applied to the crashkernel region;
2) The code logic is convoluted. There are two separate places that
reserve crashkernel: one at the end of arm64_memblock_init(), the other
at the end of bootmem_init().
3) Except for the case where both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
are disabled, crashkernel reservation is deferred. On small memory
systems, e.g. virt guest systems, this increases the risk that the
reservation fails, most likely because of memory fragmentation.

Besides, comparing the above two solutions, I also want to say that kdump
is developed for enterprise-level systems. We need to take reality into
account when looking for a reasonable solution. E.g. x86_64 always has a
DMA zone of 16M and a DMA32 zone from 16M to 4G in the normal kernel. For
kdump, we simply ignore the DMA zone because it's for ISA-style devices.
Kdump has never supported ISA-style devices with only 24bit DMA
addressing capability, because it doesn't make sense: we have never
heard of an enterprise-level x86_64 system that needs such a device to
work with kdump.

Hi Ard, Will, Catalin and other reviewers,

Above is my understanding of and thinking about the issue we have run
into; please help check it and point out what's missing or incorrect.

Hi Nicolas,

If it's convenient for you, please help clarify whether the storage disk
or network card on RPi4 can only address 32bit DMA buffers. I would
really appreciate that.

***
[1] Will's remark on Huawei's patch
https://lore.kernel.org/all/20220718131005.GA12406@willie-the-truck/T/#u

====quote Will's remark here
I do not think that this complexity is justified. As I have stated on
numerous occasions already, I would prefer that we leave the crashkernel
mapped when rodata is not "full". That fixes your performance issue and
matches what we do for module code, so I do not see a security argument
against it.

I do not plan to merge this patch as-is.
===