Re: [PATCH v7 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel

From: Prabhakar Kushwaha
Date: Mon Mar 09 2020 - 01:00:13 EST


Hi John,

On Sun, Mar 8, 2020 at 12:13 AM John Donnelly
<john.p.donnelly@xxxxxxxxxx> wrote:
>
>
>
> > On Mar 7, 2020, at 5:06 AM, Chen Zhou <chenzhou10@xxxxxxxxxx> wrote:
> >
> >
> >
> > On 2020/3/5 18:13, Prabhakar Kushwaha wrote:
> >> On Mon, Dec 23, 2019 at 8:57 PM Chen Zhou <chenzhou10@xxxxxxxxxx> wrote:
> >>>
> >>> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >>> 4G. If crashkernel=X,low is specified simultaneously, reserve specified
> >>> size low memory for crash kdump kernel devices firstly and then reserve
> >>> memory above 4G.
> >>>
> >>> Signed-off-by: Chen Zhou <chenzhou10@xxxxxxxxxx>
> >>> ---
> >>> arch/arm64/kernel/setup.c | 8 +++++++-
> >>> arch/arm64/mm/init.c | 31 +++++++++++++++++++++++++++++--
> >>> 2 files changed, 36 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >>> index 56f6645..04d1c87 100644
> >>> --- a/arch/arm64/kernel/setup.c
> >>> +++ b/arch/arm64/kernel/setup.c
> >>> @@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
> >>> kernel_data.end <= res->end)
> >>> request_resource(res, &kernel_data);
> >>> #ifdef CONFIG_KEXEC_CORE
> >>> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
> >>> + /*
> >>> + * Userspace will find "Crash kernel" region in /proc/iomem.
> >>> + * Note: the low region is renamed as Crash kernel (low).
> >>> + */
> >>> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> >>> + crashk_low_res.end <= res->end)
> >>> + request_resource(res, &crashk_low_res);
> >>> if (crashk_res.end && crashk_res.start >= res->start &&
> >>> crashk_res.end <= res->end)
> >>> request_resource(res, &crashk_res);
> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>> index b65dffd..0d7afd5 100644
> >>> --- a/arch/arm64/mm/init.c
> >>> +++ b/arch/arm64/mm/init.c
> >>> @@ -80,6 +80,7 @@ static void __init reserve_crashkernel(void)
> >>> {
> >>> unsigned long long crash_base, crash_size;
> >>> int ret;
> >>> + phys_addr_t crash_max = arm64_dma32_phys_limit;
> >>>
> >>> ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >>> &crash_size, &crash_base);
> >>> @@ -87,12 +88,38 @@ static void __init reserve_crashkernel(void)
> >>> if (ret || !crash_size)
> >>> return;
> >>>
> >>> + ret = reserve_crashkernel_low();
> >>> + if (!ret && crashk_low_res.end) {
> >>> + /*
> >>> + * If crashkernel=X,low specified, there may be two regions,
> >>> + * we need to make some changes as follows:
> >>> + *
> >>> + * 1. rename the low region as "Crash kernel (low)"
> >>> + * In order to distinct from the high region and make no effect
> >>> + * to the use of existing kexec-tools, rename the low region as
> >>> + * "Crash kernel (low)".
> >>> + *
> >>> + * 2. change the upper bound for crash memory
> >>> + * Set MEMBLOCK_ALLOC_ACCESSIBLE upper bound for crash memory.
> >>> + *
> >>> + * 3. mark the low region as "nomap"
> >>> + * The low region is intended to be used for crash dump kernel
> >>> + * devices, just mark the low region as "nomap" simply.
> >>> + */
> >>> + const char *rename = "Crash kernel (low)";
> >>> +
> >>> + crashk_low_res.name = rename;
> >>> + crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
> >>> + memblock_mark_nomap(crashk_low_res.start,
> >>> + resource_size(&crashk_low_res));
> >>> + }
> >>> +
> >>> crash_size = PAGE_ALIGN(crash_size);
> >>>
> >>> if (crash_base == 0) {
> >>> /* Current arm64 boot protocol requires 2MB alignment */
> >>> - crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> >>> - crash_size, SZ_2M);
> >>> + crash_base = memblock_find_in_range(0, crash_max, crash_size,
> >>> + SZ_2M);
> >>> if (crash_base == 0) {
> >>> pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> >>> crash_size);
> >>> --
> >>
> >> I tested this patch series on ARM64-ThunderX2 with no issue with
> >> bootargs crashkernel=X@Y crashkernel=250M,low
> >>
> >> $ dmesg | grep crash
> >> [ 0.000000] crashkernel reserved: 0x0000000b81200000 -
> >> 0x0000000c81200000 (4096 MB)
> >> [ 0.000000] Kernel command line:
> >> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> >> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro
> >> crashkernel=4G@0xb81200000 crashkernel=250M,low nowatchdog earlycon
> >> [ 29.310209] crashkernel=250M,low
> >>
> >> $ kexec -p -i /boot/vmlinuz-`uname -r`
> >> --initrd=/boot/initrd.img-`uname -r` --reuse-cmdline
> >> $ echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
> >>
> >> But when I tried crashkernel=4G crashkernel=250M,low as bootargs,
> >> the kernel was not able to allocate memory.
> >> [ 0.000000] cannot allocate crashkernel (size:0x100000000)
> >> [ 0.000000] Kernel command line:
> >> BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc4+
> >> root=UUID=866b8df3-14f4-4e11-95a1-74a90ee9b694 ro crashkernel=4G
> >> crashkernel=250M,low nowatchdog
> >> [ 29.332081] crashkernel=250M,low
> >>
> >> Is crashkernel=X@Y mandatory to get memory allocated beyond 4G?
> >> Am I missing something?
> >
>
> crashkernel=4G
>
> You need to look at the memory map on node 0 from dmesg (or /proc/iomem) to determine if there is any memory in that range; 0x100000000 is the first byte above 4G.
>

I believe I have enough free memory. Please find the logs below:

$ dmesg | grep "node 0"
[ 0.000000] Initmem setup node 0 [mem 0x00000000802f0000-0x0000009ffcffffff]
[ 0.000000] On node 0 totalpages: 33537296
[ 12.335714] pci_bus 0000:00: on NUMA node 0
$
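
As a rough cross-check of the figures in that log, the node 0 span and
the totalpages count can be turned into sizes. This is only a sketch:
the 4 KiB page size is an assumption (the log does not show it), and
the variable names are made up for illustration.

```shell
#!/bin/sh
# Values copied from the "node 0" dmesg lines above.
node_start=0x00000000802f0000
node_end=0x0000009ffcffffff
totalpages=33537296

# ASSUMPTION: 4 KiB pages; the page size is not shown in the log.
page_size=4096

# Size of the address span covered by node 0 (holes included), in MiB.
span_mib=$(( ($node_end - $node_start + 1) >> 20 ))

# Amount of memory the page count represents, in MiB.
ram_mib=$(( ($totalpages * $page_size) >> 20 ))

echo "node 0 span: ${span_mib} MiB"
echo "node 0 RAM:  ${ram_mib} MiB"
```

The span is larger than the page count implies because the span
includes holes, so the span alone does not show how much contiguous
memory is free.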

In the working scenario I am passing 4G@0xb81200000; the base address
0xb81200000 is well within the node 0 range.
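
To make that claim checkable, the sketch below expands 4G to bytes and
tests whether the whole reservation fits inside the node 0 range from
the log. The helper names (to_bytes, fits_in) are hypothetical, and a
fit inside the node span does not by itself prove the range is free of
holes or earlier reservations.

```shell
#!/bin/sh
# to_bytes: hypothetical helper that expands a size such as 4G or 250M
# into bytes (only the K, M and G suffixes are handled here).
to_bytes() {
    num=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $(( $num << 10 )) ;;
        *[Mm]) echo $(( $num << 20 )) ;;
        *[Gg]) echo $(( $num << 30 )) ;;
        *)     echo "$1" ;;
    esac
}

# fits_in BASE SIZE START END: succeed if [BASE, BASE+SIZE) lies
# entirely inside the inclusive range [START, END].
fits_in() {
    [ $(( $1 >= $3 && $1 + $2 - 1 <= $4 )) -eq 1 ]
}

base=0xb81200000
size=$(to_bytes 4G)
limit=$(( $base + $size ))          # exclusive end, as dmesg prints it

printf 'requested: 0x%x - 0x%x\n' "$base" "$limit"
if fits_in "$base" "$size" 0x802f0000 0x9ffcffffff; then
    echo "inside node 0 range"
else
    echo "outside node 0 range"
fi
```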

The /proc/iomem output is below:

$ cat /proc/iomem
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI ECAM
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:0f
00000000-00000000 : PCI Bus 0000:10
00000000-00000000 : 0000:10:00.0
00000000-00000000 : 0000:10:00.0
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.1
00000000-00000000 : PCI Bus 0000:05
00000000-00000000 : 0000:05:00.0
00000000-00000000 : 0000:05:00.1
00000000-00000000 : PCI Bus 0000:09
00000000-00000000 : 0000:09:00.0
00000000-00000000 : 0000:09:00.1
00000000-00000000 : 0000:00:10.0
00000000-00000000 : ahci
00000000-00000000 : 0000:00:10.1
00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80
00000000-00000000 : PCI Bus 0000:83
00000000-00000000 : 0000:83:00.0
00000000-00000000 : 0000:83:00.0
00000000-00000000 : nvme
00000000-00000000 : PCI Bus 0000:89
00000000-00000000 : 0000:89:00.0
00000000-00000000 : e1000e
00000000-00000000 : 0000:89:00.0
00000000-00000000 : 0000:89:00.0
00000000-00000000 : e1000e
00000000-00000000 : 0000:89:00.0
00000000-00000000 : e1000e
00000000-00000000 : PCI Bus 0000:8d
00000000-00000000 : 0000:8d:00.0
00000000-00000000 : 0000:8d:00.0
00000000-00000000 : mpt3sas
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : Kernel code
00000000-00000000 : reserved
00000000-00000000 : Kernel data
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901D:00
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901E:00
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV901F:00
00000000-00000000 : CAV901C:00
00000000-00000000 : CAV9006:00
00000000-00000000 : CAV9006:00
00000000-00000000 : ARMH0011:00
00000000-00000000 : ARMH0011:00
00000000-00000000 : arm-smmu-v3.0.auto
00000000-00000000 : arm-smmu-v3.0.auto
00000000-00000000 : arm-smmu-v3.1.auto
00000000-00000000 : arm-smmu-v3.1.auto
00000000-00000000 : arm-smmu-v3.2.auto
00000000-00000000 : arm-smmu-v3.2.auto
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901D:01
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901E:01
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV901F:01
00000000-00000000 : CAV901C:01
00000000-00000000 : CAV9007:06
00000000-00000000 : CAV9007:06
00000000-00000000 : arm-smmu-v3.3.auto
00000000-00000000 : arm-smmu-v3.3.auto
00000000-00000000 : arm-smmu-v3.4.auto
00000000-00000000 : arm-smmu-v3.4.auto
00000000-00000000 : arm-smmu-v3.5.auto
00000000-00000000 : arm-smmu-v3.5.auto
00000000-00000000 : System RAM
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.1
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.1
00000000-00000000 : 0000:01:00.0
00000000-00000000 : 0000:01:00.1
00000000-00000000 : PCI Bus 0000:05
00000000-00000000 : 0000:05:00.0
00000000-00000000 : bnx2x
00000000-00000000 : 0000:05:00.1
00000000-00000000 : bnx2x
00000000-00000000 : 0000:05:00.0
00000000-00000000 : bnx2x
00000000-00000000 : 0000:05:00.0
00000000-00000000 : bnx2x
00000000-00000000 : 0000:05:00.1
00000000-00000000 : bnx2x
00000000-00000000 : 0000:05:00.1
00000000-00000000 : bnx2x
00000000-00000000 : PCI Bus 0000:09
00000000-00000000 : 0000:09:00.0
00000000-00000000 : i40e
00000000-00000000 : 0000:09:00.1
00000000-00000000 : i40e
00000000-00000000 : 0000:09:00.0
00000000-00000000 : 0000:09:00.1
00000000-00000000 : 0000:09:00.0
00000000-00000000 : i40e
00000000-00000000 : 0000:09:00.1
00000000-00000000 : i40e
00000000-00000000 : 0000:09:00.0
00000000-00000000 : 0000:09:00.1
00000000-00000000 : 0000:00:0f.0
00000000-00000000 : xhci-hcd
00000000-00000000 : 0000:00:0f.0
00000000-00000000 : 0000:00:0f.1
00000000-00000000 : xhci-hcd
00000000-00000000 : 0000:00:0f.1
00000000-00000000 : 0000:00:10.0
00000000-00000000 : ahci
00000000-00000000 : 0000:00:10.1
00000000-00000000 : ahci
00000000-00000000 : PCI Bus 0000:80
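
Every range in the dump above reads 00000000-00000000, which is
typically what /proc/iomem shows when it is read by an unprivileged
user, so the dump cannot show where the RAM sits. A minimal sketch for
filtering a copy read as root follows; ram_above_4g is a hypothetical
helper name, not an existing tool.

```shell
#!/bin/sh
# ram_above_4g: hypothetical helper. Reads /proc/iomem-style lines on
# stdin and prints each "System RAM" range that ends at or above the
# 4G boundary (0x100000000).
ram_above_4g() {
    while IFS= read -r line; do
        case $line in
            *": System RAM") ;;      # keep only System RAM entries
            *) continue ;;
        esac
        range=${line%% :*}           # "start-end", maybe indented
        range=${range##* }           # drop any leading indentation
        start=0x${range%-*}
        end=0x${range#*-}
        if [ $(( $end >= 0x100000000 )) -eq 1 ]; then
            echo "$range"
        fi
    done
}

# Usage (as root, so that real addresses are shown):
#   ram_above_4g < /proc/iomem
```

Ranges printed by this filter are only candidates; nested "reserved"
entries inside them still reduce what can actually be allocated.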

--pk