Re: s390x: kdump kernel can not boot if I load kernel and initrd images via the kexec_file_load syscall.

From: lijiang
Date: Tue May 12 2020 - 06:04:12 EST


There may be more people who would like to care about this issue. So I added
them to cc list.

Thanks.
Lianbo

å 2020å05æ12æ 17:47, lijiang åé:
> Also added Dave Young to the cc list. Thanks.
>
> å 2020å05æ12æ 10:52, lijiang åé:
>> å 2020å05æ11æ 23:01, Philipp Rudo åé:
>>> Hi Lianbo,
>>>
>> Thank you for this reply, Philipp.
>>
>>> one more question. Does the same problem occur withe the kexec_load syscall,
>>> i.e. option '-c' instead of '-s'?
>>>
>> No, kdump kernel can boot with the kexec_load syscal option '-c'.
>>
>> Currently, I only found kdump kernel can not boot with the kexec_file_load syscall(option '-s').
>>
>>> Thanks
>>> Philipp
>>>
>>> On Mon, 11 May 2020 11:15:58 +0200
>>> Philipp Rudo <prudo@xxxxxxxxxxxxx> wrote:
>>>
>>>> Hi Lianbo,
>>>>
>>>> I believe that your crashkernel memory is simply too small. Pretty much at the
>>>> beginning of the kernel log you have
>>>>
>>>>> [ 0.070468] setup: The initial RAM disk does not fit into the memory
>>>>
>>>> Although I must say 256M should be enough for most purposes...
>>>>
>>>> Could you please retry with a bigger crashkernel memory?
>>>>
>>
>> I increased the size of crash memory to 512M(crashkernel=512M), kdump kernel still can
>> not boot, there is a same issue.
>>
>> I added some debug information in the arch/s390/kernel/setup.c, and got the following logs:
>>
>> [ 0.070885] Linux version 5.7.0-rc5+ (root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
>> (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC), GNU ld version 2.30-73.el8
>> ) #3 SMP Mon May 11 10:28:57 EDT 2020
>> [ 0.070888] setup: Linux is running as a z/VM guest operating system in 64-bi
>> t mode
>> [ 0.071125] lijiang-debug initrd_start:4aeeb000 size:17180900 <----------------------
>> [ 0.071128] setup: The maximum memory size is 2048MB
>> [ 0.071130] cma: Reserved 4 MiB at 0x000000001fc00000
>> [ 0.071131] setup: The initial RAM disk does not fit into the memory
>> [ 0.071132] lijiang-debug: check_initrd 810 start:4aeeb000, size:17180900 <----------------------
>> [ 0.099765] cpu: 2 configured CPUs, 0 standby CPUs
>>
>> The size of initrd is 17M, the 512M memory should be enough. I could suspect that kdump
>> kernel doesn't find an appropriate memory block, thereby this causes the failure.
>>
>> The compressed initrd is really decompressed in the unpack_to_rootfs().
>>
>> I have a s390 machine with 2cpus and 2G memory, which is too slow. :-)
>>
>>
>> Thanks.
>> Lianbo
>>
>>
>>>> Thanks
>>>> Philipp
>>>>
>>>>
>>>> On Fri, 8 May 2020 18:45:56 +0800
>>>> lijiang <lijiang@xxxxxxxxxx> wrote:
>>>>
>>>>> Hi, Philipp Rudo
>>>>>
>>>>> Sorry to disturb you. I ran into a problem on s390 machine, can you help to have a look?
>>>>>
>>>>> Kdump kernel can not boot on s390x machines if I load the kernel and initrd images with the kexec_file_load() syscall as below:
>>>>>
>>>>> #kexec -s -p /boot//boot/vmlinuz-5.7.0-rc4+ --initrd=/boot/initramfs-5.7.0-rc4+kdump.img --command-line="rd.dasd=0.0.0120 rd.dasd=0.0.0121 rd.dasd=0.0.0122 rd.dasd=0.0.0123 rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portname=z-126,portno=0 $tuned_params BOOT_IMAGE=0 nr_cpus=1 cgroup_disable=memory numa=off udev.children-max=2 panic=10 rootflags=nofail transparent_hugepage=never novmcoredd nokaslr"
>>>>>
>>>>> But the kexec reboot can work well if I use the kexec_file_load() syscall as follow:
>>>>>
>>>>> #kexec -s -l /boot//boot/vmlinuz-5.7.0-rc4+ --initrd=/boot/initramfs-5.7.0-rc4+kdump.img --command-line="root=/dev/mapper/rhel_ibm--z--126-root crashkernel=256M rd.dasd=0.0.0120 rd.dasd=0.0.0121 rd.dasd=0.0.0122 rd.dasd=0.0.0123 rd.lvm.lv=rhel_ibm-z-126/root rd.lvm.lv=rhel_ibm-z-126/swap rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portname=z-126,portno=0 $tuned_params BOOT_IMAGE=0"
>>>>>
>>>>> I added the debug information in the populate_rootfs() (init/initramfs.c), and I found that the address of initrd_start is null, and also
>>>>> checked the process of kexec file load, I didn't see any errors. It's strange. Any suggestions will be appreciated.
>>>>>
>>>>> BTW: I put the kernel log at the end.
>>>>>
>>>>> Thanks.
>>>>> Lianbo
>>>>>
>>>>>
>>>>> kdump kernel log:
>>>>>
>>>>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from
>>>>> CPU 01.
>>>>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from
>>>>> CPU 00.
>>>>> [ 0.070339] Linux version 5.7.0-rc4+ (root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
>>>>> (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC), GNU ld version 2.30-73.el8
>>>>> ) #2 SMP Thu May 7 22:29:25 EDT 2020
>>>>> [ 0.070344] setup: Linux is running as a z/VM guest operating system in 64-bi
>>>>> t mode
>>>>> [ 0.070464] setup: The maximum memory size is 2048MB
>>>>> [ 0.070468] cma: Reserved 4 MiB at 0x000000000fc00000
>>>>> [ 0.070468] setup: The initial RAM disk does not fit into the memory
>>>>> [ 0.112609] cpu: 2 configured CPUs, 0 standby CPUs
>>>>> [ 0.112731] Write protected kernel read-only data: 10116k
>>>>>
>>>>> [ 0.112747] Zone ranges:
>>>>> [ 0.112748] DMA [mem 0x0000000000000000-0x000000007fffffff]
>>>>> [ 0.112750] Normal empty
>>>>> [ 0.112751] Movable zone start for each node
>>>>> [ 0.112752] Early memory node ranges
>>>>> [ 0.112753] node 0: [mem 0x0000000000000000-0x000000000fffffff]
>>>>> [ 0.112772] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
>>>>> [ 0.115953] percpu: Embedded 33 pages/cpu s96256 r8192 d30720 u135168
>>>>> [ 0.115976] Built 1 zonelists, mobility grouping on. Total pages: 64512
>>>>> [ 0.115977] Policy zone: DMA
>>>>> [ 0.115979] Kernel command line: rd.dasd=0.0.0120 rd.dasd=0.0.0121 rd.dasd=0.
>>>>> 0.0122 rd.dasd=0.0.0123 rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portnam
>>>>> e=z-126,portno=0 $tuned_params BOOT_IMAGE=0 nr_cpus=1 cgroup_disable=memory numa
>>>>> =off udev.children-max=2 panic=10 rootflags=nofail transparent_hugepage=never no
>>>>> vmcoredd nokaslr
>>>>> [ 0.117247] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, l
>>>>> inear)
>>>>> [ 0.117271] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, li
>>>>> near)
>>>>> [ 0.117297] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>>> [ 0.121169] Memory: 237484K/262144K available (7652K kernel code, 1384K rwdat
>>>>> a, 2464K rodata, 3324K init, 816K bss, 20564K reserved, 4096K cma-reserved)
>>>>> [ 0.121220] random: get_random_u64 called from cache_random_seq_create+0x6a/0
>>>>> x160 with crng_init=0
>>>>> [ 0.121310] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
>>>>> [ 0.121321] ftrace: allocating 25822 entries in 101 pages
>>>>> [ 0.137295] ftrace: allocated 101 pages with 4 groups
>>>>> [ 0.137389] rcu: Hierarchical RCU implementation.
>>>>> [ 0.137390] rcu: RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=1.
>>>>> [ 0.137392] rcu: RCU calculated value of scheduler-enlistment delay is 10 jif
>>>>> fies.
>>>>> [ 0.137393] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
>>>>> [ 0.140929] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
>>>>> [ 0.140977] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be8
>>>>> 03b0a9, max_idle_ns: 1805497147909793 ns
>>>>> [ 0.141001] Console: colour dummy device 80x25
>>>>> [ 0.142713] random: fast init done
>>>>> [ 0.144414] printk: console [ttyS0] enabled
>>>>> [ 0.144563] pid_max: default: 32768 minimum: 301
>>>>> [ 0.144598] LSM: Security Framework initializing
>>>>> [ 0.144614] Yama: becoming mindful.
>>>>> [ 0.144621] SELinux: Initializing.
>>>>> [ 0.144655] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear
>>>>> )
>>>>> [ 0.144657] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, l
>>>>> inear)
>>>>> [ 0.144877] Disabling memory control group subsystem
>>>>> [ 0.145087] rcu: Hierarchical SRCU implementation.
>>>>> [ 0.145249] smp: Bringing up secondary CPUs ...
>>>>> [ 0.145251] smp: Brought up 1 node, 1 CPU
>>>>> [ 0.145365] devtmpfs: initialized
>>>>> [ 0.145529] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, ma
>>>>> x_idle_ns: 19112604462750000 ns
>>>>> [ 0.145532] futex hash table entries: 256 (order: 4, 65536 bytes, linear)
>>>>> [ 0.145776] NET: Registered protocol family 16
>>>>> [ 0.145835] audit: initializing netlink subsys (disabled)
>>>>> [ 0.145934] Spectre V2 mitigation: execute trampolines
>>>>> [ 0.146759] audit: type=2000 audit(1588911734.995:1): state=initialized audit
>>>>> _enabled=0 res=1
>>>>> [ 0.146830] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
>>>>> [ 0.190956] cryptd: max_cpu_qlen set to 1000
>>>>> [ 0.194413] iommu: Default domain type: Translated
>>>>> [ 0.194510] SCSI subsystem initialized
>>>>> [ 0.194516] pps_core: LinuxPPS API ver. 1 registered
>>>>> [ 0.194518] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giome
>>>>> tti <giometti@xxxxxxxx>
>>>>> [ 0.194520] PTP clock support registered
>>>>> [ 0.199801] NetLabel: Initializing
>>>>> [ 0.199803] NetLabel: domain hash size = 128
>>>>> [ 0.199804] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
>>>>> [ 0.199818] NetLabel: unlabeled traffic allowed by default
>>>>> [ 0.219300] VFS: Disk quotas dquot_6.6.0
>>>>> [ 0.219322] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
>>>>> [ 0.219350] os_info: entry 0: not available (addr=0x0 size=0)
>>>>> [ 0.219352] os_info: entry 1: copied (addr=0x67a37000 size=200)
>>>>> [ 0.219353] os_info: crashkernel: addr=0x6fc00000 size=268435456
>>>>> [ 0.220405] NET: Registered protocol family 2
>>>>> [ 0.220552] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096
>>>>> bytes, linear)
>>>>> [ 0.220557] TCP established hash table entries: 2048 (order: 2, 16384 bytes,
>>>>> linear)
>>>>> [ 0.220570] TCP bind hash table entries: 2048 (order: 3, 32768 bytes, linear
>>>>> [ 0.220587] TCP: Hash tables configured (established 2048 bind 2048)
>>>>> [ 0.220607] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
>>>>> [ 0.220614] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
>>>>> [ 0.220649] NET: Registered protocol family 1
>>>>> [ 0.220657] NET: Registered protocol family 44
>>>>> [ 0.220695] jlb-debug: populate_rootfs 661 initrd:0
>>>>> [ 0.220697] jlb-debug: populate_rootfs 663
>>>>> [ 0.221396] alg: No test for crc32be (crc32be-vx)
>>>>> [ 0.221802] Initialise system trusted keyrings
>>>>> [ 0.221809] Key type blacklist registered
>>>>> [ 0.221826] workingset: timestamp_bits=45 max_order=16 bucket_order=0
>>>>> [ 0.223690] integrity: Platform Keyring initialized
>>>>> [ 0.227865] NET: Registered protocol family 38
>>>>> [ 0.227869] Key type asymmetric registered
>>>>> [ 0.227870] Asymmetric key parser 'x509' registered
>>>>> [ 0.227877] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 2
>>>>> 49)
>>>>> [ 0.227894] io scheduler mq-deadline registered
>>>>> [ 0.227896] io scheduler kyber registered
>>>>> [ 0.227924] io scheduler bfq registered
>>>>> [ 0.227972] atomic64_test: passed
>>>>> [ 0.228308] rdac: device handler registered
>>>>> [ 0.228326] hp_sw: device handler registered
>>>>> [ 0.228327] emc: device handler registered
>>>>> [ 0.228343] alua: device handler registered
>>>>> [ 0.228385] cio: Channel measurement facility initialized using format extend
>>>>> ed (mode autodetected)
>>>>> [ 0.228596] drop_monitor: Initializing network drop monitor service
>>>>> [ 0.228659] Initializing XFRM netlink socket
>>>>> [ 0.228760] NET: Registered protocol family 10
>>>>> [ 0.229046] Segment Routing with IPv6
>>>>> [ 0.229063] NET: Registered protocol family 17
>>>>> [ 0.229085] mpls_gso: MPLS GSO support
>>>>> [ 0.229136] registered taskstats version 1
>>>>> [ 0.229145] Loading compiled-in X.509 certificates
>>>>> [ 0.272961] Loaded X.509 cert 'Build time autogenerated kernel key: 6de832de3
>>>>> 5ed366a6c3c2d0e99b0d84ae243cb28'
>>>>> [ 0.273793] Key type big_key registered
>>>>> [ 0.273802] ima: No TPM chip found, activating TPM-bypass!
>>>>> [ 0.273805] ima: Allocated hash algorithm: sha1
>>>>> [ 0.273813] ima: No architecture policies found
>>>>> [ 0.273933] md: Waiting for all devices to be available before autodetect
>>>>> [ 0.273934] md: If you don't use raid, use raid=noautodetect
>>>>> [ 0.274074] md: Autodetecting RAID arrays.
>>>>> [ 0.274075] md: autorun ...
>>>>> [ 0.274076] md: ... autorun DONE.
>>>>> [ 0.274092] List of all partitions:
>>>>> [ 0.274093] No filesystem could mount root, tried:
>>>>> [ 0.274094]
>>>>> [ 0.274096] Kernel panic - not syncing: VFS: Unable to mount root fs on unkno
>>>>> wn-block(1,0)
>>>>> [ 0.274098] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #2
>>>>> [ 0.274099] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
>>>>> [ 0.274100] Call Trace:
>>>>> [ 0.274109] [<0000000000114302>] show_stack+0x8a/0xd0
>>>>> [ 0.274113] [<000000000057a1d2>] dump_stack+0x8a/0xb8
>>>>> [ 0.274116] [<0000000000147828>] panic+0x110/0x308
>>>>> [ 0.274121] [<0000000000c3d616>] mount_block_root+0x35e/0x360
>>>>> [ 0.274122] [<0000000000c3d824>] prepare_namespace+0x174/0x1b0
>>>>> [ 0.274124] [<0000000000c3d054>] kernel_init_freeable+0x2bc/0x2d0
>>>>> [ 0.274130] [<000000000086b5ea>] kernel_init+0x22/0x150
>>>>> [ 0.274133] [<00000000008759b0>] ret_from_fork+0x2c/0x30
>>>>> 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0010F444
>>>>> 00:
>>>>>
>>>