Re: [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue

From: Dongli Zhang
Date: Tue Oct 12 2021 - 11:50:56 EST


Hi Juergen,

On 10/12/21 1:47 AM, Juergen Gross wrote:
> On 12.10.21 09:24, Dongli Zhang wrote:
>> When the kdump/kexec is enabled at HVM VM side, to panic kernel will trap
>> to xen side with reason=soft_reset. As a result, the xen will reboot the VM
>> with the kdump kernel.
>>
>> Unfortunately, when the VM is panic with below command line ...
>>
>> "taskset -c 33 echo c > /proc/sysrq-trigger"
>>
>> ... the kdump kernel is panic at early stage ...
>>
>> PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
>> [    0.000000] Hardware name: Xen HVM domU
>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>> ... ...
>> [    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX:
>> 0000000000000000
>> [    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
>> [    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
>> [    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
>> [    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000)
>> knlGS:0000000000000000
>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    0.000000] Call Trace:
>> [    0.000000]  ? xen_init_time_common+0x11/0x55
>> [    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
>> [    0.000000]  ? xen_hvm_guest_init+0x214/0x251
>> [    0.000000]  ? 0xffffffffa8c00000
>> [    0.000000]  ? setup_arch+0x440/0xbd6
>> [    0.000000]  ? start_kernel+0x6a/0x689
>> [    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb
>>
>> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
>> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
>> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>>
>>
>> The 1st patch is to fix the issue at VM kernel side. However, we may
>> observe clock drift at VM side due to the issue at xen hypervisor side.
>> This is because the pv vcpu_time_info is not updated when
>> VCPUOP_register_vcpu_info.
>>
>> The 2nd patch is to force_update_vcpu_system_time() at xen side when
>> VCPUOP_register_vcpu_info, to avoid the VM clock drift during kdump kernel
>> boot.
>
> Please don't mix patches for multiple projects in one series.
>
> In cases like this it is fine to mention the other project's patch
> verbally instead.
>

I will split the patchset in v2 and email to different projects.

The core ideas of this combined patchset are:

1. Fix at HVM domU side (kdump kernel panic)

2. Fix at Xen hypervisor side (clock drift issue in kdump kernel)

3. To report (or seek for help) that soft_reset does not work with mainline-xen
so that I am not able to test my patchset with the most recent mainline xen.

Thank you very much!

Dongli Zhang