Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
From: Dongli Zhang
Date: Thu Nov 04 2021 - 14:00:21 EST
Hi Boris,
On 11/1/21 10:34 AM, Boris Ostrovsky wrote:
>
> On 10/27/21 9:25 PM, Dongli Zhang wrote:
>> The sched_clock() can be used very early since
>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>> with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from
>> 0"), kdump kernel in Xen HVM guest may panic at very early stage when
>> accessing &__this_cpu_read(xen_vcpu)->time as in below:
>>
>> setup_arch()
>> -> init_hypervisor_platform()
>> -> x86_init.hyper.init_platform = xen_hvm_guest_init()
>> -> xen_hvm_init_time_ops()
>> -> xen_clocksource_read()
>> -> src = &__this_cpu_read(xen_vcpu)->time;
>>
>> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
>> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
>> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>>
>> However, when Xen HVM guest panic on vcpu >= 32, since
>> xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
>> vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
>>
>> This patch delays xen_hvm_init_time_ops() to later in
>> xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
>> registered when the boot vcpu is >= 32.
>>
>> Another option is to always delay xen_hvm_init_time_ops() for any vcpus
>> (including vcpu=0). Since to delay xen_hvm_init_time_ops() may lead to
>> clock backward issue,
>
>
> This is referring to
> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg01516.html I
> assume?
No.
So far there are clock forward (e.g., from 0 to 65345) issue and clock backward
issue (e.g., from 2.432 to 0).
The clock forward issue can be resolved by above link to enforce clock update
during vcpu info registration. However, so far I am only able to reproduce clock
forward when "taskset -c 33 echo c > /proc/sysrq-trigger".
That is, I am not able to see any clock forward drift during regular boot (on
CPU=0), without the fix at hypervisor side.
The clock backward issue is because the native clock source is used if we delay
the initialization of xen clock source. We will see a backward when the source
is switched from native to xen.
>
>
>> it is preferred to avoid that for regular boot (The
>> pv_sched_clock=native_sched_clock() is used at the very beginning until
>> xen_sched_clock() is registered). That requires to adjust
>> xen_sched_clock_offset. That's why we only delay xen_hvm_init_time_ops()
>> for vcpu>=32.
>
>
> We delay only on VCPU>=32 because we want to avoid the clock going backwards due
> to hypervisor problem pointed to be the link above, not because we need to
> adjust xen_sched_clock_offset (which we could if we wanted).
I will add that.
Just to clarify that so far I am not able to reproduce the clock backward issue
during regular boot (on CPU=0), when the fix is not available at hypervisor side.
>
>
>>
>> This issue can be reproduced on purpose via below command at the guest
>> side when kdump/kexec is enabled:
>>
>> "taskset -c 33 echo c > /proc/sysrq-trigger"
>>
>> Reference:
>> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg00571.html
>> Cc: Joe Jin <joe.jin@xxxxxxxxxx>
>> Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>> ---
>> Changed since v1:
>> - Add commit message to explain why xen_hvm_init_time_ops() is delayed
>> for any vcpus. (Suggested by Boris Ostrovsky)
>> - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
>> code in xen_hvm_guest_init(). (suggested by Juergen Gross)
>>
>> arch/x86/xen/enlighten_hvm.c | 20 +++++++++++++++++++-
>> arch/x86/xen/smp_hvm.c | 8 ++++++++
>> 2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
>> index e68ea5f4ad1c..7734dec52794 100644
>> --- a/arch/x86/xen/enlighten_hvm.c
>> +++ b/arch/x86/xen/enlighten_hvm.c
>> @@ -216,7 +216,25 @@ static void __init xen_hvm_guest_init(void)
>> WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
>> xen_unplug_emulated_devices();
>> x86_init.irqs.intr_init = xen_init_IRQ;
>> - xen_hvm_init_time_ops();
>> +
>> + /*
>> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
>> + * and the VM would use them until xen_vcpu_setup() is used to
>> + * allocate/relocate them at arbitrary address.
>> + *
>> + * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
>> + * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
>> + * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
>> + *
>> + * Therefore we delay xen_hvm_init_time_ops() to
>> + * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
>> + */
>> + if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
>> + pr_info("Delay xen_hvm_init_time_ops() as kernel is running on
>> vcpu=%d\n",
>> + xen_vcpu_nr(0));
>> + else
>> + xen_hvm_init_time_ops();
>> +
>> xen_hvm_init_mmu_ops();
>> #ifdef CONFIG_KEXEC_CORE
>> diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
>> index 6ff3c887e0b9..f99043df8bb5 100644
>> --- a/arch/x86/xen/smp_hvm.c
>> +++ b/arch/x86/xen/smp_hvm.c
>> @@ -19,6 +19,14 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
>> */
>> xen_vcpu_setup(0);
>> + /*
>> + * The xen_hvm_init_time_ops() is delayed from
>> + * xen_hvm_guest_init() to here to avoid panic when the kernel
>> + * boots from vcpu>=MAX_VIRT_CPUS (32).
>> + */
>
>
> How about
>
> /* Deferred call to xen_hvm_init_time_ops(). See comment in
> xen_hvm_guest_init() */
>
I will do that.
Thank you very much!
Dongli Zhang
>
> -boris
>
>
>
>> + if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
>> + xen_hvm_init_time_ops();
>> +
>> /*
>> * The alternative logic (which patches the unlock/lock) runs before
>> * the smp bootup up code is activated. Hence we need to set this up