Re: [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
From: Dongli Zhang
Date: Mon Oct 25 2021 - 01:21:15 EST
Hi Boris,
On 10/12/21 10:17 AM, Boris Ostrovsky wrote:
>
> On 10/12/21 3:24 AM, Dongli Zhang wrote:
>> The sched_clock() can be used very early since upstream
>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>> with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
>> time from 0"), kdump kernel in Xen HVM guest may panic at very early stage
>> when accessing &__this_cpu_read(xen_vcpu)->time as in below:
>
>
> Please drop "upstream". It's always upstream here.
>
>
>> +
>> + /*
>> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
>> + * and the VM would use them until xen_vcpu_setup() is used to
>> + * allocate/relocate them at arbitrary address.
>> + *
>> + * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
>> + * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
>> + * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
>> + *
>> + * Therefore we delay xen_hvm_init_time_ops() to
>> + * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
>> + */
>> + if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
>
>
> What about always deferring this when panicing? Would that work?
>
>
> Deciding whether to defer based on cpu number feels a bit awkward.
>
>
> -boris
>
I did some tests and I do not think always deferring works well. I prefer to
delay the initialization only for VCPU >= 32.
This is the syslog if we always delay xen_hvm_init_time_ops(), regardless of
whether VCPU >= 32.
[ 0.032372] Booting paravirtualized kernel on Xen HVM
[ 0.032376] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 0.037683] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:2
[ 0.041876] percpu: Embedded 49 pages/cpu s162968 r8192 d29544 u262144
--> The clock goes backwards here, from 0.041876 to 0.000010.
[ 0.000010] Built 2 zonelists, mobility grouping on. Total pages: 2015744
[ 0.000012] Policy zone: Normal
[ 0.000014] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-rc6xen+ root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 crashkernel=512M-:192M
This is because the initial pv_sched_clock is native_sched_clock(), and it
switches to xen_sched_clock() in xen_hvm_init_time_ops(). Is it fine to always
have the clock go backwards for a non-kdump kernel?
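To illustrate the effect, here is a minimal user-space sketch. It is not the
kernel code: every model_* name is hypothetical, and the single counter
model_hw_ns is only an assumed stand-in for the real time sources. It shows
that switching from a clock that has already advanced to one that starts
counting from 0 makes the reported timestamp drop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a monotonic hardware counter, in ns. */
static uint64_t model_hw_ns;

/* Models native_sched_clock(): reports time since the counter started. */
static uint64_t model_native_sched_clock(void)
{
	return model_hw_ns;
}

/* Offset captured when the second clock is installed, so it starts at 0. */
static uint64_t model_xen_offset;

/* Models a sched clock whose output starts from 0 at initialization. */
static uint64_t model_xen_sched_clock(void)
{
	return model_hw_ns - model_xen_offset;
}

/* Models the pv_sched_clock pointer; starts out as the native clock. */
static uint64_t (*model_sched_clock)(void) = model_native_sched_clock;

/* Models the switch that happens in xen_hvm_init_time_ops(). */
static void model_init_time_ops(void)
{
	model_xen_offset = model_hw_ns;
	model_sched_clock = model_xen_sched_clock;
}

/* Returns 1 if a late switch makes the reported time go backwards. */
static int model_late_switch_goes_backwards(void)
{
	uint64_t before, after;

	/* Reset the model so the function can be called repeatedly. */
	model_sched_clock = model_native_sched_clock;
	model_xen_offset = 0;

	model_hw_ns = 41876000;		/* 0.041876 s on the native clock */
	before = model_sched_clock();

	model_init_time_ops();		/* the deferred initialization */
	model_hw_ns += 10000;		/* 10 us later */
	after = model_sched_clock();

	assert(after == 10000);		/* the new clock restarts near zero */
	return after < before;
}
```

In this model, the later the switch happens, the larger the backwards jump,
which matches the log above when the initialization is always deferred.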
To avoid the clock going backwards, we could register a dummy clocksource that
always returns 0 until xen_hvm_init_time_ops() runs. I do not think this is
reasonable.
Thank you very much!
Dongli Zhang