Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu

From: Ouyang Zhaowei (Charles)
Date: Tue Apr 28 2015 - 08:31:28 EST




On 2015.4.26 7:31, Boris Ostrovsky wrote:
>
> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>> finish the job.
>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>
>> Signed-off-by: Charles Ouyang <ouyangzhaowei@xxxxxxxxxx>
>> ---
>> arch/x86/xen/suspend.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> index d949769..b2bed45 100644
>> --- a/arch/x86/xen/suspend.c
>> +++ b/arch/x86/xen/suspend.c
>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>> {
>> #ifdef CONFIG_XEN_PVHVM
>> int cpu;
>> - xen_hvm_init_shared_info();
>> + if (!suspend_cancelled)
>> + xen_hvm_init_shared_info();
>> xen_callback_vector();
>> xen_unplug_emulated_devices();
>> if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>
> Do we need to call other routines if suspend is canceled?
>
> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>
Hi Boris,

Sorry, I didn't make myself clear.

By "vcpu_info reinitialize" I mean that xen_hvm_init_shared_info() resets the per-cpu pointer "xen_vcpu" for every online cpu,
so that they all point to HYPERVISOR_shared_info->vcpu_info[cpu]:

void __ref xen_hvm_init_shared_info(void)
----
	 * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
	 * online but xen_hvm_init_shared_info is run at resume time too and
	 * in that case multiple vcpus might be online. */
	for_each_online_cpu(cpu) {
		/* Leave it to be NULL. */
		if (cpu >= MAX_VIRT_CPUS)
			continue;
		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
	}
}


But at boot, the init function xen_start_kernel() only sets cpu0 to point to HYPERVISOR_shared_info->vcpu_info[0]:

asmlinkage __visible void __init xen_start_kernel(void)
----
	/* Don't do the full vcpu_info placement stuff until we have a
	   possible map and a non-dummy shared_info. */
	per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];

	local_irq_disable();

The other cpus are set to point to their per-cpu "xen_vcpu_info" in xen_vcpu_setup().

So after 'xl save -c' / 'xl restore', xen_hvm_init_shared_info() resets the xen_vcpu pointers, and for every cpu other than cpu0
they now point to the wrong place. As a result, cpu0 may be the only cpu that can still handle irqs. So IMHO it's not necessary to
call xen_hvm_init_shared_info() again if suspend is canceled.
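
To make the pointer reset easier to see, here is a small user-space sketch. It is only a model of what I described above, not kernel code: NR_CPUS, shared_info, percpu_vcpu_info and all the model_*() helpers are made-up stand-ins for HYPERVISOR_shared_info, the per-cpu xen_vcpu_info areas and the real functions.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical cpu count, for the model only */

/* Stand-in for the real struct vcpu_info; only its address matters here. */
struct vcpu_info { int evtchn_upcall_pending; };

/* Stand-in for HYPERVISOR_shared_info and its vcpu_info[] slots. */
static struct { struct vcpu_info vcpu_info[NR_CPUS]; } shared_info;

/* Stand-in for the per-cpu xen_vcpu_info areas. */
static struct vcpu_info percpu_vcpu_info[NR_CPUS];

/* Stand-in for per_cpu(xen_vcpu, cpu). */
static struct vcpu_info *xen_vcpu[NR_CPUS];

/* State after boot as described above: cpu0 uses the shared_info slot,
 * the other cpus use their own per-cpu area. */
static void model_boot(void)
{
	xen_vcpu[0] = &shared_info.vcpu_info[0];
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		xen_vcpu[cpu] = &percpu_vcpu_info[cpu];
}

/* Models the loop in xen_hvm_init_shared_info(): every pointer is reset
 * to the shared_info slot, whatever it pointed to before. */
static void model_init_shared_info(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		xen_vcpu[cpu] = &shared_info.vcpu_info[cpu];
}

/* Models the patched xen_hvm_post_suspend(): skip the reset when the
 * suspend was canceled. */
static void model_post_suspend(bool suspend_cancelled)
{
	if (!suspend_cancelled)
		model_init_shared_info();
}

/* Counts the secondary cpus whose pointer no longer refers to their
 * per-cpu area. */
static int model_count_misplaced(void)
{
	int n = 0;

	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (xen_vcpu[cpu] != &percpu_vcpu_info[cpu])
			n++;
	return n;
}
```

In this model, calling model_boot() and then model_post_suspend(true) leaves model_count_misplaced() at 0, while calling model_init_shared_info() unconditionally (the unpatched behaviour) moves the pointer of every cpu except cpu0.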

> (The patch itself looks like the right thing to do though).
>
> -boris
>
> .
>
