Re: [PATCH] xen/pvhvm: Support more than 32 VCPUs when migrating (v3).
From: Konrad Rzeszutek Wilk
Date: Thu Jul 21 2016 - 10:14:15 EST
On Fri, Jul 10, 2015 at 02:57:51PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 10, 2015 at 02:37:46PM -0400, Konrad Rzeszutek Wilk wrote:
> > When Xen migrates an HVM guest, by default its shared_info can
> > only hold up to 32 CPUs. As such the hypercall
> > VCPUOP_register_vcpu_info was introduced, which allows us to
> > set up per-page areas for VCPUs. This means we can boot a PVHVM
> > guest with more than 32 VCPUs. During migration the per-cpu
> > structure is allocated freshly by the hypervisor (vcpu_info_mfn
> > is set to INVALID_MFN) so that the newly migrated guest
> > can make a VCPUOP_register_vcpu_info hypercall.
> >
> > Unfortunately we end up triggering this condition in Xen:
> > /* Run this command on yourself or on other offline VCPUS. */
> > if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >
> > which means we are unable to set up the per-cpu VCPU structures
> > for running vCPUs. The Linux PV code paths make this work by
> > iterating over every vCPU with:
> >
> > 1) is the target CPU up? (VCPUOP_is_up hypercall)
> > 2) if yes, then VCPUOP_down to pause it.
> > 3) VCPUOP_register_vcpu_info
> > 4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >
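For readers following along, the four-step sequence above can be sketched
as a small user-space simulation. This is only an illustration under stated
assumptions: struct sim_vcpu, sim_register_vcpu_info() and sim_restore_all()
are hypothetical names, not Xen or Linux symbols, and a plain array stands in
for hypervisor state. The registration helper mirrors the Xen condition quoted
earlier (the caller may only target itself or an offline vCPU).

```c
#include <stdbool.h>

#define SIM_NR_VCPUS 4

/* Hypothetical stand-in for per-vCPU hypervisor state; not a Xen structure. */
struct sim_vcpu {
	bool up;         /* true while the vCPU is running (not _VPF_down) */
	bool registered; /* true once its vcpu_info area has been registered */
};

static struct sim_vcpu sim_vcpus[SIM_NR_VCPUS];

/*
 * Models the quoted Xen check: registration is refused when the target
 * is a different vCPU that is still running.
 */
static int sim_register_vcpu_info(int caller, int target)
{
	if (target != caller && sim_vcpus[target].up)
		return -1;
	sim_vcpus[target].registered = true;
	return 0;
}

/* Walks every simulated vCPU and applies steps 1) to 4) from the list. */
static int sim_restore_all(int caller)
{
	for (int cpu = 0; cpu < SIM_NR_VCPUS; cpu++) {
		bool other_cpu = (cpu != caller);
		bool was_up = sim_vcpus[cpu].up;          /* step 1 */

		if (other_cpu && was_up)
			sim_vcpus[cpu].up = false;        /* step 2 */

		if (sim_register_vcpu_info(caller, cpu))  /* step 3 */
			return -1;

		if (other_cpu && was_up)
			sim_vcpus[cpu].up = true;         /* step 4 */
	}
	return 0;
}
```

Calling sim_register_vcpu_info() directly on a running remote vCPU fails,
while sim_restore_all() succeeds and leaves each vCPU in the up/down state
it started in.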
> > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > not allowed on HVM guests we can't do this. However with the
> > Xen git commit f80c5623a126afc31e6bb9382268d579f0324a7a
> > ("xen/x86: allow HVM guests to use hypercalls to bring up vCPUs")
>
> <sigh> I was in my local tree which was Roger's 'hvm_without_dm_v3'
> looking at patches and spotted this - and thought it was already in!
>
> Sorry about this patch - and please ignore it until the VCPU_op*
> can be used by HVM guests.
The corresponding patch is in the Xen tree (192df6f9122ddebc21d0a632c10da3453aeee1c2).
Could folks take a look at the patch pls?
Without it you can't migrate a Linux guest with more than 32 vCPUs.
>
> > we can do this. As such first check if VCPUOP_is_up is actually
> > possible before trying this dance.
> >
> > As most of this dance code is done already in 'xen_vcpu_restore'
> > let's make it callable on both PV and HVM. This means moving one
> > of the checks into 'xen_setup_runstate_info'.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > ---
> > arch/x86/xen/enlighten.c | 23 +++++++++++++++++------
> > arch/x86/xen/suspend.c | 7 +------
> > arch/x86/xen/time.c | 3 +++
> > 3 files changed, 21 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index 46957ea..187dec6 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -238,12 +238,23 @@ static void xen_vcpu_setup(int cpu)
> >  void xen_vcpu_restore(void)
> >  {
> >  	int cpu;
> > +	bool vcpuops = true;
> > +	const struct cpumask *mask;
> >
> > -	for_each_possible_cpu(cpu) {
> > +	mask = xen_pv_domain() ? cpu_possible_mask : cpu_online_mask;
> > +
> > +	/* Only Xen 4.5 and higher supports this. */
> > +	if (HYPERVISOR_vcpu_op(VCPUOP_is_up, smp_processor_id(), NULL) == -ENOSYS)
> > +		vcpuops = false;
> > +
> > +	for_each_cpu(cpu, mask) {
> >  		bool other_cpu = (cpu != smp_processor_id());
> > -		bool is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL);
> > +		bool is_up = false;
> >
> > -		if (other_cpu && is_up &&
> > +		if (vcpuops)
> > +			is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL);
> > +
> > +		if (vcpuops && other_cpu && is_up &&
> >  		    HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL))
> >  			BUG();
> >
> > @@ -252,7 +263,7 @@ void xen_vcpu_restore(void)
> >  		if (have_vcpu_info_placement)
> >  			xen_vcpu_setup(cpu);
> >
> > -		if (other_cpu && is_up &&
> > +		if (vcpuops && other_cpu && is_up &&
> >  		    HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL))
> >  			BUG();
> >  	}
> > @@ -1704,8 +1715,8 @@ void __ref xen_hvm_init_shared_info(void)
> >  	 * in that case multiple vcpus might be online. */
> >  	for_each_online_cpu(cpu) {
> >  		/* Leave it to be NULL. */
> > -		if (cpu >= MAX_VIRT_CPUS)
> > -			continue;
> > +		if (cpu >= MAX_VIRT_CPUS && cpu <= NR_CPUS)
> > +			per_cpu(xen_vcpu, cpu) = NULL; /* Triggers xen_vcpu_setup.*/
> >  		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> >  	}
> >  }
> > diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> > index 53b4c08..cd66397 100644
> > --- a/arch/x86/xen/suspend.c
> > +++ b/arch/x86/xen/suspend.c
> > @@ -31,15 +31,10 @@ static void xen_pv_pre_suspend(void)
> >  static void xen_hvm_post_suspend(int suspend_cancelled)
> >  {
> >  #ifdef CONFIG_XEN_PVHVM
> > -	int cpu;
> >  	xen_hvm_init_shared_info();
> >  	xen_callback_vector();
> >  	xen_unplug_emulated_devices();
> > -	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> > -		for_each_online_cpu(cpu) {
> > -			xen_setup_runstate_info(cpu);
> > -		}
> > -	}
> > +	xen_vcpu_restore();
> >  #endif
> >  }
> >
> > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> > index 55da33b..6882d0c 100644
> > --- a/arch/x86/xen/time.c
> > +++ b/arch/x86/xen/time.c
> > @@ -105,6 +105,9 @@ void xen_setup_runstate_info(int cpu)
> >  {
> >  	struct vcpu_register_runstate_memory_area area;
> >
> > +	if (xen_hvm_domain() && !(xen_feature(XENFEAT_hvm_safe_pvclock)))
> > +		return;
> > +
> >  	area.addr.v = &per_cpu(xen_runstate, cpu);
> >
> >  	if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
> > --
> > 2.1.0
> >
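As an aside for reviewers, the "probe once, then gate every call" pattern
used in xen_vcpu_restore() can be modelled in user space roughly as below.
This is a sketch under assumptions, not kernel code: struct sim_hv,
sim_vcpu_op() and sim_restore() are hypothetical names, and the bitmask is a
stand-in for the hypervisor's view of which vCPUs are up. The probe matters
because a negative error such as -ENOSYS converts to true when assigned to a
bool, so an unchecked is_up would claim every vCPU is running on a hypervisor
that rejects the call. The sketch compares against zero instead of relying on
the implicit conversion.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hypervisor model: not a Xen or Linux structure. */
struct sim_hv {
	bool has_vcpuops;      /* false models a hypervisor rejecting VCPUOP_* */
	unsigned long up_mask; /* bit n set means simulated vCPU n is up */
};

enum sim_op { SIM_IS_UP, SIM_DOWN, SIM_UP };

/* Stand-in for HYPERVISOR_vcpu_op(): returns -ENOSYS when unsupported. */
static int sim_vcpu_op(struct sim_hv *hv, enum sim_op op, int cpu)
{
	if (!hv->has_vcpuops)
		return -ENOSYS;

	switch (op) {
	case SIM_IS_UP:
		return (int)((hv->up_mask >> cpu) & 1UL);
	case SIM_DOWN:
		hv->up_mask &= ~(1UL << cpu);
		return 0;
	case SIM_UP:
		hv->up_mask |= 1UL << cpu;
		return 0;
	}
	return -EINVAL;
}

/*
 * Probes support once on the calling vCPU, then only issues the
 * down/up pair when the probe succeeded. Returns how many remote
 * vCPUs were taken down and brought back up.
 */
static int sim_restore(struct sim_hv *hv, int self, int nr_cpus)
{
	bool vcpuops = sim_vcpu_op(hv, SIM_IS_UP, self) != -ENOSYS;
	int bounced = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		bool other_cpu = (cpu != self);
		bool is_up = false;

		if (vcpuops)
			is_up = sim_vcpu_op(hv, SIM_IS_UP, cpu) > 0;

		if (vcpuops && other_cpu && is_up) {
			sim_vcpu_op(hv, SIM_DOWN, cpu);
			/* The per-vCPU re-registration would happen here. */
			sim_vcpu_op(hv, SIM_UP, cpu);
			bounced++;
		}
	}
	return bounced;
}
```

With has_vcpuops set to false the loop touches nothing, which corresponds to
the vcpuops = false path in the patch. With it set to true, only remote vCPUs
that were up get bounced.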