Re: [PATCH] xen/pvhvm: Support more than 32 VCPUs when migrating (v3).
From: Boris Ostrovsky
Date: Thu Jul 21 2016 - 11:06:04 EST
On 07/21/2016 10:14 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 10, 2015 at 02:57:51PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jul 10, 2015 at 02:37:46PM -0400, Konrad Rzeszutek Wilk wrote:
>>> When Xen migrates an HVM guest, by default its shared_info can
>>> only hold up to 32 CPUs. As such the hypercall
>>> VCPUOP_register_vcpu_info was introduced which allowed us to
>>> set up per-page areas for VCPUs. This means we can boot a PVHVM
>>> guest with more than 32 VCPUs. During migration the per-cpu
>>> structure is allocated freshly by the hypervisor (vcpu_info_mfn
>>> is set to INVALID_MFN) so that the newly migrated guest
>>> can make a VCPUOP_register_vcpu_info hypercall.
>>>
>>> Unfortunately we end up triggering this condition in Xen:
>>> /* Run this command on yourself or on other offline VCPUS. */
>>> if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>
>>> which means we are unable to set up the per-cpu VCPU structures
>>> for running vCPUs. The Linux PV code paths make this work by
>>> iterating over every vCPU with:
>>>
>>> 1) check whether the target CPU is up (VCPUOP_is_up hypercall)
>>> 2) if yes, then VCPUOP_down to pause it.
>>> 3) VCPUOP_register_vcpu_info
>>> 4) if we took it down in step 2, then VCPUOP_up to bring it back up
>>>
>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>> not allowed on HVM guests we can't do this. However with the
>>> Xen git commit f80c5623a126afc31e6bb9382268d579f0324a7a
>>> ("xen/x86: allow HVM guests to use hypercalls to bring up vCPUs")
>> <sigh> I was in my local tree which was Roger's 'hvm_without_dm_v3'
>> looking at patches and spotted this - and thought it was already in!
>>
>> Sorry about this patch - and please ignore it until the VCPU_op*
>> can be used by HVM guests.
> The corresponding patch is in Xen tree (192df6f9122ddebc21d0a632c10da3453aeee1c2)
>
> Could folks take a look at the patch pls?
>
> Without it you can't migrate a Linux guest with more than 32 vCPUs.
>
>>> we can do this. As such first check if VCPUOP_is_up is actually
>>> possible before trying this dance.
>>>
>>> As most of this dance code is done already in 'xen_setup_vcpu'
>>> let's make it callable on both PV and HVM. This means moving one
>>> of the checks out to 'xen_setup_runstate_info'.
>>>
>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>>> ---
>>> arch/x86/xen/enlighten.c | 23 +++++++++++++++++------
>>> arch/x86/xen/suspend.c | 7 +------
>>> arch/x86/xen/time.c | 3 +++
>>> 3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>> index 46957ea..187dec6 100644
>>> --- a/arch/x86/xen/enlighten.c
>>> +++ b/arch/x86/xen/enlighten.c
>>> @@ -238,12 +238,23 @@ static void xen_vcpu_setup(int cpu)
>>> void xen_vcpu_restore(void)
>>> {
>>> int cpu;
>>> + bool vcpuops = true;
>>> + const struct cpumask *mask;
>>>
>>> - for_each_possible_cpu(cpu) {
>>> + mask = xen_pv_domain() ? cpu_possible_mask : cpu_online_mask;
>>> +
>>> + /* Only Xen 4.5 and higher supports this. */
>>> + if (HYPERVISOR_vcpu_op(VCPUOP_is_up, smp_processor_id(), NULL) == -ENOSYS)
>>> + vcpuops = false;
>>> +
>>> + for_each_cpu(cpu, mask) {
>>> bool other_cpu = (cpu != smp_processor_id());
>>> - bool is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL);
>>> + bool is_up = false;
>>>
>>> - if (other_cpu && is_up &&
>>> + if (vcpuops)
>>> + is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL);
You can just say
is_up = (HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL) > 0);
and then you won't need the vcpuops bool, since an error return such
as -ENOSYS is negative and leaves is_up false.
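To illustrate the difference, here is a minimal user-space sketch, not
kernel code. fake_vcpu_is_up() is a hypothetical stand-in for
HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL), assumed to return 1 for up,
0 for down, and a negative errno when the op is unavailable. All helper
names below are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the VCPUOP_is_up hypercall (assumption:
 * 1 = up, 0 = down, negative errno = op not available).
 */
int fake_vcpu_is_up(int cpu, bool supported)
{
	assert(cpu >= 0);
	if (!supported)
		return -ENOSYS;
	return (cpu % 2) == 0;	/* arbitrary: even vCPUs are "up" */
}

/* Storing the raw return value in a bool: any non-zero value is true. */
bool is_up_raw(int cpu, bool supported)
{
	bool is_up = fake_vcpu_is_up(cpu, supported);

	return is_up;
}

/* Comparing against zero: errors and "down" both map to false. */
bool is_up_checked(int cpu, bool supported)
{
	return fake_vcpu_is_up(cpu, supported) > 0;
}
```

With the op unsupported, is_up_raw() reports true because -ENOSYS is
non-zero, while is_up_checked() reports false, so the later VCPUOP_down
and VCPUOP_up calls would be skipped without a separate flag.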
>>> +
>>> + if (vcpuops && other_cpu && is_up &&
>>> HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL))
>>> BUG();
>>>
>>> @@ -252,7 +263,7 @@ void xen_vcpu_restore(void)
>>> if (have_vcpu_info_placement)
>>> xen_vcpu_setup(cpu);
>>>
>>> - if (other_cpu && is_up &&
>>> + if (vcpuops && other_cpu && is_up &&
>>> HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL))
>>> BUG();
>>> }
>>> @@ -1704,8 +1715,8 @@ void __ref xen_hvm_init_shared_info(void)
>>> * in that case multiple vcpus might be online. */
>>> for_each_online_cpu(cpu) {
>>> /* Leave it to be NULL. */
>>> - if (cpu >= MAX_VIRT_CPUS)
>>> - continue;
>>> + if (cpu >= MAX_VIRT_CPUS && cpu <= NR_CPUS)
>>> + per_cpu(xen_vcpu, cpu) = NULL; /* Triggers xen_vcpu_setup.*/
>>> per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
I don't think I understand this change.
Can you have cpu > NR_CPUS? And isn't per_cpu(xen_vcpu, cpu) NULL
already (as the comment at the top suggests)?
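A small user-space model of how I read the hunk as posted, in case I am
misreading it. Everything here is hypothetical: the MODEL_* constants
stand in for MAX_VIRT_CPUS and NR_CPUS, and the backing array is made
NR_CPUS-sized on purpose so the model itself never indexes out of
bounds.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_VIRT_CPUS 4	/* stand-in for MAX_VIRT_CPUS */
#define MODEL_NR_CPUS       8	/* stand-in for NR_CPUS */

struct model_vcpu_info { int unused; };

/* Oversized on purpose; the model must not read past its own array. */
static struct model_vcpu_info model_slots[MODEL_NR_CPUS];
static struct model_vcpu_info *model_xen_vcpu[MODEL_NR_CPUS];

static struct model_vcpu_info *slot_for(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return &model_slots[cpu];
}

static int reset_then_count_null(int run_new_loop)
{
	int cpu, nulls = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_xen_vcpu[cpu] = NULL;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (run_new_loop) {
			/* Loop with the hunk applied: no 'continue' any more. */
			if (cpu >= MODEL_MAX_VIRT_CPUS && cpu <= MODEL_NR_CPUS)
				model_xen_vcpu[cpu] = NULL;
		} else {
			/* Loop before the hunk: high CPUs are skipped. */
			if (cpu >= MODEL_MAX_VIRT_CPUS)
				continue;
		}
		model_xen_vcpu[cpu] = slot_for(cpu);
	}

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_xen_vcpu[cpu] == NULL)
			nulls++;
	return nulls;
}

int count_null_before(void) { return reset_then_count_null(0); }
int count_null_after(void)  { return reset_then_count_null(1); }
```

In this model the old loop leaves the four high entries NULL, while the
new loop leaves none NULL because the assignment after the if runs
unconditionally.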
-boris