Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts

From: Paolo Bonzini
Date: Fri Jul 28 2017 - 02:28:16 EST


On 28/07/2017 04:31, Longpeng (Mike) wrote:
> Hi Paolo,
>
> On 2017/6/6 18:57, Paolo Bonzini wrote:
>
>> In some cases, for example involving hot-unplug of assigned
>> devices, pi_post_block can forget to remove the vCPU from the
>> blocked_vcpu_list. When this happens, the next call to
>> pi_pre_block corrupts the list.
>>
>> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
>> and WARN instead of adding the element twice in the list. Second,
>> always do the list removal in pi_post_block if vcpu->pre_pcpu is
>> set (not -1).
>>
>> The new code keeps interrupts disabled for the whole duration of
>> pi_pre_block/pi_post_block. This is not strictly necessary, but
>> easier to follow. For the same reason, PI.ON is checked only
>> after the cmpxchg, and to handle it we just call the post-block
>> code. This removes duplication of the list removal code.
>>
>> Cc: Longpeng (Mike) <longpeng2@xxxxxxxxxx>
>> Cc: Huangweidong <weidong.huang@xxxxxxxxxx>
>> Cc: Gonglei <arei.gonglei@xxxxxxxxxx>
>> Cc: wangxin <wangxinxin.wang@xxxxxxxxxx>
>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
>> 1 file changed, 25 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 747d16525b45..0f4714fe4908 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> struct pi_desc old, new;
>> unsigned int dest;
>> - unsigned long flags;
>>
>> do {
>> old.control = new.control = pi_desc->control;
>> + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
>> + "Wakeup handler not enabled while the VCPU is blocked\n");
>>
>> dest = cpu_physical_id(vcpu->cpu);
>>
>> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> } while (cmpxchg(&pi_desc->control, old.control,
>> new.control) != old.control);
>>
>> - if(vcpu->pre_pcpu != -1) {
>> - spin_lock_irqsave(
>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
>
>
> __pi_post_block is now called only by pi_post_block and pi_pre_block, and
> both of them seem to ensure "vcpu->pre_pcpu != -1" before calling it, so
> maybe the above check is unnecessary?

It's there because a WARN is better than a double-add. And even if the
caller broke the invariant, you'd still have to run the cmpxchg loop above
so that things don't break too much.

Paolo
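
The invariant discussed in this thread (add to the blocked list only when
pre_pcpu is -1, remove only when it is not, and warn rather than corrupt the
list otherwise) can be illustrated with a small userspace sketch. This is not
the kernel code: the list helpers, struct toy_vcpu, warn_on(), toy_pre_block()
and toy_post_block() are all hypothetical stand-ins, and the sketch leaves out
the per-CPU spinlock, interrupt disabling and the cmpxchg loop on the posted
interrupt descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del(struct list_node *node)
{
	/* Deleting a node that was never linked would dereference NULL. */
	assert(node->prev != NULL && node->next != NULL);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

static int list_len(const struct list_node *head)
{
	int count = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		count++;
	return count;
}

/* Hypothetical stand-in for a vCPU; pre_pcpu == -1 means "on no list". */
struct toy_vcpu {
	struct list_node link;
	int pre_pcpu;
};

static int warn_count;

/* Userspace imitation of WARN_ON(): report, count, return the condition. */
static bool warn_on(bool cond, const char *msg)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARN: %s\n", msg);
	}
	return cond;
}

/* Add the vCPU to the blocked list unless it is already on one. */
static void toy_pre_block(struct toy_vcpu *v, struct list_node *blocked, int cpu)
{
	if (!warn_on(v->pre_pcpu != -1, "vCPU already on a blocked list")) {
		list_add(&v->link, blocked);
		v->pre_pcpu = cpu;
	}
}

/* Remove the vCPU from the blocked list only if it is actually on one. */
static void toy_post_block(struct toy_vcpu *v)
{
	if (!warn_on(v->pre_pcpu == -1, "vCPU not on a blocked list")) {
		list_del(&v->link);
		v->pre_pcpu = -1;
	}
}
```

In this sketch, calling toy_pre_block() twice in a row on the same vCPU only
bumps warn_count and leaves the list with a single entry, and a stray second
toy_post_block() warns instead of unlinking a node that is no longer on the
list.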