Re: [PATCH 2/4] KVM: x86: refactor PIT state inject_lock

From: Radim KrÄmÃÅ
Date: Thu Feb 04 2016 - 08:13:17 EST


2016-02-03 17:45+0100, Paolo Bonzini:
> On 03/02/2016 17:23, Radim KrÄmÃÅ wrote:
>> Following patches would be even uglier if inject_lock didn't go away.
>>
>> Patch changes the virtual wire comment to better describe our situation.
>>
>> Signed-off-by: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
>> ---
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> @@ -236,16 +236,13 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
>> {
>> struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
>> irq_ack_notifier);
>> - int value;
>>
>> - spin_lock(&ps->inject_lock);

(Our code doesn't need barrier between dereferencing the pointer and
reading its contents, and this bug is not possible happen on x86.)

>> + atomic_set(&ps->irq_ack, 1);
>
> smp_mb__before_atomic();

irq_ack has to be set before queueing pit_do_work, or could lose the
interrupt.
atomic_add_unless implies full barriers, so I think we're ok here.

>> if (atomic_add_unless(&ps->pending, -1, 0))
>> queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
>> void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
>> @@ -323,6 +311,12 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
>> return HRTIMER_NORESTART;
>> }
>>
>> +static void kvm_pit_reset_reinject(struct kvm_pit *pit)
>> +{
>> + atomic_set(&pit->pit_state.pending, 0);
>
> smp_wmb()?

I don't think that we need to ensure the order of pending and irq_ack
because there isn't a dependency between these two. The reset is a slap
out of nowhere ... I think we'd need locks to make it behave correctly.

The worst that can happen is one virtual wire NMI injection when the
IOAPIC interrupt was in IRR and number of injections therefore won't
match. The race goes like this:

kvm_pit_ack_irq is running and we have pending > 0, so pit_do_work is
scheduled and executed. pit_do_work sets irq_ack from 1 to 0.
Then pit_timer_fn gets executed, increases pending and queues
pit_do_work. Before pit_do_work has a chance to run, we set pending=0
and irq_ack=1 in kvm_pit_reset_reinject. pit_do_work is executed, sets
irq_ack from 1 to 0, but injects only the NMI, because interrupt is
already in IRR. When the interrupt does EOI, we don't reinject it,
because pending==0.

Barriers can't solve this, locking would be pretty awkward and I think
that having one interrupt more or less is ok for the delay policy. :)

| + atomic_set(&pit->pit_state.irq_ack, 1);

Callers of kvm_pit_reset_reinject are providing barriers, so assignment
can't be reordered into inappropriate places, but it wouldn't hurt to
add barriers here.

> Looks safe otherwise. (Please also add a comment before the memory
> barriers to show the pairing).

Thanks, I'll comment what I can.