Re: [PART1 RFC 5/9] svm: Add VMEXIT handlers for AVIC

From: Radim Krčmář
Date: Tue Feb 16 2016 - 13:06:28 EST


2016-02-16 17:56+0100, Paolo Bonzini:
> On 16/02/2016 15:13, Radim Krčmář wrote:
>> Yeah, I think atomic there means that it won't race with other writes to
>> the same byte in IRR. We're fine as long as AVIC writes IRR before
>> checking IsRunning on every destination, which it seems to be.
>
> More precisely, if AVIC writes all IRRs (5.1) and ANDs all IsRunning
> flags before checking the result of the AND (6).
>
>> (It would, but I believe that AVIC designers made it sane and the spec
>> doesn't let me read it in a way that supports your theories.)
>
> I hope so as well, and you've probably convinced me. But I still think
> the code is wrong in this patch. Let's look at the spec that you pasted:

The code definitely is wrong. I'll be more specific when disagreeing,
sorry.

> This is where the following steps happen:

[I completely agree with the race presented here.]

> So perhaps it's enough to change KVM to _not_ modify IRR on an
> "incomplete IPI - target not running" vmexit, and instead only do
>
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> kvm_vcpu_kick(vcpu);
>
> on the destination VCPUs. That would indeed simply be something
> to fix in the patches. Do you agree that this is a bug?

Yes. (We don't even need KVM_REQ_EVENT, because there should be nothing
to do; KVM just has to run the guest.)
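To make the ordering argument concrete, here is a minimal user-space model in C. It assumes, as discussed above, that AVIC sets the IRR of every destination before it checks IsRunning. `struct model_vcpu`, `model_avic_send_ipi()` and `model_handle_target_not_running()` are hypothetical names, not KVM or AVIC interfaces, and the `kicks` counter only stands in for kvm_vcpu_kick(). Under that ordering the exit handler never has to touch the IRR; it only has to get the destination VCPUs running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-VCPU AVIC state; not a real KVM structure. */
struct model_vcpu {
    _Atomic uint32_t irr[8];  /* 256-bit IRR, one bit per vector */
    atomic_bool is_running;   /* models the IsRunning bit of the VCPU */
    atomic_int kicks;         /* stands in for calls to kvm_vcpu_kick() */
};

/* Models the assumed AVIC ordering for an IPI. Returns false when some
 * destination was not running, i.e. when the TARGET_NOT_RUN #VMEXIT fires. */
static bool model_avic_send_ipi(struct model_vcpu **dest, int n, int vector)
{
    bool all_running = true;

    assert(vector >= 0 && vector < 256);
    /* 1) Set the vector in every destination IRR first... */
    for (int i = 0; i < n; i++)
        atomic_fetch_or(&dest[i]->irr[vector / 32],
                        UINT32_C(1) << (vector % 32));
    /* 2) ...and only then AND the IsRunning bits and check the result. */
    for (int i = 0; i < n; i++)
        if (!atomic_load(&dest[i]->is_running))
            all_running = false;
    return all_running;
}

/* Proposed exit handling: the IRR bits are already set, so do not modify
 * the IRR at all; just kick every destination VCPU so that it runs and
 * evaluates its IRR on the next VMRUN. */
static void model_handle_target_not_running(struct model_vcpu **dest, int n)
{
    for (int i = 0; i < n; i++)
        atomic_fetch_add(&dest[i]->kicks, 1);
}
```

Kicking every destination, not only the non-running ones, follows the suggestion above; in this model a kick to a VCPU that is already running only costs an extra exit.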

> I'm curious about how often the AVIC VMEXIT fires.

From a theoretical standpoint:

AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE: Rarely; an OS usually doesn't send
lowest-priority IPIs (they're not even supported on Intel), NMIs, INITs,
..., and the rest seem to be handled.

AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN: depends a lot on host load (and what
the guest does); most IPIs will trigger this on an over-committed host.

AVIC_INCMP_IPI_ERR_INV_TARGET: Almost never; only on guest OS bugs,
where the guest can trigger it by targeting non-existent VCPUs.
(Btw. calling BUG() there is a bug, since a guest could then crash the host.)

AVIC_INCMP_IPI_ERR_INV_BK_PAGE: It's a bug in KVM, so hopefully never.

> Suravee, can you add
> debugfs counters for the various incomplete IPI subcauses?

Good point; a large value in any of those would point to a problem.
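A sketch of how the subcauses above could be counted and dispatched, written as a self-contained C model. The enumerator names are the ones used in the patch, but their numeric order here is illustrative, and `struct model_avic_stats`, `model_handle_incmp_ipi()` and the `MODEL_*` actions are hypothetical. In KVM the counters would be exposed through debugfs; here they are a plain atomic array. Emulating the unsupported interrupt types and dropping an IPI to a non-existent target are assumptions. The point is that no path calls BUG().

```c
#include <assert.h>
#include <stdatomic.h>

/* Subcause names as used in the patch; the numeric order is illustrative. */
enum avic_incmp_ipi_err {
    AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
    AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
    AVIC_INCMP_IPI_ERR_INV_TARGET,
    AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
    AVIC_INCMP_IPI_ERR_NR           /* hypothetical: number of subcauses */
};

/* Hypothetical actions an exit handler could take. */
enum model_action {
    MODEL_EMULATE,       /* type AVIC doesn't handle: emulate in software */
    MODEL_KICK_TARGETS,  /* IRR already written: only wake the destinations */
    MODEL_DROP,          /* guest bug (non-existent target): drop the IPI */
    MODEL_REPORT_BUG,    /* KVM bug (bad backing page): report, no BUG() */
};

/* Hypothetical per-VM counters, one per subcause. */
struct model_avic_stats {
    atomic_ulong incmp_ipi[AVIC_INCMP_IPI_ERR_NR];
};

static enum model_action model_handle_incmp_ipi(struct model_avic_stats *s,
                                                enum avic_incmp_ipi_err cause)
{
    assert(s);
    /* An unknown cause comes from hardware, so report it instead of crashing. */
    if ((unsigned)cause >= (unsigned)AVIC_INCMP_IPI_ERR_NR)
        return MODEL_REPORT_BUG;

    /* Count every exit so that a large value in any bucket stands out. */
    atomic_fetch_add_explicit(&s->incmp_ipi[cause], 1, memory_order_relaxed);

    switch (cause) {
    case AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE: return MODEL_EMULATE;
    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:   return MODEL_KICK_TARGETS;
    case AVIC_INCMP_IPI_ERR_INV_TARGET:       return MODEL_DROP;
    default:                                  return MODEL_REPORT_BUG;
    }
}
```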

> And while we are at it, I'm curious about the following two steps at the
> end of 15.29.2.6.
>
> - on VMRUN the interrupt state is evaluated and the highest priority
> pending interrupt indicated in the IRR is delivered if interrupt masking
> and priority allow
>
> - Any doorbell signals received during VMRUN processing are recognized
> immediately after entering the guest
>
> Isn't step 1 exactly the same as evaluating the doorbell signals?

It is.

> Is
> the IRR evaluated only if the hypervisor had rung the doorbell, or
> unconditionally?

Unconditionally.
(Supporting evidence: the current code doesn't send a doorbell when the
VCPU is in host mode, and I suppose that it works fine. :])

I think these two clauses cover a race on VMRUN:
while VMRUN is being processed, the CPU might not count as being in guest
mode, so these two clauses disambiguate the case where VMRUN has already
checked the IRR (it was empty) and another CPU set the IRR and issued a
doorbell before VMRUN entered the guest. (Otherwise the doorbell could be
considered lost, because doorbells in host mode do nothing.)
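The VMRUN race can be sketched as a single-threaded state machine in C. This is a model under simplifying assumptions, not a description of the hardware: only 32 vectors, no interrupt masking or priority checks, and VMRUN split into a "begin" step (clause 1: the IRR is evaluated unconditionally) and an "enter guest" step (clause 2: a doorbell received in between is recognized). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_mode { MODEL_HOST, MODEL_IN_VMRUN, MODEL_GUEST };

/* Hypothetical model of one physical CPU running a VCPU. */
struct model_cpu {
    enum model_mode mode;
    uint32_t irr;           /* simplified IRR: 32 vectors */
    bool doorbell_latched;  /* doorbell received while VMRUN was in progress */
    int delivered;          /* interrupts taken by the guest */
};

/* Deliver the highest pending vector, if any (masking is ignored). */
static void model_deliver(struct model_cpu *c)
{
    for (int v = 31; v >= 0; v--) {
        if (c->irr & (UINT32_C(1) << v)) {
            c->irr &= ~(UINT32_C(1) << v);
            c->delivered++;
            return;
        }
    }
}

/* Doorbell: does nothing in host mode, is latched during VMRUN,
 * and causes immediate IRR evaluation in guest mode. */
static void model_doorbell(struct model_cpu *c)
{
    if (c->mode == MODEL_IN_VMRUN)
        c->doorbell_latched = true;
    else if (c->mode == MODEL_GUEST)
        model_deliver(c);
}

/* Another CPU sends an IPI: set the IRR bit, then ring the doorbell. */
static void model_post_ipi(struct model_cpu *c, int vector)
{
    assert(vector >= 0 && vector < 32);
    c->irr |= UINT32_C(1) << vector;
    model_doorbell(c);
}

/* Clause 1: VMRUN evaluates the IRR whether or not a doorbell was rung. */
static void model_vmrun_begin(struct model_cpu *c)
{
    c->mode = MODEL_IN_VMRUN;
    model_deliver(c);
}

/* Clause 2: a doorbell received during VMRUN is recognized on entry. */
static void model_vmrun_enter_guest(struct model_cpu *c)
{
    c->mode = MODEL_GUEST;
    if (c->doorbell_latched) {
        c->doorbell_latched = false;
        model_deliver(c);
    }
}
```

In this model an IPI posted while the target is in host mode stays in the IRR until the next VMRUN picks it up, and an IPI posted between the two VMRUN steps is caught by the latched doorbell, so neither interrupt is lost.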