Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb
From: David Hildenbrand
Date: Thu Feb 23 2017 - 05:34:13 EST
Am 22.02.2017 um 20:57 schrieb Christian Borntraeger:
> On 02/22/2017 04:17 PM, Radim KrÄmÃÅ wrote:
>> while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
>> And out of curiosity -- how many cycles does this loop usually take?
> A quick hack indicates something between 3 and 700ns.
>>> 2. Remote requests that don't need a sync
>>> E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
>>> KVM_REQ_DISABLE_IBS does.
>> A usual KVM request would kick the VCPU out of nested virt as well.
>> Shouldn't it be done for these as well?
> A common code function probably should. For some of the cases (again
> prefix page handling) we do not need it. For example if we unmap
> the guest prefix page, but guest^2 is running this causes no trouble
> as long as we handle the request before reentering guest^1. So
> not an easy answer.
The problem is, that doing synchronous requests on two sie control
blocks (vsie and "ordinary") is really really hard to implement. I had a
prototype, but it was just ugly. And there was no reason to do it,
because all current requests can live with nested guest being executed
(as it is really all just a matter of coordinating SIE control block
changes for our guest, not involving nested guests).
Also note, that VSIE uses these special request bits in the SIE control
block for its own purposes (to catch unmappings of the prefix page, but
this time on the nested address space). We don't want to replace this by
an ordinary request bit (because then we have to exit the VSIE loop much
I think the VSIE code should not care too much about request bits until
there is really a need for it (meaning: VCPU requests that cannot
tolerate the VSIE running).
Kicking the VSIE from time to time cannot harm. But there are no
guarantees about requests.
>>> 3. local requests
>>> E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()
>>> Of course, having a unified interface would be better.
>>> /* set the request and kick the CPU out of guest mode */
>>> kvm_set_request(req, vcpu);
>>> /* set the request, kick the CPU out of guest mode, wait until guest
>>> mode has been left and make sure the request will be handled before
>>> reentering guest mode */
>>> kvm_set_sync_request(req, vcpu);
>> Sounds good, I'll also add
>> kvm_set_self_request(req, vcpu);
or introduce some lightweight check (e.g. against (vcpu->cpu))
internally, that simply skips kicking/other stuff in case the vcpu is
>>> Same maybe even for multiple VCPUs (as there are then ways to speed it
>>> up, e.g. first kick all, then wait for all)
>>> This would require arch specific callbacks to
>>> 1. pre announce the request (e.g. set PROG_REQUEST on s390x)
>>> 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
>>> kvm_s390_vsie_kick(vcpu) on s390x)
>>> 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)
>>> This would only make sense if there are other use cases for sync
>>> requests. At least I remember that Power also has a faster way for
>>> kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
>>> s390x only thing and is better be left as is :)
>>> At least vcpu_kick() could be quite easily made to work on s390x.
>>> Radim, are there also other users that need something like sync requests?
>> I think that ARM has a similar need when updating vgic, but relies on an
>> asumption that VCPUs are going to be out after kicking them with
>> (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c)
>> Having synchronous requests in a common API should probably wait for the
>> completion of the request, not just for the kick, which would make race
>> handling simpler.
> This would be problematic for our prefix page handling due to locking.
And if I am not wrong, waiting for a request to be handled might take
forever (thinking about VCPUs in user space - maybe even stopped ones
that won't run again).
I think the general notion of synchronous VCPU requests should be: Make
sure that VCPU does not go into guest mode before handling the request.
This includes waiting for it to exit guest mode.