Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb

From: David Hildenbrand
Date: Fri Feb 17 2017 - 05:13:16 EST



>> This is really complicated stuff, and the basic reason for it (if I
>> remember correctly) is that s390x does reenable all interrupts when
>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based
>> kicks don't work (as it is otherwise just racy), and if I remember
>> correctly, SMP reschedule signals (s390x external calls) would be
>> slower. (Christian, please correct me if I'm wrong)
>
> No the reason was that there are some requests that need to be handled
> outside run SIE. For example one reason was the guest prefix page.
> This must be mapped read/write ALL THE TIME when a guest is running,
> otherwise the host might crash. So we have to exit SIE and make sure that
> it does not reenter, therefore we use the RELOAD_MMU request from a notifier
> that is called from page table functions, whenever memory management decides
> to unmap/write protect (dirty pages tracking, reference tracking, page migration
> or compaction...)
>
> SMP-based request wills kick out the guest, but for some thing like the
> one above it will be too late.

While what you said is 100% correct, I had something else in mind that
hindered using vcpu_kick() and especially kvm_make_all_cpus_request().
And I remember that being related to how preemption and
OUTSIDE_GUEST_MODE is handled. I think this boils down to what would
have to be implemented in kvm_arch_vcpu_should_kick().

x86 can track the guest state using vcpu->mode, because they can be sure
that the guest can't reschedule while in the critical guest entry/exit
section. This is not true for s390x, as preemption is enabled. That's
why vcpu->mode cannot be used in its current form to track if a VCPU is
in/oustide/exiting guest mode. And kvm_make_all_cpus_request() currently
relies on this setting.

For now, calling vcpu_kick() on s390x will result in a BUG().


On s390x, there are 3 use cases I see for requests:

1. Remote requests that need a sync

Make a request, wait until SIE has been left and make sure the request
will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU
notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a
candidate.

2. Remote requests that don't need a sync

E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
KVM_REQ_DISABLE_IBS does.

3. local requests

E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()


Of course, having a unified interface would be better.

/* set the request and kick the CPU out of guest mode */
kvm_set_request(req, vcpu);

/* set the request, kick the CPU out of guest mode, wait until guest
mode has been left and make sure the request will be handled before
reentering guest mode */
kvm_set_sync_request(req, vcpu);


Same maybe even for multiple VCPUs (as there are then ways to speed it
up, e.g. first kick all, then wait for all)

This would require arch specific callbacks to
1. pre announce the request (e.g. set PROG_REQUEST on s390x)
2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
kvm_s390_vsie_kick(vcpu) on s390x)
3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)

This would only make sense if there are other use cases for sync
requests. At least I remember that Power also has a faster way for
kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
s390x only thing and is better be left as is :)

At least vcpu_kick() could be quite easily made to work on s390x.

Radim, are there also other users that need something like sync requests?

>
>
>> So this statement, is at least from a s390x point of view wrong. The
>> kvm_vcpu_kick() function would have to be rerouted to an appropriate
>> s390x implementation (or that whole smp and OUTSIDE_GUEST_MODE stuff
>> would have to be factored out).
>>
>


--
Thanks,

David