Re: [RFC PATCH 6/6] KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
From: Nikunj A. Dadhania
Date: Wed Jan 19 2022 - 23:25:08 EST
On 1/20/2022 12:22 AM, Maciej S. Szmigiero wrote:
> On 19.01.2022 07:33, Nikunj A. Dadhania wrote:
>> On 1/18/2022 8:30 PM, Maciej S. Szmigiero wrote:
>>> On 18.01.2022 12:06, Nikunj A Dadhania wrote:
>>>> +static struct page **sev_pin_memory_in_mmu(struct kvm *kvm, unsigned long addr,
>>>> + unsigned long size,
>>>> + unsigned long *npages)
>>>> +{
>>>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + struct kvm_vcpu *vcpu;
>>>> + struct page **pages;
>>>> + unsigned long i;
>>>> + u32 error_code;
>>>> + kvm_pfn_t pfn;
>>>> + int idx, ret = 0;
>>>> + gpa_t gpa;
>>>> + bool ro;
>>>> +
>>>> + pages = sev_alloc_pages(sev, addr, size, npages);
>>>> + if (IS_ERR(pages))
>>>> + return pages;
>>>> +
>>>> + vcpu = kvm_get_vcpu(kvm, 0);
>>>> + if (mutex_lock_killable(&vcpu->mutex)) {
>>>> + kvfree(pages);
>>>> + return ERR_PTR(-EINTR);
>>>> + }
>>>> +
>>>> + vcpu_load(vcpu);
>>>> + idx = srcu_read_lock(&kvm->srcu);
>>>> +
>>>> + kvm_mmu_load(vcpu);
>>>> +
>>>> + for (i = 0; i < *npages; i++, addr += PAGE_SIZE) {
>>>> + if (signal_pending(current)) {
>>>> + ret = -ERESTARTSYS;
>>>> + break;
>>>> + }
>>>> +
>>>> + if (need_resched())
>>>> + cond_resched();
>>>> +
>>>> + gpa = hva_to_gpa(kvm, addr, &ro);
>>>> + if (gpa == UNMAPPED_GVA) {
>>>> + ret = -EFAULT;
>>>> + break;
>>>> + }
>>>
>>> This function is going to have worst case O(n²) complexity if called with
>>> the whole VM memory (or O(n * log(n)) when hva_to_memslot() is modified
>>> to use kvm_for_each_memslot_in_hva_range()).
>>
>> I understand your concern and will address it. BTW, this is called for a small
>> fragment of VM memory( <10MB), that needs to be pinned before the guest execution
>> starts.
>
> I understand it is a relatively small memory area now, but a rewrite of
> this patch that makes use of kvm_for_each_memslot_in_hva_range() while
> taking care of other considerations (like overlapping hva) will also
> solve the performance issue.>
>>> That's really bad for something that can be done in O(n) time - look how
>>> kvm_for_each_memslot_in_gfn_range() does it over gfns.
>>>
>>
>> I saw one use of kvm_for_each_memslot_in_gfn_range() in __kvm_zap_rmaps(), and
>> that too calls slot_handle_level_range() which has a for_each_slot_rmap_range().
>> How would that be O(n) ?
>>
>> kvm_for_each_memslot_in_gfn_range() {
>> ...
>> slot_handle_level_range()
>> ...
>> }
>>
>> slot_handle_level_range() {
>> ...
>> for_each_slot_rmap_range() {
>> ...
>> }
>> ...
>> }
>
> kvm_for_each_memslot_in_gfn_range() iterates over gfns, which are unique,
> so at most one memslot is returned per gfn (and if a memslot covers
> multiple gfns in the requested range it will be returned just once).
>
> for_each_slot_rmap_range() then iterates over rmaps covering that
> *single* memslot: look at slot_rmap_walk_next() - the memslot under
> iteration is not advanced.
>
> So each memslot returned by kvm_for_each_memslot_in_gfn_range() is
> iterated over just once by the aforementioned macro.
>
>>> Besides performance considerations I can't see the code here taking into
>>> account the fact that a hva can map to multiple memslots (they an overlap
>>> in the host address space).
>>
>> You are right I was returning at the first match, looks like if I switch to using kvm_for_each_memslot_in_hva_range() it should take care of overlapping hva, is this understanding correct ?
>
> Let's say that the requested range of hva for sev_pin_memory_in_mmu() to
> handle is 0x1000 - 0x2000.
>
> If there are three memslots:
> 1: hva 0x1000 - 0x2000 -> gpa 0x1000 - 0x2000
> 2: hva 0x1000 - 0x2000 -> gpa 0x2000 - 0x3000
> 3: hva 0x2000 - 0x3000 -> gpa 0x3000 - 0x4000
>
> then kvm_for_each_memslot_in_hva_range() will return the first two,
> essentially covering the hva range of 0x1000 - 0x2000 twice.
>
> If such hva aliases are permitted the code has to be ready for this case
> and handle it sensibly:
> If you need to return just a single struct page per a hva AND / OR pin
> operations aren't idempotent then it has to keep track which hva were
> already processed.
>
> Another, and probably the easiest option would be to simply disallow
> such overlapping memslots in the requested range and make
> KVM_SEV_LAUNCH_UPDATE_DATA ioctl return something like EINVAL in this
> case - if that would be acceptable semantics for this ioctl.
>
> In any case, the main loop in sev_pin_memory_in_mmu() will probably
> need to be build around a kvm_for_each_memslot_in_hva_range() call,
> which will then solve the performance issue, too.
Sure, I already tried out and have the walk implemented using
kvm_for_each_memslot_in_hva_range() call.
Regards
Nikunj