Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa

From: Jia He
Date: Tue May 15 2018 - 13:38:07 EST


Hi Suzuki

On 5/15/2018 4:36 PM, Suzuki K Poulose Wrote:
>
> Hi Jia
>
> On 05/15/2018 03:03 AM, Jia He wrote:
>> Hi Suzuki
>>
>> I will merge the other thread into this, and add the necessary CC list
>>
>> That WARN_ON call trace is very easy to reproduce in my armv8a server after I
>> start 20 guests
>>
>> and run memhog in the host. Of course, ksm should be enabled
>>
>> For you question about my inject fault debug patch:
>
>
> Thanks for the patch, comments below.
>
>>
>
> ...
>
>> index 7f6a944..ab8545e 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>> ÂÂ * destroying the VM), otherwise another faulting VCPU may come in and mess
>> ÂÂ * with things behind our backs.
>> ÂÂ */
>> +extern int trigger_by_ksm;
>> ÂÂstatic void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>> ÂÂ{
>> ÂÂÂÂÂÂÂÂ pgd_t *pgd;
>> ÂÂÂÂÂÂÂÂ phys_addr_t addr = start, end = start + size;
>> ÂÂÂÂÂÂÂÂ phys_addr_t next;
>>
>> +ÂÂÂÂÂÂ if(trigger_by_ksm) {
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ end -= 0x200;
>> +ÂÂÂÂÂÂ }
>> +
>> ÂÂÂÂÂÂÂÂ assert_spin_locked(&kvm->mmu_lock);
>> ÂÂÂÂÂÂÂÂ pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>> ÂÂÂÂÂÂÂÂ do {
>>
>> I need to point out that I never reproduced it without this debugging patch.
>
> That could trigger the panic iff your "size" <= 0x200, leading to the
> condition (end < start), which can make the loop go forever, as we
> do while(addr < end) and end up accessing something which may not be PGD entry
> and thus get a bad page with bad numbers all around. This case could be hit only
> with your change and the bug in the KSM which gives us an address near the page
> boundary.
No, I injected the fault on purpose to simulate the case when size is less than
PAGE_SIZE(eg. PAGE_SIZE-0x200=65024)
I ever got the panic info [1] *without* the debugging patch only once

[1] https://lkml.org/lkml/2018/5/9/992
>
> So, I think we can safely ignore the PANIC().
> More below.
>
>
>>>> Suzuki, thanks for the comments.
>>>>
>>>> I proposed another ksm patch https://lkml.org/lkml/2018/5/3/1042
>>>> The root cause is ksm will add some extra flags to indicate that the page
>>>> is in/not_in the stable tree. This makes address not be aligned with PAGE_SIZE.
>>> Thanks for the pointer. In the future, please Cc the people relevant to the
>>> discussion in the patches.
>>>
>>>> Â From arm kvm mmu point of view, do you think handle_hva_to_gpa still need
>>>> to handle
>>>> the unalignment case?
>>> I don't think we should do that. Had we done this, we would never have caught
>>> this bug
>>> in KSM. Eventually if some other new implementation comes up with the a new
>>> notifier
>>> consumer which doesn't check alignment and doesn't WARN, it could simply do
>>> the wrong
>>> thing. So I believe what we have is a good measure to make sure that things are
>>> in the right order.
>>>
>>>> IMO, the PAGE_SIZE alignment is still needed because we should not let the
>>>> bottom function
>>>> kvm_age_hva_handler to handle the exception. Please refer to the
>>>> implementation in X86 and
>>>> powerpc kvm_handle_hva_range(). They both aligned the hva with
>>>> hva_to_gfn_memslot.
>>>>
>>> Â From an API perspective, you are passed on a "start" and "end" address. So,
>>> you could potentially
>>> do the wrong thing if you align the "start" and "end". May be those handlers
>>> should also do the
>>> same thing as we do.
>
>> But handle_hva_to_gpa has partially adjusted the alignment possibly:
>> ÂÂÂ 1750ÂÂÂÂÂÂÂÂ kvm_for_each_memslot(memslot, slots) {
>> ÂÂÂ 1751ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long hva_start, hva_end;
>> ÂÂÂ 1752ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ gfn_t gpa;
>> ÂÂÂ 1753
>> ÂÂÂ 1754ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ hva_start = max(start, memslot->userspace_addr);
>> ÂÂÂ 1755ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ hva_end = min(end, memslot->userspace_addr +
>> ÂÂÂ 1756ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (memslot->npages << PAGE_SHIFT));
>>
>> at line 1755, let us assume that end=0x12340200 and
>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)=0x12340000
>> Then, hva_start is not page_size aligned and hva_end is aligned, and the size
>> will be PAGE_SIZE-0x200,
>> just as what I had done in the inject fault debugging patch.
>
> Thats because we want to limit the handling of the hva/gpa range by memslot. So,
> we make sure we pass on the range within the given memslot
> to hva_to_gfn_memslot(). But we do iterate over the next memslot if the
> original range falls in to the next slot. So, in practice, there is no
> alignment/trimming of the range. Its just that we pass on the appropriate range
> for each slot.
>
Yes, I understand what the codes did in hva_to_gfn_memslot(). What I mean is
hva_end may be changed and (hva_end - hva_start) will not be same as the
parameter _size_ ?

>ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);

Anyway, I have to admit that all the exceptions are originally caused by the
STABLE_FLAG in ksm code. What I want to discuss here is how to make arm kvm
handle the exception more gracefully.
--
Cheers,
Jia