Re: [PATCH v2 00/11] KVM: Support guest MAXPHYADDR < host MAXPHYADDR

From: Tom Lendacky
Date: Mon Jun 22 2020 - 13:57:23 EST


On 6/22/20 12:03 PM, Paolo Bonzini wrote:
> On 22/06/20 18:33, Tom Lendacky wrote:
>> I'm not a big fan of trapping #PF for this. Can't this have a performance
>> impact on the guest? If I'm not mistaken, Qemu will default to the TCG
>> physical address size (40 bits), unless told otherwise, causing #PF to now
>> be trapped. Maybe libvirt defaults to matching host/guest CPU MAXPHYADDR?
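
FWIW, unless I'm misremembering the option names, matching the host today
means passing something like "-cpu host,host-phys-bits=on" (or an explicit
",phys-bits=<n>") on the Qemu command line; otherwise the 40-bit TCG
default is what the guest ends up with.
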
>
> Yes, this is true. We should change it similarly to how we handle TSC
> frequency (and having support for guest MAXPHYADDR < host MAXPHYADDR is
> a prerequisite).
>
>> On bare metal, there's no guarantee a CPU will report all the faults in a
>> single PF error code. And because of race conditions, software can never
>> rely on that behavior. Whenever the OS thinks it has cured an error, it
>> must always be able to handle another #PF for the same access when it
>> retries because another processor could have modified the PTE in the
>> meantime.
>
> I agree, but I don't understand the relation to this patch. Can you
> explain?

I guess I'm trying to understand why RSVD has to be reported to the guest
on a #PF (vs. an #NPF) when there's no guarantee that it can receive that
error code today, even when guest MAXPHYADDR == host MAXPHYADDR. That would
eliminate the need to trap #PF.

Thanks,
Tom

>
>> What's the purpose of reporting RSVD in the error code in the
>> guest with regard to live migration?
>>
>>> - if the page is accessible to the guest according to the permissions in
>>> the page table, it will cause an #NPF. Again, we need to trap it, check
>>> the guest physical address and inject a P|RSVD #PF if the guest physical
>>> address has any guest-reserved bits.
>>>
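
Just to make sure I'm reading the intent correctly, this is roughly the
check I understand would be done on the #NPF exit path (a standalone
userspace sketch with made-up names, not the actual KVM code):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* x86 #PF error-code bits (per the SDM/APM): bit 0 = P, bit 3 = RSVD. */
  #define PFERR_PRESENT (1u << 0)
  #define PFERR_RSVD    (1u << 3)

  /*
   * From the guest's point of view, any GPA bit at or above its own
   * MAXPHYADDR is reserved when guest MAXPHYADDR < host MAXPHYADDR.
   */
  static bool gpa_has_guest_rsvd_bits(uint64_t gpa,
                                      unsigned int guest_maxphyaddr)
  {
          return gpa >> guest_maxphyaddr;
  }

  int main(void)
  {
          unsigned int guest_maxphyaddr = 40;  /* e.g. the TCG default */
          uint64_t gpa = 1ull << 45;           /* reachable only by the host */

          if (gpa_has_guest_rsvd_bits(gpa, guest_maxphyaddr))
                  printf("inject #PF, error code %#x (P|RSVD)\n",
                         PFERR_PRESENT | PFERR_RSVD);
          return 0;
  }

That is, the P|RSVD error code would be synthesized by KVM on the #NPF
exit rather than coming from the hardware page walk itself.
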
>>> The AMD-specific issue happens in the second case. By the time the NPF
>>> vmexit occurs, the accessed and/or dirty bits have already been set,
>>> which should not happen before the RSVD page fault that we want to
>>> inject. On Intel processors, instead, EPT violations trigger before
>>> the accessed and dirty bits are set. I cannot find an explicit mention
>>> of the intended behavior in either the Intel SDM or the AMD APM.
>>
>> Section 15.25.6 of the AMD APM volume 2 talks about page faults (nested vs
>> guest) and fault ordering. It does talk about setting guest A/D bits
>> during the walk, before an #NPF is taken. I don't see any way around that
>> when the guest's virtual MAXPHYADDR is less than the host MAXPHYADDR.
>
> Right you are... Then this behavior cannot be implemented on AMD.
>
> Paolo
>