Re: [RFC KVM 18/27] kvm/isolation: function to copy page table entries for percpu buffer

From: Andy Lutomirski
Date: Tue May 14 2019 - 17:57:09 EST

> On May 14, 2019, at 2:06 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
>> On Tue, May 14, 2019 at 01:33:21PM -0700, Andy Lutomirski wrote:
>> On Tue, May 14, 2019 at 11:09 AM Sean Christopherson
>> <sean.j.christopherson@xxxxxxxxx> wrote:
>>> For IRQs it's somewhat feasible, but not for NMIs since NMIs are unblocked
>>> on VMX immediately after VM-Exit, i.e. there's no way to prevent an NMI
>>> from occurring while KVM's page tables are loaded.
>>>
>>> Back to Andy's question about enabling IRQs, the answer is "it depends".
>>> Exits due to INTR, NMI and #MC are considered high priority and are
>>> serviced before re-enabling IRQs and preemption[1]. All other exits are
>>> handled after IRQs and preemption are re-enabled.
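
For reference, that ordering is roughly the following, simplified from
vcpu_enter_guest() (not the verbatim code; on VMX, NMI and #MC exits are
actually dealt with even earlier, in the atomic exit path):

	/* VM-Exit: back in host context, IRQs still disabled */
	kvm_x86_ops->handle_external_intr(vcpu); /* INTR serviced here */
	local_irq_enable();
	preempt_enable();
	...
	r = kvm_x86_ops->handle_exit(vcpu);      /* all other exit reasons */
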
>>>
>>> A decent number of exit handlers are quite short, e.g. CPUID, most RDMSR
>>> and WRMSR, any event-related exit, etc... But many exit handlers require
>>> significantly longer flows, e.g. EPT violations (page faults) and anything
>>> that requires extensive emulation, e.g. nested VMX. In short, leaving
>>> IRQs disabled across all exits is not practical.
>>>
>>> Before going down the path of figuring out how to handle the corner cases
>>> regarding kvm_mm, I think it makes sense to pinpoint exactly what exits
>>> are a) in the hot path for the use case (configuration) and b) can be
>>> handled fast enough that they can run with IRQs disabled. Generating that
>>> list might allow us to tightly bound the contents of kvm_mm and sidestep
>>> many of the corner cases, i.e. select VM-Exits are handled with IRQs
>>> disabled using KVM's mm, while "slow" VM-Exits go through the full context
>>> switch.
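
Concretely, I read that proposal as something like the following, purely
illustrative (none of these helpers exist today, and the fast list is
just an example):

	static int handle_exit_isolated(struct kvm_vcpu *vcpu, u32 exit_reason)
	{
		switch (exit_reason) {
		case EXIT_REASON_CPUID:
		case EXIT_REASON_MSR_READ:
		case EXIT_REASON_MSR_WRITE:
			/* short handlers: IRQs stay off, kvm_mm stays loaded */
			return handle_fast_exit(vcpu, exit_reason);
		default:
			/* slow path: full switch back to the kernel mm */
			switch_to_kernel_mm(vcpu);
			local_irq_enable();
			preempt_enable();
			return handle_slow_exit(vcpu, exit_reason);
		}
	}
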
>>
>> I suspect that the context switch is a bit of a red herring. A
>> PCID-don't-flush CR3 write is IIRC under 300 cycles. Sure, it's slow,
>> but it's probably minor compared to the full cost of the vm exit. The
>> pain point is kicking the sibling thread.
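
For reference, that write is just a MOV to CR3 with the no-flush bit
set, roughly what build_cr3_noflush() produces:

	/* with CR4.PCIDE=1, CR3 bit 63 means "don't flush this PCID" */
	write_cr3(__pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH);
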
>
> Speaking of PCIDs, a separate mm for KVM would mean consuming another
> ASID, which isn't good.

I'm not sure we care. We have many logical address spaces (two per mm plus a few more). We have 4096 PCIDs, but we only use ten or so. And we have some undocumented number of *physical* ASIDs, with some undocumented mechanism by which a PCID maps to a physical ASID.
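
To put rough numbers on "ten or so": the kernel only cycles through a
few dynamic ASIDs, and with PTI each one burns two PCIDs. Simplified
from the arch/x86 tlb code (not verbatim):

	#define TLB_NR_DYN_ASIDS	6	/* logical ASIDs actually in use */

	static inline u16 kern_pcid(u16 asid)
	{
		return asid + 1;		/* dynamic ASID n uses PCID n + 1 */
	}

	static inline u16 user_pcid(u16 asid)
	{
		/* PTI userspace variant: set the high PCID bit */
		return kern_pcid(asid) | (1 << X86_CR3_PTI_PCID_USER_BIT);
	}

So even if KVM takes another slot, we're nowhere near 4096.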

I don't suppose you know how many physical ASIDs we have? And how it interacts with the VPID stuff?