RE: [PATCH RFC 0/4] 5-level EPT
From: Li, Liang Z
Date: Mon Jan 16 2017 - 21:19:42 EST
> On 29/12/2016 10:25, Liang Li wrote:
> > x86-64 currently limits the physical address width to 46 bits, which
> > can support 64 TiB of memory. Some vendors need to support more for
> > some use cases. Intel plans to extend the physical address width to
> > 52 bits in some future products.
> >
> > The current EPT implementation supports only 4-level page tables, which
> > can cover at most a 48-bit physical address width, so EPT needs to be
> > extended to 5 levels to support a 52-bit physical address width.
> >
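For reference, the arithmetic behind these widths can be sketched in C. This is only an illustration, assuming 4 KiB base pages and 9 address bits per table level; the helper names walk_addr_bits() and addr_space_bytes() are made up for this sketch and do not come from the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumption: 4 KiB base pages */
#define BITS_PER_LEVEL   9  /* assumption: 512 entries per table */

/* Address bits that a page walk with `levels` levels can translate. */
static unsigned int walk_addr_bits(unsigned int levels)
{
    assert(levels >= 1 && levels <= 5);
    return PAGE_SHIFT + BITS_PER_LEVEL * levels;
}

/* Number of bytes addressable with `bits` address bits. */
static uint64_t addr_space_bytes(unsigned int bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

Under these assumptions a 4-level walk covers 12 + 4*9 = 48 bits and a 5-level walk covers 57 bits, which is enough for a 52-bit physical address width. A 46-bit width corresponds to 64 TiB.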
> > This patchset has been tested in the SIMICS environment with a 5-level
> > paging guest, patched with Kirill's patchset that enables 5-level page
> > tables, with both EPT and shadow paging. I only covered the boot
> > process; the guest boots successfully.
> >
> > Some parts of this patchset can be improved. Any comments on the
> > design or the patches would be appreciated.
>
> I will review the patches. They seem fairly straightforward.
>
> However, I am worried about the design of the 5-level page table feature
> with respect to migration.
>
> Processors that support the new LA57 mode can write linear addresses that
> are 57-canonical but 48-noncanonical to some registers even when LA57 mode is
> inactive. This is true even of unprivileged instructions, in particular
> WRFSBASE/WRGSBASE.
>
> This is fairly bad because, if a guest performs such a write (because of a bug
> or because of malice), it will not be possible to migrate the virtual machine to
> a machine that lacks LA57 mode.
>
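To make the distinction between 57-canonical and 48-canonical concrete, here is a minimal sketch in C. It assumes the usual definition of a canonical address: every bit above the top implemented bit equals that top bit. The function name is_canonical() is chosen for this illustration and is not taken from KVM or from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `addr` is canonical for a linear address width of
 * `va_bits` bits, i.e. bits 63..(va_bits - 1) are all zero or all one.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
    assert(va_bits >= 1 && va_bits <= 64);

    /* Keep bit (va_bits - 1) and everything above it. */
    uint64_t upper = addr >> (va_bits - 1);
    /* The same bit positions with every bit set. */
    uint64_t all_ones = ~(uint64_t)0 >> (va_bits - 1);

    return upper == 0 || upper == all_ones;
}
```

For example, 0x0000800000000000 passes the check with va_bits = 57 but fails it with va_bits = 48. A value of that kind, once written to a base register on an LA57-capable host, is what the destination without LA57 could not accept.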
> Ordinarily, hypervisors trap CPUID to hide features that are only present in
> some processors of a heterogeneous cluster, and the hypervisor also traps
> for example CR4 writes to prevent enabling features that were masked away.
> In this case, however, the only way for the hypervisor to prevent the write
> would be to run the guest with CR4.FSGSBASE=0 and trap all executions of
> WRFSBASE/WRGSBASE. This
> might have negative effects on performance for workloads that use the
> instructions.
>
> Of course, this is a problem even without your patches. However, I think it
> should be addressed first. I am seriously thinking of blacklisting FSGSBASE
> completely on LA57 machines until the above is fixed in hardware.
>
> Paolo
The issue has already been forwarded to the hardware guys; we are still waiting for their feedback.
Thanks!
Liang