Re: [RFC v1 05/26] x86/traps: Add #VE support for TDX guest

From: Sean Christopherson
Date: Fri Feb 12 2021 - 16:37:52 EST


On Fri, Feb 12, 2021, Dave Hansen wrote:
> On 2/12/21 12:54 PM, Sean Christopherson wrote:
> > Ah, I see what you're thinking.
> >
> > Treating an EPT #VE as fatal was also considered as an option. IIUC it was
> > thought that finding every nook and cranny that could access a page, without
> > forcing the kernel to pre-accept huge swaths of memory, would be very difficult.
> > It'd be wonderful if that's not the case.
>
> We have to manually set up the page table entries for every physical
> page of memory (except for the hard-coded early stuff below 8MB or
> whatever). We *KNOW*, 100% before physical memory is accessed.
>
> There aren't nooks and crannies where memory is accessed. There are a
> few, very well-defined choke points which must be crossed before memory
> is accessed. Page table creation, bootmem and the core page allocator
> come to mind.

Heh, for me, that's two places too many beyond my knowledge domain to feel
comfortable putting a stake in the ground saying #VE isn't necessary.

Joking aside, I agree that treating EPT #VEs as fatal would be ideal, but from a
TDX architecture perspective, when considering all possible kernels, drivers,
configurations, etc., it's risky to say that there will _never_ be a scenario
that "requires" #VE.

What about adding a property to the TD, e.g. via a flag set during TD creation,
that controls whether unaccepted accesses cause #VE or are, for all intents and
purposes, fatal? That would allow Linux to pursue treating EPT #VEs for private
GPAs as fatal, but would give us a safety net and not prevent others from
utilizing #VEs.
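
Something along these lines, purely as a sketch of the idea. The attribute bit,
the enum and the helper name below are all made up for illustration; none of
them are real TDX or KVM definitions.

```c
#include <stdint.h>

/*
 * Hypothetical per-TD attribute bit, fixed at TD creation.  The name and the
 * bit position are assumptions for this sketch, not real TDX definitions.
 */
#define TD_ATTR_UNACCEPTED_VE	(1ULL << 0)

/* What an access to an unaccepted private GPA results in (illustrative). */
enum unaccepted_action {
	UNACCEPTED_INJECT_VE,	/* reflect the access into the guest as #VE */
	UNACCEPTED_FATAL,	/* treat the access as fatal to the TD */
};

/*
 * Pick the behavior from the attributes the TD was created with.  A guest
 * that wants "fatal" semantics, e.g. Linux, would be created with the bit
 * clear; others can opt in to #VE.
 */
static enum unaccepted_action unaccepted_access_action(uint64_t td_attrs)
{
	if (td_attrs & TD_ATTR_UNACCEPTED_VE)
		return UNACCEPTED_INJECT_VE;

	return UNACCEPTED_FATAL;
}
```

Since the choice is made once at creation, the guest wouldn't have to cope with
the behavior changing underneath it at runtime.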

I suspect it would also help with debugging, e.g. if the kernel manages to do
something stupid and maps memory it hasn't accepted, in which case debugging a
#VE in the guest is likely easier than an opaque EPT violation in the host.

> If Linux doesn't have a really good handle on which physical pages are
> accessed when, we've got bigger problems on our hands. Remember, we
> even have debugging mechanisms that unmap pages from the kernel when
> they're in the allocator. We know so well that nobody is accessing
> those physical addresses that we even tell hypervisors they can toss the
> page contents and remove the physical backing (guest free page hinting).