Re: [PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance

From: Mike Stunes
Date: Wed May 06 2020 - 14:08:57 EST




> On Apr 28, 2020, at 8:17 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
>
> From: Mike Stunes <mstunes@xxxxxxxxxx>
>
> To avoid a future VMEXIT for a subsequent CPUID function, cache the
> results returned by CPUID into an xarray.
>
> [tl: coding standard changes, register zero extension]
>
> Signed-off-by: Mike Stunes <mstunes@xxxxxxxxxx>
> Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> [ jroedel@xxxxxxx: - Wrapped cache handling into vc_handle_cpuid_cached()
> - Used lower_32_bits() where applicable
> - Moved cache_index out of struct es_em_ctxt ]
> Co-developed-by: Joerg Roedel <jroedel@xxxxxxx>
> Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
> ---
> arch/x86/kernel/sev-es-shared.c | 12 ++--
> arch/x86/kernel/sev-es.c | 119 +++++++++++++++++++++++++++++++-
> 2 files changed, 124 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
> index 03095bc7b563..0303834d4811 100644
> --- a/arch/x86/kernel/sev-es.c
> +++ b/arch/x86/kernel/sev-es.c
> @@ -744,6 +758,91 @@ static enum es_result vc_handle_mmio(struct ghcb *ghcb,
> return ret;
> }
>
> +static unsigned long sev_es_get_cpuid_cache_index(struct es_em_ctxt *ctxt)
> +{
> + unsigned long hi, lo;
> +
> + /* Don't attempt to cache until the xarray is initialized */
> + if (!sev_es_cpuid_cache_initialized)
> + return ULONG_MAX;
> +
> + lo = lower_32_bits(ctxt->regs->ax);
> +
> + /*
> + * CPUID 0x0000000d requires both RCX and XCR0, so it can't be
> + * cached.
> + */
> + if (lo == 0x0000000d)
> + return ULONG_MAX;
> +
> + /*
> + * Some callers of CPUID don't always set RCX to zero for CPUID
> + * functions that don't require RCX, which can result in excessive
> + * cached values, so RCX needs to be manually zeroed for use as part
> + * of the cache index. Future CPUID values may need RCX, but since
> + * they can't be known, they must not be cached.
> + */
> + if (lo > 0x80000020)
> + return ULONG_MAX;

If the cache is shared across CPUs, do we also need to exclude function 0x1 because it contains the LAPIC ID? (Or is the cache per-CPU?)