> and I think we should redo all or most of kvm_update_cpuid_runtime
> the same way.

Please no. xstate_required_size() requires multiple host CPUID calls, and glibc
does CPUID.0xD.0x0 and CPUID.0xD.0x1 as part of its initialization, i.e. launching
a new userspace process in the guest will see additional performance overhead due
to KVM dynamically computing the XSAVE size instead of caching it based on vCPU
state. Nested virtualization would be especially painful as every one of those
"host" CPUID invocations will trigger an exit from L1=>L0.
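
To make the cost difference concrete, here is a minimal userspace sketch, not
KVM code. Every name in it (mock_cpuid_0xd, sketch_required_size, struct
sketch_vcpu, etc.) is hypothetical, and the mock models only the AVX component
(256 bytes at offset 576) in the standard, non-compacted format. It assumes the
cached size is refreshed when XCR0 changes, which is one possible reading of
"caching it based on vCPU state".

```c
#include <stdint.h>

/* Counts simulated host CPUID executions (hypothetical instrumentation). */
static unsigned int mock_cpuid_calls;

/*
 * Mock of CPUID.0xD.<idx>: reports the size and offset of one XSAVE
 * component.  Only component 2 (AVX) is modeled; everything else is 0.
 */
static void mock_cpuid_0xd(unsigned int idx, uint32_t *size, uint32_t *offset)
{
	mock_cpuid_calls++;
	*size = (idx == 2) ? 256 : 0;
	*offset = (idx == 2) ? 576 : 0;
}

/* Dynamic computation: one "host" CPUID per enabled extended component. */
static uint32_t sketch_required_size(uint64_t xcr0)
{
	uint32_t ret = 512 + 64;	/* legacy region + XSAVE header */
	unsigned int idx;

	for (idx = 2; idx < 64; idx++) {
		uint32_t size, offset;

		if (!(xcr0 & (1ULL << idx)))
			continue;

		mock_cpuid_0xd(idx, &size, &offset);
		if (offset + size > ret)
			ret = offset + size;
	}
	return ret;
}

/* Hypothetical per-vCPU cache, refreshed only when XCR0 changes. */
struct sketch_vcpu {
	uint64_t xcr0;
	uint32_t cached_xsave_size;
};

static void sketch_set_xcr0(struct sketch_vcpu *vcpu, uint64_t xcr0)
{
	vcpu->xcr0 = xcr0;
	vcpu->cached_xsave_size = sketch_required_size(xcr0);
}

/* Guest CPUID.0xD.0x0 path: served from the cache, no host CPUID. */
static uint32_t sketch_guest_cpuid_size(const struct sketch_vcpu *vcpu)
{
	return vcpu->cached_xsave_size;
}
```

With the cache, the host CPUID cost is paid once per XCR0 write, and repeated
guest CPUID.0xD queries (e.g. one per process launch) add no further host CPUID
calls. Calling sketch_required_size() on every guest query instead would pay
that cost each time, and under nested virtualization each such call would be an
additional L1=>L0 exit.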