One downside to that is that we'd need to do the VMSAVE on every
iteration of vcpu_run(), as opposed to just once when we enter from
userspace via KVM_RUN. It ends up being a similar situation to Andy's
earlier suggestion of moving VMLOAD just after vmexit. In that case,
though, we were able to remove an MSR write to MSR_GS_BASE, which
cancelled out the overhead; in this case I think it could only cost
us extra.

If you want to micro-optimize, there is a trick you could play: if
X86_FEATURE_FSGSBASE is available, you could use WRGSBASE to restore
GSBASE and defer VMLOAD to vcpu_put(). This would need benchmarking
on Zen 3 to see if it's worthwhile.
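
To make the accounting concrete, here is a toy userspace model of the
trick. It is not KVM code and touches no hardware: it only counts how
often each instruction would be issued. The names op_counts and
model_kvm_run() are made up for illustration, and the model assumes
exactly one vcpu_put() per KVM_RUN and that the baseline does a VMLOAD
right after every vmexit.

```c
#include <stdbool.h>

/* Hypothetical counters; nothing here maps to real KVM structures. */
struct op_counts {
    unsigned int vmload;    /* VMLOAD instructions issued */
    unsigned int wrgsbase;  /* WRGSBASE instructions issued */
};

/*
 * Model one KVM_RUN in which vcpu_run() loops `iterations` times.
 * Assumption: one vmexit per iteration, one vcpu_put() at the end.
 */
struct op_counts model_kvm_run(unsigned int iterations, bool has_fsgsbase)
{
    struct op_counts c = { 0, 0 };
    unsigned int i;

    for (i = 0; i < iterations; i++) {
        if (has_fsgsbase)
            c.wrgsbase++;   /* restore only GSBASE after vmexit */
        else
            c.vmload++;     /* baseline: full VMLOAD after vmexit */
    }

    if (has_fsgsbase)
        c.vmload++;         /* deferred VMLOAD, modelled in vcpu_put() */

    return c;
}
```

Under those assumptions, 1000 iterations with FSGSBASE come out as 1000
WRGSBASE plus a single VMLOAD, versus 1000 VMLOADs without it. The
counts say nothing about cycles, so the benchmarking caveat still
applies.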