RFC, untested: handling of MSR immediates and MSRs on Xen

From: H. Peter Anvin
Date: Wed Oct 23 2024 - 17:32:21 EST


So the immediate forms of WRMSRNS and RDMSR are now official: they are documented in the latest edition (Oct 2024) of the Intel ISE document, see:

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

I have been thinking about how to (a) leverage these instructions to the best effect and (b) get rid of the code overhead associated with Xen paravirtualization of a handful of MSRs. As it turns out, the vast majority of MSRs under Xen are simply passed through anyway; a handful (perf related) are handled differently, and a small number are ignored.
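To make the pass-through / special-cased / ignored split concrete, here is a small C model of the classification. It is a sketch under stated assumptions: the enum, xen_classify_msr(), and the two example MSRs (IA32_PERF_GLOBAL_CTRL standing in for the perf-related group, LSTAR standing in for the ignored group) are illustrative choices, not a copy of Xen's actual lists.

```c
#include <assert.h>
#include <stdint.h>

/* How a Xen PV guest's MSR access is treated (illustrative model only). */
enum xen_msr_class {
	XEN_MSR_PASSTHROUGH,	/* common case: behaves like a native RDMSR/WRMSR */
	XEN_MSR_SPECIAL,	/* e.g. perf-related MSRs routed to dedicated code */
	XEN_MSR_IGNORED,	/* accesses that are simply dropped */
};

/*
 * Hypothetical classifier. The two listed MSRs are assumed examples;
 * the real lists live in the Xen PV code and are not reproduced here.
 */
static enum xen_msr_class xen_classify_msr(uint32_t msr)
{
	switch (msr) {
	case 0x38f:		/* assumed perf MSR: IA32_PERF_GLOBAL_CTRL */
		return XEN_MSR_SPECIAL;
	case 0xc0000082:	/* assumed ignored MSR: LSTAR */
		return XEN_MSR_IGNORED;
	default:		/* everything else goes straight to hardware */
		return XEN_MSR_PASSTHROUGH;
	}
}
```

The default case being pass-through is the point: only a short list of MSRs needs anything other than the plain instruction.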

The immediate forms of these instructions are primarily motivated by performance, not code size: with the MSR number in an immediate, it is available *much* earlier in the pipeline, which gives the hardware much more leeway in how a particular MSR is handled.
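One practical consequence, sketched here assuming GCC or Clang (__builtin_constant_p is a compiler extension): the immediate encoding can only be emitted when the MSR index is a build-time constant, so a wrapper would have to fall back to the legacy form, with the index in ECX, otherwise. MSR_ENCODING() is a hypothetical macro that merely reports which form would apply.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: report which encoding a wrapper could pick.
 * The immediate form needs the MSR index at build time; otherwise the
 * index has to be passed in ECX as with the legacy RDMSR/WRMSR.
 */
#define MSR_ENCODING(msr) \
	(__builtin_constant_p(msr) ? "immediate" : "register (ECX)")
```

For example, MSR_ENCODING(0x10) yields "immediate", while an index read from a volatile variable yields "register (ECX)".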

Furthermore, we want to continue to minimize the overhead caused by the remaining users of paravirtualization. The only PV platform left that intercepts MSRs is Xen.
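A rough user-space analogue of keeping that overhead off non-Xen systems: choose the MSR write path once at boot rather than checking for Xen on every access. All names here (native_wrmsr, xen_wrmsr, select_msr_path, do_wrmsr) are hypothetical, and the function pointer is only a stand-in: alternatives patching rewrites the instruction bytes at the call site, so the native path does not even pay for an indirect call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int (*msr_write_fn)(uint32_t msr, uint64_t val);

static int calls_native, calls_xen;	/* counters so the model is observable */

static int native_wrmsr(uint32_t msr, uint64_t val)
{
	(void)msr; (void)val;
	calls_native++;		/* stands in for an inline WRMSR(NS) */
	return 0;
}

static int xen_wrmsr(uint32_t msr, uint64_t val)
{
	(void)msr; (void)val;
	calls_xen++;		/* stands in for a call into the Xen asm wrapper */
	return 0;
}

static msr_write_fn wrmsr_path = native_wrmsr;

/* Run once during early boot; the kernel would patch the call site instead. */
static void select_msr_path(bool running_on_xen_pv)
{
	wrmsr_path = running_on_xen_pv ? xen_wrmsr : native_wrmsr;
}

/* Every MSR write goes through the path chosen at boot, with no per-call test. */
static int do_wrmsr(uint32_t msr, uint64_t val)
{
	return wrmsr_path(msr, val);
}
```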

So, as per previous discussions, what we want to do is:

- Have Xen handled by the normal alternatives patching;
- Use an assembly wrapper around the Xen-specific code;
- Allow Xen to invoke the standard error handler by adding a new
  exception intercept type: EX_TYPE_INDIRECT. This exception type
  takes a register (i.e. _ASM_EXTABLE_TYPE_REG) and then looks up
  the exception handler at the address pointed to by that register.
  This lets the Xen assembly wrapper deal with errors by:

	/* let CF be set on error here (any flag condition works) */
	jc	.L_error
	ret
.L_error:
	pop	%rdx	/* Drop return address */
	sub	$5,%rdx	/* Rewind to the beginning of the CALL instruction */
1:	ud2		/* Any unconditionally trapping instruction */
	_ASM_EXTABLE_TYPE_REG(1b, 1b /* unused */, EX_TYPE_INDIRECT, %rdx)
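The lookup that the wrapper relies on can be modeled in user-space C. This is a simplified sketch: the table layout, search_ex(), fixup_exception(), EX_TYPE_DEFAULT and the "one level of indirection" rule are assumptions for illustration, not the kernel's exception table or the concept patch. It shows the two steps: the trap at the UD2 finds an EX_TYPE_INDIRECT entry, and the handler then re-looks-up the address held in %rdx, which the wrapper has rewound by 5 bytes (the size of a CALL rel32: opcode E8 plus a 4-byte displacement) so that it names the original call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified exception table; the kernel stores relative offsets. */
enum ex_type { EX_TYPE_DEFAULT, EX_TYPE_INDIRECT };
enum { REG_RDX, NR_REGS };

struct ex_entry {
	uintptr_t insn;		/* address of the faulting instruction */
	uintptr_t fixup;	/* resume address (unused for INDIRECT) */
	enum ex_type type;
	int reg;		/* register holding the real address (INDIRECT only) */
};

#define CALL_INSN_SIZE 5	/* E8 + rel32: why the wrapper does "sub $5,%rdx" */

static const struct ex_entry *search_ex(const struct ex_entry *tbl, size_t n,
					uintptr_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return &tbl[i];
	return NULL;
}

/* Returns the address to resume at, or 0 if the fault is unhandled. */
static uintptr_t fixup_exception(const struct ex_entry *tbl, size_t n,
				 uintptr_t ip, const uintptr_t regs[NR_REGS])
{
	const struct ex_entry *e = search_ex(tbl, n, ip);

	if (e && e->type == EX_TYPE_INDIRECT) {
		/* Use the entry for the address in the register: the CALL site. */
		e = search_ex(tbl, n, regs[e->reg]);
		/* Allow only one level of indirection. */
		if (e && e->type == EX_TYPE_INDIRECT)
			return 0;
	}
	return e ? e->fixup : 0;
}
```

The net effect is that a failing MSR access through the Xen wrapper resumes at the same fixup the call site would use if it had been a native RDMSR/WRMSR that faulted.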

Rather than trying to explain the whole mechanism, I'm including a crude-and-totally-untested concept implementation for comments and, hopefully, eventual productization.

Note: I haven't added tracepoint handling yet. *Ideally* tracepoints would be patched over the main call site instead of using a separate static_key(), which also messes up register allocation due to the subsequent call. This is a general problem with tracepoints, which is perhaps better handled separately.

-hpa
