Re: [PATCH v1 2/4] x86/fred: Write to FRED MSRs with wrmsrns()
From: Andrew Cooper
Date: Fri Jul 05 2024 - 06:30:38 EST
On 05/07/2024 10:44 am, Borislav Petkov wrote:
> On Thu, Jul 04, 2024 at 07:45:00PM -0700, H. Peter Anvin wrote:
>> Except that that would be more cleanly spelled wrmsrns()... let's just
>> abstract the whole thing away and let the user use wrmsrns() whenever
>> serialization is not necessary.
> Just don't make it more complex and unreadable than it has to be.
>
> cpu_feature_enabled() already is patching things for optimal perf so even if
> PeterZ prefers the alternative, I say it is ugly and, more importantly,
> unnecessary.
You cite perf. Look at the disassembly of the two approaches...
cpu_feature_enabled() might give you warm fuzzy feelings that you've
eekd out every ounce of performance, but it's an absolute disaster at a
code generation level by forcing the compiler to lay out both side and
preventing any kind of CSE. As I've reported before, count the number
of RDPKRU instructions in trivial-looking xsave handling functions for a
glimpse of the practical consequences.
Anyway, none of this is the complicated aspect. The complicated issue
is the paravirt wrmsr().
TGLX's complaint is that everyone turns on CONFIG_PARAVIRT, and the
paravirt hook for wmsr() is a code generation disaster WRT parameter
handling. I agree that it's not great, although it's got nothing on the
damage done by cpu_feature_enabled().
But, seeing as I've got everyone's attention, I'll repeat my proposal
for fixing this nicely, in the hope of any feedback on the 3rd posting...
The underlying problem is that parameter setup for the paravirt wrmsr()
follows a C calling convention, so the index/data are manifested into
%rdi/%rsi. Then, the out-line "native" hook shuffles the index/data
back into %ecx/%edx/%eax, and this cost is borne in all kernels.
Instead, the better way would be to have a hook with a non-standard
calling convention which happens to match the WRMSR instruction.
That way, the native, and simple paravirt paths inline to a single
instruction with no extraneous parameter shuffling, and the shuffling
cost is borne by PARAVIRT_XXL only, where a reg/reg move is nothing
compared to the hypercall involved.
The only complication is the extable #GP hook, but that's fine to place
at the paravirt site as long as the extable handler confirms the #GP
came from a WRMSR{NS,} and not a branch.
~Andrew