Re: [PATCH v1 2/4] x86/fred: Write to FRED MSRs with wrmsrns()

From: Xin Li
Date: Tue Jul 09 2024 - 09:59:41 EST


On 7/5/2024 6:45 AM, Borislav Petkov wrote:
On Fri, Jul 05, 2024 at 11:30:16AM +0100, Andrew Cooper wrote:
You cite perf.  Look at the disassembly of the two approaches...

cpu_feature_enabled() might give you warm fuzzy feelings that you've
eekd out every ounce of performance, but it's an absolute disaster at a
code generation level by forcing the compiler to lay out both side and
preventing any kind of CSE.  As I've reported before, count the number
of RDPKRU instructions in trivial-looking xsave handling functions for a
glimpse of the practical consequences.

Yes, I do cite perf because what you have above is not saying: "yes, this is
a fast path and doing an alternative is warranted." If that is the case, sure,
by all means. If not, make the C readable and ignore code generation. Who
cares.

Anyway, none of this is the complicated aspect.  The complicated issue
is the paravirt wrmsr().

TGLX's complaint is that everyone turns on CONFIG_PARAVIRT, and the
paravirt hook for wmsr() is a code generation disaster WRT parameter
handling.  I agree that it's not great, although it's got nothing on the
damage done by cpu_feature_enabled().


But, seeing as I've got everyone's attention, I'll repeat my proposal
for fixing this nicely, in the hope of any feedback on the 3rd posting...

The underlying problem is that parameter setup for the paravirt wrmsr()
follows a C calling convention, so the index/data are manifested into
%rdi/%rsi.  Then, the out-line "native" hook shuffles the index/data
back into %ecx/%edx/%eax, and this cost is borne in all kernels.

A handful of reg ops per a WRMSRNS? Meh, same argument as above. But...

Instead, the better way would be to have a hook with a non-standard
calling convention which happens to match the WRMSR instruction.

That way, the native, and simple paravirt paths inline to a single
instruction with no extraneous parameter shuffling, and the shuffling
cost is borne by PARAVIRT_XXL only, where a reg/reg move is nothing
compared to the hypercall involved.

The only complication is the extable #GP hook, but that's fine to place
at the paravirt site as long as the extable handler confirms the #GP
came from a WRMSR{NS,} and not a branch.

... yes, I'd gladly review patches which address that and make the whole deal
cleaner. I'm still sceptical those handful of regs shuffling ops would matter
in any benchmark but sure, if it can be done in a cleaner way, why not...

Unless I'm missing some use case where that overhead really matters. Then by
all means...


It looks that it no longer makes sense to include this patch in this
patchset; it is not something that can be done as a small cleanup.

Any objection?

Thanks!
Xin