Re: [PATCH v3] KVM: x86: use wrpkru directly in kvm_load_{guest|host}_xsave_state

From: Dave Hansen
Date: Wed May 12 2021 - 16:02:40 EST


On 5/12/21 12:41 AM, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 01:05:02PM -0400, Jon Kohler wrote:
>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>> index 8d33ad80704f..5bc4df3a4c27 100644
>> --- a/arch/x86/include/asm/fpu/internal.h
>> +++ b/arch/x86/include/asm/fpu/internal.h
>> @@ -583,7 +583,13 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>> if (pk)
>> pkru_val = pk->pkru;
>> }
>> - __write_pkru(pkru_val);
>> +
>> + /*
>> + * WRPKRU is relatively expensive compared to RDPKRU.
>> + * Avoid WRPKRU when it would not change the value.
>> + */
>> + if (pkru_val != rdpkru())
>> + wrpkru(pkru_val);
> Just wondering; why aren't we having that in a per-cpu variable? The
> usual per-cpu MSR shadow approach avoids issuing any 'special' ops
> entirely.

It could be a per-cpu variable. When I wrote this originally I figured
that a rdpkru would be cheaper than a load from memory (even per-cpu
memory).

But, now that I think about it, assuming that 'prku_val' is in %rdi, doing:

cmp %gs:0x1234, %rdi

might end up being cheaper than clobbering a *pair* of GPRs with rdpkru:

xor %ecx,%ecx
rdpkru
cmp %rax, %rdi

I'm too lazy to go figure out what would be faster in practice, though.
Does anyone care?