Re: [PATCH v2] x86/vdso: Use RDPID in preference to LSL when available

From: Andy Lutomirski
Date: Thu Apr 21 2016 - 11:26:12 EST


On Thu, Apr 21, 2016 at 5:16 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Wed, Apr 20, 2016 at 06:16:01PM -0700, Andy Lutomirski wrote:
>> Also, it's time for someone to do UMIP. I'll see if I can convince
>> someone in KVM land to emulate it to make it easier to test.
>
> That'll be fun - we can simply set that bit in CR4 and see who screams
> :-P
>
>> Changes from v1:
>> - Remove rdpid() from special_instructions.h. (Was a leftover -- sorry.)
>>
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/vgtod.h | 7 ++++++-
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 7bfb6b70c745..beaf2fb601ee 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -279,6 +279,7 @@
>> /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
>> #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */
>> #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
>> +#define X86_FEATURE_RDPID (16*32+ 22) /* RDPID instruction */
>>
>> /*
>> * BUG word(s)
>> diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
>> index e728699db774..3a01996db58f 100644
>> --- a/arch/x86/include/asm/vgtod.h
>> +++ b/arch/x86/include/asm/vgtod.h
>> @@ -89,8 +89,13 @@ static inline unsigned int __getcpu(void)
>> * works on all CPUs. This is volatile so that it orders
>> * correctly wrt barrier() and to keep gcc from cleverly
>> * hoisting it out of the calling function.
>> + *
>> + * If RDPID is available, use it.
>> */
>> - asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
>> + alternative_io ("lsl %[p],%[seg]",
>> + ".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
>
> AFAICT, 0xf8 is correct, if I'm reading the SDM right:
>
> bits [7:6] must be 11b for opcode group 9 and RDPID is in the 11b row,
> bits [5:3] are ModRM.reg and they need to be 111b for RDPID (0x7 column)
> and the last three [2:0] select the register and they must be 000b for
> rAX.
>
> HOWEVER, you need to make the asm output register constraint "=a"
> because you're specifying rAX as a destination register for RDPID.

Didn't I?
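
For anyone checking the encoding, the ModRM arithmetic quoted above can
be reproduced in a few lines of C. This is only a sketch: modrm() and
rdpid_modrm() are hypothetical helper names used for illustration, not
anything in the patch or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three ModRM fields into one byte: mod[7:6] reg[5:3] rm[2:0]. */
static inline uint8_t modrm(unsigned int mod, unsigned int reg, unsigned int rm)
{
	assert(mod <= 3 && reg <= 7 && rm <= 7);
	return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/*
 * RDPID lives in opcode group 9 with mod = 11b and reg = 111b;
 * rm selects the destination register (000b is rAX).
 */
static inline uint8_t rdpid_modrm(unsigned int dst_reg)
{
	return modrm(3, 7, dst_reg);
}
```

With rm = 0 this yields 0xf8, matching the last byte of the .byte
sequence in the patch. Since the bytes are hardcoded, the destination
is fixed to rAX, which is why the output constraint has to be "=a".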

>
> Also, I'm wondering: should we supply that alternative in a separate
> inline function in special_instructions.h for wider use? I.e., something
> like read_cpu_num() or so...
>

I thought about it, and there were two reasons not to:

1. I don't think we want to use __getcpu in the kernel. LSL is fairly
slow, and we'd still need to mask off the node number.
raw_smp_processor_id(), in contrast, is a single load.

2. I have no way to benchmark this thing. I'm assuming RDPID will
be faster than LSL, but that doesn't mean it's faster than a load.
(It could be -- it will save a cache line.)

So we might actually want something that does an alternative where the
two choices are the percpu load and RDPID ; AND, but that wouldn't end
up sharing code. But I'll leave that to someone with an actual
RDPID-supporting CPU :)
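
To make the "mask off the node number" step concrete, here is a rough
userspace sketch of the AND that would follow either LSL or RDPID. It
assumes the packed value keeps the CPU number in the low 12 bits and
the node number above them; CPU_BITS, struct cpu_node and
split_cpu_node() are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Assumption: low 12 bits hold the CPU number, the rest is the node. */
#define CPU_BITS 12
#define CPU_MASK ((1u << CPU_BITS) - 1)

struct cpu_node {
	unsigned int cpu;
	unsigned int node;
};

/* Split a packed value, as returned by LSL or RDPID, into CPU and node. */
static inline struct cpu_node split_cpu_node(uint32_t p)
{
	struct cpu_node r = {
		.cpu  = p & CPU_MASK,	/* the AND after RDPID */
		.node = p >> CPU_BITS,
	};
	return r;
}
```

The percpu load used by raw_smp_processor_id() needs neither the mask
nor the shift, which is part of why it is hard to guess which variant
would win without measuring.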

--Andy