Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue

From: Denys Vlasenko
Date: Mon Apr 27 2015 - 06:07:52 EST


On 04/27/2015 10:53 AM, Borislav Petkov wrote:
> On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
>>> +#define X86_BUG_CANONICAL_RCX X86_BUG(8) /* SYSRET #GPs when %RCX non-canonical */
>>
>> I think that "sysret" should appear in the name.
>
> Yeah, I thought about it too, will fix.
>
>> Oh no! My laptop is currently bug-free, and you're breaking it! :)
>
> Muahahahhahaha...
>
>>> +
>>> + /*
>>> + * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
>>> + * in kernel space. This essentially lets the user take over
>>> + * the kernel, since userspace controls RSP.
>>> + */
>>> + ALTERNATIVE "jmp 1f", "", X86_BUG_CANONICAL_RCX
>>> +
>>
>> I know it would be ugly, but would it be worth saving two bytes by
>> using ALTERNATIVE "jmp 1f", "shl ...", ...?
>>
>>> /* Change top 16 bits to be the sign-extension of 47th bit */
>>> shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
>>> sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
>>> @@ -432,6 +436,7 @@ syscall_return:
>>> cmpq %rcx, %r11
>>> jne opportunistic_sysret_failed
>
> You want to stick all 4 insns in the alternative? Yeah, it should work
> but it might be even more unreadable than it is now.
>
> Btw, we can do this too:
>
> ALTERNATIVE "", \
> "shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx; \
> sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx; \
> cmpq %rcx, %r11; \
> jne opportunistic_sysret_failed", \
> X86_BUG_SYSRET_CANONICAL_RCX
>
> which will replace the 2-byte JMP with a lot of NOPs on AMD.

The instructions you want to NOP out are translated to these bytes:

2c2: 48 c1 e1 10 shl $0x10,%rcx
2c6: 48 c1 f9 10 sar $0x10,%rcx
2ca: 49 39 cb cmp %rcx,%r11
2cd: 75 5f jne 32e <opportunistic_sysret_failed>

According to http://instlatx64.atw.hu/
CPUs from both AMD and Intel are happy to eat "66,66,66,90" NOPs
with maximum throughput; more than three 66 prefixes slow decode down,
sometimes horrifically (from 3 insns per cycle to one insn per ~10 cycles).

Probably doing something like this

/* Three 0x66 prefixes per NOP decode fast on all CPUs; a lone
   trailing 0x90 covers the 13th byte */
ALTERNATIVE ".byte 0x66,0x66,0x66,0x90; \
.byte 0x66,0x66,0x66,0x90; \
.byte 0x66,0x66,0x66,0x90; \
.byte 0x90", \
"shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx; \
sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx; \
cmpq %rcx, %r11; \
jne opportunistic_sysret_failed", \
X86_BUG_SYSRET_CANONICAL_RCX

would be better than letting ALTERNATIVE generate 13 one-byte NOPs.

--
vda
