Re: [PATCH v2 0/6] x86: Clean up fast syscall return validation
From: H. Peter Anvin
Date: Fri Oct 06 2023 - 15:00:27 EST
On 10/5/23 13:20, Ingo Molnar wrote:
* Brian Gerst <brgerst@xxxxxxxxx> wrote:
Looking at the compiled output, the only suboptimal code appears to be
the canonical address test, where the C code uses the CL register for
the shifts instead of immediates.
180: e9 00 00 00 00 jmp 185 <do_syscall_64+0x85>
181: R_X86_64_PC32 .altinstr_aux-0x4
185: b9 07 00 00 00 mov $0x7,%ecx
18a: eb 05 jmp 191 <do_syscall_64+0x91>
18c: b9 10 00 00 00 mov $0x10,%ecx
191: 48 89 c2 mov %rax,%rdx
194: 48 d3 e2 shl %cl,%rdx
197: 48 d3 fa sar %cl,%rdx
19a: 48 39 d0 cmp %rdx,%rax
19d: 75 39 jne 1d8 <do_syscall_64+0xd8>
Yeah, it didn't look equivalent - so I guess we want a C equivalent for:
- ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
- "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
instead of the pgtable_l5_enabled() runtime test that
__is_canonical_address() uses?
I don't think such a thing (without simply duplicate the above as an
alternative asm, which is obviously easy enough, and still allows the
compiler to pick the register used) would be possible without immediate
patching support[*].
Incidentally, this is a question for Uros: is there a reason this is a
mov to %ecx and not just %cl, which would save 3 bytes?
Incidentally, it is possible to save one instruction and use only *one*
alternative immediate:
leaq (%rax,%rax),%rdx
xorq %rax,%rdx
shrq $(63 - LA),%rdx # Yes, 63, not 64
# ZF=1 if canonical
This works because if bit [x] is set in the output, then bit [x] and
[x-1] in the input are different (bit [-1] considered to be zero); and
by definition a bit is canonical if and only if all the bits [63:LA] are
identical, thus bits [63:LA+1] in the output must all be zero.
The first two instructions are pure arithmetic and can thus be done in C:
bar = foo ^ (foo << 1);
... leaving only one instruction needing to be patched at runtime.
-hpa
[*] which is a whole bit of infrastructure that I know first-hand is
extremely complex[**] -- I had an almost-complete patchset done at one
time, but it has severely bitrotted. The good part is that it is
probably a lot easier to do today, with the alternatives-patching
callback routines available.
[**] the absolute best would of course be if such infrastructure could
be compiler-supported (probably as part as some really fancy support for
alternatives/static branch), but that would probably be a nightmare to
do in the compiler or a plugin.