Re: [GIT PULL] x86/asm changes for v5.6
From: Luck, Tony
Date: Wed Jan 29 2020 - 14:52:11 EST
On Wed, Jan 29, 2020 at 07:34:04PM +0100, Borislav Petkov wrote:
> > So I'm just hand-waving. Maybe there was some simpler explanation
> > (like me just picking the wrong instructions when I did the rough
> > conversion and simply breaking things with some stupid bug).
>
> Looks like it. So I did this:
>
That looked plausible enough to risk on my remote machine. See below.
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
> cmp %rdi, %r8
> jg 2f
>
> - /* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> - ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> - ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
> /*
> - * movsq instruction have many startup latency
> - * so we handle small size by general register.
> - */
> - cmp $680, %rdx
> - jb 3f
> - /*
> - * movsq instruction is only good for aligned case.
> + * Three rep-string alternatives:
> + * - go to "movsq" for large regions where source and dest are
> + * mutually aligned (same in low 8 bits). "label 4"
> + * - plain rep-movsb for FSRM
> + * - rep-movs for > 32 byte for ERMS.
Maybe list the code paths in the same order that they appear in the
code below? So ERMS, then FSRM.
> */
> +.Lmemmove_begin_forward:
> + ALTERNATIVE_2 \
> + "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> + "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> + "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>
> - cmpb %dil, %sil
> - je 4f
> 3:
> sub $0x20, %rdx
> /*
>
> ---
>
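For anyone reading along: the three bodies the ALTERNATIVE_2 above selects between can be sketched in C roughly as below. This is an illustrative translation only, not kernel code; in the kernel the choice is made once at alternatives-patching time from CPU feature bits, whereas here a run-time argument stands in for that, and the enum/function names are mine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

enum cpu_flavor { STOCK, ERMS, FSRM };

/* Sketch of the forward-copy dispatch in the patched __memmove. */
static void forward_copy(void *dst, const void *src, size_t len,
			 enum cpu_flavor cpu)
{
	if (cpu == FSRM) {
		/* FSRM: "rep movsb" with no length check at all */
		memcpy(dst, src, len);		/* stands in for rep movsb */
	} else if (cpu == ERMS && len >= 0x20) {
		/* ERMS: "rep movsb", but only for copies of >= 32 bytes */
		memcpy(dst, src, len);		/* stands in for rep movsb */
	} else if (cpu == STOCK && len >= 680 &&
		   (((uintptr_t)dst ^ (uintptr_t)src) & 0xff) == 0) {
		/* stock: large region, source and dest mutually aligned
		 * (same low 8 bits) -- the "movsq" path at label 4 */
		memcpy(dst, src, len);		/* stands in for rep movsq */
	} else {
		/* everything else: the general-register copy loops
		 * at labels 1 and 3 */
		memcpy(dst, src, len);
	}
}
```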
So the kernel booted. I grabbed the address of memmove from
/proc/kallsyms and used:
# objdump -d --start-address=0x{blah blah} /proc/kcore
and things look OK in there. It picked the FSRM option and simply
uses "rep movsb". The patched area is padded with various NOPs after
the retq, then the unpatched code resumes with the "sub $0x20,%rdx".
ffffffff95946840: 48 89 f8 mov %rdi,%rax
ffffffff95946843: 48 39 fe cmp %rdi,%rsi
ffffffff95946846: 7d 0f jge 0xffffffff95946857
ffffffff95946848: 49 89 f0 mov %rsi,%r8
ffffffff9594684b: 49 01 d0 add %rdx,%r8
ffffffff9594684e: 49 39 f8 cmp %rdi,%r8
ffffffff95946851: 0f 8f a9 00 00 00 jg 0xffffffff95946900
ffffffff95946857: 48 89 d1 mov %rdx,%rcx
ffffffff9594685a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff9594685c: c3 retq
ffffffff9594685d: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
ffffffff95946864: 00
ffffffff95946865: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
ffffffff9594686c: 00
ffffffff9594686d: 66 90 xchg %ax,%ax
ffffffff9594686f: 48 83 ea 20 sub $0x20,%rdx
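The kallsyms lookup step is easy to script; below is a minimal C sketch of
parsing a /proc/kallsyms-style line ("address type name") to pull out a
symbol's address, which can then be fed to objdump as above. The helper name
is mine, and the sample line in the comment reuses the address from the dump
above rather than anything you should expect on another boot.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/kallsyms-style line, e.g.
 *   "ffffffff95946840 T __memmove"
 * and return the address if the symbol name matches, or 0 otherwise. */
static unsigned long long symbol_addr(const char *line, const char *want)
{
	unsigned long long addr;
	char type;
	char name[128];

	if (sscanf(line, "%llx %c %127s", &addr, &type, name) == 3 &&
	    strcmp(name, want) == 0)
		return addr;
	return 0;
}
```

Note that reading the real /proc/kallsyms needs root (or kptr_restrict=0),
otherwise the addresses all show up as zeros.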
So:
Tested-by: Tony Luck <tony.luck@xxxxxxxxx>
-Tony