Re: [GIT PULL] x86/asm changes for v5.6

From: Damian Tometzki
Date: Thu Jan 30 2020 - 00:48:09 EST




Am 29.01.20 um 19:34 schrieb Borislav Petkov:
On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:

This returns "3" ... not what we want. But swapping the ERMS/FSRM order
gets the correct version.

That actually makes sense, and is what I suspected (after I wrote the
patch) what would happen. It would just be good to have it explicitly
documented at the macro.

Like this?

---
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 13adca37c99a..d94bad03bcb4 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
".popsection\n"
+/*
+ * The patching happens in the natural order of first newinstr1 and then
+ * newinstr2, iff the respective feature bits are set. See apply_alternatives()
+ * for details.
+ */
#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
OLDINSTR_2(oldinstr, 1, 2) \
".pushsection .altinstructions,\"a\"\n" \

That would be bad indeed, but I don't think it should matter
particularly for this case - it would have been bad before too.

I suspect there is some other issue going on, like the NOP padding
logic being confused.

Or the cmp $0x20 test missing in the default case, see below.

In particular, look here, this is the alt_instruction entries:

altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b

where the arguments are "orig alt feature orig_len alt_len pad_len" in
that order.

Notice how "pad_len" in both cases is the padding from the _original_
instruction (142b-141b).

<snip this which I'll take a look later so that we can sort out the
issue at hand first>

So I'm just hand-waving. Maybe there was some simpler explanation
(like me just picking the wrong instructions when I did the rough
conversion and simply breaking things with some stupid bug).

Looks like it. So I did this:

---
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 7ff00ea64e4f..a670d01570df 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
cmp %rdi, %r8
jg 2f
- /* FSRM implies ERMS => no length checks, do the copy directly */
-.Lmemmove_begin_forward:
- ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
- ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
-
/*
- * movsq instruction have many startup latency
- * so we handle small size by general register.
- */
- cmp $680, %rdx
- jb 3f
- /*
- * movsq instruction is only good for aligned case.
+ * Three rep-string alternatives:
+ * - go to "movsq" for large regions where source and dest are
+ * mutually aligned (same in low 8 bits). "label 4"
+ * - plain rep-movsb for FSRM
+ * - rep-movs for > 32 byte for ERMS.
*/
+.Lmemmove_begin_forward:
+ ALTERNATIVE_2 \
+ "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
+ "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
+ "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
- cmpb %dil, %sil
- je 4f
3:
sub $0x20, %rdx
/*

---

Notice how the *first* option of the alternative, which is the default
one, has that gotten that additional "cmp $0x20, %rdx; jb 1f" test which
sends us down to the less than 32 bytes copy length.

Your original version didn't have it and here's what I saw:

So I stopped the guest just before that code and the call trace looked
like this:

#0 memmove () at arch/x86/lib/memmove_64.S:43
#1 0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0,
size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553
#2 0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>,
nid=73400320, flags=<optimized out>) at mm/memblock.c:641
#3 0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830
#4 0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798
#5 0xffffffff823f9ae1 in start_kernel () at init/main.c:598
#6 0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242
#7 0x0000000000000000 in ?? ()

and count in rdx was:

rdx 0x18 24

Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test
and we run into the following asm at label 3:

3:
sub $0x20, %rdx
/*
* We gobble 32 bytes forward in each loop.
*/

<--- right here %rdx is:

rdx 0xfffffffffffffff8 -8

and yeeehaaw, we're in the weeds and then end up triplefaulting at some
unmapped source address in %rsi or so.

So now I'm going to play all three variants with pen and paper to make
sure we're still sane.

Thx.

Hello together,

in my qemu env the system isnt coming up. I tried both with and without the changes from Borislav.

Best regards
Damian


0.605193] ------------[ cut here ]------------
[ 0.605933] General protection fault in user access. Non-canonical address?
[ 0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0x48/0x50
[ 0.606931] Modules linked in:
[ 0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
[ 0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
[ 0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00 75 dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00 00 <0f> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
[ 0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
[ 0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX: 0000000000000000
[ 0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI: ffffffff8316e05f
[ 0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09: 0000000000029840
[ 0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12: 0000000000000000
[ 0.606931] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.606931] FS: 0000000000000000(0000) GS:ffff88800f800000(0000) knlGS:0000000000000000
[ 0.606931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000006f0
[ 0.606931] Call Trace:
[ 0.606931] fixup_exception+0x3e/0x51
[ 0.606931] do_general_protection+0x9c/0x260
[ 0.606931] general_protection+0x28/0x30
[ 0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
[ 0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
[ 0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
[ 0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX: 0000000000000001
[ 0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI: ffffc900000dbdc0
[ 0.606931] RBP: ffffffffffffffff R08: ffff88800f82d280 R09: 0000000000000073
[ 0.606931] R10: 0000000000000000 R11: ffffc900000dbd68 R12: 0000000000000dc0
[ 0.606931] R13: 00000000000000fe R14: ffffffff81252d01 R15: 009896808086a201
[ 0.606931] ? __register_sysctl_table+0x361/0x500
[ 0.606931] __probe_kernel_read+0x2e/0x60
[ 0.606931] __kmalloc+0x10b/0x230
[ 0.606931] __register_sysctl_table+0x361/0x500
[ 0.606931] ? kmem_cache_alloc_trace+0xee/0x1e0
[ 0.606931] __register_sysctl_paths+0x186/0x1b0
[ 0.606931] ? iomap_init+0x1b/0x1b
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] dquot_init+0x23/0x117
[ 0.606931] ? iomap_init+0x1b/0x1b
[ 0.606931] do_one_initcall+0x31/0x1b0
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] kernel_init_freeable+0x15b/0x1b3
[ 0.606931] ? rest_init+0x9a/0x9a
[ 0.606931] kernel_init+0x5/0xf6
[ 0.606931] ret_from_fork+0x35/0x40
[ 0.606931] ---[ end trace 8ee2a58282a5eb54 ]---
[ 0.650539] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC PTI