[RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory

From: Mathieu Desnoyers
Date: Wed Jun 24 2015 - 12:16:15 EST


When changing a memory allocation from kmalloc to vmalloc, to cope with
memory fragmentation when reallocating a growing string within a kernel
module, our test suite started triggering a kernel OOPS. It triggers
when the string is copied piece-wise into a ring buffer using memcpy.

Here is the OOPS:

[ 4078.314978] BUG: unable to handle kernel paging request at ffffc900038d995e
[ 4078.315824] IP: [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 4078.315824] PGD 236c92067 PUD 236c93067 PMD bac0c067 PTE 0
[ 4078.315824] Oops: 0000 [#1] SMP
[ 4078.315824] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio
[ 4078.315824] CPU: 5 PID: 4258 Comm: lttng-consumerd Tainted: G O 4.1.0 #7
[ 4078.315824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 4078.315824] task: ffff8802350c3660 ti: ffff8800bae84000 task.ti: ffff8800bae84000
[ 4078.315824] RIP: 0010:[<ffffffff81316f12>] [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 4078.315824] RSP: 0018:ffff8800bae87da0 EFLAGS: 00010246
[ 4078.315824] RAX: ffff880235439025 RBX: 0000000000000fd8 RCX: 00000000000001fb
[ 4078.315824] RDX: 0000000000000000 RSI: ffffc900038d995e RDI: ffff880235439025
[ 4078.315824] RBP: ffff8800bae87db8 R08: ffff8800bacecc00 R09: 0000000000008000
[ 4078.315824] R10: 0000000000000000 R11: 0000000000000246 R12: ffff8800bae87dc8
[ 4078.315824] R13: ffff88023466e800 R14: 0000000000000fd8 R15: 0000000000000fd8
[ 4078.315824] FS: 00007f5d3b1cc700(0000) GS:ffff8802372a0000(0000) knlGS:0000000000000000
[ 4078.315824] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4078.315824] CR2: ffffc900038d995e CR3: 00000000bb1ed000 CR4: 00000000000006e0
[ 4078.315824] Stack:
[ 4078.315824] ffffffffa01ac797 ffff8800bb5bd480 ffff8800bb5bd4d0 ffff8800bae87e48
[ 4078.315824] ffffffffa0073060 ffff88023466e800 0000000000000000 0000000000000fd8
[ 4078.315824] ffffffff00000001 ffff8800bacecc00 0000000000000fd8 0000000000008025
[ 4078.315824] Call Trace:
[ 4078.315824] [<ffffffffa01ac797>] ? lttng_event_write+0x87/0xb0 [lttng_ring_buffer_metadata_client]
[ 4078.315824] [<ffffffffa0073060>] lttng_metadata_output_channel+0xd0/0x120 [lttng_tracer]
[ 4078.315824] [<ffffffffa00755f9>] lttng_metadata_ring_buffer_ioctl+0x79/0xd0 [lttng_tracer]
[ 4078.315824] [<ffffffff8117ba10>] do_vfs_ioctl+0x2e0/0x4e0
[ 4078.315824] [<ffffffff812b35c7>] ? file_has_perm+0x87/0xa0
[ 4078.315824] [<ffffffff8117bc91>] SyS_ioctl+0x81/0xa0
[ 4078.315824] [<ffffffff810115d1>] ? syscall_trace_leave+0xd1/0xe0
[ 4078.315824] [<ffffffff818bbd37>] tracesys_phase2+0x84/0x89
[ 4078.315824] Code: 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
[ 4078.315824] RIP [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 4078.315824] RSP <ffff8800bae87da0>
[ 4078.315824] CR2: ffffc900038d995e
[ 4078.315824] ---[ end trace a05b652829ceda48 ]---

This points to the rep movsq instruction in
arch/x86/lib/memcpy_64.S:__memcpy. I could reproduce this on my Lenovo
x240 laptop (i7 CPU), and within a virtual machine running on an
Intel(R) Xeon(R) CPU E5-2630 v3 host. Interestingly, the issue triggers
when the VM has the rep_good flag but not erms. If the VM has both the
rep_good and erms flags, the issue does not trigger.
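
For reference, the Code: bytes around the faulting RIP decode to
mov %rdi,%rax; mov %rdx,%rcx; shr $3,%rcx; and $7,%edx; rep movsq;
mov %edx,%ecx; rep movsb; ret. The C sketch below models only how that
sequence splits a length into quadword and byte counts. The names
copy_plan, plan_copy and check_oops_counts are illustrative, not kernel
symbols, and treating the 0xfd8 found in RBX, R14 and R15 as the copy
length is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the counts set up before rep movsq / rep movsb. */
struct copy_plan {
	uint64_t qwords;	/* rep movsq iterations: shr $3, %rcx */
	uint32_t tail;		/* rep movsb iterations: and $7, %edx */
};

/* Split a byte length the same way the decoded instructions do. */
static struct copy_plan plan_copy(uint64_t len)
{
	struct copy_plan p;

	p.qwords = len >> 3;
	p.tail = (uint32_t)(len & 7);
	return p;
}

/*
 * Self-check against the register dump, ASSUMING 0xfd8 is the length
 * that was passed to memcpy (the OOPS does not state this directly).
 */
static void check_oops_counts(void)
{
	struct copy_plan p = plan_copy(0xfd8);

	assert(p.qwords == 0x1fb);	/* RCX in the dump */
	assert(p.tail == 0);		/* RDX in the dump */
}
```

Under that assumption, RCX = 0x1fb and RDX = 0 in the register dump are
the counts for the full length, which suggests that the fault hit before
rep movsq completed its first quadword. RAX being equal to RDI is
consistent with that reading.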

Moreover, if I call vmalloc_sync_all() just after each vmalloc()
allocation, the issue does not trigger.

It looks like there is some bad interaction between this implementation
of memcpy and vmalloc faults in the kernel, in cases where the source or
destination addresses are not aligned on multiples of 8 bytes. I'm not
sure if the right fix is to fix __memcpy or to look into this issue from
a vmalloc fault handler perspective.
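
The check added by the patch below can be modelled in C as follows. This
is only a sketch of the predicate: needs_memcpy_orig and
check_oops_addresses are hypothetical names, not kernel functions, and
the aligned example addresses are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when either address is not a multiple of 8.
 * Mirrors: movq %rsi, %rax; orq %rdi, %rax; andl $7, %eax; jnz memcpy_orig
 */
static bool needs_memcpy_orig(uint64_t dst, uint64_t src)
{
	/* OR the addresses so one mask tests the low three bits of both. */
	return ((src | dst) & 7) != 0;
}

/* Self-check using the RDI (dst) and RSI (src) values from the OOPS. */
static void check_oops_addresses(void)
{
	assert(needs_memcpy_orig(0xffff880235439025ULL,
				 0xffffc900038d995eULL));
	/* Made-up addresses, both 8-byte aligned: no fallback. */
	assert(!needs_memcpy_orig(0x1000, 0x2008));
}
```

With the RSI and RDI values from the OOPS, the low three bits are 6 and
5 respectively, so this copy would be routed to memcpy_orig. rep movsq
advances both registers in steps of 8, so their alignment at fault time
is the same as at entry.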

This fix only covers x86-64. It would be interesting to check whether
x86-32 rep; movsl memcpy is also affected.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
CC: x86@xxxxxxxxxx
---
 arch/x86/lib/memcpy_64.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index b046664..df1ba95 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -29,6 +29,15 @@ ENTRY(__memcpy)
 ENTRY(memcpy)
 	ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
 		      "jmp memcpy_erms", X86_FEATURE_ERMS
+	/*
+	 * Use memcpy_orig when the source or destination address is not
+	 * aligned on a multiple of 8 bytes. This takes care of vmalloc
+	 * fault issues with unaligned rep movsq accesses.
+	 */
+	movq %rsi, %rax
+	orq %rdi, %rax
+	andl $7, %eax
+	jnz memcpy_orig
 
 	movq %rdi, %rax
 	movq %rdx, %rcx
--
1.9.1
