Re: [RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory

From: Mathieu Desnoyers
Date: Wed Jun 24 2015 - 14:49:29 EST


----- On Jun 24, 2015, at 1:00 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Wed, Jun 24, 2015 at 9:14 AM, Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> When trying to change memory allocation from kmalloc to vmalloc to
>> handle memory fragmentation for reallocation of a growing string within
>> a kernel module, our testsuite started to trigger kernel OOPS. It
>> triggers when the string is copied into a ring buffer using memcpy,
>> piece-wise.
>
> I hate your patch, just because it doesn't make sense. The "when
> non-aligned, don't do movsq" might make sense for performance, but it
> does *not* make sense for correctness.
>
> Why would "rep movsq" trigger the oops, but memcpy_orig not? I think
> the fundamental bug is something else.
>
> I don't see *what* the bug is, though.
>
> Very odd.
>
> x86 people, can you see anything there? It does look like
> vmalloc_fault() *should* have triggered, so why didn't it? The address
> is definitely in the VMALLOC_START/END range, and the error code is
> 0000, so how come vmalloc_fault() didn't handle this?
>
>> This points to arch/x86/lib/memcpy_64.S:__memcpy rep movsq instruction.
>> This could be reproduced on my Lenovo x240 laptop (i7 CPU), and within a
>> virtual machine running on an Intel(R) Xeon(R) CPU E5-2630 v3 host.
>> Interestingly, with the VM having the rep_good flag (but not erms), the issue
>> triggers. However, if the VM has both rep_good and erms flags, the issue does
>> not trigger.
>
> With ERMS, I think we end up using just "rep movsb" instead. But there
> should be absolutely no difference in fault patterns.
>
> I see the QEMU part, is this just regular kvm?

Yes, this is just regular kvm.

> Could you add a debug
> printk to the vmalloc_fault() caller and then reproduce the oops? It
> shouldn't trigger enough to be a horrible logging problem.

Here is the output. I added the printk just after the initial range
check within vmalloc_fault(). What is odd is that the fault happens
on the source address (RSI, which matches CR2), and that address is
aligned. Only the destination (RDI) is unaligned.
Let me know if you need more info.

[ 53.084521] DEBUG: vmalloc_fault at address 0xffffc9000746e000
[ 53.085460] BUG: unable to handle kernel paging request at ffffc9000746e000
[ 53.085460] IP:
[ 53.090220] [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 53.090220] PGD 236c92067 PUD 236c93067 PMD 22e840067 PTE 0
[ 53.090220] Oops: 0000 [#1] SMP
[ 53.090220] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio [last unloaded: lttng_statedump]
[ 53.090220] CPU: 4 PID: 3532 Comm: lttng-consumerd Tainted: G O 4.1.0+ #10
[ 53.090220] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 53.090220] task: ffff880235355aa0 ti: ffff8800bb6d0000 task.ti: ffff8800bb6d0000
[ 53.090220] RIP: 0010:[<ffffffff81316f12>] [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 53.090220] RSP: 0018:ffff8800bb6d3da0 EFLAGS: 00010206
[ 53.090220] RAX: ffff8802355b3025 RBX: 0000000000000fdb RCX: 00000000000001fb
[ 53.090220] RDX: 0000000000000003 RSI: ffffc9000746e000 RDI: ffff8802355b3025
[ 53.090220] RBP: ffff8800bb6d3db8 R08: ffff880231cd7200 R09: 0000000000000025
[ 53.090220] R10: 0000000000000000 R11: 0000000000001000 R12: ffff8800bb6d3dc8
[ 53.090220] R13: ffff88022e437400 R14: 0000000000000fdb R15: 0000000000000fdb
[ 53.090220] FS: 00007f24d8bbc700(0000) GS:ffff880237280000(0000) knlGS:0000000000000000
[ 53.090220] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 53.090220] CR2: ffffc9000746e000 CR3: 00000000ba6d6000 CR4: 00000000000006e0
[ 53.090220] Stack:
[ 53.090220] ffffffffa05ac797 ffff8802334fb300 ffff8802334fb350 ffff8800bb6d3e48
[ 53.090220] ffffffffa0473060 ffff88022e437400 0000000000000000 0000000000000fdb
[ 53.090220] ffffffff00000001 ffff880231cd7200 0000000000000fdb 0000000000000025
[ 53.090220] Call Trace:
[ 53.090220] [<ffffffffa05ac797>] ? lttng_event_write+0x87/0xb0 [lttng_ring_buffer_metadata_client]
[ 53.090220] [<ffffffffa0473060>] lttng_metadata_output_channel+0xd0/0x120 [lttng_tracer]
[ 53.090220] [<ffffffffa04755f9>] lttng_metadata_ring_buffer_ioctl+0x79/0xd0 [lttng_tracer]
[ 53.090220] [<ffffffff8117ba10>] do_vfs_ioctl+0x2e0/0x4e0
[ 53.090220] [<ffffffff812b35c7>] ? file_has_perm+0x87/0xa0
[ 53.090220] [<ffffffff8117bc91>] SyS_ioctl+0x81/0xa0
[ 53.090220] [<ffffffff818bbd37>] tracesys_phase2+0x84/0x89
[ 53.090220] Code: 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
[ 53.090220] RIP [<ffffffff81316f12>] __memcpy+0x12/0x20
[ 53.090220] RSP <ffff8800bb6d3da0>
[ 53.090220] CR2: ffffc9000746e000
[ 53.090220] ---[ end trace 850d7bf1b41647ee ]---
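
As a sanity check on the dump above, here is a small Python sketch (not
part of the kernel or of LTTng) that cross-checks the registers against
the instructions on the "Code:" line. Those bytes decode to
mov %rdi,%rax; mov %rdx,%rcx; shr $3,%rcx; and $7,%edx, followed by the
faulting rep movsq. The sketch assumes RBX still holds the length passed
to __memcpy, which the oops does not state explicitly, and the helper
names are made up for illustration.

```python
# Values copied from the register dump in the oops above.
RSI = 0xffffc9000746e000  # source pointer (equals CR2, the faulting address)
RDI = 0xffff8802355b3025  # destination pointer
RAX = 0xffff8802355b3025  # __memcpy return value: the original destination
RBX = 0x0fdb              # ASSUMPTION: still holds the copy length
RCX = 0x01fb              # qwords left for "rep movsq"
RDX = 0x0003              # tail bytes left for "rep movsb"
ERROR_CODE = 0x0000       # from "Oops: 0000"
PAGE_SIZE = 4096


def split_length(length):
    """Hypothetical helper mirroring 'shr $3,%rcx' and 'and $7,%edx'."""
    return length >> 3, length & 7


def decode_error_code(code):
    """Hypothetical helper decoding the low three x86 page-fault error bits."""
    present = "protection violation" if code & 1 else "not-present page"
    access = "write" if code & 2 else "read"
    mode = "user mode" if code & 4 else "kernel mode"
    return present, access, mode


qwords, tail = split_length(RBX)

# If RCX and RDI still hold their starting values, no qword was copied
# before the fault.
faulted_on_first_qword = (qwords == RCX) and (RDI == RAX)

print("length split:", hex(qwords), "qwords +", tail, "bytes")
print("source offset in page:", hex(RSI % PAGE_SIZE))
print("destination offset mod 8:", RDI % 8)
print("faulted on first qword:", faulted_on_first_qword)
print("fault type:", decode_error_code(ERROR_CODE))
```

With these values the split matches RCX and RDX, RDI still equals RAX,
the source is page-aligned, and the destination sits 5 bytes past a
qword boundary. That suggests the very first 8-byte load from the source
page faulted, as a kernel-mode read of a not-present page, which is
consistent with the "PTE 0" in the dump.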



--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com