Re: [PATCH v2 5/7] x86/mm, tracing: Fix CR2 corruption
From: Eiichi Tsukata
Date: Tue Jul 09 2019 - 01:17:20 EST
On 2019/07/08 18:42, Eiichi Tsukata wrote:
>
>
> On 2019/07/08 17:58, Eiichi Tsukata wrote:
>
>>
>> By the way, is there possibility that the WARNING(#GP in execve(2)) which Steven
>> previously hit? : https://lore.kernel.org/lkml/20190321095502.47b51356@xxxxxxxxxxxxxxxxxx/
>>
>> Even if there were, it will *Not* be the bug introduced by this patch series,
>> but the bug revealed by this series.
>>
>
> I reproduced with the kernel:
> - https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/core&id=057b48d544b384c47ed921f616128b802ab1fdc3
>
> Reproducer:
>
> # echo 1 > events/preemptirq/irq_disable/enable
> # echo userstacktrace > trace_options
> # service sshd restart
>
> [ 20.338200] ------------[ cut here ]------------
> [ 20.339471] General protection fault in user access. Non-canonical address?
> [ 20.339488] WARNING: CPU: 1 PID: 1957 at arch/x86/mm/extable.c:126 ex_handler_uaccess+0x52/0x60
> [ 20.342550] Modules linked in:
> [ 20.343209] CPU: 1 PID: 1957 Comm: systemctl Tainted: G W 5.2.0-rc7+ #48
> [ 20.344935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
> [ 20.346688] RIP: 0010:ex_handler_uaccess+0x52/0x60
> [ 20.347667] Code: c4 08 b8 01 00 00 00 5b 5d c3 80 3d b6 8c 94 01 00 75 db 48 c7 c7 b8 e4 6f aa 48 89 75 f0 c6 05 ad
> [ 20.351508] RSP: 0018:ffffb28940453688 EFLAGS: 00010082
> [ 20.352707] RAX: 0000000000000000 RBX: ffffffffaa2023c0 RCX: 0000000000000001
> [ 20.354478] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffaa7dab8d
> [ 20.355706] RBP: ffffb28940453698 R08: 0000000000000000 R09: 0000000000000001
> [ 20.356665] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb28940453728
> [ 20.357594] R13: 0000000000000000 R14: 000000000000000d R15: 0000000000000000
> [ 20.358876] FS: 00007fec9fa248c0(0000) GS:ffff921d3d800000(0000) knlGS:0000000000000000
> [ 20.360573] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 20.361792] CR2: 0000000000000004 CR3: 0000000074d4e000 CR4: 00000000000006e0
> [ 20.363300] Call Trace:
> [ 20.363830] fixup_exception+0x4a/0x61
> [ 20.364637] __idt_do_general_protection+0x65/0x1d0
> [ 20.365684] do_general_protection+0x1e/0x30
> [ 20.366654] general_protection+0x1e/0x30
> [ 20.367513] RIP: 0010:arch_stack_walk_user+0x7a/0x100
> [ 20.368583] Code: 00 01 1f 00 0f 85 8d 00 00 00 49 8b 87 98 16 00 00 48 83 e8 10 49 39 c6 77 32 41 83 87 d0 15 00 04
> [ 20.372797] RSP: 0018:ffffb289404537d0 EFLAGS: 00010046
> [ 20.374027] RAX: 0000000000000000 RBX: ffff921d383bcb00 RCX: 0074726174736572
> [ 20.375674] RDX: ffff921d38393b84 RSI: 7265732e64687373 RDI: ffffb28940453818
> [ 20.377179] RBP: ffffb28940453808 R08: 0000000000000001 R09: ffff921d3d169c00
> [ 20.378681] R10: 0000000000000b60 R11: ffff921d38393b70 R12: ffffb28940453818
> [ 20.380195] R13: ffffb28940453f58 R14: 0074726174736572 R15: ffff921d383bcb00
> [ 20.381703] ? profile_setup.cold+0x9b/0x9b
> [ 20.382604] stack_trace_save_user+0x60/0x7d
> [ 20.383520] trace_buffer_unlock_commit_regs+0x17b/0x220
> [ 20.384686] trace_event_buffer_commit+0x6b/0x200
> [ 20.385741] trace_event_raw_event_preemptirq_template+0x7b/0xc0
> [ 20.387067] ? get_page_from_freelist+0x10a/0x13b0
> [ 20.388129] ? __alloc_pages_nodemask+0x178/0x380
> [ 20.389132] trace_hardirqs_off+0xc6/0x100
> [ 20.390006] get_page_from_freelist+0x10a/0x13b0
> [ 20.390997] ? kvm_clock_read+0x18/0x30
> [ 20.391813] ? sched_clock+0x9/0x10
> [ 20.392647] __alloc_pages_nodemask+0x178/0x380
> [ 20.393706] alloc_pages_current+0x87/0xe0
> [ 20.394609] __page_cache_alloc+0xcd/0x110
> [ 20.395491] __do_page_cache_readahead+0xa1/0x1a0
> [ 20.396547] ondemand_readahead+0x220/0x540
> [ 20.397486] page_cache_sync_readahead+0x35/0x50
> [ 20.398511] generic_file_read_iter+0x8d8/0xbd0
> [ 20.399335] ? __might_sleep+0x4b/0x80
> [ 20.400110] ext4_file_read_iter+0x35/0x40
> [ 20.400937] new_sync_read+0x10f/0x1a0
> [ 20.401833] __vfs_read+0x29/0x40
> [ 20.402608] vfs_read+0xc0/0x170
> [ 20.403319] kernel_read+0x31/0x50
The cause was obvious as Linus already said in:
https://lore.kernel.org/lkml/CAHk-=whvxJjBBOmQSsrB-xHkc6xm8zGHsBRgpxh14UsEY_g+nw@xxxxxxxxxxxxxx/
stack_trace_save_user() is called after set_fs(KERNEL_DS).
I don't know why systemctl alyways hit this, but user space process can make up stack
as it like, so we have to check it by ourselves?
ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
mm_segment_t old_fs;
ssize_t result;
old_fs = get_fs();
set_fs(KERNEL_DS);
/* The cast to a user pointer is valid due to the set_fs() */
result = vfs_read(file, (void __user *)buf, count, pos);
set_fs(old_fs);
return result;
}
EXPORT_SYMBOL(kernel_read);
> [ 20.404128] prepare_binprm+0x150/0x190
> [ 20.404766] __do_execve_file.isra.0+0x4c0/0xaa0
> [ 20.405559] __x64_sys_execve+0x39/0x50
> [ 20.406173] do_syscall_64+0x55/0x1b0
> [ 20.406762] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 20.407703] RIP: 0033:0x7fec9f0ee647
> [ 20.408441] Code: ff 76 e0 f7 d8 64 41 89 01 eb d8 0f 1f 84 00 00 00 00 00 f7 d8 64 41 89 01 eb d7 0f 1f 84 00 00 08
> [ 20.411636] RSP: 002b:00007ffe7f316228 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
> [ 20.412686] RAX: ffffffffffffffda RBX: 00007fec9f8105a4 RCX: 00007fec9f0ee647
> [ 20.414047] RDX: 00007ffe7f3167d8 RSI: 00007ffe7f316230 RDI: 00007fec9f73cea0
> [ 20.415048] RBP: 00007ffe7f316480 R08: 0000000000000030 R09: 0000000000000001
> [ 20.416076] R10: 00007ffe7f316498 R11: 0000000000000202 R12: 00007fec9f73cea0
> [ 20.417556] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
> [ 20.419032] irq event stamp: 832
> [ 20.419493] hardirqs last enabled at (831): [<ffffffffa94b9f2e>] __find_get_block+0xde/0x360
> [ 20.420649] hardirqs last disabled at (832): [<ffffffffa9305ad2>] rcu_irq_enter_irqson+0x12/0x30
> [ 20.422179] softirqs last enabled at (436): [<ffffffffaa20037f>] __do_softirq+0x37f/0x472
> [ 20.423286] softirqs last disabled at (291): [<ffffffffa927bce3>] irq_exit+0xb3/0xd0
> [ 20.424296] ---[ end trace cdb84a67edcdc420 ]---
>