Re: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:LINE

From: Paolo Bonzini
Date: Mon Nov 06 2017 - 11:14:49 EST


On 06/11/2017 17:01, David Hildenbrand wrote:
> On 06.11.2017 16:10, Nick Desaulniers wrote:
>> Does it have to be stack allocated?
>
> We can't use kmalloc and friends in emulate.c. We would have to
> introduce new emulator callbacks.
>
> a) for malloc and free. hmmm.
> b) for carrying out the fxrstr/fixup.
>
> Paolo, what do you suggest?

You can use kmalloc. Any userspace user of emulate.c would have to
write a wrapper. But I'm not sure it's useful... maybe the
asm_safe+memcpy could be moved to a separate noinline function, so that
segmented_read_std is invoked with a leaner stack.

Paolo

>>
>> On Nov 6, 2017 3:52 AM, "David Hildenbrand" <david@xxxxxxxxxx
>> <mailto:david@xxxxxxxxxx>> wrote:
>>
>> On 31.10.2017 12:34, syzbot wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > 91dfed74eabcdae9378131546c446442c29bf769
>> >
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> <http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master>
>> > compiler: gcc (GCC) 7.1.1 20170620
>> > .config is attached
>> > Raw console output is attached.
>> > C reproducer is attached
>> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> > for information about syzkaller reproducers
>> >
>> >
>> > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
>> > 2 locks held by syzkaller879109/2909:
>> >Â Â#0:Â (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>]
>> vcpu_load+0x1c/0x70
>> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
>> >Â Â#1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
>> > arch/x86/kvm/x86.c:6983 [inline]
>> >Â Â#1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
>> > arch/x86/kvm/x86.c:7061 [inline]
>> >Â Â#1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>]
>> > kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
>> > CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted
>> 4.13.0-rc4-next-20170811
>> > #1
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> > Call Trace:
>> >Â Â__dump_stack lib/dump_stack.c:16 [inline]
>> >Â Âdump_stack+0x194/0x257 lib/dump_stack.c:52
>> >Â Â___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
>> >Â Â__might_sleep+0x95/0x190 kernel/sched/core.c:5967
>> >Â Â__might_fault+0xab/0x1d0 mm/memory.c:4383
>> >Â Â__copy_from_user include/linux/uaccess.h:71 [inline]
>> >Â Â__kvm_read_guest_page+0x58/0xa0
>> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
>> >Â Âkvm_vcpu_read_guest_page+0x44/0x60
>> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
>> >Â Âkvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
>> >Â Âkvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
>> >Â Âsegmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
>> >Â Âem_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
>>
>>
>> In em_fxrstor, we do a get_fpu(), which in return disables preemption.
>>
>> With preempt_disable(), we do a
>> segmented_read_std()->kvm_vcpu_read_guest_page(), triggering the
>> warning.
>>
>> >Â Âx86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
>> >Â Âx86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
>> >Â Âkvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
>> >Â Âhandle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
>> >Â Âvmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
>> >Â Âvcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
>> >Â Âvcpu_run arch/x86/kvm/x86.c:7061 [inline]
>> >Â Âkvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
>> >Â Âkvm_vcpu_ioctl+0x64c/0x1010
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
>> >Â Âvfs_ioctl fs/ioctl.c:45 [inline]
>> >Â Âdo_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
>> >Â ÂSYSC_ioctl fs/ioctl.c:700 [inline]
>> >Â ÂSyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>> >Â Âentry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>> I don't really see a way to avoid two fxstate variables. Unloading the
>> fpu in between the fxstore/fxrstr could lead to host values getting
>> overwritten. Loading/saving the fpu in kvm_arch_vcpu_ioctl_run() would
>> most probably also not work, as the relevant portions of fxregs_state
>> would not get saved/restored. So the preemption would still be needed.
>>
>>
>> So all I can offer for now is the following (untested, can send as
>> proper patch if needed):
>>
>>
>> From f32d06c8c621c6d68e073e9bdb81a6280b6c9544 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@xxxxxxxxxx <mailto:david@xxxxxxxxxx>>
>> Date: Mon, 6 Nov 2017 12:35:39 +0100
>> Subject: [PATCH v1] KVM: x86: fix em_fxstor sleeping while in atomic
>>
>> Commit 9d643f63128b tried to optimize the stack size, but introduced a
>> guest memory access which might sleep while in atomic.
>>
>> Let's undo that part of the commit but keep the cleanups.
>>
>> Reported by syzbot:
>>
>> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
>> 2 locks held by syzkaller879109/2909:
>> Â #0:Â (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>]
>> vcpu_load+0x1c/0x70
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
>> Â #1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
>> arch/x86/kvm/x86.c:6983 [inline]
>> Â #1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
>> arch/x86/kvm/x86.c:7061 [inline]
>> Â #1:Â (&kvm->srcu){....}, at: [<ffffffff810dd162>]
>> kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
>> CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted
>> 4.13.0-rc4-next-20170811
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> Call Trace:
>> Â __dump_stack lib/dump_stack.c:16 [inline]
>> Â dump_stack+0x194/0x257 lib/dump_stack.c:52
>> Â ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
>> Â __might_sleep+0x95/0x190 kernel/sched/core.c:5967
>> Â __might_fault+0xab/0x1d0 mm/memory.c:4383
>> Â __copy_from_user include/linux/uaccess.h:71 [inline]
>> Â __kvm_read_guest_page+0x58/0xa0
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
>> Â kvm_vcpu_read_guest_page+0x44/0x60
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
>> Â kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
>> Â kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
>> Â segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
>> Â em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
>> Â x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
>> Â x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
>> Â kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
>> Â handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
>> Â vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
>> Â vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
>> Â vcpu_run arch/x86/kvm/x86.c:7061 [inline]
>> Â kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
>> Â kvm_vcpu_ioctl+0x64c/0x1010
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
>> Â vfs_ioctl fs/ioctl.c:45 [inline]
>> Â do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
>> Â SYSC_ioctl fs/ioctl.c:700 [inline]
>> Â SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>> Â entry_SYSCALL_64_fastpath+0x1f/0xbe
>> RIP: 0033:0x437fc9
>> RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
>> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
>> RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
>> R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
>> R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
>>
>> Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in
>> em_fxrstor")
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx
>> <mailto:david@xxxxxxxxxx>>
>> ---
>> Âarch/x86/kvm/emulate.c | 16 +++++++++-------
>> Â1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index fb0055953fbc..d87f01a2d6f4 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -4002,7 +4002,7 @@ static int em_fxsave(struct x86_emulate_ctxt
>> *ctxt)
>>
>> Âstatic int em_fxrstor(struct x86_emulate_ctxt *ctxt)
>> Â{
>> -Â Â Â Âstruct fxregs_state fx_state;
>> +Â Â Â Âstruct fxregs_state fx_state, fx_old;
>> Â Â Â Â int rc;
>> Â Â Â Â size_t size;
>>
>> @@ -4010,19 +4010,21 @@ static int em_fxrstor(struct
>> x86_emulate_ctxt *ctxt)
>> Â Â Â Â if (rc != X86EMUL_CONTINUE)
>> Â Â Â Â Â Â Â Â return rc;
>>
>> +Â Â Â Âsize = fxstate_size(ctxt);
>> +Â Â Â Ârc = segmented_read_std(ctxt, ctxt->memop.addr.mem,
>> &fx_state, size);
>> +Â Â Â Âif (rc != X86EMUL_CONTINUE)
>> +Â Â Â Â Â Â Â Âreturn rc;
>> +
>> Â Â Â Â ctxt->ops->get_fpu(ctxt);
>>
>> -Â Â Â Âsize = fxstate_size(ctxt);
>> Â Â Â Â if (size < __fxstate_size(16)) {
>> -Â Â Â Â Â Â Â Ârc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
>> +Â Â Â Â Â Â Â Ârc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_old));
>> Â Â Â Â Â Â Â Â if (rc != X86EMUL_CONTINUE)
>> Â Â Â Â Â Â Â Â Â Â Â Â goto out;
>> +Â Â Â Â Â Â Â Âmemcpy(((void *)&fx_state) + size, ((void *)&fx_old)
>> + size,
>> +Â Â Â Â Â Â Â Â Â Â Â __fxstate_size(16) - size);
>> Â Â Â Â }
>>
>> -Â Â Â Ârc = segmented_read_std(ctxt, ctxt->memop.addr.mem,
>> &fx_state, size);
>> -Â Â Â Âif (rc != X86EMUL_CONTINUE)
>> -Â Â Â Â Â Â Â Âgoto out;
>> -
>> Â Â Â Â if (fx_state.mxcsr >> 16) {
>> Â Â Â Â Â Â Â Â rc = emulate_gp(ctxt, 0);
>> Â Â Â Â Â Â Â Â goto out;
>> --
>> 2.13.6
>>
>>
>>
>> --
>>
>> Thanks,
>>
>> David / dhildenb
>>
>
>