Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping

From: Jan Kiszka
Date: Thu Mar 18 2021 - 08:22:22 EST

On 16.03.21 13:34, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>> This change greatly helps with two issues:
>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>> When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>> than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>> the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>> again.
>>>>>> From the user point of view it looks like the CPU never executed a
>>>>>> single instruction and in some cases that can even prevent forward progress,
>>>>>> for example, when the breakpoint is placed by an automated script
>>>>>> (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>> continues the guest automatically.
>>>>>> If the script execution takes enough time for another interrupt to arrive,
>>>>>> the guest will be stuck on the same breakpoint RIP forever.
>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>> debugger into an interrupt handler, so it is much more usable.
>>>>>> (If entry to an interrupt handler is desired, the user can still place a
>>>>>> breakpoint at it and resume the guest, which won't activate this workaround
>>>>>> and let the gdb still stop at the interrupt handler)
>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>> KVM running normal 'production' VMs.
>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>>>>>> Tested-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
>>>>>> ---
>>>>>> arch/x86/kvm/x86.c | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>> can_inject = false;
>>>>>> }
>>>>>> + /*
>>>>>> + * Don't inject interrupts while single stepping to make guest debug easier
>>>>>> + */
>>>>>> + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>> + return;
>>>>> Is this something userspace can deal with? E.g. disable IRQs and/or set NMI
>>>>> blocking at the start of single-stepping, unwind at the end? Deviating this far
>>>>> from architectural behavior will end in tears at some point.
>>>> Does this happen to address this suspicious workaround in the kernel?
>>>> /*
>>>> * The kernel doesn't use TF single-step outside of:
>>>> *
>>>> * - Kprobes, consumed through kprobe_debug_handler()
>>>> * - KGDB, consumed through notify_debug()
>>>> *
>>>> * So if we get here with DR_STEP set, something is wonky.
>>>> *
>>>> * A known way to trigger this is through QEMU's GDB stub,
>>>> * which leaks #DB into the guest and causes IST recursion.
>>>> */
>>>> if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>> regs->flags &= ~X86_EFLAGS_TF;
>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>> for a long time while trying to fix other issues...
>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>> much attention since it can only be triggered by
>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>> I've triggered it recently while debugging a guest, that's why I got
>> aware of the code path. Long ago, all this used to work (soft BPs,
>> single-stepping etc.)
>>> The only issue that I on the other hand did
>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>> when resuming over it, if that breakpoint's python handler messes up
>>> with gdb's symbols, which is what lx-symbols does.
>>> And that despite the fact that lx-symbol doesn't mess with the object
>>> (that is the kernel) where the breakpoint is defined.
>>> Just adding/removing one symbol file is enough to trigger this issue.
>>> Since lx-symbols already works this around when it reloads all symbols,
>>> I extended that workaround to happen also when loading/unloading
>>> only a single symbol file.
>> You have no issue with interactive debugging when NOT using gdb scripts
>> / lx-symbol?
> To be honest I don't use guest debugging that much,
> so I probably missed some issues.
> Now that I fixed lx-symbols though I'll probably use
> guest debugging much more.
> I will keep an eye on any issues that I find.
> The main push to fix lx-symbols actually came
> from me wanting to understand if there is something
> broken with KVM's guest debugging knowing that
> lx-symbols crashes the guest when module is loaded
> after lx-symbols was executed.
> That lx-symbols related guest crash I traced to issue
> with gdb as I explained, and the lack of blocking of the interrupts
> on single step is not a bug but more a missing feature
> that should be implemented to make single step easier to use.
> Another issue which isn't a bug is that you can't place a software
> breakpoint if kernel is not loaded (since there is no code in memory)
> or if the kernel haven't done basic paging initialization
> (since there is no paging yet to know where to place the breakpoint).
> Hardware breakpoints work for this fine though.
> So in summary I haven't found any major issues with KVM's guest debug
> yet.
> If I do notice issues with guest debug, I will try to isolate
> and debug them.
> For the issue that you mentioned, do you have a way to reproduce it?

To pick this up again, I did some experiments and was easily able to
reproduce the main problem:

- checkout linux master (1df27313f50 yesterday)
- build a fitting host kernel with KVM
- build a target kernel with defconfig + CONFIG_KVM_GUEST +
CONFIG_DEBUG_INFO, no gdb scripts for now
- boot the guest

qemu-system-x86_64 linux.img -enable-kvm -smp 4 -s
-kernel bzImage -append "console=ttyS0 root=... nokaslr"

- gdb vmlinux
- tar rem :1234
- b __x64_sys_execve
- continue whenever you hit the breakpoint, and the guest will
eventually hang
- or stepi, and you may end up in the interrupt handler
- specifically doing that on the serial console or setting the bp in
early boot exposes the issue

I've also seen

WARNING: CPU: 3 PID: 751 at ../arch/x86/kernel/traps.c:915

from time to time.

When I apply this patch to the host, the problems are gone.

Interestingly, my OpenSUSE 5.3.18-lp152.66-default does not show this
behavior and /seems/ stable from quick testing. Not sure if there were
changes in upstream between that baseline and head or if Suse is
carrying a local fix.

In any case, I think we need the following:

- default disabling of event injections when guest debugging injected
- possibly some control interface to allow events BUT then avoids any
TF injection to prevent guest state corruptions
- exposure of that interface to the gdb frontend, maybe by making it
available via the QEMU monitor (monitor ...)
- a for kvm-unit-tests to trigger the above corner cases


Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux