Re: [PATCH v2] arm64:kgdb: Fix kernel single-stepping

From: Corey Minyard
Date: Thu Feb 20 2020 - 16:30:46 EST


On Thu, Feb 20, 2020 at 10:30:38AM -0600, Corey Minyard wrote:
> On Thu, Feb 20, 2020 at 02:22:14PM +0000, Will Deacon wrote:
> > On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@xxxxxxx wrote:

snip...

> > > After fixing this and doing some more testing, I ran into another issue:
> > >
> > > * Kernel enables the pt_regs single step
> > > * Kernel returns from the exception with ERET.
> > > * An interrupt or page fault happens on the instruction, causing the
> > > instruction to not be run, but the exception handler runs.
> >
> > This sounds like you've broken debug; we should take the step exception
> > in the exception handler. That's the way this is supposed to work.
>
> Ok, here is the disconnect, I think. If that is the case, then what I'm
> seeing is working like it should. That doesn't work with gdb, though,
> gdb expects to be able to single-step and get to the next instruction.
> The scenario I mentioned at the top of this email.
>
> Let me look at this a bit more. I'll look at this on qemu and maybe a
> pi.
>

Ok, this is the disconnect. I was assuming that single step would stop
at the next instruction after returning from an exception. qemu works
the same way the hardware I have does. So I'm assuming arm64 doesn't
clear PTRACE.SS on an exception, even though that seems to be what the
manual says.

You can reproduce this by setting up kgdb on the kernel and hooking up
gdb, setting a breakpoint somewhere that has interrupts enabled, then
doing a "continue". It will hit the same breakpoint again and again
because the PC doesn't get advanced by the single step and the timer
interrupt is always going to be pending. I can do a more detailed set
of instructions with qemu, if you like.

I looked at kprobes a bit. I don't think kprobes will have a problem
with this particular issue, it disables interrupts while single
stepping and doesn't allow a probe on any instruction that would modify
the interrupt settings. I didn't look at page faults, but I assume that
it also won't allow a probe where there can be a page fault.

If a single-step is enabled and an exception occurs before the
instruction is executed, the single step is happening in the exception
handler when debug is re-enabled. This what you are saying is how it is
supposed to work.

That's not what gdb is expecting, and that's not how x86 works, at
least. I looked at ARM and MIPS and they don't even do single steps in
the kernel debugger. PPC seems to work like x86 from code examination
and since our testers haven't reported this bug on that architecture.

I can do a patch that works sort of like kprobes, disabling interrupts
and simulating a single-step if the instruction modifies the daif bits.
Then you couldn't single step across an instruction that did a page
fault, but that's probably not a huge restriction.

I could modify the patch I have now to ifdef it out unless kgdb is
enabled.

I can do a patch that just pulls single step support out of the kgdb
interface for ARM64, since it doesn't work as gdb expects, anyway. And
that's what ARM does.

It doesn't really matter to me. I'm just trying to fix a bug that was
reported to me, and trying to get it upstream as a good citizen. I don't
use kgdb.

-corey