Re: [RFC PATCH 0/2] arm64 kgdb fixes for single stepping

From: Corey Minyard
Date: Thu Feb 13 2020 - 10:57:44 EST


On Thu, Feb 13, 2020 at 10:10:58AM +0000, Will Deacon wrote:
> On Wed, Feb 12, 2020 at 09:11:29PM -0600, minyard@xxxxxxx wrote:
> > I got a bug report about using kgdb on arm64, and it turns out it was
> > fairly broken. Patch 2 has a description of what was going on. I am
> > using a Marvell 8100 board.
> >
> > The following patches fix the problem, but probably not in the
> > best way. They are what I hacked out to show the problems.
> >
> > I am not quite sure how this will interact with kprobes and hardware
> > breakpoints which use the same code, but they would have been broken,
> > too, so this is not making them any worse.
>
> This should all be handled by kgdb itself, not by changing the low-level
> debug exception handling. For example, the '&kgdb_step_hook' can take
> care of re-arming the step state machine and kgdb can also simply disable
> interrupts during the step if it doesn't want to step into the handler.

How can kgdb disable the SS bit in MDSCR, or re-enable it on the right
CPU, without doing this in the exception handling?

I'm actually thinking that this may be a hardware bug. According to the
ARMv8 manual, PSTATE.SS should be set to 0 if the processor takes an
exception. That's definitely not happening; if I do an instruction step
from, say, sys_sync(), it gets the single-step trap on the instruction
after the PSTATE.D bit is disabled in el1_irq.

Even so, I think the migration issue is still a problem. If you do an
eret set up for single-step, and interrupts are on, and you get a timer
interrupt, it could migrate the task to a different CPU if
CONFIG_PREEMPT is set, right? If so, the MDSCR.SS bit will be set on
the wrong CPU and the single step trap won't happen. That will break
kprobes, too.
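
To make the migration case concrete, here is a toy model in plain C. It
is not kernel code: the mdscr_ss[] array, struct task, and all three
helpers are made up for illustration, and the only assumption taken from
the architecture is that the step enable lives in a per-CPU register
while the task can move.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Toy stand-in for the per-CPU MDSCR_EL1.SS enable bit (hypothetical). */
static bool mdscr_ss[NR_CPUS];

/* Hypothetical task: where it runs, and the SS bit saved with its regs. */
struct task {
	int cpu;
	bool spsr_ss;
};

/* The debugger arms a step on whatever CPU the task is on right now. */
static void arm_single_step(struct task *t)
{
	assert(t->cpu >= 0 && t->cpu < NR_CPUS);
	mdscr_ss[t->cpu] = true;
	t->spsr_ss = true;
}

/* Preemption moves the task; the per-CPU enable bit stays behind. */
static void migrate(struct task *t, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < NR_CPUS);
	t->cpu = new_cpu;
}

/* In this model the step trap needs both the CPU enable and the task bit. */
static bool step_trap_fires(const struct task *t)
{
	return mdscr_ss[t->cpu] && t->spsr_ss;
}
```

Arming on CPU 0 and then migrating to CPU 1 leaves mdscr_ss[0] set and
mdscr_ss[1] clear, so step_trap_fires() returns false for the task that
was supposed to be stepped.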

You mention turning off interrupts in kgdb when single-stepping, which
you could do and it would solve this problem. But it wouldn't solve the
problem of taking a paging exception, which you want to take in this
case. And you could still migrate on a paging exception. So I don't
think disabling interrupts is a good solution.

I don't see a solution besides clearing MDSCR.SS on an EL1 exception
entry and conditionally setting it on an EL1 exception return. It might
be better to have a thread flag to do this instead of depending on the
setting of that bit; I'm not sure how expensive accessing the MDSCR
register is.
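
Roughly what I have in mind, as a toy model in plain C rather than a
patch. Everything here is hypothetical (cpu_mdscr_ss[], struct thread,
the wants_step flag, and both hook names); the real change would live in
the EL1 entry/exit path. The point is only that the decision to enable
stepping follows the thread, not the CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Toy stand-in for the per-CPU MDSCR_EL1.SS enable bit (hypothetical). */
static bool cpu_mdscr_ss[NR_CPUS];

/* Hypothetical thread state: current CPU plus a "step me" thread flag. */
struct thread {
	int cpu;
	bool wants_step;
};

/* On EL1 exception entry: never step into the handler. */
static void el1_exception_entry(const struct thread *t)
{
	assert(t->cpu >= 0 && t->cpu < NR_CPUS);
	cpu_mdscr_ss[t->cpu] = false;
}

/*
 * On EL1 exception return: re-enable stepping on the CPU we are
 * returning on, but only if this thread asked to be stepped.
 */
static void el1_exception_return(const struct thread *t)
{
	assert(t->cpu >= 0 && t->cpu < NR_CPUS);
	if (t->wants_step)
		cpu_mdscr_ss[t->cpu] = true;
}
```

With this, a thread that takes an interrupt on CPU 0 and returns on
CPU 1 ends up with the enable bit clear on CPU 0 and set on CPU 1. Using
a flag also means the entry path only writes the register and never has
to read it back to decide what to do on return.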

Setting SPSR.SS on subsequent single steps is definitely an issue, but I
can split that out into a separate patch.

-corey

>
> Will