Re: [PATCH v3 3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

From: Doug Anderson
Date: Wed Sep 16 2020 - 19:34:36 EST


Hi,

On Tue, Sep 15, 2020 at 6:45 AM Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> On Mon, Sep 14, 2020 at 05:13:28PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
> > <daniel.thompson@xxxxxxxxxx> wrote:
> > >
> > > During debug trap execution we expect dbg_deactivate_sw_breakpoints()
> > > to be paired with a dbg_activate_sw_breakpoints(). Currently, although
> > > the calls are paired correctly, they are needlessly smeared across three
> > > different functions. Worse, this also results in code to drive polled I/O
> > > being called with breakpoints activated which, in turn, needlessly
> > > increases the set of functions that will recursively trap if breakpointed.
> > >
> > > Fix this by moving the activation of breakpoints into the debug core.
> > >
> > > Signed-off-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> > > ---
> > > kernel/debug/debug_core.c | 2 ++
> > > kernel/debug/gdbstub.c | 1 -
> > > kernel/debug/kdb/kdb_debugger.c | 2 --
> > > 3 files changed, 2 insertions(+), 3 deletions(-)
> >
> > I like the idea, but previously the kgdb_arch_handle_exception() was
> > always called after the SW breakpoints were activated. Are you sure
> > it's OK to swap those two orders across all architectures?
>
> Pretty sure, yes.
>
> However, given the poor attention to detail I demonstrated in patch 2/3,
> I figure I'd better write out the full chain of reasoning if I want
> you to trust me ;-) .
>
> kgdb_arch_handle_exception() is already called frequently with
> breakpoints disabled since it is basically a fallback that is called
> whenever the architecture neutral parts of the gdb packet processing
> cannot cope.
>
> So your question becomes more specific: is it OK to swap orders when the
> architecture code is handling a (c)ontinue or (s)tep packet[1]?
>
> The reason the architecture neutral part cannot cope is that *if* gdb
> has been instructed that the PC must be changed before resuming then
> the architecture neutral code does not know how to effect this. In
> other words the call to kgdb_arch_handle_exception() will
> boil down to doing whatever the architecture equivalent of writing to
> linux_regs->pc actually is (or returning an error).
>
> Updating the PC is safe whether or not breakpoints are deactivated,
> and activating the breakpoints if the architecture code actually
> censors the resume is a bug (which we just fixed).
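
(As an aside, for a (c)ontinue/(s)tep packet with an explicit address,
that arch fallback really does boil down to "parse the hex address and
write it to the saved PC". Here's a user-space sketch of that shape;
the struct and names are simplified stand-ins, not any real
architecture's code:)

```c
/* Hypothetical illustration only: what an arch's
 * kgdb_arch_handle_exception() reduces to for 'c'/'s' packets.
 * "fake_regs" stands in for the arch's saved register frame. */
#include <assert.h>
#include <stdlib.h>

struct fake_regs { unsigned long pc; };

static int handle_continue_packet(const char *packet, struct fake_regs *regs)
{
    char op = packet[0];

    if (op != 'c' && op != 's')
        return -1;                       /* not ours: error back to the core */

    if (packet[1] != '\0')               /* "c<addr>" / "s<addr>" */
        regs->pc = strtoul(packet + 1, NULL, 16);

    return 0;                            /* resume (or step) from regs->pc */
}
```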

OK, fair enough. Without digging into all the arch code, I was
assuming that some arch somewhere could be playing tricks on us.
After all, the arch code is told about all the breakpoints
(kgdb_arch_set_breakpoint) so, in theory, it could be doing something
funky (not applying the breakpoint until it sees the "continue" or
something?).

I guess the most plausible thing an architecture might be doing is
something special when it is told to "continue" or "single step" at an
address that had a breakpoint on it.
I guess I could imagine that on some architectures this might require
special handling (could it be somehow illegal to run the CPU in step
mode over a breakpoint instruction?). ...or maybe if it was using
hardware breakpoints those don't trigger properly if you're continuing
to the address of the breakpoint? I'm making stuff up here and
presumably none of this is true, but it's what I was worried about.

From a quick glance, I don't _think_ anyone is doing this. Presumably
today they'd either need a) a funky implementation for
kgdb_arch_set_breakpoint() or they'd need b) code somewhere which read
memory and looked for "gdb_bpt_instr".

a) Looking at kgdb_arch_set_breakpoint(), I don't see anything too
funky. They all look like nearly the same code: writing the
breakpoint to memory, perhaps taking into account locked .text space
by using a special patch routine.
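
(Roughly, they all reduce to something like this user-space sketch.
BREAK_INSTR and the memcpy calls stand in for the real break opcode
and whatever text-patching helper the arch uses:)

```c
/* Hypothetical sketch of the common kgdb_arch_set_breakpoint() shape:
 * save the original instruction bytes, then patch in the arch's break
 * instruction; removal restores the saved bytes. */
#include <assert.h>
#include <string.h>

#define BREAK_INSTR_SIZE 4
static const unsigned char BREAK_INSTR[BREAK_INSTR_SIZE] =
    { 0xde, 0xad, 0xbe, 0xef };          /* made-up break opcode */

struct fake_kgdb_bkpt {
    unsigned char *addr;
    unsigned char saved_instr[BREAK_INSTR_SIZE];
};

static int fake_arch_set_breakpoint(struct fake_kgdb_bkpt *bpt)
{
    memcpy(bpt->saved_instr, bpt->addr, BREAK_INSTR_SIZE); /* save original */
    memcpy(bpt->addr, BREAK_INSTR, BREAK_INSTR_SIZE);      /* patch in trap */
    return 0;
}

static int fake_arch_remove_breakpoint(struct fake_kgdb_bkpt *bpt)
{
    memcpy(bpt->addr, bpt->saved_instr, BREAK_INSTR_SIZE); /* restore */
    return 0;
}
```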

b) Looking at users of "gdb_bpt_instr" I do see some funkiness in
"h8300" and "microblaze", but nothing that looks like it fits the bill
of what we're looking for.

In any case, even if someone was doing something funky, it would be
possible to adapt the code to the new way of the world. You'd just
check the PC when applying breakpoints rather than checking the
breakpoints when continuing. So I'm going to go ahead and say:

Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

...and I guess if it really truly causes problems for someone we'll
figure it out.
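
For concreteness, here's a user-space sketch of the pairing the patch
centralizes. The names mirror the real ones but the state is
simulated; the point is just that the arch fallback now runs while
breakpoints are out, and re-activation happens exactly once in the
core:

```c
/* Hypothetical model of the trap path after the patch: deactivation
 * and activation are paired in the debug core, and the arch handler
 * (plus any polled I/O it drives) runs with breakpoints inactive. */
#include <assert.h>

static int sw_breakpoints_active;       /* simulated breakpoint state  */
static int active_during_arch_call;     /* what the arch handler saw   */

static void dbg_deactivate_sw_breakpoints(void) { sw_breakpoints_active = 0; }
static void dbg_activate_sw_breakpoints(void)   { sw_breakpoints_active = 1; }

static int kgdb_arch_handle_exception(void)
{
    active_during_arch_call = sw_breakpoints_active;
    return 0;                           /* pretend the resume was handled */
}

static void debug_core_handle_trap(void)
{
    dbg_deactivate_sw_breakpoints();
    kgdb_arch_handle_exception();       /* gdbstub/kdb packet processing */
    dbg_activate_sw_breakpoints();      /* single, paired re-activation  */
}
```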


Feel free to ignore the below. I wrote it up but realized it probably
wasn't useful but couldn't bear to delete it. :-P

One somewhat plausible example (I don't think this happens, so feel
free to let your eyes glaze over and ignore). Assume that the kernel
has a mix of ARM and Thumb code. When told to set a breakpoint at an
address the kernel wouldn't know whether the address refers to ARM or
Thumb code. In the past I've solved this type of problem by using an
instruction as a breakpoint that is an illegal instruction when
interpreted any which way (the instruction is illegal as an ARM
instruction and both halves illegal as a Thumb instruction). Right
when we're about to continue, though, we actually have extra
information about the PC we're going to continue to. If we know we're
about to continue in Thumb mode and the address we're going to
continue on is the 2nd half of a breakpoint instruction we suddenly
know that the breakpoint should have been a Thumb-mode instruction and
we could fix it up.

AKA:

1. kernel is asked to set a breakpoint at 0x12340000. We don't know
if this is ARM or Thumb so we set a 32-bit (ARM) breakpoint at
0x12340000

2. We're told to continue in Thumb mode at address 0x12340002

3. We suddenly know that the breakpoint at 0x12340000 should have been
a 16-bit (Thumb) breakpoint, not a 32-bit (ARM) breakpoint, so we
could fix it up before continuing.
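
That fix-up check would amount to something like this (purely
hypothetical, as noted; Linux's kgdb doesn't do this):

```c
/* Hypothetical sketch of the ARM/Thumb fix-up idea above: resuming in
 * Thumb mode into the 2nd halfword of a 32-bit breakpoint tells us the
 * breakpoint should have been a 16-bit (Thumb) one. */
#include <assert.h>
#include <stdbool.h>

static bool breakpoint_needs_thumb_fixup(unsigned long bp_addr,
                                         unsigned long resume_addr,
                                         bool resume_in_thumb)
{
    return resume_in_thumb && resume_addr == bp_addr + 2;
}
```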

OK, that probably was just confusing, and, like I said, I don't think
kdb in Linux does that. ;-)

-Doug