Re: [PATCH WIP] x86/kgdb: trampolines for shadowed instructions

From: Daniel Thompson
Date: Wed Aug 14 2024 - 06:29:53 EST


On Wed, Aug 14, 2024 at 10:51:41AM +0200, Florian Rommel wrote:
> Experimental implementation of Thomas' trampoline idea.
> Link: https://lore.kernel.org/all/87wmkkpf5v.ffs@tglx/
>
> Every breakpoint gets a trampoline entry containing the original
> instruction (with a corrected displacement if necessary) and a jump
> back into the function to the logically subsequent instruction.
> With this, KGDB can skip a still present debug trap for a removed
> breakpoint by routing the control flow through the trampoline.
>
> In this experimental implementation, the actual removal of the debug
> trap instructions is completely disabled. So, all trap instructions
> planted by KGDB currently remain in the code, and KGDB will skip all
> these breakpoints through the trampolines when they are in the removed
> state. There is not yet a dedicated breakpoint state for the
> to-be-removed-but-still-present breakpoints.
>
> Inspect the trampolines via:
> (gdb) x/16i kgdb_trampoline_page
>
> Signed-off-by: Florian Rommel <mail@xxxxxxxxxxxx>
> ---
> I have been experimenting a bit with Thomas' idea of the trampoline
> approach. It seems to work so far but of course has a lot of rough
> edges.

Interesting. Perhaps a dumb question but is this trampoline code doing
the same thing as the out-of-line single-step code in kprobes?


> I am stuck for now, as I am not sure about the best way to implement
> the safe context where KGDB finally removes the int3 instructions.

I think this would actually fit really nicely into the debug core code.

Firstly dbg_deactivate_sw_breakpoints() should strictly maintain
BP_ACTIVE and BP_SET states (e.g. if the kgdb_arch_remove_breakpoint()
fails then do not transition from BP_ACTIVE and BP_SET). There would be
a little bit of extra logic to clean things up here (each equality check
on BP_SET needs to be reviewed) but the resulting state tracking is more
correct.

Once we've done that we can add a new state BP_REMOVE_PENDING and
arrange for this to be set by dbg_remove_sw_break() if the breakpoint is
BP_ACTIVE. At this stage we can also arrange for
dbg_deactivate_sw_breakpoints() to handle BP_REMOVE_PENDING to
BP_REMOVE transitions by calling kgdb_arch_remove_breakpoint().

That's enough to eventuallyremove the int3 instructions but it relies
on entering the debug trap handler and there's no limit on how long
could take before that happens. For that reason I think the core should
also attempt to transition BP_REMOVE_PENDING breakpoints to BP_REMOVE
after kgdb_skipexception() returns true. That means if we keep trapping
on a disabled breakpoint eventually we will hit a window where the
text_mutex is free and clean things up.


Daniel.