Re: [PATCH 0/2] arm64: kgdb/kdb: Fix pending single-step debugging issues

From: Doug Anderson
Date: Mon Apr 11 2022 - 20:09:36 EST


Hi,

On Mon, Apr 11, 2022 at 2:38 AM Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
>
> This patch-set reworks pending fixes from Wei's series [1] to make
> single-step debugging via kgdb/kdb on arm64 work as expected. There was
> a prior discussion on ML [2] regarding if we should keep the interrupts
> enabled during single-stepping but it turns out that in case of kgdb, it
> is risky to enable interrupts as sometimes a resume after single
> stepping an interrupt handler leads to following unbalanced locking
> issue:
>
> [ 300.328300] WARNING: bad unlock balance detected!
> [ 300.328608] 5.18.0-rc1-00016-g3e732ebf7316-dirty #6 Not tainted
> [ 300.329058] -------------------------------------
> [ 300.329298] sh/173 is trying to release lock (dbg_slave_lock) at:
> [ 300.329718] [<ffffd57c951c016c>] kgdb_cpu_enter+0x7ac/0x820
> [ 300.330029] but there are no more locks to release!
> [ 300.330265]
> [ 300.330265] other info that might help us debug this:
> [ 300.330668] 4 locks held by sh/173:
> [ 300.330891] #0: ffff4f5e454d8438 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x98/0x204
> [ 300.331735] #1: ffffd57c973bc2f0 (dbg_slave_lock){+.+.}-{2:2}, at: kgdb_cpu_enter+0x5b4/0x820
> [ 300.332259] #2: ffffd57c973a9460 (rcu_read_lock){....}-{1:2}, at: kgdb_cpu_enter+0xe0/0x820
> [ 300.332717] #3: ffffd57c973bc2a8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x1ec/0x820
>
> So, I choose to keep interrupts disabled specifically for kgdb. This
> series has been rebased to Linux 5.18-rc1 and I have dropped Doug's
> review and test tags as there is significant rework involved.

Hmmmm. I guess it's really up to Will here, but re-reading his
previous email made it pretty clear that he wasn't willing to land a
solution that left interrupts disabled during step. He also pointed
out some things that would actually be broken, like single-stepping
over a call to irqs_disabled() or single-stepping over something that
caused an exception where the exception handler needed interrupts
enabled.
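
As a rough illustration of the irqs_disabled() case, here is a
userspace toy model, not kernel code. struct cpu_model and both
helpers are made-up names for this sketch; it only assumes that the
debugger masks interrupts for the duration of the step and restores
the saved state afterwards:

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-mask state (not kernel code). */
struct cpu_model {
	bool irqs_masked;
};

/* Stand-in for irqs_disabled(): reports the current mask state. */
static bool model_irqs_disabled(const struct cpu_model *cpu)
{
	return cpu->irqs_masked;
}

/*
 * Step over a single "instruction" that reads the interrupt state.
 * When mask_during_step is true, the debugger masks interrupts for the
 * step and restores the saved state afterwards, so the stepped code
 * observes "disabled" even though the interrupted context had them on.
 */
static bool step_over_irqs_disabled(struct cpu_model *cpu, bool mask_during_step)
{
	bool saved = cpu->irqs_masked;
	bool observed;

	if (mask_during_step)
		cpu->irqs_masked = true;

	observed = model_irqs_disabled(cpu);	/* the stepped instruction */

	cpu->irqs_masked = saved;		/* debugger restores state */
	return observed;
}
```

With interrupts enabled in the interrupted context, the stepped call
reports "disabled" only when the step masks them, which is the kind of
wrong answer being described.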

I thought he had a proposal at:

https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck

...that was supposed to make all the problems go away and it was just
that nobody had time to implement his proposal?


> [1] https://lore.kernel.org/all/20200509214159.19680-1-liwei391@xxxxxxxxxx/
> [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@xxxxxxxxxxxxxx/
>
> Sumit Garg (2):
> arm64: kgdb: Fix incorrect single stepping into the irq handler
> arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
>
> arch/arm64/include/asm/debug-monitors.h | 1 +
> arch/arm64/kernel/debug-monitors.c | 5 ++++
> arch/arm64/kernel/kgdb.c | 35 +++++++++++++++++++++++--
> 3 files changed, 39 insertions(+), 2 deletions(-)
>
> --
> 2.25.1
>