Re: [PATCH 0/2] arm64: kgdb/kdb: Fix pending single-step debugging issues
From: Sumit Garg
Date: Wed Apr 13 2022 - 03:03:47 EST
Hi Doug,
Thanks for looking into this patch-set.
On Tue, 12 Apr 2022 at 05:39, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, Apr 11, 2022 at 2:38 AM Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
> >
> > This patch-set reworks pending fixes from Wei's series [1] to make
> > single-step debugging via kgdb/kdb on arm64 work as expected. There was
> > a prior discussion on the ML [2] regarding whether we should keep
> > interrupts enabled during single-stepping, but it turns out that in
> > the case of kgdb it is risky to enable interrupts, as sometimes a
> > resume after single-stepping an interrupt handler leads to the
> > following unbalanced locking issue:
> >
> > [ 300.328300] WARNING: bad unlock balance detected!
> > [ 300.328608] 5.18.0-rc1-00016-g3e732ebf7316-dirty #6 Not tainted
> > [ 300.329058] -------------------------------------
> > [ 300.329298] sh/173 is trying to release lock (dbg_slave_lock) at:
> > [ 300.329718] [<ffffd57c951c016c>] kgdb_cpu_enter+0x7ac/0x820
> > [ 300.330029] but there are no more locks to release!
> > [ 300.330265]
> > [ 300.330265] other info that might help us debug this:
> > [ 300.330668] 4 locks held by sh/173:
> > [ 300.330891] #0: ffff4f5e454d8438 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x98/0x204
> > [ 300.331735] #1: ffffd57c973bc2f0 (dbg_slave_lock){+.+.}-{2:2}, at: kgdb_cpu_enter+0x5b4/0x820
> > [ 300.332259] #2: ffffd57c973a9460 (rcu_read_lock){....}-{1:2}, at: kgdb_cpu_enter+0xe0/0x820
> > [ 300.332717] #3: ffffd57c973bc2a8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x1ec/0x820
> >
> > So, I chose to keep interrupts disabled specifically for kgdb. This
> > series has been rebased to Linux 5.18-rc1 and I have dropped Doug's
> > review and test tags as there is significant rework involved.
>
> Hmmmm. I guess it's really up to Will here, but re-reading his
> previous email made it pretty clear that he wasn't willing to land a
> solution that left interrupts disabled during step. He also pointed
> out some things that would actually be broken, like single-stepping
> over a call to irqs_disabled() or single-stepping over something that
> caused an exception where the exception handler needed interrupts
> enabled.
>
> I thought he had a proposal at:
>
> https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
>
> ...that was supposed to make all the problems go away and it was just
> that nobody had time to implement his proposal?
>
So I took a shot at Will's proposal as a replacement for patch #1 in v2
[1]. I hope that it is aligned with Will's thinking.

[1] https://lkml.org/lkml/2022/4/13/136

-Sumit

>
> > [1] https://lore.kernel.org/all/20200509214159.19680-1-liwei391@xxxxxxxxxx/
> > [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@xxxxxxxxxxxxxx/
> >
> > Sumit Garg (2):
> > arm64: kgdb: Fix incorrect single stepping into the irq handler
> > arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
> >
> > arch/arm64/include/asm/debug-monitors.h | 1 +
> > arch/arm64/kernel/debug-monitors.c | 5 ++++
> > arch/arm64/kernel/kgdb.c | 35 +++++++++++++++++++++++--
> > 3 files changed, 39 insertions(+), 2 deletions(-)
> >
> > --
> > 2.25.1
> >