Re: [PATCH v3 0/2] arm64: Fix pending single-step debugging issues

From: Sumit Garg
Date: Mon Jul 11 2022 - 08:44:48 EST


Hi Doug,

On Sat, 2 Jul 2022 at 03:44, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, May 10, 2022 at 11:05 PM Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
> >
> > This patch-set reworks pending fixes from Wei's series [1] to make
> > single-step debugging via kgdb/kdb on arm64 work as expected. There was
> > a prior discussion on ML [2] regarding if we should keep the interrupts
> > enabled during single-stepping. So patch #1 follows suggestion from Will
> > [3] to not disable interrupts during single stepping but rather skip
> > single stepping within interrupt handler.
> >
> > [1] https://lore.kernel.org/all/20200509214159.19680-1-liwei391@xxxxxxxxxx/
> > [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@xxxxxxxxxxxxxx/
> > [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> >
> > Changes in v3:
> > - Reword commit descriptions as per Daniel's suggestions.
> >
> > Changes in v2:
> > - Replace patch #1 to rather follow Will's suggestion.
> >
> > Sumit Garg (2):
> > arm64: entry: Skip single stepping into interrupt handlers
> > arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
> >
> > arch/arm64/include/asm/debug-monitors.h | 1 +
> > arch/arm64/kernel/debug-monitors.c | 5 +++++
> > arch/arm64/kernel/entry-common.c | 18 +++++++++++++++++-
> > arch/arm64/kernel/kgdb.c | 2 ++
> > 4 files changed, 25 insertions(+), 1 deletion(-)
>
> Sorry it took so long for me to respond. I kept dreaming that I'd find
> the time to really dig deep into this to understand it fully and I'm
> finally giving up on it.

No worries and apologies on my part as well as I had to find some time
to reproduce the issue that you have reported below.

> I'm going to hope that Will and/or Catalin
> knows this area of the code well and can give it a good review. If not
> then I'll strive harder to make the time...
>
> In any case, I poked around with this a bunch and it definitely
> improved the stepping behavior a whole lot. I still got one case where
> gdb hit an assertion while I was stepping, but I could believe that
> was a problem with gdb? I couldn't reproduce it. Thus I can at least
> give:
>
> Tested-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
>

Thanks for the testing.

> I'll also note that I _think_ I remember that with Wei's series that
> the gdb function "call" started working. I tried that here and it
> didn't seem so happy. To keep things simple, I created a dummy
> function in my kernel that looked like:
>
> void doug_test(void)
> {
> pr_info("testing, 1 2 3\n");
> }
>
> I broke into the debugger by echoing "g" to /proc/sysrq-trigger and
> then tried "call doug_test()". I guess my printout actually printed
> but it wasn't so happy after that. Seems like it somehow ended up
> returning to a bogus address after the call which then caused a crash.
>

I am able to reproduce this issue on my setup as well. But it doesn't
seem to be a regression caused by this patch-set over Wei's series. As
I could reproduce this issue with v1 [1] patch-set as well which was
just a forward port of pending patches from Wei's series to the latest
upstream.

Maybe it's a different regression caused by other changes? BTW, do you
remember the kernel version you tested with Wei's series applied?

[1] https://lore.kernel.org/linux-arm-kernel/20220411093819.1012583-1-sumit.garg@xxxxxxxxxx/T/

-Sumit

> testing, 1 2 3
> BUG: sleeping function called from invalid context at
> arch/arm64/mm/fault.c:593
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3393, name: bash
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> CPU: 6 PID: 3393 Comm: bash Not tainted 5.19.0-rc4+ #3
> dbec0bdb8582e447bccdcf2e70d7fe04477b1aac
> Hardware name: Google Herobrine (rev1+) (DT)
> Call trace:
> dump_backtrace+0xf0/0x110
> show_stack+0x24/0x70
> dump_stack_lvl+0x64/0x7c
> dump_stack+0x18/0x38
> __might_resched+0x144/0x154
> __might_sleep+0x54/0x84
> do_page_fault+0x1d4/0x42c
> do_mem_abort+0x4c/0xb0
> el1_abort+0x3c/0x5c
> el1h_64_sync_handler+0x4c/0xc4
> el1h_64_sync+0x64/0x68
> 0xffffffc008000000
> __handle_sysrq+0x15c/0x184
> write_sysrq_trigger+0x94/0x128
> proc_reg_write+0xbc/0xec
> vfs_write+0xf0/0x2c8
> ksys_write+0x84/0xf0
> __arm64_sys_write+0x28/0x34
> invoke_syscall+0x4c/0x120
> el0_svc_common+0x94/0xfc
> do_el0_svc+0x38/0xc0
> el0_svc+0x2c/0x7c
> el0t_64_sync_handler+0x48/0x114
> el0t_64_sync+0x18c/0x190
> Unable to handle kernel execute from non-executable memory at
> virtual address ffffffc008000000
> Mem abort info:
> ESR = 0x000000008600000f
> EC = 0x21: IABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x0f: level 3 permission fault
> swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000082863000
> [ffffffc008000000] pgd=100000027ffff003, p4d=100000027ffff003,
> pud=100000027ffff003, pmd=100000027fffe003, pte=00680001001c3703
> Internal error: Oops: 8600000f [#1] PREEMPT SMP
>
> I'm not sure if that's a sign that something is missing with your patch or not.
>
> -Doug