Re: [PATCH 3/3] arm64: debug: Remove rcu_read_lock from debug exception

From: James Morse
Date: Fri Jul 19 2019 - 04:42:21 EST


Hi,

On 7/18/19 3:31 PM, Masami Hiramatsu wrote:
On Thu, 18 Jul 2019 10:20:23 +0100
Mark Rutland <mark.rutland@xxxxxxx> wrote:

On Wed, Jul 17, 2019 at 11:22:15PM -0700, Paul E. McKenney wrote:
On Thu, Jul 18, 2019 at 02:43:58PM +0900, Masami Hiramatsu wrote:
Remove rcu_read_lock()/rcu_read_unlock() from debug exception
handlers since the software breakpoint can be hit on idle task.

Why precisely do we need to elide these? Are we seeing warnings today?

Yes, unfortunately, or fortunately. Naresh reported that warning when
ftracetest ran, and I confirmed that it also happens if I probe default_idle_call.

/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
/sys/kernel/debug/tracing # [ 135.122237]
[ 135.125035] =============================
[ 135.125310] WARNING: suspicious RCU usage

[ 135.132224] Call trace:
[ 135.132491] dump_backtrace+0x0/0x140
[ 135.132806] show_stack+0x24/0x30
[ 135.133133] dump_stack+0xc4/0x10c
[ 135.133726] lockdep_rcu_suspicious+0xf8/0x108
[ 135.134171] call_break_hook+0x170/0x178
[ 135.134486] brk_handler+0x28/0x68
[ 135.134792] do_debug_exception+0x90/0x150
[ 135.135051] el1_dbg+0x18/0x8c
[ 135.135260] default_idle_call+0x0/0x44
[ 135.135516] cpu_startup_entry+0x2c/0x30
[ 135.135815] rest_init+0x1b0/0x280
[ 135.136044] arch_call_rest_init+0x14/0x1c
[ 135.136305] start_kernel+0x4d4/0x500

The exception entry and exit use irq_enter() and irq_exit(), in this
case, correct? Otherwise RCU will be ignoring this CPU.

This is missing today, which sounds like the underlying bug.
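To make the failure mode concrete, here is a minimal userspace model. It assumes a simplified view in which RCU ignores a CPU running the idle loop unless an entry hook such as irq_enter() has marked it as watched. Every `model_*` name is hypothetical; the kernel keeps this state per CPU inside RCU, not in a struct like this one. The unwrapped handler reproduces the "suspicious RCU usage" report that the trace above shows coming from call_break_hook(). Wrapping the same handler in the entry/exit hooks makes the report go away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, not kernel code: one CPU's state as
 * RCU sees it. An idle CPU is not "watched", so an RCU read-side
 * critical section there is reported as suspicious.
 */
struct model_cpu {
	bool idle;          /* running the idle loop (e.g. default_idle_call) */
	int irq_nesting;    /* depth of model_irq_enter() calls */
	int suspicious;     /* count of "suspicious RCU usage" reports */
};

static bool model_rcu_is_watching(const struct model_cpu *cpu)
{
	/* Watched if not idle, or if idle but inside an entry hook. */
	return !cpu->idle || cpu->irq_nesting > 0;
}

static void model_irq_enter(struct model_cpu *cpu)
{
	cpu->irq_nesting++;
}

static void model_irq_exit(struct model_cpu *cpu)
{
	assert(cpu->irq_nesting > 0);   /* exit without matching enter */
	cpu->irq_nesting--;
}

static void model_rcu_read_lock(struct model_cpu *cpu)
{
	if (!model_rcu_is_watching(cpu)) {
		cpu->suspicious++;
		fprintf(stderr, "WARNING: suspicious RCU usage\n");
	}
}

static void model_rcu_read_unlock(struct model_cpu *cpu)
{
	(void)cpu;  /* nothing to model on unlock */
}

/* Break handler as today: takes the RCU read lock with no entry hook. */
static void model_brk_handler_today(struct model_cpu *cpu)
{
	model_rcu_read_lock(cpu);
	/* ... look up and call the matching break hook ... */
	model_rcu_read_unlock(cpu);
}

/* Same handler with the entry/exit hooks wrapped around it. */
static void model_brk_handler_with_entry(struct model_cpu *cpu)
{
	model_irq_enter(cpu);
	model_brk_handler_today(cpu);
	model_irq_exit(cpu);
}
```

The model only captures ordering: the hook has to run before the first rcu_read_lock() the handler reaches, which is why it belongs at exception entry rather than inside individual break hooks.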

Agreed. I'm not so familiar with how debug exceptions are handled on arm64;
would they be a kind of NMI or IRQ?

Debug exceptions can interrupt both SError (think: machine check) and pseudo-NMI, which in turn can both interrupt interrupt-masked code. So they are a kind of NMI. But be careful not to call nmi_enter() twice; see do_serror() for how we work around this.
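A sketch of that guard as a userspace model. It assumes that nmi_enter() in kernels of this period BUG()s when already in NMI context (modelled here with assert()). It also assumes the do_serror() workaround amounts to checking in_nmi() first and only entering and exiting NMI context when it was not already set. The `model_*` names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of NMI nesting state for a single CPU. */
static int model_nmi_depth;

static bool model_in_nmi(void)
{
	return model_nmi_depth > 0;
}

static void model_nmi_enter(void)
{
	assert(!model_in_nmi());    /* stands in for BUG_ON(in_nmi()) */
	model_nmi_depth++;
}

static void model_nmi_exit(void)
{
	assert(model_in_nmi());
	model_nmi_depth--;
}

/*
 * A debug exception may arrive while an SError or pseudo-NMI handler
 * already holds NMI context, so only enter/exit if nobody else has.
 * Returns the NMI depth observed while "handling" the exception.
 */
static int model_debug_handler(void)
{
	const bool was_in_nmi = model_in_nmi();
	int depth_seen;

	if (!was_in_nmi)
		model_nmi_enter();

	depth_seen = model_nmi_depth;   /* ... handle the breakpoint ... */

	if (!was_in_nmi)
		model_nmi_exit();

	return depth_seen;
}
```

Calling model_debug_handler() from inside a model_nmi_enter()/model_nmi_exit() pair does not trip the assert, whereas an unconditional model_nmi_enter() in the handler would.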


Anyway, it seems that normal IRQs are also not calling irq_enter()/irq_exit(),
except in arch/arm64/kernel/smp.c.
drivers/irqchip/irq-gic.c:gic_handle_irq() either calls handle_domain_irq() or handle_IPI(). The enter/exit calls live in those functions.
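A small model of that call structure, which may also explain why a search of arch/arm64 for irq_enter() turns up smp.c: the root handler only dispatches, and the helpers it calls do the enter/exit. All names are hypothetical. The ID ranges are an assumption based on the GICv2 architecture: 0-15 are SGIs used for IPIs, 16-1019 go through the IRQ domain, and 1020-1023 are special.

```c
#include <assert.h>

/* Hypothetical model: per-CPU hardirq nesting and what a handler saw. */
static int model_irq_depth;
static int model_last_depth_seen;

static void model_irq_enter(void)
{
	model_irq_depth++;
}

static void model_irq_exit(void)
{
	assert(model_irq_depth > 0);
	model_irq_depth--;
}

/* Stands in for handle_domain_irq(): wraps the device handler. */
static void model_handle_domain_irq(unsigned int hwirq)
{
	(void)hwirq;
	model_irq_enter();
	model_last_depth_seen = model_irq_depth;   /* device handler runs here */
	model_irq_exit();
}

/* Stands in for handle_IPI(): also wraps its own work. */
static void model_handle_ipi(unsigned int ipinr)
{
	(void)ipinr;
	model_irq_enter();
	model_last_depth_seen = model_irq_depth;   /* IPI work runs here */
	model_irq_exit();
}

/* Stands in for gic_handle_irq(): no enter/exit of its own. */
static void model_gic_handle_irq(unsigned int irqnr)
{
	if (irqnr < 16)
		model_handle_ipi(irqnr);
	else if (irqnr < 1020)
		model_handle_domain_irq(irqnr);
	/* 1020-1023: special/spurious IDs, nothing to handle */
}
```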


Thanks,

James