Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)

From: Steven Rostedt
Date: Fri Nov 20 2020 - 13:58:31 EST


On Fri, 20 Nov 2020 19:17:37 +0100
Marco Elver <elver@xxxxxxxxxx> wrote:

> | # cat /sys/kernel/tracing/recursed_functions
> | trace_selftest_test_recursion_func+0x34/0x48: trace_selftest_dynamic_test_func+0x4/0x28
> | el1_irq+0xc0/0x180: gic_handle_irq+0x4/0x108
> | gic_handle_irq+0x70/0x108: __handle_domain_irq+0x4/0x130
> | __handle_domain_irq+0x7c/0x130: irq_enter+0x4/0x28
> | trace_rcu_dyntick+0x168/0x190: rcu_read_lock_sched_held+0x4/0x98
> | rcu_read_lock_sched_held+0x30/0x98: rcu_read_lock_held_common+0x4/0x88
> | rcu_read_lock_held_common+0x50/0x88: rcu_lockdep_current_cpu_online+0x4/0xd0
> | irq_enter+0x1c/0x28: irq_enter_rcu+0x4/0xa8
> | irq_enter_rcu+0x3c/0xa8: irqtime_account_irq+0x4/0x198
> | irq_enter_rcu+0x44/0xa8: preempt_count_add+0x4/0x1a0
> | trace_hardirqs_off+0x254/0x2d8: __srcu_read_lock+0x4/0xa0
> | trace_hardirqs_off+0x25c/0x2d8: rcu_irq_enter_irqson+0x4/0x78
> | trace_rcu_dyntick+0xd8/0x190: __traceiter_rcu_dyntick+0x4/0x80
> | trace_hardirqs_off+0x294/0x2d8: rcu_irq_exit_irqson+0x4/0x78
> | trace_hardirqs_off+0x2a0/0x2d8: __srcu_read_unlock+0x4/0x88

These look normal. They happen when an interrupt occurs while tracing
something with interrupts enabled, and the interrupt traces a function
before it sets the "preempt_count" to reflect that it's in a new context.

That is:

 normal_context:
   func_A();
      trace_function();
         <interrupt>
            irq_enter();
               trace_function()
                  if (in_interrupt())
                     [returns false]

               set_preempt_count (in interrupt)

And the recursion detection is tricked into thinking it recursed in the
same context. The latest code handles this by allowing one level of
recursion:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b02414c8f045ab3b9afc816c3735bc98c5c3d262
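
To illustrate the idea, here is a rough user-space model of a per-context
recursion guard that tolerates one extra level. It is only a sketch, not
the kernel code: recursion_mask, reported_ctx, recursion_enter(),
recursion_exit() and the CTX_* names are all made up for this example, and
reported_ctx merely stands in for what preempt_count would report.

```c
#include <assert.h>

/* Hypothetical context bits; CTX_TRANSITION is the one spare level. */
enum ctx { CTX_NORMAL, CTX_IRQ, CTX_TRANSITION };

/* One bit per context currently being traced (made-up state). */
static unsigned int recursion_mask;

/* Stand-in for preempt_count: the context the tracer believes it is in. */
static enum ctx reported_ctx = CTX_NORMAL;

/*
 * Try to enter the tracer. Returns the bit that was taken, or -1 if
 * this really looks like recursion and the trace must be skipped.
 */
static int recursion_enter(void)
{
	int bit = reported_ctx;

	if (recursion_mask & (1u << bit)) {
		/*
		 * The context bit is already set. This may be an interrupt
		 * that has not updated the context yet, so allow a single
		 * extra level through the transition bit.
		 */
		bit = CTX_TRANSITION;
		if (recursion_mask & (1u << bit))
			return -1;
	}

	recursion_mask |= 1u << bit;
	return bit;
}

/* Leave the tracer, releasing the bit returned by recursion_enter(). */
static void recursion_exit(int bit)
{
	assert(bit >= 0);
	recursion_mask &= ~(1u << bit);
}
```

In this model, a second recursion_enter() made while reported_ctx still
says CTX_NORMAL gets CTX_TRANSITION instead of being rejected, and only a
third nested call is reported as recursion. Once reported_ctx is changed to
CTX_IRQ, the interrupt gets its own bit as usual.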

-- Steve