Re: [PATCH] kgdb: Avoid suspicious RCU usage warning

From: Daniel Thompson
Date: Tue May 19 2020 - 06:41:58 EST


On Thu, May 07, 2020 at 03:53:58PM -0700, Douglas Anderson wrote:
> At times when I'm using kgdb I see a splat on my console about
> suspicious RCU usage. I managed to come up with a case that could
> reproduce this that looked like this:
>
> WARNING: suspicious RCU usage
> 5.7.0-rc4+ #609 Not tainted
> -----------------------------
> kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by swapper/0/1:
> #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c
> #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac
> #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac
>
> stack backtrace:
> CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609
> Hardware name: Google Cheza (rev3+) (DT)
> Call trace:
> dump_backtrace+0x0/0x1b8
> show_stack+0x1c/0x24
> dump_stack+0xd4/0x134
> lockdep_rcu_suspicious+0xf0/0x100
> find_task_by_pid_ns+0x5c/0x80
> getthread+0x8c/0xb0
> gdb_serial_stub+0x9d4/0xd04
> kgdb_cpu_enter+0x284/0x7ac
> kgdb_handle_exception+0x174/0x20c
> kgdb_brk_fn+0x24/0x30
> call_break_hook+0x6c/0x7c
> brk_handler+0x20/0x5c
> do_debug_exception+0x1c8/0x22c
> el1_sync_handler+0x3c/0xe4
> el1_sync+0x7c/0x100
> rpmh_rsc_probe+0x38/0x420
> platform_drv_probe+0x94/0xb4
> really_probe+0x134/0x300
> driver_probe_device+0x68/0x100
> __device_attach_driver+0x90/0xa8
> bus_for_each_drv+0x84/0xcc
> __device_attach+0xb4/0x13c
> device_initial_probe+0x18/0x20
> bus_probe_device+0x38/0x98
> device_add+0x38c/0x420
>
> If I understand properly we should just be able to blanket kgdb under
> one big RCU read lock and the problem should go away. We'll add it to
> the beast-of-a-function known as kgdb_cpu_enter().
>
> With this I no longer get any splats and things seem to work fine.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

In principle this looks OK, but I'm curious why we don't cuddle these
calls up to the local interrupt locking, i.e. right next to the
local_irq_save()/local_irq_restore() pairs (and also whether we want to
keep hold of the lock during stepping). If nothing else, that would make
review easier.
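
To illustrate the pairing, here is a rough userspace sketch rather than
kernel code. Everything prefixed stub_ and the function
model_cpu_enter() are hypothetical stand-ins invented for this example;
they only count nesting so that each exit path can be checked for
balance. The ordering (RCU outermost, interrupts inside) mirrors the
exit paths in the patch, and single-stepping is not modelled.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel primitives. */
static int irq_disabled;	/* 1 while "interrupts" are masked */
static int rcu_depth;		/* nesting depth of RCU read-side sections */

static void stub_local_irq_save(unsigned long *flags)
{
	*flags = (unsigned long)irq_disabled;
	irq_disabled = 1;
}

static void stub_local_irq_restore(unsigned long flags)
{
	irq_disabled = (int)flags;
}

static void stub_rcu_read_lock(void)
{
	rcu_depth++;
}

static void stub_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);	/* catch an unbalanced unlock */
	rcu_depth--;
}

/* Models a lookup that needs RCU protection: returns 1 if it has it. */
static int stub_lookup_is_protected(void)
{
	return rcu_depth > 0;
}

/*
 * Hypothetical model of the entry path. early_exit != 0 models the
 * early "return 0" path; otherwise the lookup runs and its result is
 * returned. Each lock sits directly beside the matching irq call, so
 * every return can be audited by looking at one pair of lines.
 */
static int model_cpu_enter(int early_exit)
{
	unsigned long flags;
	int protected_lookup;

	stub_rcu_read_lock();
	stub_local_irq_save(&flags);

	if (early_exit) {
		stub_local_irq_restore(flags);
		stub_rcu_read_unlock();
		return 0;
	}

	protected_lookup = stub_lookup_is_protected();

	stub_local_irq_restore(flags);
	stub_rcu_read_unlock();
	return protected_lookup;
}
```

Calling model_cpu_enter() with either argument should leave both
rcu_depth and irq_disabled at zero, which is the property a reviewer
would want to confirm for each return in the real function.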


Daniel.


> ---
>
> kernel/debug/debug_core.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 2b7c9b67931d..5155cf06731b 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -564,6 +564,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> int online_cpus = num_online_cpus();
> u64 time_left;
>
> + rcu_read_lock();
> +
> kgdb_info[ks->cpu].enter_kgdb++;
> kgdb_info[ks->cpu].exception_state |= exception_state;
>
> @@ -635,6 +637,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> atomic_dec(&slaves_in_kgdb);
> dbg_touch_watchdogs();
> local_irq_restore(flags);
> + rcu_read_unlock();
> return 0;
> }
> cpu_relax();
> @@ -773,6 +776,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> dbg_touch_watchdogs();
> local_irq_restore(flags);
>
> + rcu_read_unlock();
> +
> return kgdb_info[cpu].ret_state;
> }
>
> --
> 2.26.2.645.ge9eca65c58-goog
>