Re: [PATCH] random32: Use rcuidle variant for tracepoint

From: Marco Elver
Date: Fri Aug 21 2020 - 11:36:13 EST


On Fri, Aug 21, 2020 at 08:06AM -0700, Eric Dumazet wrote:
> On Fri, Aug 21, 2020 at 1:59 AM <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Aug 21, 2020 at 08:30:43AM +0200, Marco Elver wrote:
> > > With KCSAN enabled, prandom_u32() may be called from any context,
> > > including idle CPUs.
> > >
> > > Therefore, switch to using trace_prandom_u32_rcuidle(), to avoid various
> > > issues due to recursion and lockdep warnings when KCSAN and tracing is
> > > enabled.
> >
> > At some point we're going to have to introduce noinstr to idle as well.
> > But until that time this should indeed cure things.
>
> I do not understand what the issue is. This _rcuidle() is kind of opaque ;)
>
> Would this alternative patch work, or is it something more fundamental ?

There are 2 problems:

1. Recursion due to ending up in lockdep from the tracepoint. I need to
solve this either way. One way is to use _rcuidle() variant, which
doesn't call into lockdep.

2. Somehow running into trouble because we use tracing from an idle CPU.
At least that's what I gathered from the documentation -- but you'd have
to wait for Peter or Steven to get a better explanation.

> Thanks !
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 932345323af092a93fc2690b0ebbf4f7485ae4f3..17af2d1631e5ab6e02ad1e9288af7e007bed6d5f
> 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -83,9 +83,10 @@ u32 prandom_u32(void)
> u32 res;
>
> res = prandom_u32_state(state);
> - trace_prandom_u32(res);
> put_cpu_var(net_rand_state);
>
> + trace_prandom_u32(res);
> +
> return res;
> }
> EXPORT_SYMBOL(prandom_u32);

That unfortunately still gets me the same warning:

| ------------[ cut here ]------------
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 4 PID: 1861 at kernel/locking/lockdep.c:4875 check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Modules linked in:
| CPU: 4 PID: 1861 Comm: kworker/u16:4 Not tainted 5.9.0-rc1+ #24
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
| RIP: 0010:check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Code: c0 0f 84 70 5d 00 00 44 8b 0d fd 11 5f 06 45 85 c9 0f 85 60 5d 00 00 48 c7 c6 3e d0 f4 86 48 c7 c7 b2 49 f3 86 e8 8d 49 f6 ff <0f> 0b e9 46 5d 00 00 66 90 41 57 41 56 49 89 fe 41 55 41 89 d5 41
| RSP: 0000:ffffc900034bfcb0 EFLAGS: 00010082
| RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8136161c
| RDX: ffff88881a9dcb00 RSI: ffffffff81363835 RDI: 0000000000000006
| RBP: ffffc900034bfd00 R08: 0000000000000000 R09: 0000ffffffffffff
| R10: 0000000000000104 R11: 0000ffff874efd6b R12: ffffffff874f26c0
| R13: 0000000000000244 R14: 0000000000000000 R15: 0000000000000046
| FS: 0000000000000000(0000) GS:ffff88881fc00000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 0000000007489001 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
| check_flags kernel/locking/lockdep.c:4871 [inline]
| lock_is_held_type+0x42/0x100 kernel/locking/lockdep.c:5042
| lock_is_held include/linux/lockdep.h:267 [inline]
| rcu_read_lock_sched_held+0x41/0x80 kernel/rcu/update.c:136
| trace_prandom_u32 include/trace/events/random.h:310 [inline]
| prandom_u32+0x1bb/0x200 lib/random32.c:86
| prandom_u32_max include/linux/prandom.h:46 [inline]
| reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
| kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
| perf_lock_task_context+0x5e3/0x6e0 kernel/events/core.c:1491
| perf_pin_task_context kernel/events/core.c:1506 [inline]
| perf_event_exit_task_context kernel/events/core.c:12284 [inline]
| perf_event_exit_task+0x1e2/0x910 kernel/events/core.c:12364
| do_exit+0x70e/0x18b0 kernel/exit.c:815
| call_usermodehelper_exec_async+0x2e2/0x2f0 kernel/umh.c:114
| ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
| irq event stamp: 107
| hardirqs last enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| ---[ end trace a3058d9b157af5c4 ]---
| possible reason: unannotated irqs-off.
| irq event stamp: 107
| hardirqs last enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0

I also have a patch which avoids the problem entirely by not using
prandom_u32(): https://lkml.kernel.org/r/20200821123126.3121494-1-elver@xxxxxxxxxx
But that patch will likely only make it into the next merge window
(because of other conflicts).

So, if the _rcuidle() variant here doesn't break your usecase, there
should be no harm in using the _rcuidle() variant. This also lifts the
restriction on where prandom_u32() is usable to what it was before,
which should be any context.

Steven, Peter: What's the downside to of _rcuidle()?

Thanks,
-- Marco