Re: [PATCH v4 01/10] kcsan: Add Kernel Concurrency Sanitizer infrastructure

From: Paul E. McKenney
Date: Tue Jan 14 2020 - 14:22:36 EST


On Tue, Jan 14, 2020 at 06:08:29AM -0500, Qian Cai wrote:
>
>
> > On Jan 6, 2020, at 7:47 AM, Marco Elver <elver@xxxxxxxxxx> wrote:
> >
> > Thanks, I'll look into KCSAN + lockdep compatibility. It's probably
> > missing some KCSAN_SANITIZE := n in some Makefile.
>
> Can I have a update on fixing this? It looks like more of a problem that kcsan_setup_watchpoint() will disable IRQs and then dive into the page allocator where it would complain because it might sleep.
>
> BTW, I saw Paul sent a pull request for 5.6 but it is ugly to have everybody could trigger a deadlock (sleep function called in atomic context) like this during boot once this hits the mainline not to mention about only recently it is possible to test this feature (thanks to warning ratelimit) with the existing debugging options because it was unable to boot due to the brokenness with debug_pagealloc as mentioned in this thread, so this does sounds like it needs more soak time for the mainline to me.

Just so I understand... Does this problem happen even in CONFIG_KCSAN=n
kernels?

I have been running extensive CONFIG_KSCAN=y rcutorture tests for quite
awhile now, so even if this only happens for CONFIG_KSCAN=y, it is not
like it affects everyone.

Yes, it should be fixed, and Marco does have a patch on the way.

Thanx, Paul

> 0000000000000400
> [ 13.416814][ T1] Call Trace:
> [ 13.416814][ T1] lock_is_held_type+0x66/0x160
> [ 13.416814][ T1] ___might_sleep+0xc1/0x1d0
> [ 13.416814][ T1] __might_sleep+0x5b/0xa0
> [ 13.416814][ T1] slab_pre_alloc_hook+0x7b/0xa0
> [ 13.416814][ T1] __kmalloc_node+0x60/0x300
> [ 13.416814 T1] ? alloc_cpumask_var_node+0x44/0x70
> [ 13.416814][ T1] ? topology_phys_to_logical_die+0x7e/0x180
> [ 13.416814][ T1] alloc_cpumask_var_node+0x44/0x70
> [ 13.416814][ T1] zalloc_cpumask_var+0x2a/0x40
> [ 13.416814][ T1] native_smp_prepare_cpus+0x246/0x425
> [ 13.416814][ T1] kernel_init_freeable+0x1b8/0x496
> [ 13.416814][ T1] ? rest_init+0x381/0x381
> [ 13.416814][ T1] kernel_init+0x18/0x17f
> [ 13.416814][ T1] ? rest_init+0x381/0x381
> [ 13.416814][ T1] ret_from_fork+0x3a/0x50
> [ 13.416814][ T1] irq event stamp: 910
> [ 13.416814][ T1] hardirqs last enabled at (909): [<ffffffff8d1240f3>] _raw_write_unlock_irqrestore+0x53/0x57
> [ 13.416814][ T1] hardirqs last disabled at (910): [<ffffffff8c8bba76>] kcsan_setup_watchpoint+0x96/0x460
> [ 13.416814][ T1] softirqs last enabled at (0): [<ffffffff8c6b697a>] copy_process+0x11fa/0x34f0
> [ 13.416814][ T1] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 13.416814][ T1] ---[ end trace 7d1df66da055aa92 ]---
> [ 13.416814][ T1] possible reason: unannotated irqs-on.
> [ 13.416814][ent stamp: 910
> [ 13.416814][ T1] hardirqs last enabled at (909): [<ffffffff8d1240f3>] _raw_write_unlock_irqrestore+0x53/0x57
> [ 13.416814][ T1] hardirqs last disabled at (910): [<ffffffff8c8bba76>] kcsan_setup_watchpoint+0x96/0x460
> [ 13.416814][ T1] softirqs last enabled at (0): [<ffffffff8c6b697a>] copy_process+0x11fa/0x34f0
> [ 13.416814][ T1] softirqs last disabled at (0): [<0000000000000000>] 0x0