Re: [BUG] lockdep splat with kernfs lockdep annotations and slab mutex from drm patch??

From: Chris Wilson
Date: Thu Jul 11 2019 - 02:18:12 EST


Quoting Steven Rostedt (2019-07-11 03:57:20)
> On Fri, 14 Jun 2019 08:38:37 -0700
> Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> > Hello,
> >
> > On Fri, Jun 14, 2019 at 04:08:33PM +0100, Chris Wilson wrote:
> > > #ifdef CONFIG_MEMCG
> > > if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> > > struct kmem_cache *c;
> > >
> > > mutex_lock(&slab_mutex);
> > >
> > > so it happens to hit the error + FULL case with the additional slabcaches?
> > >
> > > Anyway, according to lockdep, it is dangerous to use the slab_mutex inside
> > > slab_attr_store().
> >
> > Didn't really look into the code but it looks like slab_mutex is held
> > while trying to remove sysfs files. sysfs file removal flushes
> > on-going accesses, so if a file operation then tries to grab a mutex
> > which is held during removal, it leads to a deadlock.
> >
>
> Looks like this never got fixed and now this bug is in 5.2.

git blame gives

commit 107dab5c92d5f9c3afe962036e47c207363255c7
Author: Glauber Costa <glommer@xxxxxxxxxxxxx>
Date: Tue Dec 18 14:23:05 2012 -0800

slub: slub-specific propagation changes

for adding the mutex underneath sysfs read, and I think

commit d50d82faa0c964e31f7a946ba8aba7c715ca7ab0
Author: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Date: Wed Jun 27 23:26:09 2018 -0700

slub: fix failure when we delete and create a slab cache

added the sysfs removal underneath the slab_mutex.

> Just got this:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.2.0-test #15 Not tainted
> ------------------------------------------------------
> slub_cpu_partia/899 is trying to acquire lock:
> 000000000f6f2dd7 (slab_mutex){+.+.}, at: slab_attr_store+0x6d/0xe0
>
> but task is already holding lock:
> 00000000b23ffe3d (kn->count#160){++++}, at: kernfs_fop_write+0x125/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (kn->count#160){++++}:
> __kernfs_remove+0x413/0x4a0
> kernfs_remove_by_name_ns+0x40/0x80
> sysfs_slab_add+0x1b5/0x2f0
> __kmem_cache_create+0x511/0x560
> create_cache+0xcd/0x1f0
> kmem_cache_create_usercopy+0x18a/0x240
> kmem_cache_create+0x12/0x20
> is_active_nid+0xdb/0x230 [snd_hda_codec_generic]
> snd_hda_get_path_idx+0x55/0x80 [snd_hda_codec_generic]
> get_nid_path+0xc/0x170 [snd_hda_codec_generic]
> do_one_initcall+0xa2/0x394
> do_init_module+0xfd/0x370
> load_module+0x38c6/0x3bd0
> __do_sys_finit_module+0x11a/0x1b0
> do_syscall_64+0x68/0x250
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (slab_mutex){+.+.}:
> lock_acquire+0xbd/0x1d0
> __mutex_lock+0xfc/0xb70
> slab_attr_store+0x6d/0xe0
> kernfs_fop_write+0x170/0x230
> vfs_write+0xe1/0x240
> ksys_write+0xba/0x150
> do_syscall_64+0x68/0x250
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(kn->count#160);
> lock(slab_mutex);
> lock(kn->count#160);
> lock(slab_mutex);
>
> *** DEADLOCK ***
>
>
>
> Attached is a config and the full dmesg.
>
> -- Steve
>