On Apr 27, 2020, at 7:56 PM, Waiman Long <longman@xxxxxxxxxx> wrote:Can you show the full splat?
It turns out that switching from slab_mutex to memcg_cache_ids_sem in
slab_attr_store() does not completely eliminate circular locking dependency
as shown by the following lockdep splat when the system is shut down:
[ 2095.079697] Chain exists of:
[ 2095.079697] kn->count#278 --> memcg_cache_ids_sem --> slab_mutex
[ 2095.079697]
[ 2095.090278] Possible unsafe locking scenario:
[ 2095.090278]
[ 2095.096227] CPU0 CPU1
[ 2095.100779] ---- ----
[ 2095.105331] lock(slab_mutex);
[ 2095.108486] lock(memcg_cache_ids_sem);
[ 2095.114961] lock(slab_mutex);
[ 2095.120649] lock(kn->count#278);
[ 2095.124068]
[ 2095.124068] *** DEADLOCK ***
To eliminate this possibility, we have to use trylock to acquireThe code looks a bit hacky and probably not that robust. Since it is the shutdown path which is not all that important without lockdep, maybe you could drop this single patch for now until there is a better solution?
memcg_cache_ids_sem. Unlikely slab_mutex which can be acquired in
many places, the memcg_cache_ids_sem write lock is only acquired
in memcg_alloc_cache_id() to double the size of memcg_nr_cache_ids.
So the chance of successive calls to memcg_alloc_cache_id() within
a short time is pretty low. As a result, we can retry the read lock
acquisition a few times if the first attempt fails.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>