Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()

From: Qian Cai
Date: Wed Sep 18 2019 - 16:00:04 EST


On Tue, 2019-09-17 at 09:16 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-09-16 17:31:34 [-0400], Qian Cai wrote:
> …
> > get_random_u64() is also busted.
>
> …
> > [  753.486588]  Possible unsafe locking scenario:
> >
> > [  753.493890]        CPU0                    CPU1
> > [  753.499108]        ----                    ----
> > [  753.504324]   lock(batched_entropy_u64.lock);
> > [  753.509372]                                lock(&(&zone->lock)->rlock);
> > [  753.516675]                                lock(batched_entropy_u64.lock);
> > [  753.524238]   lock(random_write_wait.lock);
> > [  753.529113]
> >                  *** DEADLOCK ***
>
> This is the same scenario as the previous one in regard to the
> batched_entropy_….lock.

Commit 383776fa7527 ("locking/lockdep: Handle statically initialized percpu
locks properly") significantly increased the chance of false positives for
percpu locks, especially on large systems like the ones in those examples,
because it makes all of the per-CPU copies share the same lock class. Once a
false positive happens, lockdep becomes useless.
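
As a concrete picture of the kind of false positive I mean (a sketch with
made-up names, not actual kernel code): two CPUs each nest their *own* copy of
a percpu lock with a shared lock in opposite orders. No two CPUs ever contend
on the same spinlock instance, but once every copy is filed under one class,
lockdep reads it as an AB-BA inversion like the report above:

#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Made-up stand-ins for batched_entropy_u64.lock and zone->lock. */
static DEFINE_PER_CPU(spinlock_t, demo_percpu_lock) =
	__SPIN_LOCK_UNLOCKED(demo_percpu_lock);
static DEFINE_SPINLOCK(demo_shared_lock);

/* CPU0: its local percpu copy, then the shared lock. */
static void demo_cpu0_path(void)
{
	spinlock_t *lock = get_cpu_ptr(&demo_percpu_lock);

	spin_lock(lock);
	spin_lock(&demo_shared_lock);
	spin_unlock(&demo_shared_lock);
	spin_unlock(lock);
	put_cpu_ptr(&demo_percpu_lock);
}

/* CPU1: the shared lock, then its *own* percpu copy. */
static void demo_cpu1_path(void)
{
	spinlock_t *lock;

	spin_lock(&demo_shared_lock);
	lock = get_cpu_ptr(&demo_percpu_lock);
	spin_lock(lock);
	spin_unlock(lock);
	put_cpu_ptr(&demo_percpu_lock);
	spin_unlock(&demo_shared_lock);
}

With per-CPU classes these two chains never intersect; with one shared class
they look like lock(A)->lock(B) on CPU0 and lock(B)->lock(A) on CPU1, which
lockdep reports as a deadlock.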

In reality, each percpu lock is a different lock, as we have seen in those
examples where each CPU only takes its local copy. The only thing to worry
about is a path where another CPU could take a non-local percpu lock, for
example invalidate_batched_entropy(), which does a for_each_possible_cpu()
walk. Is there any other place where another CPU could take a non-local percpu
lock outside a for_each_possible_cpu() or similar iterator?
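
To make that concrete, here is a rough sketch of the two access patterns (the
names below are made up for illustration and are not the actual random.c code):
the common path that only ever touches the local CPU's copy of the lock, and an
invalidate_batched_entropy()-style for_each_possible_cpu() walk, which is the
only place a remote CPU's copy gets taken:

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Illustrative stand-in for batched_entropy_u64 and its batch lock. */
struct demo_batch {
	spinlock_t	lock;
	unsigned int	position;
};

static DEFINE_PER_CPU(struct demo_batch, demo_batch) = {
	.lock = __SPIN_LOCK_UNLOCKED(demo_batch.lock),
};

/* Common case: each CPU only ever takes its own copy of the lock. */
static void demo_use_local_batch(void)
{
	struct demo_batch *b = get_cpu_ptr(&demo_batch);
	unsigned long flags;

	spin_lock_irqsave(&b->lock, flags);
	b->position++;
	spin_unlock_irqrestore(&b->lock, flags);
	put_cpu_ptr(&demo_batch);
}

/*
 * The only pattern that crosses CPUs: an invalidate-style walk that
 * takes every CPU's copy of the lock in turn.
 */
static void demo_invalidate_all(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct demo_batch *b = per_cpu_ptr(&demo_batch, cpu);
		unsigned long flags;

		spin_lock_irqsave(&b->lock, flags);
		b->position = 0;
		spin_unlock_irqrestore(&b->lock, flags);
	}
}

If walks like demo_invalidate_all() really are the only cross-CPU users, then
only those chains matter, rather than merging every CPU's copy into one class.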

Even before the above commit, if the system ran long enough, lockdep could
still catch a deadlock coming from those percpu lock iterators, since they
register each percpu lock's usage with lockdep.

Overall, it sounds to me like the side effects of commit 383776fa7527 outweigh
the benefits, and it should be reverted. Am I missing anything?