Re: [PATCH 3/3] mm/slub: Fix potential deadlock problem in slab_attr_store()

From: Waiman Long
Date: Mon Feb 10 2020 - 17:16:36 EST


On 2/10/20 5:03 PM, Andrew Morton wrote:
> On Mon, 10 Feb 2020 15:46:51 -0500 Waiman Long <longman@xxxxxxxxxx> wrote:
>
>> In order to fix this circular lock dependency problem, we need to do a
>> mutex_trylock(&slab_mutex) in slab_attr_store() for CPU0 above. A simple
>> trylock, however, is easy to fail causing user dissatisfaction. So the
>> new mutex_timed_lock() function is now used to do a trylock with a
>> 100ms timeout.
>>
>> ...
>>
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -5536,7 +5536,12 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>>  	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
>>  		struct kmem_cache *c;
>>
>> -		mutex_lock(&slab_mutex);
>> +		/*
>> +		 * Timeout after 100ms
>> +		 */
>> +		if (mutex_timed_lock(&slab_mutex, 100) < 0)
>> +			return -EBUSY;
>> +
> Oh dear. Surely there's a better fix here. Does slab really need to
> hold slab_mutex while creating that sysfs file? Why?
>
> If the issue is two threads trying to create the same sysfs file
> (unlikely, given that both will need to have created the same cache)
> then can we add a new mutex specifically for this purpose?
>
> Or something else.
>
Well, the current code iterates over all the memory cgroups to set the
same value in each of them. I believe the reason for holding slab_mutex
is to make sure that the memcg hierarchy is stable during this
iteration. Of course, we can argue about whether the attribute value
should be duplicated in all memcgs.
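
For illustration only, the pattern under discussion can be sketched in
userspace with POSIX threads. This is not the kernel code: cache_mutex,
struct cache, timed_lock_ms() and cache_attr_store() are hypothetical
stand-ins for slab_mutex, a root cache with its per-memcg children, the
proposed mutex_timed_lock(), and slab_attr_store(). The sketch assumes
that any failure to take the lock is reported as -EBUSY.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Userspace stand-in for slab_mutex (hypothetical name). */
static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a root cache and its per-memcg children. */
struct cache {
	int attr;
	struct cache *children;
	size_t nr_children;
};

/*
 * Try to take @m, giving up after @ms milliseconds.
 * Returns 0 on success, -EBUSY if the lock could not be taken
 * (timeout or any other error).
 */
static int timed_lock_ms(pthread_mutex_t *m, long ms)
{
	struct timespec deadline;

	assert(ms >= 0);

	/* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += ms / 1000;
	deadline.tv_nsec += (ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	return pthread_mutex_timedlock(m, &deadline) == 0 ? 0 : -EBUSY;
}

/*
 * Copy @val into the root and all of its children while holding
 * cache_mutex, so the child list cannot change under the loop.
 */
static int cache_attr_store(struct cache *root, int val)
{
	size_t i;

	/* Give up after 100ms instead of blocking indefinitely. */
	if (timed_lock_ms(&cache_mutex, 100) < 0)
		return -EBUSY;

	root->attr = val;
	for (i = 0; i < root->nr_children; i++)
		root->children[i].attr = val;

	pthread_mutex_unlock(&cache_mutex);
	return 0;
}
```

The mutex is held only to keep the child list stable while the value is
propagated. The caller has to be prepared to see -EBUSY and retry, which
is the user-visible cost of the timeout approach.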

Cheers,
Longman