Re: [PATCH RESEND -mm 01/12] memcg: flush cache creation works before memcg cache destruction

From: Vladimir Davydov
Date: Tue Mar 18 2014 - 04:14:53 EST


On 03/17/2014 08:07 PM, Michal Hocko wrote:
> On Thu 13-03-14 19:06:39, Vladimir Davydov wrote:
>> When we get to memcg cache destruction, either from the root cache
>> destruction path or when turning memcg offline, there still might be
>> memcg cache creation works pending that was scheduled before we
>> initiated destruction. We need to flush them before starting to destroy
>> memcg caches, otherwise we can get a leaked kmem cache or, even worse,
>> an attempt to use after free.
> How can we use-after-free? Even if there is a pending work item to
> create a new cache then we keep the css reference for the memcg and
> release it from the worker (memcg_create_cache_work_func). So although
> this can race with memcg offlining the memcg itself will be still alive.

There are actually two issues:

1) When we destroy a root cache using kmem_cache_destroy(), we should
ensure all pending memcg creation works for this root cache are over,
otherwise a work could be executed after the root cache is destroyed
resulting in use-after-free.

2) Memcg offline. In this case use-after-free is impossible in a memcg
creation work handler, because, as you mentioned, the work holds the css
reference. However, we still have to synchronize against pending
requests, otherwise a work handler can be executed after we destroyed
the caches corresponding to the memcg being offlined resulting in a
kmem_cache leak.

Thanks.

>
>> Signed-off-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxx>
>> Cc: Glauber Costa <glommer@xxxxxxxxx>
>> ---
>> mm/memcontrol.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 9d489a9e7701..b183aaf1b616 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2904,6 +2904,7 @@ static DEFINE_MUTEX(set_limit_mutex);
>>
>> #ifdef CONFIG_MEMCG_KMEM
>> static DEFINE_MUTEX(activate_kmem_mutex);
>> +static struct workqueue_struct *memcg_cache_create_wq;
>>
>> static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg)
>> {
>> @@ -3327,6 +3328,15 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>> int i, failed = 0;
>>
>> /*
>> + * Since the cache is being destroyed, it shouldn't be allocated from
>> + * any more, and therefore no new memcg cache creation works could be
>> + * scheduled. However, there still might be pending works scheduled
>> + * before the cache destruction was initiated. Flush them before
>> + * destroying child caches to avoid nasty races.
>> + */
>> + flush_workqueue(memcg_cache_create_wq);
>> +
>> + /*
>> * If the cache is being destroyed, we trust that there is no one else
>> * requesting objects from it. Even if there are, the sanity checks in
>> * kmem_cache_destroy should caught this ill-case.
>> @@ -3374,6 +3384,15 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>> if (!memcg_kmem_is_active(memcg))
>> return;
>>
>> + /*
>> + * By the time we get here, the cgroup must be empty. That said no new
>> + * allocations can happen from its caches, and therefore no new memcg
>> + * cache creation works can be scheduled. However, there still might be
>> + * pending works scheduled before the cgroup was turned offline. Flush
>> + * them before destroying memcg caches to avoid nasty races.
>> + */
>> + flush_workqueue(memcg_cache_create_wq);
>> +
>> mutex_lock(&memcg->slab_caches_mutex);
>> list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
>> cachep = memcg_params_to_cache(params);
>> @@ -3418,7 +3437,7 @@ static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>> cw->cachep = cachep;
>>
>> INIT_WORK(&cw->work, memcg_create_cache_work_func);
>> - schedule_work(&cw->work);
>> + queue_work(memcg_cache_create_wq, &cw->work);
>> }
>>
>> static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>> @@ -3621,10 +3640,20 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order)
>> VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
>> memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
>> }
>> +
>> +static void __init memcg_kmem_init(void)
>> +{
>> + memcg_cache_create_wq = alloc_workqueue("memcg_cache_create", 0, 1);
>> + BUG_ON(!memcg_cache_create_wq);
>> +}
>> #else
>> static inline void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>> {
>> }
>> +
>> +static void __init memcg_kmem_init(void)
>> +{
>> +}
>> #endif /* CONFIG_MEMCG_KMEM */
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> @@ -7181,6 +7210,7 @@ static int __init mem_cgroup_init(void)
>> enable_swap_cgroup();
>> mem_cgroup_soft_limit_tree_init();
>> memcg_stock_init();
>> + memcg_kmem_init();
>> return 0;
>> }
>> subsys_initcall(mem_cgroup_init);
>> --
>> 1.7.10.4
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/