Re: [PATCH] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

From: Shakeel Butt
Date: Wed Jul 28 2021 - 10:10:44 EST


+Roman

On Wed, Jul 28, 2021 at 6:23 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Wed 28-07-21 17:13:48, Wang Hai wrote:
> > When I use kfree_rcu() to free a large memory allocated by
> > kmalloc_node(), the following dump occurs.
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000020
> > [...]
> > Oops: 0000 [#1] SMP
> > [...]
> > Workqueue: events kfree_rcu_work
> > RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
> > RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
> > RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
> > [...]
> > Call Trace:
> > kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
> > kfree_bulk include/linux/slab.h:413 [inline]
> > kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
> > process_one_work+0x207/0x530 kernel/workqueue.c:2276
> > worker_thread+0x320/0x610 kernel/workqueue.c:2422
> > kthread+0x13d/0x160 kernel/kthread.c:313
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> >
> > When kmalloc_node() a large memory, page is allocated, not slab,
> > so when freeing memory via kfree_rcu(), this large memory should not
> > be used by memcg_slab_free_hook(), because memcg_slab_free_hook() is
> > is used for slab.
> >
> > So in this case, there is no need to do anything with this large
> > page in memcg_slab_free_hook(), just skip it.
> >
> > Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
>
> Are you sure that this commit is really breaking the code. Unless I have
> missed something there shouldn't be any real change wrt. large
> allocations here. page_has_obj_cgroups is just a different name for what
> what page_objcgs is giving us.

Actually they are different. For MEMCG_DATA_KMEM page,
page_has_obj_cgroups() will return false while page_objcgs() on
non-VM_DEBUG kernels will return "struct obj_cgroup *" instead of
"struct obj_cgroup **".

>
> I haven't studied the kfree_rcu part but isn't the problem its use of
> kmem_cache_free_bulk or isn't the problem right there in the bulk free?
>

SLUB's kmem_cache_free_bulk() is doing two passes. In first pass
uncharges all the objects and in the second pass frees them to the
slub infra. There is nothing wrong with that. It is just that SLUB
mixes page (for higher order) and slab allocations and requires
special handling.

> > Signed-off-by: Wang Hai <wanghai38@xxxxxxxxxx>
> > ---
> > mm/slab.h | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 67e06637ff2e..247d3f9c21f7 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -339,15 +339,20 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
> > continue;
> >
> > page = virt_to_head_page(p[i]);
> > + if (!s_orig) {
> > + if (unlikely(!PageSlab(page))) {
> > + BUG_ON(!PageCompound(page));
>
> BUG_ON is not really a good idea here. Why should we crash the kernel
> just because of an unexpected page showing up. Leaking it would be more
> appropriate (the same would apply to kfree btw). I would just warn
> here.

The simplest fix would be to not call memcg_slab_free_hook() in
kmem_cache_free_bulk() because slab_free() will call
memcg_slab_free_hook() for individual objects. Not sure if there will
be any performance impact.

> Also don't we need any hookd here. Looking at kfree path it does
> call kfree_hook. Why is that not needed here?

These are called later in build_detached_freelist().