Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
From: Uladzislau Rezki
Date: Tue Oct 08 2024 - 09:07:53 EST
On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> >
> > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@xxxxxxxxx> wrote:
> > > >
> > > > > Ben Greear reports following splat:
> > > > > ------------[ cut here ]------------
> > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > ...
> > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > codetag_unload_module+0x19b/0x2a0
> > > > > ? codetag_load_module+0x80/0x80
> > > > >
> > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > >
> > > > > Waiting for outstanding kfree_rcu operations to complete before checking
> > > > > resolves this warning.
> > > > >
> > > > > Reproducer:
> > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > rmmod nft_chain_nat
> > > > > rmmod nf_nat # will WARN.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/lib/codetag.c
> > > > > +++ b/lib/codetag.c
> > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > if (!mod)
> > > > > return true;
> > > > >
> > > > > + kvfree_rcu_barrier();
> > > > > +
> > > > > mutex_lock(&codetag_lock);
> > > > > list_for_each_entry(cttype, &codetag_types, link) {
> > > > > struct codetag_module *found = NULL;
> > > >
> > > > It's always hard to determine why a thing like this is present, so a
> > > > comment is helpful:
> > > >
> > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > +++ a/lib/codetag.c
> > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > if (!mod)
> > > > return true;
> > > >
> > > > + /* await any module's kfree_rcu() operations to complete */
> > > > kvfree_rcu_barrier();
> > > >
> > > > mutex_lock(&codetag_lock);
> > > > _
> > > >
> > > > But I do wonder whether this is in the correct place.
> > > >
> > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > should properly be done by the core module handling code?
> > >
> > > I don't think the core module code cares about kfree_rcu()s being complete
> > > before the module is unloaded.
> > > Allocation tagging OTOH does care, because it is about to destroy tags
> > > which will be accessed when the kfree() actually happens, therefore
> > > strict ordering is important.
> > >
> > > >
> > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > module memory itself and I think codetag_unload_module() could be
> > > > called after that?
> > >
> > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > inside free_module() but according to the reply in
> > > https://lore.kernel.org/all/20241007112904.GA27104@xxxxxxxxxxxxx/
> > > synchronize_rcu() does not help. I'm not quite sure why...
> > >
> > That is because synchronize_rcu() is used for something a bit different,
> > i.e. it only waits for a GP to complete, whereas offloading the objects
> > for freeing can span several GPs.
>
> Ah, thanks! Now that I looked into the patch that recently added
> kvfree_rcu_barrier() I understand that a bit better.
>
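
Right. To illustrate the difference, a rough sketch (simplified; the struct
and function names below are just for the example, not real code from the
tree):

<snip>
#include <linux/slab.h>
#include <linux/rcupdate.h>

struct obj {
	struct rcu_head rcu;
};

static void demo(struct obj *o)
{
	kfree_rcu(o, rcu);	/* object is queued; the actual kfree() may
				 * be batched and happen several GPs later */

	synchronize_rcu();	/* waits for one grace period only, the
				 * queued kfree() can still be pending here */

	kvfree_rcu_barrier();	/* waits until all objects queued via
				 * kvfree_rcu()/kfree_rcu() have really
				 * been freed */
}
<snip>

That is why the barrier is needed before the alloc_tag leak check, and why
a plain synchronize_rcu() in free_module() does not help.
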
> >
> > > Note that once I'm done upstreaming
> > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@xxxxxxxxxx/,
> > > this change will not be needed and I'm planning to remove this call;
> > > however, this change is useful for backporting. It should be sent to
> > > stable@xxxxxxxxxxxxxxx # v6.10+
> > >
> > The kvfree_rcu_barrier() has been added into v6.12:
> >
> > <snip>
> > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > next-20240912
> > next-20240919
> > next-20240920
> > next-20241002
> > v6.12-rc1
> > urezki@pc638:~/data/raid0/coding/linux.git$
> > <snip>
> >
> > For 6.10+, this implies that the mentioned commit should also be backported.
>
> I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> kvfree_rcu_barrier()?
>
For kernels < 6.12, unfortunately not :) If you have the possibility to
switch the reclaim over to the call_rcu() API, then you can go with
rcu_barrier(), which waits for all callbacks to complete.
Or backport kvfree_rcu_barrier(). It __should__ be easy, since there
were not many changes to kvfree_rcu() between 6.10 and 6.12-rcX.
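
If it helps, a rough sketch of the call_rcu() based variant (the struct
and function names below are made up for illustration, they are not the
real nf_nat code):

<snip>
#include <linux/slab.h>
#include <linux/rcupdate.h>

struct nat_entry {
	struct rcu_head rcu;
	/* ... payload ... */
};

static void nat_entry_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct nat_entry, rcu));
}

static void nat_entry_release(struct nat_entry *e)
{
	/* instead of kfree_rcu(e, rcu) */
	call_rcu(&e->rcu, nat_entry_free_rcu);
}

static void nat_flush_pending_frees(void)
{
	/*
	 * Waits until nat_entry_free_rcu() has run for every object
	 * queued via call_rcu(), so the module's alloc_tags are no
	 * longer referenced afterwards.
	 */
	rcu_barrier();
}
<snip>

The module's ->exit() would then call nat_flush_pending_frees() before
returning, so the alloc_tag check sees everything already freed.
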
--
Uladzislau Rezki