Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

From: Suren Baghdasaryan
Date: Tue Oct 15 2024 - 12:23:20 EST


On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>
> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > Ben Greear reports following splat:
> > > > > > > > ------------[ cut here ]------------
> > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > > > ...
> > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > > codetag_unload_module+0x19b/0x2a0
> > > > > > > > ? codetag_load_module+0x80/0x80
> > > > > > > >
> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > > > >
> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > > > resolves this warning.
> > > > > > > >
> > > > > > > > Reproducer:
> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > > > rmmod nft_chain_nat
> > > > > > > > rmmod nf_nat # will WARN.
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > --- a/lib/codetag.c
> > > > > > > > +++ b/lib/codetag.c
> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > > > > if (!mod)
> > > > > > > > return true;
> > > > > > > >
> > > > > > > > + kvfree_rcu_barrier();
> > > > > > > > +
> > > > > > > > mutex_lock(&codetag_lock);
> > > > > > > > list_for_each_entry(cttype, &codetag_types, link) {
> > > > > > > > struct codetag_module *found = NULL;
> > > > > > >
> > > > > > > It's always hard to determine why a thing like this is present, so a
> > > > > > > comment is helpful:
> > > > > > >
> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > > > +++ a/lib/codetag.c
> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > > > > if (!mod)
> > > > > > > return true;
> > > > > > >
> > > > > > > + /* await any module's kfree_rcu() operations to complete */
> > > > > > > kvfree_rcu_barrier();
> > > > > > >
> > > > > > > mutex_lock(&codetag_lock);
> > > > > > > _
> > > > > > >
> > > > > > > But I do wonder whether this is in the correct place.
> > > > > > >
> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > > > should properly be done by the core module handling code?
> > > > > >
> > > > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > > > before the module is unloaded.
> > > > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > > > which will be accessed when kfree() actually happens, therefore a
> > > > > > strict ordering is important.
> > > > > >
> > > > > > >
> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > > > module memory itself and I think codetag_unload_module() could be
> > > > > > > called after that?
> > > > > >
> > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > > > inside free_module() but according to the reply in
> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@xxxxxxxxxxxxx/
> > > > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > > > >
> > > > > It is because, synchronize_rcu() is used for a bit different things,
> > > > > i.e. it is about a GP completion. Offloading objects can span several
> > > > > GPs.
> > > >
> > > > Ah, thanks! Now that I looked into the patch that recently added
> > > > kvfree_rcu_barrier() I understand that a bit better.
> > > >
> > > > >
> > > > > > Note that once I'm done upstreaming
> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@xxxxxxxxxx/,
> > > > > > this change will not be needed and I'm planning to remove this call,
> > > > > > however this change is useful for backporting. It should be sent to
> > > > > > stable@xxxxxxxxxxxxxxx # v6.10+
> > > > > >
> > > > > The kvfree_rcu_barrier() has been added into v6.12:
> > > > >
> > > > > <snip>
> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > > > next-20240912
> > > > > next-20240919
> > > > > next-20240920
> > > > > next-20241002
> > > > > v6.12-rc1
> > > > > urezki@pc638:~/data/raid0/coding/linux.git$
> > > > > <snip>
> > > > >
> > > > > For 6.10+, it implies that the mentioned commit should be backported also.
> > > >
> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > > > kvfree_rcu_barrier()?
> > > >
> > > For kernels < 6.12, unfortunately not :) If you have a possibility to
> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> > > which waits for all callbacks to be completed.
> >
> > We do not control the call-site inside the module, so this would not
> > be possible.
> >
> > >
> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
> >
> > That sounds better. I'll take a stab at it. Thanks!

I prepared backports for stable 6.10.y and 6.11.y branches which
contain this patch along with the prerequisite "rcu/kvfree: Add
kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
required. However, since this patch is not yet in Linus' tree, I'll
wait for it to get there before sending backports to stable (per Greg
KH's recommendation). Please let me know if you think it's more
critical and should be posted earlier.
Thanks,
Suren.

> >
> You are welcome!
>
> --
> Uladzislau Rezki
>