Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

From: Al Viro
Date: Tue Apr 30 2019 - 00:00:53 EST


On Mon, Apr 29, 2019 at 08:37:29PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019, 20:09 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> >
> > ... except that this callback can (and always could) get executed after
> > freeing struct super_block.
> >
>
> Ugh.
>
> That food looks nasty. Shouldn't the super block freeing wait for the
> filesystem to be all done instead? Do a rcu synchronization or something?
>
> Adding that pointer looks really wrong to me. I'd much rather delay the sb
> freeing. Is there some reason that can't be done that I'm missing?

Where would you put that synchronize_rcu()? Doing that before ->put_super()
is too early - inode references might be dropped in there. OTOH, doing
that after that point means that while struct super_block itself will be
there, any number of data structures hanging from it might not be.

So we are still very limited in what we can do inside a ->free_inode()
instance *and* we get a bunch of synchronize_rcu() calls for no good reason.

Note that for normal lockless accesses (lockless ->d_revalidate(), ->d_hash(),
etc.) we are just fine with having struct super_block freeing RCU-delayed
(along with any data structures we might need) - the superblock had
been seen at some point after we'd taken rcu_read_lock(), so its
freeing won't happen until we do rcu_read_unlock(). So we don't need
synchronize_rcu() for that.

Here the problem is that we are dealing with another RCU callback;
synchronize_rcu() would be needed for it, but it will only protect that
intermediate dereference of ->i_sb; any rcu-delayed stuff scheduled
from inside ->put_super() would not be ordered wrt ->free_inode().
And if we are doing that just for the sake of that one dereference,
we might as well do it before scheduling i_callback().
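
To make the ordering problem concrete, here is a toy userspace sketch,
not kernel code. It assumes the "pointer" objected to above is a copy of
the filesystem's free method stored in the inode before the callback is
queued, so the callback never has to follow ->i_sb. All toy_* names are
hypothetical, and a plain linked list stands in for call_rcu() and the
grace period.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_inode;

struct toy_sb {
	/* stands in for the free method reached through the superblock */
	void (*free_inode)(struct toy_inode *);
};

struct toy_inode {
	struct toy_sb *i_sb;
	void (*free_inode)(struct toy_inode *);	/* copy taken while i_sb is valid */
	struct toy_inode *next;			/* link in the deferred queue */
};

static struct toy_inode *deferred;	/* callbacks waiting for the "grace period" */
static int freed_inodes;

static void toy_fs_free_inode(struct toy_inode *inode)
{
	free(inode);
	freed_inodes++;
}

/* Models scheduling the callback: snapshot the method, then defer the free. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	inode->free_inode = inode->i_sb->free_inode;
	inode->next = deferred;
	deferred = inode;
}

/* Models the grace period ending: run callbacks without touching ->i_sb. */
static void toy_run_deferred(void)
{
	while (deferred) {
		struct toy_inode *inode = deferred;

		deferred = inode->next;
		inode->free_inode(inode);
	}
}

/* Frees the superblock first, then runs the callback; returns inodes freed. */
int toy_demo(void)
{
	struct toy_sb *sb = malloc(sizeof(*sb));
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!sb || !inode) {
		free(sb);
		free(inode);
		return -1;
	}
	freed_inodes = 0;
	sb->free_inode = toy_fs_free_inode;
	inode->i_sb = sb;
	inode->free_inode = NULL;
	inode->next = NULL;

	toy_destroy_inode(inode);
	free(sb);		/* the superblock goes away before the callback runs */
	toy_run_deferred();	/* safe only because ->i_sb is never dereferenced */
	return freed_inodes;
}
```

Calling toy_demo() frees the superblock before the deferred callback
runs. Had toy_run_deferred() fetched the method through inode->i_sb
instead of the copy, it would be reading freed memory, which is the
failure mode described above.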

PS: we *are* guaranteed that the module will still be there
(unregister_filesystem() does synchronize_rcu(), and rcu_barrier() is done
before kmem_cache_destroy() in assorted exit_foo_fs()).