Re: [PATCH v6 5/7] kernfs: use i_lock to protect concurrent inode updates
From: Ian Kent
Date: Sun Jun 13 2021 - 21:32:37 EST
On Sat, 2021-06-12 at 01:45 +0000, Al Viro wrote:
> On Wed, Jun 09, 2021 at 04:51:22PM +0800, Ian Kent wrote:
> > The inode operations .permission() and .getattr() use the kernfs
> > node
> > write lock but all that's needed is to keep the rb tree stable
> > while
> > updating the inode attributes as well as protecting the update
> > itself
> > against concurrent changes.
>
> Huh? Where does it access the rbtree at all? Confused...
>
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index 3b01e9e61f14e..6728ecd81eb37 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -172,6 +172,7 @@ static void kernfs_refresh_inode(struct
> > kernfs_node *kn, struct inode *inode)
> > {
> > struct kernfs_iattrs *attrs = kn->iattr;
> >
> > + spin_lock(&inode->i_lock);
> > inode->i_mode = kn->mode;
> > if (attrs)
> > /*
> > @@ -182,6 +183,7 @@ static void kernfs_refresh_inode(struct
> > kernfs_node *kn, struct inode *inode)
> >
> > if (kernfs_type(kn) == KERNFS_DIR)
> > set_nlink(inode, kn->dir.subdirs + 2);
> > + spin_unlock(&inode->i_lock);
> > }
>
> Even more so - just what are you serializing here? That code
> synchronizes inode
> metadata with those in kernfs_node. Suppose you've got two threads
> doing
> ->permission(); the first one gets through kernfs_refresh_inode() and
> goes into
> generic_permission(). No locks are held, so kernfs_refresh_inode()
> from another
> thread can run in parallel with generic_permission().
>
> If that's not a problem, why two kernfs_refresh_inode() done in
> parallel would
> be a problem?
>
> Thread 1:
> permission
> done refresh, all locks released now
> Thread 2:
> change metadata in kernfs_node
> Thread 2:
> permission
> goes into refresh, copying metadata into inode
> Thread 1:
> generic_permission()
> No locks in common between the last two operations, so
> we generic_permission() might see partially updated metadata.
> Either we don't give a fuck (in which case I don't understand
> what purpose does that ->i_lock serve) *or* we need the exclusion
> to cover a wider area.
This didn't occur to me, obviously.
It seems to me this can happen with the original code too although
using a mutex might reduce the likelihood of it happening.
Still ->permission() is meant to be a read-only function so the VFS
shouldn't need to care about it.
Do you have any suggestions on how to handle this.
Perhaps the only way is to ensure the inode is updated only in
functions that are expected to do this.
Ian