Re: [PATCH v2 1/5] 9p: Fix refcounting during full path walks for fid lookups

From: Christian Schoenebeck
Date: Wed Jun 01 2022 - 10:29:31 EST


On Tuesday, 31 May 2022 16:28:29 CEST Tyler Hicks wrote:
> On 2022-05-30 19:14:43, Christian Schoenebeck wrote:
> > On Friday, 27 May 2022 01:59:59 CEST Tyler Hicks wrote:
> > > Decrement the refcount of the parent dentry's fid after walking
> > > each path component during a full path walk for a lookup. Failure to do
> > > so can lead to fids that are not clunked until the filesystem is
> > > unmounted, as indicated by this warning:
> > >
> > >  9pnet: found fid 3 not clunked
> >
> > That explains why I saw so many fids not being clunked with recent Linux
> > kernel versions while doing some 9p protocol debugging with QEMU recently.
>
> In addition to this refcounting bug, there's another one that I noticed
> while running fstests. My series does not fix it and I haven't had a
> chance to look into it more. The generic/531 test triggers it.
>
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/531
> > > The improper refcounting after walking resulted in open(2) returning
> > > -EIO on any directories underneath the mount point when using the virtio
> > > transport. When using the fd transport, there's no apparent issue until
> > > the filesystem is unmounted and the warning above is emitted to the logs.
> >
> > Actually I never saw that open() = -EIO error. Do you have a reproducer?
>
> The reproducer that I have is binary only (fairly large and runs a bunch
> of different tests) and is used to regression test the Windows Subsystem
> for Linux 2 (WSL2) host <-> guest filesystem sharing. Now that I think
> about it, I'm not sure if the open() = -EIO error happens with other 9p
> servers.
>
> I can try to tease out the exact sequence of filesystem operations from
> this test binary but it might take me a bit. It looks like it has to do
> with switching UIDs, which could make sense because different users may
> not be connected to the filesystem yet (the conditional block that does
> p9_client_attach() and v9fs_fid_add()).
>
> > > In some cases, the user may not yet be attached to the filesystem and a
> > > new root fid, associated with the user, is created and attached to the
> > > root dentry before the full path walk is performed. Increment the new
> > > root fid's refcount to two in that situation so that it can be safely
> > > decremented to one after it is used for the walk operation. The new fid
> > > will still be attached to the root dentry when
> > > v9fs_fid_lookup_with_uid() returns so a final refcount of one is
> > > correct/expected.
> > >
> > > Fixes: 6636b6dcc3db ("9p: add refcount to p9_fid struct")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Tyler Hicks <tyhicks@xxxxxxxxxxxxxxxxxxx>
> > > ---
> > >
> > > fs/9p/fid.c | 17 +++++------------
> > > 1 file changed, 5 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/fs/9p/fid.c b/fs/9p/fid.c
> > > index 79df61fe0e59..5a469b79c1ee 100644
> > > --- a/fs/9p/fid.c
> > > +++ b/fs/9p/fid.c
> > > @@ -152,7 +152,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
> > >  	const unsigned char **wnames, *uname;
> > >  	int i, n, l, clone, access;
> > >  	struct v9fs_session_info *v9ses;
> > > -	struct p9_fid *fid, *old_fid = NULL;
> > > +	struct p9_fid *fid, *old_fid;
> > >
> > >  	v9ses = v9fs_dentry2v9ses(dentry);
> > >  	access = v9ses->flags & V9FS_ACCESS_MASK;
> > > @@ -194,13 +194,12 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
> > >  		if (IS_ERR(fid))
> > >  			return fid;
> > >
> > > +		refcount_inc(&fid->count);
> > >  		v9fs_fid_add(dentry->d_sb->s_root, fid);
> > >  	}
> > >  	/* If we are root ourself just return that */
> > > -	if (dentry->d_sb->s_root == dentry) {
> > > -		refcount_inc(&fid->count);
> > > +	if (dentry->d_sb->s_root == dentry)
> > >  		return fid;
> > > -	}
> >
> > Hmm, wouldn't it then be possible that the root fid is returned with
> > refcount being 2 here?
>
> Yes and I think that's correct. One refcount taken for adding the root
> fid to the root dentry and another refcount taken for the original
> purpose of the lookup.
>
> Reverting this portion of the change and re-testing with the reproducer
> triggers a refcount underflow.

Right, I still have some knowledge gaps in the kernel's 9p code base. I was
rather confused about p9_client_clunk(), which I just realized does the
refcount decrement itself and only sends out the 'Tclunk' message once the
refcount has dropped to zero.
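
As a rough illustration of that behaviour (a userspace sketch only, not the
kernel implementation; struct model_fid and the model_* helpers are made-up
names, and the starting count of one for a freshly attached fid is an
assumption taken from the commit message quoted above):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct p9_fid: only the refcount is modelled. */
struct model_fid {
	int count;	/* plays the role of fid->count */
	bool clunked;	/* set once a Tclunk would have been sent */
};

/* Models refcount_inc(&fid->count). */
static void model_get(struct model_fid *fid)
{
	fid->count++;
}

/*
 * Models the behaviour described for p9_client_clunk(): drop one
 * reference and only "send Tclunk" when the count reaches zero.
 * Returns true if the Tclunk would have gone out.
 */
static bool model_clunk(struct model_fid *fid)
{
	assert(fid->count > 0);	/* dropping below zero would be the underflow */
	if (--fid->count > 0)
		return false;
	fid->clunked = true;
	return true;
}

/*
 * Root-dentry case of the lookup: the new root fid starts at one
 * (assumption), the patch bumps it to two, one reference stays with the
 * root dentry and the other is handed to the caller.
 */
static struct model_fid model_lookup_root(void)
{
	struct model_fid fid = { .count = 1, .clunked = false };

	model_get(&fid);	/* the refcount_inc() added by the patch */
	return fid;
}
```

In this model the fid returned for the root dentry carries two references.
The caller's model_clunk() drops it to one without any Tclunk, and only a
second drop (the dentry's own reference going away) would send one.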

So yes, it looks fine to me:

Reviewed-by: Christian Schoenebeck <linux_oss@xxxxxxxxxxxxx>

> > >  	/*
> > >  	 * Do a multipath walk with attached root.
> > >  	 * When walking parent we need to make sure we
> > > @@ -212,6 +211,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
> > >  		fid = ERR_PTR(n);
> > >  		goto err_out;
> > >  	}
> > > +	old_fid = fid;
> > >  	clone = 1;
> > >  	i = 0;
> > >  	while (i < n) {
> > > @@ -221,15 +221,8 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
> > >  		 * walk to ensure none of the patch component change
> > >  		 */
> > >  		fid = p9_client_walk(fid, l, &wnames[i], clone);
> > > +		p9_client_clunk(old_fid);
> > >  		if (IS_ERR(fid)) {
> > > -			if (old_fid) {
> > > -				/*
> > > -				 * If we fail, clunk fid which are mapping
> > > -				 * to path component and not the last component
> > > -				 * of the path.
> > > -				 */
> > > -				p9_client_clunk(old_fid);
> > > -			}
> > >  			kfree(wnames);
> > >  			goto err_out;
> > >  		}
> >
> > So this is the actual fix mentioned in the commit log. Makes sense.
>
> I think the refcount_inc() change for the root fid is an important and
> required part of the fix.
>
> > Nitpicking: Wouldn't it be a bit cleaner to set old_fid solely within the
> > while loop and just before overwriting fid? And as we have now bumped to
> > -std=gnu11, perhaps make old_fid a local variable within loop scope only?
>
> You're right that it would be cleaner for the purposes of this single
> patch. In a followup patch in this series, I start tracking the root fid
> with a root_fid variable and that requires "old_fid = root_fid" before
> we enter the loop and then "old_fid = fid" inside of the loop.

s/while/for/ would do the trick I guess. Not a big deal though.
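
One possible shape for that, as a userspace sketch rather than a patch (the
model_* names and MODEL_* constants are hypothetical, and model_walk() is a
deliberately simplified stand-in that always returns a fresh fid and takes
fewer arguments than the real p9_client_walk()):

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAXWELEM 16	/* stand-in for P9_MAXWELEM */
#define MODEL_MAXFIDS  8	/* size of the toy fid pool */

/* Hypothetical stand-in for struct p9_fid: only the refcount is modelled. */
struct model_fid {
	int count;
};

static struct model_fid model_pool[MODEL_MAXFIDS];
static size_t model_used;

/* Simplified walk: hands back a new fid with one reference, or NULL. */
static struct model_fid *model_walk(struct model_fid *from)
{
	struct model_fid *fid;

	(void)from;	/* the starting fid is not needed in this toy model */
	if (model_used == MODEL_MAXFIDS)
		return NULL;
	fid = &model_pool[model_used++];
	fid->count = 1;
	return fid;
}

/* Stand-in for p9_client_clunk(): drop one reference. */
static void model_clunk(struct model_fid *fid)
{
	assert(fid->count > 0);
	fid->count--;
}

/*
 * Walk n path components in chunks and drop the reference on the
 * previous fid after each step. old_fid is initialised and advanced in
 * the for-loop header instead of in the loop body.
 */
static struct model_fid *model_full_walk(struct model_fid *root_fid, int n)
{
	struct model_fid *fid = root_fid;
	struct model_fid *old_fid;
	int i, l;

	for (i = 0, old_fid = fid; i < n; i += l, old_fid = fid) {
		l = (n - i < MODEL_MAXWELEM) ? n - i : MODEL_MAXWELEM;
		fid = model_walk(old_fid);
		model_clunk(old_fid);
		if (!fid)
			return NULL;
	}
	return fid;
}
```

With a root fid that starts at two references and a 20-component path, this
model takes two iterations: the root fid ends at one reference, the
intermediate fid at zero, and the returned fid holds one.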

Best regards,
Christian Schoenebeck