Re: [PATCH v2 1/5] 9p: Fix refcounting during full path walks for fid lookups
From: Tyler Hicks
Date: Mon Jun 06 2022 - 23:41:37 EST
On 2022-06-01 16:28:49, Christian Schoenebeck wrote:
> On Dienstag, 31. Mai 2022 16:28:29 CEST Tyler Hicks wrote:
> > On 2022-05-30 19:14:43, Christian Schoenebeck wrote:
> > > On Freitag, 27. Mai 2022 01:59:59 CEST Tyler Hicks wrote:
> > > > Decrement the refcount of the parent dentry's fid after walking
> > > > each path component during a full path walk for a lookup. Failure to do
> > > > so can lead to fids that are not clunked until the filesystem is
> > > >
> > > > unmounted, as indicated by this warning:
> > > > 9pnet: found fid 3 not clunked
> > >
> > > That explains why I saw so many fids not being clunked with recent Linux
> > > kernel versions while doing some 9p protocol debugging with QEMU recently.
> >
> > In addition to this refcounting bug, there's another one that I noticed
> > while running fstests. My series does not fix it and I haven't had a
> > chance to look into it more. The generic/531 test triggers it.
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/5
> > 31
> > > > The improper refcounting after walking resulted in open(2) returning
> > > > -EIO on any directories underneath the mount point when using the virtio
> > > > transport. When using the fd transport, there's no apparent issue until
> > > > the filesytem is unmounted and the warning above is emitted to the logs.
> > >
> > > Actually I never saw that open() = -EIO error. Do you have a reproducer?
> >
> > The reproducer that I have is binary only (fairly large and runs a bunch
> > of different tests) and is used to regression test the Windows Subsystem
> > for Linux 2 (WSL2) host <-> guest filesystem sharing. Now that I think
> > about it, I'm not sure if the open() = -EIO error happens with other 9p
> > servers.
This -EIO error looks to be specific to the WSL 9p server. I was unable
to reproduce it with QEMU's 9p server. I just see unclunked fids with
QEMU.
> >
> > I can try to tease out the exact sequence of filesystem operations from
> > this test binary but it might take me a bit. It looks like it has to do
> > with switching UIDs, which could make sense because different users may
> > not be connected to the filesystem yet (the conditional block that does
> > p9_client_attach() and v9fs_fid_add()).
I didn't have much luck here. This issue only reproduces after a
sequence of somewhat unrelated tests running in succession. They each
contain a lot of unnecessary filesystem operations but they each contain
some setuid() calls which makes some sense considering the refcounting
change proposed in this patch.
9p maintainers, is there anything else that I can help with to get this
bug fix reviewed/merged? Thanks!
Tyler