Re: regression, bisected: getcwd() ENOENT on NFS4...

From: Daniel J Blueman
Date: Thu Nov 05 2009 - 19:41:39 EST


On Thu, Nov 5, 2009 at 5:45 PM, Trond Myklebust
<trond.myklebust@xxxxxxxxxx> wrote:
> On Wed, 2009-11-04 at 09:36 +0000, Daniel J Blueman wrote:
>> On Sun, Nov 1, 2009 at 12:47 PM, Daniel J Blueman
>> <daniel.blueman@xxxxxxxxx> wrote:
>> > Hi Trond,
>> >
>> > On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
>> > <Trond.Myklebust@xxxxxxxxxx> wrote:
>> >> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
>> >>> Since 2.6.30-rc, I've been experiencing various issues relating to
>> >>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
>> >>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
>> >>> server) to bisect [2].
>> >>>
>> >>> The impact of this regression is moderate (side-effects range from
>> >>> benign to failure), so we should get a fix into 2.6.32 if at all
>> >>> possible and strongly consider a 2.6.31 stable update.
>> >>>
>> >>> Thanks,
>> >>>   Daniel
>> >>>
>> >>> --- [1]
>> >>>
>> >>> $ apt-get source apt
>> >>> $ cd apt-*
>> >>> $ ./configure && make
>> >>> [snip]
>> >>> sh: getcwd() failed: No such file or directory
>> >>>
>> >>> --- [2]
>> >>>
>> >>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
>> >>> commit a65318bf3afc93ce49227e849d213799b072c5fd
>> >>> Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>> >>> Date:   Wed Mar 11 14:10:28 2009 -0400
>> >>>
>> >>>     NFSv4: Simplify some cache consistency post-op GETATTRs
>> >>
>> >> I'm having a lot of trouble seeing how this patch could result in
>> >> ENOENT. All it should be doing is reducing the frequency with which we
>> >> update some of the inode metadata.
>> >>
>> >> Have you ever been able to capture one of these errors using strace?
>> >
>> > Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
>> > on server) corrects the behaviour. It's readily reproducible [1];
>> > using 2.6.30, the issue is not seen, thus is a regression.
>> >
>> > To observe the change to user-level behaviour (after the reproducer commands):
>> > # make clean
>> > # strace -ffe getcwd make -n >list
>> > [pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
>> > make: getcwd: No such file or directory
>> >
>> > Would this help for me to log this via a bugzilla.kernel.org ticket?
>> >
>> > Thanks,
>> >  Daniel
>> >
>> > --- [1]
>> >
>> > booting eg:
>> > http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso
>> >
>> > $ sudo bash
>> > # apt-get install build-essential
>> > # apt-get build-dep apt
>> > # mount server:/ /mnt -tnfs4 && cd /mnt
>> > # apt-get source apt
>> > # cd apt-0.7.23.1ubuntu2
>> > # ./configure && make
>> >  -> "getcwd: No such file or directory" messages observed with cited
>> > patch and not without
>>
>> For continuity with the mailing list thread, I've created a bug report
>> of this at:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14541
>
> I just committed the following patch into the above bugzilla entry. I
> hope it suffices to fix the bug.
>
> Cheers
>  Trond
> -------------------------------------------------------------------
> NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
> From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>
> Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
> cache consistency post-op GETATTRs) incorrectly changed the getattr
> bitmap for readdir().
> This causes the readdir() function to fail to return a
> fileid/inode number, which again exposed a bug in the NFS readdir code that
> causes spurious ENOENT errors to appear in applications (see
> http://bugzilla.kernel.org/show_bug.cgi?id=14541).
>
> The immediate band aid is to revert the incorrect bitmap change, but more
> long term, we should change the NFS readdir code to cope with the
> fact that NFSv4 servers are not required to support fileids/inode numbers.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> ---
>
>  fs/nfs/nfs4proc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index ff37454..741a562 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2767,7 +2767,7 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
>                .pages = &page,
>                .pgbase = 0,
>                .count = count,
> -               .bitmask = NFS_SERVER(dentry->d_inode)->cache_consistency_bitmask,
> +               .bitmask = NFS_SERVER(dentry->d_inode)->attr_bitmask,
>        };
>        struct nfs4_readdir_res res;
>        struct rpc_message msg = {
>

This fixes the behaviour and passes some heavy testing with two good
test-cases, with 2.6.32-rc6. As well, this would be good value for the
stable stream. I've sync'd the bugzilla report.

Thanks for your work on this, Trond!

Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/