Re: [PATCH] NFS: only invalidate dentrys that are clearly invalid.

From: NeilBrown
Date: Mon Nov 16 2020 - 01:09:44 EST


On Mon, Nov 16 2020, Trond Myklebust wrote:

> On Mon, 2020-11-16 at 16:12 +1100, NeilBrown wrote:
>> On Mon, Nov 16 2020, Trond Myklebust wrote:
>>
>> > On Mon, 2020-11-16 at 16:00 +1100, NeilBrown wrote:
>> > > On Mon, Nov 16 2020, Trond Myklebust wrote:
>> > >
>> > > > On Mon, 2020-11-16 at 15:43 +1100, NeilBrown wrote:
>> > > > > On Mon, Nov 16 2020, Trond Myklebust wrote:
>> > > > >
>> > > > > > On Mon, 2020-11-16 at 13:59 +1100, NeilBrown wrote:
>> > > > > > >
>> > > > > > > Prior to commit 5ceb9d7fdaaf ("NFS: Refactor
>> > > > > > > nfs_lookup_revalidate()")
>> > > > > > > and error from nfs_lookup_verify_inode() other than -
>> > > > > > > ESTALE
>> > > > > > > would
>> > > > > > > result
>> > > > > > > in nfs_lookup_revalidate() returning that error code (-
>> > > > > > > ESTALE
>> > > > > > > is
>> > > > > > > mapped
>> > > > > > > to zero).
>> > > > > > > Since that commit, all errors result in zero being
>> > > > > > > returned.
>> > > > > > >
>> > > > > > > When nfs_lookup_revalidate() returns zero, the dentry is
>> > > > > > > invalidated
>> > > > > > > and, significantly, if the dentry is a directory that is
>> > > > > > > mounted
>> > > > > > > on,
>> > > > > > > that mountpoint is lost.
>> > > > > > >
>> > > > > > > If you:
>> > > > > > >  - mount an NFS filesystem which contains a directory
>> > > > > > >  - mount something (e.g. tmpfs) on that directory
>> > > > > > >  - use iptables (or scissors) to block traffic to the
>> > > > > > > server
>> > > > > > >  - ls -l the-mounted-on-directory
>> > > > > > >  - interrupt the 'ls -l'
>> > > > > > > you will find that the directory has been unmounted.
>> > > > > > >
>> > > > > > > This can be fixed by returning the actual error code from
>> > > > > > > nfs_lookup_verify_inode() rather then zero (except for -
>> > > > > > > ESTALE).
>> > > > > > >
>> > > > > > > Fixes: 5ceb9d7fdaaf ("NFS: Refactor
>> > > > > > > nfs_lookup_revalidate()")
>> > > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
>> > > > > > > ---
>> > > > > > >  fs/nfs/dir.c | 8 +++++---
>> > > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
>> > > > > > >
>> > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> > > > > > > index cb52db9a0cfb..d24acf556e9e 100644
>> > > > > > > --- a/fs/nfs/dir.c
>> > > > > > > +++ b/fs/nfs/dir.c
>> > > > > > > @@ -1350,7 +1350,7 @@ nfs_do_lookup_revalidate(struct
>> > > > > > > inode
>> > > > > > > *dir,
>> > > > > > > struct dentry *dentry,
>> > > > > > >                          unsigned int flags)
>> > > > > > >  {
>> > > > > > >         struct inode *inode;
>> > > > > > > -       int error;
>> > > > > > > +       int error = 0;
>> > > > > > >  
>> > > > > > >         nfs_inc_stats(dir, NFSIOS_DENTRYREVALIDATE);
>> > > > > > >         inode = d_inode(dentry);
>> > > > > > > @@ -1372,8 +1372,10 @@ nfs_do_lookup_revalidate(struct
>> > > > > > > inode
>> > > > > > > *dir,
>> > > > > > > struct dentry *dentry,
>> > > > > > >             nfs_check_verifier(dir, dentry, flags &
>> > > > > > > LOOKUP_RCU))
>> > > > > > > {
>> > > > > > >                 error = nfs_lookup_verify_inode(inode,
>> > > > > > > flags);
>> > > > > > >                 if (error) {
>> > > > > > > -                       if (error == -ESTALE)
>> > > > > > > +                       if (error == -ESTALE) {
>> > > > > > >                                 nfs_zap_caches(dir);
>> > > > > > > +                               error = 0;
>> > > > > > > +                       }
>> > > > > > >                         goto out_bad;
>> > > > > > >                 }
>> > > > > > >                 nfs_advise_use_readdirplus(dir);
>> > > > > > > @@ -1395,7 +1397,7 @@ nfs_do_lookup_revalidate(struct
>> > > > > > > inode
>> > > > > > > *dir,
>> > > > > > > struct dentry *dentry,
>> > > > > > >  out_bad:
>> > > > > > >         if (flags & LOOKUP_RCU)
>> > > > > > >                 return -ECHILD;
>> > > > > > > -       return nfs_lookup_revalidate_done(dir, dentry,
>> > > > > > > inode,
>> > > > > > > 0);
>> > > > > > > +       return nfs_lookup_revalidate_done(dir, dentry,
>> > > > > > > inode,
>> > > > > > > error);
>> > > > > >
>> > > > > > Which errors do we actually need to return here? As far as
>> > > > > > I
>> > > > > > can
>> > > > > > tell,
>> > > > > > the only errors that nfs_lookup_verify_inode() is supposed
>> > > > > > to
>> > > > > > return is
>> > > > > > ENOMEM, ESTALE, ECHILD, and possibly EIO or ETiMEDOUT.
>> > > > > >
>> > > > > > Why would it be better to return those errors rather than
>> > > > > > just
>> > > > > > a 0
>> > > > > > when
>> > > > > > we need to invalidate the inode, particularly since we
>> > > > > > already
>> > > > > > have
>> > > > > > a
>> > > > > > special case in nfs_lookup_revalidate_done() when the
>> > > > > > dentry is
>> > > > > > root?
>> > > > >
>> > > > > ERESTARTSYS is the error that easily causes problems.
>> > > > >
>> > > > > Returning 0 causes d_invalidate() to be called which is quite
>> > > > > heavy
>> > > > > handed in mountpoints.
>> > > >
>> > > > My point is that it shouldn't get returned for mountpoints. See
>> > > > nfs_lookup_revalidate_done().
>> > >
>> > > nfs_lookup_revalidate_done() only checks IS_ROOT(), and while
>> > > many
>> > > mountpoints are IS_ROOT(), not all are (--bind easily makes
>> > > others).
>> > >
>> > > But that isn't even really relevant here.  The dentry being
>> > > revalidated
>> > > is the underlying directory - that something else is mounted on.
>> > > step_into() which follows mount points is called in
>> > > walk_component()
>> > > *after* lookup_fast or lookup_slow which will have revalidated
>> > > the
>> > > dentry.
>> >
>> > So then why is it not sufficient to just add a check for
>> > d_mountpoint()? This is a revalidation, not a new lookup.
>> >
>>
>> I guess you could do that.
>> But why would you want to call d_invalidate() just because a signal
>> was
>> received, or a memory allocation failed?
>
> Why would I care about the error return from nfs_lookup_verify_inode()?

?? Why wouldn't you? What am I missing?

> This is a revalidation, and so sometimes the error returned is not
> transient, but is persistent (e.g. EIO/ETIMEDOUT if the server is
> down). In those cases, I still want to be able to do things like
> unmount the filesystem.

Why do you think that a server being down is a persistent error?
Servers come back up.

I don't have a particular opinion on how EIO and ETIMEDOUT should be
handled as I don't have a clear idea of what underlying issue produces
them.

I do know that ESTALE means the directory doesn't exist any more, so the
dentry should be invalidated, whether it is mounted on or not.
I do know that ERESTARTSYS means that a fatal signal was received by the
current process, so it is inappropriate to invalidate the dentry.

So I'm certain that we need to be able to handle different error codes
differently. Maybe EIO should be treated the same as ESTALE. Probably
ENOMEM should be handled like ERESTARTSYS....

NeilBrown


>
>>
>> NeilBrown
>>
>>
>> > >
>> > > NeilBrown
>> > >
>> > >
>> > > >
>> > > > > So it is only reasonable to return 0 when we have unambiguous
>> > > > > confirmation from the server that the object no longer
>> > > > > exists. 
>> > > > > ESTALE
>> > > > > is unambiguous. EIO might be unambiguous.  ERESTARTSYS,
>> > > > > ENOMEM,
>> > > > > ETIMEDOUT are transient and don't justify d_invalidate()
>> > > > > being
>> > > > > called.
>> > > > >
>> > > > > (BTW, Commit cc89684c9a26 ("NFS: only invalidate dentrys that
>> > > > > are
>> > > > > clearly invalid.")
>> > > > >  fixed much the same bug 3 years ago).
>> > > > >  
>> > > > > Thanks,
>> > > > > NeilBrown
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > >  }
>> > > > > > >  
>> > > > > > >  static int
>> > > > > >
>> > > > > > --
>> > > > > > Trond Myklebust
>> > > > > > Linux NFS client maintainer, Hammerspace
>> > > > > > trond.myklebust@xxxxxxxxxxxxxxx
>> > > >
>> > > > --
>> > > > Trond Myklebust
>> > > > CTO, Hammerspace Inc
>> > > > 4984 El Camino Real, Suite 208
>> > > > Los Altos, CA 94022
>> > > > ​
>> > > > www.hammer.space
>> >
>> > --
>> > Trond Myklebust
>> > Linux NFS client maintainer, Hammerspace
>> > trond.myklebust@xxxxxxxxxxxxxxx
>
> --
> Trond Myklebust
> CTO, Hammerspace Inc
> 4984 El Camino Real, Suite 208
> Los Altos, CA 94022
>
> www.hammer.space

Attachment: signature.asc
Description: PGP signature