RE: [PATCH] NFSv4: open() should try lease recovery on NFS4ERR_EXPIRED

From: Robert Milkowski
Date: Mon Dec 16 2019 - 13:36:45 EST


Hi,


Currently if nfsv4 lease expires on a server Linux client will return EIO to
open and won't try to recover.

For example:

$ date; strace -t -e trace=open head /mnt/3/var/log/vmware-vmsvc.log
>/dev/null
Mon 16 Dec 17:37:16 GMT 2019
...
17:37:16 open("/mnt/3/var/log/vmware-vmsvc.log", O_RDONLY) = -1 EIO
(Input/output error)
...


The network/nfs traffic:

2638 17:37:16.286642918 10.50.2.57 -> 10.50.2.59 NFS 246 V4 Call ACCESS
FH: 0x62d40c52, [Check: RD LU MD XT DL]
2639 17:37:16.286929287 10.50.2.59 -> 10.50.2.57 NFS 194 V4 Reply (Call In
2638) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
2640 17:37:16.287295942 10.50.2.57 -> 10.50.2.59 NFS 258 V4 Call ACCESS
FH: 0xeddb7439, [Check: RD LU MD XT DL]
2641 17:37:16.287448815 10.50.2.59 -> 10.50.2.57 NFS 194 V4 Reply (Call In
2640) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
2642 17:37:16.287666576 10.50.2.57 -> 10.50.2.59 NFS 266 V4 Call ACCESS
FH: 0x8503e2cd, [Check: RD LU MD XT DL]
2643 17:37:16.287786851 10.50.2.59 -> 10.50.2.57 NFS 194 V4 Reply (Call In
2642) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
2644 17:37:16.287984808 10.50.2.57 -> 10.50.2.59 NFS 350 V4 Call OPEN DH:
0x8503e2cd/vmware-vmsvc.log
2645 17:37:16.288194276 10.50.2.59 -> 10.50.2.57 NFS 122 V4 Reply (Call In
2644) OPEN Status: NFS4ERR_EXPIRED


Both Linux and Solaris NFS servers return NFS4ERR_EXPIRED if nfsv4 lease
expire.
Recent ONTAP versions return NFS4ERR_STALE_CLIENTID which is handled
correctly.

The patch changes handling of NFS4ERR_EXPIRED error to also try to recover.

The issue can be triggered by exporting over NFSv4 a filesystem with a
sub-filesystem,
for example by exporting / and /var (assuming /var is a separate
filesystem).

How to reproduce the issue:

On an NFS server run:

# cat /etc/exports
/ *(rw,sync)
/var *(rw,sync)


On a Linux NFS client:

# mount -o vers=4 10.50.2.59:/ /mnt/3

$ date; strace -t -e trace=open head /mnt/3/var/log/vmware-vmsvc.log
>/dev/null
Mon 16 Dec 17:28:45 GMT 2019
...
17:28:45 open("/mnt/3/var/log/vmware-vmsvc.log", O_RDONLY) = 3
...


Now, on the NFS client run:

# while [ 1 ]; do date; umount /mnt/3/var; ls /mnt/3/var >/dev/null; sleep
10; done
...

Because of another bug (see email with subject: [PATCH] NFSv4:
nfs4_do_fsinfo() should not do implicit lease renewals)
the above while loop will prevent the NFS client from sending RENEW
operations to server as it currently assumes an implicit
lease renewal which I believe is not compliant with the RFC. If you now wait
long enough for the NFS server to expire the lease,
and then try to open a file (with no other NFS activity) it should result in
server sending NFS4ERR_EXPIRED and the NFS LINUX client
will return EIO to an application as shown above.


This also gets triggered if a sub-mount in automatically unmounted after
nfs_mountpoint_expiry_timeout, then shortly after there is some
access to it which will trigger for it to be automatically mounted again
which will delay the client in sending the RENEW operation.
This results in a short window during which open() will return EIO to any
files from the file server.

This was tested with 5.5.0-rc2 and the provided patch is applied on top of
the 5.5.0-rc2 as well.


Best regards,
Robert Milkowski