RE: [PATCH] NFSv4: nfs4_do_fsinfo() should not do implicit lease renewals
From: Robert Milkowski
Date: Mon Dec 16 2019 - 13:36:30 EST
Hi,
If a sub-filesystem (nfsv4 mirror mount) gets unmounted and then mounted
again (by accessing it) the nfs4_do_fsinfo() function is called,
which currently assumes implicit lease renewal. I believe this is no
compliant with the RFC.
I've managed to trigger the issue by two different methods:
1) in prod
If there is an NFSv4 filesystem mounted with sub-mounts (similar setup as
below), after nfs_mountpoint_expiry_timeout of inactivity,
Each submount will be unmounted. If now it gets accessed again it will be
automatically mounted again which will result currently
in implicit lease renewal on the client side which in turn can result in a
relatively small window where the client thinks its lease
is still valid while an nfs server has already expired the lease.
2) manual unmount
# cat /etc/exports
/ *(rw,sync)
/var *(rw,sync)
On a Linux NFS client:
# mount -o vers=4 10.50.2.59:/ /mnt/3
$ head /mnt/3/var/log/vmware-vmsvc.log >/dev/null
$ df -h | tail -2
10.50.2.59:/ 29G 25G
2.3G 92% /mnt/3
10.50.2.59:/var 20G 2.2G
16G 13% /mnt/3/var
# while [ 1 ]; do date; umount /mnt/3/var; ls /mnt/3/var >/dev/null; sleep
10; done
...
By constantly unmounting the sub-filesystem (/var) and then accessing it so
it gets mounted again (which triggers the nfs4_do_fsinfo()),
the cl_last_renewal is set to now on the client which prevents RENEW
operations from being send, and eventually the NFS server
will expire the lease (common defaults are 60s on Linux and 90s on Solaris
servers).
In testing I confirmed that both Linux and Solaris NFSv4 servers will not do
an implicit lease renewal in this case
(nfs4_do_fsinfo() results in GETATTR operations being send), in which case
the lease might expire and both Linux and Solaris
NFS servers will return NFS4ERR_EXPIRED.
The error is not handled correctly either and will result in EIO propagated
to an application issuing open().
See my other email with subject: [PATCH] NFSv4: open() should try lease
recovery on NFS4ERR_EXPIRED
which contains a fix for the NFS4ERR_EXPIRED handling.
This patch however fixes the issue with implicit lease renewal by not
setting the cl_last_renewal to now,
unless there is no lease set yet.
This was tested with 5.5.0-rc2 and the provided patch is applied on top of
the 5.5.0-rc2 as well.
btw: Recent ONTAP versions return NFS4ERR_STALE_CLIENTID which is handled
correctly - Linux client will try to renew its lease and if successful it
will retry open().
Best regards,
Robert Milkowski