Re: NFS Caching broken in 4.19.37
From: Timo Rothenpieler
Date: Fri Feb 26 2021 - 10:10:48 EST
I think I can reproduce this, or at least something that looks very
similar, on 5.10, specifically 5.10.17 (on both client and server).
We are running slurm, and for a while now the logs it writes have
appeared truncated, but only while they are being observed on the
client with tail -f or something similar. This coincides with updating
from 5.4 to 5.10, but a whole bunch of other stuff was updated at the
same time, so it took me a while to correlate the two.
It looks like this:
On Server:
store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
61 slurm-41101.out
On Client:
timo@login01 ~/TestRun $ ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
timo@login01 ~/TestRun $ wc -l slurm-41101.out
24 slurm-41101.out
See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for
the respective file-contents.
If I run the same test job, wait until it's done, and then look at its
slurm.out file, it matches between NFS client and server.
If I tail -f the slurm.out on an NFS client, the file stops getting
updated on the client, but keeps getting more logs written to it on the
NFS server.
The slurm.out file is being written to by another NFS client, which is
running on one of the compute nodes of the system. It's being read from
a login node.
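For anyone who wants to compare the two views of a file without
eyeballing ls and wc output, here is a minimal sketch of a script that
can be run on both the server and a client. It is only an illustration:
the name summarize and the choice of SHA-256 are my own assumptions, and
it does nothing NFS-specific. It prints the size reported by stat, the
number of bytes and newlines actually read, and a digest of the content,
so identical sizes with differing line counts or digests show up
directly.

```python
import hashlib
import os
import sys


def summarize(path):
    """Return (stat size, bytes read, newline count, SHA-256 hex digest).

    'summarize' is a hypothetical helper for this thread, not part of
    any NFS or slurm tooling.
    """
    stat_size = os.stat(path).st_size
    digest = hashlib.sha256()
    bytes_read = 0
    newlines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(65536)
            if not chunk:
                break
            bytes_read += len(chunk)
            newlines += chunk.count(b"\n")  # same count that wc -l reports
            digest.update(chunk)
    return stat_size, bytes_read, newlines, digest.hexdigest()


if __name__ == "__main__":
    # Run with the same file name on server and client, then compare lines.
    for name in sys.argv[1:]:
        size, nread, lines, sha = summarize(name)
        print(f"{name}: stat={size} read={nread} lines={lines} sha256={sha}")
```

Running it as "python3 summarize.py slurm-41101.out" on both machines
while the job is still writing should make a mismatch like the one
above easy to spot, assuming the script is saved under that name.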
Timo
On 21.02.2021 16:53, Anton Ivanov wrote:
Client side. This seems to be an entirely client side issue.
A variety of kernels on the clients starting from 4.9 and up to 5.10
using 4.19 servers. I have observed it on a 4.9 client versus 4.9 server
earlier.
4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.
At present the server is at 4.19.67 in all tests.
Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
x86_64 GNU/Linux
I can set-up a couple of alternative servers during the week, but so far
everything is pointing towards a client fs cache issue, not a server one.
Brgds,