I think I can reproduce this, or something that at least looks very similar, on 5.10, specifically on 5.10.17 (on both client and server).
We are running slurm, and for a while now (this coincides with updating from 5.4 to 5.10, but a whole bunch of other things were updated at the same time, so it took me a while to correlate it) the logs it writes have appeared truncated, but only while they are being observed on the client with tail -f or something like that.
Looks like this then:
On Server:
store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
61 slurm-41101.out
On Client:
timo@login01 ~/TestRun $ ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
timo@login01 ~/TestRun $ wc -l slurm-41101.out
24 slurm-41101.out
See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for the respective file-contents.
If I run the same test job, wait until it's done, and only then look at its slurm.out file, it matches between the NFS client and the server.
If I tail -f the slurm.out file on an NFS client while the job is running, the file stops getting updated on that client, while more log output keeps being written to it on the NFS server.
The slurm.out file is being written to by another NFS client, running on one of the compute nodes of the system, and it is being read from a login node.
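The pattern boils down to roughly the following, independent of slurm (a sketch only; /mnt/nfs, /srv/export and test.out are placeholder paths, not our actual mounts):

# On NFS client A (stand-in for the compute node): keep appending to a file on the NFS mount
for i in $(seq 1 600); do echo "line $i"; sleep 1; done >> /mnt/nfs/test.out

# On NFS client B (stand-in for the login node): watch the file while it is still growing
tail -f /mnt/nfs/test.out

# While client A is still writing, compare what client B and the server see
wc -l /mnt/nfs/test.out        # on client B
wc -l /srv/export/test.out     # on the server's local filesystem

On the affected setup the tail -f output on client B stalls and the two wc -l counts diverge, exactly like the slurm-41101.out numbers above.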
Timo
On 21.02.2021 16:53, Anton Ivanov wrote:
Client side. This seems to be an entirely client side issue.
A variety of kernels on the clients, starting from 4.9 and going up to 5.10, against 4.19 servers. I observed it earlier on a 4.9 client versus a 4.9 server.
4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.
At present the server is at 4.19.67 in all tests.
Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
I can set up a couple of alternative servers during the week, but so far everything is pointing towards a client fs cache issue, not a server one.
Brgds,