NFS Caching broken in 4.19.37

From: Anton Ivanov
Date: Mon Jul 08 2019 - 14:58:01 EST


Hi list,

NFS caching appears broken in 4.19.37.

The more cores/threads, the easier it is to reproduce. Tested with identical results on Ryzen 1600 and 1600X.

1. Mount an OpenWrt build tree over NFSv4
2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a loop
3. Result after 3-4 iterations:

State on the client:

ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../

State as seen on the server (mounted via NFS from localhost):

ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

Actual state on the filesystem:

ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

So the client has quite clearly lost the plot. After telling it to drop caches, re-reading the directory shows the file present.
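
The comparison of the listings above can be done mechanically. The following is only a minimal sketch: compare_views is a hypothetical helper, not something used in the original test, and it compares file names only, not sizes or timestamps.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): compare the names
# visible in two views of the same directory, for example an NFS mount and
# the exported path behind it.
compare_views() {
    # ls -A lists everything except "." and ".."; sort makes the order stable.
    first_view=$(ls -A "$1" | sort)
    second_view=$(ls -A "$2" | sort)
    if [ "$first_view" = "$second_view" ]; then
        echo "views match"
        return 0
    fi
    echo "views differ"
    return 1
}
```

This only works on a host that can see both paths, such as the server here with its localhost mount and /exports/work. For a remote client the two listings would have to be captured on each machine and compared afterwards. One common way to drop caches on the client is writing 3 to /proc/sys/vm/drop_caches as root; that is an assumption about the method, which is not spelled out above.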

It is possible to reproduce this using a Linux kernel tree too; it just takes many more iterations, at least 10.
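
For anyone trying to reproduce this, the build/clean loop from step 2 could be scripted roughly as below. This is a sketch under assumptions: repro_loop, its two arguments and the MAKE override are hypothetical names introduced here, and the job count is taken from /proc/cpuinfo the same way as in step 2, which assumes an x86-style cpuinfo with one "vendor" line per logical CPU.

```shell
#!/bin/sh
# Sketch of the build/clean loop from step 2.
# Usage (hypothetical): repro_loop TREE [ITERATIONS]
# MAKE may be set in the environment to substitute another command.
repro_loop() {
    tree=$1
    iterations=${2:-4}
    # Count "vendor" lines in /proc/cpuinfo, as in step 2.
    jobs=$(grep -c vendor /proc/cpuinfo 2>/dev/null)
    # Fall back to a single job if the count is missing or zero.
    [ "${jobs:-0}" -gt 0 ] || jobs=1
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "iteration $i of $iterations"
        # Run in a subshell so the caller's working directory is unchanged.
        (
            cd "$tree" || exit 1
            ${MAKE:-make} -j "$jobs"
            ${MAKE:-make} clean
        )
        i=$((i + 1))
    done
}
```

Called as repro_loop /path/to/nfs/mounted/tree 4, it runs four build/clean cycles in that tree. Like the original one-liner, it runs make clean even if the build fails.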

Both client and server run 4.19.37 from Debian buster. This is filed as Debian bug 931500. I originally thought it was autofs-related, but IMHO it is actually something fundamentally broken in NFS caching, resulting in cache corruption.

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/