[PATCH v2 0/3] fs: make the i_size_read/write helpers be smp_load_acquire/store_release()

From: Baokun Li
Date: Wed Jan 24 2024 - 09:26:41 EST


V1->V2:
Add patch 3 to fix an error when compiling code for 32-bit architectures
without CONFIG_SMP enabled.
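
For readers wondering why this only bites on 32-bit, here is a minimal
userspace sketch. It assumes the type check in question is a
"native word size" assertion, which this cover letter does not spell out,
and the names demo_native_word() and demo_size_is_native() are hypothetical.
Under that assumption a 64-bit file size passes the check on a 64-bit target
and fails it on a 32-bit one, where long is only 4 bytes.

```c
#include <assert.h>

/*
 * Hypothetical model of a native-word check: true when the type has the
 * size of char, short, int or long on the build target.
 */
#define demo_native_word(t)					\
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) ||	\
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

/* Does a 64-bit size (long long here) count as a native word? */
int demo_size_is_native(void)
{
	return demo_native_word(long long);
}
```

On an LP64 target demo_size_is_native() returns 1. On an ILP32 target it
returns 0, which is the situation a compile-time assertion of this kind
would reject.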

This patchset follows Linus's suggestion to implement the i_size_read/write
helpers with smp_load_acquire/smp_store_release(). With that ordering in the
helpers, the extra smp_rmb() in filemap_read() is no longer needed, so it is
removed. The patchset also removes the extra type checking in
smp_load_acquire/smp_store_release for the !CONFIG_SMP case to avoid
compilation errors.
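
The ordering argument can be illustrated in userspace with C11 atomics. This
is only a sketch, not the kernel code: struct demo_file, demo_size_write(),
demo_size_read() and run_demo() are hypothetical names, and the C11
release/acquire operations stand in for smp_store_release() and
smp_load_acquire(). The writer fills the data and then publishes the size
with a release store; a reader that observes the new size through an acquire
load is guaranteed to see the data, with no separate read barrier.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define DEMO_LEN 16

/* Hypothetical stand-in for an inode: data plus a published size. */
struct demo_file {
	char data[64];
	_Atomic long long size;		/* plays the role of i_size */
};

/* Analogue of i_size_write(): release store, ordered after data writes. */
static void demo_size_write(struct demo_file *f, long long size)
{
	atomic_store_explicit(&f->size, size, memory_order_release);
}

/* Analogue of i_size_read(): acquire load, ordered before data reads. */
static long long demo_size_read(struct demo_file *f)
{
	return atomic_load_explicit(&f->size, memory_order_acquire);
}

static void *demo_writer(void *arg)
{
	struct demo_file *f = arg;

	memset(f->data, 'x', DEMO_LEN);	/* write the data first */
	demo_size_write(f, DEMO_LEN);	/* then publish the new size */
	return NULL;
}

/* Returns 1 if every byte below the observed size has been written. */
static int demo_reader_ok(struct demo_file *f)
{
	long long n = demo_size_read(f);

	for (long long i = 0; i < n; i++)
		if (f->data[i] != 'x')
			return 0;
	return 1;
}

/* Runs one writer against a concurrent reader; returns 1 on success. */
int run_demo(void)
{
	struct demo_file f;
	pthread_t t;
	int rc, during, after;

	memset(f.data, 0, sizeof(f.data));
	atomic_init(&f.size, 0);

	rc = pthread_create(&t, NULL, demo_writer, &f);
	assert(rc == 0);
	(void)rc;
	during = demo_reader_ok(&f);	/* may observe size 0 or DEMO_LEN */
	pthread_join(t, NULL);
	after = demo_reader_ok(&f) && demo_size_read(&f) == DEMO_LEN;
	return during && after;
}
```

The reader only touches bytes below the size it observed, so it either sees
size 0 and reads nothing, or sees DEMO_LEN and reads fully written data.
Build with -pthread.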

Functional tests were performed and no new problems were found.

Here are the unixbench results based on 6.7.0-next-20240118 on arm64. The
single-copy run shows some degradation (-1.43% in total) and the 72-copy run
shows some improvement (+2.41% in total), but overall the impact is not
significant.

### 72 CPUs in system; running 1 parallel copy of tests
System Benchmarks Index Values        | base    | patched | cmp    |
--------------------------------------|---------|---------|--------|
Dhrystone 2 using register variables  | 3635.06 | 3596.3  | -1.07% |
Double-Precision Whetstone            | 808.58  | 808.58  | 0.00%  |
Execl Throughput                      | 623.52  | 618.1   | -0.87% |
File Copy 1024 bufsize 2000 maxblocks | 1715.82 | 1668.58 | -2.75% |
File Copy 256 bufsize 500 maxblocks   | 1320.98 | 1250.16 | -5.36% |
File Copy 4096 bufsize 8000 maxblocks | 2639.36 | 2488.48 | -5.72% |
Pipe Throughput                       | 869.06  | 872.3   | 0.37%  |
Pipe-based Context Switching          | 106.26  | 117.22  | 10.31% |
Process Creation                      | 247.72  | 246.74  | -0.40% |
Shell Scripts (1 concurrent)          | 1234.98 | 1226    | -0.73% |
Shell Scripts (8 concurrent)          | 6893.96 | 6210.46 | -9.91% |
System Call Overhead                  | 493.72  | 494.28  | 0.11%  |
--------------------------------------|---------|---------|--------|
Total                                 | 1003.92 | 989.58  | -1.43% |

### 72 CPUs in system; running 72 parallel copies of tests
System Benchmarks Index Values        | base      | patched   | cmp    |
--------------------------------------|-----------|-----------|--------|
Dhrystone 2 using register variables  | 260471.88 | 258065.04 | -0.92% |
Double-Precision Whetstone            | 58212.32  | 58219.3   | 0.01%  |
Execl Throughput                      | 6954.7    | 7444.08   | 7.04%  |
File Copy 1024 bufsize 2000 maxblocks | 64244.74  | 64618.24  | 0.58%  |
File Copy 256 bufsize 500 maxblocks   | 89933.8   | 87026.38  | -3.23% |
File Copy 4096 bufsize 8000 maxblocks | 79808.14  | 81916.42  | 2.64%  |
Pipe Throughput                       | 62174.38  | 62389.74  | 0.35%  |
Pipe-based Context Switching          | 27239.28  | 27887.24  | 2.38%  |
Process Creation                      | 3551.28   | 3800.54   | 7.02%  |
Shell Scripts (1 concurrent)          | 19212.26  | 20749.34  | 8.00%  |
Shell Scripts (8 concurrent)          | 20842.02  | 21958.12  | 5.36%  |
System Call Overhead                  | 35328.24  | 35451.68  | 0.35%  |
--------------------------------------|-----------|-----------|--------|
Total                                 | 35592.42  | 36450.36  | 2.41%  |

Baokun Li (3):
  fs: make the i_size_read/write helpers be
    smp_load_acquire/store_release()
  Revert "mm/filemap: avoid buffered read/write race to read
    inconsistent data"
  asm-generic: remove extra type checking in acquire/release for non-SMP
    case

 include/asm-generic/barrier.h |  2 --
 include/linux/fs.h            | 10 ++++++++--
 mm/filemap.c                  |  9 ---------
 3 files changed, 8 insertions(+), 13 deletions(-)

--
2.31.1