[PATCH v2 0/5] mm/shmem: optimize read with reduced xarray lookups and folio batching

From: Chi Zhiling

Date: Mon Jun 01 2026 - 01:59:44 EST


From: Chi Zhiling <chizhiling@xxxxxxxxxx>

This series improves shmem read performance by implementing folio
batching in the read path and reducing unnecessary xarray lookups.
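
As a rough illustration of why batching can cut the number of lookups, here is a userspace toy model. It is not kernel code and does not reflect the actual implementation in this series: `ToyCache`, `get_one`, `get_batch`, and `BATCH_SIZE` are hypothetical names and values chosen only for the sketch.

```python
# Userspace toy model only; not kernel code. All names are hypothetical.
BATCH_SIZE = 16  # illustrative batch capacity, not taken from the series


class ToyCache:
    """Stands in for an index-to-folio mapping and counts lookups."""

    def __init__(self, nr_pages):
        self.pages = {i: f"folio{i}" for i in range(nr_pages)}
        self.lookups = 0

    def get_one(self, index):
        # One lookup per page, as in an unbatched read loop.
        self.lookups += 1
        return self.pages.get(index)

    def get_batch(self, start, end):
        # One lookup returns a contiguous run of up to BATCH_SIZE entries.
        self.lookups += 1
        batch = []
        for i in range(start, min(end, start + BATCH_SIZE)):
            if i not in self.pages:
                break
            batch.append(self.pages[i])
        return batch


def read_unbatched(cache, nr_pages):
    return [cache.get_one(i) for i in range(nr_pages)]


def read_batched(cache, nr_pages):
    out, index = [], 0
    while index < nr_pages:
        batch = cache.get_batch(index, nr_pages)
        if not batch:
            break
        out.extend(batch)
        index += len(batch)
    return out


if __name__ == "__main__":
    a, b = ToyCache(256), ToyCache(256)
    read_unbatched(a, 256)
    read_batched(b, 256)
    print(a.lookups, b.lookups)  # prints "256 16"
```

In this model the batched loop returns the same entries with one lookup per batch instead of one per page; the real savings depend on folio sizes and the xarray layout.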

Performance results, measured with the fio command below ($1 is the
block size: 1M, 64k, or 4k):

fio --ioengine=sync --rw=read --bs=$1 --size=1G --runtime=180 --time_based --group_reporting --name=seq_read_test --filename=testfile

| THP disabled in tmpfs | v7.1-rc5 | v7.1-rc5 + fbatch | Improvement |
| ---------------------- | ------------ | ----------------- | ----------- |
| 1M + normal file | bw=11.5GiB/s | bw=12.7GiB/s | +10.4% |
| 64k + normal file | bw=11.0GiB/s | bw=12.3GiB/s | +11.8% |
| 4k + normal file | bw=3826MiB/s | bw=3849MiB/s | +0.6% |
| 1M + fallocated file | bw=23.8GiB/s | bw=28.6GiB/s | +20.2% |
| 64k + fallocated file | bw=22.5GiB/s | bw=27.3GiB/s | +21.3% |
| 4k + fallocated file | bw=4655MiB/s | bw=4680MiB/s | +0.5% |
| 1M + hole | bw=24.2GiB/s | bw=28.6GiB/s | +18.2% |
| 64k + hole | bw=22.6GiB/s | bw=27.6GiB/s | +22.1% |
| 4k + hole | bw=4652MiB/s | bw=4489MiB/s | -3.5% |


| THP enabled in tmpfs | v7.1-rc5 | v7.1-rc5 + fbatch | Improvement |
| --------------------- | ------------ | ----------------- | ----------- |
| 1M + normal file | bw=13.7GiB/s | bw=13.9GiB/s | +1.4% |
| 64k + normal file | bw=13.5GiB/s | bw=13.5GiB/s | +0.0% |
| 4k + normal file | bw=3833MiB/s | bw=3859MiB/s | +0.7% |
| 1M + fallocated file | bw=24.9GiB/s | bw=34.2GiB/s | +37.3% |
| 64k + fallocated file | bw=23.0GiB/s | bw=31.4GiB/s | +36.5% |
| 4k + fallocated file | bw=4710MiB/s | bw=4655MiB/s | -1.2% |
| 1M + hole | bw=24.3GiB/s | bw=34.5GiB/s | +42.0% |
| 64k + hole | bw=23.5GiB/s | bw=31.1GiB/s | +32.3% |
| 4k + hole | bw=4690MiB/s | bw=4647MiB/s | -0.9% |
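
The Improvement column appears to be the relative change in bandwidth against the baseline kernel. A minimal sketch assuming that definition (the helper name `improvement_pct` is made up for illustration; the input values are copied from the tables above):

```python
def improvement_pct(baseline, patched):
    """Relative bandwidth change of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# (label, baseline, patched); units cancel as long as both values in a
# row use the same unit (GiB/s or MiB/s).
rows = [
    ("THP disabled, 1M + normal file", 11.5, 12.7),
    ("THP disabled, 4k + hole", 4652, 4489),
    ("THP enabled, 1M + hole", 24.3, 34.5),
]

if __name__ == "__main__":
    for label, base, new in rows:
        print(f"{label}: {improvement_pct(base, new):+.1f}%")
    # prints +10.4%, -3.5%, and +42.0% for the three rows
```

Because the bandwidth figures in the tables are already rounded, recomputing from them can differ from the reported percentage in the last digit.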


v1:
https://lore.kernel.org/linux-mm/20260520101538.58745-1-chizhiling@xxxxxxx/#t
rfc:
https://lore.kernel.org/linux-fsdevel/20260515094702.1092355-1-chizhiling@xxxxxxx/


Chi Zhiling (5):
  mm/filemap: reduce unnecessary xarray lookups when read cached pages
  mm/filemap: reduce xarray lookups in filemap_get_folios_contig()
  mm/shmem: introduce copy_zero_to_iter() for large zeroing
  mm/shmem: remove page-copy fallback in shmem read path
  mm/shmem: optimize file read with folio batching

 mm/filemap.c |  46 +++++++++++--------
 mm/shmem.c   | 126 +++++++++++++++++++++++++++++++++++----------------
 2 files changed, 113 insertions(+), 59 deletions(-)

--
2.43.0