[PATCH v1 0/5] mm/shmem: optimize read with reduced xarray lookups and folio batching

From: Chi Zhiling

Date: Wed May 20 2026 - 06:24:47 EST


From: Chi Zhiling <chizhiling@xxxxxxxxxx>

This series improves shmem read performance by implementing folio
batching in the read path and reducing unnecessary xarray lookups.
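
To illustrate why batching reduces lookups, here is a toy userspace model, not the code from this series. It assumes each call into the lookup helper costs one tree descent and counts those calls. All names (toy_cache, toy_lookup_one, toy_lookup_contig, TOY_BATCH) are hypothetical and exist only for this sketch. The batched helper loosely mirrors the idea of fetching a contiguous run of cached entries in a single call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch size, chosen only for this sketch. */
#define TOY_BATCH ((size_t)16)

/* Toy stand-in for a page cache: present[i] != 0 means index i is cached. */
struct toy_cache {
	const unsigned char *present;
	size_t nr_slots;
	unsigned long lookups;	/* number of modelled tree descents */
};

/* One modelled descent per index; returns 1 if the index is cached. */
static int toy_lookup_one(struct toy_cache *c, size_t index)
{
	assert(c != NULL && c->present != NULL);
	c->lookups++;
	return index < c->nr_slots && c->present[index];
}

/*
 * One modelled descent per call: collect up to max contiguous cached
 * indices beginning at start. Returns how many were collected.
 */
static size_t toy_lookup_contig(struct toy_cache *c, size_t start,
				size_t max, size_t *out)
{
	size_t n = 0;

	assert(c != NULL && c->present != NULL);
	c->lookups++;
	while (n < max && start + n < c->nr_slots && c->present[start + n]) {
		out[n] = start + n;
		n++;
	}
	return n;
}

/* Per-index read: one lookup for every index; returns the number of hits. */
static size_t toy_read_single(struct toy_cache *c, size_t start, size_t nr)
{
	size_t hits = 0;

	for (size_t i = 0; i < nr; i++)
		hits += (size_t)toy_lookup_one(c, start + i);
	return hits;
}

/* Batched read: one lookup per batch; stops at the first hole. */
static size_t toy_read_batched(struct toy_cache *c, size_t start, size_t nr)
{
	size_t batch[TOY_BATCH];
	size_t done = 0;

	while (done < nr) {
		size_t want = (nr - done < TOY_BATCH) ? nr - done : TOY_BATCH;
		size_t got = toy_lookup_contig(c, start + done, want, batch);

		done += got;
		if (got < want)
			break;	/* hit a hole or the end of the cache */
	}
	return done;
}
```

In this model, reading 64 fully cached indices costs 64 lookups one at a time and 4 lookups with a batch of 16. Real xarray costs depend on tree shape and folio size, so the ratio is only indicative.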


Changes since RFC
=================

The RFC version used xas_for_each() in shmem_get_read_batch(), which
introduced about a 1% regression for 4K read workloads.

This v1 addresses the regression in patch 2 by switching to
filemap_get_folios_contig() and optimizing it to avoid the extra
xarray traversal overhead.


Performance Results
===================

Testing was performed with fio sequential read workloads:

fio --ioengine=sync --rw=read --size=1G --runtime=180


### THP Disabled - Normal Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 11.4GiB/s | 12.7GiB/s | +11.4%      |
| 64k        | 11.2GiB/s | 12.2GiB/s | +8.9%       |
| 4k         | 3809MiB/s | 3838MiB/s | +0.8%       |

### THP Disabled - Fallocated Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 23.7GiB/s | 28.7GiB/s | +21.1%      |
| 64k        | 22.6GiB/s | 27.0GiB/s | +19.5%      |
| 4k         | 4668MiB/s | 4678MiB/s | +0.2%       |

### THP Enabled - Normal Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 13.9GiB/s | 13.9GiB/s | 0%          |
| 64k        | 13.4GiB/s | 13.4GiB/s | 0%          |
| 4k         | 3818MiB/s | 3836MiB/s | +0.5%       |

### THP Enabled - Fallocated Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 24.1GiB/s | 34.9GiB/s | +44.8%      |
| 64k        | 22.9GiB/s | 31.3GiB/s | +36.7%      |
| 4k         | 4721MiB/s | 4708MiB/s | -0.3%       |
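
For readers who want to re-derive the Improvement column, the figures are consistent with a plain relative change of v1 over baseline. The sketch below assumes that formula. The helper names (improvement_pct, gib_to_mib) are hypothetical and are not part of the series.

```c
#include <assert.h>

/* Convert GiB/s to MiB/s so rows reported in different units can be compared. */
static double gib_to_mib(double gib_per_s)
{
	return gib_per_s * 1024.0;
}

/* Relative change of v1 over baseline, in percent (assumed formula). */
static double improvement_pct(double baseline, double v1)
{
	assert(baseline > 0.0);
	return (v1 - baseline) / baseline * 100.0;
}
```

For example, improvement_pct(24.1, 34.9) evaluates to roughly 44.8, matching the 1M row for THP-enabled fallocated files.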


RFC:
https://lore.kernel.org/linux-fsdevel/20260515094702.1092355-1-chizhiling@xxxxxxx/


Chi Zhiling (5):
  mm/filemap: reduce unnecessary xarray lookups when read cached pages
  mm/filemap: reduce xarray lookups in filemap_get_folios_contig()
  mm/shmem: make SGP_NOALLOC succeed on hole like SGP_READ
  mm/shmem: introduce copy_zero_to_iter() for large zeroing
  mm/shmem: optimize file read with folio batching

 include/linux/shmem_fs.h |  2 +-
 mm/filemap.c             | 34 ++++++++++++------
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 75 +++++++++++++++++++++++++++++-----------
 4 files changed, 79 insertions(+), 34 deletions(-)

--
2.43.0