Re: [PATCH v2 0/5] mm/shmem: optimize read with reduced xarray lookups and folio batching

From: Chi Zhiling

Date: Mon Jun 01 2026 - 22:45:51 EST

On 6/2/26 08:43, Andrew Morton wrote:

On Mon, 1 Jun 2026 13:56:59 +0800 Chi Zhiling <chizhiling@xxxxxxx> wrote:

From: Chi Zhiling <chizhiling@xxxxxxxxxx>

This series improves shmem read performance by implementing folio
batching in the read path and reducing unnecessary xarray lookups.

Performance Results:

fio --ioengine=sync --rw=read --bs=$1 --size=1G --runtime=180 --time_based --group_reporting --name=seq_read_test --filename=testfile

| THP disabled in tmpfs | v7.1-rc5 | v7.1-rc5 + fbatch | Improvement |
| ---------------------- | ------------ | ----------------- | ----------- |
| 1M + normal file | bw=11.5GiB/s | bw=12.7GiB/s | +10.4% |
| 64k + normal file | bw=11.0GiB/s | bw=12.3GiB/s | +11.8% |
| 4k + normal file | bw=3826MiB/s | bw=3849MiB/s | +0.6% |
| 1M + fallocated file | bw=23.8GiB/s | bw=28.6GiB/s | +20.2% |
| 64k + fallocated file | bw=22.5GiB/s | bw=27.3GiB/s | +21.3% |
| 4k + fallocated file | bw=4655MiB/s | bw=4680MiB/s | +0.5% |
| 1M + hole | bw=24.2GiB/s | bw=28.6GiB/s | +18.2% |
| 64k + hole | bw=22.6GiB/s | bw=27.6GiB/s | +22.1% |
| 4k + hole | bw=4652MiB/s | bw=4489MiB/s | -3.5% |

| THP enabled in tmpfs | v7.1-rc5 | v7.1-rc5 + fbatch | Improvement |
| --------------------- | ------------ | ----------------- | ----------- |
| 1M + normal file | bw=13.7GiB/s | bw=13.9GiB/s | +1.4% |
| 64k + normal file | bw=13.5GiB/s | bw=13.5GiB/s | +0.0% |
| 4k + normal file | bw=3833MiB/s | bw=3859MiB/s | +0.7% |
| 1M + fallocated file | bw=24.9GiB/s | bw=34.2GiB/s | +37.3% |
| 64k + fallocated file | bw=23.0GiB/s | bw=31.4GiB/s | +36.5% |
| 4k + fallocated file | bw=4710MiB/s | bw=4655MiB/s | -1.2% |
| 1M + hole | bw=24.3GiB/s | bw=34.5GiB/s | +42.0% |
| 64k + hole | bw=23.5GiB/s | bw=31.1GiB/s | +32.3% |
| 4k + hole | bw=4690MiB/s | bw=4647MiB/s | -0.9% |

That looks nice.

Microbenchmarks are useful, but are you able to help us understand how
much benefit our users might see in real-world workloads?

Hi Andrew,

I don't have real-world performance data yet. I'm working on this simply because the patch shows decent gains in microbenchmarks, and even with THP enabled it still removes some unnecessary overhead.

I'll take no action at this time - it's late in the cycle and reviewers
have yet to participate.

Yes, it's unlikely to land in 7.2, and I still need to resolve some performance regressions.

AI review flagged a few possible issues, so please take a look:
https://sashiko.dev/#/patchset/20260601055704.167436-1-chizhiling@xxxxxxx

Okay, I will take a close look.

Thanks!