Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount

From: Baolin Wang

Date: Wed Apr 22 2026 - 20:44:04 EST

On 4/22/26 11:03 PM, Kefeng Wang wrote:


On 4/22/2026 2:28 PM, Baolin Wang wrote:
CC Kefeng,

On 4/21/26 9:39 PM, David Hildenbrand (Arm) wrote:
On 4/21/26 08:27, Baolin Wang wrote:


On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
On 4/17/26 14:45, Baolin Wang wrote:



Indeed. Good point.


Not really. There could be files created before remount whose mappings
don't support large folios (with 'huge=never' option), while files
created after remount will have mappings that support large folios (if
remounted with 'huge=always' option).
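
To make that concrete, here is a small userspace toy (deliberately not the kernel code; the struct and field names below are made up purely for illustration) showing that the per-file flag is sampled once, at file creation time:

/* Toy model, not kernel code: the names are illustrative only. */
#include <stdbool.h>
#include <stdio.h>

struct mount   { bool huge; };          /* the huge= mount option  */
struct mapping { bool large_folios; };  /* per-file address space  */

/* The flag is sampled once, when the file (inode) is created. */
static struct mapping create_file(const struct mount *m)
{
	return (struct mapping){ .large_folios = m->huge };
}

int main(void)
{
	struct mount m = { .huge = false };    /* mounted with huge=never    */
	struct mapping old_file = create_file(&m);

	m.huge = true;                         /* remounted with huge=always */
	struct mapping new_file = create_file(&m);

	printf("file created before remount: %d\n", old_file.large_folios); /* 0 */
	printf("file created after remount:  %d\n", new_file.large_folios); /* 1 */
	return 0;
}

So only files created after the remount pick up the new setting.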

It looks like the previous commit 5a90c155defa was also problematic. The
huge mount option has introduced a lot of tricky issues:(

Now I think Zi's previous suggestion should be able to clean up this
mess? That is, calling mapping_set_large_folios() unconditionally for
all shmem mounts, and revisiting Kefeng's first version to fix the
performance issue.

Okay, so you'll send a patch to just set mapping_set_large_folios()
unconditionally?

I'm still hesitating on this. If we set mapping_set_large_folios()
unconditionally, we need to re-fix the performance regression that was
addressed by commit 5a90c155defa.

Just so I can follow: what is the test where unlocking large folios would
cause a regression?

I spent some time investigating the performance regression that was addressed by commit 5a90c155defa ("tmpfs: don't enable large folios if not supported"). From my testing, I found that the performance issue no longer exists on upstream:

mount tmpfs -t tmpfs -o size=50G /mnt/tmpfs

Base:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.1 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)

Base + revert 5a90c155defa:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.3 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.3 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.1 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)

The data is basically consistent, apart from minor fluctuation noise.

Later, I continued investigating and found that commit 665575cff098b ("filemap: move prefaulting out of hot write path") is what fixed the write performance.

Base + revert 665575cff098b + revert 5a90c155defa:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (2.9 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (2.6 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (2.6 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (2.5 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (2.5 GB/s)

We can see that after reverting commit 665575cff098b, there is a noticeable drop in write performance for tmpfs files.

So my conclusion is that we can now safely revert commit 5a90c155defa to set mapping_set_large_folios() for all shmem mounts unconditionally.

Kefeng, please correct me if I missed anything.

Hi Baolin, my test case was the bonnie "Block/Re Write" test:

./bonnie -d /tmp -s Size (where Size is one of 100, 256, 512, 1024, 2048, 4096).

But the dd test behaves similarly as well, and as commit 4e527d5841e2
("iomap: fault in smaller chunks for non-large folio mappings") said,
the issue is:

"If chunk is 2MB, total 512 pages need to be handled finally. During this
period, fault_in_iov_iter_readable() is called to check iov_iter readable
validity. Since only 4KB will be handled each time, below address space
will be checked over and over again"

But after 665575cff098b, fault_in_iov_iter_readable() is moved out of the
hot write path, so the issue should be fixed.
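
To illustrate the difference, here is a compilable userspace toy (not the kernel code; fault_in_readable() and copy_chunk() are stand-ins for fault_in_iov_iter_readable() and the per-iteration copy): before 665575cff098b the write loop prefaulted the source on every small iteration, afterwards it only prefaults when a copy actually fails:

/* Toy model of the loop shape, not the kernel code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define CHUNK   (4 * 1024)          /* only 4KB handled per iteration */
#define TOTAL   (2 * 1024 * 1024)   /* a 2MB write() */

static int faults;                  /* how many times we "prefault" */

static void fault_in_readable(size_t bytes) { (void)bytes; faults++; }
static bool copy_chunk(void)        { return true; /* copies succeed */ }

/* Before 665575cff098b: prefault the remaining source every iteration. */
static void write_old(void)
{
	for (size_t done = 0; done < TOTAL; done += CHUNK) {
		fault_in_readable(TOTAL - done);
		copy_chunk();
	}
}

/* After 665575cff098b: prefault only when a copy actually fails. */
static void write_new(void)
{
	for (size_t done = 0; done < TOTAL; done += CHUNK) {
		if (!copy_chunk())
			fault_in_readable(CHUNK);
	}
}

int main(void)
{
	faults = 0; write_old(); printf("old: %d prefault checks\n", faults);
	faults = 0; write_new(); printf("new: %d prefault checks\n", faults);
	return 0;
}

For a 2MB write handled 4KB at a time, the old loop does 512 prefault checks, while the new one does none as long as the copies succeed.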

Kefeng, thanks for confirming.