Re: [PATCH v2] mm: shmem: don't set large-order range for internal shmem mount
From: Baolin Wang
Date: Wed Apr 15 2026 - 22:08:37 EST
On 4/16/26 9:52 AM, Zi Yan wrote:
On 15 Apr 2026, at 21:45, Baolin Wang wrote:
On 4/16/26 9:36 AM, Zi Yan wrote:
On 15 Apr 2026, at 21:22, Baolin Wang wrote:
On 4/16/26 9:11 AM, Zi Yan wrote:
On 15 Apr 2026, at 21:05, Baolin Wang wrote:
On 4/15/26 10:36 PM, David Hildenbrand (Arm) wrote:
On 4/15/26 12:05, Baolin Wang wrote:
On 4/15/26 5:54 PM, David Hildenbrand (Arm) wrote:
Yes, that makes sense.
However, it’s also possible that the mapping does not support large
folios, yet anonymous shmem can still allocate large folios via the
sysfs interfaces. That doesn't make sense, right?
That's what I am saying: if there could be large folios in there, then
let's tell the world.
Getting into a scenario where the mapping claims not to support large
folios, but we then have large folios in there, is inconsistent, no?
[...]
For the current anonymous shmem (tmpfs is already clear, no questions),
I don’t think there will be any "will never have / never allows"
cases, because it can be changed dynamically via the sysfs interfaces.
Right. It's about non-anon shmem with huge=off.
If we still want that logic, then for anonymous shmem we can treat it as
always "might have large folios".
OK. To resolve the confusion about 1, the logic should be changed as
follows. Does that make sense to you?
if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
        mapping_set_large_folios(inode->i_mapping);
I think that's better.
Thanks for your valuable input.
But as Willy says, maybe we can just
unconditionally set it and have it even simpler.
However, for tmpfs mounts, we should still respect the 'huge=' mount option. See commit 5a90c155defa ("tmpfs: don't enable large folios if not supported").
Is it possible to check sbinfo->huge at tmpfs’s folio allocation time, so that
even if every tmpfs calls mapping_set_large_folios(), sbinfo->huge can still
decide whether a huge page will be allocated for a tmpfs file?
Yes, of course. However, the issue isn’t whether tmpfs allows allocating large folios.
The problem commit 5a90c155defa tries to fix is this: when tmpfs is mounted with the 'huge=never' option, we will not allocate large folios for it. But when writing tmpfs files, generic_perform_write() calls mapping_max_folio_size() to get the chunk size, and ends up with an order-9 chunk. Since the tmpfs file is populated only with small folios, this results in a performance regression.
IIUC, generic_perform_write() needs to use a small chunk if tmpfs denies huge.
It seems that Kefeng did that in his first try[1], but Willy suggested
the current fix.
I wonder if we should revisit Kefeng’s first version.
[1] https://lore.kernel.org/all/20240914140613.2334139-1-wangkefeng.wang@xxxxxxxxxx/
Personally, I still prefer the current fix (commit 5a90c155defa). We should honor the tmpfs mount option. If it explicitly says no large folios, we shouldn’t call mapping_set_large_folios(). Isn’t that more consistent with its semantics?
Filesystems wishing to turn on large folios in the pagecache should call
``mapping_set_large_folios`` when initializing the incore inode.
You mean tmpfs with the huge option set is an FS wishing to turn on large
folios in the pagecache, and otherwise it is an FS wishing not to have large
folios in the pagecache? So tmpfs with different options is seen as different FSes.
What I mean is that tmpfs is somewhat different from other filesystems. We have tried to make tmpfs behave like other FSes, but differences remain. For example, see the previous fix to tmpfs’s large folio allocation policy in commit 69e0a3b49003 ("mm: shmem: fix the strategy for the tmpfs 'huge=' options").
So the tmpfs specific 'huge=' mount option is another way it differs from other filesystems.