Re: [RFC PATCH v3 0/4] Support large folios for tmpfs

From: Baolin Wang
Date: Mon Oct 21 2024 - 02:24:38 EST

In reply to: Kirill A. Shutemov: "Re: [RFC PATCH v3 0/4] Support large folios for tmpfs"
Next in thread: Kirill A. Shutemov: "Re: [RFC PATCH v3 0/4] Support large folios for tmpfs"

On 2024/10/17 19:26, Kirill A. Shutemov wrote:

> On Thu, Oct 17, 2024 at 05:34:15PM +0800, Baolin Wang wrote:
>
>> + Kirill
>>
>> On 2024/10/16 22:06, Matthew Wilcox wrote:
>>
>>> On Thu, Oct 10, 2024 at 05:58:10PM +0800, Baolin Wang wrote:
>>>
>>>> Considering that tmpfs already has the 'huge=' option to control THP
>>>> allocation, it is necessary to maintain compatibility with the 'huge='
>>>> option, as well as with the 'deny' and 'force' options controlled
>>>> by '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
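For readers unfamiliar with how these knobs interact, here is a minimal user-space model. It assumes 4 KiB pages and a 2 MiB PMD (x86-64), reduces the VMA's MADV_HUGEPAGE state to a bool, and skips other checks the kernel performs. The enum names mirror the SHMEM_HUGE_* values in mm/shmem.c, but shmem_model_use_pmd() is a hypothetical sketch, not kernel code. Only 'deny' and 'force' from shmem_enabled are modeled; its other values govern the internal mount used for shared anonymous memory.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096ULL /* assumption: 4 KiB base pages */
#define MODEL_PMD_PAGES 512UL   /* assumption: 2 MiB PMD = 512 pages */

/* Names mirror the SHMEM_HUGE_* values in mm/shmem.c; numeric values do not. */
enum shmem_huge {
	SHMEM_HUGE_NEVER,       /* huge=never, the default when huge= is absent */
	SHMEM_HUGE_ALWAYS,      /* huge=always */
	SHMEM_HUGE_WITHIN_SIZE, /* huge=within_size */
	SHMEM_HUGE_ADVISE,      /* huge=advise */
	SHMEM_HUGE_DENY,        /* shmem_enabled=deny: off for every mount */
	SHMEM_HUGE_FORCE,       /* shmem_enabled=force: on for every mount */
};

/*
 * Hypothetical model: may a PMD-sized folio back page 'index' of a tmpfs
 * file of 'i_size' bytes? 'shmem_enabled' is the global sysfs setting,
 * 'mount_huge' is the mount's huge= value, and 'madv_hugepage' stands in
 * for the VMA having MADV_HUGEPAGE set.
 */
bool shmem_model_use_pmd(enum shmem_huge shmem_enabled,
			 enum shmem_huge mount_huge,
			 unsigned long index, unsigned long long i_size,
			 bool madv_hugepage)
{
	/* The global override wins over any per-mount huge= setting. */
	if (shmem_enabled == SHMEM_HUGE_DENY)
		return false;
	if (shmem_enabled == SHMEM_HUGE_FORCE)
		return true;

	switch (mount_huge) {
	case SHMEM_HUGE_ALWAYS:
		return true;
	case SHMEM_HUGE_WITHIN_SIZE: {
		/* Page index just past the PMD-aligned range holding 'index'. */
		unsigned long end = (index / MODEL_PMD_PAGES + 1) * MODEL_PMD_PAGES;
		unsigned long long size_pages =
			(i_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

		if (size_pages >= end)
			return true;
		/* Otherwise, as with huge=advise, honour MADV_HUGEPAGE. */
		return madv_hugepage;
	}
	case SHMEM_HUGE_ADVISE:
		return madv_hugepage;
	default:
		return false;
	}
}
```

For example, with huge=within_size a 2 MiB file may use a PMD folio at index 0, a 1 MiB file may not (unless MADV_HUGEPAGE is set), and shmem_enabled=deny disables PMD folios even on a huge=always mount.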

>>> No, it's not. No other filesystem honours these settings. tmpfs would
>>> not have had these settings if it were written today. It should simply
>>> ignore them, the way that NFS ignores the "intr" mount option now that
>>> we have a better solution to the original problem.
>>>
>>> To reiterate my position:
>>>
>>> - When using tmpfs as a filesystem, it should behave like other
>>>   filesystems.
>>> - When using tmpfs to implement MAP_ANONYMOUS | MAP_SHARED, it should
>>>   behave like anonymous memory.

>> I do agree with your point to some extent, but the 'huge=' option has
>> existed for nearly 8 years, and huge orders based on write size may not
>> achieve the performance of PMD-sized THP in some scenarios, such as when
>> the write length is consistently 4K. So I am still concerned that ignoring
>> the 'huge=' option could lead to compatibility issues.
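To make the 4K-write concern concrete, here is a rough sketch of a write-size-based order heuristic, loosely modeled on how the page cache turns a write length into a folio-order hint (fgf_set_order()) and then shrinks the order to keep the folio naturally aligned. The function name, the 4 KiB page and the PMD order of 9 are assumptions for illustration; this is not the code from this patch series.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */
#define MODEL_PMD_ORDER  9  /* assumption: 2 MiB PMD on x86-64 */

/* floor(log2(x)) for x > 0; returns 0 for x == 0. */
static unsigned int model_ilog2(size_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Hypothetical: folio order chosen for a write of 'len' bytes at page 'index'. */
unsigned int write_size_order(unsigned long index, size_t len)
{
	unsigned int shift, order;

	if (len == 0)
		return 0;
	shift = model_ilog2(len);
	if (shift <= MODEL_PAGE_SHIFT)
		return 0; /* a write of less than two pages never gets a large folio */
	order = shift - MODEL_PAGE_SHIFT;
	if (order > MODEL_PMD_ORDER)
		order = MODEL_PMD_ORDER;
	/* A folio must be naturally aligned in the file: shrink the order
	 * until 'index' is a multiple of the folio size in pages. */
	while (order > 0 && (index & ((1UL << order) - 1)))
		order--;
	return order;
}
```

With this heuristic, write_size_order(0, 4096) is 0 and write_size_order(0, 2 << 20) is 9, so a workload that only ever issues 4K writes keeps getting order-0 folios however large the file grows, whereas huge=always would use PMD-sized folios.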

> Yeah, I don't think we are at the point where we can ignore the mount option yet.

OK.

> Maybe we need a new generic interface to request, on a per-inode level and
> on any fs, the semantics that tmpfs has with huge=: for example, a set of
> FADV_* hints that make the kernel allocate a PMD-sized folio on any
> allocation, or only on allocations within i_size. I think this behaviour
> is useful beyond tmpfs.
>
> Then the tmpfs huge= implementation could be redefined to set these
> per-inode FADV_* flags by default. This way tmpfs stays compatible with
> current deployments while being less special compared to the rest of the
> filesystems on the kernel side.
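A sketch of what this suggestion could look like, under heavy assumptions: the MODEL_FADV_HUGE_* flags, struct model_inode and all functions are hypothetical (no such fadvise advice values exist today), and the within-i_size check reuses simplified 4 KiB page / 512-page PMD arithmetic. The point is only that huge= becomes a default for per-inode flags that any filesystem could also set through fadvise.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL per-inode hints; no such fadvise() advice exists today. */
#define MODEL_FADV_HUGE_ALWAYS      (1u << 0) /* PMD folio on any allocation */
#define MODEL_FADV_HUGE_WITHIN_SIZE (1u << 1) /* PMD folio only within i_size */

#define MODEL_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */
#define MODEL_PMD_PAGES 512ULL  /* assumption: 2 MiB PMD */

/* Subset of the tmpfs huge= mount values. */
enum model_mount_huge { MOUNT_HUGE_NEVER, MOUNT_HUGE_ALWAYS, MOUNT_HUGE_WITHIN_SIZE };

struct model_inode {
	uint32_t huge_flags; /* per-inode hint bits */
	uint64_t i_size;     /* file size in bytes */
};

/* tmpfs huge= re-expressed as the default per-inode flags at creation. */
void model_inode_init(struct model_inode *inode, enum model_mount_huge huge)
{
	inode->i_size = 0;
	switch (huge) {
	case MOUNT_HUGE_ALWAYS:
		inode->huge_flags = MODEL_FADV_HUGE_ALWAYS;
		break;
	case MOUNT_HUGE_WITHIN_SIZE:
		inode->huge_flags = MODEL_FADV_HUGE_WITHIN_SIZE;
		break;
	default:
		inode->huge_flags = 0; /* no hint: behave like any other fs */
		break;
	}
}

/* What a generic fadvise() handler might do on any filesystem. */
void model_fadvise_huge(struct model_inode *inode, uint32_t flags)
{
	inode->huge_flags = flags;
}

/* Should the allocation covering page 'index' be PMD-sized? */
bool model_want_pmd(const struct model_inode *inode, uint64_t index)
{
	if (inode->huge_flags & MODEL_FADV_HUGE_ALWAYS)
		return true;
	if (inode->huge_flags & MODEL_FADV_HUGE_WITHIN_SIZE) {
		/* Page index just past the PMD-aligned range holding 'index'. */
		uint64_t end = (index / MODEL_PMD_PAGES + 1) * MODEL_PMD_PAGES;
		uint64_t size_pages =
			(inode->i_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

		return size_pages >= end;
	}
	return false; /* fall back to the fs's usual (e.g. write-size) policy */
}
```

In this model, an inode on a mount without huge= gets no flags and follows the normal policy until an application opts in via the hypothetical fadvise hint, while a huge=within_size mount simply pre-sets MODEL_FADV_HUGE_WITHIN_SIZE on every new inode.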

I did a quick search, and I didn't find any other fs that requires PMD-sized huge pages, so I am not sure whether FADV_* would be useful for filesystems other than tmpfs. Please correct me if I missed something.

> If huge= is not set, tmpfs would behave the same way as the rest of
> filesystems.

So if 'huge=' is not set, can tmpfs write()/fallocate() still allocate large folios based on the write size? If so, that changes the default huge behavior for tmpfs, because previously leaving 'huge=' unset meant the huge option was 'SHMEM_HUGE_NEVER'. This is similar to what I mentioned before:
"Another possible choice is to make the huge pages allocation based on write size as the *default* behavior for tmpfs, ..."
