Re: [PATCH v3] mm/filemap: Allow arch to request folio size for exec memory
From: Ryan Roberts
Date: Thu Mar 27 2025 - 16:23:32 EST
+ Kalesh
On 27/03/2025 12:44, Matthew Wilcox wrote:
> On Thu, Mar 27, 2025 at 04:06:58PM +0000, Ryan Roberts wrote:
>> So let's special-case the read(ahead) logic for executable mappings. The
>> trade-off is performance improvement (due to more efficient storage of
>> the translations in iTLB) vs potential read amplification (due to
>> reading too much data around the fault which won't be used), and the
>> latter is independent of base page size. I've chosen 64K folio size for
>> arm64 which benefits both the 4K and 16K base page size configs and
>> shouldn't lead to any read amplification in practice since the old
>> read-around path was (usually) reading blocks of 128K. I don't
>> anticipate any write amplification because text is always RO.
>
> Is there not also the potential for wasted memory due to ELF alignment?
I think this is an orthogonal issue? My change isn't making that any worse.
> Kalesh talked about it in the MM BOF at the same time that Ted and I
> were discussing it in the FS BOF. Some coordination required (like
> maybe Kalesh could have mentioned it to me rathere than assuming I'd be
> there?)
I was at Kalesh's talk. David H suggested that a potential solution might be for
readahead to ask the fs where the next hole is and then truncate readahead to
avoid reading the hole. Given it's padding, nothing should directly fault it in
so it never ends up in the page cache. Not sure if you discussed anything like
that if you were talking in parallel?
Anyway, I'm not sure if you're suggesting these changes need to be considered as
one somehow or if you're just mentioning it given it is loosely related? My view
is that this change is an improvement indepently and could go in much sooner.
>
>> +#define arch_exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
>
> I don't think the "arch" really adds much value here.
I was following the pattern used by arch_wants_old_prefaulted_pte(),
arch_has_hw_pte_young(), etc. But I think you're right. I'll change as you suggest.
>
> #define exec_folio_order() get_order(SZ_64K)
ooh... get_order()... nice.
>
>> +#ifndef arch_exec_folio_order
>> +/*
>> + * Returns preferred minimum folio order for executable file-backed memory. Must
>> + * be in range [0, PMD_ORDER]. Negative value implies that the HW has no
>> + * preference and mm will not special-case executable memory in the pagecache.
>> + */
>> +static inline int arch_exec_folio_order(void)
>> +{
>> + return -1;
>> +}
>
> This feels a bit fragile. I often expect to be able to store an order
> in an unsigned int. Why not return 0 instead?
Well 0 is a valid order, no? I think we have had the "is order signed or
unsigned" argument before. get_order() returns a signed int :)
Personally I'd prefer to keep it signed and use a negative value as the
sentinel. I don't think 0 is the right choice because it's a valid order. How
about returning unsigned int and use UINT_MAX as the sentinel?
#define EXEC_FOLIO_ORDER_NONE UINT_MAX
static inline unsigned int arch_exec_folio_order(void)
{
return EXEC_FOLIO_ORDER_NONE;
}
Thanks,
Ryan