Re: [RFC PATCH] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP

From: David Hildenbrand (Arm)

Date: Tue Mar 03 2026 - 04:15:28 EST


On 3/3/26 08:00, hev wrote:
> On Tue, Mar 3, 2026 at 1:32 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>
>> On Tue, Mar 03, 2026 at 12:31:59PM +0800, hev wrote:
>>>
>>> This optimization is not entirely free. Increasing PT_LOAD alignment
>>> can waste virtual address space, which is especially significant on
>>> 32-bit systems, and it also reduces ASLR entropy by limiting the
>>> number of possible load addresses.
>>>
>>> In addition, coarser alignment may have secondary microarchitectural
>>> effects (e.g. on indirect branch prediction), depending on the
>>> workload. Because this change affects address space layout and
>>> security-related properties, providing users with a way to opt out is
>>> reasonable, rather than making it completely unconditional. This
>>> behavior fits naturally under READ_ONLY_THP_FOR_FS.
>>
>> This isn't reasonable at all. You're asking distro maintainers to make
>> a decision they have insufficient information to make. Almost none of
>> our users compile their own kernels, and frankly those that do don't have
>> enough information to make an informed decision about which way to choose.
>>
>> So if we're going to have a way to opt in/out, it needs to be something
>> different. Maybe a heuristic based on size of text segment? Maybe an
>> ELF flag? But then, if we're going to modify the binary, why not just
>> set p_align and then we don't need this patch at all?
>
> I agree that a compile-time config is not a good fit here, and I’m
> fine with dropping it in v2.
>
> Relying on ELF-side changes is problematic. Increasing p_align in the
> linker inflates file size due to extra padding, and more importantly
> it cannot help existing binaries. The loader is therefore the only
> place where this can be done without ABI changes or file size
> regressions.
>
> The logic here is deliberately strict rather than heuristic: the
> segment must be read-only, at least PMD_SIZE in length, and PMD_SIZE
> is capped at 32MB to avoid pathological address space waste. If these
> conditions are not met, the layout is unchanged.
>
> I don’t see a reliable way to make a smarter decision at load time
> without workload knowledge. With READ_ONLY_THP_FOR_FS already limiting
> the scope and the THP policy applied at runtime, this keeps the
> behavior constrained.
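
For readers following along, the eligibility rule described above (read-only
segment, at least PMD_SIZE long, PMD_SIZE no larger than 32MB) can be sketched
in userspace C roughly as below. This is only an illustration, not the code
from the patch: the struct, the sk_* names and the use of p_filesz as the
"length" are all assumptions made for the sketch, and the 2MB PMD_SIZE used
in the usage note is just the common x86-64 4K-page value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ELF values, redefined so the sketch is self-contained. */
#define SK_PT_LOAD 1u
#define SK_PF_X    1u
#define SK_PF_W    2u
#define SK_PF_R    4u

/* Cap from the discussion: never align to more than 32MB. */
#define SK_MAX_THP_ALIGN (32ull << 20)

/* Hypothetical, trimmed-down program header (not the kernel's struct). */
struct sk_phdr {
	uint32_t p_type;
	uint32_t p_flags;
	uint64_t p_filesz;	/* assumption: "length" means file size */
	uint64_t p_align;
};

/*
 * Return the alignment the loader would use for this segment.
 * If any condition fails, the original p_align is returned unchanged.
 */
static uint64_t sk_effective_align(const struct sk_phdr *ph, uint64_t pmd_size)
{
	bool readonly = (ph->p_flags & SK_PF_R) && !(ph->p_flags & SK_PF_W);

	if (ph->p_type != SK_PT_LOAD || !readonly)
		return ph->p_align;
	if (pmd_size > SK_MAX_THP_ALIGN)	/* e.g. very large PMDs */
		return ph->p_align;
	if (ph->p_filesz < pmd_size)		/* too small to hold a THP */
		return ph->p_align;

	/* Never lower an alignment the linker already asked for. */
	return ph->p_align > pmd_size ? ph->p_align : pmd_size;
}
```

With pmd_size = 2MB, a 4MB R+X segment with p_align = 4096 comes back as 2MB,
while a writable segment, a 1MB segment, or any segment on a configuration
where pmd_size exceeds 32MB keeps its original p_align.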

Note that READ_ONLY_THP_FOR_FS will likely go away soon.

--
Cheers,

David