Re: THP backed thread stacks

From: William Kucharski
Date: Sat Mar 11 2023 - 07:27:31 EST



> On Mar 10, 2023, at 04:25, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 10.03.23 02:40, William Kucharski wrote:
>>> On Mar 9, 2023, at 17:05, Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>>>
>>>> I think the hugepage alignment in their environment was somewhat luck.
>>>> One suggestion made was to change stack size to avoid alignment and
>>>> hugepage usage. That 'works' but seems kind of hackish.
>>>
>>> That was my first thought, if the alignment was purely due to luck,
>>> and not somebody manually specifying it. Agreed it's kind of hackish
>>> if anyone can get bit by this by sheer luck.
>> I don't agree it's "hackish" at all, but I go more into that below.
>>>
>>>> Also, David H pointed out the somewhat recent commit to align sufficiently
>>>> large mappings to THP boundaries. This is going to make all stacks huge
>>>> page aligned.
>>>
>>> I think that change was reverted by Linus in commit 0ba09b173387
>>> ("Revert "mm: align larger anonymous mappings on THP boundaries""),
>>> until it's perf regressions were better understood -- and I haven't
>>> seen a revamp of it.
>> It's too bad it was reverted, though I understand the concerns regarding it.
>> From my point of view, if an address is properly aligned and a caller is
>> asking for 2M+ to be mapped, it's going to be advantageous from a purely
>> system-focused point of view to do that mapping with a THP.
>
> Just noting that, if user space requests multiple smaller mappings, and the kernel decides to all place them in the same PMD, all VMAs might get merged and you end up with a properly aligned VMA where khugepaged would happily place a THP.
>
> That case is, of course, different to the "user space asks for 2M+" mapping case, but from khugepaged perspective they might look alike -- and it might be unclear if a THP is valuable or not (IOW maybe that THP could be better used somewhere else).

That's a really, really good point.

My general philosophy on the subject (if the address is aligned and the caller is asking for a THP-sized allocation, why not map it with a THP if you can) kind of falls apart when it's the system noticing it can coalesce a bunch of smaller allocations into one THP via khugepaged.

Arguably it's the difference between the caller knowing it's asking for something THP-sized on its own behalf and the system deciding to remap a bunch of disparate mappings using a THP because _it_ can.
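
To make the caller's side of that distinction concrete, here is a rough userspace sketch of a request that knowingly asks for something THP-sized. It's illustrative only: the 2M THP size is an assumption (PMD-sized THPs as on x86-64 with 4K base pages), and covers_aligned_thp() and map_thp_hint() are made-up names, not anything in the kernel or in this thread. mmap() and madvise(MADV_HUGEPAGE) are the real interfaces, and the madvise() is only a hint the kernel is free to ignore.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized THPs (x86-64 with 4 KiB base pages). */
#define THP_SIZE ((size_t)2 << 20)

/*
 * Nonzero if [addr, addr + len) contains at least one naturally
 * aligned THP-sized range, i.e. the "aligned and 2M+" case.
 */
static int covers_aligned_thp(uintptr_t addr, size_t len)
{
    /* Round addr up to the next THP boundary. */
    uintptr_t first = (addr + THP_SIZE - 1) & ~(uintptr_t)(THP_SIZE - 1);

    return first >= addr &&            /* round-up did not wrap */
           len >= THP_SIZE &&
           first - addr <= len - THP_SIZE;
}

/*
 * Hypothetical helper: map len bytes (a nonzero multiple of THP_SIZE)
 * at a THP-aligned address and explicitly ask for huge pages.
 * Returns NULL on failure.
 */
static void *map_thp_hint(size_t len)
{
    if (len == 0 || len % THP_SIZE != 0)
        return NULL;

    /* Over-allocate by one THP so an aligned window always fits. */
    size_t span = len + THP_SIZE;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + THP_SIZE - 1) &
                      ~(uintptr_t)(THP_SIZE - 1);
    size_t head = start - (uintptr_t)raw;
    size_t tail = span - head - len;

    /* Trim the unaligned slack on both sides of the window. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)start + len, tail);

#ifdef MADV_HUGEPAGE
    /* The explicit "I know this is THP-sized" request; only a hint. */
    (void)madvise((void *)start, len, MADV_HUGEPAGE);
#endif
    return (void *)start;
}
```

A caller would do something like map_thp_hint(4 * THP_SIZE) and treat NULL as failure. The khugepaged case has no equivalent call at all: the application just makes a series of smaller mmap()s, and if the resulting VMAs merge into a range for which covers_aligned_thp() would be true, the kernel may collapse it later on its own.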

If we were to, say, let a caller's request for a THP-sized allocation/mapping take priority over khugepaged's, it would not only be a major vector for abuse, it would also lead to completely indeterminate behavior ("When I start my browser after a reboot I get a bunch of THPs, but after the system's been up for a few weeks, I don't. How come?").

I don't have a good answer here.

-- Bill