Re: THP backed thread stacks

From: Peter Xu
Date: Mon Mar 06 2023 - 19:16:23 EST


On Mon, Mar 06, 2023 at 03:57:30PM -0800, Mike Kravetz wrote:
> One of our product teams recently experienced 'memory bloat' in their
> environment. The application in this environment is the JVM which
> creates hundreds of threads. Threads are ultimately created via
> pthread_create which also creates the thread stacks. pthread attributes
> are modified so that stacks are 2MB in size. It just so happens that
> due to allocation patterns, all their stacks are at 2MB boundaries. The
> system has THP always set, so a huge page is allocated at the first
> (write) fault when libpthread initializes the stack.
>
> It would seem that this is expected behavior. If you set THP always,
> you may get huge pages anywhere.
>
> However, I can't help but think that backing stacks with huge pages by
> default may not be the right thing to do. Stacks by their very nature
> grow in somewhat unpredictable ways over time. Using a large virtual
> space so that memory is allocated as needed is the desired behavior.
>
> The only way to address their 'memory bloat' via thread stacks today is
> by switching THP to madvise.
>
> Just wondering if there is anything better or more selective that can be
> done? Does it make sense to have THP backed stacks by default? If not,
> who would be best at disabling? A couple thoughts:
> - The kernel could disable huge pages on stacks. libpthread/glibc pass
> the unused flag MAP_STACK. We could key off this and disable huge pages.
> However, I'm sure there is somebody somewhere today that is getting better
> performance because they have huge pages backing their stacks.
> - We could push this to glibc/libpthreads and have them use
> MADV_NOHUGEPAGE on thread stacks. However, this also has the potential
> of regressing performance if somebody somewhere is getting better
> performance due to huge pages.

Yes, it always seems unsafe to me to change a default behavior.

For stacks, I really can't tell why they must be treated differently here.
I assume the problem is the wasted space, which gets amplified easily with
N threads. But IIUC that's the same as THP on normal memory: e.g., there
can be a per-thread mmap() of 2MB where each thread uses only 4K, and if
each such mmap() is populated by THP there'll be a huge waste as well.
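To put rough numbers on that per-thread waste, here is a minimal C sketch
under a simplified model (assumptions: each thread's stack sits in its own
2MB-aligned region, only the touched bytes fault in, and every fault
populates one whole page of the given size, 4K for a base page or 2MB for
a THP). stack_resident_bytes() is a hypothetical helper, not an existing
API.

```c
#include <assert.h>

/* Simplified model (assumption): resident stack memory for nthreads
 * threads that each touch `touched` bytes, when every fault populates
 * a whole page of page_size bytes (4K base page or 2MB THP). */
static unsigned long long stack_resident_bytes(unsigned long long nthreads,
                                               unsigned long long touched,
                                               unsigned long long page_size)
{
    /* page_size must be a nonzero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    /* Round the touched size up to whole pages of page_size. */
    unsigned long long pages = (touched + page_size - 1) / page_size;
    return nthreads * pages * page_size;
}
```

With a hypothetical 500 threads that each touch 4K,
stack_resident_bytes(500, 4096, 4096) is 2048000 bytes (about 2MB), while
stack_resident_bytes(500, 4096, 2ULL << 20) is 1048576000 bytes (500 x 2MB).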

> - Other thoughts?
>
> Perhaps this is just expected behavior of THP always which is unfortunate
> in this situation.

I think it's proper for the app to explicitly choose what it wants where
possible, and we do have the interfaces for that.
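As a rough illustration of the app choosing explicitly, a minimal C sketch
(assuming Linux and glibc) that mmap()s its own stack, applies
MADV_NOHUGEPAGE while the range is still untouched, and hands it over with
pthread_attr_setstack(). create_thread_nohuge_stack() and touch_stack() are
hypothetical names. With a caller-supplied stack, glibc ignores the guard
size (no guard page is added), and the caller owns the mapping, so it
should be munmap()ed only after pthread_join().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: create a thread on a caller-mmap()ed stack that is
 * marked MADV_NOHUGEPAGE before anything touches it. stack_size should be
 * a page multiple and >= PTHREAD_STACK_MIN. Returns 0 or an errno value. */
static int create_thread_nohuge_stack(pthread_t *tid, size_t stack_size,
                                      void *(*fn)(void *), void *arg,
                                      void **stack_out)
{
    pthread_attr_t attr;
    int err;

    void *stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return errno;

    /* Apply the advice before the first fault in the range. */
    if (madvise(stack, stack_size, MADV_NOHUGEPAGE) != 0) {
        err = errno;
        munmap(stack, stack_size);
        return err;
    }

    err = pthread_attr_init(&attr);
    if (err == 0) {
        err = pthread_attr_setstack(&attr, stack, stack_size);
        if (err == 0)
            err = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
    }
    if (err != 0) {
        munmap(stack, stack_size);
        return err;
    }
    *stack_out = stack; /* caller munmap()s this after pthread_join() */
    return 0;
}

/* Example thread body: touch some stack and report through arg. */
static void *touch_stack(void *arg)
{
    volatile char buf[4096];
    buf[0] = 1;
    *(int *)arg = buf[0];
    return NULL;
}
```

EINVAL from madvise() would typically mean the kernel was built without
THP support, in which case there is nothing to disable anyway.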

Then, would pthread_attr_getstack() plus MADV_NOHUGEPAGE, applied at the
JVM framework level, work?
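A minimal sketch of that idea in C, assuming Linux and glibc
(pthread_getattr_np() is a GNU extension): each new thread looks up its own
stack range and advises it as the first thing it does.
disable_thp_on_current_stack() and worker() are hypothetical names. Per the
report above, libpthread writes to the stack before the thread function
runs, so a huge page may already back part of it by then; this sketch makes
no attempt to undo that. It is meant for threads created with
pthread_create(); for the main thread the reported range may not be fully
mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: mark the calling thread's stack MADV_NOHUGEPAGE.
 * Returns 0 on success or a positive errno value. */
static int disable_thp_on_current_stack(void)
{
    pthread_attr_t attr;
    void *stack_lo;
    size_t stack_size;
    int err;

    /* Fetch the attributes (including stack) of the running thread. */
    err = pthread_getattr_np(pthread_self(), &attr);
    if (err != 0)
        return err;
    /* pthread_attr_getstack() reports the lowest address and the size. */
    err = pthread_attr_getstack(&attr, &stack_lo, &stack_size);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    /* madvise() needs a page-aligned start; round the range outward. */
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)stack_lo & ~(page - 1);
    uintptr_t end = ((uintptr_t)stack_lo + stack_size + page - 1) & ~(page - 1);
    assert((start & (page - 1)) == 0);

    if (madvise((void *)start, end - start, MADV_NOHUGEPAGE) != 0)
        return errno;
    return 0;
}

/* Example thread entry: apply the advice first, then do the real work.
 * arg points to an int that receives the helper's result. */
static void *worker(void *arg)
{
    *(int *)arg = disable_thp_on_current_stack();
    /* ... the thread's real work would go here ... */
    return NULL;
}
```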

Thanks,

--
Peter Xu