THP backed thread stacks

From: Mike Kravetz
Date: Mon Mar 06 2023 - 18:57:48 EST


One of our product teams recently experienced 'memory bloat' in their
environment. The application in this environment is the JVM, which
creates hundreds of threads. Threads are ultimately created via
pthread_create, which also creates the thread stacks. pthread attributes
are modified so that stacks are 2MB in size. It just so happens that,
due to allocation patterns, all their stacks are at 2MB boundaries. The
system has THP set to 'always', so a huge page is allocated at the first
(write) fault when libpthread initializes the stack.
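
For reference, the setup described above can be approximated with a
small sketch like the one below. It is only an illustration, not the
JVM's actual code: STACK_SIZE, touch_stack and run_thread_with_2mb_stack
are names made up here. Whether the resulting stack lands on a 2MB
boundary (and so gets a huge page) depends on the allocation pattern and
the THP setting.

```c
#include <pthread.h>
#include <stddef.h>

#define STACK_SIZE (2UL * 1024 * 1024)  /* 2MB, as in the report above */

/* Hypothetical thread body: writes to a local so the stack is used. */
static void *touch_stack(void *arg)
{
    volatile char probe[64];

    probe[0] = 1;               /* write to the thread's stack */
    *(int *)arg = probe[0];
    return NULL;
}

/* Create and join one thread whose stack size attribute is 2MB.
 * Returns 0 on success, or a pthread error number. */
int run_thread_with_2mb_stack(int *touched)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    err = pthread_attr_init(&attr);
    if (err)
        return err;

    err = pthread_attr_setstacksize(&attr, STACK_SIZE);
    if (!err)
        err = pthread_create(&tid, &attr, touch_stack, touched);
    pthread_attr_destroy(&attr);
    if (err)
        return err;

    return pthread_join(tid, NULL);
}
```

Calling run_thread_with_2mb_stack() a few hundred times in parallel and
watching AnonHugePages in /proc/<pid>/smaps would be one way to look for
the effect, assuming the stacks happen to be 2MB aligned.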

It would seem that this is expected behavior. If you set THP to
'always', you may get huge pages anywhere.

However, I can't help but think that backing stacks with huge pages by
default may not be the right thing to do. Stacks by their very nature
grow in somewhat unpredictable ways over time. Using a large virtual
space so that memory is allocated as needed is the desired behavior.

The only way to address their 'memory bloat' via thread stacks today is
by switching THP to madvise.
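
As a rough sketch of how a program could check which mode is in effect,
the active THP setting is the bracketed word in
/sys/kernel/mm/transparent_hugepage/enabled. The helper names
thp_parse_mode and thp_read_mode below are hypothetical, not an
existing API.

```c
#include <stdio.h>
#include <string.h>

#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/* Copy the bracketed (active) mode out of a line such as
 * "always [madvise] never". Returns 0 on success, -1 otherwise. */
int thp_parse_mode(const char *line, char *out, size_t outlen)
{
    const char *start = strchr(line, '[');
    const char *end = start ? strchr(start, ']') : NULL;
    size_t len;

    if (!end)
        return -1;
    start++;                            /* skip the '[' */
    len = (size_t)(end - start);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Read the active THP mode from sysfs. Returns -1 if the file is
 * missing (e.g. a kernel built without THP) or cannot be parsed. */
int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen(THP_ENABLED_PATH, "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_parse_mode(line, out, outlen);
}
```

Changing the mode is done by writing "madvise" to the same file as root,
which affects the whole system rather than just thread stacks.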

Just wondering if there is anything better or more selective that can be
done? Does it make sense to have THP backed stacks by default? If not,
who would be best placed to disable them? A couple of thoughts:
- The kernel could disable huge pages on stacks. libpthread/glibc pass
  the currently unused flag MAP_STACK. We could key off this and disable
  huge pages. However, I'm sure there is somebody somewhere today that
  is getting better performance because they have huge pages backing
  their stacks.
- We could push this to glibc/libpthread and have them use
  MADV_NOHUGEPAGE on thread stacks. However, this also has the potential
  of regressing performance if somebody somewhere is getting better
  performance due to huge pages.
- Other thoughts?
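
To make the MADV_NOHUGEPAGE idea concrete, here is a minimal userspace
sketch of what an application (or, hypothetically, glibc) could do for a
thread stack. This is not glibc's actual allocation code: the names
worker and run_thread_on_nohuge_stack are invented for illustration, and
the sketch omits the guard page a real implementation would want.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define STACK_SIZE (2UL * 1024 * 1024)  /* 2MB stack */

/* Hypothetical thread body: just records that it ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Run one thread on a caller-allocated stack that has been marked
 * MADV_NOHUGEPAGE. Returns 0 on success, or an error number. */
int run_thread_on_nohuge_stack(int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *stack;
    int err;

    stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return errno;

    /* Ask the kernel not to back this range with huge pages. EINVAL is
     * tolerated here, since a kernel built without THP support rejects
     * the advice. */
    if (madvise(stack, STACK_SIZE, MADV_NOHUGEPAGE) != 0 &&
        errno != EINVAL) {
        err = errno;
        goto out_unmap;
    }

    err = pthread_attr_init(&attr);
    if (err)
        goto out_unmap;
    err = pthread_attr_setstack(&attr, stack, STACK_SIZE);
    if (!err)
        err = pthread_create(&tid, &attr, worker, ran);
    pthread_attr_destroy(&attr);
    if (!err)
        err = pthread_join(tid, NULL);

out_unmap:
    /* Reached only if the thread was never created or has been joined. */
    munmap(stack, STACK_SIZE);
    return err;
}
```

The same madvise() call is what would move into the library under the
second option above. The difference is only who makes the call, and
therefore who gets to opt out.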

Perhaps this is just the expected behavior of THP 'always', which is
unfortunate in this situation.
--
Mike Kravetz