On Thu 16-03-17 15:26:54, Avi Kivity wrote:
> On 03/16/2017 02:34 PM, Michal Hocko wrote:
> > On Wed 15-03-17 18:50:32, Avi Kivity wrote:
> > > A user is trying to allocate 1TB of anonymous memory in parallel
> > > on 48 cores (4 NUMA nodes). The kernel ends up spinning in
> > > isolate_freepages_block().
> > Which kernel version is that?
> A good question; it was 3.10.something-el.something. The user mentioned
> above updated to 4.4, and the problem was gone, so it looks like it is
> a Red Hat specific problem. I would really like the 3.10.something
> kernel to handle this workload well, but I understand that's not this
> list's concern.
> > What is the THP defrag mode
> > (/sys/kernel/mm/transparent_hugepage/defrag)?
> The default (always).

The default has changed since then because the THP fault latencies were
just too large. Currently we only allow madvised VMAs to go stall and
even then we try hard to back off sooner rather than later. See
444eb2a449ef ("mm: thp: set THP defrag by default to madvise and add a
stall-free defrag option") merged in 4.4
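For reference, the active defrag mode can be read back from that sysfs file; the kernel marks the active value with brackets (e.g. "always defer [madvise] never" on 4.4+, "[always] madvise never" on the older kernels). A minimal sketch; the `active_mode`/`thp_defrag_mode` helpers are illustrative, not anything from this thread:

```c
/* Sketch: inspect the active THP defrag mode. The kernel marks the
 * active value with [brackets] in the sysfs file. Assumes a kernel
 * built with CONFIG_TRANSPARENT_HUGEPAGE. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Copy the [bracketed] token out of a sysfs mode line. */
static int active_mode(const char *line, char *out, size_t outsz)
{
    const char *l = strchr(line, '[');
    const char *r = l ? strchr(l, ']') : NULL;
    if (!l || !r || (size_t)(r - l - 1) >= outsz)
        return -1;
    memcpy(out, l + 1, (size_t)(r - l - 1));
    out[r - l - 1] = '\0';
    return 0;
}

/* Read the defrag file; returns 0 and fills mode on success. */
static int thp_defrag_mode(char *mode, size_t modesz)
{
    char line[256];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/defrag", "r");
    int rc = -1;
    if (f && fgets(line, sizeof(line), f))
        rc = active_mode(line, mode, modesz);
    if (f)
        fclose(f);
    return rc;
}
```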
> > > I thought to help it along by using MAP_POPULATE, but then my
> > > MADV_HUGEPAGE won't be seen until after mmap() completes, with
> > > pages already populated. Are MAP_POPULATE and MADV_HUGEPAGE
> > > mutually exclusive?
> > Why do you need MADV_HUGEPAGE?
> So that I get huge pages even if transparent_hugepage/enabled=madvise.
> I'm allocating almost all of the memory of that machine to be used as
> a giant cache, so I want it backed by hugepages.

Is there any strong reason to not use hugetlb then? You probably want
that memory reclaimable, right?
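To illustrate the hugetlb suggestion: a minimal sketch of mapping the cache from the hugetlb pool (sized via /proc/sys/vm/nr_hugepages), falling back to THP-eligible anonymous memory when the pool is empty. The `map_cache` helper and the fallback policy are assumptions about how an application might degrade gracefully, not anything proposed here:

```c
/* Sketch: prefer explicit hugetlb backing for a big cache. hugetlb
 * pages never go through compaction, but they are also neither
 * reclaimable nor swappable. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

static void *map_cache(size_t len)
{
    /* Try the hugetlb pool first: no compaction, no swap. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        return p;
    /* Pool empty (or no hugetlb support): fall back to plain
     * anonymous memory and ask for THP explicitly. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        madvise(p, len, MADV_HUGEPAGE);
    return p;
}
```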
> > > Is my only option to serialize those memory allocations, and fault
> > > in those pages manually? Or perhaps use mlock()?
> > I am still not 100% sure I see what you are trying to achieve,
> > though. So you do not want all those processes to contend inside the
> > compaction while still allocating as many huge pages as possible?
> Since the process starts with all of that memory free, there should
> not be any compaction going on (or perhaps very minimal
> eviction/movement of a few pages here and there). And since it's fixed
> in later kernels, it looks like the contention was not really mandated
> by the workload, just an artifact of the implementation.

It is possible. A lot has changed since 3.10 times.
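As a footnote on the "fault in those pages manually" idea: one way to get the MADV_HUGEPAGE hint in place before any page is populated (which MAP_POPULATE cannot guarantee, since the hint lands only after mmap() returns) is to map lazily, madvise, and then touch the range from several threads. The `populate` helper, the 2 MiB stride, and the per-thread split are illustrative assumptions, not a recommendation from this thread:

```c
/* Sketch: mmap without MAP_POPULATE, apply MADV_HUGEPAGE while the
 * range is still empty, then let one thread per chunk touch a byte
 * every 2 MiB so each first touch can fault in a huge page. */
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define HPAGE (2UL << 20)          /* assume 2 MiB THP size */

struct chunk { char *base; size_t len; };

static void *touch_chunk(void *arg)
{
    struct chunk *c = arg;
    for (size_t off = 0; off < c->len; off += HPAGE)
        c->base[off] = 0;          /* first touch faults the page */
    return NULL;
}

static char *populate(size_t len, unsigned nthreads)
{
    if (nthreads == 0 || nthreads > 64 || len % nthreads)
        return NULL;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Hint is in place *before* any fault, unlike MAP_POPULATE. */
    madvise(p, len, MADV_HUGEPAGE);

    pthread_t tid[64];
    struct chunk c[64];
    size_t per = len / nthreads;
    for (unsigned i = 0; i < nthreads; i++) {
        c[i].base = p + i * per;
        c[i].len = per;
        pthread_create(&tid[i], NULL, touch_chunk, &c[i]);
    }
    for (unsigned i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return p;
}
```

Whether the parallel touches still pile up in compaction depends on the kernel; on 4.4+ with defrag=madvise the madvised VMAs are the only ones allowed to stall.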