Re: [RFC PATCH 00/16] 1GB THP support on x86_64

From: William Kucharski
Date: Thu Sep 10 2020 - 06:06:32 EST

Next message: Dan Carpenter: "Re: [PATCH] hwmon: (pmbus) Expose PEC debugfs attribute"
Previous message: Wolfram Sang: "Re: [PATCH] i2c: stm32: do not display error when DMA is not requested"
In reply to: David Hildenbrand: "Re: [RFC PATCH 00/16] 1GB THP support on x86_64"
Next in thread: Zi Yan: "Re: [RFC PATCH 00/16] 1GB THP support on x86_64"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Sep 9, 2020, at 7:27 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 09.09.20 15:14, Jason Gunthorpe wrote:
>> On Wed, Sep 09, 2020 at 01:32:44PM +0100, Matthew Wilcox wrote:
>>
>>> But here's the thing ... we already allow
>>> mmap(MAP_POPULATE | MAP_HUGETLB | MAP_HUGE_1GB)
>>>
>>> So if we're not doing THP, what's the point of this thread?
>>
>> I wondered that too..
>>
>>> An madvise flag is a different beast; that's just letting the kernel
>>> know what the app thinks its behaviour will be. The kernel can pay
>>
>> But madvise is too late, the VMA already has an address, if it is not
>> 1G aligned it cannot be 1G THP already.
>
> That's why user space (like QEMU) is THP-aware and selects an address
> that is aligned to the expected THP granularity (e.g., 2MB on x86_64).

To me it's always seemed like there are two major divisions among THP use
cases:

1) Applications that KNOW they would benefit from use of THPs, so they
call madvise() with an appropriate parameter to tell the kernel so
explicitly.

2) Applications that know nothing about THP but that may still benefit
from "automatic" THP mapping when possible.
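
For illustration, the first case might look roughly like the sketch below
from user space. It assumes a 2MB THP size on x86_64 and anonymous memory.
alloc_thp_hinted() and THP_SIZE are names made up for this example, not
anything from the patch series, and the madvise() call is only a hint that
the kernel is free to ignore.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS and MADV_HUGEPAGE */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define THP_SIZE (2UL << 20)     /* assumption: 2MB PMD-sized THP on x86_64 */

/*
 * Hypothetical helper: map len bytes of anonymous memory starting on a
 * THP_SIZE boundary and hint that the range should use huge pages.
 * Returns NULL if the mapping fails.
 */
static void *alloc_thp_hinted(size_t len)
{
    assert(len > 0 && len % THP_SIZE == 0);

    /* Over-allocate by one THP so an aligned start always exists. */
    size_t span = len + THP_SIZE;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t base = ((uintptr_t)raw + THP_SIZE - 1) & ~(uintptr_t)(THP_SIZE - 1);
    char *aligned = (char *)base;

    /* Give back the unaligned head and the unused tail. */
    if (aligned > raw)
        munmap(raw, (size_t)(aligned - raw));
    size_t tail = (size_t)((raw + span) - (aligned + len));
    if (tail > 0)
        munmap(aligned + len, tail);

#ifdef MADV_HUGEPAGE
    /* Advisory only: a failure (e.g. THP not configured) is ignored. */
    (void)madvise(aligned, len, MADV_HUGEPAGE);
#endif
    return aligned;
}
```

A caller would do something like alloc_thp_hinted(2 * THP_SIZE) and later
munmap() the same range. Whether the range actually ends up backed by huge
pages still depends on the system's THP settings and on free memory.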

The second is the approach I am more familiar with, and it comes down to:

1) Is a VMA properly aligned for a (whatever size) THP?

2) Is the mapping request for a length >= (whatever size) THP?

3) Let's try allocating memory to map the space using (whatever size)
THP, and:

-- If we succeed, great, awesome, let's do it.
-- If not, no big deal, map using as large a page as we CAN get.
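
As a rough sketch of that decision (not kernel code), the three checks can
be written as a loop over candidate page sizes, largest first. Everything
here is hypothetical: struct alloc_state stands in for "did the allocation
succeed", and the sizes are the usual x86_64 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (4UL << 10)
#define SZ_2M (2UL << 20)
#define SZ_1G (1UL << 30)

/* Candidate page sizes, largest first (x86_64 values). */
static const size_t page_sizes[] = { SZ_1G, SZ_2M, SZ_4K };
#define NR_SIZES (sizeof(page_sizes) / sizeof(page_sizes[0]))

/*
 * Hypothetical stand-in for the page allocator: can_alloc[i] says whether
 * a page of page_sizes[i] could be allocated right now.
 */
struct alloc_state {
    bool can_alloc[NR_SIZES];
};

/*
 * Return the largest page size usable for a mapping at start of length
 * len, or 0 if no candidate size fits.
 */
static size_t choose_page_size(uintptr_t start, size_t len,
                               const struct alloc_state *st)
{
    assert(st != NULL);

    for (size_t i = 0; i < NR_SIZES; i++) {
        size_t sz = page_sizes[i];

        if (start % sz != 0)      /* 1) is the VMA aligned for this size? */
            continue;
        if (len < sz)             /* 2) is the request long enough? */
            continue;
        if (st->can_alloc[i])     /* 3) did the allocation succeed? */
            return sz;
        /* Otherwise fall through and try the next smaller size. */
    }
    return 0;
}
```

With a 1GB-aligned start and a 1GB length this returns SZ_1G when such a
page is available and SZ_2M when it is not, which is the "map using as
large a page as we CAN get" fallback.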

There are of course myriad performance implications to this. Processes
that start soon after boot have a better chance of getting a THP, which
also means that large, frequently mapped memory spaces, e.g. libc, X
servers or Firefox/Chrome, have a better chance of being mapped in a
shared manner via a THP. It also means that a process that was mapped
using THPs early in boot may not get them again if it crashes and has to
be restarted.

There are all sorts of tunables that would likely need to be in place to make
the second approach more viable, but I think it's certainly worth investigating.

The address selection you suggest is the basis of one of the patches I wrote
for a previous iteration of THP support (it is in Matthew's THP tree). That
patch tries to round VM addresses to the proper alignment where possible so
that a THP can then be used to map the area.
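
To make the rounding idea concrete, here is a minimal sketch. It is not
the patch itself: thp_round_hint() is a hypothetical name, and rounding up
(rather than down) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if the mapping is long enough to hold a THP, round
 * the candidate address up to the next thp_size boundary; otherwise leave
 * the address alone. thp_size must be a power of two.
 */
static uintptr_t thp_round_hint(uintptr_t addr, size_t len, size_t thp_size)
{
    assert(thp_size != 0 && (thp_size & (thp_size - 1)) == 0);

    if (len < thp_size)
        return addr;              /* too short for a THP, nothing to gain */

    uintptr_t mask = (uintptr_t)thp_size - 1;

    if (addr > UINTPTR_MAX - mask)
        return addr;              /* rounding up would overflow */

    return (addr + mask) & ~mask;
}
```

For example, a 4MB request at 0x201000 with a 2MB THP size would be moved
to 0x400000, while a 4KB request at the same address is left untouched.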
