Re: [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl

From: Mike Kravetz
Date: Wed May 04 2022 - 23:36:36 EST


On 5/4/22 19:35, Muchun Song wrote:
> On Wed, May 04, 2022 at 03:12:39PM -0700, Mike Kravetz wrote:
>> On 4/29/22 05:18, Muchun Song wrote:
>>> +static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to)
>>> +{
>>> + if (vmemmap_optimize_mode == to)
>>> + return;
>>> +
>>> + if (to == VMEMMAP_OPTIMIZE_OFF)
>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>> + else
>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>> + vmemmap_optimize_mode = to;
>>> +}
>>> +
>>> static int __init hugetlb_vmemmap_early_param(char *buf)
>>> {
>>> bool enable;
>>> + enum vmemmap_optimize_mode mode;
>>>
>>> if (kstrtobool(buf, &enable))
>>> return -EINVAL;
>>>
>>> - if (enable)
>>> - static_branch_enable(&hugetlb_optimize_vmemmap_key);
>>> - else
>>> - static_branch_disable(&hugetlb_optimize_vmemmap_key);
>>> + mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF;
>>> + vmemmap_optimize_mode_switch(mode);
>>>
>>> return 0;
>>> }
>>> @@ -60,6 +80,8 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>> vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
>>> vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>>>
>>> + VM_BUG_ON_PAGE(!vmemmap_pages, head);
>>> +
>>> /*
>>> * The pages which the vmemmap virtual address range [@vmemmap_addr,
>>> * @vmemmap_end) are mapped to are freed to the buddy allocator, and
>>> @@ -69,8 +91,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>> */
>>> ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>>> GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
>>> - if (!ret)
>>> + if (!ret) {
>>> ClearHPageVmemmapOptimized(head);
>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>> + }
>>>
>>> return ret;
>>> }
>>> @@ -84,6 +108,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>> if (!vmemmap_pages)
>>> return;
>>>
>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>
>> Can you explain the reasoning behind doing the static_branch_inc here in free,
>> and static_branch_dec in alloc?
>> IIUC, they may not be absolutely necessary but you could use the count to
>> know how many optimized pages are in use? Or, I may just be missing
>> something.
>>
>
> Partly right. One 'count' is not enough. I have implemented this with similar
> approach in v6 [1]. Except the 'count', we also need a lock to do synchronization.
> However, both count and synchronization are included in static_key_inc/dec
> infrastructure. It is simpler to use static_key_inc/dec directly, right?
>
> [1] https://lore.kernel.org/all/20220330153745.20465-5-songmuchun@xxxxxxxxxxxxx/
>

Sorry, but I am a little confused.

vmemmap_optimize_mode_switch will static_key_inc to enable and static_key_dec
to disable. In addition each time we optimize (allocate) a hugetlb page after
enabling we will static_key_inc.

Suppose we have 1 hugetlb page optimized. So static count == 2 IIUC.
The someone turns off optimization via sysctl. static count == 1 ???
If we then add another hugetlb page via nr_hugepages it seems that it
would be optimized as static count == 1. Is that correct? Do we need
to free all hugetlb pages with optimization before we can add new pages
without optimization?

--
Mike Kravetz