Re: [PATCH v2 2/2] mm/hugetlb: refactor subpage recording

From: Zi Yan
Date: Sat Feb 13 2021 - 10:48:37 EST


On 11 Feb 2021, at 18:44, Mike Kravetz wrote:

> On 2/11/21 12:47 PM, Zi Yan wrote:
>> On 28 Jan 2021, at 16:53, Mike Kravetz wrote:
>>
>>> On 1/28/21 10:26 AM, Joao Martins wrote:
>>>> For a given hugepage backing a VA, there's a rather ineficient
>>>> loop which is solely responsible for storing subpages in GUP
>>>> @pages/@vmas array. For each subpage we check whether it's within
>>>> range or size of @pages and keep increment @pfn_offset and a couple
>>>> other variables per subpage iteration.
>>>>
>>>> Simplify this logic and minimize the cost of each iteration to just
>>>> store the output page/vma. Instead of incrementing number of @refs
>>>> iteratively, we do it through pre-calculation of @refs and only
>>>> with a tight loop for storing pinned subpages/vmas.
>>>>
>>>> Additionally, retain existing behaviour with using mem_map_offset()
>>>> when recording the subpages for configurations that don't have a
>>>> contiguous mem_map.
>>>>
>>>> pinning consequently improves bringing us close to
>>>> {pin,get}_user_pages_fast:
>>>>
>>>> - 16G with 1G huge page size
>>>> gup_test -f /mnt/huge/file -m 16384 -r 30 -L -S -n 512 -w
>>>>
>>>> PIN_LONGTERM_BENCHMARK: ~12.8k us -> ~5.8k us
>>>> PIN_FAST_BENCHMARK: ~3.7k us
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>> ---
>>>> mm/hugetlb.c | 49 ++++++++++++++++++++++++++++---------------------
>>>> 1 file changed, 28 insertions(+), 21 deletions(-)
>>>
>>> Thanks for updating this.
>>>
>>> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>
>>> I think there still is an open general question about whether we can always
>>> assume page structs are contiguous for really big pages. That is outside
>>
>> I do not think page structs need to be contiguous, but PFNs within a big page
>> need to be contiguous, at least based on existing code like mem_map_offset() we have.
>
> Thanks for looking Zi,
> Yes, PFNs need to be contiguous. Also, as you say page structs do not need
> to be contiguous. The issue is that there is code that assumes page structs
> are contiguous for gigantic pages. hugetlb code does not make this assumption
> and does a pfn_to_page() when looping through page structs for gigantic pages.
>
> I do not believe this to be a huge issue. In most cases CONFIG_VIRTUAL_MEM_MAP
> is defined and struct pages can be accessed contiguously. I 'think' we could
> run into problems with CONFIG_SPARSEMEM and without CONFIG_VIRTUAL_MEM_MAP
> and doing hotplug operations. However, I still need to look into more.

Yeah, you are right about this. The combination of CONFIG_SPARSEMEM,
!CONFIG_SPARSEMEM_VMEMMAP and doing hotplug does cause errors, as simple as
dynamically reserving gigantic hugetlb pages then freeing them in a system
with CONFIG_SPARSEMEM_VMEMMAP not set and some hotplug memory.

Here are the steps to reproduce:
0. Configure a kernel with CONFIG_SPARSEMEM_VMEMMAP not set.
1. Create a VM using qemu with “-m size=8g,slots=16,maxmem=16g” to enable hotplug.
2. After boot the machine, add large enough memory using
“object_add memory-backend-ram,id=mem1,size=7g” and
“device_add pc-dimm,id=dimm1,memdev=mem1”.
3. In the guest OS, online all hot-plugged memory. My VM has 128MB memory block size.
If you have larger memory block size, I think you will need to plug in more memory.
4. Reserve gigantic hugetlb pages so that hot-plugged memory will be used. I reserved
12GB, like “echo 12 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages”.
5. Free all hugetlb gigantic pages,
“echo 0 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages”.
6. You will get “BUG: Bad page state in process …” errors.

The patch below can fix the error, but I suspect there might be other places missing
the necessary mem_map_next() too.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..aae99c6984f3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1319,7 +1319,8 @@ static void update_and_free_page(struct hstate *h, struct page *page)
h->nr_huge_pages--;
h->nr_huge_pages_node[page_to_nid(page)]--;
for (i = 0; i < pages_per_huge_page(h); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+ struct page *subpage = mem_map_next(subpage, page, i);
+ subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced | 1 << PG_dirty |
1 << PG_active | 1 << PG_private |
1 << PG_writeback);



Best Regards,
Yan Zi

Attachment: signature.asc
Description: OpenPGP digital signature