Re: [PATCH v3 3/5] mm, thp: introduce FOLL_SPLIT_PMD
From: Song Liu
Date: Thu Jun 13 2019 - 12:53:14 EST
Hi Kirill,
> On Jun 13, 2019, at 8:24 AM, Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
>> On Jun 13, 2019, at 8:14 AM, Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, Jun 13, 2019 at 03:03:01PM +0000, Song Liu wrote:
>>>
>>>
>>>> On Jun 13, 2019, at 7:16 AM, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
>>>>
>>>> On Thu, Jun 13, 2019 at 01:57:30PM +0000, Song Liu wrote:
>>>>>> And I'm not convinced that it belongs here at all. User requested PMD
>>>>>> split and it is done after split_huge_pmd(). The rest can be handled by
>>>>>> the caller as needed.
>>>>>
>>>>> I put this part here because split_huge_pmd() for file-backed THP is
>>>>> not really done after split_huge_pmd(). And I would like it done before
>>>>> calling follow_page_pte() below. Maybe we can still do them here, just
>>>>> for file-backed THPs?
>>>>>
>>>>> If we would move it, shall we move to callers of follow_page_mask()?
>>>>> In that case, we will probably end up with similar code in two places:
>>>>> __get_user_pages() and follow_page().
>>>>>
>>>>> Did I get this right?
>>>>
>>>> Would it be enough to replace pte_offset_map_lock() in follow_page_pte()
>>>> with pte_alloc_map_lock()?
>>>
>>> This is similar to my previous version:
>>>
>>> + } else { /* flags & FOLL_SPLIT_PMD */
>>> + pte_t *pte;
>>> + spin_unlock(ptl);
>>> + split_huge_pmd(vma, pmd, address);
>>> + pte = get_locked_pte(mm, address, &ptl);
>>> + if (!pte)
>>> + return no_page_table(vma, flags);
>>> + spin_unlock(ptl);
>>> + ret = 0;
>>> + }
>>>
>>> I think this is cleaner than use pte_alloc_map_lock() in follow_page_pte().
>>> What's your thought on these two versions (^^^ vs. pte_alloc_map_lock)?
>>
>> It's additional lock-unlock cycle and few more lines of code...
>>
>>>> This will leave bunch not populated PTE entries, but it is fine: they will
>>>> be populated on the next access to them.
>>>
>>> We need to handle page fault during next access, right? Since we already
>>> allocated everything, we can just populate the PTE entries and saves a
>>> lot of page faults (assuming we will access them later).
>>
>> Not a lot due to faultaround and they may never happen, but you need to
>> tear down the mapping any way.
>
> I see. Let me try this way.
>
> Thanks,
> Song
To make sure I understand your suggestions. Here is what I got:
diff --git c/mm/gup.c w/mm/gup.c
index ddde097cf9e4..85e6f46fd925 100644
--- c/mm/gup.c
+++ w/mm/gup.c
@@ -197,7 +197,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
- ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+ ptep = pte_alloc_map_lock(mm, pmd, address, &ptl);
+ if (!ptep)
+ return ERR_PTR(-ENOMEM);
+
pte = *ptep;
if (!pte_present(pte)) {
swp_entry_t entry;
@@ -398,7 +401,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
spin_unlock(ptl);
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
}
- if (flags & FOLL_SPLIT) {
+ if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
int ret;
page = pmd_page(*pmd);
if (is_huge_zero_page(page)) {
@@ -407,7 +410,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
split_huge_pmd(vma, pmd, address);
if (pmd_trans_unstable(pmd))
ret = -EBUSY;
- } else {
+ } else if (flags & FOLL_SPLIT) {
if (unlikely(!try_get_page(page))) {
spin_unlock(ptl);
return ERR_PTR(-ENOMEM);
@@ -419,6 +422,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
put_page(page);
if (pmd_none(*pmd))
return no_page_table(vma, flags);
+ } else { /* flags & FOLL_SPLIT_PMD */
+ spin_unlock(ptl);
+ split_huge_pmd(vma, pmd, address);
+ ret = 0;
}
return ret ? ERR_PTR(ret) :
follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
This version doesn't work as-is, because it returns at the first check:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
Did I misunderstand anything here?
Thanks,
Song
>
>>
>> --
>> Kirill A. Shutemov