Re: [PATCH] mm: fix account pmd page to the process

From: Mike Kravetz
Date: Thu Jun 16 2016 - 12:48:13 EST


On 06/16/2016 09:31 AM, Michal Hocko wrote:
> On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
>> On 06/16/2016 08:43 AM, Michal Hocko wrote:
>>> [It seems that this patch has been sent several times, and this
>>> particular copy didn't CC Kirill, who added this code. CCing him now.]
>>>
>>> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>>>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>>>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>>>>
>>>>> When a process acquires a pmd table shared by another process, we
>>>>> increase the pmd count of the current process. Otherwise, we lost a
>>>>> race and another task has already set the pud entry, so there is no
>>>>> need to increase it.
>>>>>
>>>>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
>>>>> ---
>>>>> mm/hugetlb.c | 5 ++---
>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 19d0d08..3b025c5 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>>>  	if (pud_none(*pud)) {
>>>>>  		pud_populate(mm, pud,
>>>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>>>> -	} else {
>>>>> +	} else
>>>>>  		put_page(virt_to_page(spte));
>>>>> -		mm_inc_nr_pmds(mm);
>>>>> -	}
>>>>
>>>> The code is quite puzzling but is this correct? Shouldn't we rather do
>>>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
>>
>> I agree that the code is quite puzzling. :(
>>
>> However, if this were an issue I would have expected to see some reports.
>> Oracle DB makes use of this feature (shared page tables) and if the pmd
>> count is wrong we would catch it in check_mm() at exit time.
>>
>> Upon closer examination, I believe the code in question is never executed.
>> Note the callers of huge_pmd_share. The calling code looks like:
>>
>> 	if (want_pmd_share() && pud_none(*pud))
>> 		pte = huge_pmd_share(mm, addr, pud);
>> 	else
>> 		pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>
>> Therefore, we do not call huge_pmd_share unless pud_none(*pud). The
>> code in question is only executed when !pud_none(*pud).
>
> My understanding is that the check is needed after we take the page
> table lock, because we might have raced with another thread. But it's
> been quite some time since I've looked at the hugetlb locking and page
> table sharing code.

That is correct, we could have raced. Duh!

In the case of a race, the other thread would have incremented the
pmd count already. Your suggestion of decrementing the pmd count in
this case seems to be the correct approach. But I need to think
about this some more.
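
To make the accounting concrete, here is a small userspace toy model of
the lost-race path. It is not kernel code: struct toy_mm, struct
toy_page, toy_share() and toy_lost_race() are hypothetical names made up
for illustration, and the model assumes (as discussed above) that the
count is incremented once before the pud is rechecked and that the
winning thread has already accounted its own pmd page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_mm {
	long nr_pmds;	/* models the per-mm pmd page count */
	void *pud;	/* models the pud entry; NULL plays pud_none() */
};

struct toy_page {
	int refcount;	/* models the refcount of a pmd page */
};

/*
 * Models the tail of huge_pmd_share(): a shareable pmd page was found,
 * so the count and the refcount are bumped first, and the pud is
 * rechecked afterwards (under the page table lock in the real code).
 */
static void toy_share(struct toy_mm *mm, struct toy_page *spte,
		      int undo_on_race)
{
	mm->nr_pmds++;			/* the "previous inc" */
	spte->refcount++;

	if (mm->pud == NULL) {
		mm->pud = spte;		/* we won: install the shared page */
	} else {
		assert(spte->refcount > 1);
		spte->refcount--;	/* we lost: drop the extra reference */
		if (undo_on_race)
			mm->nr_pmds--;	/* suggested fix: undo the inc */
		else
			mm->nr_pmds++;	/* current code: inc a second time */
	}
}

/* Simulates a lost race and returns the resulting nr_pmds. */
static long toy_lost_race(int undo_on_race)
{
	struct toy_page winner_pmd = { .refcount = 1 };
	struct toy_page shared = { .refcount = 1 };
	struct toy_mm mm = { .nr_pmds = 0, .pud = NULL };

	/* Another thread of the same mm populates the pud first ... */
	mm.pud = &winner_pmd;
	mm.nr_pmds++;		/* ... and accounts its own pmd page. */

	toy_share(&mm, &shared, undo_on_race);
	return mm.nr_pmds;
}
```

In this model only one pmd page ends up installed in the pud, so the
count should be 1. toy_lost_race(0), which mimics the existing else
branch, yields 3, while toy_lost_race(1), which decrements instead,
yields 1.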

--
Mike Kravetz