Re: [PATCH] mm/huge_memory: fix the memory leak due to the race

From: zhong jiang
Date: Wed Jun 22 2016 - 06:02:00 EST


On 2016/6/21 23:29, Kirill A. Shutemov wrote:
> On Tue, Jun 21, 2016 at 11:19:07PM +0800, zhong jiang wrote:
>> On 2016/6/21 22:37, Kirill A. Shutemov wrote:
>>> On Tue, Jun 21, 2016 at 10:05:56PM +0800, zhongjiang wrote:
>>>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>>>
>>>> Under heavy pressure, I ran some test cases and found that a THP
>>>> is not freed. This is detected by check_mm().
>>>>
>>>> BUG: Bad rss-counter state mm:ffff8827edb70000 idx:1 val:512
>>>>
>>>> Consider the following race:
>>>>
>>>>     CPU0                                  CPU1
>>>> __handle_mm_fault()
>>>>   wp_huge_pmd()
>>>>     do_huge_pmd_wp_page()
>>>>       pmdp_huge_clear_flush_notify()
>>>>       (pmd_none = true)
>>>>                                           exit_mmap()
>>>>                                             unmap_vmas()
>>>>                                               zap_pmd_range()
>>>>                                                 pmd_none_or_trans_huge_or_clear_bad()
>>>>                                                 (result in memory leak)
>>>>       set_pmd_at()
>>>>
>>>> CPU0 has already allocated the huge page before calling
>>>> pmdp_huge_clear_flush_notify(), which leaves the pmd entry empty.
>>>> CPU1 then sees pmd_none and skips the range, so the page that CPU0
>>>> installs with set_pmd_at() is never freed and the memory leaks.
>>>>
>>>> This patch fixes the window in which the pmd entry can be empty.
>>> I don't think the scenario is possible.
>>>
>>> exit_mmap() is called when all mm users have gone, so no parallel
>>> threads exist.
>>>
>> Forget this patch. It's my fault; the race indeed does not exist.
>> But I did hit the following problem: we can see the memory leak when the process exits.
>>
>>
>> Any suggestions will be appreciated.
> Could you try this:
>
> http://lkml.kernel.org/r/20160621150433.GA7536@xxxxxxxxxxxxxxxxxx
I have seen that patch, but I do not think it can fix this problem. If that race occurs, the pmd entry that points to
the huge page will be changed, and the pmd split done from freeze_page() will fail. The subsequent VM_BUG_ON() will then fire.

freeze_page()
    try_to_unmap()
        split_huge_pmd_address()    (fails, so page_mapcount() is not zero)
    VM_BUG_ON()
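
To make the concern concrete, here is a minimal user-space sketch of the
sequence above. It is not kernel code: the toy_* functions and the two
structs are hypothetical stand-ins, and it only models my claim that a split
which no longer finds the expected pmd entry leaves the map count nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a huge page; mapcount models page_mapcount(). */
struct toy_page {
    int mapcount;
};

/* Hypothetical stand-in for a pmd entry: the page it maps, or NULL. */
struct toy_pmd {
    struct toy_page *page;
};

/* The split only succeeds if the pmd still maps the page being frozen. */
static bool toy_split_huge_pmd_address(struct toy_pmd *pmd,
                                       struct toy_page *page)
{
    if (pmd->page != page)
        return false;       /* pmd was changed under us: nothing is split */
    pmd->page = NULL;
    page->mapcount--;
    return true;
}

/* Returns 0 if the page ends up unmapped, nonzero if a mapping remains. */
static int toy_try_to_unmap(struct toy_pmd *pmd, struct toy_page *page)
{
    toy_split_huge_pmd_address(pmd, page);
    return page->mapcount != 0;
}

/* Returns the value that the final VM_BUG_ON()-style check would test. */
static int toy_freeze_page(struct toy_pmd *pmd, struct toy_page *page)
{
    assert(pmd != NULL && page != NULL);
    return toy_try_to_unmap(pmd, page);
}
```

With a pmd that still maps the page, toy_freeze_page() returns 0. If the pmd
was repointed to a different page first, it returns nonzero, which is the
case where I expect the final check to trigger.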