Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

From: zhong jiang
Date: Fri Mar 01 2019 - 02:46:42 EST

On 2019/3/1 15:29, Naoya Horiguchi wrote:
> On Tue, Feb 26, 2019 at 10:34:32PM +0800, zhong jiang wrote:
>> On 2019/2/26 21:51, Kirill A. Shutemov wrote:
>>> On Tue, Feb 26, 2019 at 07:18:00PM +0800, zhong jiang wrote:
>>>> From: zhongjiang <zhongjiang@xxxxxxxxxx>
>>>> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
>>> s/plit/split/
>>>> we trigger the following VM_BUG_ON_PAGE():
>>>> Memory failure: 0x3755ff: non anonymous thp
>>>> __get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
>>>> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
>>>> page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
>>>> flags: 0x2fffff80000000()
>>>> raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
>>>> raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
>>>> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
>>>> ------------[ cut here ]------------
>>>> kernel BUG at ./include/linux/mm.h:519!
>>>> soft_offline_in_use_page() passed refcount and page lock from tail page to
>>>> head page, which is not needed because we can pass any subpage to
>>>> split_huge_page().
>>> I don't see a description of what is going wrong and why change will fixed
>>> it. From the description, it appears as it's cosmetic-only change.
>>> Please elaborate.
>> When soft_offline_in_use_page runs on a thp tail page after pmd is split,
>> and we pass the head page to split_huge_page, Unfortunately, the tail page
>> can be free or count turn into zero.
> I guess that you have the similar fix on memory_failure() in your mind:
> commit c3901e722b2975666f42748340df798114742d6d
> Author: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Date: Thu Nov 10 10:46:23 2016 -0800
> mm: hwpoison: fix thp split handling in memory_failure()
> So it seems that I somehow missed fixing soft offline when I wrote commit
> c3901e722b29, and now you find and fix that. Thank you very much.
> If you resend the patch with fixing typo, can you add some reference to
> c3901e722b29 in the patch description to show the linkage?
> And you can add the following tags:
Yep, I find that that is a similar issue. hence I refer to that description in the patch you
had mentioned.

I will add the above desprition you had mentioned in V2.

zhong jiang
> Fixes: 61f5d698cc97 ("mm: re-enable THP")
> Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Thanks,
> Naoya Horiguchi
> .