Re: [Patch v2 1/4] mm: enable dump several reasons for __dump_page()

From: Wei Yang
Date: Sat Jan 25 2020 - 21:52:36 EST


On Tue, Jan 21, 2020 at 10:50:41AM +0530, Anshuman Khandual wrote:
>
>
>On 01/20/2020 02:25 PM, Wei Yang wrote:
>> On Mon, Jan 20, 2020 at 11:42:30AM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 01/20/2020 08:34 AM, Wei Yang wrote:
>>>> This is a preparation to dump all reasons during check page.
>>>
>>> This really makes sense rather then just picking the reason from
>>> the last "if" statement.
>>>
>>>>
>>>> Signed-off-by: Wei Yang <richardw.yang@xxxxxxxxxxxxxxx>
>>>> ---
>>>> include/linux/mmdebug.h | 2 +-
>>>> mm/debug.c | 11 ++++++-----
>>>> mm/page_alloc.c | 2 +-
>>>> 3 files changed, 8 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
>>>> index 2ad72d2c8cc5..f0a612db8bae 100644
>>>> --- a/include/linux/mmdebug.h
>>>> +++ b/include/linux/mmdebug.h
>>>> @@ -10,7 +10,7 @@ struct vm_area_struct;
>>>> struct mm_struct;
>>>>
>>>> extern void dump_page(struct page *page, const char *reason);
>>>> -extern void __dump_page(struct page *page, const char *reason);
>>>> +extern void __dump_page(struct page *page, int num, const char **reason);
>>>> void dump_vma(const struct vm_area_struct *vma);
>>>> void dump_mm(const struct mm_struct *mm);
>>>>
>>>> diff --git a/mm/debug.c b/mm/debug.c
>>>> index 0461df1207cb..a8ac6f951f9f 100644
>>>> --- a/mm/debug.c
>>>> +++ b/mm/debug.c
>>>> @@ -42,11 +42,11 @@ const struct trace_print_flags vmaflag_names[] = {
>>>> {0, NULL}
>>>> };
>>>>
>>>> -void __dump_page(struct page *page, const char *reason)
>>>> +void __dump_page(struct page *page, int num, const char **reason)
>>>> {
>>>> struct address_space *mapping;
>>>> bool page_poisoned = PagePoisoned(page);
>>>> - int mapcount;
>>>> + int mapcount, i;
>>>>
>>>> /*
>>>> * If struct page is poisoned don't access Page*() functions as that
>>>> @@ -97,8 +97,9 @@ void __dump_page(struct page *page, const char *reason)
>>>> sizeof(unsigned long), page,
>>>> sizeof(struct page), false);
>>>>
>>>> - if (reason)
>>>> - pr_warn("page dumped because: %s\n", reason);
>>>> + pr_warn("page dumped because:\n");
>>>> + for (i = 0; i < num; i++)
>>>> + pr_warn("\t%s\n", reason[i]);
>>>
>>> We should have a NR_BAD_PAGE_REASONS or something to cap this iteration
>>> and also check reason[i] for non-NULL before trying to print the array.
>>> There might be call sites like the following which will be problematic
>>> otherwise.
>>>
>>> split_huge_page_to_list() -> dump_page(head, NULL)
>>>
>>
>> You are right, I missed this case.
>>
>>>>
>>>> #ifdef CONFIG_MEMCG
>>>> if (!page_poisoned && page->mem_cgroup)
>>>
>>> While here, will it be better to move the above debug print block after
>>> mem_cgroup block instead ?
>>>
>>
>> Not sure, let's see whether others have some idea.
>>
>>>> @@ -108,7 +109,7 @@ void __dump_page(struct page *page, const char *reason)
>>>>
>>>> void dump_page(struct page *page, const char *reason)
>>>> {
>>>> - __dump_page(page, reason);
>>>> + __dump_page(page, 1, &reason);
>>>> dump_page_owner(page);
>>>> }
>>>> EXPORT_SYMBOL(dump_page);
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index d047bf7d8fd4..0cf6218aaba7 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -638,7 +638,7 @@ static void bad_page(struct page *page, const char *reason,
>>>>
>>>> pr_alert("BUG: Bad page state in process %s pfn:%05lx\n",
>>>> current->comm, page_to_pfn(page));
>>>> - __dump_page(page, reason);
>>>> + __dump_page(page, 1, &reason);
>>>> bad_flags &= page->flags;
>>>> if (bad_flags)
>>>> pr_alert("bad because of flags: %#lx(%pGp)\n",
>>>>
>>>
>>> Do we still need to have bad_flags ? After consolidating all reasons making
>>> a page bad should not we just print page->flags unconditionally each time and
>>> let the user decipher it instead. __dump_page() will print page->flags for
>>> each case (atleast after the new patch from Vlastimil). AFAICS, the only
>>> place currently consuming bad_flags is bad_page() which seems redundant after
>>> first calling __dump_page().
>>
>> Hmm... I don't catch this. The work in __dump_page() seems a little different
>> from this one. Not sure we could remove it.
>
>Lets look at 'bad_flags' as it exists today without this series.
>
>It gets evaluated in free_pages_check_bad() and check_new_page_bad() before
>being passed into bad_page(). All other call sites for bad_page() just pass
>0 for 'bad_flags'. Now in bad_page(), we have
>
> __dump_page(page, reason);
> bad_flags &= page->flags;
> if (bad_flags)
> pr_alert("bad because of flags: %#lx(%pGp)\n",
> bad_flags, &bad_flags);
>
>Here, bad_flags &= page->flags will always be positive when 'reason'
>is either
>
>"PAGE_FLAGS_CHECK_AT_FREE flag(s) set"
>
>or
>
>"PAGE_FLAGS_CHECK_AT_PREP flag set"
>
>The point here is we dont need to print bad_flags here as __dump_page()
>already prints page->flags universally along with the "bad_reason"
>after the following change.
>
>[1] https://patchwork.kernel.org/patch/11332035/

Hi, Anshuman

I am preparing a patch to remove the bad_flags. While since the above change
is not merged upstream yet, how can I wording the change log to point this
change?

Or I should wait till this one is merged?

--
Wei Yang
Help you, Help me