Re: [PATCH v2] mm: page_alloc: dump migrate-failed pages

From: Michal Hocko
Date: Tue Mar 09 2021 - 04:33:36 EST


On Mon 08-03-21 12:20:47, Minchan Kim wrote:
> alloc_contig_range is usually used on cma area or movable zone.
> It's critical if the page migration fails on those areas so
> dump more debugging message.

I disagree with this statement. alloc_contig_range is not a reliable
allocator. Any user, be it CMA or a direct caller of alloc_contig_range,
has to deal with allocation failures. Debugging information can still
be useful, but considering migration failures critical is an
overstatement, to say the least.

> page refcount, mapcount with page flags on dump_page are
> helpful information to deduce the culprit. Furthermore,
> dump_page_owner was super helpful to find long term pinner
> who initiated the page allocation.
>
> Admin could enable the dump like this(by default, disabled)
>
> echo "func dump_migrate_failure_pages +p" > control
>
> Admin could disable it.
>
> echo "func dump_migrate_failure_pages =_" > control

My original idea was to add a few pr_debug calls and -DDYNAMIC_DEBUG_MODULE
for page_alloc.c. It makes sense to enable a whole bunch at once though.
The naming should better reflect that this is alloc_contig_range related,
because the above sounds like a generic migration failure thing.

Somebody more familiar with the dynamic debugging infrastructure needs
to have a look, but from a quick look it seems ok.

Do we really need all the ugly ifdefery, though? Don't we want to have
this compiled in all the time and just rely on the static branch managed
by the dynamic debugging framework?
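
To illustrate the shape I have in mind, here is a rough user-space sketch,
not kernel code. The struct debug_site, its enabled flag and the function
name are made up for illustration; the flag only stands in for the static
branch that dynamic debug would manage. The point is that the helper is
always compiled in and costs one branch when disabled:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a dynamic debug descriptor and its static key. */
struct debug_site {
	const char *name;
	bool enabled;	/* would be toggled at runtime via the control file */
};

static struct debug_site contig_dump_site = {
	.name = "alloc_contig_dump_pages",
	.enabled = false,	/* disabled by default */
};

/*
 * Always compiled in, no #ifdef. Returns the number of pages dumped so the
 * behaviour is easy to check: 0 when the site is disabled.
 */
static int alloc_contig_dump_pages(const unsigned long *pfns, int nr)
{
	int i;

	if (!contig_dump_site.enabled)
		return 0;

	for (i = 0; i < nr; i++)
		fprintf(stderr, "%s: pfn:%#lx migration failure\n",
			contig_dump_site.name, pfns[i]);
	return nr;
}
```

Callers would invoke the helper unconditionally on the failure path and
leave the enable/disable decision entirely to the runtime switch.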

[...]
> +void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> + DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> + "migrate failure");
> + if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> + alloc_contig_ratelimit()) {
> + struct page *page;
> +
> + WARN(1, "failed callstack");
> + list_for_each_entry(page, page_list, lru)
> + dump_page(page, "migration failure");
> + }

Apart from the above, do we have to warn for something that is a
debugging aid? A similar concern applies to dump_page, which uses
pr_warn, and to page owner, which even uses pr_alert.
Would it make sense to add a loglevel parameter to both __dump_page
and dump_page_owner?
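
Something along these lines, as a rough user-space sketch only. The
struct fake_page, the LVL_* strings and format_page_dump are invented for
illustration and are not the real struct page, the KERN_* prefixes or any
existing kernel interface. The idea is simply that the caller passes the
level instead of the dump helper hardcoding it:

```c
#include <stdio.h>

/* Illustrative stand-ins for loglevel prefixes; not the kernel's KERN_* values. */
#define LVL_ALERT	"<1>"
#define LVL_WARNING	"<4>"
#define LVL_DEBUG	"<7>"

/* Hypothetical, heavily reduced page description. */
struct fake_page {
	unsigned long pfn;
	int refcount;
	int mapcount;
};

/*
 * Format one page dump line at the level chosen by the caller.
 * Returns what snprintf returns (length the full line needs).
 */
static int format_page_dump(char *buf, size_t len, const char *loglvl,
			    const struct fake_page *page, const char *reason)
{
	return snprintf(buf, len,
			"%spage dumped because: %s pfn:%#lx refcount:%d mapcount:%d",
			loglvl, reason, page->pfn, page->refcount,
			page->mapcount);
}
```

A debugging-only caller such as the contig range failure path could then
pass the debug level, while existing callers keep the warning level.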
--
Michal Hocko
SUSE Labs