Re: [PATCH] mm: be more verbose for alloc_contig_range failures

From: David Hildenbrand
Date: Fri Feb 19 2021 - 05:39:05 EST


On 19.02.21 11:02, Michal Hocko wrote:
On Fri 19-02-21 10:30:12, David Hildenbrand wrote:
On 19.02.21 10:28, Michal Hocko wrote:
On Thu 18-02-21 08:19:50, Minchan Kim wrote:
On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
On 18.02.21 10:35, Michal Hocko wrote:
On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
On 18.02.21 09:56, Michal Hocko wrote:
On Wed 17-02-21 08:36:03, Minchan Kim wrote:
alloc_contig_range is usually used on a CMA area or the movable zone.
It's critical if page migration fails in those areas, so
dump more debugging messages, like memory_hotplug does, unless the user
specifies __GFP_NOWARN.

I agree with David that this has a potential to generate a lot of output
and it is not really clear whether it is worth it. Page isolation code
already has REPORT_FAILURE mode which is currently used only for the memory
hotplug because this was just too noisy from the CMA path - d381c54760dc
("mm: only report isolation failures when offlining memory").

Maybe migration failures are less likely, but still.

Side note: I really dislike the uncontrolled error reporting on the memory
offlining path that we have enabled by default. Yeah, it might be useful for
ZONE_MOVABLE in some cases, but otherwise it's just noise.

Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded.

Anyway we can discuss this in a separate thread but I think this is not
a representative workload.

Sure, but the essence is "this is noise", and we'll have more noise on
alloc_contig_range() as we see these calls more frequently. There should be
an explicit way to enable such *debug* messages.

alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
Why shouldn't people use it if they don't care about the failure?
Semantically, it makes sense to me.

Well, alloc_contig_range doesn't really have to implement all the gfp
flags. This is a matter of practicality. alloc_contig_range is quite
different from the page allocator because it is to be expected that it
can fail the request. This is a very optimistic allocation request. That
would suggest that complaining about allocation failures is rather
noisy.

Now I do understand that some users would like to see why those
allocations have failed. The question is whether that information is
generally useful or it is more of a debugging aid. The amount of
information is also an important aspect. It would be rather unfortunate
to dump thousands of pages just because they cannot be migrated.

I do not have a strong opinion here. We can make all alloc_contig_range
users use __GFP_NOWARN by default and drop the flag only in the CMA
allocator, but I am slowly leaning towards (ab)using the dynamic debugging
infrastructure for this.

Just so I understand what you are referring to - trace points?

Documentation/admin-guide/dynamic-debug-howto.rst
but I have to confess I have 0 experience with this.

Me too, but it does sound like a good fit.
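
To make the idea concrete, here is a minimal userspace sketch of the gating being discussed. It is not kernel code: MODEL_GFP_NOWARN, struct debug_site and report_migration_failure() are hypothetical names invented for illustration. In the kernel, the per-callsite switch would presumably be a pr_debug() call controlled at runtime through the dynamic debug control file described in the howto above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bit standing in for the kernel's __GFP_NOWARN. */
#define MODEL_GFP_NOWARN 0x1u

/*
 * Hypothetical per-callsite switch. It models what dynamic debug lets
 * an admin flip at runtime; it is off by default, so the log stays quiet.
 */
struct debug_site {
	const char *name;
	bool enabled;
	unsigned long hits;	/* number of messages actually emitted */
};

/*
 * Report one page that could not be migrated.
 * Returns true only if a message was printed.
 */
static bool report_migration_failure(struct debug_site *site,
				     unsigned int gfp_mask,
				     unsigned long pfn)
{
	assert(site != NULL);

	/* The caller explicitly asked for silence. */
	if (gfp_mask & MODEL_GFP_NOWARN)
		return false;

	/* Debug output was not explicitly enabled for this callsite. */
	if (!site->enabled)
		return false;

	site->hits++;
	fprintf(stderr, "%s: failed to migrate pfn %#lx\n", site->name, pfn);
	return true;
}
```

With this shape, a failing range produces no output unless someone has opted in for that callsite, and a caller passing the NOWARN bit stays silent even then.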

--
Thanks,

David / dhildenb