Re: [PATCH] mm, meminit: Serially initialise deferred memory if trace_buf_size is specified

From: Mel Gorman
Date: Thu Nov 16 2017 - 05:06:53 EST

On Thu, Nov 16, 2017 at 09:54:33AM +0100, Michal Hocko wrote:
> On Wed 15-11-17 14:17:52, YASUAKI ISHIMATSU wrote:
> > Hi Michal and Mel,
> >
> > To reproduce the issue, I specified the large trace buffer. The issue also occurs with
> > trace_buf_size=12M and movable_node on 4.14.0.
> This is still 10x more than the default. Why do you need it in the first
> place? You can of course find a size that will not fit into the initial
> memory but I am questioning why do you want something like that during
> early boot in the first place.

This confused me as well. I couldn't think of a sensible use-case for
increasing the buffer other than stuffing trace_printk in multiple places
for debugging purposes. Even in such cases, it would be feasible to disable
the option in Kconfig just to have the large buffer. Otherwise, just wait
until the system is booted and set if from userspace.

The lack of a sensible use-case is why I took a fairly blunt approach to
the problem. Keep it (relatively) simple and all that.

> The whole deferred struct page allocation operates under assumption
> that there are no large page allocator consumers that early during
> the boot process.


> If this assumption is not correct then we probably
> need a generic way to describe this. Add-hoc trace specific thing is
> far from idea, imho. If anything the first approach to disable the
> deferred initialization via kernel command line option sounds much more
> appropriate and simpler to me.

So while the first approach was blunt, there are multiple other options.

1. Parse trace_buf_size in __setup to record the information before
page alloc init starts. Take that into account in reset_deferred_meminit
to increase the amount of memory that is serially initialised

2. Have tracing init with a small buffer and then resize it after
page_alloc_init_late. This modifies tracing a bit but most of the
helpers that are required are there. It would be more complex but
it's doable

3. Add a kernel command line parameter that explicitly disables deferred
meminit. We used to have something like this but it was never merged
as we should be able to estimate how much memory is needed to boot.

4. Put a check into the page allocator slowpath that triggers serialised
init if the system is booting and an allocation is about to fail. It
would be such a cold path that it would never be noticable although it
would leave dead code in the kernel image once boot had completed

However, it would be preferable by far to have a sensible use-case as to
why trace_buf_size= would be specified with a large buffer. It's hard to
know what level of complexity is justified without it.

Mel Gorman