Re: [PATCH] mm, meminit: Serially initialise deferred memory if trace_buf_size is specified

From: Pavel Tatashin
Date: Fri Nov 17 2017 - 13:20:10 EST


On Thu, Nov 16, 2017 at 5:06 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> 4. Put a check into the page allocator slowpath that triggers serialised
> init if the system is booting and an allocation is about to fail. It
> would be such a cold path that it would never be noticeable although it
> would leave dead code in the kernel image once boot had completed

Hi Mel,

The fourth approach is the best: it is seamless for admins and
engineers, and it will work on any system configuration with any
parameters, without any special involvement.

This approach will also address the following problem:
reset_deferred_meminit() makes assumptions about how much memory we
will need beforehand, and these may break periodically as kernel
requirements change. For instance, I recently reduced the amount of
memory that system hash tables take on large machines [1], so the
comment in that function is already outdated:
/*
* Initialise at least 2G of a node but also take into account that
* two large system hashes that can take up 1GB for 0.25TB/node.
*/

With this approach we could always init a very small number of struct
pages, and allow the rest to be initialized on demand as boot requires,
until all deferred struct pages are initialized. Since the deferred
pages feature assumes that the machine is large, there is no drawback
to having a few extra bytes of dead code, especially since all the
checks can be permanently switched off via static branches once
deferred init is complete.

The second benefit that this approach may bring is the following: it
may make it possible to add a new feature that initializes struct pages
on demand later, when applications need them. This feature would be
configurable or enabled via a kernel parameter (not sure which is
better).

    if (allocation is failing)
        if (uninit struct pages available)
            init enough to finish alloc

Again, once all pages are initialized, the checks will be turned off
via static branching, so I think the code can be shared.
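
To make the idea more concrete, here is a rough user-space model of the
check, not kernel code. Every name in it (sim_node, sim_alloc_pages,
sim_init_on_demand, the page counts and the chunk size) is made up for
illustration, and a plain bool stands in for the static branch that
would be switched off once deferred init is complete:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names are kernel APIs. */
#define TOTAL_PAGES 1024u /* pages in the simulated node */
#define EARLY_PAGES   16u /* small amount initialised eagerly at boot */
#define INIT_CHUNK    64u /* granularity of one on-demand init step */

struct sim_node {
	size_t initialised;   /* pages whose struct page is initialised */
	size_t allocated;     /* pages handed out so far */
	bool deferred_active; /* stands in for a static branch */
};

static void sim_node_boot(struct sim_node *n)
{
	n->initialised = EARLY_PAGES;
	n->allocated = 0;
	n->deferred_active = true;
}

/*
 * Initialise at least `need` more pages, rounded up to whole chunks and
 * capped at the node size. Returns true if anything was initialised.
 */
static bool sim_init_on_demand(struct sim_node *n, size_t need)
{
	size_t grow, left;

	if (!n->deferred_active)
		return false;

	grow = (need + INIT_CHUNK - 1) / INIT_CHUNK * INIT_CHUNK;
	left = TOTAL_PAGES - n->initialised;
	if (grow > left)
		grow = left;
	n->initialised += grow;

	/* Everything is initialised: turn the check off for good. */
	if (n->initialised == TOTAL_PAGES)
		n->deferred_active = false;
	return grow > 0;
}

/* Allocate `nr` pages; the on-demand path runs only when we would fail. */
static bool sim_alloc_pages(struct sim_node *n, size_t nr)
{
	size_t free_pages = n->initialised - n->allocated;

	if (free_pages < nr) {
		/* Slowpath: the allocation is about to fail. */
		if (!sim_init_on_demand(n, nr - free_pages))
			return false;
		free_pages = n->initialised - n->allocated;
		if (free_pages < nr)
			return false;
	}
	n->allocated += nr;
	assert(n->allocated <= n->initialised);
	return true;
}
```

In this model an allocation that fits into already-initialised pages
never touches the on-demand path, and once the last page is initialised
the flag is cleared so later failures are ordinary failures.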

Here is the rationale for this feature:

Each physical machine may run a very large number of Linux containers.
Steve Sistare (CCed) recently studied how much memory each instance
of a clear container takes, and it turns out to be about 125 MB
when containers are booted with 2G of memory and 1 CPU. Out of those
125 MB, 32 MB is consumed by the struct page array, as we use 64 bytes
per page. Admins tend to be protective about the amount of memory that
is configured, so they may configure more memory than the container
actually requires. So, by allowing struct pages to be initialized only
on demand, we can save around 25% of the memory that is consumed by a
fresh instance of a container. Now that struct pages are not zeroed
during boot [2], if we implement the fourth option we can get closer to
implementing complete on-demand struct page initialization.
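
For reference, the 32 MB and 25% figures can be rechecked with a few
lines of C. This is only a sketch of the arithmetic; the helper names
are made up, and the 4 KB base page size is my assumption (the numbers
above only state 2G of memory and 64 bytes per struct page):

```c
#include <assert.h>
#include <stdint.h>

/* Size of the struct page array needed to describe mem_bytes of memory. */
static uint64_t struct_page_bytes(uint64_t mem_bytes, uint64_t page_size,
				  uint64_t struct_page_size)
{
	assert(page_size != 0);
	return mem_bytes / page_size * struct_page_size;
}

/* Share of `whole` taken by `part`, in percent. */
static double percent_of(uint64_t part, uint64_t whole)
{
	assert(whole != 0);
	return 100.0 * (double)part / (double)whole;
}
```

With 2G of memory, 4 KB pages and 64 bytes per struct page,
struct_page_bytes() gives 524288 * 64 bytes = 32 MB, and
percent_of(32, 125) is 25.6, which is where "around 25%" comes from.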

I can volunteer to work on these projects.

[1] https://patchwork.kernel.org/patch/9599545/
[2] https://lwn.net/Articles/734374

Thank you,
Pavel