[PATCH v4 0/9] mm: introduce Designated Movable Blocks

From: Doug Berger
Date: Fri Mar 10 2023 - 19:39:50 EST


This is essentially a resubmission of v3, rebased and with a
rewritten cover letter that attempts to clarify the submission
based on feedback and follow-on discussion. The individual
patches have not materially changed.

The Linux Memory Management system (MM) has long supported the
concept of movable memory. It takes advantage of address
abstraction to allow the data held in physical memory to be
moved to a different physical address or other form of storage
without the user of the abstracted (i.e. virtual) address
needing to be aware. This is generally the foundation of user
space memory and the basic service the kernel provides to
applications.

On the other hand, the kernel itself is generally not tolerant
of the movement of data that it accesses, so most of the memory
it uses is unmovable. It may be useful to understand that this
terminology is relative to the kernel's perspective and that
what the kernel considers unmovable memory may in fact be moved
by a hypervisor that hosts the kernel, but an additional address
abstraction must exist to keep the kernel unaware of such
movement.

The MM supports the conversion of free memory between MOVABLE
and UNMOVABLE (and other) migration types to allow better
sharing of memory resources. More recently, the MM introduced
"movablecore" memory that should never be made UNMOVABLE. As an
implementation detail, the ZONE_MOVABLE zone was introduced to
manage this type of memory, and significant progress has been
made to ensure the movability of memory in this zone, with the
few exceptions now documented in include/linux/mmzone.h.

"Movablecore" memory can support multiple use cases including
dynamic allocation of hugetlbfs pages, but an imbalance of
"movablecore" memory and kernel memory can lead to serious
consequences for kernel operation, which is why the kernel
parameter documentation includes the warning "the administrator
must be careful that the amount of memory usable for all
allocations is not too small."

Designated Movable Blocks represent a generic extension of the
"movablecore" concept that allows specific blocks of memory to
be designated as part of the "movablecore" in order to support
additional use cases. For example, the feature could have been,
and still could be, used to support hot unplugging of memory. A
very similar concept was proposed in [1] for that purpose, and
revised in [2], but ultimately a more use-case-specific
implementation, the movable_node parameter, was accepted. That
implementation depends on NUMA, ACPI, and SRAT tables, which
narrows its usefulness. Designated Movable Blocks allow for the
same type of discontiguous and non-monotonic configuration of
ZONE_MOVABLE on systems whether or not they support NUMA, ACPI,
or SRAT tables. Specifically, this feature is desired by users
of the arm64 Android GKI common kernel on Broadcom SoCs where
NUMA is not available. These patches make minimal additions to
existing code to offer a controllable "movablecore" feature to
those systems.

Like all "movablecore" memory, there are no Designated Movable
Blocks created by default. They are only created when specified,
and the warning on the "movablecore" kernel parameter remains
just as relevant.

The key feature of "movablecore" memory is that any allocations
of the memory by the kernel page allocator must be movable. This
has the follow-on effect that __GFP_MOVABLE allocation requests
look to "movablecore" memory first, which prioritizes the use of
"movablecore" memory by user processes, though the kernel can
conceivably use the memory as long as movability can be
preserved.
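
The allocation preference described above can be illustrated with
a small user-space model. This is only a sketch under simplifying
assumptions: the three-zone layout, enum model_zone, struct
zone_model, and alloc_page_model() are hypothetical names invented
for illustration and are not kernel definitions. The real page
allocator also considers watermarks, zonelists, and migrate types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone layout (lowest to highest). */
enum model_zone {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_MOVABLE,
	NR_MODEL_ZONES
};

/* Hypothetical per-zone state: only a free page counter. */
struct zone_model {
	unsigned long free_pages;
};

/*
 * Movable requests may start at the movable zone; all other
 * requests must start below it so the zone stays movable.
 */
static int highest_zone(bool movable)
{
	return movable ? MODEL_ZONE_MOVABLE : MODEL_ZONE_NORMAL;
}

/*
 * Take one page from the highest permitted zone that has free
 * pages, falling back to lower zones. Returns the zone index
 * used, or -1 if no permitted zone has a free page.
 */
static int alloc_page_model(struct zone_model zones[NR_MODEL_ZONES],
			    bool movable)
{
	assert(zones != NULL);

	for (int z = highest_zone(movable); z >= MODEL_ZONE_DMA; z--) {
		if (zones[z].free_pages > 0) {
			zones[z].free_pages--;
			return z;
		}
	}
	return -1;
}
```

In this model a movable request is served from the movable zone
while it has free pages and then falls back to the lower zones,
whereas an unmovable request fails rather than touch the movable
zone. That asymmetry is why an imbalance between "movablecore"
memory and kernel memory can be harmful.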

One use case of interest to customers of Broadcom SoCs with
multiple memory controllers is improved memory bandwidth
utilization for multi-threaded, user-space-dominant workloads.
Designated Movable Blocks can be located on each memory
controller and the page_alloc.shuffle=1 kernel parameter can be
applied to provide a simplistic software-based memory channel
interleaving of accesses from user space across the multiple
memory controllers. Experiments using this approach with a dummy
workload [3] on a BCM7278 dual memory controller system with 1GB
of RAM on each controller (i.e. 2GB total RAM) and using the
kernel parameters "movablecore=300M@0x60000000,300M@0x320000000
page_alloc.shuffle=1" showed a more than 20% performance
improvement over a system without this feature using either
"movablecore=600M" or no "movablecore" kernel parameter.

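To make the "size@base" form of the parameter used in the
experiment above concrete, the following user-space sketch splits
such a string into size and base pairs. It is not the parser from
these patches: struct dmb_request, parse_size(), and
parse_movablecore() are hypothetical names, parse_size() is only a
stand-in for the kernel's memparse(), and the sketch deliberately
rejects the plain "movablecore=600M" form that has no base.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for one "size@base" entry. */
struct dmb_request {
	unsigned long long size;	/* bytes */
	unsigned long long base;	/* physical address */
};

/*
 * User-space stand-in for memparse(): a number followed by an
 * optional K, M, or G suffix. *end is left equal to s when no
 * digits were consumed.
 */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse a comma-separated list of size@base entries into out[].
 * Returns the number of entries, or -1 on malformed input or if
 * more than max entries are present.
 */
static int parse_movablecore(const char *arg, struct dmb_request *out,
			     int max)
{
	const char *p = arg;
	int n = 0;

	assert(arg && out);

	while (*p) {
		char *end;

		if (n == max)
			return -1;
		out[n].size = parse_size(p, &end);
		if (end == p || *end != '@')
			return -1;
		p = end + 1;
		out[n].base = strtoull(p, &end, 0);
		if (end == p)
			return -1;
		n++;
		if (*end == ',')
			p = end + 1;
		else if (*end == '\0')
			p = end;
		else
			return -1;
	}
	return n;
}
```

Passing "300M@0x60000000,300M@0x320000000" to parse_movablecore()
yields two entries of 300 MiB each, one at each of the two base
addresses given in the experiment.
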
Another use case of interest is to add broader support for the
"reusable" parameter for reserved-memory device tree nodes. The
Designated Movable Block extension of movablecore would allow
designation of the location as well as ownership of the block.
A device driver that owns a reusable reserved-memory region would
own the underlying portion of a Designated Movable Block and could
reclaim memory from the OS for use exclusively by the device on
demand in a manner similar to memory hot unplugging. The
existing alloc/free_contig_range functions could be used to
support this or a different API could be developed. This use
case is mentioned for consideration, but an implementation is
not part of this submission.

There have also been efforts to reduce the amount of memory that
CMA holds in reserve (e.g. [4]). Adding the ability to place a
CMA pool in a Designated Movable Block could offer an option to
improve memory utilization when increased allocation latency can
be tolerated, but again such an implementation is not part of
this submission.

Changes in v4:
  - rewrote the cover letter in an attempt to provide clarity
    and encourage review.
  - rebased to akpm-mm/master (i.e. Linux 6.3-rc1).

Changes in v3:
  - removed OTHER OPPORTUNITIES and NOTES from the cover letter.
  - prevent the creation of empty zones instead of adding extra
    info to zoneinfo.
  - size the ZONE_MOVABLE span to the minimum necessary to cover
    pages within the zone to be more intuitive.
  - removed "real" from variable names that were consolidated.
  - rebased to akpm-mm/master (i.e. Linux 6.1-rc1).

Changes in v2:
  - first three commits upstreamed separately.
  - commits 04-06 submitted separately.
  - corrected errors "Reported-by: kernel test robot <lkp@xxxxxxxxx>"
  - deferred commits after 15 to simplify review of the base
    functionality.
  - minor reorganization of commit 13.

v3: https://lore.kernel.org/lkml/20221020215318.4193269-1-opendmb@xxxxxxxxx/
v2: https://lore.kernel.org/linux-mm/20220928223301.375229-1-opendmb@xxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@xxxxxxxxx/

[1] https://lwn.net/Articles/543790/
[2] https://lore.kernel.org/all/1374220774-29974-1-git-send-email-tangchen@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/lkml/342da4ea-d04a-996c-85c4-3065dd4dc01f@xxxxxxxxx/
[4] https://lore.kernel.org/linux-mm/20230131071052.GB19285@xxxxxxxxxxxxxxxxxxxxxxxxxxx/

Doug Berger (9):
  lib/show_mem.c: display MovableOnly
  mm/page_alloc: calculate node_spanned_pages from pfns
  mm/page_alloc: prevent creation of empty zones
  mm/page_alloc.c: allow oversized movablecore
  mm/page_alloc: introduce init_reserved_pageblock()
  memblock: introduce MEMBLOCK_MOVABLE flag
  mm/dmb: Introduce Designated Movable Blocks
  mm/page_alloc: make alloc_contig_pages DMB aware
  mm/page_alloc: allow base for movablecore

 .../admin-guide/kernel-parameters.txt |  14 +-
 include/linux/dmb.h                   |  29 +++
 include/linux/gfp.h                   |   5 +-
 include/linux/memblock.h              |   8 +
 lib/show_mem.c                        |   2 +-
 mm/Kconfig                            |  12 ++
 mm/Makefile                           |   1 +
 mm/cma.c                              |  15 +-
 mm/dmb.c                              |  91 +++++++++
 mm/memblock.c                         |  30 ++-
 mm/page_alloc.c                       | 188 +++++++++++++-----
 11 files changed, 338 insertions(+), 57 deletions(-)
 create mode 100644 include/linux/dmb.h
 create mode 100644 mm/dmb.c

--
2.34.1