So whatever we do, this should in general not be the kernel sole
decision to make this memory any special and let ZONE_MOVABLE manage it.

I believe you are stating that Designated Movable Blocks should only be
created as a result of special configuration (e.g. kernel parameters,
devicetree, ...). I would agree with that. Is that what you intended by
this statement, or am I missing something?

only be located at the end of addressable memory then it will always be
located on MEMC1 of a 7278 system. This will create a tendency for user
space accesses to consume more bandwidth on the MEMC1 memory controller
and kernel space accesses to consume more bandwidth on MEMC0. A more
even distribution of ZONE_MOVABLE memory between the available memory
controllers in theory makes more memory bandwidth available to user
space intensive loads.

Sorry to be dense, is this also about different memory access latency or
just memory bandwidth?

Broadcom memory controllers do support configurable real-time scheduling
with bandwidth guarantees for different memory clients so I suppose this
is a fair question. However, the expectation here is that the CPUs would
have equivalent access latencies, so it is really just about memory
bandwidth for the CPUs.

Unfortunately, the historical monotonic layout of zones would
mean that if the lowest addressed memory controller contains
ZONE_MOVABLE memory then all of the memory available from
memory controllers at higher addresses must also be in the
ZONE_MOVABLE zone. This would force all kernel memory accesses
onto the lowest addressed memory controller and significantly
reduce the amount of memory available for non-movable
allocations.

We do have code that relies on zones during boot to not overlap within a
single node.

I believe my changes address all such reliance, but if you are aware of
something I missed please let me know.

One example I'm aware of is drivers/base/memory.c:memory_block_add_nid()
/ early_node_zone_for_memory_block().
If we get it wrong, or actually have memory blocks that span multiple
zones, we can no longer offline these memory blocks. We really wanted to
avoid scanning the memmap for now and it seems to get the job done in
environments we care about.

To the extent that this implementation only supports creating Designated
Movable Blocks in boot memory and boot memory does not generally support
offlining, I wouldn't expect this to be an issue. However, if for some
reason offlining boot memory becomes desirable then we should use
dmb_intersects() along with zone_intersects() to take the appropriate
action. Based on the current usage of zone_intersects() I'm not entirely
sure what the correct action should be.
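
To make the kind of check discussed above concrete, here is a minimal
user-space sketch of testing whether a memory block's PFN range overlaps
any designated block. Everything in it is an assumption for illustration:
struct toy_range, toy_ranges_overlap() and toy_block_hits_dmb() are
hypothetical names, the ranges are taken to be half-open [start, end), and
it does not reproduce the signature or behavior of dmb_intersects() or
zone_intersects().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open PFN range [start, end); a hypothetical stand-in type. */
struct toy_range {
	unsigned long start;
	unsigned long end;
};

/* True if the two half-open ranges share at least one PFN. */
bool toy_ranges_overlap(struct toy_range a, struct toy_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start < b.end && b.start < a.end;
}

/* True if [start, end) overlaps any of the n designated blocks. */
bool toy_block_hits_dmb(const struct toy_range *dmbs, size_t n,
			unsigned long start, unsigned long end)
{
	struct toy_range blk = { start, end };

	for (size_t i = 0; i < n; i++)
		if (toy_ranges_overlap(blk, dmbs[i]))
			return true;
	return false;
}
```

A memory block for which such a test returns true would span memory that
is managed differently from its surroundings, which is the case where
the appropriate offlining action is still unclear.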

The main objective of this patch set is therefore to allow a
block of memory to be designated as part of the ZONE_MOVABLE
zone where it will always only be used by the kernel page
allocator to satisfy requests for movable pages. The term
Designated Movable Block is introduced here to represent such a
block. The favored implementation allows modification of the

Sorry to say, but that term is rather suboptimal to describe what you
are doing here. You simply have some system RAM you'd want to have
managed by ZONE_MOVABLE, no?

That may be true, but I found it superior to the 'sticky' movable
terminology put forth by Mel Gorman ;). I'm happy to entertain
alternatives, but they may not be as easy to find as you think.

Especially the "blocks" part is confusing. Movable pageblocks? Movable
Linux memory blocks?

Note that the sticky movable *pageblocks* were a completely different
concept than simply reusing ZONE_MOVABLE for some memory ranges.

I would say that is open for debate. The implementations would be
"completely different" but the objectives could be quite similar.
There appear to be a number of people that are interested in the concept
of memory that can only contain data that tolerates relocation for
various potentially non-competing reasons.
Fundamentally, the concept of MIGRATE_MOVABLE memory is useful to allow
competing user space processes to share limited physical memory supplied
by the kernel. The data in that memory can be relocated elsewhere by the
kernel when the process that owns it is not executing. This movement is
typically not observable to the owning process which has its own address
space.
The kernel uses MIGRATE_UNMOVABLE memory to protect the integrity of its
address space, but of course what the kernel considers unmovable could
in fact be moved by a hypervisor in a way that is analogous to what the
kernel does for user space.
For maximum flexibility the Linux memory management allows for
converting the migratetype of free memory to help satisfy requests to
allocate pages of memory through a mechanism I will call "fallback". The
concepts of sticky movable pageblocks and ZONE_MOVABLE have the common
objective of preventing the migratetype of pageblocks from getting
converted to anything other than MIGRATE_MOVABLE, and this is what makes
the memory special.
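
As a rough illustration of what "fallback" and "sticky" mean here, the
following user-space toy model may help. It is only a sketch under loud
assumptions: the toy_* names and the per-block "sticky" flag are
hypothetical, and the real page allocator works on free lists and steals
pages under more elaborate conditions than this single conversion step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy migratetypes; the names echo the kernel's but this is not kernel code. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_NR_TYPES };

struct toy_pageblock {
	enum toy_migratetype type;
	int free_pages;
	bool sticky;	/* hypothetical: the type may never be converted */
};

/*
 * Allocate one page of the wanted type from blocks[0..n).
 * Pass 1: use a block that already has the wanted type.
 * Pass 2 ("fallback"): convert a non-sticky block that has free pages.
 * Returns the index of the block used, or -1 if nothing is available.
 */
int toy_alloc(struct toy_pageblock *blocks, int n, enum toy_migratetype want)
{
	assert(want < TOY_NR_TYPES);

	for (int i = 0; i < n; i++) {
		if (blocks[i].type == want && blocks[i].free_pages > 0) {
			blocks[i].free_pages--;
			return i;
		}
	}
	for (int i = 0; i < n; i++) {
		if (!blocks[i].sticky && blocks[i].free_pages > 0) {
			blocks[i].type = want;	/* fallback converts the block */
			blocks[i].free_pages--;
			return i;
		}
	}
	return -1;
}
```

In this model an unmovable request fails rather than converting a sticky
movable block, which is the property both sticky movable pageblocks and
ZONE_MOVABLE are after.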
I agree with Mel Gorman that zones are meant to be about address induced
limitations, so using a zone for the purpose of breaking the fallback
mechanism of the page allocator is a misuse of the concept. A new
migratetype would be more appropriate for representing this change in
how fallback should apply to the pageblock because the desired behavior
has nothing to do with the address at which the memory is located. It is
entirely reasonable to desire "sticky" movable behavior for memory in
any zone. Such a solution would be directly applicable to our multiple
memory controller use case, and is really how Designated Movable Blocks
should be imagined.
However, I also recognize the efficiency benefits of using a
ZONE_MOVABLE zone to manage the pages that have this "sticky" movable
behavior. Introducing a new sticky MIGRATE_MOVABLE migratetype adds a
new free_list to every free_area which increases the search space and
associated work when trying to allocate a page for all callers.
Introducing ZONE_MOVABLE reduces the search space by providing an early
separation between searches for movable and non-movable allocations. The
classic zone restrictions weren't a good fit for multiple memory
controllers, but those restrictions were lifted to overcome similar
issues with memory_hotplug. It is not that Designated Movable Blocks
want to be in ZONE_MOVABLE, but rather that ZONE_MOVABLE provides a
convenience for managing the page allocator's use of "sticky" movable
memory just like it does for memory hotplug. Dumping the memory in
Designated Movable Blocks into the ZONE_MOVABLE zone allows an existing
mechanism to be reused, reducing the risk of negatively impacting the
page allocator behavior.
There are some subtle distinctions between Designated Movable Blocks and
the existing ZONE_MOVABLE zone. Because Designated Movable Blocks are
reserved when created they are protected against any early boot time
kernel reservations that might place unmovable allocations in them. The
implementation continues to track the zone_movable_pfn as the start of
the "classic" ZONE_MOVABLE zone on each node. A Designated Movable Block
can overlap any other zone including the "classic" ZONE_MOVABLE zone.

Doing it the DAX/CXL way would be to expose these memory ranges as
daxdev instead, and letting the admin decide how to online these memory
ranges when adding them to the buddy via the dax/kmem kernel module.
That could mean that you're booting with memory on MC0 only and exposing
memory of MC1 via a daxdev, giving the admin the possibility to decide
to which zone the memory should be onlined.
That would avoid most kernel code changes.

I wasn't familiar with these kernel mechanisms and did enjoy reading
about the somewhat oxymoronic "volatile-use of persistent memory" that
is dax/kmem, but this isn't performance differentiated RAM. It really is
just System RAM so this degree of complexity seems unwarranted.

Why do we have to start using ZONE_MOVABLE for them?

One of the "other opportunities" for Designated Movable Blocks is to
allow CMA to allocate from a DMB as an alternative. This would allow
current users to continue using CMA as they want, but would allow users
(e.g. hugetlb_cma) that are not sensitive to the allocation latency to
let the kernel page allocator make more complete use (i.e. waste less)
of the shared memory. ZONE_MOVABLE pageblocks are always MIGRATE_MOVABLE
so the restrictions placed on MIGRATE_CMA pageblocks are lifted within a
DMB.

The whole purpose of ZONE_MOVABLE is that *no* unmovable allocations end
up on it. The biggest difference to CMA is that the CMA *owner* is able
to place unmovable allocations on it.

I'm not sure that is a wholly fair characterization (or maybe I just
hope that's the case :). I would agree that the Linux page allocator
can't place any unmovable allocations on it. I expect that people locate
memory in ZONE_MOVABLE for different purposes. For example, the memory
hotplug users ostensibly place memory there so that any data on the hot
plugged memory can be moved off of the memory prior to it being hot
unplugged. Unplugging the memory removes the memory from the
ZONE_MOVABLE zone, but it is not materially different from allocating
the memory for a different purpose (perhaps in a different machine).
Conceptually, allowing a CMA allocator to operate on a Designated
Movable Block of memory that it *owns* is also removing that memory from
the ZONE_MOVABLE zone. Issues of ownership should be addressed which is
why these "other opportunities" are being deferred for now, but I do not
believe such use is unreasonable. Again, Designated Movable Blocks are
only allowed in boot memory so there shouldn't be a conflict with memory
hotplug. I believe the same would apply for hugetlb_cma.

Using ZONE_MOVABLE for unmovable allocations (hugetlb_cma) is not
acceptable as is.
Using ZONE_MOVABLE in different context and calling it DMB is very
confusing TBH.

Perhaps it is more helpful to think of a Designated Movable Block as a
block of memory whose migratetype is not allowed to be changed from
MIGRATE_MOVABLE (i.e. "sticky" migrate movable). The fact that
ZONE_MOVABLE is being used to achieve that is an implementation detail
for this commit set. In the same way that memory hotplug is the concept
of adding System RAM during run time, but placing it in ZONE_MOVABLE is
an implementation detail to make it easier to unplug.

Just a note that I described the idea of a "PREFER_MOVABLE" zone in the
past. In contrast to ZONE_MOVABLE, we cannot run into weird OOM
situations in a ZONE misconfiguration, and we'd end up placing only
movable allocations on it as long as we can. However, especially
gigantic pages could be allocated from it. It sounds kind-of more like
what you want -- and maybe in combination with daxctl to let the user
decide how to online memory ranges.

Best not let Mel hear you suggesting another zone ;).

And just to make it clear again: depending on ZONE_MOVABLE == only user
space allocations is not future proof.

Understood.