Re: [RFC PATCH] memory,memory_hotplug: allow restricting memory blocks to zone movable

From: David Hildenbrand (Red Hat)

Date: Thu Jan 08 2026 - 10:32:40 EST


On 1/8/26 08:31, Hannes Reinecke wrote:
On 1/6/26 21:22, David Hildenbrand (Red Hat) wrote:
On 1/6/26 20:59, Gregory Price wrote:
On Tue, Jan 06, 2026 at 07:38:54PM +0100, David Hildenbrand (Red Hat) wrote:
On 1/6/26 19:06, Gregory Price wrote:
On Tue, Jan 06, 2026 at 06:52:11PM +0100, David Hildenbrand (Red Hat) wrote:
On 1/6/26 17:58, Gregory Price wrote:

Fair, I'll revisit this once Hannes gets a chance to chime in.

This was effective at getting the discussion started though :P

Hehe, yes.

Another thing to look into would be to provide a way for ndctl to just
add+online the memory in one shot, without having to go back to walking
memory blocks to online them etc.
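For reference, here is a minimal sketch of what "walking memory blocks" looks like from user space today. It assumes the physical range of the dax device is already known (start/size are hypothetical inputs here) and picks online_movable as the policy. It only uses the sysfs files /sys/devices/system/memory/block_size_bytes and memoryN/state. Every block is a separate write, so a failure in the middle leaves the range partially onlined.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MEM_SYSFS "/sys/devices/system/memory"

/* block_size_bytes holds the block size in hex without a 0x prefix. */
int read_block_size(uint64_t *out)
{
	FILE *f = fopen(MEM_SYSFS "/block_size_bytes", "r");
	int ok;

	if (!f)
		return -errno;
	ok = fscanf(f, "%" SCNx64, out) == 1;
	fclose(f);
	return ok ? 0 : -EINVAL;
}

/* Memory block ids are physical address / block size; size must be > 0. */
void block_range(uint64_t start, uint64_t size, uint64_t block_size,
		 uint64_t *first, uint64_t *last)
{
	*first = start / block_size;
	*last = (start + size - 1) / block_size;
}

/* One sysfs write per block; a failure midway leaves earlier blocks online. */
int online_blocks(uint64_t first, uint64_t last, const char *type)
{
	for (uint64_t id = first; id <= last; id++) {
		char path[128];
		FILE *f;
		int rc = 0;

		snprintf(path, sizeof(path), MEM_SYSFS "/memory%" PRIu64 "/state", id);
		f = fopen(path, "w");
		if (!f)
			return -errno;
		if (fputs(type, f) == EOF)
			rc = -EIO;
		/* stdio buffers the write; the kernel sees it (and may reject it) at fclose() */
		if (fclose(f) == EOF && rc == 0)
			rc = -errno;
		if (rc)
			return rc;
	}
	return 0;
}

/* start/size are hypothetical inputs, e.g. the physical range of the dax device. */
int online_range(uint64_t start, uint64_t size)
{
	uint64_t bs, first, last;
	int rc = read_block_size(&bs);

	if (rc)
		return rc;
	block_range(start, size, bs, &first, &last);
	return online_blocks(first, last, "online_movable");
}
```

With 128 MiB blocks, a 1 GiB range starting at 4 GiB maps to blocks memory32 through memory39. That makes eight separate onlining requests, none of which knows about the others.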


I think it's the opposite: offline+remove needing to be done in one step
while holding the hotplug lock.  Right now, I think you have to do
something like

That's what I note below, yes.

For the udev vs. ndctl race to be handled in a
good way, you need add+online to be done in one operation.


daxctl offline-memory ...
daxctl destroy ...

You can't destroy and have it offline the memory for you in one go IIRC.

As noted below, we have offline_and_remove_memory().

I added the comment:

/*
 * Try to offline and remove memory. Might take a long time to finish in case
 * memory is still in use. Primarily useful for memory devices that logically
 * unplugged all memory (so it's no longer in use) and want to offline + remove
 * that memory.
 */

Nothing speaks against letting dax use that, but the tricky part is that
offlining might take forever, so one has to be prepared to handle that
(and to let user space cancel the operation).
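To make the "prepared to handle that" part concrete, here is a user-space model (not kernel code) of the loop such a caller ends up with. It keeps retrying while pages are still busy, but checks for a cancellation request before every attempt. try_migrate(), cancelled() and the fake backend are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical hooks: try_migrate() attempts to move everything off the range
 * (0 = range is free, -EAGAIN = pages still busy, anything else = hard error);
 * cancelled() reports whether the requester asked to stop.
 */
struct offline_ops {
	int (*try_migrate)(void *ctx);
	bool (*cancelled)(void *ctx);
	void *ctx;
};

/* Keep retrying until the range is free, a hard error occurs, or the caller cancels. */
int offline_range_cancellable(const struct offline_ops *ops)
{
	for (;;) {
		if (ops->cancelled(ops->ctx))
			return -EINTR; /* caller must undo whatever it isolated so far */
		int ret = ops->try_migrate(ops->ctx);
		if (ret != -EAGAIN)
			return ret;
	}
}

/* Example fake: pages stay busy for busy_rounds attempts; cancel on check number cancel_at (0 = never). */
struct fake_offline {
	int busy_rounds, cancel_at, checks;
};

static int fake_try_migrate(void *c)
{
	struct fake_offline *f = c;

	return f->busy_rounds-- > 0 ? -EAGAIN : 0;
}

static bool fake_cancelled(void *c)
{
	struct fake_offline *f = c;

	return ++f->checks == f->cancel_at;
}
```

The important property is that cancellation is only observed between attempts. Whoever cancels still has to roll back the partially offlined state, which is where the multi-range case below gets "fun".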

And for dax devices that consist of multiple ranges, it can be "fun" having
some regions removed and others not.

Something to think about :)

We had this discussion at LPC. The current interface of having to
individually offline every single memory block is not very
user-friendly. While it provides the best possible granularity, it
really only makes sense for virtual environments where you _can_
hotplug individual blocks.

Yes.

For hardware-based scenarios, memory will always be removed in
larger entities (e.g. the CXL device), and it's always an 'all-or-nothing'
scenario; you cannot remove individual memory blocks on a CXL device.
So there the memory block abstraction makes less sense, and it
would be good to have a single 'knob' to remove the entire CXL
device and all memory blocks on it.
Sure, it might take some time, but one doesn't need to worry about
restoring the original state if the operation on one block fails.

That's not what I was getting at:

offline_and_remove_memory() can be called on large regions, and it properly handles backing out if offlining some part of the region fails.

The issue arises once dax would have to call offline_and_remove_memory() multiple times, on non-contiguous areas. Of course, we could handle that by providing an interface that consumes multiple memory ranges.
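The following is a rough model of such a multi-range interface. It assumes the interface keeps the back-out semantics that offline_and_remove_memory() has for a single range: offline everything first, and only remove once all offlining has succeeded; otherwise, online again what was already offlined. offline_and_remove_ranges(), struct range_ops and the fake backend are hypothetical names for illustration, written as plain user-space C.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous range of a (hypothetical) multi-range dax device. */
struct mem_range {
	uint64_t start, size;
};

/*
 * Hypothetical per-range primitives standing in for the offline, re-online
 * and remove steps, which would all run under the device hotplug lock so
 * nothing can online the ranges behind our back.
 */
struct range_ops {
	int (*offline)(void *ctx, const struct mem_range *r);
	void (*reonline)(void *ctx, const struct mem_range *r);
	int (*remove)(void *ctx, const struct mem_range *r);
	void *ctx;
};

/* Offline all ranges first; only if every one succeeded, remove them. */
int offline_and_remove_ranges(const struct mem_range *ranges, size_t n,
			      const struct range_ops *ops)
{
	size_t done;
	int rc = 0;

	for (done = 0; done < n; done++) {
		rc = ops->offline(ops->ctx, &ranges[done]);
		if (rc)
			break;
	}
	if (rc) {
		/* ranges[done] failed and is still online: back out the others. */
		while (done-- > 0)
			ops->reonline(ops->ctx, &ranges[done]);
		return rc;
	}
	for (size_t i = 0; i < n; i++) {
		rc = ops->remove(ops->ctx, &ranges[i]);
		if (rc) {
			/* Not expected with the lock held; ranges before i stay removed. */
			for (size_t j = i; j < n; j++)
				ops->reonline(ops->ctx, &ranges[j]);
			return rc;
		}
	}
	return 0;
}

/* Example fake backend: offlining the range at index fail_at returns -EBUSY. */
struct fake_backend {
	size_t fail_at, offlined, reonlined, removed;
};

static int fake_offline(void *c, const struct mem_range *r)
{
	struct fake_backend *f = c;

	(void)r;
	if (f->offlined == f->fail_at)
		return -EBUSY;
	f->offlined++;
	return 0;
}

static void fake_reonline(void *c, const struct mem_range *r)
{
	(void)r;
	((struct fake_backend *)c)->reonlined++;
}

static int fake_remove(void *c, const struct mem_range *r)
{
	(void)r;
	((struct fake_backend *)c)->removed++;
	return 0;
}
```

With three ranges and fail_at = 1, the first range is offlined, the second fails with -EBUSY, the first is onlined again, and nothing is removed.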

For the DAX use case, I think we'd really want a way to just use

* add_and_online_memory() [does not exist yet, but ppc does something similar]
* offline_and_remove_memory()

And not have user space otherwise worry about onlining/offlining memory at all.
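The add side could mirror the same guarantee. Below is a minimal model of what add_and_online_memory() might promise. It assumes the function is built from an add step and an all-or-nothing online step; struct add_ops, add_and_online_model() and the fake backend are hypothetical. If onlining fails, the range is removed again, so nothing half-configured is left for user space to clean up.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical primitives standing in for the internal steps of the proposed
 * add_and_online_memory(); online() is assumed to be all-or-nothing itself.
 */
struct add_ops {
	int (*add)(void *ctx, uint64_t start, uint64_t size);
	int (*online)(void *ctx, uint64_t start, uint64_t size);
	void (*remove)(void *ctx, uint64_t start, uint64_t size);
	void *ctx;
};

/* Either the range ends up added and online, or it is not added at all. */
int add_and_online_model(uint64_t start, uint64_t size, const struct add_ops *ops)
{
	int rc = ops->add(ops->ctx, start, size);

	if (rc)
		return rc;
	rc = ops->online(ops->ctx, start, size);
	if (rc)
		ops->remove(ops->ctx, start, size); /* nothing half-configured left behind */
	return rc;
}

/* Example fake backend tracking the state of a single range. */
struct fake_mem {
	int online_rc, present, online;
};

static int fake_mem_add(void *c, uint64_t s, uint64_t n)
{
	(void)s;
	(void)n;
	((struct fake_mem *)c)->present = 1;
	return 0;
}

static int fake_mem_online(void *c, uint64_t s, uint64_t n)
{
	struct fake_mem *m = c;

	(void)s;
	(void)n;
	if (m->online_rc)
		return m->online_rc;
	m->online = 1;
	return 0;
}

static void fake_mem_remove(void *c, uint64_t s, uint64_t n)
{
	(void)s;
	(void)n;
	((struct fake_mem *)c)->present = 0;
}
```

Paired with an offline+remove counterpart, the add+online race with udev disappears from user space's point of view, because there is no window in which the memory is added but not yet onlined.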

Of course, that will require some new plumbing for ndctl to make use of this functionality.

--
Cheers

David