Re: cxl/region.c improvements and DAX/Hotplug plumbing

From: David Hildenbrand (Arm)

Date: Wed Mar 18 2026 - 04:53:37 EST


On 1/23/26 01:28, Gregory Price wrote:
> On Thu, Jan 22, 2026 at 11:14:15PM +0100, David Hildenbrand (Red Hat) wrote:
>> Some of that (especially the interaction with core-mm) feels like it would
>> be a good fit to discuss with the wider MM community in one of the bi-weekly
>> mm meetings. (CCing David R.)
>>
>
> There is a monthly Linux-DAX meeting and a monthly Linux-CXL meeting;
> obviously there is a lot of cross-attendance.
>
> Happy to attend additional discussion. I was trying to shore up some of
> the cxl-region plumbing aspects before going wider.

Oh hey, I found an unanswered mail in my inbox :)

Sorry for stumbling over this so late.

>
>>> - hiding memory blocks? (discussed in last meeting)
>>
>> What is that about and what was the result of that discussion? :)
>>
>
> It was just a question as to whether memory blocks are still useful
> if the intent is to provide a collective hotplug interface. I don't
> think there are any real proposals for this, just making note of it.

Okay, thanks.

>
>>> Solution 2: Make a dedicated sysram_region with policy
>>
>> What kind of region would that be?
>
> plumbing between regionN and dax_region kobjects
>
> right now the kobject relationship is:
>
>   region0            <- cxl driver created kobject
>   └dax_region0       <- default selects IORESOURCE_DAX_KMEM
>     └dax0.0          <- auto-probes on discovery
>
> But there is baggage in the existing plumbing:
>
> 1) dax/cxl.c => hard-coded IORESOURCE_DAX_KMEM for dax_region
> 2) dax/bus.c => devdax is probed on discovery w/o manual bind step
> 3) cxl/core/region.c => BIOS-configured CXL regions automatically
> generate a dax_region, and this auto-creates a dax_kmem device
> which is subject to system-wide MHP policy.
>
> This creates a backwards compatibility headache.

Agreed.

>
> The same auto-plumbing is used in the manual creation path, so:
>
> echo regionN > cxl/decoder0.0/create_ram_region
> /* program decoders */
> echo regionN > cxl/drivers/region/bind
>
> will pump the whole thing directly into dax_kmem and auto-online
> according to system default MHP policy. There's no intermediate
> step in which the user can define preferences (unless you add
> them as attributes to regionN - which is another option).
>
> Adding the intermediate object:
>
>   regionN
>   └sysram_region     <- encodes policy like hotplug and dax drv
>     └dax_regionN     <- which would be passed here on creation
>       └dax0.0
>
> lets the cxl-cli command be more expressive:
> `cxl-cli create-region -t ram --driver=sysram` => kmem
> `cxl-cli create-region -t ram --driver=dax` => device_dax
>
> and would change the sysfs pattern to
> echo regionN > cxl/decoder0.0/create_ram_region
> echo regionN > cxl/drivers/sysram_region/bind
> echo online_movable > cxl/devices/dax_regionN/hotplug
> echo dax_regionN > cxl/drivers/dax_region/bind
>
> and gives the user a chance to configure a policy before the region
> is pumped all the way through to the endpoint dax driver.

Would that still be backwards-compatible?
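
To make the question concrete, here is a minimal userspace C sketch (not kernel code) of one way the extra policy step could stay compatible: if nobody writes the per-region hotplug attribute before the bind, the region simply falls back to the system-wide MHP default, as happens today. All names (`dax_region_model`, `set_hotplug`, `bind_region`, the `POLICY_*` values) are hypothetical, and both the fallback and the "reject writes after bind" behaviour are my assumptions, not something stated in the proposal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values; POLICY_UNSET means "attribute never written". */
enum hotplug_policy {
	POLICY_UNSET = 0,
	POLICY_OFFLINE,
	POLICY_ONLINE,
	POLICY_ONLINE_KERNEL,
	POLICY_ONLINE_MOVABLE,
};

/* Hypothetical stand-in for the dax_regionN object and its hotplug attribute. */
struct dax_region_model {
	enum hotplug_policy hotplug; /* per-region override, if any */
	int bound;                   /* set once the region driver is bound */
};

/* Models "echo <val> > dax_regionN/hotplug". Returns 0 on success, -1 on error. */
static int set_hotplug(struct dax_region_model *r, const char *val)
{
	if (r == NULL || val == NULL || r->bound)
		return -1; /* assumption: policy is fixed once bound */

	if (strcmp(val, "offline") == 0)
		r->hotplug = POLICY_OFFLINE;
	else if (strcmp(val, "online") == 0)
		r->hotplug = POLICY_ONLINE;
	else if (strcmp(val, "online_kernel") == 0)
		r->hotplug = POLICY_ONLINE_KERNEL;
	else if (strcmp(val, "online_movable") == 0)
		r->hotplug = POLICY_ONLINE_MOVABLE;
	else
		return -1; /* unknown policy string */
	return 0;
}

/*
 * Models the bind step. Returns the policy that would be applied:
 * the per-region override if one was written, else the system default.
 */
static enum hotplug_policy bind_region(struct dax_region_model *r,
				       enum hotplug_policy system_default)
{
	r->bound = 1;
	return r->hotplug != POLICY_UNSET ? r->hotplug : system_default;
}
```

In this model, existing users who never touch the new attribute get the old behaviour, and only users who opt in see a difference. Whether the auto-bind for BIOS-configured regions can be kept on top of that is a separate question.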


>>> Solution 2: dedicated sysram_region driver w/ or w/o DAX.
>>> Can support sparseness w/o DAX (see DCD problem)
>>> Could use DAX for tagged DCD regions.
>>> Tradeoff: May duplicate some DAX logic.
>>
>> What would that look like?
>
> For untagged extents w/o dax:
>
> sysram_region->nr_range
> sysram_region->ranges[0 : nr_range-1]
>
> Extents in this list would be hotpluggable individually and
> could be returned to the DCD device individually
>
> sysram_region.c code would call hotplug directly, not via dax.
> - hence, this duplicates some DAX logic
>
> The above just prevents needlessly creating dax-indirection for sysram
> extents with only one destination: add_memory_driver_managed()
>
>
> For tagged extents:
> sysram_region->nr_regions
> sysram_region->dax_regions[0 : nr_regions-1]
>
> A set of tagged extents would only be hotpluggable as a group
> and could only be returned to the DCD as a group.
>
> it would also expose: dax0.0/uuid <- contains the tag


Interesting.

>
>
> from this you get a cli command like
>
> cxl release-extents regionN [--id=X] [--tag=Y]
>
> translates to something like
>
> echo "release" > regionN/sysram_region/extents/[X,Y]
>
> Something like this.
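
The release semantics described above (untagged extents go back individually, tagged extents only as a group) can be sketched as a small userspace C model. This is only an illustration of the rule, not kernel code: `extent_model`, `sysram_region_model`, `release_by_id` and `release_by_tag` are hypothetical names, and the fixed-size array and 16-byte tag are assumptions made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXTENTS 8
#define TAG_LEN 16 /* assumption: tag is a 16-byte UUID-sized blob */

/* Hypothetical model of one DCD extent. */
struct extent_model {
	uint64_t start;
	uint64_t len;
	bool tagged;          /* true if the extent carries a tag */
	uint8_t tag[TAG_LEN]; /* only meaningful when tagged */
	bool present;         /* still hotplugged on this host */
};

/* Hypothetical model of the sysram_region extent list. */
struct sysram_region_model {
	size_t nr_extents;
	struct extent_model extents[MAX_EXTENTS];
};

/*
 * Models "release --id=X": only an untagged, still-present extent can be
 * released on its own. Returns 0 on success, -1 if the request is refused.
 */
static int release_by_id(struct sysram_region_model *r, size_t id)
{
	if (r == NULL || id >= r->nr_extents)
		return -1;
	if (!r->extents[id].present || r->extents[id].tagged)
		return -1; /* tagged extents only leave as a group */
	r->extents[id].present = false;
	return 0;
}

/*
 * Models "release --tag=Y": every present extent carrying the tag is
 * released together. Returns the number of extents released.
 */
static int release_by_tag(struct sysram_region_model *r,
			  const uint8_t tag[TAG_LEN])
{
	int released = 0;

	if (r == NULL || tag == NULL)
		return 0;
	for (size_t i = 0; i < r->nr_extents; i++) {
		struct extent_model *e = &r->extents[i];

		if (e->present && e->tagged &&
		    memcmp(e->tag, tag, TAG_LEN) == 0) {
			e->present = false;
			released++;
		}
	}
	return released;
}
```

A real implementation would additionally have to offline and remove the backing memory before marking an extent as released, and fail the whole group if any member cannot be offlined; the model skips that part.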
>
>>>
>>> Solution 4: Prevent non-driver actions from changing state.
>>> Also solves hotplug protection problem (see next)
>>
>> The crucial part is solving what you spelled out in the description: "race
>> conditions". Forbidding someone to re-configure system RAM sounds
>> unnecessary.
>>
>> For example, I use it a lot for testing issues with page migration while
>> offlining memory from ZONE_MOVABLE.
>>
>
> For most use-cases yes. For something like FAMFS (distributed shared
> memory), one system onlining a block as kmem could be potentially
> destructive to an entirely separate physical server.

Right. But shouldn't we fail this already at the add_memory() stage?
Sounds like doing it during onlining is a bit too late. Conceptually, the hotplug
as sysram was already wrong for famfs, or am I wrong?


>
>>> Example: Slow(er) memory
>>> Some memory is "just memory", but might be particularly slow and
>>> intended for use as a filesystem backend or as only a demotion
>>> target. Otherwise it's allocated / mapped like any other memory,
>>> but it still requires isolation, so it is isolated to the demotion path
>>> and is not a fallback allocation target.
>>
>> That doesn't quite fit the description of N_PRIVATE_MEMORY, though. Or what
>> am I missing?
>
> I suppose we could also explore a per-node fallback policy to accomplish
> this - but there was also the LPC talk about trying to deprecate that
> entirely.

I'm looking forward to that LPC talk!

--
Cheers,

David