DCD: Add support for Dynamic Capacity Devices (DCD)
From: Anisa Su
Date: Thu Jun 25 2026 - 07:27:15 EST
Table of Contents
=================
1. Changes since v10
2. Background
3. Patch organization
4. Notable
5. Testing
This series branch: https://github.com/anisa-su993/anisa-linux-kernel/tree/dcd-v11-06-23-26
NDCTL branch: https://github.com/anisa-su993/anisa-ndctl/tree/dcd-2026-06-24
v10: https://lore.kernel.org/linux-cxl/ajuMJi5nTQRB_ZP0@AnisaLaptop.localdomain/T/#mfdfc28c829071204333824c542ca3af4170dafb4
Changes since v10
=================
The overall architecture and semantics are unchanged; v11 consists of
review fixes and naming/ABI corrections, and irons out locking/concurrency
edge cases between the CXL and DAX layers.
Naming / ABI:
- Renamed dynamic_ram_a to dynamic_ram_1 throughout (endpoint-decoder
mode, the partition sysfs name, and enum CXL_PARTMODE_DYNAMIC_RAM_1),
matching the numbered-partition convention.
- Sharable extent sequence numbers are now a dense 0..n-1 (previously
1..n); the CXL validation path and the DAX claim path enforce the same
0..n-1 invariant.
- The DAX 'uuid' attribute reads back the null UUID (all-zeroes) when
untagged rather than "0".
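As an illustration of the dense 0..n-1 invariant, here is a minimal
userspace sketch. It is not taken from the series: the helper name
seq_is_dense(), the u16 sequence type, the 64-extent limit and the
assumption that extents may arrive in any order are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the series: returns true when the
 * n sequence numbers in seq[] are exactly the set {0, 1, ..., n-1},
 * in any order. Assumes n <= 64 so a single bitmap word suffices.
 */
static bool seq_is_dense(const uint16_t *seq, size_t n)
{
	uint64_t seen = 0;

	if (n == 0 || n > 64)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (seq[i] >= n)		/* out of range: a gap or 1..n numbering */
			return false;
		if (seen & (1ULL << seq[i]))	/* duplicate sequence number */
			return false;
		seen |= 1ULL << seq[i];
	}
	return true;	/* n distinct values, all below n */
}
```

With this check, a group numbered {2, 0, 1} passes, while the old 1..n
numbering {1, 2, 3} and a group with a duplicate are both rejected.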
Recovery and lifecycle:
- Creating a region over a DC partition now reads the device's
already-accepted extents at cxl_dax_region probe time. Recovered
extents are not re-acknowledged via Add-DC-Response. New add events
are deferred until the initial scan completes so that a tag already
in use is never registered twice.
- Per-tag-group add and release of DAX resources are now atomic
(all-or-none). Previously, adding a tag group took the lock separately
for each extent addition; the lock now covers the entire group.
- Added an upper bound of 100 pending extents so that the 20-second
timeout for closing a More chain cannot be refreshed indefinitely
(unlikely unless the device is malicious).
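The all-or-none behaviour for a tag group can be pictured with the
userspace sketch below. It is only an illustration under stated
assumptions: struct res_table, res_add() and group_add() are hypothetical
names, the resource table is a toy fixed-size array, and locking is left
out (the kernel code would hold one lock across the whole group).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RES 8	/* hypothetical capacity of the toy resource table */

struct extent { uint64_t start, len; };

struct res_table {
	struct extent res[MAX_RES];
	size_t count;
};

/* Add one extent; fails when the table is full or the extent is empty. */
static bool res_add(struct res_table *t, const struct extent *e)
{
	if (t->count == MAX_RES || e->len == 0)
		return false;
	t->res[t->count++] = *e;
	return true;
}

/*
 * Add every extent of a tag group or none of them. A real implementation
 * would hold one lock across this whole function; locking is omitted here.
 */
static bool group_add(struct res_table *t, const struct extent *grp, size_t n)
{
	size_t saved = t->count;

	for (size_t i = 0; i < n; i++) {
		if (!res_add(t, &grp[i])) {
			t->count = saved;	/* roll back partial additions */
			return false;
		}
	}
	return true;
}
```

If the second extent of a group fails, the first one is rolled back as
well, so observers never see a partially added tag group.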
Robustness (device-supplied data is treated as untrusted):
- Various checks on device-supplied payload sizes, arithmetic
overflow/underflow, etc.
- Fix places where native_cxl must be checked to avoid overriding
BIOS-owned events
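To give a flavour of the overflow/underflow checks meant here, the sketch
below validates a device-supplied extent against a partition without any
addition that could wrap. The function name and parameters are
hypothetical and not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: is [start, start + len) non-empty and fully inside
 * the partition [part_start, part_start + part_len)? Written so that no
 * addition can wrap around UINT64_MAX. Assumes the partition itself does
 * not wrap.
 */
static bool extent_in_partition(uint64_t start, uint64_t len,
				uint64_t part_start, uint64_t part_len)
{
	if (len == 0 || len > part_len)
		return false;
	if (start < part_start)
		return false;
	/* Neither subtraction can underflow thanks to the checks above. */
	return start - part_start <= part_len - len;
}
```

The naive form "start + len <= part_start + part_len" would accept a
start near UINT64_MAX once the sum wraps; comparing differences avoids
that.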
Documentation:
- Small changes to reflect the dynamic_ram_a to dynamic_ram_1 rename and
the sequence number change (0..n-1 instead of 1..n)
- Bump kver to 7.3 and update the date in the sysfs attribute documentation
Signoffs/Tags:
- Update Ira's signoffs and authored-by to use iweiny@xxxxxxxxxx
- Update Jonathan Cameron's email to jic23@xxxxxxxxxx for various review tags
- Update Fan's email to nifan.cxl@xxxxxxxxx
- Update Dan's email to djbw@xxxxxxxxxx
Background
==========
A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
device that allows memory capacity within a region to change
dynamically without the need for resetting the device, reconfiguring
HDM decoders, or reconfiguring software DAX regions.
One of the biggest anticipated use cases for Dynamic Capacity is to
allow memory to be dynamically added to or removed from a host within a
data center without physically changing the per-host attached memory or
rebooting the host.
The general flow for the addition or removal of memory is to have an
orchestrator coordinate the use of the memory. Generally there are 5
actors in such a system: the Orchestrator, the Fabric Manager (FM), the
Logical Device, the Host Kernel, and a Host User.
An example workflow is shown below.
Orchestrator      FM        Device     Host Kernel      Host User
     |             |           |            |               |
     |-------------- Create region ------------------------>|
     |             |           |            |               |
     |             |           |            |<- Create -----|
     |             |           |            |    Region     |
     |             |           |            |(dynamic_ram_1)|
     |<------------- Signal done ---------------------------|
     |             |           |            |               |
     |-- Add ----->|-- Add --->|--- Add --->|               |
     |  Capacity   |  Extent   |   Extent   |               |
     |             |           |            |               |
     |             |<- Accept -|<-- Accept -|               |
     |             |   Extent  |   Extent   |               |
     |             |           |            |<- Create -----|
     |             |           |            |    DAX dev    |-- Use memory
     |             |           |            |               |   |
     |             |           |            |               |   |
     |             |           |            |<- Release ----| <-+
     |             |           |            |    DAX dev    |
     |             |           |            |               |
     |<------------- Signal done ---------------------------|
     |             |           |            |               |
     |-- Remove -->|- Release->|- Release ->|               |
     |  Capacity   |  Extent   |   Extent   |               |
     |             |           |            |               |
     |             |<- Release-|<- Release -|               |
     |             |   Extent  |   Extent   |               |
     |             |           |            |               |
     |-- Add ----->|-- Add --->|--- Add --->|               |
     |  Capacity   |  Extent   |   Extent   |               |
     |             |           |            |               |
     |             |<- Accept -|<-- Accept -|               |
     |             |   Extent  |   Extent   |               |
     |             |           |            |<- Create -----|
     |             |           |            |    DAX dev    |-- Use memory
     |             |           |            |               |   |
     |             |           |            |<- Release ----| <-+
     |             |           |            |    DAX dev    |
     |<------------- Signal done ---------------------------|
     |             |           |            |               |
     |-- Remove -->|- Release->|- Release ->|               |
     |  Capacity   |  Extent   |   Extent   |               |
     |             |           |            |               |
     |             |<- Release-|<- Release -|               |
     |             |   Extent  |   Extent   |               |
     |             |           |            |               |
     |-- Add ----->|-- Add --->|--- Add --->|               |
     |  Capacity   |  Extent   |   Extent   |               |
     |             |           |            |<- Create -----|
     |             |           |            |    DAX dev    |-- Use memory
     |             |           |            |               |   |
     |-- Remove -->|- Release->|- Release ->|               |   |
     |  Capacity   |  Extent   |   Extent   |               |   |
     |             |           |            |               |   |
     |             |           |    (Release Ignored)       |   |
     |             |           |            |               |   |
     |             |           |            |<- Release ----| <-+
     |             |           |            |    DAX dev    |
     |<------------- Signal done ---------------------------|
     |             |           |            |               |
     |             |- Release->|- Release ->|               |
     |             |  Extent   |   Extent   |               |
     |             |           |            |               |
     |             |<- Release-|<- Release -|               |
     |             |   Extent  |   Extent   |               |
     |             |           |            |<- Destroy ----|
     |             |           |            |    Region     |
     |             |           |            |               |
Patch organization
==================
Device enablement and partition configuration:
- cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
- cxl/mem: Read dynamic capacity configuration from the device
- cxl/cdat: Gather DSMAS data for DCD partitions
- cxl/core: Enforce partition order/simplify partition calls
- cxl/mem: Expose dynamic ram 1 partition in sysfs
- cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode
- cxl/region: Add DC DAX region support
Event and interrupt plumbing:
- cxl/events: Split event msgnum configuration from irq setup
- cxl/pci: Factor out interrupt policy check
- cxl/mem: Configure dynamic capacity interrupts
- cxl/core: Return endpoint decoder information from region search
- cxl/mem: Set up framework for handling DC Events
- cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
Extent handling - add, release, and validation:
- cxl/extent: Handle DC Add Capacity events
- cxl/mem: Drop misaligned DCD extent groups
- cxl/extent: Validate DC extent partition
- cxl/mem: Enforce tag-group semantics
- cxl/extent: Handle DC Release Capacity events
- cxl/extent: Enforce cross-region tag uniqueness
- cxl/region/extent: Expose dc_extent information in sysfs
DAX resource surfacing and device model:
- cxl + dax: Surface dax_resources on DCD Add Capacity events
- cxl + dax: Release dax_resources on DCD Release Capacity events
- dax/bus: Factor out dev dax resize logic
- dax/bus: Add uuid sysfs attribute to dax devices
- dax/bus: Reject resize on DC dax devices and enforce 0-size creation
- dax/bus: Tag-aware uuid claim and show on DC dax devices
- cxl/region: Read existing extents on region creation
Tracing, test infrastructure, and documentation:
- cxl/mem: Trace Dynamic capacity Event Record
- tools/testing/cxl: Make event logs dynamic
- tools/testing/cxl: Add DC Regions to mock mem data
- Documentation/cxl: Document DCD extent handling and DC-backed DAX regions
Notable
=======
- A More=1 add chain is bounded by the 20s timeout and CXL_DC_MAX_PENDING_EXTENTS,
set to 100. Suggested by Sashiko as a defensive cap against a fabric manager
that never closes the chain. The value is arbitrary; feedback on it is welcome.
- Several Sashiko review comments assumed that multiple host threads could
process a single DCD add event, or mutate one tag group, concurrently.
I don't think that can happen, because DCD events for a memdev are
delivered and handled serially by that device's event-interrupt thread,
and a tag group is owned by exactly one memory device. I therefore did
not act on those comments. Please correct me if this assumption is
wrong so I can fix them.
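For discussion of the cap in the first item above, here is a userspace
sketch of how a pending-extent limit bounds a More=1 chain whose timer is
refreshed by every extent. It is an assumption-laden model, not the code
in the series: struct add_chain, chain_add() and the drop-on-expiry
behaviour are hypothetical, and MAX_PENDING only mirrors the value of
CXL_DC_MAX_PENDING_EXTENTS stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING	100	/* mirrors CXL_DC_MAX_PENDING_EXTENTS in the text */
#define TIMEOUT_MS	20000	/* the 20 second chain timeout */

struct add_chain {
	unsigned int pending;	/* extents queued while More=1 */
	uint64_t deadline_ms;	/* chain is dropped after this time */
};

enum chain_result { CHAIN_OPEN, CHAIN_COMPLETE, CHAIN_DROPPED };

/* Feed one extent of an add chain into the (hypothetical) state machine. */
static enum chain_result chain_add(struct add_chain *c, bool more,
				   uint64_t now_ms)
{
	bool expired = c->pending && now_ms > c->deadline_ms;

	if (expired || c->pending >= MAX_PENDING) {
		c->pending = 0;		/* discard the whole chain */
		return CHAIN_DROPPED;
	}

	c->pending++;
	if (!more) {
		c->pending = 0;		/* chain closed: hand off the group */
		return CHAIN_COMPLETE;
	}

	c->deadline_ms = now_ms + TIMEOUT_MS;	/* each extent refreshes the timer */
	return CHAIN_OPEN;
}
```

In this model a device that sends one More=1 extent every few seconds
can keep the timer alive, but only until MAX_PENDING extents are queued,
at which point the chain is dropped regardless.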
Testing
=======
ndctl unit suite: built and run against the QEMU cxl_test mock with the
ndctl 'cxl' suite (branch dcd-2026-06-24): 16 of 17 tests pass,
including cxl-dcd.sh and the cxl-region-replay.sh crash-recovery test
that exercises reading pre-existing extents on region creation;
cxl-features is skipped as unsupported.
QEMU end-to-end: used Ali's QEMU patchset adding tag support
[1], with the below topology:
TOPO='-object memory-backend-file,id=cxl-mem1,mem-path=/tmp/t3_cxl1.raw,size=12G \
-object memory-backend-file,id=cxl-lsa1,mem-path=/tmp/t3_lsa1.raw,size=1G \
-device usb-ehci,id=ehci \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
-device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
-device cxl-type3,bus=cxl_rp_port0,id=cxl-dcd0,dc-regions-total-size=12G,num-dc-regions=1,sn=99 \
-device usb-cxl-mctp,bus=ehci.0,id=usb1,target=cxl-dcd0 \
-machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=12G,cxl-fmw.0.interleave-granularity=1k'
The exact instructions are the same as the previous version, so I've truncated some details.
1. Boot the guest.
2. QMP object-add a tagged 8G memory-backend-ram
(tag 5be13bce-ae34-4a77-b6c3-16df975fcf1a).
3. cxl create-region -m -d decoder0.0 -w 1 -s 8G mem0 -t dynamic_ram_1
4. QMP cxl-add-dynamic-capacity (prescriptive, region 0, same tag)
injecting an 8G extent at offset 0.
5. The extent surfaces under the region: dax_region0/extent0.0 reports
offset 0x0, length 0x200000000, uuid 5be13bce-...
6. daxctl create-device -r region0 --uuid 5be13bce-... creates the 8G
devdax device.
We are also working with some internal teams to test on real hardware, so
I'll report any findings as we go.
References:
[1] https://lore.kernel.org/linux-cxl/20260325184259.366-1-alireza.sanaee@xxxxxxxxxx/T/#t
This series applies on the v7.1 tag (Linus' tree).
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
Anisa Su (6):
cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
cxl/mem: Enforce tag-group semantics
cxl/extent: Enforce cross-region tag uniqueness
dax/bus: Add uuid sysfs attribute to dax devices
dax/bus: Tag-aware uuid claim and show on DC dax devices
Documentation/cxl: Document DCD extent handling and DC-backed DAX
regions
Ira Weiny (25):
cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
cxl/mem: Read dynamic capacity configuration from the device
cxl/cdat: Gather DSMAS data for DCD partitions
cxl/core: Enforce partition order/simplify partition calls
cxl/mem: Expose dynamic ram 1 partition in sysfs
cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode
cxl/region: Add DC DAX region support
cxl/events: Split event msgnum configuration from irq setup
cxl/pci: Factor out interrupt policy check
cxl/mem: Configure dynamic capacity interrupts
cxl/core: Return endpoint decoder information from region search
cxl/mem: Set up framework for handling DC Events
cxl/extent: Handle DC Add Capacity events
cxl/mem: Drop misaligned DCD extent groups
cxl/extent: Validate DC extent partition
cxl/extent: Handle DC Release Capacity events
cxl/region/extent: Expose dc_extent information in sysfs
cxl + dax: Surface dax_resources on DCD Add Capacity events
cxl + dax: Release dax_resources on DCD Release Capacity events
dax/bus: Factor out dev dax resize logic
dax/bus: Reject resize on DC dax devices and enforce 0-size creation
cxl/region: Read existing extents on region creation
cxl/mem: Trace Dynamic capacity Event Record
tools/testing/cxl: Make event logs dynamic
tools/testing/cxl: Add DC Regions to mock mem data
Documentation/ABI/testing/sysfs-bus-cxl | 100 +-
Documentation/ABI/testing/sysfs-bus-dax | 18 +
.../driver-api/cxl/linux/cxl-driver.rst | 149 +++
.../driver-api/cxl/linux/dax-driver.rst | 169 +++
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/cdat.c | 12 +
drivers/cxl/core/core.h | 67 +-
drivers/cxl/core/extent.c | 783 ++++++++++++
drivers/cxl/core/hdm.c | 14 +-
drivers/cxl/core/mbox.c | 1107 +++++++++++++++-
drivers/cxl/core/memdev.c | 87 +-
drivers/cxl/core/port.c | 9 +
drivers/cxl/core/region.c | 53 +-
drivers/cxl/core/region_dax.c | 49 +-
drivers/cxl/core/trace.h | 75 ++
drivers/cxl/cxl.h | 114 +-
drivers/cxl/cxlmem.h | 162 ++-
drivers/cxl/mem.c | 2 +-
drivers/cxl/pci.c | 136 +-
drivers/dax/bus.c | 653 +++++++++-
drivers/dax/bus.h | 4 +-
drivers/dax/cxl.c | 115 +-
drivers/dax/dax-private.h | 63 +
drivers/dax/hmem/hmem.c | 2 +-
drivers/dax/pmem.c | 2 +-
include/cxl/cxl.h | 7 +-
include/cxl/event.h | 38 +
tools/testing/cxl/Kbuild | 5 +-
tools/testing/cxl/test/cxl.c | 12 +
tools/testing/cxl/test/mem.c | 1109 +++++++++++++++--
tools/testing/cxl/test/mock.h | 9 +
31 files changed, 4858 insertions(+), 269 deletions(-)
create mode 100644 drivers/cxl/core/extent.c
--
2.43.0