[RFC PATCH 2/4] cxl/extent: Fix DCD add-capacity: per-tag assembly, ordering, and integrity

From: John Groves

Date: Thu Apr 23 2026 - 19:52:29 EST


From: John Groves <John@xxxxxxxxxx>

Prior to this commit, the DCD add-capacity path mis-handles the stream
of extent events that the device uses to deliver allocations:

1. It list_sort()s the pending batch by DPA, rejects the whole batch
unless every adjacent pair is DPA-contiguous, then re-sorts by a
saved arrival index to restore order for the response.

This defeats the purpose of the extent mechanism. Extents exist
so the device can satisfy a single allocation from non-contiguous
DPA pieces; rejecting those sets means only trivially-contiguous
allocations ever become dax devices. The idx bookkeeping on
struct cxl_extent_list_node exists only to undo the DPA sort.

2. It conflates "More-chain" with "allocation". The allocation unit
in the spec is the *tag*: extents that share the same tag form one
allocation and belong in one dax device. A More-chain is a
delivery boundary, not an allocation boundary — it may carry
extents for several distinct tags. What More=0 guarantees is
completeness: for every tag that appears inside the chain, all of
that tag's extents are delivered by the time the chain closes.
Consequently, an extent bearing a tag that a previous More-chain
already committed is a firmware bug; this applies to any tagged
allocation, whether or not the extents carry sequence numbers.

Prior to this commit, the code forces all More-chain extents into a single
region_extent via cxl_add_extent()'s uuid_equal() check, so for a
multi-tag chain every allocation but the first is silently
dropped. It also has no explicit detection for a tag reappearing
in a later More-chain.

3. It orders assembly by the host's event arrival timing (via the
DPA sort + idx unsort), not by the per-allocation sequence number
the spec defines on each extent (Shared Extent Sequence Number).
The dax device's backing layout should be a property of the
device's allocation, not of host event delivery.

Note that sequence numbers are a sharable-region concern: extents
in non-sharable regions do not carry them (shared_extn_seq == 0
on every extent), and a single host may simply assemble those in
arrival order. The same comparator must handle both cases.

There is also no check that the set of sequence numbers in a tag
group is well-formed. Valid sharable values are 1..n contiguous
and unique; mixing 0 with non-zero, skipping positions, or
duplicating a position is a firmware bug that should not produce
a dax device.

4. It rejects any extent with a null (untagged) UUID up front, even
though CXL 3.1 explicitly permits untagged extents in non-sharable
regions.

5. It does not validate extent alignment. A dax device backed by
extents that are not aligned to device-dax granularity cannot be
mapped; surfacing such a device to userspace is a bug waiting to
happen.
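
For illustration only, the alignment requirement in item 5 can be
modeled in userspace C. This is a sketch, not the kernel code: the
model_* names are hypothetical, and the 2 MiB granularity is assumed
to mirror the SZ_2M value this patch uses for CXL_DCD_EXTENT_ALIGN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2 MiB, mirroring SZ_2M in the patch. */
#define MODEL_EXTENT_ALIGN (2ULL << 20)

/* Hypothetical stand-in for the start_dpa/length pair of an extent. */
struct model_range {
	uint64_t start_dpa;
	uint64_t length;
};

/* An extent is usable only if both its start and its length are aligned. */
static bool model_range_aligned(const struct model_range *r)
{
	return (r->start_dpa % MODEL_EXTENT_ALIGN) == 0 &&
	       (r->length % MODEL_EXTENT_ALIGN) == 0;
}

/* All-or-nothing: one misaligned extent disqualifies the whole group. */
static bool model_group_aligned(const struct model_range *group, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!model_range_aligned(&group[i]))
			return false;
	return true;
}
```

A group containing a 4 KiB-aligned start DPA fails as a whole in this
model, even if every other member is 2 MiB-aligned.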

Fix:

- Replace the flat DPA sort + contiguity reject + idx restore with
a per-tag-group assembly pipeline:

1. Extract this tag's pending extents to a local list.
2. Stable-sort by shared_extn_seq. For sharable extents this
walks the group in device-stamped order; for non-sharable
extents every key is 0 and the stable sort preserves arrival
order. One comparator, both cases.
3. Cross-More-chain uniqueness: if this (tagged) group's tag
already maps to a committed region_extent on its target
cxlr_dax, the device has re-sent a completed allocation —
reject the whole group with a firmware-bug warning. Skipped
for the null UUID (the spec does not define cross-chain
identity for untagged extents). Implemented as a linear walk
of cxlr_dax->region_extents comparing stored UUIDs.
4. Sequence-number integrity: a tag group is well-formed iff
every member carries shared_extn_seq == 0 (non-sharable) or
the sorted group is exactly 1, 2, ..., n (sharable). A mix,
a gap, a duplicate, or a non-zero set that does not start at
1 is a firmware bug and the whole group is dropped. For now,
the per-extent sharable-extent rejection (see below) means only
the all-zero branch is reachable; the non-zero branches are in
place so that lifting that restriction keeps the sequencing-
integrity contract enforced.
5. Alignment gate: if any extent in the group fails
CXL_DCD_EXTENT_ALIGN (SZ_2M), drop the whole group with a
warning. Partial acceptance would leave an unusable dax
device.
6. Validate + cxl_add_extent() each survivor into a fresh
region_extent.
7. Online + notify, splice accepted extents onto the response
list, clear add_ctx.region_extent for the next tag.

The outer loop picks tags in first-appearance order; the *intra-
group* order is the spec's sequence number (or, for non-sharable
extents, arrival order via stable-sort tie-breaking).

- Drop the contiguity check. Same-tag extents form one dax device
regardless of DPA layout.

- Drop the null-UUID rejection in cxl_validate_extent(). Untagged
extents may now be accepted; uuid_equal() in cxl_add_extent()
already collapses them into a single untagged region_extent, which
is the simplest conformant choice given the spec's silence on
untagged aggregation. Sharable extents (shared_extn_seq != 0) remain
unsupported at the per-extent validate stage, so for now the group-
level sequence-integrity check only exercises the all-zero branch.

- Drop the DPA sort, the restore-by-idx sort, and the idx field on
struct cxl_extent_list_node. list_sort is retained (stable sort
for the intra-tag sequence-number ordering).

- Lift cxl_add_extent()'s "region already has a region_extent" gate.
It was the single-slot safety stop; with the prior commit's xarray
in place there is no reason to reject adding a second allocation
to cxlr_dax, and the cross-More-chain uniqueness check above
handles the one case the gate actually defended against (a
same-tag re-delivery).
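
The resulting order can be sketched with a small userspace model. It
is not the kernel implementation: the patch extracts each tag group
and calls list_sort() on it, whereas this sketch folds both steps into
one stable insertion sort over an array. The names (model_extent,
model_assemble) are hypothetical, and an int stands in for the UUID
tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_extent {
	int tag;	/* stand-in for the 16-byte UUID tag */
	uint16_t seq;	/* stand-in for shared_extn_seq */
	int arrival;	/* arrival position, kept only for inspection */
};

/* Index of the first extent in arrival order that carries @tag. */
static size_t model_first_seen(const struct model_extent *in, int tag)
{
	size_t i = 0;

	/* The caller only passes tags that are present in @in. */
	while (in[i].tag != tag)
		i++;
	return i;
}

/* Strict "a sorts before b": tag first-appearance, then seq. */
static bool model_before(const struct model_extent *in,
			 const struct model_extent *a,
			 const struct model_extent *b)
{
	size_t fa = model_first_seen(in, a->tag);
	size_t fb = model_first_seen(in, b->tag);

	if (fa != fb)
		return fa < fb;
	return a->seq < b->seq;
}

/*
 * Copy @in (arrival order) to @out, with tag groups in first-appearance
 * order and each group ordered by seq. Insertion sort with a strict
 * comparison is stable, so equal keys (all-zero seq) keep arrival order.
 */
static void model_assemble(const struct model_extent *in, size_t n,
			   struct model_extent *out)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		struct model_extent key = in[i];

		for (j = i; j > 0 && model_before(in, &key, &out[j - 1]); j--)
			out[j] = out[j - 1];
		out[j] = key;
	}
}
```

In this model an untagged or non-sharable group (every seq == 0) comes
out in arrival order, while a sequenced group comes out in seq order,
using the same comparison for both.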

Fixes: 7f9b600a07e1 ("cxl/extent: Process dynamic partition events and realize region extents")
Signed-off-by: John Groves <John@xxxxxxxxxx>
Signed-off-by: John Groves <john@xxxxxxxxxx>
---
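Note for reviewers: the sequence-number well-formedness rule (every
member 0, or the sorted group exactly 1, 2, ..., n) can be exercised
outside the kernel with a userspace model such as the one below. It is
a sketch under stated assumptions, not the code in this patch:
model_seq_well_formed() is a hypothetical name and takes a plain array
that the caller has already sorted ascending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only. @seq holds the sequence numbers of one tag
 * group, already sorted ascending.
 */
static bool model_seq_well_formed(const uint16_t *seq, size_t n)
{
	size_t i;

	if (n == 0)
		return true;

	if (seq[0] == 0) {
		/* Non-sharable: every member must be 0. */
		for (i = 1; i < n; i++)
			if (seq[i] != 0)
				return false;
		return true;
	}

	/* Sharable: position i must hold exactly i + 1. */
	for (i = 0; i < n; i++)
		if ((size_t)seq[i] != i + 1)
			return false;
	return true;
}
```

In this model a mix of 0 and non-zero, a gap, a duplicate, and a set
that does not start at 1 are all rejected by the same two loops.
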
drivers/cxl/core/extent.c | 6 -
drivers/cxl/core/mbox.c | 494 +++++++++++++++++++++++++++++++-------
include/cxl/event.h | 1 -
3 files changed, 401 insertions(+), 100 deletions(-)

diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
index 44b58cd477655..559c68f10dc6b 100644
--- a/drivers/cxl/core/extent.c
+++ b/drivers/cxl/core/extent.c
@@ -428,12 +428,6 @@ int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
return -ENXIO;

cxlr_dax = cxlr->cxlr_dax;
- /* Cannot add to a region_extent once it's been onlined */
- if (!xa_empty(&cxlr_dax->region_extents)) {
- dev_err(&cxlr_dax->dev, "Can no longer add to region %d\n",
- cxlr->id);
- return -EINVAL;
- }

if (pending_region_ext &&
!uuid_equal((uuid_t *)extent->uuid, &pending_region_ext->uuid)) {
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 34ba57d0494f5..804b7846b5726 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,6 +7,7 @@
#include <linux/unaligned.h>
#include <linux/list.h>
#include <linux/list_sort.h>
+#include <linux/sizes.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
@@ -970,11 +971,13 @@ static int cxl_validate_extent_partition(struct cxl_memdev_state *mds,
}

/*
- * Check extent tag is non-null, tag is not already in use, extent belongs to a
- * region, and the extent is within the bounds of a DC partition.
- * If extent is not first in the pending_list, check tag and region match the
- * previous entry's.
+ * CXL 3.1 permits both tagged (non-null UUID) and untagged (null UUID)
+ * extents. The spec is silent on whether untagged extents from different
+ * events may be aggregated; we allow them to be combined into a single
+ * dax device for simplicity. Sharable extents (shared_extn_seq != 0) are
+ * not supported yet and are rejected here.
*
+ * Partition boundary and region-attachment are validated separately.
*/
static int cxl_validate_extent(struct cxl_memdev_state *mds,
struct cxl_extent_list_node *pos)
@@ -988,14 +991,9 @@ static int cxl_validate_extent(struct cxl_memdev_state *mds,
};
uuid_t *uuid = (uuid_t *)extent->uuid;

- if (uuid_is_null(uuid)) {
- dev_dbg(dev, "no tag for extent: %pra\n", &ext_range);
- return -EINVAL;
- }
-
if (le16_to_cpu(extent->shared_extn_seq) != 0) {
dev_dbg(dev,
- "DC extent DPA %pra (%pU) can not be shared\n",
+ "DC extent DPA %pra (%pU) is sharable; not supported\n",
&ext_range, uuid);
return -ENXIO;
}
@@ -1293,122 +1291,444 @@ static void clear_pending_extents(void *_mds)
mds->add_ctx.region_extent = NULL;
}

-static int dpa_compare(void *priv,
- const struct list_head *a,
- const struct list_head *b)
+/*
+ * Device-dax requires extent boundaries aligned to its mapping granularity.
+ * Use SZ_2M as a conservative default; a tighter check that queries the
+ * cxl_dax_region / cxl_endpoint_decoder for its actual alignment would be
+ * strictly more correct, but SZ_2M is the minimum device-dax supports on
+ * every architecture that enables CXL DCD today.
+ */
+#define CXL_DCD_EXTENT_ALIGN SZ_2M
+
+static bool cxl_extent_dcd_aligned(const struct cxl_extent *extent)
+{
+ u64 start = le64_to_cpu(extent->start_dpa);
+ u64 len = le64_to_cpu(extent->length);
+
+ return IS_ALIGNED(start, CXL_DCD_EXTENT_ALIGN) &&
+ IS_ALIGNED(len, CXL_DCD_EXTENT_ALIGN);
+}
+
+/*
+ * Compare two extents by shared_extn_seq (ascending).
+ *
+ * Per CXL 3.1 Table 8-51, shared_extn_seq is defined only for extents in
+ * *sharable* CDAT regions: those extents are required to carry both a
+ * non-null tag and a per-allocation sequence number so multiple hosts
+ * reading the same allocation assemble the extents into the same order.
+ *
+ * Extents in non-sharable regions do not carry a sequence number
+ * (shared_extn_seq == 0 on every extent); for those, a single host's
+ * arrival order is a sufficient definition of "the order the device
+ * sent them." list_sort() is stable, so when every element in a group
+ * has shared_extn_seq == 0, ties fall back to list order — which is
+ * arrival order via list_add_tail() in add_to_pending_list(). Thus
+ * the same comparator gives the right answer for both cases, and the
+ * code stays correct if/when sharable (sequenced) extents become
+ * supported.
+ */
+static int extent_seq_compare(void *priv,
+ const struct list_head *a,
+ const struct list_head *b)
{
const struct cxl_extent_list_node *ea =
list_entry(a, struct cxl_extent_list_node, list);
const struct cxl_extent_list_node *eb =
list_entry(b, struct cxl_extent_list_node, list);
+ u16 sa = le16_to_cpu(ea->extent->shared_extn_seq);
+ u16 sb = le16_to_cpu(eb->extent->shared_extn_seq);

- if (ea->extent->start_dpa < eb->extent->start_dpa)
+ if (sa < sb)
return -1;
- if (ea->extent->start_dpa > eb->extent->start_dpa)
+ if (sa > sb)
return 1;
-
return 0;
}

-static int idx_compare(void *priv,
- const struct list_head *a,
- const struct list_head *b)
+/*
+ * Move every pending extent whose tag matches @tag onto @group, preserving
+ * the order they appear in @pending. @group is left in arrival order so
+ * the caller can then sort it by shared_extn_seq with list_sort()'s stable
+ * ordering guarantee.
+ */
+static void extract_tag_group(struct list_head *pending,
+ const uuid_t *tag,
+ struct list_head *group)
{
- const struct cxl_extent_list_node *ea =
- list_entry(a, struct cxl_extent_list_node, list);
- const struct cxl_extent_list_node *eb =
- list_entry(b, struct cxl_extent_list_node, list);
+ struct cxl_extent_list_node *pos, *tmp;

- if (ea->idx < eb->idx)
- return -1;
- if (ea->idx > eb->idx)
- return 1;
+ list_for_each_entry_safe(pos, tmp, pending, list) {
+ uuid_t t;
+
+ import_uuid(&t, pos->extent->uuid);
+ if (uuid_equal(&t, tag))
+ list_move_tail(&pos->list, group);
+ }
+}

+/*
+ * Detect a tagged allocation re-appearing after its More-chain closed.
+ *
+ * A More-chain (the sequence of Add-Capacity events terminated by
+ * More=0) guarantees completeness for every tag it carries: once the
+ * chain ends, no extent bearing a tag that appeared inside it may
+ * arrive in any later chain. This is true for tagged extents whether
+ * or not they carry shared_extn_seq — sequencing is a sharable-region
+ * concern, completeness is a general one.
+ *
+ * Detection here is a linear walk of cxlr_dax->region_extents (keyed
+ * by allocator-assigned IDs, not by UUID) comparing each stored
+ * region_extent's UUID against the incoming tag.
+ *
+ * Returns true iff @tag is non-null AND a region_extent with a
+ * matching uuid already exists on the target region. For an untagged
+ * (null-UUID) extent the check is skipped: the spec is silent on
+ * aggregating untagged extents across More-chains, so we don't
+ * manufacture a rule here.
+ */
+static bool cxl_tag_already_committed(struct cxl_memdev_state *mds,
+ struct cxl_extent *extent,
+ const uuid_t *tag)
+{
+ u64 start_dpa = le64_to_cpu(extent->start_dpa);
+ struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+ struct cxl_dax_region *cxlr_dax;
+ struct region_extent *re;
+ struct cxl_region *cxlr;
+ unsigned long idx;
+
+ if (uuid_is_null(tag))
+ return false;
+
+ guard(rwsem_read)(&cxl_rwsem.region);
+ cxlr = cxl_dpa_to_region(cxlmd, start_dpa, NULL);
+ if (!cxlr)
+ return false;
+
+ cxlr_dax = cxlr->cxlr_dax;
+ xa_for_each(&cxlr_dax->region_extents, idx, re) {
+ if (uuid_equal(&re->uuid, tag))
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Validate shared_extn_seq across a tag group already sorted ascending.
+ *
+ * Per CXL 3.1 Table 8-51, shared_extn_seq is the per-allocation
+ * extent sequence number. Interpretation:
+ *
+ * - For extents in non-sharable regions the field is unused; every
+ * extent of the allocation carries shared_extn_seq == 0.
+ * - For extents in sharable regions the field carries the device's
+ * stamped position within the allocation. Valid values are 1..n
+ * where n is the number of extents in the allocation; the set
+ * must be contiguous (no gaps), unique (no duplicates), and
+ * complete (no missing positions). 0 is reserved as the
+ * "non-sharable" marker and is not a valid sharable sequence
+ * number.
+ *
+ * Hence a tag group is well-formed iff either (a) every extent has
+ * shared_extn_seq == 0, or (b) the sorted group is exactly 1, 2, ...,
+ * n. Anything else — a mix of 0 and non-zero values, a non-zero set
+ * that does not start at 1, a gap, or a duplicate — is a device
+ * firmware bug. Reject the whole group in those cases; partial
+ * acceptance would surface a dax device whose backing layout does
+ * not reflect the device's allocation.
+ *
+ * NOTE: at the time of this patch cxl_validate_extent() still rejects
+ * any extent with shared_extn_seq != 0 per-extent (sharable extents are
+ * not yet surfaced). This group-level check therefore only exercises
+ * the all-zero arm in the current driver; the non-zero arms are in
+ * place so that lifting the per-extent restriction does not leave the
+ * sequencing-integrity contract unenforced.
+ */
+static int cxl_check_group_seq(struct device *dev,
+ const uuid_t *tag,
+ const struct list_head *group)
+{
+ struct cxl_extent_list_node *pos;
+ u16 first, expected;
+
+ if (list_empty(group))
+ return 0;
+
+ pos = list_first_entry(group, struct cxl_extent_list_node, list);
+ first = le16_to_cpu(pos->extent->shared_extn_seq);
+
+ if (first == 0) {
+ /* Non-sharable: every member must be 0. */
+ list_for_each_entry(pos, group, list) {
+ if (le16_to_cpu(pos->extent->shared_extn_seq) != 0) {
+ dev_warn(dev,
+ "Tag %pUb: shared_extn_seq mixed 0/non-zero in one allocation (firmware bug)\n",
+ tag);
+ return -EINVAL;
+ }
+ }
+ return 0;
+ }
+
+ /* Sharable: group must be exactly 1, 2, ..., n (contiguous). */
+ if (first != 1) {
+ dev_warn(dev,
+ "Tag %pUb: shared_extn_seq starts at %u, expected 1 (firmware bug)\n",
+ tag, first);
+ return -EINVAL;
+ }
+
+ expected = 1;
+ list_for_each_entry(pos, group, list) {
+ u16 s = le16_to_cpu(pos->extent->shared_extn_seq);
+
+ if (s != expected) {
+ dev_warn(dev,
+ "Tag %pUb: shared_extn_seq gap/dup: expected %u got %u (firmware bug)\n",
+ tag, expected, s);
+ return -EINVAL;
+ }
+ expected++;
+ }
return 0;
}

/*
- * Validate and add contiguous extents. Removes invalid, non-contiguous, or
- * mismatched extents from pending_list. Sorts by DPA for processing, then
- * restores original order for response.
+ * Assemble the pending Add-Capacity events into dax devices and send the
+ * ADD_DC_RESPONSE.
+ *
+ * Spec semantics (CXL 3.1 8.2.9.9.9.3 / 8.2.9.2.1.6):
+ *
+ * - The unit of allocation is a *tag*, not a More-chain. All extents
+ * that share the same tag form one allocation and must be assembled
+ * into a single dax device. For extents in sharable CDAT regions
+ * a non-null tag is required; for extents in non-sharable regions
+ * the tag is optional — the null UUID is a valid "untagged"
+ * allocation identity.
+ *
+ * - Within a tag, extents must be ordered by shared_extn_seq (the
+ * per-allocation sequence number, Table 8-51). shared_extn_seq is
+ * a sharable-region concern: multiple hosts reading the same
+ * allocation need to agree on assembly order, so the device stamps
+ * each extent with its position. For non-sharable extents the
+ * spec does not provide sequence numbers (shared_extn_seq == 0 on
+ * every extent); the lone host simply assembles in arrival order.
+ * list_sort() is stable, so one comparator handles both cases:
+ * sequence-number order when it is populated, arrival order when
+ * every tie key is zero.
+ *
+ * Valid sharable values are 1..n, contiguous and unique across the
+ * n extents of one allocation; 0 is reserved for the non-sharable
+ * marker. A tag group is well-formed iff either every member is
+ * 0 or the sorted group is exactly 1, 2, ..., n. See
+ * cxl_check_group_seq().
+ *
+ * - A More-chain is a delivery boundary, not an allocation boundary:
+ * it may carry extents for several distinct tags. What More=0
+ * guarantees is completeness — for every tag that appears inside
+ * the chain, all of that tag's extents are delivered by the time
+ * the chain closes. This completeness guarantee applies to tagged
+ * allocations regardless of whether the extents carry sequence
+ * numbers. Therefore, receiving an extent bearing a tag that a
+ * previous More-chain already committed is a device firmware bug:
+ * the tag's allocation was supposed to have been complete when its
+ * chain closed. The untagged case is excluded — the spec does not
+ * define a cross-chain identity for untagged extents.
+ *
+ * - An allocation is not required to be DPA-contiguous; extents exist
+ * precisely so the device can satisfy one allocation from scattered
+ * DPA pieces.
+ *
+ * - Untagged extents from distinct events: the spec is silent on
+ * aggregation. Collapsing them into a single untagged dax device
+ * is the simplest conformant choice and is what the existing
+ * cxl_add_extent()/uuid_equal() logic implements for the null-UUID
+ * case.
+ *
+ * Enforced here, per tag group (in first-appearance order of the tag):
+ *
+ * 1. Extract the group to a local list, then stable-sort by
+ * shared_extn_seq. For sharable extents this walks the group
+ * in device-stamped sequence order; for non-sharable extents
+ * every key is 0 and the stable sort preserves arrival order.
+ * 2. Cross-More-chain uniqueness — if this (tagged) group's tag
+ * already maps to a committed region_extent on its target
+ * cxlr_dax, the device has re-sent a completed allocation.
+ * Drop the whole group with a firmware-bug warning. Skipped
+ * for the null UUID.
+ * 3. Sequence-number integrity — either every member carries
+ * shared_extn_seq == 0 (non-sharable allocation) or the sorted
+ * group is exactly 1, 2, ..., n (sharable). Mix, gap,
+ * duplicate, or a non-zero set that does not start at 1 is a
+ * firmware bug; drop the whole group.
+ * 4. Alignment gate — every extent's start_dpa and length must be
+ * CXL_DCD_EXTENT_ALIGN-aligned, else drop the whole group with
+ * a warning. Partial acceptance would leave an unusable dax
+ * device.
+ * 5. Validate + cxl_add_extent() each surviving extent into a fresh
+ * region_extent built up in add_ctx.
+ * 6. Online + notify the region_extent, splice accepted extents
+ * into the response list, clear the add_ctx slot so the next
+ * tag's group can build its own. online_region_extent() inserts
+ * each realized region_extent into cxlr_dax->region_extents
+ * (an xarray keyed by an allocator-assigned ID, not by UUID),
+ * which is what allows multiple tag groups to surface under
+ * one DAX region.
*/
static int cxl_add_pending(struct cxl_memdev_state *mds)
{
struct device *dev = mds->cxlds.dev;
+ struct list_head *pending = &mds->add_ctx.pending_extents;
struct cxl_extent_list_node *pos, *tmp;
- struct region_extent *pending_reg_ext;
- struct cxl_extent *extent;
- u64 prev_end, start, len;
- int cnt = 0, rc;
+ LIST_HEAD(accepted);
+ int total_accepted = 0;
+
+ while (!list_empty(pending)) {
+ LIST_HEAD(group);
+ struct region_extent *reg_ext;
+ bool aligned = true;
+ int group_cnt = 0;
+ uuid_t tag;
+ int rc;

- list_sort(NULL, &mds->add_ctx.pending_extents, dpa_compare);
- list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list) {
- extent = pos->extent;
- start = le64_to_cpu(extent->start_dpa);
- len = le64_to_cpu(extent->length);
-
- /* Start enforcing contiguity after accepting first extent */
- if (cnt && start != prev_end) {
- dev_dbg(dev,
- "Non-contiguous extent DPA:%#llx LEN:%#llx\n",
- start, len);
- delete_extent_node(pos);
+ /*
+ * (1) Extract this tag's extents from pending, then order
+ * them by shared_extn_seq. The tag is taken from the first
+ * extent still on pending, so groups are processed in
+ * first-appearance order; extents *within* a group are ordered
+ * by the per-allocation sequence number the spec defines.
+ */
+ import_uuid(&tag,
+ list_first_entry(pending,
+ struct cxl_extent_list_node,
+ list)->extent->uuid);
+ extract_tag_group(pending, &tag, &group);
+ list_sort(NULL, &group, extent_seq_compare);
+
+ /*
+ * (2) Cross-More-chain uniqueness. A non-null tag seen in
+ * this group must not already correspond to a committed
+ * region_extent on its target cxlr_dax: More=0 was supposed
+ * to close that allocation. Firmware bug — reject the whole
+ * group. Any extent in the group maps to the same region
+ * (same tag == same allocation == same target), so checking
+ * the first suffices.
+ */
+ pos = list_first_entry(&group, struct cxl_extent_list_node,
+ list);
+ if (cxl_tag_already_committed(mds, pos->extent, &tag)) {
+ dev_warn(dev,
+ "Tag %pUb: dropping group, tag already committed in a previous More-chain (firmware bug)\n",
+ &tag);
+ list_for_each_entry_safe(pos, tmp, &group, list)
+ delete_extent_node(pos);
continue;
}

- if (cxl_validate_extent(mds, pos)) {
- delete_extent_node(pos);
+ /*
+ * (3) Sequence-number integrity. All-zero (non-sharable)
+ * or exact 1..n contiguous (sharable). Anything else is a
+ * firmware bug — reject the whole group; no partial
+ * acceptance.
+ */
+ if (cxl_check_group_seq(dev, &tag, &group)) {
+ list_for_each_entry_safe(pos, tmp, &group, list)
+ delete_extent_node(pos);
continue;
}

- if (cxl_add_extent(mds, extent)) {
- dev_dbg(dev,
- "Failed to add extent DPA:%#llx LEN:%#llx\n",
- start, len);
- delete_extent_node(pos);
+ /* (4) Alignment gate — abort the group if any member fails */
+ list_for_each_entry(pos, &group, list) {
+ if (!cxl_extent_dcd_aligned(pos->extent)) {
+ dev_warn(dev,
+ "Tag %pUb: dropping group, extent DPA:%#llx LEN:%#llx not %u-aligned\n",
+ &tag,
+ le64_to_cpu(pos->extent->start_dpa),
+ le64_to_cpu(pos->extent->length),
+ CXL_DCD_EXTENT_ALIGN);
+ aligned = false;
+ break;
+ }
+ }
+ if (!aligned) {
+ list_for_each_entry_safe(pos, tmp, &group, list)
+ delete_extent_node(pos);
continue;
}

- prev_end = start + len;
- cnt++;
- }
+ /*
+ * (5) Validate + attach in seq order. Surviving nodes stay
+ * on @group in seq order; failed nodes are removed.
+ */
+ list_for_each_entry_safe(pos, tmp, &group, list) {
+ if (cxl_validate_extent(mds, pos)) {
+ delete_extent_node(pos);
+ continue;
+ }

- if (!mds->add_ctx.region_extent) {
- dev_dbg(dev, "No valid extents in list; accept none\n");
- return 0;
- }
+ if (cxl_add_extent(mds, pos->extent)) {
+ dev_dbg(dev,
+ "Tag %pUb: failed to add extent DPA:%#llx LEN:%#llx\n",
+ &tag,
+ le64_to_cpu(pos->extent->start_dpa),
+ le64_to_cpu(pos->extent->length));
+ delete_extent_node(pos);
+ continue;
+ }
+ group_cnt++;
+ }

- pending_reg_ext = mds->add_ctx.region_extent;
- /* Ensure caches are clean prior onlining */
- rc = cxl_region_invalidate_memregion(pending_reg_ext->cxlr_dax->cxlr);
- if (rc)
- return rc;
+ /* (6) online + notify */
+ reg_ext = mds->add_ctx.region_extent;
+ if (!reg_ext) {
+ /* Every extent in the group was dropped */
+ continue;
+ }

- /* device model handles freeing region_extent */
- rc = online_region_extent(pending_reg_ext);
- if (rc)
- return rc;
+ rc = cxl_region_invalidate_memregion(reg_ext->cxlr_dax->cxlr);
+ if (!rc)
+ rc = online_region_extent(reg_ext);
+ if (rc) {
+ dev_warn(dev,
+ "Tag %pUb: failed to online region_extent (%d)\n",
+ &tag, rc);
+ /*
+ * region_extent was not onlined; the allocation
+ * failed. Drop its extents so we do not mis-report
+ * acceptance to the device.
+ */
+ list_for_each_entry_safe(pos, tmp, &group, list)
+ delete_extent_node(pos);
+ } else {
+ rc = cxlr_notify_extent(reg_ext->cxlr_dax->cxlr,
+ DCD_ADD_CAPACITY, reg_ext);
+ if (rc)
+ region_rm_extent(reg_ext);
+ /* Keep accepted extents for the response */
+ list_splice_tail_init(&group, &accepted);
+ total_accepted += group_cnt;
+ }
+
+ /* Next tag's group gets a fresh add_ctx slot */
+ mds->add_ctx.region_extent = NULL;
+ }

- rc = cxlr_notify_extent(pending_reg_ext->cxlr_dax->cxlr,
- DCD_ADD_CAPACITY,
- pending_reg_ext);
/*
- * The region device was briefly live but DAX layer ensures it was not
- * used
+ * Response payload: all accepted extents, grouped by tag (in the
+ * tag's first-appearance order), each group ordered by
+ * shared_extn_seq. pending_extents is empty at this point since
+ * every tag group was extracted; splice the accepted list in so
+ * cxl_send_dc_response() can walk a single list.
*/
- if (rc)
- region_rm_extent(pending_reg_ext);
-
- /* Restore remaining extents to original order and send rsp */
- list_sort(NULL, &mds->add_ctx.pending_extents, idx_compare);
+ list_splice(&accepted, pending);
return cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
- &mds->add_ctx.pending_extents, cnt);
+ pending, total_accepted);
}

static int add_to_pending_list(struct list_head *pending_list,
struct cxl_extent *to_add)
{
- struct cxl_extent_list_node *node, *prev;
+ struct cxl_extent_list_node *node;
struct cxl_extent *extent;

node = kzalloc(sizeof(*node), GFP_KERNEL);
@@ -1420,18 +1740,6 @@ static int add_to_pending_list(struct list_head *pending_list,

node->extent = extent;
list_add_tail(&node->list, pending_list);
-
- /*
- * List is sorted by DPA when adding. Save original index to restore
- * order when sending DC rsp, as required by the spec.
- */
- if (list_is_first(&node->list, pending_list)) {
- node->idx = 0;
- } else {
- prev = list_prev_entry(node, list);
- node->idx = prev->idx + 1;
- }
-
return 0;
}

diff --git a/include/cxl/event.h b/include/cxl/event.h
index fbd95e381e414..fa3cd895f656f 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -156,7 +156,6 @@ struct cxl_extent {
struct cxl_extent_list_node {
struct cxl_extent *extent;
struct list_head list;
- int idx;
int rid;
};

--
2.53.0