Re: [PATCH v9 12/19] cxl/extent: Process dynamic partition events and realize region extents
From: Ira Weiny
Date: Thu Mar 05 2026 - 16:57:32 EST
Anisa Su wrote:
> On Sun, Apr 13, 2025 at 05:52:20PM -0500, Ira Weiny wrote:i
> A few notes while going through and removing sparse dax semantics and plumbing
> for fs-dax mode:
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory. These
> > events contain extents describing a DPA range and meta data for memory
> > to be added or removed. Events may be sent from the device at any time.
> >
> > Three types of events can be signaled, Add, Release, and Force Release.
> >
> > On add, the host may accept or reject the memory being offered. If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> >
> > On remove, the host can delay the response until the host is safely not
> > using the memory. If no region exists the release can be sent
> > immediately. The host may also release extents (or partial extents) at
> > any time.
> Partial release is no longer valid for tagged release iirc from the calls
Tags were not supported in this version:
if (!uuid_is_null((const uuid_t *)extent->uuid)) {
dev_err_ratelimited(dev,
"DC extent DPA %pra (%pU); tags not supported\n",
&ext_range, extent->uuid);
return -ENXIO;
}
>
> > Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
> >
> [snip]
> > +
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > + struct xarray *extent_array, int cnt)
> > +{
> > + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > + struct cxl_mbox_dc_response *p;
> > + struct cxl_extent *extent;
> > + unsigned long index;
> > + u32 pl_index;
> > +
> > + size_t pl_size = struct_size(p, extent_list, cnt);
> > + u32 max_extents = cnt;
> > +
> > + /* May have to use more bit on response. */
> > + if (pl_size > cxl_mbox->payload_size) {
> > + max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > + sizeof(struct updated_extent_list);
> > + pl_size = struct_size(p, extent_list, max_extents);
> > + }
> > +
> > + struct cxl_mbox_dc_response *response __free(kfree) =
> > + kzalloc(pl_size, GFP_KERNEL);
> > + if (!response)
> > + return -ENOMEM;
> > +
> > + if (cnt == 0)
> > + return send_one_response(cxl_mbox, response, opcode, 0, 0);
> > +
> > + pl_index = 0;
> I was wondering why xarray is used here instead of a list? I didn't see anywhere
> that we need to look up a specific index to benefit from the log complexity and
> afaict, simply used to iterate over all elements.
xarray was just easier than a list.
>
> > + xa_for_each(extent_array, index, extent) {
> > + response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > + response->extent_list[pl_index].length = extent->length;
> > + pl_index++;
> > +
> > + if (pl_index == max_extents) {
> > + u8 flags = 0;
> > + int rc;
> > +
> > + if (pl_index < cnt)
> > + flags |= CXL_DCD_EVENT_MORE;
> > + rc = send_one_response(cxl_mbox, response, opcode,
> > + pl_index, flags);
> > + if (rc)
> > + return rc;
> > + cnt -= pl_index;
> > + pl_index = 0;
> > + }
> > + }
> > +
> > + if (!pl_index) /* nothing more to do */
> > + return 0;
> > + return send_one_response(cxl_mbox, response, opcode, pl_index, 0);
> > +}
> > +
[snip]
> > +static int validate_add_extent(struct cxl_memdev_state *mds,
> > + struct cxl_extent *extent)
> > +{
> > + int rc;
> > +
> > + rc = cxl_validate_extent(mds, extent);
> > + if (rc)
> > + return rc;
> > +
> > + return cxl_add_extent(mds, extent);
> > +}
> > +
> > +static int cxl_add_pending(struct cxl_memdev_state *mds)
> > +{
> > + struct device *dev = mds->cxlds.dev;
> > + struct cxl_extent *extent;
> > + unsigned long cnt = 0;
> > + unsigned long index;
> > + int rc;
> > +
> Also according to the spec:
> "In response to an Add Capacity Event Record, or multiple Add Capacity Event
> records grouped via the More flag (see Table 8-229), the host is expected to
> respond with exactly one Add Dynamic Capacity Response acknowledgment,
> corresponding to the order of the Add Capacity Events received. If the order
> does not match, the device shall return Invalid Input. The Add Dynamic Capacity
> Response acknowledgment must be sent in the same order as the Add Capacity
> Event Records."
hmmm... yea that might be wrong, I don't recall.
>
> Using xarray does not preserve the order of the extents, which requires a fifo
> queue.
It could if the index was the order.
But in the end I'm not opposed to using a list.
Ira
[snip]