Re: [PATCH v4 21/28] cxl/extent: Process DCD events and realize region extents
From: Li, Ming4
Date: Tue Oct 08 2024 - 21:56:47 EST
On 10/8/2024 7:16 AM, ira.weiny@xxxxxxxxx wrote:
> From: Navneet Singh <navneet.singh@xxxxxxxxx>
>
> A dynamic capacity device (DCD) sends events to signal the host for
> changes in the availability of Dynamic Capacity (DC) memory. These
> events contain extents describing a DPA range and meta data for memory
> to be added or removed. Events may be sent from the device at any time.
>
> Three types of events can be signaled, Add, Release, and Force Release.
>
> On add, the host may accept or reject the memory being offered. If no
> region exists, or the extent is invalid, the extent should be rejected.
> Add extent events may be grouped by a 'more' bit which indicates those
> extents should be processed as a group.
>
> On remove, the host can delay the response until the host is safely not
> using the memory. If no region exists the release can be sent
> immediately. The host may also release extents (or partial extents) at
> any time. Thus the 'more' bit grouping of release events is of less
> value and can be ignored in favor of sending multiple release capacity
> responses for groups of release events.
>
> Force removal is intended as a mechanism between the FM and the device
> and intended only when the host is unresponsive, out of sync, or
> otherwise broken. Purposely ignore force removal events.
>
> Regions are made up of one or more devices which may be surfacing memory
> to the host. Once all devices in a region have surfaced an extent the
> region can expose a corresponding extent for the user to consume.
> Without interleaving a device extent forms a 1:1 relationship with the
> region extent. Immediately surface a region extent upon getting a
> device extent.
>
> Per the specification the device is allowed to offer or remove extents
> at any time. However, anticipated use cases can expect extents to be
> offered, accepted, and removed in well defined chunks.
>
> Simplify extent tracking with the following restrictions.
>
> 1) Flag for removal any extent which overlaps a requested
> release range.
> 2) Refuse the offer of extents which overlap already accepted
> memory ranges.
> 3) Accept again a range which has already been accepted by the
> host. Eating duplicates serves three purposes. First, this
> simplifies the code if the device should get out of sync with
> the host. And it should be safe to acknowledge the extent
> again. Second, this simplifies the code to process existing
> extents if the extent list should change while the extent
> list is being read. Third, duplicates for a given region
> which are seen during a race between the hardware surfacing
> an extent and the cxl dax driver scanning for existing
> extents will be ignored.
>
> NOTE: Processing existing extents is done in a later patch.
>
> Management of the region extent devices must be synchronized with
> potential uses of the memory within the DAX layer. Create region extent
> devices as children of the cxl_dax_region device such that the DAX
> region driver can co-drive them and synchronize with the DAX layer.
> Synchronization and management is handled in a subsequent patch.
>
> Tag support within the DAX layer is not yet supported. To maintain
> compatibility legacy DAX/region processing only tags with a value of 0
> are allowed. This defines existing DAX devices as having a 0 tag which
> makes the most logical sense as a default.
>
> Process DCD events and create region devices.
>
> Signed-off-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> Co-developed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>
Hi Ira,
I guess you missed my comments for V3, I comment it again for this patch.
> +static bool extents_contain(struct cxl_dax_region *cxlr_dax,
> + struct cxl_endpoint_decoder *cxled,
> + struct range *new_range)
> +{
> + struct device *extent_device;
> + struct match_data md = {
> + .cxled = cxled,
> + .new_range = new_range,
> + };
> +
> + extent_device = device_find_child(&cxlr_dax->dev, &md, match_contains);
> + if (!extent_device)
> + return false;
> +
> + put_device(extent_device);
could use __free(put_device) to drop this 'put_device(extent_device)'
> + return true;
> +}
[...]
> +static bool extents_overlap(struct cxl_dax_region *cxlr_dax,
> + struct cxl_endpoint_decoder *cxled,
> + struct range *new_range)
> +{
> + struct device *extent_device;
> + struct match_data md = {
> + .cxled = cxled,
> + .new_range = new_range,
> + };
> +
> + extent_device = device_find_child(&cxlr_dax->dev, &md, match_overlaps);
> + if (!extent_device)
> + return false;
> +
> + put_device(extent_device);
Same as above.
> + return true;
> +}
> +
[...]
> +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> + struct xarray *extent_array, int cnt)
> +{
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> + struct cxl_mbox_dc_response *p;
> + struct cxl_mbox_cmd mbox_cmd;
> + struct cxl_extent *extent;
> + unsigned long index;
> + u32 pl_index;
> + int rc;
> +
> + size_t pl_size = struct_size(p, extent_list, cnt);
> + u32 max_extents = cnt;
> +
> + /* May have to use more bit on response. */
> + if (pl_size > cxl_mbox->payload_size) {
> + max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> + sizeof(struct updated_extent_list);
> + pl_size = struct_size(p, extent_list, max_extents);
> + }
> +
> + struct cxl_mbox_dc_response *response __free(kfree) =
> + kzalloc(pl_size, GFP_KERNEL);
> + if (!response)
> + return -ENOMEM;
> +
> + pl_index = 0;
> + xa_for_each(extent_array, index, extent) {
> +
> + response->extent_list[pl_index].dpa_start = extent->start_dpa;
> + response->extent_list[pl_index].length = extent->length;
> + pl_index++;
> + response->extent_list_size = cpu_to_le32(pl_index);
> +
> + if (pl_index == max_extents) {
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = opcode,
> + .size_in = struct_size(response, extent_list,
> + pl_index),
> + .payload_in = response,
> + };
> +
> + response->flags = 0;
> + if (pl_index < cnt)
> + response->flags &= CXL_DCD_EVENT_MORE;
It should be 'response->flags |= CXL_DCD_EVENT_MORE' here.
Another issue is if 'cnt' is N times bigger than 'max_extents'(e,g. cnt=20, max_extents=10). all responses will be sent in this xa_for_each(), and CXL_DCD_EVENT_MORE will be set in the last response but it should not be set in these cases.
> +
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc)
> + return rc;
> + pl_index = 0;
> + }
> + }
> +
> + if (cnt == 0 || pl_index) {
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = opcode,
> + .size_in = struct_size(response, extent_list,
> + pl_index),
> + .payload_in = response,
> + };
> +
> + response->flags = 0;
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +