Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.

From: Dave Jiang

Date: Thu Apr 30 2026 - 12:05:00 EST




On 4/29/26 9:39 PM, Sungwoo Kim wrote:
> On Tue, Apr 28, 2026 at 6:33 PM Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
>>
>>
>>
>> On 4/28/26 1:28 PM, Sungwoo Kim wrote:
>>> Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
>>>
>>> Claude suggests not using wq, since it's simpler. I agree that it's
>>> simple, but it's overly tailored to fix a specific bug.
>>> Actually, v1[1] proposed a similar patch. So let me bring a patch and
>>> discussion from v1:
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 08fa3deef70ab..7ade9aa2aeecc 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
>>> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>>> struct cxl_port *port = to_cxl_port(dev->parent);
>>> struct cxl_region *cxlr;
>>> + int err;
>>>
>>> cxlr = cxl_find_region_by_name(cxlrd, buf);
>>> if (IS_ERR(cxlr))
>>> return PTR_ERR(cxlr);
>>>
>>> - devm_release_action(port->uport_dev, unregister_region, cxlr);
>>> + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
>>> + cxlr);
>>> + if (err) {
>>> + put_device(&cxlr->dev);
>>> + return err;
>>> + }
>>> + unregister_region(cxlr);
>>> put_device(&cxlr->dev);
>>>
>>> return len;
>>>
>>> However, v1 has not been merged. Dan[2] commented that "No, that is
>>> not an acceptable or comprehensive fix. A subsystem should never try
>>> to double unregister a device." Also in the following thread[3], "The
>>> patch was technically correct but it relies on a design that requires
>>> depending on a double free semantic."
>>>
>>> I respect this design decision. Then, I need to execute
>>> devm_release_[action|all]() only once, which requires a device lock,
>>> guard(device)(port->uport_dev). Under a lock, I can check a flag to
>>> execute devm_release_[action|all]() only once.
>>> To use the lock, a clean work without a prior lock is required. That's
>>> a reason this patch ended up in wq.
>>>
>>> I hope I've explained the rationale for using wq. What do you think?
>>
>> Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.
>
> I (and Claude) don't have a better solution than using wq, although I
> agree that not using wq is simpler.
> Also, I'm not yet experienced enough to decide which is better for
> CXL, so I'm happy with both directions. Would you prefer the version
> without wq?

Looks like Dan is working on something [1]. So maybe we wait and see what he comes up with.

[1]: https://lore.kernel.org/linux-cxl/c65851c1-4fca-46ba-8dde-fa10b7cb9cd3@xxxxxxx/T/#mdd1ab49c012321fcd3dc34bd0cb7c0846cf1d1f9

DJ