Re: [PATCH] cxl/region: Fix a race bug in delete_region_store

From: Sungwoo Kim

Date: Mon Mar 09 2026 - 14:02:17 EST


On Mon, Mar 9, 2026 at 8:00 AM Jonathan Cameron
<jonathan.cameron@xxxxxxxxxx> wrote:
>
> On Sun, 8 Mar 2026 14:59:58 -0400
> Sungwoo Kim <iam@xxxxxxxxxxxx> wrote:
>
> > A race exists when two concurrent sysfs writes to delete_region specify
> > the same region name. Both calls succeed in cxl_find_region_by_name()
> > (which only does device_find_child_by_name and takes a reference), and
> > both then proceed to call devm_release_action(). The first call atomically
> > removes and releases the devres entry successfully. The second call finds
> > no matching entry, causing devres_release() to return -ENOENT, which trips
> > the WARN_ON.
> >
> > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > removes the devres tracking entry and returns an error code.
>
> Naive question (or just me being lazy). Why can't we take the
> write lock on cxl_rwsem.region?

Thanks for your review. I've just tested your suggestion, but it
caused an ABBA deadlock:

task 1:
create_pmem_region_store
__device_attach() ...dev_lock()
cxl_region_can_probe() ...lock(cxl_rwsem.region)

task 2:
delete_region_store() ...lock(cxl_rwsem.region)
unregister_region()
device_del() ...dev_lock()

One way to avoid a deadlock might be to not add an additional lock.