Re: [PATCH] cxl/region: Fix a race bug in delete_region_store

From: Jonathan Cameron

Date: Mon Mar 09 2026 - 14:14:44 EST


On Mon, 9 Mar 2026 13:56:33 -0400
Sungwoo Kim <iam@xxxxxxxxxxxx> wrote:

> On Mon, Mar 9, 2026 at 8:00 AM Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx> wrote:
> >
> > On Sun, 8 Mar 2026 14:59:58 -0400
> > Sungwoo Kim <iam@xxxxxxxxxxxx> wrote:
> >
> > > A race exists when two concurrent sysfs writes to delete_region specify
> > > the same region name. Both calls succeed in cxl_find_region_by_name()
> > > (which only does device_find_child_by_name and takes a reference), and
> > > both then proceed to call devm_release_action(). The first call atomically
> > > removes and releases the devres entry successfully. The second call finds
> > > no matching entry, causing devres_release() to return -ENOENT, which trips
> > > the WARN_ON.
> > >
> > > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > > removes the devres tracking entry and returns an error code.
> >
> > Naive question (or just me being lazy). Why can't we take the
> > write lock on cxl_rwsem.region?
>
> Thanks for your review. I've just tested your suggestion, but it
> caused an ABBA deadlock:
>
> task 1:
> create_pmem_region_store
> __device_attach() ...dev_lock()
> cxl_region_can_probe() ...lock(cxl_rwsem.region)
>
> task 2:
> delete_region_store() ...lock(cxl_rwsem.region)
> unregister_region()
> device_del() ...dev_lock()
>

Thanks for chasing that down. (I was indeed just being lazy!)

Let's wait a few days to get inputs from others on this.

One horrible option would just be to have a single purpose lock to
serialize handling writes to the sysfs file. I don't much like
that solution however!

Thanks,

Jonathan

> One way to avoid a deadlock might be to not add an additional lock.