Re: [PATCH] cxl/region: Fix a race bug in delete_region_store

From: Sungwoo Kim

Date: Thu Apr 02 2026 - 15:11:46 EST

On Wed, Apr 1, 2026 at 12:51 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> Sungwoo Kim wrote:
> > On Tue, Mar 10, 2026 at 6:54 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > > Sungwoo Kim wrote:
> > > > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > > > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > > > removes the devres tracking entry and returns an error code.
> > >
> > > No, that is not an acceptable or comprehensive fix. A subsystem should
> > > never try to double unregister a device. Keep in mind the decoder sysfs
> > > stays alive while the devres_release_all() actions are running for
> > > port->uport_dev. As a result, yes, Davidlohr is right. You can
> > > theoretically observe the unregister_region() release action firing
> > > while delete_region_store() is running.
> >
> > I appreciate your review and suggestion of a new patch. It seems more
> > complicated than I expected!
>
> Yes, the CXL subsystem is unfortunately... complicated.
>
> > I'm new to CXL and device drivers, so I would like to confirm that my
> > understanding is correct.
> > - CXL tolpology could be: [upstream] host -> switch(s) -> endpoints(s)
> > [downstream].
> > - The host can configure a unique address range (= region) to access
> > each device.
> > - The bug happened because releasing the same regions can race.
> > - Although the prior patch handled this, it's insufficient because it
> > can also race with releasing the device (what Davidlohr had reported).
>
> The patch was technically correct but it relies on a design that
> requires depending on a double free semantic. Apparently Rust needs it
> to enforce a higher level semantic, but in C we need to handle that
> semantic directly.
>
> > > The only comprehensive fix I currently see to that problem is to indeed
> > > get it back synchronized under the device lock. This prevents multiple
> > > requesters from colliding, and from devres_release_all() deleting an
> > > action that delete_region_store() already committed to handling.
> > >
> > > This looks something like schedule the devres action to be released in
> > > workqueue under guard(device)(&port->udev).
> >
> > I assume you meant guard(device)(port->uport_dev).
>
> Correct. When I say "something like" it almost always means "be careful
> what I am about to say is full of bugs" so, good catch.
>
> > > It is ugly, but it may be the case that this wants to synchronously
> > > delete the region, and asynchronously cleanup the release action.
> > >
> > > I.e. unregister_region() grows a new:
> > >
> > > if (!test_and_set_bit(CXL_REGION_F_DELETE, ...)
> > >
> > > ...flag to enforce only one path successfully deletes.
> > > delete_region_store() calls unregister_region() directly. Then the
> >
> > To fix this, you suggested:
> > - Add a flag to make unregister_region() idempotent.
> > - Make delete_region_store() call unregister_region() directly.
> >
> > Now unregister_region() is race-free. But we need to release an action.
> >
> > > workqueue's job is to trigger the release action under the lock only if
> > > port->uport_dev is still driver-attached by the time the workqueue runs.
> > >
> >
> > What I'm missing is this part.
> > How could asynchronous cleanup solve the race? In the worst case, the
> > workqueue could start immediately after we schedule it. In this case,
> > it's the same as the synchronous execution.
>
> Asynchronous cleanup gets us into a context where we are safe to acquire
> locks. delete_region_store can not hold guard(device)(port->uport_dev).
> You can try it to see the full lockdep splat. It is setup by fact that
> unregistering the root decoders is performed under that lock.
> Unregistering root decoder flushes userspace out of sysfs attribute
> handlers. If those are holding the lock themselves it is an ABBA
> deadlock.

I really appreciate your detailed comments. I understood that the
async work can safely wait for and attain the lock.
The lock is required to ensure that port->uport_dev is still driver-attached.
The previous patch was working but it was over-tailored for preventing
double-free.
So, using the workqueue, we can perform devm_release_action() without
losing a certain abstraction level (no over-tailoring).

>
> I circled back to this because the CXL accelerator patches also have a
> reason to autocleanup regions so this would be another source of
> cxl_region deletion to rendezvous.

Okay. other CXL types could be another reason for not depending on the
double-free semantics.
I'll add a workqueue for devm_release_action() in V3.

Thank you again,
Sungwoo.