Re: [PATCH v2 1/1] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()
From: Alison Schofield
Date: Wed May 27 2026 - 22:10:19 EST
On Wed, May 27, 2026 at 09:32:54PM +0800, Li Ming wrote:
>
> 在 2026/5/27 07:05, Alison Schofield 写道:
> > On Sun, May 24, 2026 at 05:56:20PM +0800, Li Ming wrote:
> > > In cxl_cancel_auto_attach(), it assumes cxled->pos is a valid index for
> > > accessing p->targets[]. However, cxled->pos can be set to negative errno
> > > in cxl_region_sort_targets() if cxl_calc_interleave_pos() fails. This
> > > causes the driver to use a negative index to access p->targets[],
> > > resulting in out-of-bounds access.
> > >
> > > Fix it by walking p->targets[] instead of using cxled->pos directly.
> > >
> > > Fixes: 87805c32e6ad ("cxl/region: Fix use-after-free from auto assembly failure")
> > > Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
> > > ---
> > > Changes from v1:
> > > - Update commit log. (Dave)
> > > - Update comment in cxl_region_sort_targets(). (Dave)
> > > - Fix potential 'hole' issue. (Alison & Sashiko review)
> > The switch in v2 to walk using p->interleave_ways instead of
> > p->nr_targets fixes the 'hole' issue inside the cancel walker
> > but there are more places where a corrupt target array will
> > cause problems.
> >
> > cxl_region_attach_auto() and cxl_region_sort_targets() both assume
> > that p->targets[] has no NULL gap. We need to compact that array on
> > cancel so that nr_targets remains the index of the first free slot,
> > with no gap below it.
>
> I think I got your point, if cxl_region_attach_auto() is called after
> cxl_cancel_auto_attach(), It will rewrite p->targets[p->nr_targets] with the
> new cxled, the previous cxled in the slot will be dropped.
>
Yes, here's a sample 'arrival plan' that goes bad:
4-way auto region with decoder A B C D arriving one at a time
arrive A, B, C targets=[A, B, C,] nr==3
B memdev unbinds and cxl_cancel_auto_attach() removes B
targets=[A, NULL, C] nr==2 hole, and C at index 2
B memdev rebinds and cxl_region_attach_auto()
targets[2] = B nr is 2 and that overwrites C
targets=[A, NULL, B] nr==3, C orphaned
arrive D targets=[A, NULL, B, D] nr==4 == interleave_ways
Let's go sort because nr now equal interleave_ways:
i=0 that's A and OK
i=1 that's NULL, dereferencing cxled leads to panic.
>
> > If you do want to fix both things in one patch, then uplevel the subject
> > line to something like:
> > cxl/region: Fix corruption of p->targets[] on auto-attach cancellation
> > and have the body describe and address both the OOB access and the compaction.
> >
> > BTW I don't feel too strongly about combined vs separate patches, but I
> > don't think this OOB patch should add in a partial fixup for the hole issue.
>
> If my understanding is correct, I think they are two different issues, split
> into two separate patches would be better.
I would like to see this patch go to v3 without this change:
- Fix potential 'hole' issue. (Alison & Sashiko review) That
means don't change your walker to use interleave_ways, go back
to how it was in v1. That was a partial fix for hole that had
nothing to do w the OOB on -ENXIO issue.
Let's get that merged. Then do the hole fixup separate.
The hole fixup is small, just the compaction in the cancel walk
but it's a separate corruption of the target list, so separating
the fixup is good.
-- Alison
>
> Ming
>
> >
> > > ---
> > > drivers/cxl/core/region.c | 40 +++++++++++++++++++--------------------
> > > 1 file changed, 19 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index e50dc716d4e8..8956b049c462 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -2011,8 +2011,9 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
> > > cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
> > > /*
> > > * Record that sorting failed, but still continue to calc
> > > - * cxled->pos so that follow-on code paths can reliably
> > > - * do p->targets[cxled->pos] to self-reference their entry.
> > > + * cxled->pos so that cxl_calc_interleave_pos() emits its
> > > + * dev_dbg() for every member. which is useful for auto
> > > + * discovery debug.
> > > */
> > > if (cxled->pos < 0)
> > > rc = -ENXIO;
> > > @@ -2202,18 +2203,30 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> > > return 0;
> > > }
> > > -static int cxl_region_by_target(struct device *dev, const void *data)
> > > +static int cxl_region_remove_target(struct device *dev, void *data)
> > > {
> > > - const struct cxl_endpoint_decoder *cxled = data;
> > > + struct cxl_endpoint_decoder *cxled = data;
> > > struct cxl_region_params *p;
> > > struct cxl_region *cxlr;
> > > + int i;
> > > if (!is_cxl_region(dev))
> > > return 0;
> > > cxlr = to_cxl_region(dev);
> > > p = &cxlr->params;
> > > - return p->targets[cxled->pos] == cxled;
> > > + for (i = 0; i < p->interleave_ways; i++) {
> > > + if (p->targets[i] == cxled) {
> > > + p->nr_targets--;
> > > + cxled->state = CXL_DECODER_STATE_AUTO;
> > > + cxled->pos = -1;
> > > + p->targets[i] = NULL;
> > > +
> > > + return 1;
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > }
> > > /*
> > > @@ -2222,25 +2235,10 @@ static int cxl_region_by_target(struct device *dev, const void *data)
> > > */
> > > static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
> > > {
> > > - struct cxl_region_params *p;
> > > - struct cxl_region *cxlr;
> > > - int pos = cxled->pos;
> > > -
> > > if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
> > > return;
> > > - struct device *dev __free(put_device) =
> > > - bus_find_device(&cxl_bus_type, NULL, cxled, cxl_region_by_target);
> > > - if (!dev)
> > > - return;
> > > -
> > > - cxlr = to_cxl_region(dev);
> > > - p = &cxlr->params;
> > > -
> > > - p->nr_targets--;
> > > - cxled->state = CXL_DECODER_STATE_AUTO;
> > > - cxled->pos = -1;
> > > - p->targets[pos] = NULL;
> > > + bus_for_each_dev(&cxl_bus_type, NULL, cxled, cxl_region_remove_target);
> > > }
> > > static struct cxl_region *
> > > --
> > > 2.43.0
> > >