Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]

From: Dave Jiang

Date: Thu Jun 04 2026 - 11:56:13 EST




On 6/4/26 6:28 AM, Li Ming wrote:
>
> 在 2026/6/4 06:40, Alison Schofield 写道:
>> On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
>>> cxl_region_remove_target() leaves a NULL pointer in the slot of the
>>> removable endpoint decoder in p->targets array. However, p->targets
>>> array replies on p->nr_targets to determine validity, which means when
>>> p->nr_targets == p->interleave_ways, driver assumes all elements from
>>> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
>>> violates this assumption and causes the driver to treat a NULL pointer
>>> as a valid endpoint decoder.
>>>
>>> To fix this issue, when a endpoint decoder is removed by
>>> cxl_region_remove_target(), always swap the last valid endpoint decoder
>>> pointer into the slot of removal endpoint decoder to ensure all pointers
>>> before p->targets[p->nr_targets] are valid.
>>>
>>> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
>>> Suggested-by: Alison Schofield <alison.schofield@xxxxxxxxx>
>>> Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
>>> ---
>>>   drivers/cxl/core/region.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index e90c024c8036..54018db87a4c 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>>>               p->nr_targets--;
>>>               cxled->state = CXL_DECODER_STATE_AUTO;
>>>               cxled->pos = -1;
>>> -            p->targets[i] = NULL;
>>> +
>>> +            /*
>>> +             * Swap the last valid target into the slot to
>>> +             * ensure no invalid target in p->nr_targets range.
>>> +             * The targets array will be re-sorted during the
>>> +             * last endpoint decoder attaching again.
>>> +             */
>>> +            p->targets[i] = p->targets[p->nr_targets];
>>> +            p->targets[p->nr_targets] = NULL;
>>>                 return 1;
>>>           }
>> Hi Ming,
>>
>> I'm replying to top post here, but I have read the Sashiko response
>> and your response to that.
>>
>> I'm offering review on the target list holes, but deferring on the
>> issue with cxl_rr_free_decoder because I think it's a narrow window
>> and it would not belong in *this* patch. (and maybe I'm running
>> out of steam too ;))
>>
>> For the target list holes. I think there may be a single change
>> that can fix both the site you've fixed in this patch, and the
>> decoder detach site that Sashiko calls out.
>>
>> Rather than add compaction at each removal site, make the AUTO
>> 'appender' insert the decoder in the first free slot instead of
>> blindly at p->targets[p->nr_targets].
>>
>>      /* Use first free slot. Do not assume nr_targets is dense */
>>      for (pos = 0; pos < p->interleave_ways; pos++)
>>              if (!p->targets[pos])
>>                      break;
>>      ...
>>      p->targets[pos] = cxled;
>>      cxled->pos = pos;
> Yes, I think it can solve the problem, I will take a try.
>>
>>
>> My reasoning, that you'll need to prove -
>>
>> - Removal only leaves a hole. The damage happens later in the appender
>> Fix the appender and the hole becomes harmless no matter who created it.
>>
>> - It covers the cancel-auto site because a hole left by the staging
>> cancel is skipped by the next append, so the swap-compaction in this
>> patch is no longer needed.
>
> Yes, if we use finding the first free slot for a decoder attachment, we will not need swap-compaction. But we should reuse p->interleave_ways instead of p->nr_targets for walking p->targets array in cxl_region_remove_target(), consecutive detach will have problem without that. Like this:
>
> p->nr_targets = 4
>
> [-1, 1, 2, 3]    # __cxl_decoder_detach() called for pos 1, remove it.
>
>
> p->nr_targets = 3
>
> [-1, NULL, 2, 3] # __cxl_decoder_detach() called for pos 3, but it have no chance to be accessed by cxl_region_remove_target().
>
>
> Maybe Dave could drop the OOB fix, I will send out a patchset that includes both the "insert decoder in the first free slot" and the OOB fix.
>

OOB fix dropped from cxl/next

DJ

>
>>
>> - It covers the decoder detach site similarly (per Sashiko). The NULL
>> left by __cxl_decoder_detach() is filled on re-attach instead of being
>> appended past. Your reproduce seems like it would verify that.
>>
>> - It needs no manual-vs-auto special case. An AUTO region has exactly
>> interleave_ways slots and interleave_ways members, so first-free-slot
>> keeps the array dense whenever it is full. The manual path is
>> untouched.
>>
>> I'd probably rename to something like:
>> cxl/region: Fill first free targets[] slot during auto-discovery
> Will do
>>
>> BTW the fixes tag is not the OOB fix but is the original commit,
>> the same one you used in the OOB fix. 87805c32e6ad
>
> Sure, Will fix it, thanks.
>
>
> Ming
>
>>
>> -- Alison
>>
>>
>>> ---
>>> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
>>> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>>>
>>> Best regards,
>>> -- 
>>> Li Ming <ming.li@xxxxxxxxxxxx>
>>>
>>>