Re: [PATCH] cxl/ras: Fix match_memdev_by_parent() pointer type mismatch

From: Bowman, Terry

Date: Tue Jun 09 2026 - 14:18:51 EST


On 6/9/2026 12:10 PM, Alison Schofield wrote:
> On Mon, Jun 08, 2026 at 05:43:19PM -0500, Terry Bowman wrote:
>> bus_find_device() passes its data argument directly to the match
>> function as a const void *. match_memdev_by_parent() compares
>> dev->parent against this pointer:
>>
>> dev->parent == uport
>>
>> cxlmd->dev.parent is set in cxl_memdev_alloc() as:
>>
>> dev->parent = cxlds->dev; /* cxlds->dev == &pdev->dev */
>>
>> So cxlmd->dev.parent holds a struct device * pointing to &pdev->dev.
>> However, bus_find_device() is called with pdev (struct pci_dev *)
>> rather than &pdev->dev (struct device *). Since struct pci_dev does
>> not begin with struct device, the two pointer values differ, causing
>> the comparison to always evaluate false.
>>
>> As a result, cxl_cper_handle_prot_err() silently drops every CPER
>> error report for CXL endpoint devices -- bus_find_device() always
>> returns NULL and the function returns early without emitting any
>> kernel trace event.
>>
>> Fix by passing &pdev->dev instead of pdev.
>>
>> Fixes: 3c70ec71abda ("cxl/ras: Fix CPER handler device confusion")
>> Reported-by: Sashiko <sashiko@xxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>
> Hi Terry,
>
> The commit log is burying the lede: no endpoint errors are reported.
>
> There is no need for the full struct layout analysis in the
> changelog. The important parts are the functional regression
> and the pointer mismatch as its root cause.
>
> Please reframe the commit message along the lines of background,
> problem, cause, fix, and validation. Something like:
>
> CXL endpoint CPER protocol errors are processed by ...
>
> Following commit 3c70ec71abda, endpoint CPER protocol errors are
> silently dropped and no trace events are emitted. This happens
> because bus_find_device() is called with the wrong pointer type,
> so the memdev parent match never succeeds.
>
> Fix it by ...
>

Ok.

>
> How do we know it works now?
>
> -- Alison
>
>

I have not tested this patch yet.

- Terry

>
>
>> ---
>> drivers/cxl/core/ras.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 006c6ffc2f56..7ec2dab152a7 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -94,8 +94,7 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>> if (!pdev->dev.driver)
>> return;
>>
>> - struct device *mem_dev __free(put_device) = bus_find_device(
>> - &cxl_bus_type, NULL, pdev, match_memdev_by_parent);
>> + struct device *mem_dev __free(put_device) = bus_find_device(&cxl_bus_type, NULL, &pdev->dev, match_memdev_by_parent);
>> if (!mem_dev)
>> return;
>>
>> --
>> 2.34.1
>>