Re: [PATCH] cxl/ras: Fix match_memdev_by_parent() pointer type mismatch
From: Alison Schofield
Date: Tue Jun 09 2026 - 13:15:48 EST
On Mon, Jun 08, 2026 at 05:43:19PM -0500, Terry Bowman wrote:
> bus_find_device() passes its data argument directly to the match
> function as a const void *. match_memdev_by_parent() compares
> dev->parent against this pointer:
>
> dev->parent == uport
>
> cxlmd->dev.parent is set in cxl_memdev_alloc() as:
>
> dev->parent = cxlds->dev; /* cxlds->dev == &pdev->dev */
>
> So cxlmd->dev.parent holds a struct device * pointing to &pdev->dev.
> However, bus_find_device() is called with pdev (struct pci_dev *)
> rather than &pdev->dev (struct device *). Since struct pci_dev does
> not begin with struct device, the two pointer values differ, causing
> the comparison to always evaluate false.
>
> As a result, cxl_cper_handle_prot_err() silently drops every CPER
> error report for CXL endpoint devices -- bus_find_device() always
> returns NULL and the function returns early without emitting any
> kernel trace event.
>
> Fix by passing &pdev->dev instead of pdev.
>
> Fixes: 3c70ec71abda ("cxl/ras: Fix CPER handler device confusion")
> Reported-by: Sashiko <sashiko@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
Hi Terry,
The commit log is burying the lede: no endpoint errors are reported.
There is no need for the full struct layout analysis in the
changelog. The important parts are the functional regression
and the pointer mismatch as its root cause.
Please reframe the commit message along the lines of background,
problem, cause, fix, and validation. Something like:
CXL endpoint CPER protocol errors are processed by ...
Following commit 3c70ec71abda, endpoint CPER protocol errors are
silently dropped and no trace events are emitted. This happens
because bus_find_device() is called with the wrong pointer type,
so the memdev parent match never succeeds.
Fix it by ...
How do we know it works now?
-- Alison
> ---
> drivers/cxl/core/ras.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 006c6ffc2f56..7ec2dab152a7 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -94,8 +94,7 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>  	if (!pdev->dev.driver)
>  		return;
>
> -	struct device *mem_dev __free(put_device) = bus_find_device(
> -		&cxl_bus_type, NULL, pdev, match_memdev_by_parent);
> +	struct device *mem_dev __free(put_device) = bus_find_device(&cxl_bus_type, NULL, &pdev->dev, match_memdev_by_parent);
>  	if (!mem_dev)
>  		return;
>
> --
> 2.34.1
>