Re: [PATCH] cxl/ras: Fix match_memdev_by_parent() pointer type mismatch
From: Alison Schofield
Date: Tue Jun 09 2026 - 16:34:53 EST
On Tue, Jun 09, 2026 at 01:17:32PM -0500, Bowman, Terry wrote:
> On 6/9/2026 12:10 PM, Alison Schofield wrote:
> > On Mon, Jun 08, 2026 at 05:43:19PM -0500, Terry Bowman wrote:
> >> bus_find_device() passes its data argument directly to the match
> >> function as a const void *. match_memdev_by_parent() compares
> >> dev->parent against this pointer:
> >>
> >> dev->parent == uport
> >>
> >> cxlmd->dev.parent is set in cxl_memdev_alloc() as:
> >>
> >> dev->parent = cxlds->dev; /* cxlds->dev == &pdev->dev */
> >>
> >> So cxlmd->dev.parent holds a struct device * pointing to &pdev->dev.
> >> However, bus_find_device() is called with pdev (struct pci_dev *)
> >> rather than &pdev->dev (struct device *). Since struct pci_dev does
> >> not begin with struct device, the two pointer values differ, causing
> >> the comparison to always evaluate false.
> >>
> >> As a result, cxl_cper_handle_prot_err() silently drops every CPER
> >> error report for CXL endpoint devices -- bus_find_device() always
> >> returns NULL and the function returns early without emitting any
> >> kernel trace event.
> >>
> >> Fix by passing &pdev->dev instead of pdev.
> >>
> >> Fixes: 3c70ec71abda ("cxl/ras: Fix CPER handler device confusion")
> >> Reported-by: Sashiko <sashiko@xxxxxxxxxxxxxxxxxxx>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> >
> > Hi Terry,
> >
> > The commit log is burying the lede: no endpoint errors are reported.
> >
> > There is no need for the full struct layout analysis in the
> > changelog. The important parts are the functional regression
> > and the pointer mismatch as its root cause.
> >
> > Please reframe the commit message along the lines of background,
> > problem, cause, fix, and validation. Something like:
> >
> > CXL endpoint CPER protocol errors are processed by ...
> >
> > Following commit 3c70ec71abda, endpoint CPER protocol errors are
> > silently dropped and no trace events are emitted. This happens
> > because bus_find_device() is called with the wrong pointer type,
> > so the memdev parent match never succeeds.
> >
> > Fix it by ...
> >
>
> Ok.
>
> >
> > How do we know it works now?
> >
> > -- Alison
> >
> >
>
> I have not tested this patch yet.

I am intentionally being a pest about the commit message. I am not
trying to be a pest about testing this patch: the existing code is
obviously wrong, and the errors obviously cannot be reported until
it is fixed.

I was just after confirmation that we see the errors again with this
fix applied, and that nothing else is broken.

--Alison
>
> - Terry
>
> >
> >
> >> ---
> >> drivers/cxl/core/ras.c | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> >> index 006c6ffc2f56..7ec2dab152a7 100644
> >> --- a/drivers/cxl/core/ras.c
> >> +++ b/drivers/cxl/core/ras.c
> >> @@ -94,8 +94,7 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
> >>  	if (!pdev->dev.driver)
> >>  		return;
> >>
> >> -	struct device *mem_dev __free(put_device) = bus_find_device(
> >> -		&cxl_bus_type, NULL, pdev, match_memdev_by_parent);
> >> +	struct device *mem_dev __free(put_device) = bus_find_device(&cxl_bus_type, NULL, &pdev->dev, match_memdev_by_parent);
> >>  	if (!mem_dev)
> >>  		return;
> >>
> >> --
> >> 2.34.1
> >>
>
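
The pointer mismatch described in the quoted commit message can be
illustrated outside the kernel. What follows is a minimal userspace
sketch under stated assumptions, not kernel code: struct mock_pci_dev,
the pdev and memdev objects, and the two lookup_*() helpers are
hypothetical stand-ins invented for illustration, and the struct
device here is a one-member mock. Only the comparison inside
match_by_parent() mirrors the "dev->parent == uport" check quoted
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock, NOT the kernel's struct device. */
struct device {
	struct device *parent;
};

/*
 * Hypothetical mock of a bus-specific wrapper. As with the real
 * struct pci_dev, the embedded struct device is not the first member,
 * so &pdev and &pdev.dev are different addresses.
 */
struct mock_pci_dev {
	unsigned int devfn;
	struct device dev;
};

static struct mock_pci_dev pdev;

/* The child's parent is the embedded struct device, not the wrapper. */
static struct device memdev = { .parent = &pdev.dev };

/* Same shape as a bus_find_device() match callback. */
int match_by_parent(struct device *dev, const void *data)
{
	return dev->parent == data;
}

/* Before the fix: the wrapper pointer is passed as match data. */
int lookup_with_pci_dev_pointer(void)
{
	return match_by_parent(&memdev, &pdev);
}

/* After the fix: the embedded struct device pointer is passed. */
int lookup_with_device_pointer(void)
{
	return match_by_parent(&memdev, &pdev.dev);
}
```

In this sketch lookup_with_pci_dev_pointer() returns 0 and
lookup_with_device_pointer() returns 1, because the embedded member
sits at a nonzero offset inside the wrapper. The const void * data
argument accepts either pointer without a diagnostic, which is why
the compiler cannot catch this class of mistake.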