Re: [PATCH] cxl/core/port: defer probe when memdev fails to find correct port

From: Gregory Price
Date: Tue Oct 01 2024 - 11:21:18 EST


On Wed, Sep 18, 2024 at 01:59:07PM +0200, Gregory Price wrote:
> On Fri, Sep 13, 2024 at 09:32:48PM -0700, Dan Williams wrote:
> > Gregory Price wrote:
> > > Depending on device/hierarchy readiness, it can be possible for the
> > > async probe process to attempt to register an endpoint before the
> > > entire port hierarchy is ready. This currently fails with -ENXIO.
> > >
> > > Return -EPROBE_DEFER to try again later automatically (which is
> > > what the local comments already say we should do anyway).
> >
> > I want to make sure this is not papering over some other issue. Can you
> > post the final topology when this works (cxl list -BPET)? My working
> > theory is that you have 2 devices that share an intermediate port.
> > Otherwise, I am having a hard time understanding why the
> > cxl_bus_rescan() in cxl_acpi_probe() does not obviate the explicit
> > EPROBE_DEFER.
> >

just reporting back with a fully functional layout - they do not appear to
share an intermediate port, unless you consider the root a shared port.

I see your concern about this papering over another issue, but it's not
clear what to look for or how to look for it at this point.

For what it's worth - another group observed the same issue with different
hardware and produced the same patch.

"ports:root0":[
{
"port":"port1",
"host":"pci0000:e0",
"depth":1,
"decoders_committed":1,
"nr_dports":4,
"dports":[
{
"dport":"0000:e0:07.2",
"alias":"device:16",
"id":114
},
{
"dport":"0000:e0:01.1",
"alias":"device:02",
"id":0
},
{
"dport":"0000:e0:01.3",
"alias":"device:05",
"id":2
},
{
"dport":"0000:e0:07.1",
"alias":"device:0d",
"id":113
}
],
"endpoints:port1":[
{
"endpoint":"endpoint5",
"host":"mem0",
"parent_dport":"0000:e0:01.1",
"depth":2,
"decoders_committed":1
}
]
},
{
"port":"port3",
"host":"pci0000:c0",
"depth":1,
"decoders_committed":2,
"nr_dports":1,
"dports":[
{
"dport":"0000:c0:01.1",
"alias":"device:c3",
"id":0
}
],
"endpoints:port3":[
{
"endpoint":"endpoint6",
"host":"mem1",
"parent_dport":"0000:c0:01.1",
"depth":2,
"decoders_committed":2
}
]
},
]

> > So, devA is dependendent on devB to create a common port, but devA loses
> > that race after cxl_bus_rescan() has already run. Then EBPROBE_DEFER is
> > the right answer to trigger devA to try again.