Re: [PATCH 0/2] cxl/pci: Inactive downstream port handling

From: Dan Williams
Date: Fri Mar 07 2025 - 15:58:51 EST


Robert Richter wrote:
> Hi Dan,
>
> thanks for the review.
>
> On 05.03.25 11:06:52, Dan Williams wrote:
> > Robert Richter wrote:
> > > A small series with individual patches to handle inactive downstream
> > > ports during enumeration. First patch changes downstream port
> > > enumeration to ignore those with duplicate port IDs. Second patch only
> > > enables active downstream ports with the link status up.
> > >
> > > The patches are independent of each other and can be applied individually.
> > >
> > > Robert Richter (2):
> > > cxl/pci: Ignore downstream ports with duplicate port IDs
> > > cxl/pci: Check link status and only enable active dports
> >
> > Both of these problems are to be addressed by work-in-progress patches
> > that delay dport enumeration until a cxl_memdev is registered beneath
> > that leg of the CXL topology.
> >
> > I would prefer to focus on that solution and skip these band-aids in
> > the near term unless there is an urgent need that makes it clear that
> > waiting for v6.16 is not tenable.
>
> Port IDs are set by hardware only, so there is no way to handle the
> duplicates other than in the driver. Relaxing the check looks
> reasonable to prevent the whole port from being shut down. There are
> other cases where dport enumeration also just continues, e.g. if the
> link capability cannot be read or the component registers do not exist.
>
> The delayed port enumeration series will hide duplicates too (not yet
> tested), but since it is marked RFC and 'long term fix', how about
> taking these patches first and then updating them along with the
> delayed port enumeration patches? The duplicate port handling could be
> changed again or even made obsolete. Until then, the kernel fails to
> enumerate CXL devices.

Right, but this is something that has apparently been tolerable for
several kernel cycles. So the question is: one more kernel cycle to
develop the comprehensive solution, or a band-aid that still leaves
dports in a broken state with respect to hotplug.

I would prefer not to take on short-term debt unless there is another
factor that increases the priority, something like "on platform X the
kernel hits this much more frequently than on platform Y". Essentially,
clarify the end-user impact of not addressing this immediately with the
half-step solution.
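For reference, the two checks the series proposes (skip dports whose link is down, and skip dports whose port ID was already claimed instead of failing the whole port) can be sketched roughly as below. This is a simplified, self-contained illustration, not the code from the patches: `struct fake_dport` and `enumerate_dports()` are hypothetical, and only the register layouts come from the PCIe spec (Port Number in Link Capabilities bits 31:24 and Data Link Layer Link Active in Link Status bit 13, which Linux names `PCI_EXP_LNKCAP_PN` and `PCI_EXP_LNKSTA_DLLLA`).

```c
#include <stdbool.h>
#include <stdint.h>

/* PCIe Link Capabilities: Port Number lives in bits 31:24. */
#define LNKCAP_PORT_NUM_MASK  0xff000000u
#define LNKCAP_PORT_NUM_SHIFT 24
/* PCIe Link Status: Data Link Layer Link Active (bit 13). */
#define LNKSTA_DLLLA          0x2000u

#define MAX_PORT_IDS 256

/* Hypothetical, simplified view of a downstream port during enumeration. */
struct fake_dport {
	uint32_t lnkcap; /* Link Capabilities register value */
	uint16_t lnksta; /* Link Status register value */
};

/*
 * Hypothetical helper: marks which dports would be enabled and returns
 * how many. Inactive ports and ports with an already-used port ID are
 * skipped instead of aborting enumeration of the whole port.
 */
static int enumerate_dports(const struct fake_dport *ports, int n,
			    bool *enabled)
{
	bool seen[MAX_PORT_IDS] = { false };
	int count = 0;

	for (int i = 0; i < n; i++) {
		uint8_t port_id = (uint8_t)((ports[i].lnkcap &
					     LNKCAP_PORT_NUM_MASK) >>
					    LNKCAP_PORT_NUM_SHIFT);

		enabled[i] = false;
		/* Link down: treat as an inactive dport. */
		if (!(ports[i].lnksta & LNKSTA_DLLLA))
			continue;
		/* Duplicate port ID: ignore this dport and keep going. */
		if (seen[port_id])
			continue;
		seen[port_id] = true;
		enabled[i] = true;
		count++;
	}
	return count;
}
```

The link check runs first so that an inactive port never claims a port ID ahead of an active one. Note that this is exactly the static, enumeration-time snapshot objected to above: a port whose link is down when this runs is never revisited, which is why it does nothing for hotplug.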