Re: [PATCH V13 8/9] cxl/port: Retry reading CDAT on failure

From: Dan Williams
Date: Thu Jul 14 2022 - 16:44:37 EST


Ira Weiny wrote:
> On Thu, Jul 14, 2022 at 01:05:47PM -0700, Ira wrote:
> > On Thu, Jul 14, 2022 at 09:27:04AM -0700, Dan Williams wrote:
> > > ira.weiny@ wrote:
> > > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> > > >
> > > > The CDAT read may fail for a number of reasons but mainly it is possible
> > > > to get different parts of a valid state. The checksum in the CDAT table
> > > > protects against this.
> > >
> > > I don't know what "different parts of a valid state" means.
> >
> > This text is stale, but given what I know about how other entities may be
> > issuing queries without the kernel's knowledge, I'm not 100% sure that the data
> > read back will always be valid.
> >
> > Regardless, this has already caught a bug in QEMU.
> >
> > So I'm inclined to leave this check in because the checksum is there and
> > can be validated, if only to detect broken hardware.
> >
> > I can update the commit message to clarify this.
>
> Oh wait I thought this was the 'is valid' patch.
>
> I can remove the retries if that was all you were concerned about.
>

I was concerned that this patch was trying to accommodate CDAT changes
while the retrieval is running, which should be obviated by not allowing
set-partition while the CDAT retrieval is running. So I want to see
single-shot CDAT retrieval underneath set-partition protection.
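
To illustrate the shape of that request, here is a rough userspace sketch,
not the patch's code. Every name in it (struct fake_dev, partition_lock,
read_cdat_once, cdat_checksum_ok) is hypothetical, a pthread mutex stands in
for whatever lock set-partition would take, and the checksum rule is assumed
to be the ACPI-style "all table bytes sum to zero modulo 256".

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device; these names are not from the patch. */
struct fake_dev {
	pthread_mutex_t partition_lock; /* assumed to also be held by set-partition */
	const uint8_t *cdat;            /* backing table contents */
	size_t cdat_len;
	int reads;                      /* counts retrieval attempts */
};

/* Assumed rule: the bytes of the whole table sum to zero modulo 256. */
static int cdat_checksum_ok(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum == 0;
}

/*
 * Single-shot retrieval: one attempt under the lock, no retry loop.
 * Returns 0 on success, -1 if the buffer is too small or the checksum fails.
 */
static int read_cdat_once(struct fake_dev *dev, uint8_t *out, size_t out_len)
{
	int rc = -1;

	pthread_mutex_lock(&dev->partition_lock);
	dev->reads++;
	if (dev->cdat_len <= out_len) {
		memcpy(out, dev->cdat, dev->cdat_len);
		if (cdat_checksum_ok(out, dev->cdat_len))
			rc = 0;
	}
	pthread_mutex_unlock(&dev->partition_lock);
	return rc;
}
```

Because the table cannot change while the lock is held, a checksum failure
in this sketch points at bad data rather than a racing update, so it is
reported once instead of being retried.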